clagrills.com
robots.txt

Robots Exclusion Standard data for clagrills.com

Resource Scan

Scan Details

Site Domain clagrills.com
Base Domain clagrills.com
Scan Status Ok
Last Scan 2024-09-08T10:32:29+00:00
Next Scan 2024-10-08T10:32:29+00:00

Last Scan

Scanned 2024-09-08T10:32:29+00:00
URL https://clagrills.com/robots.txt
Domain IPs 66.117.4.4
Response IP 66.117.4.4
Found Yes
Hash 2c971b37b363abfd9e01452b18cc49dd3877c28ddd0bd79f0519b5f01a437888
SimHash 3a95db13f34c
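The Hash field above looks like a SHA-256 digest of the raw robots.txt body (64 hex digits), though the scanner does not say so explicitly. A minimal sketch of reproducing it under that assumption; the comparison only holds while the live file is unchanged since the scan:

    import hashlib
    import urllib.request

    SCAN_HASH = "2c971b37b363abfd9e01452b18cc49dd3877c28ddd0bd79f0519b5f01a437888"

    # Fetch the same URL the scanner recorded and hash the raw bytes.
    with urllib.request.urlopen("https://clagrills.com/robots.txt") as resp:
        body = resp.read()

    digest = hashlib.sha256(body).hexdigest()
    print(digest)
    print("matches scan:", digest == SCAN_HASH)  # True only if the file is unchanged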

Groups

gigabot
ia_archiver-web.archive.org
ia_archiver
yandex
yandexbot
moget
ichiro
naverbot
yeti
baiduspider
baiduspider-video
baiduspider-image
sogou spider
youdaobot
yodaobot
ahrefsbot
sistrix
seokicks-robot
seokicks
mj12bot
searchmetricsbot
netseer
semrushbot
discoverybot
backlinkcrawler
ralocobot
yandeximages
a6-indexer
coccoc
apache-httpclient
curious george
webmastercoffee
spbot
whelanlabs
research-scanner
runet-research-crawler
corporatenewssearchengine
spiderling
w3clinemode
netresearchserver
surveybot
gimme60bot
curious george
analyticsseo
genieo
crazywebcrawler
findxbot
domainsigmacrawler
aihitbot
changedetect
changedetection
infominder
sogou
sogou web spider
toweyabot
domainappender
megaindex
deusu
grapeshotcrawler
wotbox
domain re-animator bot
domain re-animator
qwantify
istellabot

Product                    Comment
gigabot                    Gigabot is the name of Gigablast's robot
yandex                     Russian search engine
coccoc                     2-2015 Vietnamese browser
apache-httpclient          2-2015
curious george             2-2015
webmastercoffee            2-2015
spbot                      2-2015
whelanlabs                 2-2015
research-scanner           2-2015
runet-research-crawler     2-2015
corporatenewssearchengine  2-2015
spiderling                 2-2015
w3clinemode                2-2015 HttpClient?
netresearchserver          2-2015
surveybot                  2-2015

Rule      Path
Disallow  /

*

Product  Comment
*        Everybody else

Rule      Path        Comment
Disallow  /part-xref  MCM/MHP cross reference
Disallow  /stayout    Duh
Disallow  /pinnacle   Nothing much here
Allow     /           -
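To see how the two groups behave in practice, here is a sketch using Python's standard urllib.robotparser, with the rules inlined (and the long blocked-bot group abbreviated to a single agent) so the check works even if the live file has changed since this scan:

    import urllib.robotparser

    # Abbreviated copy of the rules shown above: one named bot from the
    # blocked group, plus the wildcard group.
    RULES = """
    User-agent: ahrefsbot
    Disallow: /

    User-agent: *
    Disallow: /part-xref
    Disallow: /stayout
    Disallow: /pinnacle
    Allow: /
    """

    rp = urllib.robotparser.RobotFileParser()
    rp.parse(RULES.splitlines())  # parse() strips each line, so indentation is harmless

    print(rp.can_fetch("ahrefsbot", "https://clagrills.com/"))              # False: blocked everywhere
    print(rp.can_fetch("SomeOtherBot", "https://clagrills.com/part-xref"))  # False
    print(rp.can_fetch("SomeOtherBot", "https://clagrills.com/"))           # True

Note that urllib.robotparser applies rules in file order (first match wins) rather than Google's longest-path precedence; for these rules the outcome is the same because Allow: / comes last.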

Comments

  • Robots.txt file
  • 12-2014 Change philosophy. Block known bad guys. For everyone else, block image directories. Most of the bad guys simply ignore robots.txt anyway.
  • 12-2014 I'll block the bad guys like AmazonAws, Hackers, TopHosts and spammers in our firewall.
  • June 2012 Setup as a common robots.txt for all of my sites. Obviously, some of the directories don't exist on all sites.
  • From Google at: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
  • Only one group of group-member records is valid for a particular crawler. The crawler must determine the correct group of records by finding the group with the most specific user-agent that still matches. All other groups of records are ignored by the crawler. The user-agent is non-case-sensitive. All non-matching text is ignored (for example, both googlebot/1.2 and googlebot* are equivalent to googlebot). The order of the groups within the robots.txt file is irrelevant. (A sketch of this selection rule follows this list.)
  • The start-of-group element user-agent is used to specify for which crawler the group is valid. Only one group of records is valid for a particular crawler.
  • Name the specific bot we don't want, they'll probably ignore this
  • 6-2016 User-agent: msnbot-media # Don't steal our images
  • 6-2016 User-agent: Googlebot-Image
  • 6-2016 User-agent: yahoo-MMCrawler # Don't steal our images
  • 6-2016 User-agent: yahoo-MMCrawler/3.x # Don't steal our images
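As noted in the list above, here is a short sketch of the group-selection rule quoted from Google's documentation: match group tokens case-insensitively against the crawler's name, prefer the most specific (longest) match, and fall back to the * group. Treating the token as a name prefix is a simplified reading; real parsers also strip version suffixes like /1.2. The groups in the usage example are an illustrative subset, not the full file:

    def select_group(crawler_ua, groups):
        """Pick the rule group for a crawler, per the quoted Google rule.

        groups: list of (user_agent_token, rules) pairs from robots.txt.
        """
        name = crawler_ua.lower()
        matching = [(tok, rules) for tok, rules in groups
                    if tok != "*" and name.startswith(tok.lower())]
        if matching:
            # Most specific user-agent = longest matching token.
            return max(matching, key=lambda g: len(g[0]))[1]
        # No named group matched; fall back to the wildcard group, if any.
        for tok, rules in groups:
            if tok == "*":
                return rules
        return []

    groups = [("gigabot", ["Disallow: /"]),
              ("*", ["Disallow: /part-xref", "Allow: /"])]
    print(select_group("Gigabot/3.0", groups))     # the gigabot group
    print(select_group("SomeNewBot/1.0", groups))  # the * group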

Warnings

  • 1 invalid line.