illustrationsource.com
robots.txt

Robots Exclusion Standard data for illustrationsource.com

Resource Scan

Scan Details

Site Domain illustrationsource.com
Base Domain illustrationsource.com
Scan Status Ok
Last Scan 2024-06-17T07:16:19+00:00
Next Scan 2024-07-17T07:16:19+00:00

Last Scan

Scanned 2024-06-17T07:16:19+00:00
URL https://www.illustrationsource.com/robots.txt
Domain IPs 66.175.215.172
Response IP 66.175.215.172
Found Yes
Hash 871ce837a3a65a9af13abc02ef0e0623c8004114d2a78f72e3a2e4655fae5a9d
SimHash ae3a5159ccf7

Groups

bitlybot

Rule Path
Disallow /

mediapartners-google

Rule Path
Disallow /

ubicrawler

Rule Path
Disallow /

doc

Rule Path
Disallow /

zao

Rule Path
Disallow /

sitecheck.internetseer.com

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

fetch

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

webzip

Rule Path
Disallow /

linko

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

xenu

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

heritrix

Rule Path
Disallow /

heritrix/3.3.0-snapshot-20140926-2021

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

ntentbot

Rule Path
Disallow /

gigablastopensource

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

mozilla/5.0 (compatible; genieo/x.x http://www.genieo.com/webfilter.html)

Rule Path
Disallow /

mozilla/5.0 (tweetmemebot/4.0; +http://datasift.com/bot.html) gecko/20100101 firefox/31.0

Rule Path
Disallow /

crawler4j

Rule Path
Disallow /

metauri

Rule Path
Disallow /

wget

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /

*

Rule Path
Disallow /media/
Disallow /login/
Disallow /account/
Disallow */checkout/*

Other Records

Field Value
crawl-delay 5

googlebot

Rule Path
Disallow /media/
Disallow /login/
Disallow /account/
Disallow */checkout/*

msnbot

Rule Path
Disallow /media
Disallow /login
Disallow /account
Disallow */checkout/*

Other Records

Field Value
crawl-delay 5

msnbot-newsblogs/1.1 (+http://search.msn.com/msnbot.htm)

Rule Path
Disallow /media
Disallow /login
Disallow /account
Disallow */checkout/*

Other Records

Field Value
crawl-delay 5

msnbot/2.0b (+http://search.msn.com/msnbot.htm)

Rule Path
Disallow /media
Disallow /login
Disallow /account
Disallow */checkout/*

Other Records

Field Value
crawl-delay 5

slurp

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 60

mozilla/5.0 (compatible; yahoo! slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 60

mozilla/5.0 (compatible; spbot/2.0.2; +http://www.seoprofiler.com/bot/ )

Rule Path
Disallow /

baiduspider

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 60

yandex

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 60
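
As a rough illustration of how the groups above resolve, the following Python sketch feeds a hand-reconstructed excerpt of the rules into the standard library's `urllib.robotparser`. The excerpt, the `ExampleBot` agent name, and the sample paths are assumptions made for illustration; the live file may differ. Because `urllib.robotparser` does plain prefix matching, the wildcard rule `*/checkout/*` is left out of the excerpt.

```python
from urllib.robotparser import RobotFileParser

# Hand-reconstructed excerpt of the groups listed above (an assumption,
# not a verbatim copy of the live robots.txt).
EXCERPT = """\
User-agent: wget
Disallow: /

User-agent: Slurp
Crawl-delay: 60

User-agent: *
Disallow: /media/
Disallow: /login/
Disallow: /account/
Crawl-delay: 5
"""

BASE = "https://www.illustrationsource.com"

parser = RobotFileParser()
parser.parse(EXCERPT.splitlines())


def check(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the excerpt."""
    return parser.can_fetch(agent, BASE + path)


if __name__ == "__main__":
    # ExampleBot is a hypothetical agent that falls through to the * group.
    for agent, path in [("wget", "/"), ("ExampleBot", "/media/a.jpg"),
                        ("ExampleBot", "/about"), ("Slurp", "/media/a.jpg")]:
        print(agent, path, check(agent, path), parser.crawl_delay(agent))
```

In this sketch an agent with its own group, such as Slurp, is matched against that group only, so it is not subject to the `/media/` rule from the `*` group. That matches the "No rules defined. All paths allowed." entries in the report.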

Comments

  • robots.txt
  • Tell "bitlybot" not to come here at all
  • From NYT.com - nobody seems to like this bot
  • Crawlers that are kind enough to obey, but which we'd rather not have unless they're feeding search engines.
  • Some bots are known to be trouble, particularly those designed to copy entire sites. Please obey robots.txt.
  • Sorry, wget in its recursive mode is a frequent problem. Please read the man page and use it properly; there is a --wait option you can use to set the delay between hits, for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/
  • Friendly, low-speed bots are welcome viewing pages.
  • GoogleBot
  • MSN Bot listens to Crawl-Delay
  • Yahoo/Inktomi listens to Crawl-Delay
  • Baiduspider
  • Yandex

Warnings

  • 2 invalid lines.
  • `host-load` is not a known field.
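
A warning like the one above could be produced by a small check that flags field names outside a known set. The sketch below is only an approximation: the set of known fields and the sample lines are assumptions, the `host-load` value is invented for illustration, and the scanner's actual validation rules are not documented in this report.

```python
# Field names this sketch treats as known (an assumption; the scanner's
# real list is not given in the report).
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}


def unknown_fields(lines):
    """Yield (line_number, field) for lines whose field is not known."""
    for number, raw in enumerate(lines, start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, sep, _value = line.partition(":")
        field = field.strip().lower()
        # A line without a colon, or with an unrecognized field, is flagged.
        if not sep or field not in KNOWN_FIELDS:
            yield number, field


# Hypothetical sample; the host-load value is invented for illustration.
SAMPLE = [
    "# Friendly, low-speed bots are welcome",
    "User-agent: *",
    "Disallow: /media/",
    "Crawl-delay: 5",
    "Host-load: 1",
]

if __name__ == "__main__":
    for number, field in unknown_fields(SAMPLE):
        print(f"line {number}: `{field}` is not a known field")
```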