ai-ap.com
robots.txt

Robots Exclusion Standard data for ai-ap.com

Resource Scan

Scan Details

Site Domain ai-ap.com
Base Domain ai-ap.com
Scan Status Ok
Last Scan 2024-06-01T02:03:42+00:00
Next Scan 2024-07-01T02:03:42+00:00

Last Scan

Scanned 2024-06-01T02:03:42+00:00
URL https://www.ai-ap.com/robots.txt
Domain IPs 198.74.56.114
Response IP 198.74.56.114
Found Yes
Hash c28e5365790f6a76bc01096c92e48b37d81376c3480dd0e8f48b5243059bd88e
SimHash 2e3a5151ccf5

Groups

bitlybot

Rule Path
Disallow /

mediapartners-google

Rule Path
Disallow /

ubicrawler

Rule Path
Disallow /

doc

Rule Path
Disallow /

zao

Rule Path
Disallow /

sitecheck.internetseer.com

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

fetch

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

webzip

Rule Path
Disallow /

linko

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

xenu

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

heritrix

Rule Path
Disallow /

heritrix/3.3.0-snapshot-20140926-2021

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

ntentbot

Rule Path
Disallow /

gigablastopensource

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

mozilla/5.0 (compatible; genieo/x.x http://www.genieo.com/webfilter.html)

Rule Path
Disallow /

mozilla/5.0 (tweetmemebot/4.0; +http://datasift.com/bot.html) gecko/20100101 firefox/31.0

Rule Path
Disallow /

crawler4j

Rule Path
Disallow /

metauri

Rule Path
Disallow /

wget

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /

*

Rule Path
Disallow /media/
Disallow /publications/search/
Disallow /publications/advanced-search
Disallow /basie/
Disallow /login/
Disallow /publications/article/*/email/
Disallow /account/
Disallow */checkout/*

Other Records

Field Value
crawl-delay 5
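
The wildcard group above can be replayed with Python's standard urllib.robotparser. This is a minimal sketch, not the scanner's own tooling: the rules are pasted inline (a hand-picked subset of the group above) so it runs offline. Note that urllib.robotparser does plain prefix matching, so wildcard paths such as */checkout/* are not expanded the way Googlebot expands them.

```python
# Sketch: replaying a subset of the wildcard group with urllib.robotparser.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /media/
Disallow: /login/
Disallow: /account/
Crawl-delay: 5

User-agent: bitlybot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Prefix-matched Disallow rules block matching paths for any agent...
print(rp.can_fetch("*", "https://www.ai-ap.com/media/logo.png"))  # False
print(rp.can_fetch("*", "https://www.ai-ap.com/publications/"))   # True
# ...while bitlybot is blocked from the whole site.
print(rp.can_fetch("bitlybot", "https://www.ai-ap.com/"))         # False
# The nonstandard Crawl-delay field is also exposed.
print(rp.crawl_delay("*"))                                        # 5
```

The same approach works for the per-bot crawl-delay values reported further down (5 for the MSN bots, 60 for slurp, baiduspider, and yandex).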

googlebot

Rule Path
Disallow /media/
Disallow /publications/search/
Disallow /publications/advanced-search
Disallow /basie/
Disallow /login/
Disallow /publications/article/*/email/
Disallow /account/
Disallow */checkout/*

msnbot

Rule Path
Disallow /media
Disallow /publications/search
Disallow /publications/advanced-search
Disallow /basie
Disallow /login
Disallow /account
Disallow */checkout/*

Other Records

Field Value
crawl-delay 5

msnbot-newsblogs/1.1 (+http://search.msn.com/msnbot.htm)

Rule Path
Disallow /media
Disallow /publications/search
Disallow /publications/advanced-search
Disallow /basie
Disallow /login
Disallow /account
Disallow */checkout/*

Other Records

Field Value
crawl-delay 5

msnbot/2.0b (+http://search.msn.com/msnbot.htm)

Rule Path
Disallow /media
Disallow /publications/search
Disallow /publications/advanced-search/
Disallow /basie
Disallow /login
Disallow /account
Disallow */checkout/*

Other Records

Field Value
crawl-delay 5

slurp

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 60

mozilla/5.0 (compatible; yahoo! slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 60

mozilla/5.0 (compatible; spbot/2.0.2; +http://www.seoprofiler.com/bot/ )

Rule Path
Disallow /

baiduspider

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 60

yandex

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 60

Comments

  • robots.txt
  • Tell "bitlybot" not to come here at all
  • From NYT.com - nobody seems to like this bot
  • Crawlers that are kind enough to obey, but which we'd rather not have unless they're feeding search engines.
  • Some bots are known to be trouble, particularly those designed to copy entire sites. Please obey robots.txt.
  • Sorry, wget in its recursive mode is a frequent problem. Please read the man page and use it properly; there is a --wait option you can use to set the delay between hits, for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/
  • Friendly, low-speed bots are welcome viewing pages.
  • GoogleBot
  • MSN Bot listens to Crawl-Delay
  • Yahoo/Inktomi listens to Crawl-Delay
  • Baiduspider
  • Yandex

Warnings

  • 2 invalid lines.
  • `host-load` is not a known field.
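
The `host-load` warning is harmless in practice: standards-based parsers simply skip fields they do not recognize, and the surrounding rules still apply. A small sketch with urllib.robotparser, using hypothetical inline rules (not taken from this site's file):

```python
# Sketch: unknown fields such as `host-load` do not break parsing.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Host-load: 2",          # unrecognized field, silently skipped
    "Disallow: /private/",
])

# The Disallow rule around the unknown field still takes effect.
print(rp.can_fetch("*", "https://example.com/private/x"))  # False
print(rp.can_fetch("*", "https://example.com/public/"))    # True
```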