arsenalnews.net
robots.txt

Robots Exclusion Standard data for arsenalnews.net

Resource Scan

Scan Details

Site Domain arsenalnews.net
Base Domain arsenalnews.net
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a server error.
Last Scan 2024-08-23T06:05:48+00:00
Next Scan 2024-11-21T06:05:48+00:00

Last Successful Scan

Scanned 2024-01-27T05:27:00+00:00
URL https://arsenalnews.net/robots.txt
Domain IPs 104.21.44.169, 172.67.201.151, 2606:4700:3032::ac43:c997, 2606:4700:3036::6815:2ca9
Response IP 172.67.201.151
Found Yes
Hash 7014bd55936923ad0b699cda87a67fb3c1b643ee39f6b5f18dff2fd4be80237a
SimHash ee97d336c19f

Groups

mediapartners-google

Rule Path
Disallow (empty)

*

Rule Path
Disallow /account/login/
Disallow /account/join/
Disallow /account/
Disallow /news/search
Disallow /news/search?*
Disallow /kick/
Disallow /link/
Allow /story/
Allow /news/
Allow /news?page*
Allow /news/latest
Allow /news/latest?page=*
Allow /news/popular
Allow /news/popular?page=*
Allow /sitemapxml?date=*
Allow /
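
The interaction between the Allow and Disallow rules in this group can be sketched as follows. This is a minimal illustration that assumes the longest-match convention described in RFC 9309 (the most specific pattern wins, Allow wins ties, "*" matches any run of characters); individual crawlers may evaluate rules differently, and some older parsers use first-match order instead. The names RULES, pattern_to_regex and is_allowed are hypothetical and not part of the scan data.

```python
import re

# Rules of the "*" group, copied from the scan listing above.
RULES = [
    ("disallow", "/account/login/"),
    ("disallow", "/account/join/"),
    ("disallow", "/account/"),
    ("disallow", "/news/search"),
    ("disallow", "/news/search?*"),
    ("disallow", "/kick/"),
    ("disallow", "/link/"),
    ("allow", "/story/"),
    ("allow", "/news/"),
    ("allow", "/news?page*"),
    ("allow", "/news/latest"),
    ("allow", "/news/latest?page=*"),
    ("allow", "/news/popular"),
    ("allow", "/news/popular?page=*"),
    ("allow", "/sitemapxml?date=*"),
    ("allow", "/"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any sequence of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Assumed RFC 9309 behaviour: longest matching pattern wins, Allow wins ties."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, kind == "allow"
    return allowed


for path in ["/news/latest?page=2", "/news/search?q=arsenal", "/account/login/"]:
    print(path, "allowed" if is_allowed(path) else "blocked")
```

Under this assumption, /news/search?q=arsenal is blocked even though "Allow /news/" also matches, because the Disallow pattern is longer.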

slurp

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 2

spinn3r

Rule Path
Disallow /

yahoo pipes 1.0

Rule Path
Disallow /

moget
ichiro

Rule Path
Disallow /

naverbot
yeti

Rule Path
Disallow /

baiduspider
baiduspider-video
baiduspider-image

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 3

sogou spider

Rule Path
Disallow /

youdaobot

Rule Path
Disallow /

kscrawler

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /
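
The groups listed above are independent of one another: a crawler that finds a group naming it uses only that group and does not inherit rules from "*". A minimal sketch of that selection step, assuming exact case-insensitive matching on the crawler's product token (a simplification of RFC 9309, which also merges multiple matching groups). GROUPS and select_group are hypothetical names, and the rule sets are summarised as short strings rather than parsed.

```python
# Group membership as listed in the scan; each tuple of user-agent tokens
# shares one rule set, summarised here as a short description.
GROUPS = [
    (("mediapartners-google",), "empty Disallow (everything allowed)"),
    (("*",), "default Allow/Disallow list"),
    (("slurp",), "no rules, crawl-delay 2"),
    (("spinn3r",), "Disallow /"),
    (("yahoo pipes 1.0",), "Disallow /"),
    (("moget", "ichiro"), "Disallow /"),
    (("naverbot", "yeti"), "Disallow /"),
    (("baiduspider", "baiduspider-video", "baiduspider-image"),
     "no rules, crawl-delay 3"),
    (("sogou spider",), "Disallow /"),
    (("youdaobot",), "Disallow /"),
    (("kscrawler",), "Disallow /"),
    (("mj12bot",), "Disallow /"),
]


def select_group(product_token, groups=GROUPS):
    """Return the rule summary that applies to a crawler's product token.

    A named group replaces the "*" group entirely; "*" is only a fallback.
    """
    token = product_token.lower()
    fallback = None
    for agents, rules in groups:
        if token in agents:
            return rules
        if "*" in agents:
            fallback = rules
    return fallback


for token in ["MJ12bot", "Slurp", "Googlebot"]:
    print(token, "->", select_group(token))
```

Here "Slurp" receives only its own group (no path rules, crawl-delay 2) and none of the Disallow lines from "*", while an unlisted crawler such as "Googlebot" falls back to the "*" group.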

Other Records

Field Value
sitemap /sitemapxml
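
The sitemap value recorded here is a relative path, whereas the sitemap protocol expects a full URL. A tolerant client might resolve it against the robots.txt URL; the sketch below shows that resolution using Python's standard urllib.parse. Whether any given crawler does this is an assumption, and resolve_sitemap and ROBOTS_URL are hypothetical names.

```python
from urllib.parse import urljoin, urlparse

# URL of the robots.txt file, taken from the "Last Successful Scan" details.
ROBOTS_URL = "https://arsenalnews.net/robots.txt"


def resolve_sitemap(value, robots_url=ROBOTS_URL):
    """Return (absolute_url, was_absolute) for a Sitemap field value."""
    parsed = urlparse(value)
    was_absolute = bool(parsed.scheme and parsed.netloc)
    # urljoin leaves absolute URLs untouched and resolves relative ones
    # against the location of robots.txt.
    return urljoin(robots_url, value), was_absolute


url, was_absolute = resolve_sitemap("/sitemapxml")
print(url, "(already absolute)" if was_absolute else "(resolved from relative path)")
```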

Comments

  • note that manual allows override the broader disallows specified earlier
  • for "/*?", refer to http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360
  • this disallows indexing of any URL with a querystring
  • beware, the sections below WILL NOT INHERIT from the above!
  • http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360
  • disallow adsense bot, as we no longer do adsense.
  • User-agent: Mediapartners-Google
  • Disallow: /
  • Yahoo bot is evil.
  • Disallow: /
  • Spinn3r is also evil.
  • Yahoo Pipes is for feeds not web pages.
  • Asian search engines we don't need to be indexed by
  • User-agent: Yandex
  • Disallow: /
  • Disallow: /
  • KSCrawler - we don't need help from you
  • MJ12bot - we don't need help from you
  • this technically isn't valid, since for some godforsaken reason
  • sitemap paths must be ABSOLUTE and not relative.