python.org
robots.txt

Robots Exclusion Standard data for python.org

Resource Scan

Scan Details

Site Domain python.org
Base Domain python.org
Scan Status Ok
Last Scan 2024-09-19T20:24:44+00:00
Next Scan 2024-10-19T20:24:44+00:00

Last Scan

Scanned 2024-09-19T20:24:44+00:00
URL https://python.org/robots.txt
Redirect https://www.python.org/robots.txt
Redirect Domain www.python.org
Redirect Base python.org
Domain IPs 151.101.0.223, 151.101.128.223, 151.101.192.223, 151.101.64.223, 2a04:4e42:200::223, 2a04:4e42:400::223, 2a04:4e42:600::223, 2a04:4e42::223
Redirect IPs 199.232.44.223, 2a04:4e42:48::223
Response IP 199.232.44.223
Found Yes
Hash 18cb4cd525df8528491845e76f3af26c29c6795d02ea8133974d3b341a2ddd9f
SimHash aa159b4a8570

Groups

httrack
puf
msiecrawler

Rule Path
Disallow /

krugle

Rule Path
Allow /
Disallow /~guido/orlijn/
Disallow /webstats/

nutch

Rule Path
Disallow /

*

Rule Path
Disallow /~guido/orlijn/
Disallow /webstats/
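
The groups above can be checked programmatically. The following is a minimal sketch, assuming the rules are reassembled into robots.txt syntax exactly as listed in this scan (the live file may be worded or ordered differently) and using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` string, the `build_parser` helper, and the `examplebot` user agent are illustrative names, not part of the scan data.

```python
from urllib.robotparser import RobotFileParser

# Reassembled from the groups listed in the scan; an assumption, not a
# verbatim copy of the file served by www.python.org.
ROBOTS_TXT = """\
User-agent: httrack
User-agent: puf
User-agent: msiecrawler
Disallow: /

User-agent: krugle
Allow: /
Disallow: /~guido/orlijn/
Disallow: /webstats/

User-agent: nutch
Disallow: /

User-agent: *
Disallow: /~guido/orlijn/
Disallow: /webstats/
"""


def build_parser(text):
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "examplebot" is a hypothetical crawler that only matches the "*" group.
for agent, url in [
    ("nutch", "https://www.python.org/"),
    ("httrack", "https://www.python.org/downloads/"),
    ("krugle", "https://www.python.org/downloads/"),
    ("examplebot", "https://www.python.org/downloads/"),
    ("examplebot", "https://www.python.org/webstats/"),
]:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent:12} {url:40} {verdict}")
```

Note that `urllib.robotparser` applies the first matching rule within a group, whereas other parsers may prefer the longest matching path. For the krugle group, where `Allow /` precedes the two `Disallow` lines, the two strategies can give different answers for `/webstats/`, so that case is left out of the sketch.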

Comments

  • Directions for robots. See http://www.robotstxt.org/robotstxt.html for a description of the file format.
  • The Krugle web crawler (though based on Nutch) is OK.
  • No one should be crawling us with Nutch.
  • Hide old versions of the documentation and various large sets of files.