foldoc.org robots.txt

Robots Exclusion Standard data for foldoc.org

Resource Scan

Scan Details

Site Domain foldoc.org
Base Domain foldoc.org
Scan Status Ok
Last Scan 2025-11-06T23:30:30+00:00
Next Scan 2025-11-13T23:30:30+00:00

Last Scan

Scanned 2025-11-06T23:30:30+00:00
URL https://foldoc.org/robots.txt
Domain IPs 104.21.9.104, 172.67.159.189, 2606:4700:3030::ac43:9fbd, 2606:4700:3032::6815:968
Response IP 172.67.159.189
Found Yes
Hash 1ace32ff04625011f4e1145e415f5f1521d07433f53180ea3b6430892217ef95
SimHash 5046c0758191

Groups

mediapartners-google

Rule Path
Disallow (empty path: nothing is disallowed for this agent)

*

Rule Path
Disallow /Dictionary$
Disallow /foldoc.tar.gz
Disallow /index.cgi
Disallow /junk_searches
Disallow /keys
Disallow /logs
Disallow /offsets
Disallow /showenv.cgi
Disallow /template.html

Other Records

Field Value
sitemap http://foldoc.org/sitemap.txt
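
The groups and the sitemap record above can be checked with Python's standard-library urllib.robotparser. The following is a minimal sketch, not the live file: the robots.txt text is reconstructed from the rules listed here (its exact layout and comments are assumptions), and the agent name "examplebot" and the path "/some-entry" are hypothetical. The "/Dictionary$" rule uses the "$" end-of-path anchor, which the standard-library parser may not honor, so it is not checked here.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the groups listed above; the live file's exact
# layout, ordering, and comments are assumptions.
ROBOTS_TXT = """\
User-agent: mediapartners-google
Disallow:

User-agent: *
Disallow: /Dictionary$
Disallow: /foldoc.tar.gz
Disallow: /index.cgi
Disallow: /junk_searches
Disallow: /keys
Disallow: /logs
Disallow: /offsets
Disallow: /showenv.cgi
Disallow: /template.html

Sitemap: http://foldoc.org/sitemap.txt
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, "https://foldoc.org" + path)


# "examplebot" is a hypothetical crawler that falls under the "*" group.
print(allowed("examplebot", "/logs"))
print(allowed("examplebot", "/index.cgi"))
print(allowed("examplebot", "/some-entry"))  # hypothetical path

# The empty Disallow in the mediapartners-google group permits everything.
print(allowed("mediapartners-google", "/logs"))

# site_maps() requires Python 3.8 or later.
print(parser.site_maps())
```

Because a crawler obeys only the most specific group that names it, mediapartners-google is not bound by the "*" group's rules at all.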

Comments

  • http://www.w3.org/TR/html4/appendix/notes.html#h-B.4.1.1
  • http://www.dcs.ed.ac.uk/cgi/sxw/parserobots.pl?site=foldoc.org
  • http://www.kollar.com/robots.html
  • Absolute paths