Robots Exclusion Standard data for jigsaw.w3.org

Resource Scan

Scan Details

Site Domain jigsaw.w3.org
Base Domain w3.org
Scan Status Ok
Last Scan 2025-08-22T08:43:14+00:00
Next Scan 2025-09-21T08:43:14+00:00

Last Scan

Scanned 2025-08-22T08:43:14+00:00
URL https://jigsaw.w3.org/robots.txt
Domain IPs 104.18.22.19, 104.18.23.19, 2606:4700::6812:1613, 2606:4700::6812:1713
Response IP 104.18.22.19
Found Yes
Hash b70526f634282834da1405e87c02ab28376263e333c7d8abdab13aa3c7a768bf
SimHash 7c1921d6abdb
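
A content hash like the one listed above is typically used to tell whether robots.txt changed between scans. The report does not name the algorithm; the 64 hex digits are consistent with SHA-256, so the sketch below assumes SHA-256 over the raw response body. The function names `content_hash` and `has_changed` are hypothetical and are not part of the scanner.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Return the hex digest of the raw robots.txt bytes.

    Assumption: SHA-256 over the unmodified response body. The report
    does not say which algorithm or normalization the scanner uses.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """Compare a stored hash against a freshly fetched body."""
    return content_hash(body) != previous_hash
```

A hash of this kind only detects that something changed; a similarity hash such as the SimHash field is the sort of value used to estimate how much changed.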

Groups

*

Rule Path
Disallow /guest-demos/
Disallow /status/
Disallow /demos/
Disallow /HyperNews/
Disallow /cgi-bin/
Disallow /css-validator/docs/
Disallow /Friends/
Disallow /api/
Disallow /Benoit/Public/DVDDB/
Disallow /css-validator/validator
Disallow /css-validator/check
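
To see how a crawler might apply the rules above, here is a minimal sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` text is reconstructed from the rule table, not copied from the live file, and both the `is_allowed` helper and the `ExampleBot` user agent are hypothetical. Rules are parsed locally, so no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the "*" group in the table above (an assumption:
# the live file may differ in ordering, comments, or whitespace).
ROBOTS_TXT = """\
User-agent: *
Disallow: /guest-demos/
Disallow: /status/
Disallow: /demos/
Disallow: /HyperNews/
Disallow: /cgi-bin/
Disallow: /css-validator/docs/
Disallow: /Friends/
Disallow: /api/
Disallow: /Benoit/Public/DVDDB/
Disallow: /css-validator/validator
Disallow: /css-validator/check
"""


def is_allowed(path: str, user_agent: str = "ExampleBot") -> bool:
    """Check a path on jigsaw.w3.org against the reconstructed rules."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, "https://jigsaw.w3.org" + path)


if __name__ == "__main__":
    for path in ("/", "/status/", "/css-validator/", "/css-validator/validator"):
        print(path, is_allowed(path))
```

Disallow paths are matched as prefixes, so `/css-validator/validator` and `/css-validator/check` are blocked while `/css-validator/` itself remains crawlable.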

Comments

  • sample robots.txt file for Jigsaw
  • stupid bots! a 404 is a 404!... I have to disallow non existent directories :(