clearchain.com
robots.txt

Robots Exclusion Standard data for clearchain.com

Resource Scan

Scan Details

Site Domain clearchain.com
Base Domain clearchain.com
Scan Status Ok
Last Scan 2026-01-10T14:16:44+00:00
Next Scan 2026-01-17T14:16:44+00:00

Last Scan

Scanned 2026-01-10T14:16:44+00:00
URL https://clearchain.com/robots.txt
Domain IPs 104.21.25.249, 172.67.134.242
Response IP 104.21.25.249
Found Yes
Hash 76d4231390781332b21b80cc2b1d3d13473deb6e67df5c3c35e4dbc1c23341a3
SimHash 7a723704ec57

Groups

*

Rule Path
Disallow /mailman/
Disallow /pipermail/
Disallow /~benjsc/temp

Other Records

Field Value
crawl-delay 0.5

wget

Rule Path
Disallow /

*

Rule Path
Disallow /blog/wp-admin/
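
The groups above can be read as a small rule table per user agent. The sketch below shows one possible interpretation. It is not the scanner's own logic. The robots.txt text is reconstructed from the listed groups, so its exact layout is an assumption, and the helper names parse_groups and is_allowed and the agent name examplebot are hypothetical. It merges the two "*" groups, handles only Disallow rules with plain prefix matching, and stores any other field (such as crawl-delay) as an uninterpreted string.

```python
# Reconstructed from the groups listed above; the original file's exact
# layout (ordering, blank lines, comments) is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mailman/
Disallow: /pipermail/
Disallow: /~benjsc/temp
Crawl-delay: 0.5

User-agent: wget
Disallow: /

User-agent: *
Disallow: /blog/wp-admin/
"""


def parse_groups(text):
    """Return (rules, extras): Disallow paths and other records per agent.

    Groups naming the same user agent are merged. Simplification: any
    line that is not a User-agent line counts as part of the group body.
    """
    rules = {}
    extras = {}
    agents = []        # user agents of the group currently being read
    in_body = False    # True once the current group has a non-agent line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_body:                        # a new group starts here
                agents, in_body = [], False
            agents.append(value.lower())
        elif field == "disallow":
            in_body = True
            if value:                          # empty Disallow blocks nothing
                for agent in agents:
                    rules.setdefault(agent, []).append(value)
        else:
            in_body = True
            for agent in agents:
                extras.setdefault(agent, {})[field] = value
    return rules, extras


def is_allowed(rules, agent, path):
    """Use the agent's own group if one exists, otherwise the '*' group."""
    group = rules.get(agent.lower(), rules.get("*", []))
    return not any(path.startswith(prefix) for prefix in group)


rules, extras = parse_groups(ROBOTS_TXT)
print(is_allowed(rules, "wget", "/index.html"))             # False
print(is_allowed(rules, "examplebot", "/pipermail/2004/"))  # False
print(is_allowed(rules, "examplebot", "/blog/wp-admin/"))   # False
print(is_allowed(rules, "examplebot", "/blog/"))            # True
print(extras["*"]["crawl-delay"])                           # 0.5
```

Merging the repeated "*" groups is a design choice. A parser that keeps only the first "*" group would leave /blog/wp-admin/ unrestricted, so results can differ between crawlers.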

Comments

  • $ClearChain: www/data/robots.txt,v 1.2 2004/02/06 02:30:47 benjsc Exp $
  • This file aids in providing web crawling software with restrictions on the content they should index
  • Sorry, wget in its recursive mode is a frequent problem. Please read the man page and use it properly; there is a --wait option you can use to set the delay between hits, for instance.
  • Wiki Requests
  • Don't index non article wiki pages