
Robots Exclusion Standard data for resourcesnew.cleanclothes.org

Resource Scan

Scan Details

Site Domain: resourcesnew.cleanclothes.org
Base Domain: cleanclothes.org
Scan Status: Failed
Failure Stage: Fetching resource.
Failure Reason: Couldn't connect to server.
Last Scan: 2025-12-06T00:44:19+00:00
Next Scan: 2026-03-06T00:44:19+00:00

Last Successful Scan

Scanned: 2024-02-07T22:04:14+00:00
URL: https://resourcesnew.cleanclothes.org/robots.txt
Domain IPs: 85.17.140.66
Response IP: 85.17.140.66
Found: Yes
Hash: a75b90ae8f66ca4dd54872e199a592bd15f66ed8f486d1dfb6b3c7a8d5fc245f
SimHash: ca1a495f0bd0

Groups

User-agent: * (applies to all crawlers)

Rule Path
Disallow /filestore

Other Records

Field Value
crawl-delay 10
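To see how a crawler might interpret these records, the sketch below feeds a robots.txt rebuilt from the Groups and Other Records tables into Python's standard `urllib.robotparser`. This is an assumption-laden illustration: the text is reconstructed from the parsed rules, not the verbatim file (which also carried the comment lines listed below), and the crawler name `ExampleBot` and the sample paths are hypothetical. No network request is made.

```python
from urllib.robotparser import RobotFileParser

# Assumption: rebuilt from the scan's parsed group and records,
# not the verbatim file (comment lines omitted).
ROBOTS_TXT = """\
User-agent: *
Disallow: /filestore
Crawl-delay: 10
"""

BASE = "https://resourcesnew.cleanclothes.org"


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    # parse() also records a load time, so can_fetch()/crawl_delay() work
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    agent = "ExampleBot"  # hypothetical crawler; the "*" group matches any agent
    for path in ("/", "/filestore", "/filestore/report.pdf"):
        # Disallow values are path prefixes: anything under /filestore is blocked
        print(path, rp.can_fetch(agent, BASE + path))
    print("crawl-delay:", rp.crawl_delay(agent))
```

Because `Disallow` matches by path prefix, a hypothetical `/filestore-archive` path would be blocked too. Python's parser applies the first matching rule rather than the longest match described in RFC 9309; with a single rule, as here, the two approaches give the same result.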

Comments

  • Sample robots.txt file - ensures that a Google Appliance can still access the spider page (if configured)
  • and assumes an installation in the site root. For sites in a subfolder you must move the robots.txt file
  • to the site root and alter the paths accordingly.