
resourcesnew.cleanclothes.org/robots.txt

Robots Exclusion Standard data for resourcesnew.cleanclothes.org


Resource Scan

Scan Details

Site Domain	resourcesnew.cleanclothes.org
Base Domain	cleanclothes.org
Scan Status	Failed
Failure Stage	Fetching resource.
Failure Reason	Couldn't connect to server.
Last Scan	2025-12-06T00:44:19+00:00
Next Scan	2026-03-06T00:44:19+00:00

Last Successful Scan

Scanned	2024-02-07T22:04:14+00:00
URL	https://resourcesnew.cleanclothes.org/robots.txt
Domain IPs	85.17.140.66
Response IP	85.17.140.66
Found	Yes
Hash	a75b90ae8f66ca4dd54872e199a592bd15f66ed8f486d1dfb6b3c7a8d5fc245f
SimHash	ca1a495f0bd0

Groups

User-agent: *

Rule	Path
Disallow	/filestore

Other Records

Field	Value
crawl-delay	10
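The scanned rules amount to a single group for all user agents, one Disallow rule and a crawl delay of 10 seconds. As a minimal sketch of how a compliant crawler might interpret them, the snippet below feeds a reconstruction of those rules to Python's standard-library `urllib.robotparser`. The exact text, casing and line order of the original file are assumptions rebuilt from the scan tables, and the user agent `ExampleBot` and the paths checked are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of the rules recorded in the 2024-02-07 scan:
# one group for every user agent, a single Disallow rule, a 10 s crawl delay.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /filestore",
    "Crawl-delay: 10",
]

BASE = "https://resourcesnew.cleanclothes.org"

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)  # parse pre-fetched lines instead of calling read()

# Hypothetical paths: one under /filestore, one elsewhere on the site.
for path in ["/filestore/report.pdf", "/resources/"]:
    allowed = parser.can_fetch("ExampleBot", BASE + path)
    print(path, "allowed" if allowed else "disallowed")

# The "*" group applies to any agent without its own group, so its delay is used.
print("crawl delay:", parser.crawl_delay("ExampleBot"))
```

Disallow paths are matched as prefixes, so `/filestore` blocks everything beneath that directory (and any path that merely starts with the same characters), while the rest of the site stays crawlable at no more than one request every 10 seconds.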


Comments

Sample robots.txt file - ensures that a Google Appliance can still access the spider page (if configured)
and assumes an installation in the site root. For sites in a subfolder you must move the robots.txt file
to the site root and alter the paths accordingly.
