501commons.org
robots.txt

Robots Exclusion Standard data for 501commons.org

Resource Scan

Scan Details

Site Domain	501commons.org
Base Domain	501commons.org
Scan Status	Failed
Failure Stage	Fetching resource.
Failure Reason	Couldn't establish SSL connection.
Last Scan	2025-04-09T13:42:02+00:00
Next Scan	2025-07-08T13:42:02+00:00

Last Successful Scan

Scanned	2022-08-20T15:12:05+00:00
URL	https://501commons.org/robots.txt
Redirect	https://www.501commons.org/robots.txt
Redirect Domain	www.501commons.org
Redirect Base	501commons.org
Response IP	104.131.130.40
Found	Yes
Hash	e76a6f0ac5ef8933685f826616b797efa990f5d76b0afa960dc39133781e6946
SimHash	ad510b554d65

Groups

*

Rule	Path
Disallow	(empty)

googlebot

Rule	Path
Disallow	/*sendto_form$
Disallow	/*folder_factories$
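
The googlebot rules above rely on the wildcard extension mentioned in the file's comments: `*` matches any run of characters and a trailing `$` anchors the pattern at the end of the URL path. The following is a minimal sketch of that matching, not the crawler's actual implementation. The helper names `pattern_to_regex` and `is_disallowed` and the sample paths are hypothetical, chosen only for illustration.

```python
import re

# Disallow patterns recorded for the googlebot group in this scan.
GOOGLEBOT_DISALLOW = ["/*sendto_form$", "/*folder_factories$"]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters, and a
    trailing '$' requires the path to end at that point. Without the
    trailing '$', the pattern is treated as a prefix match.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_disallowed(path: str, patterns: list[str]) -> bool:
    """Return True if the path matches any of the Disallow patterns."""
    # re.match anchors at the start of the path, as robots.txt rules do.
    return any(pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    # Hypothetical paths, used only to exercise the two rules.
    for sample in ["/news/item/sendto_form", "/news/item/sendto_form?x=1", "/about"]:
        print(sample, is_disallowed(sample, GOOGLEBOT_DISALLOW))
```

Under these assumptions, a path ending in `sendto_form` or `folder_factories` is blocked for googlebot, while the same path with a query string appended is not, because the `$` anchor requires the match to end there.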

Other Records

Field	Value
sitemap	/sitemap.xml.gz

Comments

Define access-restrictions for robots/spiders
http://www.robotstxt.org/wc/norobots.html
By default we allow robots to access all areas of our site
already accessible to anonymous users
Add Googlebot-specific syntax extension to exclude forms
that are repeated for each piece of content in the site
the wildcard is only supported by Googlebot
http://www.google.com/support/webmasters/bin/answer.py?answer=40367&ctx=sibling
