501commons.org
robots.txt

Robots Exclusion Standard data for 501commons.org

Resource Scan

Scan Details

Site Domain 501commons.org
Base Domain 501commons.org
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Couldn't establish SSL connection.
Last Scan 2025-04-09T13:42:02+00:00
Next Scan 2025-07-08T13:42:02+00:00

Last Successful Scan

Scanned 2022-08-20T15:12:05+00:00
URL https://501commons.org/robots.txt
Redirect https://www.501commons.org/robots.txt
Redirect Domain www.501commons.org
Redirect Base 501commons.org
Response IP 104.131.130.40
Found Yes
Hash e76a6f0ac5ef8933685f826616b797efa990f5d76b0afa960dc39133781e6946
SimHash ad510b554d65

Groups

*

Rule Path
Disallow

googlebot

Rule Path
Disallow /*sendto_form$
Disallow /*folder_factories$
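The googlebot group's patterns rely on two extensions. `*` matches any run of characters, and a trailing `$` anchors the rule to the end of the path. Below is a minimal Python sketch of that matching. It assumes RFC 9309 style semantics (prefix match, `*` wildcard, `$` end anchor), and it ignores Allow rules and query strings. The function names and the example paths are hypothetical.

```python
import re


def rule_to_regex(path: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    pattern to the end of the URL path. Everything else is literal, and
    the pattern is applied as a prefix match via re.match.
    """
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(url_path: str, disallow_rules: list[str]) -> bool:
    """Return True if any non-empty Disallow pattern matches url_path."""
    for rule in disallow_rules:
        if rule == "":
            continue  # an empty Disallow value blocks nothing
        if rule_to_regex(rule).match(url_path):
            return True
    return False


# Patterns copied from the googlebot group above.
GOOGLEBOT_DISALLOW = ["/*sendto_form$", "/*folder_factories$"]

for p in ["/about/sendto_form", "/about/sendto_form/extra", "/about"]:
    print(p, is_disallowed(p, GOOGLEBOT_DISALLOW))
```

Under these assumptions, `/about/sendto_form` is blocked for Googlebot. `/about/sendto_form/extra` is not blocked, because the `$` requires the path to end at `sendto_form`.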

Other Records

Field Value
sitemap /sitemap.xml.gz

Comments

  • Define access-restrictions for robots/spiders
  • http://www.robotstxt.org/wc/norobots.html
  • By default we allow robots to access all areas of our site
  • already accessible to anonymous users
  • Add Googlebot-specific syntax extension to exclude forms
  • that are repeated for each piece of content in the site
  • the wildcard is only supported by Googlebot
  • http://www.google.com/support/webmasters/bin/answer.py?answer=40367&ctx=sibling
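The comments note that the wildcard was only understood by Googlebot. Python's standard `urllib.robotparser` is one example of a prefix-only parser. It treats `/*sendto_form$` as a literal prefix, and it reads the empty `Disallow` in the `*` group as "allow everything". The sketch below feeds it a robots.txt reconstructed from the groups and sitemap record listed above. The comment lines are omitted, so the file text is an assumption, not the fetched file. The example URLs and the `ExampleBot` agent name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the parsed groups above; comments omitted (assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: googlebot
Disallow: /*sendto_form$
Disallow: /*folder_factories$

Sitemap: /sitemap.xml.gz
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An empty Disallow value blocks nothing, so any other crawler may fetch any path.
print(parser.can_fetch("ExampleBot", "https://www.501commons.org/about/sendto_form"))

# urllib.robotparser does plain prefix matching, so the wildcard rule never
# matches and this parser also allows the form URL for Googlebot.
print(parser.can_fetch("Googlebot", "https://www.501commons.org/about/sendto_form"))

# Sitemap records are collected independently of the user-agent groups.
print(parser.site_maps())
```

A wildcard-aware crawler such as Googlebot would skip that form URL. A prefix-only parser like this one would not, which is the gap the original comment warns about.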