werc.org
robots.txt

Robots Exclusion Standard data for werc.org

Resource Scan

Scan Details

Site Domain werc.org
Base Domain werc.org
Scan Status Ok
Last Scan 2025-07-25T02:43:09+00:00
Next Scan 2025-08-24T02:43:09+00:00

Last Scan

Scanned 2025-07-25T02:43:09+00:00
URL https://werc.org/robots.txt
Domain IPs 35.169.50.49
Response IP 35.169.50.49
Found Yes
Hash 9f437d5e406353f31e07ddff28d0c9f299c8d19ece61520636e664c0d0ba2952
SimHash ec941d42c3d9

Groups

User-agent *

Rule Path
Disallow /global_inc/
Allow /global_inc/*.css
Allow /global_inc/*.js

User-agent *

Rule Path
Disallow /global_engine/ajax/
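The scan lists the rules but not how a crawler resolves the overlap between `Disallow /global_inc/` and the wildcard `Allow` lines. Below is a minimal Python sketch, assuming RFC 9309 semantics: both groups are treated as the `*` group the scan reports and merged into one, the longest matching path pattern wins, and `Allow` wins a tie. The names `RULES`, `is_allowed` and `_pattern_to_regex` are hypothetical, and percent-encoding normalization is omitted.

```python
import re

# Rules as reported in the scan (hypothetical representation). Both groups
# apply to user-agent "*", and RFC 9309 says groups matching the same
# user-agent are combined into one.
RULES = [
    ("disallow", "/global_inc/"),
    ("allow", "/global_inc/*.css"),
    ("allow", "/global_inc/*.js"),
    ("disallow", "/global_engine/ajax/"),
]


def _pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' wildcard, '$' end anchor) to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """Longest matching rule wins; on a tie, allow wins. No match means allowed."""
    best_len = -1
    best_allow = True
    for kind, pattern in rules:
        # re.match anchors at the start, giving robots.txt prefix semantics.
        if _pattern_to_regex(pattern).match(path):
            length = len(pattern)
            allow = kind == "allow"
            if length > best_len or (length == best_len and allow):
                best_len, best_allow = length, allow
    return best_allow


for p in ["/global_inc/site.css", "/global_inc/logo.png", "/global_engine/ajax/search"]:
    print(p, is_allowed(p))
```

Under these assumptions, stylesheets and scripts under `/global_inc/` stay crawlable because `/global_inc/*.css` and `/global_inc/*.js` are longer matches than `/global_inc/`, while other files in that directory and everything under `/global_engine/ajax/` are blocked.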

Other Records

Field Value
sitemap https://werc.org/autositemapindex.xml
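The `sitemap` record is independent of any user-agent group, so a crawler can read it regardless of which group applies to it. A minimal sketch using Python's standard `urllib.robotparser`, assuming a hypothetical reconstruction of the relevant lines from this scan (the live file may contain more). The standard-library parser does not implement the `*` path wildcards used in the `Allow` lines, so it is used here only to read the `Sitemap` record; `ROBOTS_TXT` and `sitemap_urls` are illustrative names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of part of https://werc.org/robots.txt,
# built from the parsed scan data rather than the raw file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /global_engine/ajax/

Sitemap: https://werc.org/autositemapindex.xml
"""


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return the Sitemap URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file declares no Sitemap lines.
    return parser.site_maps() or []


print(sitemap_urls(ROBOTS_TXT))  # ['https://werc.org/autositemapindex.xml']
```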

Comments

  • When crawlers hit the engine dir, they sometimes publish confusing links to site content in their search results, so we exclude these specific engines from crawling it.
  • Note: Certain crawlers do need access to this directory, so we do not want a blanket exclude statement here.

Warnings

  • 18 invalid lines.