
Robots Exclusion Standard data for awg.org

Resource Scan

Scan Details

Site Domain awg.org
Base Domain awg.org
Scan Status Ok
Last Scan 2025-10-03T16:14:50+00:00
Next Scan 2025-11-02T16:14:50+00:00

Last Scan

Scanned 2025-10-03T16:14:50+00:00
URL https://awg.org/robots.txt
Redirect https://www.awg.org/robots.aspx
Redirect Domain www.awg.org
Redirect Base awg.org
Domain IPs 162.159.141.168, 172.66.1.164
Redirect IPs 162.159.141.168, 172.66.1.164
Response IP 162.159.141.168
Found Yes
Hash fb9701c831165af5a8b23ec4d9baecb708f9f48af7a01b007cd53f627119c125
SimHash ec941d42c1d8

Groups

*

Rule Path
Disallow /global_inc/
Allow /global_inc/*.css
Allow /global_inc/*.js

*

Rule Path
Disallow /global_engine/ajax/
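
To illustrate how a crawler might evaluate the rules listed above, here is a minimal sketch in Python. It assumes the two `*` groups are merged into one rule set and that matching follows the longest-match convention described in RFC 9309 (the most specific pattern wins, and Allow wins a tie). The names `RULES`, `pattern_to_regex`, and `is_allowed`, as well as the sample paths in the usage note, are hypothetical and not part of the scan data. Individual crawlers may behave differently.

```python
import re

# Rules copied from the two "*" groups in this report, merged into one list.
# Merging same-agent groups is an assumption based on RFC 9309.
RULES = [
    ("disallow", "/global_inc/"),
    ("allow", "/global_inc/*.css"),
    ("allow", "/global_inc/*.js"),
    ("disallow", "/global_engine/ajax/"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Without '$', the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if the path may be crawled under the given rules.

    The longest matching pattern wins; on equal length, allow wins;
    a path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len = length
                allowed = kind == "allow"
    return allowed
```

Under these assumptions, `is_allowed("/global_inc/site.css")` is true because the Allow pattern is longer than the Disallow prefix, while `is_allowed("/global_inc/config.xml")` and `is_allowed("/global_engine/ajax/load.aspx")` are both false.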

Other Records

Field Value
sitemap https://www.awg.org/autositemapindex.xml

Comments

  • When crawlers hit the engine dir they sometimes publish confusing links to site content in their search results, so we exclude these specific engines from crawling it.
  • Note: Certain crawlers do need access to this directory, so we do not want a blanket exclude statement here.

Warnings

  • 18 invalid lines.