awg.org
robots.txt

Robots Exclusion Standard data for awg.org


Resource Scan

Scan Details

Site Domain	awg.org
Base Domain	awg.org
Scan Status	Ok
Last Scan	2025-10-03T16:14:50+00:00
Next Scan	2025-11-02T16:14:50+00:00

Last Scan

Scanned	2025-10-03T16:14:50+00:00
URL	https://awg.org/robots.txt
Redirect	https://www.awg.org/robots.aspx
Redirect Domain	www.awg.org
Redirect Base	awg.org
Domain IPs	162.159.141.168, 172.66.1.164
Redirect IPs	162.159.141.168, 172.66.1.164
Response IP	162.159.141.168
Found	Yes
Hash	fb9701c831165af5a8b23ec4d9baecb708f9f48af7a01b007cd53f627119c125
SimHash	ec941d42c1d8

Groups

*

Rule	Path
Disallow	/global_inc/
Allow	/global_inc/*.css
Allow	/global_inc/*.js
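
The group above blocks `/global_inc/` but carves out stylesheets and scripts. How a given crawler resolves the overlap is up to that crawler; the following is a minimal sketch that assumes the longest-match precedence described in RFC 9309 (the longest matching pattern wins, and `Allow` wins a tie). The names `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical and used only for illustration. A hand-rolled matcher is used because Python's standard `urllib.robotparser` does not interpret `*` wildcards in paths.

```python
import re

# Rules copied from the first "*" group listed in this scan.
RULES = [
    ("disallow", "/global_inc/"),
    ("allow", "/global_inc/*.css"),
    ("allow", "/global_inc/*.js"),
]

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    "*" matches any run of characters; a trailing "$" anchors the end
    of the path. Without "$" the pattern behaves as a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under `rules`.

    Assumption: longest matching pattern wins, Allow wins ties,
    and a path that matches no rule is allowed.
    """
    best_len = -1
    best_allow = True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow

if __name__ == "__main__":
    for p in ["/global_inc/site.css", "/global_inc/data.xml", "/about/"]:
        print(p, is_allowed(p))
```

Under these assumptions, `/global_inc/site.css` is crawlable because the `Allow` pattern is longer than the `Disallow` prefix, while other files under `/global_inc/` stay blocked.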


*

Rule	Path
Disallow	/global_engine/ajax/


Other Records

Field	Value
sitemap	https://www.awg.org/autositemapindex.xml


Comments

When crawlers hit the engine dir they sometimes publish confusing links to site content
in their search results so we exclude these specific engines from crawling it.
Note: Certain crawlers do need access to this directory so we do not want a blanket
exclude statement here.


Warnings

18 invalid lines.

