carnegieherald.com
robots.txt

Robots Exclusion Standard data for carnegieherald.com

Resource Scan

Scan Details

Site Domain	carnegieherald.com
Base Domain	carnegieherald.com
Scan Status	Ok
Last Scan	2024-10-31T00:25:21+00:00
Next Scan	2024-11-07T00:25:21+00:00

Last Scan

Scanned	2024-10-31T00:25:21+00:00
URL	https://carnegieherald.com/robots.txt
Domain IPs	104.154.203.214
Response IP	104.154.203.214
Found	Yes
Hash	8bf2598980acf14f273b516bbd6a100d3bf74548bee2d2f029e0d51ffb180b81
SimHash	a00d1b45ac57
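The Hash value is 64 hexadecimal digits, which is the length of a SHA-256 digest. The report does not say which algorithm it uses or what exactly it hashes. Assuming it is SHA-256 over the raw robots.txt response body, a crawler could compare a freshly fetched copy against this value to tell whether the file changed since the 2024-10-31 scan. `sha256_hex` and `robots_changed` are hypothetical helper names.

```python
import hashlib

# Hash reported by the 2024-10-31T00:25:21+00:00 scan.
LAST_SCAN_HASH = "8bf2598980acf14f273b516bbd6a100d3bf74548bee2d2f029e0d51ffb180b81"


def sha256_hex(body: bytes) -> str:
    """Hex SHA-256 digest of a robots.txt body (assumed scanner convention)."""
    return hashlib.sha256(body).hexdigest()


def robots_changed(body: bytes, previous_hash: str = LAST_SCAN_HASH) -> bool:
    """True if `body` no longer hashes to the previously recorded value.

    Assumes the recorded hash is SHA-256 over the undecoded response body.
    """
    return sha256_hex(body) != previous_hash.strip().lower()
```

To use it, pass the bytes from `urllib.request.urlopen("https://carnegieherald.com/robots.txt").read()` to `robots_changed`. A mismatch can also mean the scanner hashes something other than the raw body, not only that the file changed.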

Groups

*

Rule	Path
Disallow	/*?*page=*
Disallow	/editions/*
Disallow	/users/*
Disallow	/feed
Disallow	/feeds
Disallow	/rss
Disallow	/*?*q=*
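The scan lists the query-string rules as `/*?*page=*` and `/*?*q=*`, and both depend on the `*` wildcard. Python's `urllib.robotparser` treats rule paths as plain prefixes and does not expand `*`. The sketch below therefore implements wildcard matching by hand, following RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end, and a rule otherwise matches as a prefix of the path plus query. `pattern_to_regex` and `is_disallowed` are hypothetical names. This group contains only Disallow rules, so the sketch omits the longest-match precedence that becomes necessary once Allow rules are present.

```python
import re

# Disallow paths listed for the "*" group in the scan.
DISALLOW = ("/*?*page=*", "/editions/*", "/users/*", "/feed", "/feeds", "/rss", "/*?*q=*")


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the start.

    '*' becomes '.*', a trailing '$' anchors the end of the URL, and every
    other character is escaped so it matches literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(path_and_query: str, rules=DISALLOW) -> bool:
    """True if any Disallow rule matches the start of the path plus query.

    With only Disallow rules, precedence does not matter: any match blocks.
    """
    return any(pattern_to_regex(rule).match(path_and_query) for rule in rules)
```

Prefix matching has two consequences for this file. First, `/feed` already covers `/feeds` (and `/feedback`), which makes the separate `/feeds` line redundant. Second, `/*?*q=*` blocks any query parameter whose name ends in `q`, such as `?faq=1`, and not only `?q=`.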

Other Records

Field	Value
crawl-delay	5
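Crawl-delay is not part of RFC 9309, and some crawlers ignore it. Python's `urllib.robotparser` does expose it through `RobotFileParser.crawl_delay()`. The sketch below parses a hypothetical reconstruction of two records from the scan (not the verbatim file) and spaces requests 5 seconds apart. The helper names `polite_delay` and `throttled` are hypothetical, as is the 1-second fallback used when no delay applies.

```python
import time
import urllib.robotparser

# Hypothetical reconstruction of two groups from the scan, not the verbatim file.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /feed",
    "Crawl-delay: 5",
    "",
    "User-agent: semrushbot",
    "Disallow: /",
]


def make_parser(lines=ROBOTS_LINES) -> urllib.robotparser.RobotFileParser:
    """Build a parser from in-memory lines instead of fetching the live file."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(lines)
    return rp


def polite_delay(rp, user_agent: str, fallback: float = 1.0) -> float:
    """Seconds to wait between requests; `fallback` is an assumed default."""
    delay = rp.crawl_delay(user_agent)
    return float(delay) if delay is not None else fallback


def throttled(urls, delay: float, sleep=time.sleep):
    """Yield URLs one at a time, sleeping `delay` seconds between them."""
    for i, url in enumerate(urls):
        if i:
            sleep(delay)
        yield url
```

In this sketch, `crawl_delay("SemrushBot")` returns `None`. A crawler that matches its own user-agent group follows only that group, so it does not inherit the delay set in the `*` group.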

semrushbot

Rule	Path
Disallow	/

petalbot

Rule	Path
Disallow	/
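Under RFC 9309, a crawler obeys only the most specific group that names it. SemrushBot and PetalBot are therefore barred from the whole site, and every other crawler sees only the `*` rules. The sketch below checks this with `urllib.robotparser` against a hypothetical reconstruction of the groups. It keeps only the wildcard-free rules from the `*` group because the standard-library parser does not expand `*`. The crawler name `MyCrawler/1.0` and the `/news/story` path are illustrative.

```python
import urllib.robotparser

# Hypothetical reconstruction: the two bot groups as scanned, plus only the
# wildcard-free rules of the "*" group (robotparser does not expand "*").
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /feed",
    "Disallow: /feeds",
    "Disallow: /rss",
    "",
    "User-agent: semrushbot",
    "Disallow: /",
    "",
    "User-agent: petalbot",
    "Disallow: /",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_LINES)

STORY = "https://carnegieherald.com/news/story"  # illustrative path

# Named bots match their own group and are shut out of the whole site.
for bot in ("SemrushBot/7~bl", "PetalBot"):
    print(bot, rp.can_fetch(bot, STORY))

# Any other crawler falls back to the "*" group.
print("MyCrawler/1.0", rp.can_fetch("MyCrawler/1.0", STORY))
print("MyCrawler/1.0", rp.can_fetch("MyCrawler/1.0", "https://carnegieherald.com/feed"))
```

`robotparser` selects a group by checking, case-insensitively, whether the group name occurs in the product token (the part of the user-agent string before `/`). That is why `SemrushBot/7~bl` lands in the `semrushbot` group.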

Other Records

Field	Value
sitemap	https://s3.amazonaws.com/cjp-public-access/sitemaps/tch/sitemap.xml.gz
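The sitemap is a gzip-compressed XML file hosted on S3, not on carnegieherald.com itself. The sketch below lists its URLs by decompressing the bytes and collecting every `<loc>` element, assuming the file follows the sitemaps.org 0.9 schema as either a `<urlset>` or a `<sitemapindex>`. `sitemap_locs` and `fetch_sitemap` are hypothetical names, and only `fetch_sitemap` makes a network request.

```python
import gzip
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://s3.amazonaws.com/cjp-public-access/sitemaps/tch/sitemap.xml.gz"
# Namespace used by sitemaps.org 0.9 files (assumed to apply here).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locs(gz_bytes: bytes) -> list[str]:
    """Decompress a gzipped sitemap and return the text of every <loc> element.

    Works for a <urlset> (page URLs) and a <sitemapindex> (child sitemap URLs),
    since both keep their URLs in <loc> elements.
    """
    root = ET.fromstring(gzip.decompress(gz_bytes))
    return [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]


def fetch_sitemap(url: str = SITEMAP_URL) -> list[str]:
    """Download the sitemap and list its URLs (performs a network request)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return sitemap_locs(resp.read())
```

If the file turns out to be a sitemap index, each returned URL points to a child sitemap that can be passed to `fetch_sitemap` in turn.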

Comments

See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
