reporterstatesman.com
robots.txt

Robots Exclusion Standard data for reporterstatesman.com

Resource Scan

Scan Details

Site Domain	reporterstatesman.com
Base Domain	reporterstatesman.com
Scan Status	Ok
Last Scan	2024-10-26T18:52:35+00:00
Next Scan	2024-11-02T18:52:35+00:00

Last Scan

Scanned	2024-10-26T18:52:35+00:00
URL	https://reporterstatesman.com/robots.txt
Domain IPs	104.154.203.214
Response IP	104.154.203.214
Found	Yes
Hash	ba8b3c11e52cff16a4a92ecf7e0b295d97bc92b5551a7e186e99eff822bd2c55
SimHash	a20f1bc5e455

Groups

*

Rule	Path
Disallow	/*?*page=*
Disallow	/editions/*
Disallow	/users/*
Disallow	/feed
Disallow	/feeds
Disallow	/rss
Disallow	/*?*q=*
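
The rules in the `*` group use `*` wildcards, so a plain prefix comparison is not enough to evaluate them. The following is a minimal sketch of one way to test a path against these rules, assuming the wildcard semantics described in RFC 9309 (`*` matches any run of characters, a trailing `$` anchors the end, and everything else is a prefix match). The helper names `rule_to_regex` and `is_disallowed` are hypothetical, and because the group contains only Disallow rules, the sketch skips the longest-match tie-breaking that Allow rules would require.

```python
import re

# Disallow paths of the "*" group, copied from the scan above.
DISALLOW = [
    "/*?*page=*",
    "/editions/*",
    "/users/*",
    "/feed",
    "/feeds",
    "/rss",
    "/*?*q=*",
]


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' becomes '.*'; a trailing '$' anchors the end of the URL.
    Without '$' the pattern behaves as a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str) -> bool:
    """Return True if any Disallow rule matches the path plus query string."""
    return any(rule_to_regex(p).match(path_and_query) for p in DISALLOW)


for candidate in ["/news?page=2", "/editions/2024-10-26", "/search?q=tax",
                  "/feedback", "/news/local/story"]:
    print(candidate, is_disallowed(candidate))
```

Because `/feed` has no terminator, it is a prefix rule and also matches paths such as `/feedback`. Likewise, `/*?*page=*` matches any query string that contains `page=`, wherever it appears.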

Other Records

Field	Value
crawl-delay	5

semrushbot

Rule	Path
Disallow	/

petalbot

Rule	Path
Disallow	/

Other Records

Field	Value
sitemap	https://s3.amazonaws.com/cjp-public-access/sitemaps/reporter_statesman_tx/sitemap.xml.gz
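
To see how a crawler would read the per-agent groups, the crawl delay, and the sitemap record, the sketch below feeds a reconstruction of the file to Python's standard `urllib.robotparser`. The `ROBOTS_TXT` text is an assumption rebuilt from the scan data; the original file's ordering and whitespace are not shown in this report. `ExampleBot` is a hypothetical user agent standing in for any crawler without its own group. `urllib.robotparser` does not implement `*` wildcards inside paths, so this example only checks group selection, `crawl_delay`, and `site_maps` (the last requires Python 3.8 or later).

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the scan data; the layout is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*?*page=*
Disallow: /editions/*
Disallow: /users/*
Disallow: /feed
Disallow: /feeds
Disallow: /rss
Disallow: /*?*q=*
Crawl-delay: 5

User-agent: semrushbot
Disallow: /

User-agent: petalbot
Disallow: /

Sitemap: https://s3.amazonaws.com/cjp-public-access/sitemaps/reporter_statesman_tx/sitemap.xml.gz
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

home = "https://reporterstatesman.com/"

# The two named groups block their bots from the whole site.
semrush_allowed = parser.can_fetch("SemrushBot", home)
petal_allowed = parser.can_fetch("PetalBot", home)

# Any other agent falls back to the "*" group and its crawl delay.
delay = parser.crawl_delay("ExampleBot")

# Sitemap records are global and not tied to a group.
sitemaps = parser.site_maps()

print(semrush_allowed, petal_allowed, delay, sitemaps)
```

The named groups carry no crawl-delay of their own, so `parser.crawl_delay("SemrushBot")` returns `None` rather than inheriting the value from the `*` group.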

Comments

See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
