newspapers.com
robots.txt

Robots Exclusion Standard data for newspapers.com

Resource Scan

Scan Details

Site Domain	newspapers.com
Base Domain	newspapers.com
Scan Status	Failed
Failure Stage	Fetching resource.
Failure Reason	Server returned a client error (HTTP 4xx status).
Last Scan	2024-11-08T07:10:52+00:00
Next Scan	2025-02-06T07:10:52+00:00
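The Last Scan and Next Scan timestamps are exactly 90 days apart. Here is a small Python check of that arithmetic using the standard `datetime` module. The 90-day figure is only the observed gap between these two fields; the page does not say the scanner always waits that long.

```python
from datetime import datetime, timedelta

# Timestamps copied from the Scan Details table (ISO 8601 with UTC offset).
last_scan = datetime.fromisoformat("2024-11-08T07:10:52+00:00")
next_scan = datetime.fromisoformat("2025-02-06T07:10:52+00:00")

# Gap between the failed scan and the scheduled next scan.
gap = next_scan - last_scan
print(gap.days)                      # 90
print(gap == timedelta(days=90))     # True
```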

Last Successful Scan

Scanned	2024-04-13T06:53:53+00:00
URL	https://newspapers.com/robots.txt
Domain IPs	104.17.112.43, 104.17.113.43, 2606:4700::6811:702b, 2606:4700::6811:712b
Response IP	104.17.113.43
Found	Yes
Hash	bab3655582905083287c89046cd3884b6b361a277c41ddef21ba6e3c9dc2a8dc
SimHash	101e8b41d17b

Groups

*

Rule	Path
Disallow	/busy.html
Disallow	/error.html
Disallow	/error.php
Disallow	/download/
Disallow	/clippings/download/
Allow	/newspage/
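Under RFC 9309, a crawler compares a path against every rule in its group. The rule with the longest matching path wins, and Allow should win a tie with an equivalent Disallow. The following is a minimal Python sketch of that evaluation for the `*` group above. It uses plain prefix matching only and omits wildcards. The helper name `is_allowed` is hypothetical and is not a library function.

```python
# Rules of the "*" group, copied from the table above.
STAR_GROUP = [
    ("disallow", "/busy.html"),
    ("disallow", "/error.html"),
    ("disallow", "/error.php"),
    ("disallow", "/download/"),
    ("disallow", "/clippings/download/"),
    ("allow", "/newspage/"),
]

def is_allowed(rules, path):
    """Hypothetical helper: longest prefix match wins; Allow wins a tie."""
    best_len = -1
    best_allow = True  # no matching rule means the path is allowed
    for kind, rule_path in rules:
        if path.startswith(rule_path):
            allow = kind == "allow"
            n = len(rule_path)
            # A longer match replaces the current best; an equal-length Allow beats Disallow.
            if n > best_len or (n == best_len and allow):
                best_len, best_allow = n, allow
    return best_allow

print(is_allowed(STAR_GROUP, "/newspage/12345/"))        # True
print(is_allowed(STAR_GROUP, "/download/file.pdf"))      # False
print(is_allowed(STAR_GROUP, "/clippings/download/42"))  # False
print(is_allowed(STAR_GROUP, "/clippings/article/42"))   # True
```

In this particular group no Allow path overlaps a Disallow path, so the tie-breaking branch never fires here. It is included so the sketch stays correct for files where rules do overlap.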

ahrefsbot

Rule	Path
Disallow	/busy.html
Disallow	/error.html
Disallow	/error.php
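RFC 9309 has a crawler obey only the group whose user-agent line matches its product token. This matching is case-insensitive. The crawler falls back to the `*` group only when no named group matches. Read that way, `ahrefsbot` is bound by its own three rules and not by the `*` group's `/download/` rules. The following is a minimal sketch of that selection. The helpers `select_group` and `disallows` and the crawler name `ExampleBot` are hypothetical. Because each token appears only once in this file, the sketch skips the RFC's merging of duplicate groups.

```python
# Groups copied from the tables above: user-agent token -> list of (rule, path).
GROUPS = {
    "*": [
        ("disallow", "/busy.html"), ("disallow", "/error.html"),
        ("disallow", "/error.php"), ("disallow", "/download/"),
        ("disallow", "/clippings/download/"), ("allow", "/newspage/"),
    ],
    "ahrefsbot": [
        ("disallow", "/busy.html"), ("disallow", "/error.html"),
        ("disallow", "/error.php"),
    ],
}

def select_group(groups, product_token):
    """Hypothetical helper: case-insensitive exact token match, else the "*" group."""
    token = product_token.lower()
    for name, rules in groups.items():
        if name != "*" and name.lower() == token:
            return rules
    return groups.get("*", [])

def disallows(rules, path):
    """Simplified check: any matching Disallow prefix blocks the path.
    This is adequate for the paths below because no Allow rule matches them."""
    return any(kind == "disallow" and path.startswith(p) for kind, p in rules)

print(disallows(select_group(GROUPS, "AhrefsBot"), "/download/file.pdf"))   # False
print(disallows(select_group(GROUPS, "ExampleBot"), "/download/file.pdf"))  # True
print(disallows(select_group(GROUPS, "AhrefsBot"), "/error.php"))           # True
```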

googlebot-image

Rule	Path
Allow	/*

applebot

Rule	Path
Allow	/*

facebot

Rule	Path
Allow	/*
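The `googlebot-image`, `applebot`, and `facebot` groups each contain a single `Allow /*` rule. In RFC 9309 path patterns, `*` matches any sequence of characters, and a trailing `$` anchors the end of the path. As a result, `/*` matches every path, just as `/` does. The following is a minimal Python sketch that translates such a pattern into a regular expression. `pattern_to_regex` and `matches` are hypothetical helpers, and the `.pdf$` pattern is illustrative only; it does not appear in this file.

```python
import re

def pattern_to_regex(pattern):
    """Hypothetical helper: '*' -> any characters, trailing '$' -> end of path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def matches(pattern, path):
    # Rules match from the start of the path, so use re.match, not re.search.
    return pattern_to_regex(pattern).match(path) is not None

print(matches("/*", "/"))                             # True
print(matches("/*", "/newspage/12345/"))              # True
print(matches("/*.pdf$", "/download/a.pdf"))          # True
print(matches("/*.pdf$", "/download/a.pdf.bak"))      # False
```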

google-extended

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/
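Reassembled from the tables above, the file can be checked with Python's standard `urllib.robotparser`. This is a sketch under an assumption: the exact text of the live file is not shown here, including comment placement, capitalization, and ordering. The reconstruction below therefore only approximates it. The crawler name `ExampleBot` is hypothetical.

`urllib.robotparser` differs from RFC 9309 in two ways that matter for this file:

- It applies the first matching rule in a group rather than the longest match.
- It does not implement the `*` and `$` path wildcards.

Neither difference changes the results checked here.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of the file from the tables above (comments omitted).
ROBOTS_TXT = """\
User-agent: *
Disallow: /busy.html
Disallow: /error.html
Disallow: /error.php
Disallow: /download/
Disallow: /clippings/download/
Allow: /newspage/

User-agent: ahrefsbot
Disallow: /busy.html
Disallow: /error.html
Disallow: /error.php

User-agent: googlebot-image
Allow: /*

User-agent: applebot
Allow: /*

User-agent: facebot
Allow: /*

User-agent: google-extended
Disallow: /

User-agent: gptbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

base = "https://newspapers.com"
print(parser.can_fetch("GPTBot", base + "/newspage/12345/"))         # False
print(parser.can_fetch("Google-Extended", base + "/"))              # False
print(parser.can_fetch("ExampleBot", base + "/download/file.pdf"))  # False
print(parser.can_fetch("ExampleBot", base + "/newspage/12345/"))    # True
print(parser.can_fetch("AhrefsBot", base + "/download/file.pdf"))   # True
```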

Comments

Slow Bots see https://ahrefs.com/robot for more info
Updated 1/10/2024
