newspapers.com
robots.txt

Robots Exclusion Standard data for newspapers.com

Resource Scan

Scan Details

Site Domain newspapers.com
Base Domain newspapers.com
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a client error.
Last Scan 2024-11-08T07:10:52+00:00
Next Scan 2025-02-06T07:10:52+00:00

Last Successful Scan

Scanned 2024-04-13T06:53:53+00:00
URL https://newspapers.com/robots.txt
Domain IPs 104.17.112.43, 104.17.113.43, 2606:4700::6811:702b, 2606:4700::6811:712b
Response IP 104.17.113.43
Found Yes
Hash bab3655582905083287c89046cd3884b6b361a277c41ddef21ba6e3c9dc2a8dc
SimHash 101e8b41d17b

Groups

*

Rule Path
Disallow /busy.html
Disallow /error.html
Disallow /error.php
Disallow /download/
Disallow /clippings/download/
Allow /newspage/

ahrefsbot

Rule Path
Disallow /busy.html
Disallow /error.html
Disallow /error.php

googlebot-image

Rule Path
Allow /*

applebot

Rule Path
Allow /*

facebot

Rule Path
Allow /*

google-extended

Rule Path
Disallow /

gptbot

Rule Path
Disallow /
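The groups above can be checked programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` string is a reconstruction from the groups listed in this report, not the verbatim file served by the site, and the helper name `allowed` and the agent name `SomeCrawler` are hypothetical. The `Allow /*` groups are included for completeness, but the sketch does not rely on how the parser treats the `*` wildcard.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the groups in this report (an assumption, not the
# verbatim robots.txt served by newspapers.com).
ROBOTS_TXT = """\
User-agent: *
Disallow: /busy.html
Disallow: /error.html
Disallow: /error.php
Disallow: /download/
Disallow: /clippings/download/
Allow: /newspage/

User-agent: ahrefsbot
Disallow: /busy.html
Disallow: /error.html
Disallow: /error.php

User-agent: googlebot-image
Allow: /*

User-agent: applebot
Allow: /*

User-agent: facebot
Allow: /*

User-agent: google-extended
Disallow: /

User-agent: gptbot
Disallow: /
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, path: str) -> bool:
    """Hypothetical helper: may `user_agent` fetch `path` under these rules?"""
    return parser.can_fetch(user_agent, "https://newspapers.com" + path)


if __name__ == "__main__":
    checks = [
        ("SomeCrawler", "/download/file"),
        ("SomeCrawler", "/newspage/123"),
        ("AhrefsBot", "/download/file"),
        ("GPTBot", "/newspage/123"),
    ]
    for agent, path in checks:
        print(f"{agent:12} {path:16} {allowed(agent, path)}")
```

Under this reconstruction, a crawler with no group of its own falls back to the `*` group, so it is blocked from `/download/` but may fetch `/newspage/`. A crawler with its own group, such as ahrefsbot, is evaluated against that group only, so the `/download/` rule from the `*` group does not apply to it.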

Comments

  • Slow Bots see https://ahrefs.com/robot for more info
  • Updated 1/10/2024