newspapers.com
robots.txt
Robots Exclusion Standard data for newspapers.com
Resource Scan
Scan Details
Field | Value |
---|---|
Site Domain | newspapers.com |
Base Domain | newspapers.com |
Scan Status | Failed |
Failure Stage | Fetching resource. |
Failure Reason | Server returned a client error. |
Last Scan | 2024-11-08T07:10:52+00:00 |
Next Scan | 2025-02-06T07:10:52+00:00 |
Last Successful Scan
Field | Value |
---|---|
Scanned | 2024-04-13T06:53:53+00:00 |
URL | https://newspapers.com/robots.txt |
Domain IPs | 104.17.112.43, 104.17.113.43, 2606:4700::6811:702b, 2606:4700::6811:712b |
Response IP | 104.17.113.43 |
Found | Yes |
Hash | bab3655582905083287c89046cd3884b6b361a277c41ddef21ba6e3c9dc2a8dc |
SimHash | 101e8b41d17b |
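The Hash field holds 64 hexadecimal digits, which is the length of a SHA-256 digest. The report does not name the algorithm, so treating it as SHA-256 over the raw response body is an assumption. Under that assumption, a later fetch of https://newspapers.com/robots.txt could be compared against the recorded value to tell whether the file has changed since the last successful scan. The helper names below are hypothetical:

```python
import hashlib

# Value copied from the "Last Successful Scan" table.
RECORDED_HASH = "bab3655582905083287c89046cd3884b6b361a277c41ddef21ba6e3c9dc2a8dc"


def body_digest(body: bytes) -> str:
    """Hex SHA-256 of the raw robots.txt bytes (algorithm is an assumption)."""
    return hashlib.sha256(body).hexdigest()


def matches_recorded(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """True if `body` hashes to the recorded value, i.e. the file looks unchanged."""
    return body_digest(body) == recorded.strip().lower()
```

Because the digest covers every byte, even a change in line endings or trailing whitespace would make `matches_recorded` return False.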
Groups
User-agent: *
Rule | Path |
---|---|
Disallow | /busy.html |
Disallow | /error.html |
Disallow | /error.php |
Disallow | /download/ |
Disallow | /clippings/download/ |
Allow | /newspage/ |
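The `*` group applies to any crawler that has no more specific group. The following is a minimal sketch of checking URLs against it with Python's standard `urllib.robotparser`. It assumes the file contained only the rules listed above. Because the most recent scan failed with a client error, these rules reflect the 2024-04-13 snapshot. The reconstructed lines, the helper names, and the agent name `ExampleBot` are all hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of the "*" group from the last successful scan.
# The real file may contain comments or other directives not shown in the report.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /busy.html",
    "Disallow: /error.html",
    "Disallow: /error.php",
    "Disallow: /download/",
    "Disallow: /clippings/download/",
    "Allow: /newspage/",
]


def build_parser(lines):
    """Parse robots.txt lines locally, without any network access."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


def check(parser, url, agent="ExampleBot"):
    """Return True if the parsed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_LINES)
    for path in ["/newspage/12345/", "/download/12345/", "/clippings/download/1/", "/search/"]:
        print(path, check(rp, "https://newspapers.com" + path))
```

Python's parser applies the first matching rule in file order, whereas RFC 9309 selects the longest matching path. In this group no `Allow` path overlaps a `Disallow` path, so both approaches give the same answers. Paths that match no rule, such as `/search/`, are allowed.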
Comments