Robots Exclusion Standard data for newspaperarchive.com
Resource Scan
Scan Details
Field | Value |
---|---|
Site Domain | newspaperarchive.com |
Base Domain | newspaperarchive.com |
Scan Status | Failed |
Failure Stage | Fetching the resource |
Failure Reason | Server returned a client error (HTTP 4xx). |
Last Scan | 2024-09-05T08:34:59+00:00 |
Next Scan | 2024-12-04T08:34:59+00:00 |
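The failed scan stopped at the fetch stage with a 4xx response. RFC 9309 (the Robots Exclusion Protocol) treats a 4xx response for robots.txt as "unavailable", in which case a crawler MAY access any resource on the server. It treats a 5xx response as "unreachable", in which case the crawler MUST assume complete disallow. The minimal sketch below maps status codes to those outcomes. The function names and the `example-robots-checker/0.1` User-Agent string are hypothetical and illustrative only. They are not taken from the scanner that produced this report.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map a robots.txt HTTP status to the crawler behaviour described in RFC 9309."""
    if 200 <= status < 300:
        return "parse"         # success: parse and obey the rules
    if 400 <= status < 500:
        return "allow_all"     # "unavailable": the crawler MAY access any resource
    if 500 <= status < 600:
        return "disallow_all"  # "unreachable": assume complete disallow
    return "unspecified"       # other codes (e.g. 1xx) are not classified here


def fetch_robots(url: str, timeout: float = 10.0) -> tuple[int, bytes]:
    """Fetch robots.txt and return (status, body); urllib follows redirects itself."""
    # Hypothetical User-Agent string, for illustration only.
    req = urllib.request.Request(url, headers={"User-Agent": "example-robots-checker/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as HTTPError: keep the code, drop the body.
        return err.code, b""
```

A call such as `classify_status(fetch_robots("https://newspaperarchive.com/robots.txt")[0])` would reproduce this report's fetch stage. Network failures such as `urllib.error.URLError` also count as "unreachable" under RFC 9309. The sketch lets them propagate rather than guessing a status code.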
Last Successful Scan
Field | Value |
---|---|
Scanned | 2023-04-18T13:14:26+00:00 |
URL | https://newspaperarchive.com/robots.txt |
Domain IPs | 172.66.40.104, 172.66.43.152, 2606:4700:3108::ac42:2868, 2606:4700:3108::ac42:2b98 |
Response IP | 172.66.40.104 |
Found | Yes |
Hash | 8d9f1bd834c05e450698a97eebf5d4a7e6e2cf98c1d29cbc4d09f6a30c47d1ad |
SimHash | c89a49d3c6f0 |
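The recorded Hash has 64 hex digits (256 bits), which is consistent with SHA-256. The SimHash has 12 hex digits (48 bits). The report does not say which algorithm or input produced either value. Assuming the Hash is SHA-256 over the raw robots.txt body, a later fetch can be checked for exact changes. SimHash fingerprints, by contrast, are usually compared by Hamming distance to detect near-duplicates. Both helper names below are hypothetical.

```python
import hashlib

# Values copied from the last successful scan above.
RECORDED_HASH = "8d9f1bd834c05e450698a97eebf5d4a7e6e2cf98c1d29cbc4d09f6a30c47d1ad"
RECORDED_SIMHASH = "c89a49d3c6f0"


def body_changed(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """True if SHA-256(body) differs from the recorded hash.

    Assumption: the scanner hashed the raw response bytes with SHA-256.
    """
    return hashlib.sha256(body).hexdigest() != recorded.lower()


def simhash_distance(a: str, b: str) -> int:
    """Number of differing bits between two hex-encoded SimHash fingerprints."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")
```

A distance of only a few bits out of 48 suggests a nearly identical file. Where to set that threshold is a judgment call that this report does not specify.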
Groups
User-agent: *
Rule | Path |
---|---|
Disallow | *qa.newspaperarchive.com |
Disallow | *access.newspaperarchive.com |
Disallow | /tags/* |
Disallow | /serverstatus/* |
Disallow | /cache/* |
Disallow | /IIPViewerWeb/* |
Disallow | /? |
Disallow | /profile/* |
Disallow | /Pubjpgimages/ |
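This group contains only Disallow rules. Under RFC 9309, a rule matches when its pattern matches the start of the URL path (plus any query string). `*` matches any sequence of characters, a trailing `$` anchors the end, and matching is case-sensitive. Python's standard `urllib.robotparser` does not implement the `*` wildcard inside paths, so this sketch translates each pattern into a regular expression instead.

The sketch also shows that the first two rules only match paths containing the text `qa.newspaperarchive.com` or `access.newspaperarchive.com`. A Disallow value is a path pattern, so it cannot exclude a subdomain, which serves its own robots.txt. The names `DISALLOW`, `pattern_to_regex`, and `is_allowed` are illustrative.

```python
import re

# Disallow patterns copied from the "*" group above; the group lists no Allow rules.
DISALLOW = [
    "*qa.newspaperarchive.com",
    "*access.newspaperarchive.com",
    "/tags/*",
    "/serverstatus/*",
    "/cache/*",
    "/IIPViewerWeb/*",
    "/?",
    "/profile/*",
    "/Pubjpgimages/",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regular expression.

    '*' becomes '.*', a trailing '$' becomes an end-of-string anchor, and every
    other character is matched literally (and case-sensitively).
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


RULES = [pattern_to_regex(p) for p in DISALLOW]


def is_allowed(path_and_query: str) -> bool:
    """With only Disallow rules, a path is allowed unless some rule matches its start."""
    # re.match anchors at the beginning only, which gives prefix-match semantics.
    return not any(rule.match(path_and_query) for rule in RULES)
```

For example, `/tags/obituaries` and `/?q=smith` are disallowed, while `/` and `/pubjpgimages/1.jpg` (lower-case) are allowed. The sketch omits percent-encoding normalisation. It also omits Allow/Disallow precedence (the longest match wins, and Allow is preferred on a tie) because this group has no Allow rules.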
Other Records
Field | Value |
---|---|
sitemap | https://newspaperarchive.com/sitemap.xml |
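The `sitemap` record points crawlers at https://newspaperarchive.com/sitemap.xml. This report does not include the file's contents. Assuming it follows the sitemaps.org protocol, the root element is either a `urlset` (page URLs) or a `sitemapindex` (links to further sitemap files), and each entry's URL sits in a `loc` element. Below is a minimal sketch using the standard library's ElementTree. The `sitemap_locs` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text: str) -> tuple[str, list[str]]:
    """Return the root kind ('urlset' or 'sitemapindex') and every <loc> URL it lists."""
    root = ET.fromstring(xml_text)
    kind = root.tag.split("}", 1)[-1]  # strip the "{namespace}" prefix from the tag
    locs = [
        loc.text.strip()
        for loc in root.findall("./*/sm:loc", SITEMAP_NS)  # <url>/<loc> or <sitemap>/<loc>
        if loc.text
    ]
    return kind, locs
```

If the root turns out to be a `sitemapindex`, each listed `loc` is itself a sitemap that can be fetched and parsed the same way.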