
Robots Exclusion Standard data for newspaperarchive.com

Resource Scan

Scan Details

Site Domain newspaperarchive.com
Base Domain newspaperarchive.com
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a client error.
Last Scan 2024-09-05T08:34:59+00:00
Next Scan 2024-12-04T08:34:59+00:00
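The failed scan reports only "a client error", meaning an HTTP 4xx status whose exact code is not shown. RFC 9309 (section 2.3.1) tells crawlers how to treat each status class when fetching robots.txt, and the sketch below encodes that mapping. The function name `robots_fetch_policy` and its return strings are illustrative labels, not part of any library. The block also measures the gap between the two scan timestamps above; from this single pair it suggests, but does not prove, a 90-day rescan interval.

```python
from datetime import datetime


def robots_fetch_policy(status: int) -> str:
    """Map the HTTP status of a robots.txt fetch to a crawl policy.

    Follows the status classes in RFC 9309, section 2.3.1.
    The returned strings are illustrative labels, not a standard vocabulary.
    """
    if 200 <= status <= 299:
        return "parse"            # successful access: obey the parsed rules
    if 300 <= status <= 399:
        return "follow-redirect"  # crawlers SHOULD follow at least five redirects
    if 400 <= status <= 499:
        return "allow-all"        # "unavailable": the crawler MAY access any resource
    if 500 <= status <= 599:
        return "disallow-all"     # "unreachable": assume complete disallow
    raise ValueError(f"unexpected HTTP status: {status}")


# Timestamps copied from the scan details above.
last_scan = datetime.fromisoformat("2024-09-05T08:34:59+00:00")
next_scan = datetime.fromisoformat("2024-12-04T08:34:59+00:00")
rescan_interval = next_scan - last_scan

if __name__ == "__main__":
    # Hypothetical code: the scan does not say which 4xx status was returned.
    print(robots_fetch_policy(403))
    print(rescan_interval.days)
```

Note the asymmetry: RFC 9309 treats a 4xx response as permission to crawl, but a 5xx response as a complete disallow.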

Last Successful Scan

Scanned 2023-04-18T13:14:26+00:00
URL https://newspaperarchive.com/robots.txt
Domain IPs 172.66.40.104, 172.66.43.152, 2606:4700:3108::ac42:2868, 2606:4700:3108::ac42:2b98
Response IP 172.66.40.104
Found Yes
Hash 8d9f1bd834c05e450698a97eebf5d4a7e6e2cf98c1d29cbc4d09f6a30c47d1ad
SimHash c89a49d3c6f0
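The 64 hexadecimal digits of the Hash field are consistent with a SHA-256 digest, but the scan page names neither the algorithm nor exactly which bytes were hashed. The sketch below assumes SHA-256 over the raw response body and compares a fresh digest with the stored one, a cheap way to tell whether robots.txt changed between scans. `content_hash` and `has_changed` are hypothetical helper names. The SimHash field is a near-duplicate fingerprint whose parameters are not given, so it is not reproduced here.

```python
import hashlib

# Hash from the last successful scan (algorithm assumed to be SHA-256).
LAST_KNOWN_HASH = "8d9f1bd834c05e450698a97eebf5d4a7e6e2cf98c1d29cbc4d09f6a30c47d1ad"


def content_hash(body: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of a robots.txt response body."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """True if the body no longer hashes to the previously recorded digest."""
    return content_hash(body) != previous_hash.lower()


if __name__ == "__main__":
    # Hypothetical body; a real check would hash the bytes fetched from the URL above.
    sample = b"User-agent: *\nDisallow: /tags/*\n"
    print(content_hash(sample))
    print(has_changed(LAST_KNOWN_HASH, sample))
```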

Groups

*

Rule Path
Disallow *qa.newspaperarchive.com
Disallow *access.newspaperarchive.com
Disallow /tags/*
Disallow /serverstatus/*
Disallow /cache/*
Disallow /IIPViewerWeb/*
Disallow /?
Disallow /profile/*
Disallow /Pubjpgimages/
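The rules in the * group use * as a wildcard. The sketch below applies RFC 9309 matching to them: a pattern matches from the start of the URL path, * matches any run of characters, a trailing $ anchors the end, and when several rules match, the longest pattern wins, with Allow preferred on a tie. Matching is case-sensitive, so /Pubjpgimages/ does not cover /pubjpgimages/. Because robots.txt applies to a single host, rules such as *qa.newspaperarchive.com cannot block that subdomain; under this matching they only catch paths on this host that contain the string. The names `STAR_GROUP`, `pattern_to_regex` and `is_allowed` are illustrative, and percent-encoding normalization is skipped.

```python
import re
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path pattern)

# Rules of the "*" group, copied from the table above.
STAR_GROUP: List[Rule] = [
    ("disallow", "*qa.newspaperarchive.com"),
    ("disallow", "*access.newspaperarchive.com"),
    ("disallow", "/tags/*"),
    ("disallow", "/serverstatus/*"),
    ("disallow", "/cache/*"),
    ("disallow", "/IIPViewerWeb/*"),
    ("disallow", "/?"),
    ("disallow", "/profile/*"),
    ("disallow", "/Pubjpgimages/"),
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex anchored at the path start.

    '*' matches any sequence of characters and a trailing '$' anchors the end;
    every other character is matched literally (percent-encoding is not normalized).
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules: List[Rule]) -> bool:
    """Longest matching pattern decides; Allow wins ties; no match means allowed."""
    best_length = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_length or (length == best_length and directive == "allow"):
                best_length = length
                allowed = directive == "allow"
    return allowed


if __name__ == "__main__":
    for p in ["/", "/?page=2", "/tags/1890s", "/Pubjpgimages/a.jpg", "/pubjpgimages/a.jpg"]:
        print(p, is_allowed(p, STAR_GROUP))
```

The bare root / stays crawlable, while /? blocks the root only when it carries a query string.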

googlebot

Rule Path
Allow /Pubjpgimages/

googlebot

Rule Path
Allow /Pubjpgimages/

archive.is
sitecheck.internetseer.com
zealbot
sitesnagger
webstripper
webcopier
fetch
offline explorer
teleport
teleportpro
webzip
linko
httrack
xenu
larbin
libwww
zyborg
download ninja
myfamilybot
ia_archiver
yandex
ccbot
voltron
blexbot
googlebot-image

No rules defined. All paths allowed.
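Which group applies depends on the crawler's user-agent token. Under RFC 9309 a crawler matches group names case-insensitively, combines every group that names it (so the two googlebot groups above merge), and falls back to the * group only when no group names it. The empty group for the crawlers listed last therefore leaves them unrestricted, including on /Pubjpgimages/. The sketch below models that selection; `GROUPS` and `select_rules` are illustrative names, and agent names are compared as plain lowercase strings.

```python
from typing import List, Tuple

Rule = Tuple[str, str]                # (directive, path pattern)
Group = Tuple[List[str], List[Rule]]  # (user-agent names, rules)

# Groups as listed in this scan.
GROUPS: List[Group] = [
    (["*"], [
        ("disallow", "*qa.newspaperarchive.com"),
        ("disallow", "*access.newspaperarchive.com"),
        ("disallow", "/tags/*"),
        ("disallow", "/serverstatus/*"),
        ("disallow", "/cache/*"),
        ("disallow", "/IIPViewerWeb/*"),
        ("disallow", "/?"),
        ("disallow", "/profile/*"),
        ("disallow", "/Pubjpgimages/"),
    ]),
    (["googlebot"], [("allow", "/Pubjpgimages/")]),
    (["googlebot"], [("allow", "/Pubjpgimages/")]),
    ([
        "archive.is", "sitecheck.internetseer.com", "zealbot", "sitesnagger",
        "webstripper", "webcopier", "fetch", "offline explorer", "teleport",
        "teleportpro", "webzip", "linko", "httrack", "xenu", "larbin", "libwww",
        "zyborg", "download ninja", "myfamilybot", "ia_archiver", "yandex",
        "ccbot", "voltron", "blexbot", "googlebot-image",
    ], []),
]


def select_rules(agent: str, groups: List[Group]) -> List[Rule]:
    """Return the combined rules a crawler named `agent` must obey."""
    wanted = agent.lower()
    # Every group naming this crawler counts, even one with no rules.
    matching = [rules for names, rules in groups
                if any(name.lower() == wanted for name in names)]
    if not matching:
        # No group names this crawler: fall back to the "*" group(s).
        matching = [rules for names, rules in groups if "*" in names]
    # Multiple matching groups are merged into one rule list.
    return [rule for rules in matching for rule in rules]


if __name__ == "__main__":
    for agent in ["Googlebot", "CCBot", "bingbot"]:
        print(agent, select_rules(agent, GROUPS))
```

For a crawler not named in any group, such as bingbot, this returns the nine * rules. Googlebot gets only its two Allow rules, so under this model it is not subject to the * group's Disallow lines at all.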

Other Records

Field Value
sitemap https://newspaperarchive.com/sitemap.xml
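The sitemap record is independent of any user-agent group. Python's standard `urllib.robotparser` collects such records and exposes them through `RobotFileParser.site_maps()` (Python 3.8 and later). The input below is a hypothetical excerpt rebuilt from this scan, not the verbatim file. `urllib.robotparser` matches rule paths by simple prefix and does not implement the * wildcard, so this sketch uses it only for the sitemap lookup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt rebuilt from the scan above; not the verbatim robots.txt.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /tags/*",
    "Disallow: /Pubjpgimages/",
    "",
    "User-agent: googlebot",
    "Allow: /Pubjpgimages/",
    "",
    "Sitemap: https://newspaperarchive.com/sitemap.xml",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)
# site_maps() returns None when the file has no Sitemap lines.
sitemaps = parser.site_maps() or []

if __name__ == "__main__":
    for url in sitemaps:
        print(url)
```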