Robots Exclusion Standard data for archives.gov

Resource Scan

Scan Details

Site Domain archives.gov
Base Domain archives.gov
Scan Status Ok
Last Scan 2024-09-22T16:57:09+00:00
Next Scan 2024-10-22T16:57:09+00:00

Last Scan

Scanned 2024-09-22T16:57:09+00:00
URL https://archives.gov/robots.txt
Redirect https://www.archives.gov/robots.txt
Redirect Domain www.archives.gov
Redirect Base archives.gov
Domain IPs 2600:1f18:43e8:f301:9046:c05f:75e7:c481, 2600:1f18:43e8:f302:b470:d266:4d03:3ed8, 52.206.136.3, 52.44.89.206
Redirect IPs 2600:9000:2014:2c00:f:fd2b:b880:93a1, 2600:9000:2014:5800:f:fd2b:b880:93a1, 2600:9000:2014:6000:f:fd2b:b880:93a1, 2600:9000:2014:7200:f:fd2b:b880:93a1, 2600:9000:2014:9a00:f:fd2b:b880:93a1, 2600:9000:2014:ca00:f:fd2b:b880:93a1, 2600:9000:2014:d800:f:fd2b:b880:93a1, 2600:9000:2014:e400:f:fd2b:b880:93a1, 54.230.71.101, 54.230.71.127, 54.230.71.25, 54.230.71.30
Response IP 13.33.30.25
Found Yes
Hash 637513f759216025daef58ad327663dcf289c8f453ac4b1d44651649239c8ccb
SimHash b8169d0b4744

Groups

*

Rule Path
Disallow /citizen-archivist/history-hub/hh-test
Disallow /developer/artificial-intelligence-and-machine-learning-datasets
Disallow /developer/1940-census
Disallow /developer/national-archives-catalog-dataset

Other Records

Field Value
crawl-delay 10

usasearch

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 2

Other Records

Field Value
sitemap https://www.archives.gov/sitemap.xml
sitemap https://www.archives.gov/files/sitemap.xml
sitemap https://www.archives.gov/research/native-americans/bia/photos/sitemap.xml
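The groups and records above can be exercised with a short sketch. The robots.txt text below is reconstructed from the scan tables, not copied from the live file, so directive order and capitalization are assumptions. `examplebot` is a hypothetical user agent standing in for any crawler without its own group. Python's standard `urllib.robotparser` is used only as one convenient parser (`site_maps()` needs Python 3.8 or later); other parsers may resolve groups differently.

```python
from urllib.robotparser import RobotFileParser

# Assumption: reconstructed from the scan report, not the literal file contents.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /citizen-archivist/history-hub/hh-test
Disallow: /developer/artificial-intelligence-and-machine-learning-datasets
Disallow: /developer/1940-census
Disallow: /developer/national-archives-catalog-dataset

User-agent: usasearch
Crawl-delay: 2

Sitemap: https://www.archives.gov/sitemap.xml
Sitemap: https://www.archives.gov/files/sitemap.xml
Sitemap: https://www.archives.gov/research/native-americans/bia/photos/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://www.archives.gov"
blocked = BASE + "/developer/1940-census"
open_path = BASE + "/research"

# "examplebot" is a hypothetical crawler name; it falls under the "*" group.
for agent in ("examplebot", "usasearch"):
    print(agent, "crawl-delay:", parser.crawl_delay(agent))
    print(agent, "may fetch", blocked, "->", parser.can_fetch(agent, blocked))
    print(agent, "may fetch", open_path, "->", parser.can_fetch(agent, open_path))

# Sitemap records are not tied to any user-agent group.
print(parser.site_maps())
```

Under this parser, a crawler named usasearch matches its own group, which defines no Disallow rules, so the paths blocked for `*` remain open to it and only the shorter crawl delay applies.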

Comments

  • robots.txt
  • This file is to prevent the crawling and indexing of certain parts of your site by web crawlers and spiders run by sites like Yahoo! and Google. By telling these "robots" where not to go on your site, you save bandwidth and server resources.
  • This file will be ignored unless it is at the root of your host:
  • Used: http://example.com/robots.txt
  • Ignored: http://example.com/site/robots.txt
  • For more information about the robots.txt standard, see: http://www.robotstxt.org/robotstxt.html
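
The root-of-host rule in the comments can be illustrated with a minimal sketch: given any page URL, the only robots.txt location a crawler would consult is `/robots.txt` on the same scheme and host. The helper name `robots_url` is hypothetical and appears only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://example.com/site/page.html"))
print(robots_url("https://www.archives.gov/research/native-americans"))
```

A file placed at `http://example.com/site/robots.txt` is never returned by this derivation, which matches the "Ignored" case in the comments.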