Robots Exclusion Standard data for loc.gov

Resource Scan

Scan Details

Site Domain loc.gov
Base Domain loc.gov
Scan Status Ok
Last Scan 2024-11-06T21:59:26+00:00
Next Scan 2024-11-20T21:59:26+00:00

Last Scan

Scanned 2024-11-06T21:59:26+00:00
URL https://loc.gov/robots.txt
Redirect https://www.loc.gov/robots.txt
Redirect Domain www.loc.gov
Redirect Base loc.gov
Domain IPs 104.17.6.58, 104.18.64.82, 2606:4700::6811:63a, 2606:4700::6812:4052
Redirect IPs 104.17.6.58, 104.18.64.82, 2606:4700::6811:63a, 2606:4700::6812:4052
Response IP 104.17.6.58
Found Yes
Hash 994f1660f526307cafb387d72d060c4691731bc3dea614961dc3e0cc2a5c9121
SimHash 2d1bd2d4c2f3
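
The Hash value above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest of the fetched file, although the scan does not name the algorithm. A minimal sketch of how such a fingerprint could be computed, assuming SHA-256 over the raw response body (the function name content_hash is hypothetical):

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt response body.

    Assumption: the scanner hashes the raw bytes with SHA-256. The scan
    output only shows a 64-character hex string, not the algorithm.
    """
    return hashlib.sha256(body).hexdigest()


if __name__ == "__main__":
    sample = b"User-agent: *\nDisallow: /search\n"
    print(content_hash(sample))
```

Comparing the digest from two scans is a cheap way to tell whether the file changed at all. The SimHash field suggests the scanner also keeps a similarity fingerprint for detecting near-identical files, which this sketch does not attempt to reproduce.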

Groups

baiduspider

Rule Path
Disallow /

baiduspider-image

Rule Path
Disallow /

*

Rule Path
Disallow /cgi-bin/
Disallow /web_arch/
Disallow /rr/mopic/staff
Disallow /loc/volunteers
Disallow /ficmanagers
Disallow /preserv/extranet/
Disallow /myloc
Disallow /nationalfilmregistry
Disallow /fedsearch
Disallow /search
Disallow /pictures/search
Disallow /pictures/related

Other Records

Field Value
crawl-delay 5
sitemap https://www.loc.gov/sitemap.xml
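
The groups and records above can be checked with Python's standard urllib.robotparser module. The sketch below rebuilds an approximation of the file from the scan output; it is not the actual robots.txt served by www.loc.gov. In particular, placing crawl-delay inside the * group is an assumption, since the scan lists it only under Other Records, and the helper name allowed is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Approximate reconstruction from the scan output above (not the live file).
# Assumption: the crawl-delay record belongs to the "*" group.
ROBOTS_TXT = """\
User-agent: baiduspider
Disallow: /

User-agent: baiduspider-image
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /web_arch/
Disallow: /rr/mopic/staff
Disallow: /loc/volunteers
Disallow: /ficmanagers
Disallow: /preserv/extranet/
Disallow: /myloc
Disallow: /nationalfilmregistry
Disallow: /fedsearch
Disallow: /search
Disallow: /pictures/search
Disallow: /pictures/related
Crawl-delay: 5

Sitemap: https://www.loc.gov/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Check whether `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, "https://www.loc.gov" + path)


if __name__ == "__main__":
    for agent, path in [
        ("Googlebot", "/collections/"),
        ("Googlebot", "/search/"),
        ("Baiduspider", "/collections/"),
    ]:
        print(agent, path, allowed(agent, path))
    print("crawl delay:", parser.crawl_delay("Googlebot"))
    print("sitemaps:", parser.site_maps())
```

Rules are prefix matches, so Disallow /search also covers paths such as /search/ beneath it. Note that site_maps() requires Python 3.8 or later.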

Comments

  • Baiduspider

Warnings

  • 2 invalid lines.