loc.gov
robots.txt
Robots Exclusion Standard data for loc.gov
Resource Scan
Scan Details
Field | Value |
---|---|
Site Domain | loc.gov |
Base Domain | loc.gov |
Scan Status | Ok |
Last Scan | 2024-11-06T21:59:26+00:00 |
Next Scan | 2024-11-20T21:59:26+00:00 |
Last Scan
Field | Value |
---|---|
Scanned | 2024-11-06T21:59:26+00:00 |
URL | https://loc.gov/robots.txt |
Redirect | https://www.loc.gov/robots.txt |
Redirect Domain | www.loc.gov |
Redirect Base | loc.gov |
Domain IPs | 104.17.6.58, 104.18.64.82, 2606:4700::6811:63a, 2606:4700::6812:4052 |
Redirect IPs | 104.17.6.58, 104.18.64.82, 2606:4700::6811:63a, 2606:4700::6812:4052 |
Response IP | 104.17.6.58 |
Found | Yes |
Hash | 994f1660f526307cafb387d72d060c4691731bc3dea614961dc3e0cc2a5c9121 |
SimHash | 2d1bd2d4c2f3 |
Groups
*
Rule | Path |
---|---|
Disallow | /cgi-bin/ |
Disallow | /web_arch/ |
Disallow | /rr/mopic/staff |
Disallow | /loc/volunteers |
Disallow | /ficmanagers |
Disallow | /preserv/extranet/ |
Disallow | /myloc |
Disallow | /nationalfilmregistry |
Disallow | /fedsearch |
Disallow | /search |
Disallow | /pictures/search |
Disallow | /pictures/related |
Other Records
Field | Value |
---|---|
crawl-delay | 5 |
Other Records
Field | Value |
---|---|
sitemap | https://www.loc.gov/sitemap.xml |
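As a rough illustration of how a crawler might interpret the records listed above, the following sketch rebuilds a robots.txt body from the scanned rules and evaluates it with Python's standard `urllib.robotparser`. The reconstructed text is an assumption: the live file's exact layout, comments, and its two invalid lines are not shown in this scan, and the test URLs are hypothetical. `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the scan tables above; NOT a verbatim copy of the
# live file (its exact layout and its invalid lines are unknown here).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /web_arch/
Disallow: /rr/mopic/staff
Disallow: /loc/volunteers
Disallow: /ficmanagers
Disallow: /preserv/extranet/
Disallow: /myloc
Disallow: /nationalfilmregistry
Disallow: /fedsearch
Disallow: /search
Disallow: /pictures/search
Disallow: /pictures/related
Crawl-delay: 5
Sitemap: https://www.loc.gov/sitemap.xml
"""

rp = RobotFileParser()
# parse() accepts an iterable of lines, so no network request is made.
rp.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs, chosen only to exercise the rules.
blocked = rp.can_fetch("*", "https://www.loc.gov/search")
allowed = rp.can_fetch("*", "https://www.loc.gov/pictures/")
delay = rp.crawl_delay("*")   # seconds a polite crawler waits between requests
sitemaps = rp.site_maps()     # list of Sitemap URLs, or None if absent

print(blocked, allowed, delay, sitemaps)
```

Disallow paths are matched as prefixes by this parser, so a rule such as `/search` also covers any longer path that begins with `/search`.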
Warnings
- 2 invalid lines.