edworkforce.house.gov
robots.txt
Robots Exclusion Standard data for edworkforce.house.gov
Resource Scan
Scan Details
Field | Value |
---|---|
Site Domain | edworkforce.house.gov |
Base Domain | house.gov |
Scan Status | Ok |
Last Scan | 2024-10-27T06:30:18+00:00 |
Next Scan | 2024-11-26T06:30:18+00:00 |
Last Scan
Field | Value |
---|---|
Scanned | 2024-10-27T06:30:18+00:00 |
URL | https://edworkforce.house.gov/robots.txt |
Domain IPs | 23.210.99.194, 2600:1413:b000:792::12a8, 2600:1413:b000:799::12a8 |
Response IP | 104.103.151.111 |
Found | Yes |
Hash | dcc17af5991f7e254dd130cef54dfa1dae338d531c5808eea5b5a4bfbbba36eb |
SimHash | 697ad6714390 |
Groups
*
Rule | Path |
---|---|
Disallow | /news/documentprint.aspx* |
Allow | /components/util/documentsitemap.aspx |
Disallow | /printform/ |
Disallow | /thankyou/ |
Disallow | /showissue.aspx |
Disallow | /PRArticle.aspx |
Disallow | /archive/ |
Disallow | /press_releases.aspx |
Disallow | /fact_sheets.aspx |
Disallow | /Media/ |
Disallow | /showissue.aspx?* |
Disallow | /PRArticle.aspx?* |
Disallow | /press_releases.aspx?* |
Disallow | /fact_sheets.aspx?* |
Disallow | /Calendar/EventSingle.aspx?EventID=186937 |
Disallow | /*?*Preview= |
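The rules above can be checked against a URL path with a short script. The following is a minimal sketch, assuming the longest-match precedence described in RFC 9309 (the most specific matching pattern wins, and `Allow` wins a tie), with `*` treated as a wildcard and a trailing `$` as an end anchor. The function names `pattern_to_regex` and `is_allowed` and the sample paths in the usage note are hypothetical; individual crawlers may apply the rules differently.

```python
import re

# Rules copied from the "*" group in the table above: (directive, path pattern).
RULES = [
    ("disallow", "/news/documentprint.aspx*"),
    ("allow", "/components/util/documentsitemap.aspx"),
    ("disallow", "/printform/"),
    ("disallow", "/thankyou/"),
    ("disallow", "/showissue.aspx"),
    ("disallow", "/PRArticle.aspx"),
    ("disallow", "/archive/"),
    ("disallow", "/press_releases.aspx"),
    ("disallow", "/fact_sheets.aspx"),
    ("disallow", "/Media/"),
    ("disallow", "/showissue.aspx?*"),
    ("disallow", "/PRArticle.aspx?*"),
    ("disallow", "/press_releases.aspx?*"),
    ("disallow", "/fact_sheets.aspx?*"),
    ("disallow", "/Calendar/EventSingle.aspx?EventID=186937"),
    ("disallow", "/*?*Preview="),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape literal pieces; each "*" becomes "match any run of characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored_end else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under longest-match precedence."""
    best_len, best_allow = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = directive == "allow"
            # Longer pattern wins; on equal length, prefer "allow".
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow
```

For example, `is_allowed("/Media/file.pdf")` returns `False`, while `is_allowed("/media/file.pdf")` returns `True`, because path matching is case-sensitive. Under this prefix-matching model the `?*` variants (such as `/showissue.aspx?*`) match nothing that the shorter `/showissue.aspx` rule does not already match.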
Warnings
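- `noindex` is not a known field. It is not defined by the Robots Exclusion Protocol (RFC 9309), and Google stopped honoring it in robots.txt in September 2019.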
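A warning of this kind could be produced by comparing each field name in the file against a list of recognized fields. The sketch below is an illustration only: the `KNOWN_FIELDS` set, the `find_unknown_fields` helper, and the sample file contents are all assumptions, since the report does not show the scanner's actual logic or the original `noindex` line.

```python
# Assumed list of recognized field names; the scanner's real list is not shown.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def find_unknown_fields(robots_txt):
    """Return unrecognized field names, lowercased, in order of first appearance."""
    unknown = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS and field not in unknown:
            unknown.append(field)
    return unknown


# Hypothetical file contents, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /printform/
Noindex: /example-path/
"""

for name in find_unknown_fields(SAMPLE):
    print(f"`{name}` is not a known field.")
```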
Comments