edworkforce.house.gov
robots.txt

Robots Exclusion Standard data for edworkforce.house.gov

Resource Scan

Scan Details

Site Domain edworkforce.house.gov
Base Domain house.gov
Scan Status Ok
Last Scan 2024-05-30T06:28:47+00:00
Next Scan 2024-06-29T06:28:47+00:00
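
The two timestamps above are exactly 30 days apart, which suggests a fixed rescan interval. The report does not state the interval, so the 30-day value in this minimal sketch is inferred from the dates rather than documented:

```python
from datetime import datetime, timedelta

# Assumption: the rescan interval is 30 days, inferred from the two
# timestamps in the report; the report itself does not state it.
RESCAN_INTERVAL = timedelta(days=30)

last_scan = datetime.fromisoformat("2024-05-30T06:28:47+00:00")
next_scan = last_scan + RESCAN_INTERVAL

print(next_scan.isoformat())  # 2024-06-29T06:28:47+00:00
```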

Last Scan

Scanned 2024-05-30T06:28:47+00:00
URL https://edworkforce.house.gov/robots.txt
Domain IPs 23.210.99.194, 2600:1413:b000:78a::12a8, 2600:1413:b000:791::12a8
Response IP 104.110.73.179
Found Yes
Hash dcc17af5991f7e254dd130cef54dfa1dae338d531c5808eea5b5a4bfbbba36eb
SimHash 697ad6714390
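
The Hash value is 64 hexadecimal characters long, which is the length of a SHA-256 digest, although the report does not name the algorithm. A rough sketch of fingerprinting a robots.txt body for change detection, assuming SHA-256 and using a made-up `sample_body` rather than the real file:

```python
import hashlib

def robots_digest(body: bytes) -> str:
    """Return a hex digest of the raw robots.txt bytes.

    Assumption: SHA-256. The report only shows a 64-character hex string.
    """
    return hashlib.sha256(body).hexdigest()

# Hypothetical body for illustration; not the actual file contents.
sample_body = b"User-agent: *\nDisallow: /printform/\n"
digest = robots_digest(sample_body)

print(len(digest))  # 64
```

Comparing the digest from one scan with the digest from the next would show whether the file changed at all; the shorter SimHash value presumably serves to measure how similar two versions are, but that is not implemented here.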

Groups

*

Rule Path
Disallow /news/documentprint.aspx*
Allow /components/util/documentsitemap.aspx
Disallow /printform/
Disallow /thankyou/
Disallow /showissue.aspx
Disallow /PRArticle.aspx
Disallow /archive/
Disallow /press_releases.aspx
Disallow /fact_sheets.aspx
Disallow /Media/
Disallow /showissue.aspx?*
Disallow /PRArticle.aspx?*
Disallow /press_releases.aspx?*
Disallow /fact_sheets.aspx?*
Disallow /Calendar/EventSingle.aspx?EventID=186937
Disallow /*?*Preview=
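
To see how the rules in this group might apply to a given path, here is a minimal sketch of the matching described in RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end, the longest matching pattern wins, and Allow wins a tie. The helper names `pattern_to_regex` and `is_allowed` are illustrative, and real crawlers may differ in details such as percent-encoding.

```python
import re

# Rules copied from the "*" group listed above.
RULES = [
    ("disallow", "/news/documentprint.aspx*"),
    ("allow", "/components/util/documentsitemap.aspx"),
    ("disallow", "/printform/"),
    ("disallow", "/thankyou/"),
    ("disallow", "/showissue.aspx"),
    ("disallow", "/PRArticle.aspx"),
    ("disallow", "/archive/"),
    ("disallow", "/press_releases.aspx"),
    ("disallow", "/fact_sheets.aspx"),
    ("disallow", "/Media/"),
    ("disallow", "/showissue.aspx?*"),
    ("disallow", "/PRArticle.aspx?*"),
    ("disallow", "/press_releases.aspx?*"),
    ("disallow", "/fact_sheets.aspx?*"),
    ("disallow", "/Calendar/EventSingle.aspx?EventID=186937"),
    ("disallow", "/*?*Preview="),
]

def pattern_to_regex(path):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    # Escape literal pieces and join them with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(url_path, rules=RULES):
    """Longest matching pattern wins; Allow wins ties; default is allow."""
    best_len, best_allow = -1, True
    for kind, path in rules:
        if pattern_to_regex(path).match(url_path):
            allow = kind == "allow"
            if len(path) > best_len or (len(path) == best_len and allow):
                best_len, best_allow = len(path), allow
    return best_allow

print(is_allowed("/Media/photo.jpg"))             # False
print(is_allowed("/media/photo.jpg"))             # True (paths are case-sensitive)
print(is_allowed("/page?x=1&Preview=true"))       # False
```

Under this reading, the `?*` variants of the four `.aspx` rules are redundant with their shorter forms, since a plain prefix already matches any query string that follows.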

gsa-crawler

Rule Path
Disallow /videos

siteimprovebot-crawler

Rule Path
Disallow /news/email/
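
The three groups above are alternatives, not layers. As a sketch of how a crawler might pick one, assuming the RFC 9309 behaviour of a case-insensitive match on the product token with `*` as the fallback (the `select_group` helper and the abbreviated `*` rule list are illustrative):

```python
GROUPS = {
    # The "*" group is abbreviated here; the full list appears in the report.
    "*": [("disallow", "/printform/"), ("disallow", "/thankyou/")],
    "gsa-crawler": [("disallow", "/videos")],
    "siteimprovebot-crawler": [("disallow", "/news/email/")],
}

def select_group(product_token, groups=GROUPS):
    """Return the rules for a named crawler, else the "*" group."""
    token = product_token.lower()
    for name, rules in groups.items():
        if name != "*" and name.lower() == token:
            return rules
    return groups.get("*", [])

print(select_group("GSA-Crawler"))  # [('disallow', '/videos')]
print(select_group("Googlebot") is GROUPS["*"])  # True
```

One consequence under this assumption: a crawler that matches its own group, such as gsa-crawler, would be bound only by that group's rules and not by the `*` rules.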

Comments

  • News
  • Utilities
  • Responses
  • Custom

Warnings

  • `noindex` is not a known field.
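
A warning like this can come from a simple field check while parsing. The sketch below assumes a small set of recognised field names and uses a hypothetical `SAMPLE` file, since the report lists neither the scanner's known fields nor the raw robots.txt text:

```python
# Assumption: the fields treated as known; the scanner's real list is not shown.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap"}

def find_unknown_fields(text):
    """Return (line_number, field) pairs for unrecognised field names."""
    unknown = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            unknown.append((line_no, field))
    return unknown

# Hypothetical file contents for illustration only.
SAMPLE = "# Custom\nUser-agent: *\nDisallow: /thankyou/\nNoindex: /archive/\n"

print(find_unknown_fields(SAMPLE))  # [(4, 'noindex')]
```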