Robots Exclusion Standard data for water.usgs.gov

Resource Scan

Scan Details

Site Domain water.usgs.gov
Base Domain usgs.gov
Scan Status Ok
Last Scan 2024-06-01T15:42:10+00:00
Next Scan 2024-07-01T15:42:10+00:00

Last Scan

Scanned 2024-06-01T15:42:10+00:00
URL https://water.usgs.gov/robots.txt
Domain IPs 137.227.233.178, 2001:49c8:0:126c::76
Response IP 137.227.233.178
Found Yes
Hash cf37c5adee7f1698d6cce556f43117a9d345689365054862b17a9b5e18edf839
SimHash 614069659f75

Groups

User-agent *

Rule Path
Disallow /camera/
Disallow /cgi-bin/feedback_form
Disallow /cgi-bin/lookup
Disallow /icons/
Disallow /images/
Disallow /nawdex/
Disallow /nawqa-only/sparrowweb/
Disallow /nsip/nsipmaps/
Disallow /outreach/images/
Disallow /preview/
Disallow /project_alert/
Disallow /public/
Disallow /usgs_access/
Disallow /watuse/wuhuc/
Disallow /usgs/ogw/software-archive
Disallow /usgs/owq/software-archive

User-agent search.usgs.gov

Rule Path
Allow /usgs/
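The two groups above can be checked with Python's standard-library `urllib.robotparser`. This is a minimal sketch, assuming the scan table maps one-to-one onto `User-agent`/`Disallow`/`Allow` lines. The `ROBOTS_TXT` text is a hypothetical reconstruction, not the verbatim file, since the original comments and layout are not shown. `ExampleBot/1.0` is a made-up crawler name. Note that `urllib.robotparser` applies the first matching rule rather than the longest match described in RFC 9309; for these groups the result is the same, because the `*` group contains only `Disallow` rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of https://water.usgs.gov/robots.txt from the
# scan table; the original file's comments and exact layout are not known.
ROBOTS_TXT = """\
User-agent: *
Disallow: /camera/
Disallow: /cgi-bin/feedback_form
Disallow: /cgi-bin/lookup
Disallow: /icons/
Disallow: /images/
Disallow: /nawdex/
Disallow: /nawqa-only/sparrowweb/
Disallow: /nsip/nsipmaps/
Disallow: /outreach/images/
Disallow: /preview/
Disallow: /project_alert/
Disallow: /public/
Disallow: /usgs_access/
Disallow: /watuse/wuhuc/
Disallow: /usgs/ogw/software-archive
Disallow: /usgs/owq/software-archive

User-agent: search.usgs.gov
Allow: /usgs/
"""

parser = RobotFileParser()
# parse() takes an iterable of lines and marks the rules as loaded,
# so can_fetch() works without a network request.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if crawler `agent` may fetch `path` on water.usgs.gov."""
    return parser.can_fetch(agent, "https://water.usgs.gov" + path)


# A generic crawler falls into the "*" group.
for path in ("/camera/cam1.jpg", "/usgs/ogw/software-archive-old/", "/nwis/"):
    print(path, allowed("ExampleBot/1.0", path))

# The search.usgs.gov crawler matches its own group, so the "*" Disallow
# rules do not apply to it.
print("/camera/cam1.jpg", allowed("search.usgs.gov", "/camera/cam1.jpg"))
```

Rules are prefix matches, so `/usgs/ogw/software-archive` (which has no trailing slash) also blocks paths such as `/usgs/ogw/software-archive-old/`. A crawler that identifies as `search.usgs.gov` obeys only its own group. Because that group has no `Disallow` lines, everything is fetchable for it, and the `Allow /usgs/` rule changes nothing in practice.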

Comments

  • USGS Photo Competition (USGS and retirees)
  • old nawdex files
  • more than 65,000 files
  • Project Alert pages should not be indexed
  • how to get access to internal pages
  • Files for water use by HUC
  • Aug 10 request from Cian Dawson