wwwnc.cdc.gov
robots.txt

Robots Exclusion Standard data for wwwnc.cdc.gov

Resource Scan

Scan Details

Site Domain wwwnc.cdc.gov
Base Domain cdc.gov
Scan Status Ok
Last Scan 2024-09-25T23:05:29+00:00
Next Scan 2024-10-25T23:05:29+00:00

Last Scan

Scanned 2024-09-25T23:05:29+00:00
URL https://wwwnc.cdc.gov/robots.txt
Domain IPs 13.107.246.73, 2620:1ec:bdf::59
Response IP 13.107.246.59
Found Yes
Hash 3a3751185120ea6e64d14dc2aa7e5d0e5d234d1055363e04807e74dee34de404
SimHash 6458d2e7c957

Groups

roverbot

Rule Path
Disallow /

googlebot

Rule Path
Allow /travel
Disallow *.js
Disallow /

Other Records

Field Value
crawl-delay 20

googlebot

Rule Path
Allow /eid
Disallow *.js
Disallow /

Other Records

Field Value
crawl-delay 20

emailsiphon

Rule Path
Disallow /

mindspider

Rule Path
Disallow /

*

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 20

gsa-crawler

Rule Path
Disallow /

*

Rule Path
Disallow /eid/Scripts/EidMetrics.js

*

Rule Path
Disallow /eid/article/*.pptx
Disallow /eid/article/*-combined.pdf
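
The groups above can be checked against sample paths with a small matcher. The following is a minimal sketch, not the scanner's or any crawler's actual implementation. It assumes the RFC 9309 conventions that groups for the same user agent are merged and that the longest matching pattern wins, with Allow preferred on a tie. The names GOOGLEBOT_RULES, STAR_RULES, pattern_to_regex and is_allowed are hypothetical, as are the sample paths in the usage note.

```python
import re

# Rules copied from the two googlebot groups listed above, merged into
# one list (assumption: same-agent groups are combined, per RFC 9309).
GOOGLEBOT_RULES = [
    ("allow", "/travel"),
    ("allow", "/eid"),
    ("disallow", "*.js"),
    ("disallow", "/"),
]

# Rules copied from the three "*" groups listed above, merged the same way.
STAR_RULES = [
    ("disallow", "/eid/Scripts/EidMetrics.js"),
    ("disallow", "/eid/article/*.pptx"),
    ("disallow", "/eid/article/*-combined.pdf"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Return True if `path` may be fetched under `rules`.

    The longest matching pattern wins; on equal length, allow wins.
    A path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len = length
                allowed = kind == "allow"
    return allowed
```

Under these assumptions, is_allowed("/travel/notices", GOOGLEBOT_RULES) is True because "/travel" is a longer match than "/", while is_allowed("/about", GOOGLEBOT_RULES) is False because only "Disallow /" matches. For other agents, is_allowed("/eid/article/30/1/slides.pptx", STAR_RULES) is False and is_allowed("/eid/article/30/1/overview", STAR_RULES) is True. The crawl-delay records are not modelled here.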

Comments

  • Rover is a bad dog
  • GoogleBot
  • EmailSiphon is a hunter/gatherer which extracts email addresses for spam-mailers to use
  • Exclude MindSpider since it appears to be ill-behaved
  • Exclude script for custom EID metrics
  • Exclude resource intensive files for EID