
Robots Exclusion Standard data for repository.duke.edu

Resource Scan

Scan Details

Site Domain repository.duke.edu
Base Domain duke.edu
Scan Status Ok
Last Scan 2025-10-31T04:27:12+00:00
Next Scan 2025-11-30T04:27:12+00:00

Last Scan

Scanned 2025-10-31T04:27:12+00:00
URL https://repository.duke.edu/robots.txt
Domain IPs 152.3.80.200
Response IP 152.3.80.200
Found Yes
Hash 5d99f1b893115a340ed69e4576afc59e10abf6ad0857288d16663bd2707637f5
SimHash b68cede3f4c4
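
The Hash field is a 64-hex-digit value, consistent with a SHA-256 digest of the fetched file. Assuming it was computed over the raw response body, the record can be re-checked against the live file by fetching it and comparing digests; a minimal Python sketch:

```python
import hashlib
import urllib.request

URL = "https://repository.duke.edu/robots.txt"
RECORDED_SHA256 = "5d99f1b893115a340ed69e4576afc59e10abf6ad0857288d16663bd2707637f5"

# Fetch the current robots.txt and hash the raw response body.
with urllib.request.urlopen(URL, timeout=10) as resp:
    body = resp.read()

current = hashlib.sha256(body).hexdigest()
print("unchanged since last scan" if current == RECORDED_SHA256 else "changed since last scan")
```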

Groups

*

Rule Path
Disallow /catalog/facet/*
Disallow *f%5B*_facet_*%5D%5B%5D*
Disallow *f%5Bcollection_title_ssi%5D%5B%5D*

Other Records

Field Value
crawl-delay 10
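
The Disallow patterns for the * group use URL-encoded brackets (%5B = '[', %5D = ']') and '*' wildcards, so once decoded they target facet query strings of the form f[..._facet_...][]=... . A hedged sketch of how such a wildcard pattern behaves, translating '*' into a regex and testing example URLs (the subject_facet_sim name is hypothetical; admin_set_title_ssi is the parameter the site's own comments say is intentionally left crawlable):

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    # In the common robots.txt wildcard extension, '*' matches any character
    # sequence and everything else is literal ('$' end-anchoring is omitted here).
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))

# One of the recorded Disallow patterns (%5B/%5D are URL-encoded '[' and ']').
facet_rule = robots_pattern_to_regex("*f%5B*_facet_*%5D%5B%5D*")

# Hypothetical facet links of the kind such rules target.
blocked = "/catalog?f%5Bsubject_facet_sim%5D%5B%5D=Photography"
allowed = "/catalog?f%5Badmin_set_title_ssi%5D%5B%5D=University+Archives"

print(bool(facet_rule.search(blocked)))  # True  -> matched, disallowed
print(bool(facet_rule.search(allowed)))  # False -> not matched by this rule
```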

semrushbot
petalbot

Rule Path
Disallow /
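
Both groups can be exercised together with Python's standard urllib.robotparser. The snippet below feeds it an approximate reconstruction of the recorded rules (assembled from the scan data above, not a verbatim copy of the live file) and shows that semrushbot and petalbot are denied the whole site while other agents get the 10-second crawl delay. The stdlib parser does plain prefix matching, so the wildcard facet rules are not evaluated here; a wildcard-aware library (or the regex sketch above) is needed for those.

```python
from urllib.robotparser import RobotFileParser

# Approximate reconstruction of the recorded rules, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog/facet/*
Crawl-delay: 10

User-agent: semrushbot
User-agent: petalbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The second group denies these bots the entire site.
print(parser.can_fetch("semrushbot", "https://repository.duke.edu/"))       # False
print(parser.can_fetch("petalbot", "https://repository.duke.edu/catalog"))  # False

# Everyone else is asked to wait 10 seconds between requests.
print(parser.crawl_delay("*"))  # 10
```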

Comments

  • See https://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
  • To ban all spiders from the entire site, uncomment the next two lines:
  • User-agent: *
  • Disallow: /
  • Prevent all bots from crawling most facet links.
  • Note that paths with some facet parameters are intentionally permitted to be crawled, e.g., admin_set_title_ssi & common_model_name_ssi
  • Also, encoded chars in the URL (%5B) might need to be decoded according to: https://www.google.com/webmasters/tools/robots-testing-tool
  • Block specific bots