Robots Exclusion Standard data for nicholas.duke.edu

Resource Scan

Scan Details

Site Domain nicholas.duke.edu
Base Domain duke.edu
Scan Status Ok
Last Scan 2025-10-29T13:42:46+00:00
Next Scan 2025-11-28T13:42:46+00:00

Last Scan

Scanned 2025-10-29T13:42:46+00:00
URL https://nicholas.duke.edu/robots.txt
Domain IPs 152.3.80.200
Response IP 152.3.80.200
Found Yes
Hash 9eb0b2aa43b5c201d66108c8ae8f0fecd847043456eb8f999127a8d5b2e09597
SimHash a09ea351efc4
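
A content hash like the one above is typically used to tell whether the file changed between scans. The report does not name the hash algorithm; the 64-hex-digit length is consistent with SHA-256, so the sketch below assumes SHA-256. The function names `content_hash` and `has_changed` are hypothetical and are not part of the scanner.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Return the hex SHA-256 digest of a fetched robots.txt body.

    Assumption: the scanner hashes the raw response bytes with SHA-256.
    The report itself does not say which algorithm or normalization it uses.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hex: str, body: bytes) -> bool:
    """Compare a stored digest against the digest of a newly fetched body."""
    return content_hash(body) != previous_hex.lower()
```

Because the exact bytes the scanner hashed are unknown, a locally computed digest may not reproduce the value shown in the report even if the assumption about SHA-256 holds.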

Groups

*

Rule Path
Disallow /core/
Disallow /includes/
Disallow /misc/
Disallow /modules/
Disallow /profiles/
Disallow /scripts/
Disallow /themes/
Disallow /update.php
Disallow /install.php
Disallow /admin/
Disallow /user/login
Disallow /user/register
Disallow /cron.php
Disallow /xmlrpc.php
Disallow /marinelab/news/archives
Disallow /news/archives
Allow /
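
To see how a crawler might evaluate the rules in this group, the sketch below rebuilds a robots.txt from the table above and checks a few paths with Python's standard `urllib.robotparser`. The reconstructed text is an assumption (the original file's layout and comments are not reproduced), and the user agent `example-bot` and the helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the rule table; not a verbatim copy of the live file.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 1
Disallow: /core/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /profiles/
Disallow: /scripts/
Disallow: /themes/
Disallow: /update.php
Disallow: /install.php
Disallow: /admin/
Disallow: /user/login
Disallow: /user/register
Disallow: /cron.php
Disallow: /xmlrpc.php
Disallow: /marinelab/news/archives
Disallow: /news/archives
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, agent: str = "example-bot") -> bool:
    """Check whether the given path on nicholas.duke.edu may be fetched."""
    return parser.can_fetch(agent, "https://nicholas.duke.edu" + path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for path in ("/news/", "/news/archives/2024", "/admin/config", "/user/login"):
        print(path, is_allowed(parser, path))
```

Python's parser applies the first rule whose path is a prefix of the URL, whereas RFC 9309 specifies that the longest matching rule wins. For this group both approaches agree, because `Allow /` is the shortest rule and is listed last: `/news/` is allowed, while `/news/archives/2024` and `/admin/config` are not.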

Other Records

Field Value
crawl-delay 1
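
A crawler that honors this record would leave at least one second between requests. The following is a minimal sketch of such pacing, not a description of how any particular bot implements it; the generator name `throttled` and its injectable `clock` and `sleep` parameters are hypothetical and exist only to keep the example testable.

```python
import time
from typing import Callable, Iterable, Iterator


def throttled(
    items: Iterable[str],
    delay_seconds: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield items no faster than one per delay_seconds.

    The first item is yielded immediately; each later item waits until
    delay_seconds have passed since the previous one was yielded.
    """
    next_allowed = clock()
    for item in items:
        wait = next_allowed - clock()
        if wait > 0:
            sleep(wait)
        next_allowed = clock() + delay_seconds
        yield item
```

A caller would wrap its URL queue, for example `for path in throttled(paths, 1.0): fetch(path)`, where `fetch` stands in for whatever request function the crawler uses.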

Comments

  • =============================================================================
  • robots.txt for Drupal sites at Duke University
  • -----------------------------------------------------------------------------
  • To set max crawl rate ≈ 1 page/second for bots that support Crawl-delay.
  • Note: Googlebot ignores Crawl-delay; adjust Google’s crawl rate in Google Search Console.
  • -----------------------------------------------------------------------------
  • ---------------------------------------------------------------------
  • Disallow common Drupal system paths (adjust as needed per site)
  • ---------------------------------------------------------------------
  • ---------------------------------------------------------------------
  • Everything else is allowed
  • ---------------------------------------------------------------------