deepdyve.com
robots.txt

Robots Exclusion Standard data for deepdyve.com

Resource Scan

Scan Details

Site Domain deepdyve.com
Base Domain deepdyve.com
Scan Status Ok
Last Scan 2026-03-06T19:41:47+00:00
Next Scan 2026-04-05T19:41:47+00:00

Last Scan

Scanned 2026-03-06T19:41:47+00:00
URL https://deepdyve.com/robots.txt
Domain IPs 104.20.14.194, 104.20.15.194, 2606:4700:10::6814:ec2, 2606:4700:10::6814:fc2
Response IP 104.20.14.194
Found Yes
Hash 80c7e1bf966ffc0c923a40d0a33126f315026566b09da9c987d0802a08ca69a4
SimHash 6ff3d844453c

Groups

*

Rule Path
Disallow /cgi-bin/
Disallow /openurl
Disallow /search
Disallow /browse-wr/
Disallow /enterprise-free-trial
Disallow /rental-link
Disallow /timescited

Other Records

Field Value
crawl-delay 5
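The default group above can be exercised with Python's standard-library `urllib.robotparser`. The snippet below is a minimal sketch: the `ROBOTS` string is an excerpt reconstructed from the report (the exact file layout is an assumption), and it checks a blocked path, the GPTBot allow-all group, and the crawl delay.

```python
from urllib import robotparser

# Excerpt of the scanned rules (reconstructed from this report; the
# exact layout of the live file is an assumption):
ROBOTS = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /openurl
Disallow: /search
Crawl-delay: 5

User-agent: GPTBot
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# /search is disallowed for generic crawlers under the "*" group:
print(rp.can_fetch("*", "https://www.deepdyve.com/search"))       # False
# GPTBot has its own group with "Allow: /", so the same path is open:
print(rp.can_fetch("GPTBot", "https://www.deepdyve.com/search"))  # True
# The crawl-delay record from the "*" group:
print(rp.crawl_delay("*"))                                        # 5
```

Note that a crawler only falls back to the `*` group when no group names it specifically; once a `User-agent: GPTBot` group exists, the `*` rules no longer apply to GPTBot at all.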

googlebot

Rule Path
Disallow /assets/images/doccover.png
Disallow /cgi-bin/
Disallow /openurl
Disallow /search
Disallow /browse-wr/
Disallow /enterprise-free-trial
Disallow /rental-link
Disallow /timescited

gptbot

Rule Path
Allow /

google-extended

Rule Path
Allow /

claude-web

Rule Path
Allow /

Other Records

Field Value
sitemap https://www.deepdyve.com/sitemaps/sitemap_index.xml
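Taken together, the groups and records above imply a robots.txt along these lines. This is a reconstruction from the scan data, not the verbatim file: the ordering of groups and the placement of the Sitemap line are assumptions (the original also carries the comment banners listed under Comments below).

```
# DeepDyve robots.txt
# Updated: 2025-12-15

User-agent: *
Disallow: /cgi-bin/
Disallow: /openurl
Disallow: /search
Disallow: /browse-wr/
Disallow: /enterprise-free-trial
Disallow: /rental-link
Disallow: /timescited
Crawl-delay: 5

User-agent: Googlebot
Disallow: /assets/images/doccover.png
Disallow: /cgi-bin/
Disallow: /openurl
Disallow: /search
Disallow: /browse-wr/
Disallow: /enterprise-free-trial
Disallow: /rental-link
Disallow: /timescited

User-agent: GPTBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Claude-Web
Allow: /

Sitemap: https://www.deepdyve.com/sitemaps/sitemap_index.xml
```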

Comments

  • DeepDyve robots.txt
  • Updated: 2025-12-15
  • Sitemap architecture follows /sitemap-spec.md
  • ==================================================
  • Default Crawl Rules (All Bots)
  • ==================================================
  • ==================================================
  • Sitemap Index Reference
  • ==================================================
  • ==================================================
  • Googlebot-Specific Rules
  • ==================================================
  • ==================================================
  • LLM Crawler Permissions
  • Per sitemap-spec.md section 8.1
  • ==================================================
  • OpenAI GPT Crawler
  • Google Extended (Bard/Gemini training)
  • Anthropic Claude Crawler
  • ==================================================
  • Additional LLM Crawlers (Optional)
  • ==================================================
  • Common Crawl (used by many AI models)
  • User-agent: CCBot
  • Allow: /
  • Meta AI (Facebook/Instagram AI)
  • User-agent: FacebookBot
  • Allow: /
  • Perplexity AI
  • User-agent: PerplexityBot
  • Allow: /