deepdyve.com
robots.txt

Robots Exclusion Standard data for deepdyve.com

Resource Scan

Scan Details

Site Domain deepdyve.com
Base Domain deepdyve.com
Scan Status Ok
Last Scan 2026-03-06T19:41:47+00:00
Next Scan 2026-04-05T19:41:47+00:00

Last Scan

Scanned 2026-03-06T19:41:47+00:00
URL https://deepdyve.com/robots.txt
Domain IPs 104.20.14.194, 104.20.15.194, 2606:4700:10::6814:ec2, 2606:4700:10::6814:fc2
Response IP 104.20.14.194
Found Yes
Hash 80c7e1bf966ffc0c923a40d0a33126f315026566b09da9c987d0802a08ca69a4
SimHash 6ff3d844453c

Groups

*

Rule Path
Disallow /cgi-bin/
Disallow /openurl
Disallow /search
Disallow /browse-wr/
Disallow /enterprise-free-trial
Disallow /rental-link
Disallow /timescited

Other Records

Field Value
crawl-delay 5
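The default group above can be exercised with Python's standard-library `urllib.robotparser`. The snippet below is a minimal sketch: the `ROBOTS` string is an excerpt reconstructed from the report (the exact file layout is an assumption), and it checks a blocked path, the GPTBot allow-all group, and the crawl delay.

```python
from urllib import robotparser

# Excerpt of the scanned rules (reconstructed from this report; the
# exact layout of the live file is an assumption):
ROBOTS = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /openurl
Disallow: /search
Crawl-delay: 5

User-agent: GPTBot
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# /search is disallowed for generic crawlers under the "*" group:
print(rp.can_fetch("*", "https://www.deepdyve.com/search"))       # False
# GPTBot has its own group with "Allow: /", so the same path is open:
print(rp.can_fetch("GPTBot", "https://www.deepdyve.com/search"))  # True
# The crawl-delay record from the "*" group:
print(rp.crawl_delay("*"))                                        # 5
```

Note that a crawler only falls back to the `*` group when no group names it specifically; once a `User-agent: GPTBot` group exists, the `*` rules no longer apply to GPTBot at all.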

googlebot

Rule Path
Disallow /assets/images/doccover.png
Disallow /cgi-bin/
Disallow /openurl
Disallow /search
Disallow /browse-wr/
Disallow /enterprise-free-trial
Disallow /rental-link
Disallow /timescited

gptbot

Rule Path
Allow /

google-extended

Rule Path
Allow /

claude-web

Rule Path
Allow /

Other Records

Field Value
sitemap https://www.deepdyve.com/sitemaps/sitemap_index.xml
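Taken together, the groups and records above imply a robots.txt along these lines. This is a reconstruction from the scan data, not the verbatim file: the ordering of groups and the placement of the Sitemap line are assumptions (the original also carries the comment banners listed under Comments below).

```
# DeepDyve robots.txt
# Updated: 2025-12-15

User-agent: *
Disallow: /cgi-bin/
Disallow: /openurl
Disallow: /search
Disallow: /browse-wr/
Disallow: /enterprise-free-trial
Disallow: /rental-link
Disallow: /timescited
Crawl-delay: 5

User-agent: Googlebot
Disallow: /assets/images/doccover.png
Disallow: /cgi-bin/
Disallow: /openurl
Disallow: /search
Disallow: /browse-wr/
Disallow: /enterprise-free-trial
Disallow: /rental-link
Disallow: /timescited

User-agent: GPTBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Claude-Web
Allow: /

Sitemap: https://www.deepdyve.com/sitemaps/sitemap_index.xml
```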

Comments

  • DeepDyve robots.txt
  • Updated: 2025-12-15
  • Sitemap architecture follows /sitemap-spec.md
  • ==================================================
  • Default Crawl Rules (All Bots)
  • ==================================================
  • ==================================================
  • Sitemap Index Reference
  • ==================================================
  • ==================================================
  • Googlebot-Specific Rules
  • ==================================================
  • ==================================================
  • LLM Crawler Permissions
  • Per sitemap-spec.md section 8.1
  • ==================================================
  • OpenAI GPT Crawler
  • Google Extended (Bard/Gemini training)
  • Anthropic Claude Crawler
  • ==================================================
  • Additional LLM Crawlers (Optional)
  • ==================================================
  • Common Crawl (used by many AI models)
  • User-agent: CCBot
  • Allow: /
  • Meta AI (Facebook/Instagram AI)
  • User-agent: FacebookBot
  • Allow: /
  • Perplexity AI
  • User-agent: PerplexityBot
  • Allow: /