protocols.io
robots.txt

Robots Exclusion Standard data for protocols.io

Resource Scan

Scan Details

Site Domain protocols.io
Base Domain protocols.io
Scan Status Ok
Last Scan 2026-02-14T09:56:36+00:00
Next Scan 2026-02-28T09:56:36+00:00

Last Scan

Scanned 2026-02-14T09:56:36+00:00
URL https://protocols.io/robots.txt
Redirect https://www.protocols.io:443/robots.txt
Redirect Domain www.protocols.io
Redirect Base protocols.io
Domain IPs 18.223.137.131, 3.151.171.238
Redirect IPs 18.223.137.131, 3.151.171.238
Response IP 3.151.171.238
Found Yes
Hash 32fe5eee904730436a415434c9e5ecc55d5ebbf8c965c5ce4d7f6920ddcbbfc6
SimHash 73285b81eff7
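The Hash field is 64 hexadecimal characters long, which matches the length of a SHA-256 digest. The scan does not say what exactly is hashed, so the sketch below assumes it is the SHA-256 of the raw robots.txt response body. Under that assumption, a freshly fetched copy could be compared with the recorded value to tell whether the file has changed since the last scan. The names `RECORDED_HASH`, `sha256_hex` and `unchanged_since_scan` are illustrative only.

```python
import hashlib

# Value recorded in the scan above. ASSUMPTION: it is the SHA-256 hex digest
# of the raw robots.txt response body; the scanner does not document this.
RECORDED_HASH = "32fe5eee904730436a415434c9e5ecc55d5ebbf8c965c5ce4d7f6920ddcbbfc6"


def sha256_hex(body: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(body).hexdigest()


def unchanged_since_scan(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """True if the body hashes to the recorded value (under the assumption above)."""
    return sha256_hex(body) == recorded.lower()
```

To use it, pass the bytes returned by `urllib.request.urlopen("https://protocols.io/robots.txt").read()`, which follows the redirect to www.protocols.io. A mismatch could mean the file changed. It could also mean the scanner hashes something other than the raw body.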

Groups

*

Rule Path
Disallow /private/
Disallow /blind/
Disallow /api/
Disallow /download
Disallow /pubchase
Disallow /spectro
Disallow /neb
Disallow /career/
Disallow /essays
Disallow /editorials
Disallow /test
Disallow /flux

gptbot

Rule Path
Disallow /private/
Disallow /api/

anthropic-ai

Rule Path
Disallow /private/
Disallow /api/

ccbot

Rule Path
Disallow /private/
Disallow /api/
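Under RFC 9309, a crawler obeys only the group whose user-agent matches it most specifically and falls back to `*` only when no named group matches. That means gptbot, anthropic-ai and ccbot are bound only by their own two Disallow rules, not by the longer `*` list. The sketch below checks this with Python's standard `urllib.robotparser`. `ROBOTS_TXT` is a hypothetical reconstruction assembled from the parsed groups above, not the verbatim file: comments are omitted and the real layout may differ.

```python
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL reconstruction of https://www.protocols.io/robots.txt built from
# the parsed groups in the scan; comments omitted, exact formatting may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /blind/
Disallow: /api/
Disallow: /download
Disallow: /pubchase
Disallow: /spectro
Disallow: /neb
Disallow: /career/
Disallow: /essays
Disallow: /editorials
Disallow: /test
Disallow: /flux

User-agent: gptbot
Disallow: /private/
Disallow: /api/

User-agent: anthropic-ai
Disallow: /private/
Disallow: /api/

User-agent: ccbot
Disallow: /private/
Disallow: /api/
"""


def build_parser(text: str = ROBOTS_TXT) -> RobotFileParser:
    """Parse robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())  # parse() also marks the rules as loaded
    return parser


parser = build_parser()
# A named AI crawler uses its own group, so /download is not blocked for it.
print(parser.can_fetch("GPTBot", "https://www.protocols.io/download/x"))
# An unlisted crawler falls back to "*", where /download is disallowed.
print(parser.can_fetch("SomeBot/1.0", "https://www.protocols.io/download/x"))
```

Two details are worth noting. Rules without a trailing slash are prefix matches, so `/test` also blocks paths such as `/testing`. `robotparser` also applies the first matching rule instead of the longest match, which only matters for files that mix Allow and Disallow lines.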

Other Records

Field Value
sitemap https://www.protocols.io/sitemaps/protocols_sitemap.xml
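The Sitemap record is listed apart from the groups because, per the sitemaps.org protocol, the directive is independent of any user-agent line. A crawler can read it with `RobotFileParser.site_maps()` (Python 3.8+), as in this minimal sketch. `EXCERPT` is a hypothetical two-record excerpt, not the full file, and `sitemap_urls` is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL excerpt: one group plus the Sitemap record from the scan.
EXCERPT = """\
User-agent: *
Disallow: /private/

Sitemap: https://www.protocols.io/sitemaps/protocols_sitemap.xml
"""


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return every Sitemap URL declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None (not an empty list) when no Sitemap line exists.
    return parser.site_maps() or []


print(sitemap_urls(EXCERPT))
```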

Comments

  • =========================================================
  • robots.txt for https://www.protocols.io
  • Purpose:
  • - Allow discovery of public scientific content
  • - Protect private, authenticated, and system areas
  • - Provide explicit guidance to AI crawlers
  • =========================================================
  • -------------------------
  • Default rule (all crawlers)
  • -------------------------
  • AI crawlers
  • -------------------------
  • Sitemap
  • -------------------------