datacouch.io
robots.txt

Robots Exclusion Standard data for datacouch.io

Resource Scan

Scan Details

Site Domain datacouch.io
Base Domain datacouch.io
Scan Status Ok
Last Scan 2025-11-28T18:42:36+00:00
Next Scan 2025-12-28T18:42:36+00:00

Last Scan

Scanned 2025-11-28T18:42:36+00:00
URL https://datacouch.io/robots.txt
Domain IPs 104.21.57.153, 172.67.146.222, 2606:4700:3036::6815:3999, 2606:4700:3036::ac43:92de
Response IP 104.21.57.153
Found Yes
Hash 2eaaa99990a408c654783f389c678a781e6bd2a7cc577e156b127d4c3c6f9b08
SimHash 29969a40a5e4
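
The Hash value above is 64 hexadecimal characters, which is consistent with a SHA-256 digest. The following is a minimal sketch of recomputing such a digest, assuming (the report does not say so) that the scanner hashes the raw response body with SHA-256. The names `content_hash` and `matches_report` are hypothetical helpers, not part of any scanner API.

```python
import hashlib

# Hash value copied from the Last Scan section of this report.
REPORTED_HASH = "2eaaa99990a408c654783f389c678a781e6bd2a7cc577e156b127d4c3c6f9b08"


def content_hash(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt response body.

    Assumption: the scanner hashes the unmodified response bytes.
    """
    return hashlib.sha256(body).hexdigest()


def matches_report(body: bytes, reported: str = REPORTED_HASH) -> bool:
    """Check whether a freshly fetched body matches the recorded hash."""
    return content_hash(body) == reported.lower()
```

If the assumption holds, `matches_report` returning `False` for a later fetch of https://datacouch.io/robots.txt would indicate that the file changed after the scan. If the scanner normalizes the body or uses a different algorithm, the comparison is not meaningful.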

Groups

googlebot

Rule Path
Allow /

google-extended

Rule Path
Allow /

bingbot

Rule Path
Allow /

bingpreview

Rule Path
Allow /

gptbot

Rule Path
Allow /

claudebot

Rule Path
Allow /

perplexitybot

Rule Path
Allow /

ccbot

Rule Path
Allow /
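
The groups above can be checked programmatically. The following is a minimal sketch, assuming the file is equivalent to one `User-agent` group per listed crawler, each with a single `Allow: /` rule. The rebuilt text and the helpers `build_robots_txt` and `make_parser` are illustrative; they are not the fetched file, whose exact bytes are not reproduced in this report.

```python
from urllib.robotparser import RobotFileParser

# User agents taken from the Groups section of this report.
AGENTS = [
    "googlebot", "google-extended", "bingbot", "bingpreview",
    "gptbot", "claudebot", "perplexitybot", "ccbot",
]


def build_robots_txt(agents):
    """Rebuild an equivalent robots.txt (assumed layout, not the original bytes)."""
    groups = [f"User-agent: {agent}\nAllow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


def make_parser(text):
    """Parse robots.txt text locally, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = make_parser(build_robots_txt(AGENTS))
for agent in AGENTS:
    # Print each crawler with whether it may fetch the site root.
    print(agent, parser.can_fetch(agent, "https://datacouch.io/"))
```

Under this reconstruction there is no `User-agent: *` group and no Disallow rule, so a crawler that is not listed is also permitted by default.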

Comments

  • =======================================
  • Robots.txt — Allow trusted crawlers
  • =======================================
  • Google (Search + Gemini)
  • Microsoft (Bing + Copilot + Edge)
  • ChatGPT (OpenAI)
  • Anthropic (Claude)
  • Perplexity AI
  • CCBot (Common Crawl, often used by AI systems)
  • =======================================
  • Optional: Sensitive paths (add as needed)
  • =======================================
  • Disallow: /admin/
  • Disallow: /api/
  • Disallow: /staging/
  • Disallow: /internal/
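
The Disallow lines in the comments are inactive, and the scan reports only Allow rules. As a sketch of what enabling them might look like, the hypothetical group below adds them to a single crawler's group. The choice of `gptbot` and the path `/admin/users` are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical group: the commented-out Disallow lines enabled for one crawler.
HYPOTHETICAL = """\
User-agent: gptbot
Disallow: /admin/
Disallow: /api/
Disallow: /staging/
Disallow: /internal/
Allow: /
"""

parser = RobotFileParser()
parser.parse(HYPOTHETICAL.splitlines())

# A path under a disallowed prefix versus the site root.
blocked = parser.can_fetch("gptbot", "https://datacouch.io/admin/users")
allowed = parser.can_fetch("gptbot", "https://datacouch.io/")
print(blocked, allowed)
```

Python's `urllib.robotparser` applies the first matching rule in file order, whereas RFC 9309 specifies that the most specific (longest) matching rule wins. Listing the Disallow lines before `Allow: /` gives the same result under both interpretations.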