Robots Exclusion Standard data for theshitbot.com

Resource Scan

Scan Details

Site Domain theshitbot.com
Base Domain theshitbot.com
Scan Status Ok
Last Scan 2026-01-10T16:42:00+00:00
Next Scan 2026-02-09T16:42:00+00:00

Last Scan

Scanned 2026-01-10T16:42:00+00:00
URL https://theshitbot.com/robots.txt
Domain IPs 104.26.10.181, 104.26.11.181, 172.67.70.45, 2606:4700:20::681a:ab5, 2606:4700:20::681a:bb5, 2606:4700:20::ac43:462d
Response IP 172.67.70.45
Found Yes
Hash eb4d42e2502a70eb10606db620926d1c13524d84beb5649fdc0371331100c13a
SimHash 41b2d16a0527
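
The Hash value above is 64 hexadecimal characters long, which is the length of a SHA-256 digest. The scan record does not name the algorithm or say which bytes were hashed, so the following is only a minimal sketch: it assumes SHA-256 over the raw response body, and `body_fingerprint` is a hypothetical helper name, not part of any scanner.

```python
import hashlib


def body_fingerprint(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt response body.

    Assumption: the scanner hashes the raw bytes with SHA-256. The scan
    record does not confirm this, so a mismatch with the recorded Hash
    does not by itself prove that the file changed.
    """
    return hashlib.sha256(body).hexdigest()


if __name__ == "__main__":
    # Hypothetical stand-in body; replace with the bytes actually fetched.
    sample = b"User-agent: *\nDisallow: /wp-admin/\n"
    print(body_fingerprint(sample))
```

If the assumption holds, comparing this digest for a freshly fetched copy against the recorded Hash would show whether the file has changed since the scan.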

Groups

*

Rule Path
Disallow /wp-admin/
Disallow /?s=
Disallow /search/
Allow /wp-admin/admin-ajax.php

Other Records

Field Value
crawl-delay 10

google-extended

Rule Path
Allow /

chatgpt-user

Rule Path
Allow /

claudebot

Rule Path
Allow /

perplexitybot

Rule Path
Allow /

barkrowler

Rule Path
Allow /

ccbot

Rule Path
Allow /

Comments

  • General crawler rules
  • ----------------------------
  • Allow access for AI crawlers
  • ----------------------------
  • Google’s AI (Gemini / AI Overviews)
  • OpenAI’s ChatGPT Browse / o1-preview models
  • Anthropic Claude’s AI crawler
  • Perplexity AI
  • Brave Search’s AI crawler
  • Common Crawl (used by many LLMs for pretraining)
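
To illustrate how the groups listed above would likely be evaluated, here is a minimal sketch of group selection and longest-match rule evaluation in the spirit of RFC 9309. It is a simplification: user agents are matched by exact lowercase token, `*` and `$` wildcards in paths are not handled (the recorded rules use none), and all names (`GROUPS`, `select_group`, `is_allowed`, `crawl_delay`) are hypothetical. Individual crawlers may behave differently.

```python
# Rule groups transcribed from the scan record above.
RESTRICTED = [
    ("disallow", "/wp-admin/"),
    ("disallow", "/?s="),
    ("disallow", "/search/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]
NAMED_AGENTS = [
    "google-extended", "chatgpt-user", "claudebot",
    "perplexitybot", "barkrowler", "ccbot",
]

GROUPS = {"*": RESTRICTED}
GROUPS.update({agent: [("allow", "/")] for agent in NAMED_AGENTS})

# crawl-delay is recorded only under the "*" group.
CRAWL_DELAY = {"*": 10}


def select_group(user_agent: str) -> str:
    """Return the group key for an agent, falling back to "*"."""
    token = user_agent.lower()
    return token if token in GROUPS else "*"


def is_allowed(user_agent: str, path: str) -> bool:
    """Longest matching prefix wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for kind, prefix in GROUPS[select_group(user_agent)]:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_for_allow:
            best_len, allowed = len(prefix), kind == "allow"
    return allowed


def crawl_delay(user_agent: str):
    """Return the recorded crawl-delay for the agent's group, or None."""
    return CRAWL_DELAY.get(select_group(user_agent))


if __name__ == "__main__":
    for agent, path in [
        ("SomeBot", "/wp-admin/options.php"),
        ("SomeBot", "/wp-admin/admin-ajax.php"),
        ("SomeBot", "/?s=robots"),
        ("ClaudeBot", "/search/"),
    ]:
        print(agent, path, is_allowed(agent, path))
```

Under these assumptions, an agent with its own group is evaluated only against that group, so the `Disallow` rules and the `crawl-delay` recorded under `*` would not apply to the six named agents. Note also that `crawl-delay` is not part of RFC 9309, and support for it varies by crawler.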