theshitbot.com
robots.txt

Robots Exclusion Standard data for theshitbot.com

Resource Scan

Scan Details

Site Domain	theshitbot.com
Base Domain	theshitbot.com
Scan Status	Ok
Last Scan	2026-01-10T16:42:00+00:00
Next Scan	2026-02-09T16:42:00+00:00

Last Scan

Scanned	2026-01-10T16:42:00+00:00
URL	https://theshitbot.com/robots.txt
Domain IPs	104.26.10.181, 104.26.11.181, 172.67.70.45, 2606:4700:20::681a:ab5, 2606:4700:20::681a:bb5, 2606:4700:20::ac43:462d
Response IP	172.67.70.45
Found	Yes
Hash	eb4d42e2502a70eb10606db620926d1c13524d84beb5649fdc0371331100c13a
SimHash	41b2d16a0527

Groups

*

Rule	Path
Disallow	/wp-admin/
Disallow	/?s=
Disallow	/search/
Allow	/wp-admin/admin-ajax.php

Other Records

Field	Value
crawl-delay	10
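
The rules listed for the `*` group can be checked with a small matcher. This is a minimal sketch, assuming longest-match precedence as described in RFC 9309 (the most specific matching path wins, and Allow wins ties) and assuming the `crawl-delay` record belongs to the `*` group and is read as seconds, which is a common but non-standard convention. The names `STAR_RULES`, `CRAWL_DELAY_SECONDS`, and `is_allowed` are illustrative and not part of the scan. Wildcards (`*`, `$`) are not handled because none appear in these rules.

```python
# Rules for the "*" group, copied from the scan table above.
STAR_RULES = [
    ("disallow", "/wp-admin/"),
    ("disallow", "/?s="),
    ("disallow", "/search/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]

# Assumption: the crawl-delay record applies to "*" and is in seconds.
CRAWL_DELAY_SECONDS = 10


def is_allowed(rules, path):
    """Return True if `path` may be fetched under `rules`.

    Uses prefix matching with longest-match precedence; an Allow rule
    wins when an Allow and a Disallow match with equal length. A path
    that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for kind, prefix in rules:
        if path.startswith(prefix):
            n = len(prefix)
            if n > best_len or (n == best_len and kind == "allow"):
                best_len = n
                allowed = kind == "allow"
    return allowed


if __name__ == "__main__":
    for p in ["/wp-admin/admin-ajax.php", "/wp-admin/options.php",
              "/?s=query", "/search/term", "/about/"]:
        print(p, is_allowed(STAR_RULES, p))
```

Parsers that evaluate rules in file order rather than by longest match would instead block `/wp-admin/admin-ajax.php`, because `Disallow /wp-admin/` appears before the Allow rule.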

google-extended

Rule	Path
Allow	/

chatgpt-user

Rule	Path
Allow	/

claudebot

Rule	Path
Allow	/

perplexitybot

Rule	Path
Allow	/

barkrowler

Rule	Path
Allow	/

ccbot

Rule	Path
Allow	/
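
How a crawler picks among the groups above can be sketched as a lookup by product token. This is a minimal sketch, assuming the caller passes the crawler's product token (optionally with a `/version` suffix) rather than a full browser-style User-Agent header, and assuming the usual convention that the `*` group applies only when no named group matches. `GROUPS` and `select_group` are illustrative names, not part of the scan.

```python
# Groups copied from the scan: each maps a user-agent token to its rules.
GROUPS = {
    "*": [
        ("disallow", "/wp-admin/"),
        ("disallow", "/?s="),
        ("disallow", "/search/"),
        ("allow", "/wp-admin/admin-ajax.php"),
    ],
    "google-extended": [("allow", "/")],
    "chatgpt-user": [("allow", "/")],
    "claudebot": [("allow", "/")],
    "perplexitybot": [("allow", "/")],
    "barkrowler": [("allow", "/")],
    "ccbot": [("allow", "/")],
}


def select_group(groups, user_agent):
    """Return the rule list for `user_agent`, falling back to "*".

    Matching is case-insensitive on the product token, i.e. the part
    before any "/version" suffix.
    """
    token = user_agent.split("/")[0].strip().lower()
    return groups.get(token, groups.get("*", []))


if __name__ == "__main__":
    print(select_group(GROUPS, "ClaudeBot/1.0"))
    print(select_group(GROUPS, "SomeOtherBot"))
```

Under this reading, the six named crawlers get only `Allow /`, so the `/wp-admin/`, `/?s=`, and `/search/` restrictions in the `*` group would not apply to them.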

Comments

General crawler rules
----------------------------
Allow access for AI crawlers
----------------------------
Google’s AI (Gemini / AI Overviews)
OpenAI’s ChatGPT Browse / o1-preview models
Anthropic Claude’s AI crawler
Perplexity AI
Brave Search’s AI crawler
Common Crawl (used by many LLMs for pretraining)
