thesandbox.net
robots.txt

Robots Exclusion Standard data for thesandbox.net

Resource Scan

Scan Details

Site Domain thesandbox.net
Base Domain thesandbox.net
Scan Status Ok
Last Scan 2025-07-31T10:24:23+00:00
Next Scan 2025-08-30T10:24:23+00:00

Last Scan

Scanned 2025-07-31T10:24:23+00:00
URL https://thesandbox.net/robots.txt
Domain IPs 164.90.247.55
Response IP 164.90.247.55
Found Yes
Hash 5a5bba6032dcb38e869fb8c363f909e1dc524f77bc9775b3250ea0ae231633d5
SimHash 7bdc91502430
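
The recorded Hash value is 64 hexadecimal characters long, which is consistent with a SHA-256 digest. The report does not say which algorithm or which bytes were hashed, so the following is only a sketch that assumes SHA-256 over the raw response body. The helper names `sha256_hex` and `matches_recorded` are hypothetical.

```python
import hashlib

# Hash value copied from the scan details above.
RECORDED_HASH = "5a5bba6032dcb38e869fb8c363f909e1dc524f77bc9775b3250ea0ae231633d5"


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def matches_recorded(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """Compare a freshly fetched body against a recorded digest.

    Assumption: the scanner hashed the unmodified response body with SHA-256.
    """
    return sha256_hex(body) == recorded.lower()
```

If a fresh download of the robots.txt body no longer matches, either the file changed since the scan or the assumption about how the hash was computed is wrong.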

Groups

*

Rule Path
Disallow /newsletter/

googlebot

Rule Path
Disallow /newsletter/

gptbot
gptbot-user
chatgpt
chatgpt-user
oai-searchbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

anthropic-ai
claudebot
claude-web

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

awariorssbot
awariosmartbot

Rule Path
Disallow /

omgili
omgilibot
bytespider
dataforseobot
imagesiftbot
magpie-crawler
youbot
peer39_crawler
peer39_crawler/1.0

Rule Path
Disallow /

*

Rule Path
Disallow /posts-old/
Disallow /posts-old1/

sistrix crawler

Rule Path
Disallow /

sistrix

Rule Path
Disallow /

pimonster

Rule Path
Disallow /

surdotlybot

Rule Path
Disallow /

zoominfobot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

barkrowler

Rule Path
Disallow /

Comments

  • Code: https://github.com/ellie/notes
  • Source: https://darkvisitors.com/
  • Disallow newsletters
  • OpenAI, ChatGPT
  • https://platform.openai.com/docs/gptbot
  • Google AI (Bard, etc)
  • https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
  • Amazon
  • Block common crawl
  • I have mixed feelings on this one, but many models are trained on this data
  • It is also used to bootstrap new search indices though
  • https://commoncrawl.org/ccbot
  • Facebook
  • https://developers.facebook.com/docs/sharing/bot/
  • Cohere.ai
  • https://darkvisitors.com/agents/cohere-ai
  • Perplexity
  • https://docs.perplexity.ai/docs/perplexitybot
  • Anthropic
  • https://darkvisitors.com/agents/anthropic-ai
  • https://darkvisitors.com/agents/claudebot
  • Apple
  • Awario
  • Other AI companies
  • Old blog posts
  • Block SISTRIX
  • Block Uptime robot
  • User-agent: UptimeRobot/2.0
  • Disallow: /
  • Block Ezooms Robot
  • User-agent: Ezooms Robot
  • Disallow: /
  • Block Perl LWP
  • User-agent: Perl LWP
  • Disallow: /
  • Block netEstate NE Crawler (+http://www.website-datenbank.de/)
  • User-agent: netEstate NE Crawler (+http://www.website-datenbank.de/)
  • Disallow: /
  • Block WiseGuys Robot
  • User-agent: WiseGuys Robot
  • Disallow: /
  • Block Turnitin Robot
  • User-agent: Turnitin Robot
  • Disallow: /
  • Block Heritrix - used by Internet Archive
  • User-agent: Heritrix
  • Disallow: /
  • Block pricepi

Warnings

  • 4 invalid lines.