bw3.dev
robots.txt

Robots Exclusion Standard data for bw3.dev

Resource Scan

Scan Details

Site Domain bw3.dev
Base Domain bw3.dev
Scan Status Ok
Last Scan 2025-08-08T16:47:48+00:00
Next Scan 2025-08-09T16:47:48+00:00

Last Scan

Scanned 2025-08-08T16:47:48+00:00
URL https://bw3.dev/robots.txt
Domain IPs 150.136.217.7
Response IP 150.136.217.7
Found Yes
Hash 96616387c44e11d0b547252268b405296197f47c46ebd2e1e12c45afb2a87cbc
SimHash 30da9041cced
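
The Hash value above is 64 hexadecimal characters, which matches the length of a SHA-256 digest. The report does not name the algorithm, so the sketch below only assumes SHA-256 to show how a content fingerprint of this shape could be computed and compared between scans. The function name `fingerprint` and the sample bodies are hypothetical.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Return a hex digest of a robots.txt body.

    Assumption: SHA-256. The scan report only shows a 64-character
    hex string and does not say which algorithm produced it.
    """
    return hashlib.sha256(body).hexdigest()


# Two hypothetical robots.txt bodies fetched on different days.
previous = b"User-agent: *\nDisallow: /admin\n"
current = b"User-agent: *\nDisallow: /admin\nDisallow: /followers\n"

# A changed digest signals that the file content changed between scans.
changed = fingerprint(previous) != fingerprint(current)
print(len(fingerprint(current)), changed)
```

An exact hash like this only detects that something changed. A similarity hash such as the SimHash field is typically used to judge how much changed, and is not reproduced here.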

Groups

*

Rule Path
Disallow /followers
Disallow /following
Disallow /admin
Disallow /remote_interaction
Disallow /remote_follow

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

imagesiftbot

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

facebookexternalhit

Rule Path
Disallow /

facebookcatalog

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /
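
As a rough illustration of how the groups above behave, the sketch below writes a few of them back out in robots.txt syntax and checks them with Python's standard `urllib.robotparser`. The `ROBOTS_TXT` text is a reconstruction from the listed rules rather than the file as served, only two of the per-bot groups are included for brevity, and `ExampleBot` is a hypothetical crawler name used to exercise the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the groups listed in the scan (subset only);
# the real file may differ in casing, ordering, and comments.
ROBOTS_TXT = """\
User-agent: *
Disallow: /followers
Disallow: /following
Disallow: /admin
Disallow: /remote_interaction
Disallow: /remote_follow

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
base = "https://bw3.dev"

# A bot with its own group is blocked from the whole site.
print(parser.can_fetch("GPTBot", base + "/"))
# Any other bot falls back to the "*" group: only listed paths are blocked.
print(parser.can_fetch("ExampleBot", base + "/admin"))
print(parser.can_fetch("ExampleBot", base + "/about"))
```

Disallow rules are prefix matches, so `/admin` also covers paths beneath it. Compliance is voluntary: the file states the site's wishes, and a crawler that ignores it is not technically prevented from fetching anything.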

Comments

  • https://commoncrawl.org/faq - Has been used by ChatGPT, Bard, and others for training a number of models.
  • The bot used when a ChatGPT user instructs it to reference your website.
  • The bot that OpenAI uses to collect bulk training data for ChatGPT.
  • Block Google from scraping your site for Bard and VertexAI.
  • Omgili sell data they scrape to others for their AI training.
  • Meta’s bot that crawls public web pages to improve language models for their speech recognition technology.
  • Apple very kindly told us how to block their scraper AFTER they'd scraped everything.
  • anthropic-ai is used by Anthropic to gather data for their “AI” products, such as Claude.
  • claudebot is another agent used by Anthropic that is more specifically related to Claude.
  • diffbot is a somewhat dishonest scraping bot used to collect data to train LLMs. This is their default user-agent, but they make it easy for their clients to change it to something else and ignore your wishes.
  • This is just getting stupid and I hope governments step in to wreck these tech-bro thieves.