
Robots Exclusion Standard data for fbievan.live

Resource Scan

Scan Details

Site Domain: fbievan.live
Base Domain: fbievan.live
Scan Status: OK
Last Scan: 2025-08-28T18:57:45+00:00
Next Scan: 2025-09-11T18:57:45+00:00
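The two timestamps above are exactly 14 days apart. Assuming the scanner re-checks each robots.txt on a fixed 14-day cycle (an inference from this single pair, not something the report states), the next scan time can be derived from the last one as sketched below. `schedule_next` is a hypothetical helper, not part of any scanner API.

```python
from datetime import datetime, timedelta

# Timestamps copied from the scan details above.
last_scan = datetime.fromisoformat("2025-08-28T18:57:45+00:00")
next_scan = datetime.fromisoformat("2025-09-11T18:57:45+00:00")

# Observed gap between the two scans.
interval = next_scan - last_scan
print(interval)  # 14 days, 0:00:00


def schedule_next(scanned_at: datetime, every: timedelta = timedelta(days=14)) -> datetime:
    """Hypothetical helper: next scan time, assuming a fixed re-scan interval."""
    return scanned_at + every


print(schedule_next(last_scan).isoformat())  # 2025-09-11T18:57:45+00:00
```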

Last Scan

Scanned: 2025-08-28T18:57:45+00:00
URL: https://fbievan.live/robots.txt
Domain IPs: 143.198.225.189
Response IP: 143.198.225.189
Found: Yes
Hash: 09336821db4cf62725cd14eca73b59631cf03395c8f281c67ac13819c4d17395
SimHash: 70341f43c475
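The report does not say how its two fingerprints are computed. The 64-hex-character Hash is consistent with a SHA-256 digest, which is useful for exact change detection, while the 12-hex-character (48-bit) SimHash suggests a locality-sensitive fingerprint, where similar files differ in few bits. The sketch below assumes SHA-256 over the raw response body and a simple token-based SimHash; the tokenizer, the BLAKE2b token hash, and the helper names (`sha256_hex`, `has_changed`, `simhash`, `hamming`) are all hypothetical and will not reproduce the scanner's values.

```python
import hashlib

# Digest reported by the scan. Assumption: SHA-256 over the raw
# robots.txt response body (64 hex characters = 256 bits); not confirmed.
REPORTED_HASH = "09336821db4cf62725cd14eca73b59631cf03395c8f281c67ac13819c4d17395"


def sha256_hex(body: bytes) -> str:
    """Lowercase hex SHA-256 digest of a response body."""
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, previous_hash: str = REPORTED_HASH) -> bool:
    """True when the body no longer matches the recorded digest."""
    return sha256_hex(body) != previous_hash.lower()


def simhash(text: str, bits: int = 48) -> int:
    """Hypothetical SimHash over whitespace-separated tokens.

    Each token is hashed (BLAKE2b, truncated to `bits` bits); every bit
    position accumulates +1 or -1, and positive totals become 1-bits.
    The scanner's real tokenizer and hash function are unknown.
    """
    if not 0 < bits <= 64:
        raise ValueError("bits must be between 1 and 64")
    mask = (1 << bits) - 1
    totals = [0] * bits
    for token in text.split():
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big") & mask
        for i in range(bits):
            totals[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, t in enumerate(totals) if t > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


old = "User-agent: gptbot\nDisallow: /\n"
new = old + "User-agent: ccbot\nDisallow: /\n"
print(has_changed(old.encode(), sha256_hex(old.encode())))  # False
print(f"{simhash(old):012x}", hamming(simhash(old), simhash(new)))
```

Any edit flips the SHA-256 digest entirely, so it answers "did the file change?"; the Hamming distance between SimHash values gives a rough sense of "how much?".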

Groups

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

img2dataset

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

applebot

Rule Path
Disallow /

youbot

Rule Path
Disallow /

friendlycrawler

Rule Path
Disallow /
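Each group above holds a single `Disallow /` rule, which blocks the named agent from the entire site. As a sketch, the groups can be reassembled into robots.txt text and checked with Python's standard `urllib.robotparser`. The report lists agent names in lowercase, and the original file's capitalization and comment placement are not recoverable from it, so the rebuilt text is an approximation; `BLOCKED_AGENTS` and `build_robots_txt` are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens from the groups listed in the report; each group
# carries one "Disallow: /" rule.
BLOCKED_AGENTS = [
    "ccbot", "chatgpt-user", "gptbot", "google-extended", "anthropic-ai",
    "omgilibot", "omgili", "facebookbot", "bytespider", "img2dataset",
    "claude-web", "magpie-crawler", "ahrefsbot", "perplexitybot", "cohere-ai",
    "amazonbot", "applebot", "youbot", "friendlycrawler",
]


def build_robots_txt(agents):
    """Emit one group per agent, each disallowing the whole site."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(BLOCKED_AGENTS)

# parse() takes an iterable of lines and marks the rules as loaded.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://fbievan.live/"
print(parser.can_fetch("GPTBot", url))        # False
print(parser.can_fetch("SomeOtherBot", url))  # True
```

An agent not named in any group falls through to "allowed", since the report shows no `*` group. Note that the standard-library parser matches agents by case-insensitive substring, so individual crawlers' own matching logic may behave differently.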

Comments

  • from https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/
  • from https://github.com/healsdata/ai-training-opt-out
  • may not work; needs more research (see https://github.com/rom1504/img2dataset/issues/48)
  • AhrefsBot crawls data for an "SEO Dataset"; one of their "products" built on this dataset is "AI Writing Tools"
  • from https://www.cyberciti.biz/web-developer/block-openai-bard-bing-ai-crawler-bots-using-robots-txt-file/
  • from https://netfuture.ch/2023/07/blocking-ai-crawlers-robots-txt-chatgpt/
  • from https://claytonerrington.com/blog/robots-and-ai/
  • from https://darkvisitors.com/
  • from https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler/