omada.cafe
robots.txt

Robots Exclusion Standard data for omada.cafe

Resource Scan

Scan Details

Site Domain omada.cafe
Base Domain omada.cafe
Scan Status Ok
Last Scan 2024-10-04T01:25:39+00:00
Next Scan 2024-11-03T01:25:39+00:00
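
The gap between the two timestamps above can be checked with a few lines of Python. This is only a sketch: the timestamps are copied from the scan details, and the idea that the scanner uses a fixed rescan interval is an inference from these two values, not something the report states.

```python
from datetime import datetime, timedelta

# Timestamps copied from the scan details (ISO 8601 with a UTC offset).
last_scan = datetime.fromisoformat("2024-10-04T01:25:39+00:00")
next_scan = datetime.fromisoformat("2024-11-03T01:25:39+00:00")

# Difference between the scheduled next scan and the last completed scan.
# Treating this as a fixed rescan period is an assumption.
interval = next_scan - last_scan
print(interval.days)
```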

Last Scan

Scanned 2024-10-04T01:25:39+00:00
URL https://omada.cafe/robots.txt
Domain IPs 167.86.91.171
Response IP 167.86.91.171
Found Yes
Hash 5771a2dd3eeaf5c0ec3598817791190bf0e9d132f56dca60f026782ceda1b8b2
SimHash 72b85d7924d5
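
The Hash value above is 64 hexadecimal digits long, which is consistent with a SHA-256 digest. The report does not name the algorithm or say which bytes were hashed, so the sketch below assumes SHA-256 over the raw response body; `fingerprint` and `sample_body` are hypothetical names used only for illustration, and the sample body is not the real file.

```python
import hashlib

def fingerprint(body: bytes) -> str:
    """Return the SHA-256 hex digest of a response body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()

# Value copied from the scan report.
reported = "5771a2dd3eeaf5c0ec3598817791190bf0e9d132f56dca60f026782ceda1b8b2"

# Placeholder content, NOT the actual robots.txt served by omada.cafe.
sample_body = b"User-agent: *\nAllow: /\n"
digest = fingerprint(sample_body)

# Both strings have the same length, so the format matches SHA-256.
print(len(digest) == len(reported))
```

To compare against the live file, the bytes returned by https://omada.cafe/robots.txt would be passed to `fingerprint`; a mismatch could mean either that the file has changed since the scan or that the assumption about the algorithm is wrong.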

Groups

*

Rule Path
Allow /$
Allow /

Other Records

Field Value
crawl-delay 2

ahrefsbot

Rule Path
Disallow /

imagesiftbot

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

semrushbot-sa

Rule Path
Disallow /

censysinspect

Rule Path
Disallow /

rogerbot

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

aspiegelbot

Rule Path
Disallow /

zoominfobot

Rule Path
Disallow /

yandex

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

dataforseobot

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

turnitin

Rule Path
Disallow /

seekport crawler

Rule Path
Disallow /

serpstatbot

Rule Path
Disallow /

img2dataset

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

applebot

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

youbot

Rule Path
Disallow /
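
A crawler could evaluate the groups above roughly as follows, using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` string is a hypothetical reconstruction of a small subset of the groups in this report, not the verbatim file, and `ExampleFeedReader` is a made-up agent name. Note that this parser does simple prefix matching and does not treat `$` as an end-of-path anchor, so the `Allow /$` rule is not interpreted the way wildcard-aware crawlers would read it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of a few groups from the report; the real
# file at https://omada.cafe/robots.txt contains many more entries.
ROBOTS_TXT = """\
User-agent: *
Allow: /$
Allow: /
Crawl-delay: 2

User-agent: AhrefsBot
Disallow: /

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_fetch(agent: str, url: str = "https://omada.cafe/") -> bool:
    """Check whether the given user agent is allowed to fetch the URL."""
    return parser.can_fetch(agent, url)

# A listed bot falls under its own group and is refused.
print(may_fetch("GPTBot"))
# An unlisted agent falls back to the "*" group and is allowed.
print(may_fetch("ExampleFeedReader"))
# The "*" group also carries the crawl delay, in seconds.
print(parser.crawl_delay("ExampleFeedReader"))
```

A well-behaved client would wait at least the reported crawl delay between requests and skip the site entirely when `may_fetch` returns `False`.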

Comments

  • Welcome to robots.txt, the place where shunning bots is encouraged.
  • Humans are welcome to read. Bots are welcome to follow.
  • Policy
  • Allowed:
  • - Search engine indexers (even Google, though I hate it)
  • - RSS aggregators (unless too aggressive)
  • - Archival services
  • - Fediverse federation stuff
  • Disallowed:
  • - Marketing or SEO crawlers
  • - Aggressive and annoying bots
  • - Honeypots
  • If your piece of sloppy code gets in this list, you contribute to the
  • enshittification of the web and you should fuck off. Also stay the fuck
  • away from me and my data, as well as from the users I host here.
  • If your piece of shit software doesn't respect robots.txt, your IP will be blocked.
  • If you have any questions, reach out to fluffery at autistici dot org.
  • file was originally made by getimiskon at disroot dot org
  • +-------------------+
  • |                   |
  • |   HALL OF SHAME   |
  • |                   |
  • +-------------------+
  • Marketing/SEO cancer
  • I swear, I have to block this one from my Nginx settings, Fuck you.
  • Search crawler
  • Marketing/SEO cancer
  • Marketing/SEO cancer
  • 'Threat hunting' bullshit
  • Marketing/SEO
  • Huawei something or other, badly behaved
  • Marketing/SEO
  • YandexBot is a dickhead, too aggressive
  • Marketing/SEO
  • Marketing/SEO
  • No
  • Does not respect * directives
  • Marketing
  • The example for img2dataset, although the default is *None*
  • Brandwatch - "AI to discover new trends"
  • webz.io - they sell data for training LLMs.
  • Items below were sourced from darkvisitors.com
  • Categories included: "AI Data Scraper", "AI Assistant", "AI Search Crawler", "Undocumented AI Agent"
  • AI Search Crawler
  • https://darkvisitors.com/agents/amazonbot
  • Undocumented AI Agent
  • https://darkvisitors.com/agents/anthropic-ai
  • AI Search Crawler
  • https://darkvisitors.com/agents/applebot
  • AI Data Scraper
  • https://darkvisitors.com/agents/applebot-extended
  • AI Data Scraper
  • https://darkvisitors.com/agents/bytespider
  • AI Data Scraper
  • https://darkvisitors.com/agents/ccbot
  • AI Assistant
  • https://darkvisitors.com/agents/chatgpt-user
  • Undocumented AI Agent
  • https://darkvisitors.com/agents/claude-web
  • AI Data Scraper
  • https://darkvisitors.com/agents/claudebot
  • Undocumented AI Agent
  • https://darkvisitors.com/agents/cohere-ai
  • AI Data Scraper
  • https://darkvisitors.com/agents/diffbot
  • AI Data Scraper
  • https://darkvisitors.com/agents/facebookbot
  • AI Data Scraper
  • https://darkvisitors.com/agents/google-extended
  • AI Data Scraper
  • https://darkvisitors.com/agents/gptbot
  • AI Data Scraper
  • https://darkvisitors.com/agents/omgili
  • AI Search Crawler
  • https://darkvisitors.com/agents/perplexitybot
  • AI Search Crawler
  • https://darkvisitors.com/agents/youbot
  • ...................../´¯¯/)
  • ...................,/¯.../ +----------------------------------------+
  • .................../..../ | |
  • .............../´¯/'..'/´¯¯`·¸ | To the creators of the shitbots above: |
  • .........../'/.../..../....../¨¯\ | |
  • ..........('(....´...´... ¯~/'..') | FUCK YOU. |
  • ...........\..............'...../ | TOTAL COMMERCIAL WEB DEATH. |
  • ............\....\.........._.·´ | |
  • .............\..............( +----------------------------------------+
  • ..............\..............\
  • The thing is that you know online hosting is NOT free.
  • Yet you send requests to our servers and scrape our data without consent.
  • By doing so, you add a lot of unnecessary work for us to block your bots.
  • You're a disgrace. You are the reason the web is shit.
  • You made people afraid of expressing themselves online.
  • Congratulations. Enjoy your enshittified web until it collapses.
  • This file is loosely based on the robots.txt file of sr.ht
  • based off the robots.txt belonging to getimiskon
  • additions from https://github.com/healsdata/ai-training-opt-out/blob/main/robots.txt and https://darkvisitors.com/
  • to all of you: thank you