omada.cafe
robots.txt

Robots Exclusion Standard data for omada.cafe

Resource Scan

Scan Details

Site Domain	omada.cafe
Base Domain	omada.cafe
Scan Status	Ok
Last Scan	2024-10-04T01:25:39+00:00
Next Scan	2024-11-03T01:25:39+00:00

Last Scan

Scanned	2024-10-04T01:25:39+00:00
URL	https://omada.cafe/robots.txt
Domain IPs	167.86.91.171
Response IP	167.86.91.171
Found	Yes
Hash	5771a2dd3eeaf5c0ec3598817791190bf0e9d132f56dca60f026782ceda1b8b2
SimHash	72b85d7924d5
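
The 64-hex-digit Hash value above has the same length as a SHA-256 digest. The report does not name the algorithm, so the following is only a sketch under the assumption that the hash is SHA-256 over the raw response body. The helper names `sha256_hex` and `matches_report` are hypothetical.

```python
import hashlib

# Hash value copied from the scan details above.
REPORTED_HASH = "5771a2dd3eeaf5c0ec3598817791190bf0e9d132f56dca60f026782ceda1b8b2"


def sha256_hex(body: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the raw bytes."""
    return hashlib.sha256(body).hexdigest()


def matches_report(body: bytes, reported: str = REPORTED_HASH) -> bool:
    """Compare a locally saved copy against the reported hash.

    Assumption: the scanner hashed the unmodified response body with SHA-256.
    """
    return sha256_hex(body) == reported.lower()
```

A mismatch would not prove the file changed; it could equally mean the scanner uses a different algorithm or normalizes the body before hashing.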

Groups

*

Rule	Path
Allow	/$
Allow	/
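
The two Allow paths differ only by the trailing `$`, which in RFC 9309 anchors a pattern to the end of the path, so `/$` matches the root URL only while `/` matches every path. A minimal sketch of that matching, assuming the RFC's "longest pattern wins, Allow wins ties" rule and ignoring percent-encoding normalization; the function names are hypothetical:

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """rules is a list of (kind, pattern) with kind 'allow' or 'disallow'.

    The longest matching pattern decides; on a tie Allow wins;
    a path that matches no rule is allowed.
    """
    best_len, best_allow = -1, True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


# The '*' group listed above, and a group that disallows everything.
star_rules = [("allow", "/$"), ("allow", "/")]
blocked_rules = [("disallow", "/")]
print(is_allowed(star_rules, "/"), is_allowed(blocked_rules, "/about"))
```

For this group both rules are Allow, so the `/$` entry does not change the outcome; every path is allowed either way.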

Other Records

Field	Value
crawl-delay	2

ahrefsbot

Rule	Path
Disallow	/

imagesiftbot

Rule	Path
Disallow	/

dotbot

Rule	Path
Disallow	/

dotbot

Rule	Path
Disallow	/

semrushbot

Rule	Path
Disallow	/

semrushbot-sa

Rule	Path
Disallow	/

censysinspect

Rule	Path
Disallow	/

rogerbot

Rule	Path
Disallow	/

blexbot

Rule	Path
Disallow	/

aspiegelbot

Rule	Path
Disallow	/

zoominfobot

Rule	Path
Disallow	/

yandex

Rule	Path
Disallow	/

mj12bot

Rule	Path
Disallow	/

dataforseobot

Rule	Path
Disallow	/

turnitinbot

Rule	Path
Disallow	/

turnitin

Rule	Path
Disallow	/

seekport crawler

Rule	Path
Disallow	/

serpstatbot

Rule	Path
Disallow	/

img2dataset

Rule	Path
Disallow	/

magpie-crawler

Rule	Path
Disallow	/

omgilibot

Rule	Path
Disallow	/

amazonbot

Rule	Path
Disallow	/

anthropic-ai

Rule	Path
Disallow	/

applebot

Rule	Path
Disallow	/

applebot-extended

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

claude-web

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

cohere-ai

Rule	Path
Disallow	/

diffbot

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

omgili

Rule	Path
Disallow	/

perplexitybot

Rule	Path
Disallow	/

youbot

Rule	Path
Disallow	/
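
The groups listed above can be checked with Python's standard `urllib.robotparser`. The excerpt below is a reconstruction from this report, not the verbatim file: the position of the Crawl-delay line within the `*` group, the choice of only two named groups, and the agent name `ExampleFeedReader` are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt (assumption: Crawl-delay belongs to the '*' group,
# as the "Other Records" table under '*' suggests).
ROBOTS_EXCERPT = """\
User-agent: *
Allow: /$
Allow: /
Crawl-delay: 2

User-agent: gptbot
Disallow: /

User-agent: ahrefsbot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)

# A listed agent is refused everywhere.
print(parser.can_fetch("GPTBot", "https://omada.cafe/"))             # False
# An unlisted (hypothetical) agent falls back to the '*' group.
print(parser.can_fetch("ExampleFeedReader", "https://omada.cafe/"))  # True
print(parser.crawl_delay("ExampleFeedReader"))                       # 2
```

`urllib.robotparser` matches paths by prefix and does not treat `$` as an end anchor, so the `Allow: /$` line has no effect here; the result is the same because `Allow: /` already covers every path.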

Comments

Welcome to robots.txt, the place where shunning bots is encouraged.
Humans are welcome to read. Bots are welcome to follow.
Policy
Allowed:
- Search engine indexers (even google, though I hate it)
- RSS Aggregators (unless too aggressive)
- Archival services
- Fediverse federation stuff
Disallowed:
- Marketing or SEO crawlers
- Aggressive and annoying bots
- Honeypots
If your piece of sloppy code gets in this list, you contribute to the
enshittification of the web and you should fuck off. Also stay the fuck
away from me and my data, as well as from the users I host here.
If your piece of shit software doesn't respect robots.txt, your IP will be blocked.
If you have any questions, reach out to fluffery at autistici dot org.
file was originally made by getimiskon at disroot dot org
+-------------------+
| |
| HALL OF SHAME |
| |
+-------------------+
Marketing/SEO cancer
I swear, I have to block this one from my Nginx settings, Fuck you.
Search crawler
Marketing/SEO cancer
Marketing/SEO cancer
'Threat hunting' bullshit
Marketing/SEO
Huawei something or another, badly behaved
Marketing/SEO
YandexBot is a dickhead, too aggressive
Marketing/SEO
Marketing/SEO
No
Does not respect * directives
Marketing
The example for img2dataset, although the default is *None*
Brandwatch - "AI to discover new trends"
webz.io - they sell data for training LLMs.
Items below were sourced from darkvisitors.com
Categories included: "AI Data Scraper", "AI Assistant", "AI Search Crawler", "Undocumented AI Agent"
AI Search Crawler
https://darkvisitors.com/agents/amazonbot
Undocumented AI Agent
https://darkvisitors.com/agents/anthropic-ai
AI Search Crawler
https://darkvisitors.com/agents/applebot
AI Data Scraper
https://darkvisitors.com/agents/applebot-extended
AI Data Scraper
https://darkvisitors.com/agents/bytespider
AI Data Scraper
https://darkvisitors.com/agents/ccbot
AI Assistant
https://darkvisitors.com/agents/chatgpt-user
Undocumented AI Agent
https://darkvisitors.com/agents/claude-web
AI Data Scraper
https://darkvisitors.com/agents/claudebot
Undocumented AI Agent
https://darkvisitors.com/agents/cohere-ai
AI Data Scraper
https://darkvisitors.com/agents/diffbot
AI Data Scraper
https://darkvisitors.com/agents/facebookbot
AI Data Scraper
https://darkvisitors.com/agents/google-extended
AI Data Scraper
https://darkvisitors.com/agents/gptbot
AI Data Scraper
https://darkvisitors.com/agents/omgili
AI Search Crawler
https://darkvisitors.com/agents/perplexitybot
AI Search Crawler
https://darkvisitors.com/agents/youbot
...................../´¯¯/)
...................,/¯.../ +----------------------------------------+
.................../..../ | |
.............../´¯/'..'/´¯¯`·¸ | To the creators of the shitbots above: |
.........../'/.../..../....../¨¯\ | |
..........('(....´...´... ¯~/'..') | FUCK YOU. |
...........\..............'...../ | TOTAL COMMERCIAL WEB DEATH. |
............\....\.........._.·´ | |
.............\..............( +----------------------------------------+
..............\..............\
The thing is that you know online hosting is NOT free.
Yet you send requests to our servers and scraping our data without consent.
By doing so, you add a lot of unnecessary work for us to block your bots.
You're a disgrace. You are the reason the web is shit.
You made the people being afraid of expressing themselves online.
Congratulations. Enjoy your enshittified web until it collapses.
This file is loosely based on the robots.txt file of sr.ht
based off the robots.txt belonging to getimiskon
additions from https://github.com/healsdata/ai-training-opt-out/blob/main/robots.txt and https://darkvisitors.com/
to all of you: thank you
