papilio.cz
robots.txt

Robots Exclusion Standard data for papilio.cz

Resource Scan

Scan Details

Site Domain	papilio.cz
Base Domain	papilio.cz
Scan Status	Ok
Last Scan	2026-03-04T01:51:59+00:00
Next Scan	2026-04-03T01:51:59+00:00
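The two timestamps above suggest a fixed rescan interval. A minimal sketch for checking that from the reported values follows; the variable names are illustrative, and the report itself does not state how the scanner schedules scans.

```python
from datetime import datetime, timedelta

# Timestamps copied from the Scan Details table (ISO 8601, UTC offset).
last_scan = datetime.fromisoformat("2026-03-04T01:51:59+00:00")
next_scan = datetime.fromisoformat("2026-04-03T01:51:59+00:00")

# Difference between the scheduled next scan and the last scan.
interval = next_scan - last_scan
print(interval.days)
```

For these two values the difference is exactly 30 days, which is consistent with a 30-day rescan cycle, though that is an inference from this one pair of timestamps.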

Last Scan

Scanned	2026-03-04T01:51:59+00:00
URL	https://papilio.cz/robots.txt
Redirect	https://www.papilio.cz/robots.txt
Redirect Domain	www.papilio.cz
Redirect Base	papilio.cz
Domain IPs	104.21.0.103, 172.67.150.223, 2606:4700:3032::6815:67, 2606:4700:3036::ac43:96df
Redirect IPs	104.21.0.103, 172.67.150.223, 2606:4700:3032::6815:67, 2606:4700:3036::ac43:96df
Response IP	172.67.150.223
Found	Yes
Hash	eefaebf9d882f8776de7d65399f1b0f607b8bfa7e870e175b33c230848d148ce
SimHash	641091489534
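The Hash value is 64 hexadecimal characters long, which matches the length of a SHA-256 digest. The report does not name the algorithm, so the sketch below only shows how such a digest could be computed for a fetched robots.txt body, assuming SHA-256 over the raw bytes. The function name and the sample body are hypothetical, and the sample is not the actual file content.

```python
import hashlib

# Value copied from the Hash row of the report.
REPORTED_HASH = "eefaebf9d882f8776de7d65399f1b0f607b8bfa7e870e175b33c230848d148ce"

def robots_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()

# Hypothetical sample body, used only to show the digest format.
sample_body = b"User-agent: *\nDisallow:\n"
digest = robots_digest(sample_body)

# A changed digest between two scans would indicate that the file changed.
print(len(digest) == len(REPORTED_HASH))
```

Comparing digests from successive scans is a cheap way to detect any change in the file; the SimHash value serves a different purpose (near-duplicate detection) and is not reproduced here.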

Groups

*

Rule	Path
Disallow	(empty)

ccbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

anthropic-ai

Rule	Path
Disallow	/

claudebot
claude-web

Rule	Path
Disallow	/

cohere-ai

Rule	Path
Disallow	/

omgilibot

Rule	Path
Disallow	/

omgili

Rule	Path
Disallow	/

perplexitybot

Rule	Path
Disallow	/

youbot

Rule	Path
Disallow	/

diffbot

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

imagesiftbot

Rule	Path
Disallow	/

amazonbot

Rule	Path
Disallow	/

applebot

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/
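
The groups above can be checked with Python's standard `urllib.robotparser`. This is a minimal sketch: the robots.txt text is rebuilt from the group names and rules in this report rather than fetched from the site, so the exact layout, casing, and comments of the real file are assumptions, and the helper names `build_robots_txt` and `is_allowed` are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Each tuple is one group: the user agents it names and its Disallow path.
# An empty path means "nothing is disallowed" for that group.
GROUPS = [
    (["*"], ""),
    (["ccbot"], "/"), (["chatgpt-user"], "/"), (["gptbot"], "/"),
    (["google-extended"], "/"), (["anthropic-ai"], "/"),
    (["claudebot", "claude-web"], "/"), (["cohere-ai"], "/"),
    (["omgilibot"], "/"), (["omgili"], "/"), (["perplexitybot"], "/"),
    (["youbot"], "/"), (["diffbot"], "/"), (["bytespider"], "/"),
    (["imagesiftbot"], "/"), (["amazonbot"], "/"), (["applebot"], "/"),
    (["facebookbot"], "/"),
]

def build_robots_txt(groups):
    """Render the groups as robots.txt text, one blank line between groups."""
    lines = []
    for agents, path in groups:
        lines += [f"User-agent: {agent}" for agent in agents]
        lines += [f"Disallow: {path}".rstrip(), ""]
    return "\n".join(lines)

parser = RobotFileParser()
parser.parse(build_robots_txt(GROUPS).splitlines())

def is_allowed(agent, url="https://www.papilio.cz/"):
    """Return True if the reconstructed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)

print(is_allowed("GPTBot"))      # blocked by its own group
print(is_allowed("Claude-Web"))  # blocked via the shared claudebot group
print(is_allowed("Googlebot"))   # falls through to the * group
```

Under these reconstructed rules the listed AI crawlers are refused for every path, while any agent not named in a group falls back to the `*` group, whose empty Disallow permits everything.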

Comments

Disallow AI bots from crawling (adapted from https://seonorth.ca/ai/block-content-scrapers/ and https://darkvisitors.com)
Adapted from https://seonorth.ca/ai/block-content-scrapers/
Common Crawl's bot - Common Crawl is one of the largest public datasets used for AI training, including for ChatGPT, Bard and other large language models.
ChatGPT Bot - bot used when a ChatGPT user instructs it to reference your website.
OpenAI API - bot that OpenAI specifically uses to collect bulk training data from your website for ChatGPT.
Google Bard and Vertex AI. Blocking this token does not affect Google Search indexing or Googlebot crawling.
Anthropic AI Bot
Claude Bot run by Anthropic
Cohere AI Bot - unconfirmed bot believed to be associated with Cohere's chatbot.
OMGilibot - They sell data for training LLMs (large language models)
Omgili (Oh My God I Love It)
Perplexity AI
KUKA's youBot
Diffbot - somewhat dishonest scraping bot used to collect data to train LLMs.
Bytespider is a web crawler operated by ByteDance, the Chinese owner of TikTok
ImagesiftBot is billed as a reverse image search tool, but it's associated with The Hive, a company that produces models for image generation.
Social Media Bots
Amazon Bot - enabling Alexa to answer even more questions for customers.
Apple Bot - collects website data for its Siri and Spotlight services.
Meta's bot that crawls public web pages to improve language models for their speech recognition technology.
