papilio.cz
robots.txt

Robots Exclusion Standard data for papilio.cz

Resource Scan

Scan Details

Site Domain papilio.cz
Base Domain papilio.cz
Scan Status Ok
Last Scan 2026-03-04T01:51:59+00:00
Next Scan 2026-04-03T01:51:59+00:00

Last Scan

Scanned 2026-03-04T01:51:59+00:00
URL https://papilio.cz/robots.txt
Redirect https://www.papilio.cz/robots.txt
Redirect Domain www.papilio.cz
Redirect Base papilio.cz
Domain IPs 104.21.0.103, 172.67.150.223, 2606:4700:3032::6815:67, 2606:4700:3036::ac43:96df
Redirect IPs 104.21.0.103, 172.67.150.223, 2606:4700:3032::6815:67, 2606:4700:3036::ac43:96df
Response IP 172.67.150.223
Found Yes
Hash eefaebf9d882f8776de7d65399f1b0f607b8bfa7e870e175b33c230848d148ce
SimHash 641091489534

Groups

*

Rule Path
Disallow (empty path, so nothing is blocked for this group)

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

claudebot
claude-web

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

youbot

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

imagesiftbot

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

applebot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /
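
The groups above can be checked programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` string is an assumed reconstruction of a few of the listed groups, not the site's actual file, and `is_allowed` is a hypothetical helper name used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of three of the groups listed above.
# The real file at https://www.papilio.cz/robots.txt may differ in
# ordering, comments, and capitalization.
ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: gptbot
Disallow: /

User-agent: claudebot
User-agent: claude-web
Disallow: /
"""


def is_allowed(user_agent, url="https://www.papilio.cz/"):
    """Return True if the reconstructed rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("gptbot", "claude-web", "SomeBrowser"):
        print(agent, is_allowed(agent))
```

Under these assumptions, the agents named in a `Disallow /` group are refused for every path, while any agent that only matches the `*` group is allowed, because an empty `Disallow` value blocks nothing.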

Comments

  • Disallow AI bots from crawling (adapted from https://seonorth.ca/ai/block-content-scrapers/ and https://darkvisitors.com)
  • Adapted from https://seonorth.ca/ai/block-content-scrapers/
  • Common Crawl's bot - Common Crawl is one of the largest public datasets used for AI training, including for ChatGPT, Bard and other large language models.
  • ChatGPT Bot - bot used when a ChatGPT user instructs it to reference your website.
  • OpenAI API - bot that OpenAI specifically uses to collect bulk training data from your website for ChatGPT.
  • Google Bard and VertexAI. This will not have an impact on Google Search indexing. This will not affect GoogleBot crawling.
  • Anthropic AI Bot
  • Claude Bot run by Anthropic
  • Cohere AI Bot - unconfirmed bot believed to be associated with Cohere’s chatbot.
  • OMGilibot - They sell data for training LLMs (large language models)
  • Omgili (Oh My God I Love It)
  • Perplexity AI
  • KUKA's youBot
  • Diffbot - somewhat dishonest scraping bot used to collect data to train LLMs.
  • Bytespider is a web crawler operated by ByteDance, the Chinese owner of TikTok
  • ImagesiftBot is billed as a reverse image search tool, but it's associated with The Hive, a company that produces models for image generation.
  • Social Media Bots
  • Amazon Bot - enabling Alexa to answer even more questions for customers.
  • Apple Bot - collects website data for its Siri and Spotlight services.
  • Meta’s bot that crawls public web pages to improve language models for their speech recognition technology.