hahn-schickard.de
robots.txt

Robots Exclusion Standard data for hahn-schickard.de

Resource Scan

Scan Details

Site Domain hahn-schickard.de
Base Domain hahn-schickard.de
Scan Status Ok
Last Scan 2025-10-01T16:00:46+00:00
Next Scan 2025-10-31T16:00:46+00:00

Last Scan

Scanned 2025-10-01T16:00:46+00:00
URL https://hahn-schickard.de/robots.txt
Redirect https://www.hahn-schickard.de/robots.txt
Redirect Domain www.hahn-schickard.de
Redirect Base hahn-schickard.de
Domain IPs 134.119.224.166, 2a00:116a:107:c540::
Redirect IPs 134.119.224.166, 2a00:116a:107:c540::
Response IP 134.119.224.166
Found Yes
Hash 56d6466ca4c06e2d9e5dc82dd9be2e0b5b4e4c3cb0a77b3daa39d62feb70113d
SimHash 6030f5428d36

Groups

alibababot

Rule Path
Disallow /

friendlycrawler

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

alphacode

Rule Path
Disallow /

claude

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

deepmindbot

Rule Path
Disallow /

chinchilla

Rule Path
Disallow /

flamingo

Rule Path
Disallow /

gopher

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

huggingfacebot

Rule Path
Disallow /

img2dataset

Rule Path
Disallow /

imagesiftbot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

facebot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

chatgpt

Rule Path
Disallow /

openai

Rule Path
Disallow /

gpt-3

Rule Path
Disallow /

gpt-4

Rule Path
Disallow /

gpt-5

Rule Path
Disallow /

peer39_crawler

Rule Path
Disallow /

peer39_crawler/1.0

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

piplbot

Rule Path
Disallow /

tencentbot

Rule Path
Disallow /

hunyuanaide

Rule Path
Disallow /

twitterbot

Rule Path
Disallow /

xai

Rule Path
Disallow /

grok

Rule Path
Disallow /

grokbot

Rule Path
Disallow /

grokai

Rule Path
Disallow /

youbot

Rule Path
Disallow /
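The groups above each block one named crawler from the whole site. As a rough illustration of how such groups are evaluated, the sketch below feeds a short, hand-written excerpt (not the full file) to Python's standard `urllib.robotparser`. The agent name `ExampleBot` and the helper `allowed` are hypothetical and exist only for this example.

```python
from urllib.robotparser import RobotFileParser

# Hand-written excerpt modelled on the groups listed above.
# It is an assumption for illustration, not the complete robots.txt.
ROBOTS_EXCERPT = """\
User-agent: gptbot
Disallow: /

User-agent: claudebot
Disallow: /

User-agent: *
Disallow: /typo3/
Disallow: /suche/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the excerpt above."""
    return parser.can_fetch(agent, "https://www.hahn-schickard.de" + path)


print(allowed("GPTBot", "/"))            # named group: whole site blocked
print(allowed("ExampleBot", "/"))        # falls through to the * group
print(allowed("ExampleBot", "/suche/"))  # blocked by the * group
```

Note that `urllib.robotparser` compares paths by simple prefix and takes the first matching rule, so the excerpt deliberately leaves out the wildcard and `Allow` lines that appear in the `*` group.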

*

Rule Path
Disallow /logs/
Disallow /restricted/
Disallow /fileadmin/_temp_/
Disallow /fileadmin/user_upload/
Disallow /fileadmin/typoscript/
Disallow /fileadmin/yag/
Disallow /t3lib/
Disallow /typo3/
Disallow /typo3_src/
Disallow /typo3conf/
Disallow /typo3temp/
Disallow /clear.gif
Disallow *type%3D98
Disallow *type%3D0
Disallow /powermail/
Disallow *jumpurl%3D*
Allow /typo3/sysext/frontend/Resources/Public/*
Allow /typo3conf/ext/bb_templates/Resources/Public/*
Disallow /suche/
Disallow /*tx_powermail_pi1
Disallow /*tx_solr
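The `*` group mixes `Disallow` and `Allow` rules and uses `*` wildcards, so the order of the lines alone does not decide the outcome. The sketch below assumes the longest-match convention described in RFC 9309 (the most specific matching pattern wins, and `Allow` wins a tie) and applies it to four rules copied from the group above. The function names `pattern_to_regex` and `is_allowed` are hypothetical, and percent-encoding normalisation is left out, so this is only an approximation of what a real crawler does.

```python
import re

# Four rules copied from the * group above (a subset, chosen for illustration).
RULES = [
    ("disallow", "/typo3/"),
    ("allow", "/typo3/sysext/frontend/Resources/Public/*"),
    ("disallow", "/suche/"),
    ("disallow", "/*tx_solr"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern: '*' is any run, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """Longest matching pattern wins; Allow wins ties; no match means allowed."""
    best_len, best_allow = -1, True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


print(is_allowed("/typo3/index.php"))
print(is_allowed("/typo3/sysext/frontend/Resources/Public/Icons/x.png"))
print(is_allowed("/en/page?tx_solr[q]=mems"))
```

Under these assumptions the first and third paths are blocked, while the second is permitted because the `Allow` pattern is longer than `/typo3/`.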

Other Records

Field Value
sitemap https://www.hahn-schickard.de/sitemap.xml

Comments

  • Alibaba - Chinese e-commerce company investing in AI
  • Amazon - AI search crawler that collects data for Alexa
  • User-agent: Amazonbot
  • Disallow: /
  • Amazon - Bot with unknown purpose linked to Amazon
  • Anthropic - Claude bot used to collect training data for Anthropic LLMs
  • Anthropic - Other Anthropic related bots
  • Apple Bot - AI search crawler that collects website data for Apple, including Siri and Apple Intelligence services.
  • User-agent: Applebot
  • Disallow: /
  • Baidu - Chinese tech giant developing AI models like ERNIE
  • Bytespider - AI data scraper operated by ByteDance, TikTok's parent company and developer of the ChatGPT competitor Doubao.
  • Cohere AI Bot - AI data scraper bot for Cohere's AI chatbot
  • Common Crawl - AI data scraper for a large public dataset used for training LLMs
  • DeepMind - Models operated by AI research company DeepMind owned by Alphabet (Google)
  • Diffbot - AI data scraper bot used to collect and sell website data
  • Google - Google-Extended is an AI data scraper for Gemini and Vertex AI (Blocking this will not impact Google Search indexing)
  • Google - Bots for ads, media and potentially other AI projects.
  • User-agent: Mediapartners-Google
  • Disallow: /
  • User-agent: GoogleOther
  • Disallow: /
  • User-agent: AdsBot-Google
  • Disallow: /
  • User-agent: Googlebot-Image
  • Disallow: /
  • Hugging Face - Provider of open-source NLP models and tools
  • img2dataset - Used by SD, Midjourney, OpenAI, and others to scrape images
  • ImagesiftBot - Reverse image search tool and AI image generator (The Hive)
  • Meta (Facebook) - FacebookBot is an AI data scraper used to collect speech recognition training data
  • Meta (Facebook) - Other bots
  • Omgili (Oh My God I Love It) - AI data scraper from Webz.io that collects and sells data to train AI models
  • OpenAI - AI assistant bot used to gather responses to user prompts
  • OpenAI - AI data scraper that collects data for OpenAI tools like ChatGPT
  • OpenAI - Other bots potentially connected to ChatGPT and OpenAI.
  • Peer39 - Programmatic ad crawler
  • Perplexity AI - AI search crawler for Perplexity search results
  • PiplBot - People search and information aggregation bot
  • Tencent - Unconfirmed bots from Chinese tech conglomerate developing AI applications
  • X (Twitter) - Fetcher bot used to index the content of any given URL
  • X - Unconfirmed bots connected to X
  • YouBot - AI search crawler used by You.com to index search results