Robots Exclusion Standard data for gew.de

Resource Scan

Scan Details

Site Domain gew.de
Base Domain gew.de
Scan Status Ok
Last Scan 2025-11-23T14:31:47+00:00
Next Scan 2025-12-23T14:31:47+00:00

Last Scan

Scanned 2025-11-23T14:31:47+00:00
URL https://gew.de/robots.txt
Redirect https://www.gew.de/robots.txt
Redirect Domain www.gew.de
Redirect Base gew.de
Domain IPs 134.119.0.57
Redirect IPs 134.119.0.57
Response IP 134.119.0.57
Found Yes
Hash 4671e92695e6b1e273c98252bc18822ac722ff3b7d3872aa169de4256f5c6cd4
SimHash 6012f5428536

Groups

alibababot

Rule Path
Disallow /

friendlycrawler

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

alphacode

Rule Path
Disallow /

claude

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

deepmindbot

Rule Path
Disallow /

chinchilla

Rule Path
Disallow /

flamingo

Rule Path
Disallow /

gopher

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

huggingfacebot

Rule Path
Disallow /

img2dataset

Rule Path
Disallow /

imagesiftbot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

facebot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

chatgpt

Rule Path
Disallow /

openai

Rule Path
Disallow /

gpt-3

Rule Path
Disallow /

gpt-4

Rule Path
Disallow /

gpt-5

Rule Path
Disallow /

peer39_crawler

Rule Path
Disallow /

peer39_crawler/1.0

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

piplbot

Rule Path
Disallow /

tencentbot

Rule Path
Disallow /

hunyuanaide

Rule Path
Disallow /

twitterbot

Rule Path
Disallow /

xai

Rule Path
Disallow /

grok

Rule Path
Disallow /

grokbot

Rule Path
Disallow /

grokai

Rule Path
Disallow /

youbot

Rule Path
Disallow /

emailcollector

Rule Path
Disallow /

*

Rule Path
Disallow /logs/
Disallow /restricted/
Disallow /fileadmin/_temp_/
Disallow /fileadmin/user_upload/
Disallow /fileadmin/typoscript/
Disallow /fileadmin/yag/
Disallow /fileadmin/media/images/be
Disallow /fileadmin/media/images/bw
Disallow /fileadmin/media/images/by
Disallow /fileadmin/media/images/hb
Disallow /fileadmin/media/images/hv
Disallow /fileadmin/media/images/mv
Disallow /fileadmin/media/images/nds
Disallow /fileadmin/media/images/rlp
Disallow /fileadmin/media/images/sh
Disallow /fileadmin/media/images/sn
Disallow /fileadmin/media/images/th
Disallow /fileadmin/_processed_
Disallow /t3lib/
Disallow /typo3/
Disallow /typo3_src/
Disallow /typo3conf/
Disallow /typo3temp/
Disallow /reset.gif
Disallow *type%3D98
Disallow *type%3D0
Disallow /powermail/
Disallow /bayern/
Disallow *jumpurl%3D*
Disallow /suche/
Disallow /*tx_powermail_pi1
Disallow /*tx_solr
Disallow *FE_SESSION_KEY%3D*
Disallow *juhash*
Allow /typo3/sysext/frontend/Resources/Public/*
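
The groups above can be checked programmatically. The following is a minimal sketch using Python's standard urllib.robotparser module. It runs against a hand-copied excerpt of the scanned rules, not the complete file. That parser matches paths by plain prefix, does not understand `*` wildcards, and applies the first matching rule rather than the longest one. For that reason, the wildcard patterns and the Allow line are left out of the excerpt. The helper name `allowed`, the agent name "ExampleBot", and the sample path /aktuelles/ are hypothetical illustrations.

```python
from urllib.robotparser import RobotFileParser

# Hand-copied excerpt of the scanned rules (assumption: not the complete file).
# Wildcard patterns and the Allow line are omitted because urllib.robotparser
# only does prefix matching and stops at the first matching rule.
ROBOTS_EXCERPT = """\
User-agent: gptbot
Disallow: /

User-agent: ccbot
Disallow: /

User-agent: *
Disallow: /typo3/
Disallow: /suche/
Disallow: /fileadmin/user_upload/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Hypothetical helper: may `agent` fetch `path` on www.gew.de per the excerpt?"""
    return parser.can_fetch(agent, "https://www.gew.de" + path)


if __name__ == "__main__":
    # Blanket-blocked AI crawlers are refused everywhere.
    print(allowed("GPTBot", "/"))                # False
    print(allowed("CCBot/2.0", "/aktuelles/"))   # False
    # Other agents fall through to the "*" group and only lose the listed prefixes.
    print(allowed("ExampleBot", "/aktuelles/"))  # True
    print(allowed("ExampleBot", "/suche/"))      # False
```

Patterns such as `/*tx_solr` or `*juhash*` rely on the wildcard and longest-match semantics of RFC 9309. Checking them requires a parser that implements that specification, not urllib.robotparser.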

Other Records

Field Value
sitemap https://www.gew.de/sitemap.xml

Comments

  • Alibaba - Chinese e-commerce company investing in AI
  • Amazon - AI search crawler that collects data for Alexa
  • User-agent: Amazonbot
  • Disallow: /
  • Amazon - Bot with unknown purpose linked to Amazon
  • Anthropic - Claude bot used to collect training data for Anthropic LLMs
  • Anthropic - Other Anthropic related bots
  • Apple Bot - AI search crawler that collects website data for Apple, including Siri and Apple Intelligence services.
  • User-agent: Applebot
  • Disallow: /
  • Baidu - Chinese tech giant developing AI models like ERNIE
  • Bytespider - AI data scraper operated by ByteDance, TikTok's parent company and developer of the ChatGPT competitor Doubao.
  • Cohere AI Bot - AI data scraper bot for Cohere's AI chatbot
  • Common Crawl - AI data scraper for a large public dataset used for training LLMs
  • DeepMind - Models operated by AI research company DeepMind, owned by Alphabet (Google)
  • Diffbot - AI data scraper bot used to collect and sell website data
  • Google - Google-Extended is the robots.txt token that controls whether site data is used for Gemini and Vertex AI (blocking it does not affect Google Search indexing)
  • Google - Bots for ads, media and potentially other AI projects.
  • User-agent: Mediapartners-Google
  • Disallow: /
  • User-agent: GoogleOther
  • Disallow: /
  • User-agent: AdsBot-Google
  • Disallow: /
  • User-agent: Googlebot-Image
  • Disallow: /
  • Hugging Face - Provider of open-source NLP models and tools
  • img2dataset - Used by SD, Midjourney, OpenAI, and others to scrape images
  • ImagesiftBot - Reverse image search tool and AI image generator (The Hive)
  • Meta (Facebook) - FacebookBot is an AI data scraper used to collect speech recognition training data
  • Meta (Facebook) - Other bots
  • Omgili (Oh My God I Love It) - AI data scraper from Webz.io that collects and sells data to train AI models
  • OpenAI - AI assistant bot used to gather responses to user prompts
  • OpenAI - AI data scraper that collects data for OpenAI tools like ChatGPT
  • OpenAI - Other bots potentially connected to ChatGPT and OpenAI.
  • Peer39 - Programmatic ad crawler
  • Perplexity AI - AI search crawler for Perplexity search results
  • PiplBot - People search and information aggregation bot
  • Tencent - Unconfirmed bots from Chinese tech conglomerate developing AI applications
  • X (Twitter) - Fetcher bot used to index the content of any given URL
  • X - Unconfirmed bots connected to X
  • YouBot - AI search crawler used by You.com to index search results