gew.de
robots.txt

Robots Exclusion Standard data for gew.de

Resource Scan

Scan Details

Site Domain	gew.de
Base Domain	gew.de
Scan Status	Ok
Last Scan	2025-11-23T14:31:47+00:00
Next Scan	2025-12-23T14:31:47+00:00

Last Scan

Scanned	2025-11-23T14:31:47+00:00
URL	https://gew.de/robots.txt
Redirect	https://www.gew.de/robots.txt
Redirect Domain	www.gew.de
Redirect Base	gew.de
Domain IPs	134.119.0.57
Redirect IPs	134.119.0.57
Response IP	134.119.0.57
Found	Yes
Hash	4671e92695e6b1e273c98252bc18822ac722ff3b7d3872aa169de4256f5c6cd4
SimHash	6012f5428536
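
The Hash value above is 64 hexadecimal digits long, which is consistent with a SHA-256 digest, although the report does not name the algorithm. The following is a minimal sketch under that assumption: it hashes the raw response body and compares the result with the value from an earlier scan to detect a changed file. The function names `content_hash` and `has_changed` are hypothetical and not part of the scanner.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt response body.

    Assumption: the report's Hash field is SHA-256 over the raw bytes.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, previous_hash: str) -> bool:
    """Report whether the body differs from the previously recorded hash."""
    return content_hash(body) != previous_hash.lower()
```

A scanner could store the digest with each scan and re-parse the file only when `has_changed` returns `True`.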

Groups

alibababot

Rule	Path
Disallow	/

friendlycrawler

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

claude-web

Rule	Path
Disallow	/

anthropic-ai

Rule	Path
Disallow	/

alphacode

Rule	Path
Disallow	/

claude

Rule	Path
Disallow	/

baiduspider

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

cohere-ai

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

deepmindbot

Rule	Path
Disallow	/

chinchilla

Rule	Path
Disallow	/

flamingo

Rule	Path
Disallow	/

gopher

Rule	Path
Disallow	/

diffbot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

huggingfacebot

Rule	Path
Disallow	/

img2dataset

Rule	Path
Disallow	/

imagesiftbot

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

facebot

Rule	Path
Disallow	/

omgili

Rule	Path
Disallow	/

omgilibot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

chatgpt

Rule	Path
Disallow	/

openai

Rule	Path
Disallow	/

gpt-3

Rule	Path
Disallow	/

gpt-4

Rule	Path
Disallow	/

gpt-5

Rule	Path
Disallow	/

peer39_crawler

Rule	Path
Disallow	/

peer39_crawler/1.0

Rule	Path
Disallow	/

perplexitybot

Rule	Path
Disallow	/

piplbot

Rule	Path
Disallow	/

tencentbot

Rule	Path
Disallow	/

hunyuanaide

Rule	Path
Disallow	/

twitterbot

Rule	Path
Disallow	/

xai

Rule	Path
Disallow	/

grok

Rule	Path
Disallow	/

grokbot

Rule	Path
Disallow	/

grokai

Rule	Path
Disallow	/

youbot

Rule	Path
Disallow	/

emailcollector

Rule	Path
Disallow	/

*

Rule	Path
Disallow	/logs/
Disallow	/restricted/
Disallow	/fileadmin/_temp_/
Disallow	/fileadmin/user_upload/
Disallow	/fileadmin/typoscript/
Disallow	/fileadmin/yag/
Disallow	/fileadmin/media/images/be
Disallow	/fileadmin/media/images/bw
Disallow	/fileadmin/media/images/by
Disallow	/fileadmin/media/images/hb
Disallow	/fileadmin/media/images/hv
Disallow	/fileadmin/media/images/mv
Disallow	/fileadmin/media/images/nds
Disallow	/fileadmin/media/images/rlp
Disallow	/fileadmin/media/images/sh
Disallow	/fileadmin/media/images/sn
Disallow	/fileadmin/media/images/th
Disallow	/fileadmin/_processed_
Disallow	/t3lib/
Disallow	/typo3/
Disallow	/typo3_src/
Disallow	/typo3conf/
Disallow	/typo3temp/
Disallow	/reset.gif
Disallow	*type%3D98
Disallow	*type%3D0
Disallow	/powermail/
Disallow	/bayern/
Disallow	*jumpurl%3D*
Disallow	/suche/
Disallow	/*tx_powermail_pi1
Disallow	/*tx_solr
Disallow	*FE_SESSION_KEY%3D*
Disallow	*juhash*
Allow	/typo3/sysext/frontend/Resources/Public/*

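Several paths in the `*` group use `*` wildcards, and the single Allow rule overlaps the Disallow for `/typo3/`. The following is a rough sketch of how a crawler might evaluate such rules, assuming the matching described in RFC 9309: `*` matches any character sequence, `$` anchors the end, the longest matching pattern wins, and Allow wins a tie. It also assumes that `%3D` in this report stands for a literal `=` in the original file, so both sides are percent-decoded before comparison. `RULES` is only a subset of the table above, and the names `pattern_to_regex` and `is_allowed` are hypothetical.

```python
import re
from urllib.parse import unquote

# A subset of the "*" group, as (rule, pattern) pairs copied from the report.
RULES = [
    ("disallow", "/typo3/"),
    ("disallow", "/suche/"),
    ("disallow", "*type%3D98"),
    ("disallow", "/*tx_solr"),
    ("disallow", "*juhash*"),
    ("allow", "/typo3/sysext/frontend/Resources/Public/*"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the start."""
    pattern = unquote(pattern)  # assumption: %3D stands for "="
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    path = unquote(path)
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, kind == "allow"
    return allowed


print(is_allowed("/typo3/index.php"))                                     # False
print(is_allowed("/typo3/sysext/frontend/Resources/Public/Icons/x.svg"))  # True
print(is_allowed("/index.php?id=5&type=98"))                              # False
```

Under these assumptions the Allow rule takes precedence over `/typo3/` for public frontend resources because its pattern is longer. Parsers that apply rules in file order instead would reach the `/typo3/` Disallow first and block the same URL.
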
Other Records

Field	Value
sitemap	https://www.gew.de/sitemap.xml
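
The groups and the sitemap record above can be checked with Python's standard `urllib.robotparser`. This is a small sketch, not the scanner's own method: `ROBOTS_TXT` is an abbreviated reconstruction with only two of the blocked agents and two of the `*` paths, and `ExampleBrowser` is a made-up user agent. The standard-library parser does plain prefix matching and does not interpret `*` wildcards inside paths, so the wildcard rules are left out here.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated reconstruction; the scanned file lists many more groups and paths.
ROBOTS_TXT = """\
User-agent: gptbot
Disallow: /

User-agent: claudebot
Disallow: /

User-agent: *
Disallow: /logs/
Disallow: /suche/

Sitemap: https://www.gew.de/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://www.gew.de"
print(parser.can_fetch("GPTBot", BASE + "/"))                # False: whole site blocked
print(parser.can_fetch("ExampleBrowser", BASE + "/"))        # True: falls back to "*"
print(parser.can_fetch("ExampleBrowser", BASE + "/suche/"))  # False: listed under "*"
print(parser.site_maps())                                    # ['https://www.gew.de/sitemap.xml']
```

`site_maps()` requires Python 3.8 or later.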

Comments

Alibaba - Chinese e-commerce company investing in AI
Amazon - AI search crawler that collects data for Alexa
User-agent: Amazonbot
Disallow: /
Amazon - Bot with unknown purpose linked to Amazon
Anthropic - Claude bot used to collect training data for Anthropic LLMs
Anthropic - Other Anthropic related bots
Apple Bot - AI search crawler that collects website data for Apple, including Siri and Apple Intelligence services.
User-agent: Applebot
Disallow: /
Baidu - Chinese tech giant developing AI models like ERNIE
Bytespider - AI data scraper operated by TikTok's parent company ByteDance, and developer of the ChatGPT competitor Doubao.
Cohere AI Bot - AI data scraper bot for Cohere's AI chatbot
Common Crawl - AI data scraper for a large public dataset used for training LLMs
DeepMind - Models operated by AI research company DeepMind owned by Alphabet (Google)
Diffbot - AI data scraper bot used to collect and sell website data
Google - Google-Extended is an AI data scraper for Gemini and Vertex AI (Blocking this will not impact Google Search indexing)
Google - Bots for ads, media and potentially other AI projects.
User-agent: Mediapartners-Google
Disallow: /
User-agent: GoogleOther
Disallow: /
User-agent: AdsBot-Google
Disallow: /
User-agent: Googlebot-Image
Disallow: /
Hugging Face - Provider of open-source NLP models and tools
img2dataset - Used by SD, Midjourney, OpenAI, and others to scrape images
ImagesiftBot - Reverse image search tool and AI image generator (The Hive)
Meta (Facebook) - FacebookBot is an AI data scraper used to collect speech recognition training data
Meta (Facebook) - Other bots
Omgili (Oh My God I Love It) - AI data scraper from Webz.io that collects and sells data to train AI models
OpenAI - AI assistant bot used to gather responses to user prompts
OpenAI - AI data scraper that collects data for OpenAI tools like ChatGPT
OpenAI - Other bots potentially connected to ChatGPT and OpenAI.
Peer39 - Programmatic ad crawler
Perplexity AI - AI search crawler for Perplexity search results
PiplBot - People search and information aggregation bot
Tencent - Unconfirmed bots from Chinese tech conglomerate developing AI applications
X (Twitter) - Fetcher bot used to index the content of any given URL
X - Unconfirmed bots connected to X
YouBot - AI search crawler used by You.com to index search results
