web.de
robots.txt

Robots Exclusion Standard data for web.de

Resource Scan

Scan Details

Site Domain web.de
Base Domain web.de
Scan Status Ok
Last Scan 2025-12-02T00:17:47+00:00
Next Scan 2025-12-09T00:17:47+00:00

Last Scan

Scanned 2025-12-02T00:17:47+00:00
URL https://web.de/robots.txt
Domain IPs 82.165.229.138, 82.165.229.83
Response IP 82.165.229.138
Found Yes
Hash eb6a0824d451b64dbb4049184c98a166b92c099be75d92dcf252c9ed343ad041
SimHash f11a8b22c136
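The Hash value is 64 hex digits, which is the length of a SHA-256 digest. Assuming it is the SHA-256 of the raw robots.txt response body (the scan does not say how it is computed), a change check could be sketched as follows. The function names here are hypothetical. The SimHash is a separate similarity fingerprint and is not reproduced here.

```python
import hashlib
import urllib.request

# Value copied from the scan above. ASSUMPTION: it is the SHA-256 hex digest
# of the raw robots.txt response body; the scan does not document this.
RECORDED_HASH = "eb6a0824d451b64dbb4049184c98a166b92c099be75d92dcf252c9ed343ad041"


def sha256_hex(body: bytes) -> str:
    """Return the lowercase hexadecimal SHA-256 digest of body."""
    return hashlib.sha256(body).hexdigest()


def fetch_robots(url: str = "https://web.de/robots.txt") -> bytes:
    """Download the raw robots.txt bytes (requires network access)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def matches_recorded(body: bytes) -> bool:
    """True if body hashes to the value recorded in the last scan."""
    return sha256_hex(body) == RECORDED_HASH
```

Usage: `matches_recorded(fetch_robots())`. A `False` result means either the file changed after the scan or the scanner hashes something other than the raw bytes (for example, normalized text).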

Groups

*

Rule Path
Disallow /deals/
Disallow /test/
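Under RFC 9309 semantics, a crawler obeys only the one group whose user-agent line matches its product token, and it falls back to the `*` group only when no named group in the file matches. The `*` rules above therefore do not bind the crawlers listed in the named groups of this file. The following is a simplified selection sketch. It assumes the scanner lowercased the tokens and uses exact, case-insensitive matching. `select_group` is a hypothetical helper, not part of any crawler's API.

```python
# Named groups as listed in this scan (tokens appear lowercased by the scanner).
AI_AGENTS = [
    "ai2bot", "amazonbot", "applebot-extended", "ccbot", "cincraw",
    "claudebot", "cohere-ai", "diffbot", "friendlycrawler", "gptbot",
    "imagesiftbot", "img2dataset", "meta-externalagent", "petalbot",
    "semanticbot", "timpibot", "velenpublicwebcrawler", "yandex",
]
NAMED_GROUPS = {"googlebot-news", *AI_AGENTS}


def select_group(product_token: str) -> str:
    """Return the name of the group a crawler with this token should obey.

    Simplified RFC 9309 selection: case-insensitive exact match on the
    product token, otherwise fall back to the "*" group.
    """
    token = product_token.strip().lower()
    return token if token in NAMED_GROUPS else "*"
```

For example, `select_group("GPTBot")` yields `"gptbot"`, while a crawler that is not listed, such as `"Bingbot"`, gets `"*"` and is bound only by the `/deals/` and `/test/` rules.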

googlebot-news

Rule Path
Disallow /
Disallow /magazine/*/thema/
Allow /magazine/
Allow /amp/
Allow /$
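The googlebot-news group combines a blanket `Disallow /` with narrower exceptions and uses both pattern operators: `*` matches any sequence of characters and `$` anchors the end of the path. RFC 9309 resolves conflicts by using the most specific (longest) matching rule and prefers `Allow` when an allow and a disallow rule are equally specific. The sketch below follows that precedence and measures specificity by the raw pattern length, which is an approximation. The helper names are hypothetical.

```python
import re

# Rules of the googlebot-news group, copied from the scan above.
GOOGLEBOT_NEWS_RULES = [
    ("disallow", "/"),
    ("disallow", "/magazine/*/thema/"),
    ("allow", "/magazine/"),
    ("allow", "/amp/"),
    ("allow", "/$"),
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")          # "$" pins the end of the path
    body = pattern[:-1] if anchored else pattern
    # Escape literal text; each "*" becomes ".*".
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules) -> bool:
    """Longest matching rule wins; on a length tie, allow wins."""
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):  # match() anchors at start
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]  # no matching rule: allowed
```

Under these rules, the home page `/` is allowed through `/$`, magazine articles are allowed, topic pages such as `/magazine/politik/thema/wahl/` stay blocked because the 18-character disallow pattern beats `/magazine/`, and every other path falls under `Disallow /`.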

ai2bot
amazonbot
applebot-extended
ccbot
cincraw
claudebot
cohere-ai
diffbot
friendlycrawler
gptbot
imagesiftbot
img2dataset
meta-externalagent
petalbot
semanticbot
timpibot
velenpublicwebcrawler
yandex

Rule Path
Disallow /magazine/
Allow /magazine/in-eigener-sache/
Allow /magazine/unicef/
Allow /magazine/so-arbeitet-die-redaktion/
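All of the user agents listed above share one rule set: the magazine section is blocked except for three editorial subsections. Because these rules use no `*` or `$`, plain prefix matching with longest-match precedence is enough. The following is a minimal sketch under that assumption. `is_allowed_ai` is a hypothetical helper.

```python
# Shared rules of the AI-crawler group, copied from the scan above.
AI_RULES = [
    ("disallow", "/magazine/"),
    ("allow", "/magazine/in-eigener-sache/"),
    ("allow", "/magazine/unicef/"),
    ("allow", "/magazine/so-arbeitet-die-redaktion/"),
]


def is_allowed_ai(path: str, rules=AI_RULES) -> bool:
    """Prefix-only matching: the longest matching rule decides; allow wins ties."""
    matches = [(len(prefix), kind == "allow")
               for kind, prefix in rules if path.startswith(prefix)]
    return max(matches)[1] if matches else True  # no match: allowed
```

Because a crawler obeys only its own group, these agents are not bound by the `*` group's `Disallow /deals/`, so `is_allowed_ai("/deals/")` returns `True`. Meanwhile, `/magazine/politik/` is blocked and `/magazine/unicef/spenden` is allowed.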