tageblatt.de
robots.txt

Robots Exclusion Standard data for tageblatt.de

Resource Scan

Scan Details

Site Domain tageblatt.de
Base Domain tageblatt.de
Scan Status Ok
Last Scan 2024-05-29T09:46:50+00:00
Next Scan 2024-06-05T09:46:50+00:00

Last Scan

Scanned 2024-05-29T09:46:50+00:00
URL https://tageblatt.de/robots.txt
Redirect https://www.tageblatt.de/robots.txt
Redirect Domain www.tageblatt.de
Redirect Base tageblatt.de
Domain IPs 213.182.13.36
Redirect IPs 51.77.171.42
Response IP 51.77.171.42
Found Yes
Hash 67ba98e69aeb89d1a5e622194fd26a2e21e5a9af1a84331714e3cc7c6d57c6ec
SimHash 135d5950c334
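
The Hash value above is 64 hexadecimal characters long, which matches the length of a SHA-256 digest. The report does not name the algorithm, so SHA-256 is an assumption here. Under that assumption, a fingerprint of a fetched robots.txt body could be computed roughly as follows; `robots_digest` and `sample_body` are hypothetical names, and the sample bytes are not the real file, so the resulting digest will not match the one listed.

```python
import hashlib


def robots_digest(body: bytes) -> str:
    """Return a hex fingerprint of the raw robots.txt bytes.

    Assumption: the scanner hashes the unmodified response body with SHA-256.
    """
    return hashlib.sha256(body).hexdigest()


# Placeholder body for illustration only; not the real tageblatt.de file.
sample_body = b"User-agent: *\nDisallow: /User\n"
digest = robots_digest(sample_body)
print(len(digest))  # a SHA-256 hex digest is always 64 characters
```

Comparing this digest between two scans is a cheap way to detect that the file changed at all, while a similarity hash such as the SimHash field is meant to indicate how much it changed.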

Groups

*

Rule Path
Disallow /User
Disallow /Dateien
Disallow /Nachrichten/Suche
Disallow /ScriptResource
Disallow /WebResource

google-extended

Rule Path
Disallow /

gptbot

Rule Path
Disallow /
Disallow /

ccbot

Rule Path
Disallow /

Other Records

Field Value
crawl-delay 2
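
To see how these groups behave for a given crawler, the rules can be fed to Python's standard `urllib.robotparser`. The sketch below is a reconstruction from the groups listed in this report, not a copy of the original file. The report lists `crawl-delay` under Other Records, meaning it was not attached to a group; the standard-library parser only honors `Crawl-delay` inside a group, so placing it under `*` is an assumption. `ROBOTS_TXT`, `build_parser`, `BASE`, and the user agent `SomeBot` are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Approximate reconstruction of the rules shown in this report.
# Assumption: Crawl-delay is placed inside the "*" group so the parser reads it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /User
Disallow: /Dateien
Disallow: /Nachrichten/Suche
Disallow: /ScriptResource
Disallow: /WebResource
Crawl-delay: 2

User-agent: google-extended
Disallow: /

User-agent: gptbot
Disallow: /

User-agent: ccbot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
BASE = "https://www.tageblatt.de"

# Crawlers with their own group are blocked from the whole site.
print(parser.can_fetch("GPTBot", BASE + "/"))
# Any other crawler falls back to the "*" group (prefix matching on the path).
print(parser.can_fetch("SomeBot", BASE + "/User/Login"))
print(parser.can_fetch("SomeBot", BASE + "/Nachrichten"))
print(parser.crawl_delay("SomeBot"))
```

The parser matches user agents case-insensitively and treats each `Disallow` value as a path prefix, so `/Nachrichten/Suche` blocks the search pages while leaving `/Nachrichten` itself fetchable.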

Comments

  • Robots.txt for crawler
  • Disallow Crawler
  • Crawler often creates invalid script/webresource resource request
  • Max crawler Time per page in sec
  • Sitemap
  • Sitemap: http://www.funkinform.de/Sitemap_Index.xml.gz

Warnings

  • `user agent` is not a known field.
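
A warning like this typically comes from a line whose field name is misspelled, for example `User agent:` written with a space instead of `User-agent:` with a hyphen. The offending line is not shown in this report, so that example is an assumption. A minimal sketch of such a check follows; `KNOWN_FIELDS` and `find_unknown_fields` are hypothetical names, and the set of accepted fields is an assumption rather than the scanner's actual list.

```python
# Assumed set of field names the check accepts (lowercase).
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def find_unknown_fields(lines):
    """Return (line_number, field_name) pairs for unrecognized fields."""
    unknown = []
    for number, raw in enumerate(lines, start=1):
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            unknown.append((number, field))
    return unknown


# Hypothetical input reproducing the kind of line that triggers the warning.
sample = ["# Robots.txt for crawler", "User agent: *", "Disallow: /User"]
print(find_unknown_fields(sample))
```

A line rejected this way does not start a group, which would explain why records following it, such as `crawl-delay`, end up unattached to any user agent.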