tagblatt.de
robots.txt

Robots Exclusion Standard data for tagblatt.de

Resource Scan

Scan Details

Site Domain tagblatt.de
Base Domain tagblatt.de
Scan Status Failed
Failure Stage Fetching resource.
Failure Reason Server returned a client error.
Last Scan 2024-10-06T17:56:39+00:00
Next Scan 2024-10-13T17:56:39+00:00

Last Successful Scan

Scanned 2024-09-28T17:56:24+00:00
URL https://www.tagblatt.de/robots.txt
Domain IPs 217.182.184.195
Response IP 217.182.184.195
Found Yes
Hash a2e66160ba0e05e8f61e1074b45e4e53b31033e08310e39a5ca2ca5abb559c4d
SimHash 51495d50c734
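
The Hash value above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest of the fetched file, although the report does not name the algorithm. The sketch below assumes SHA-256 and shows how such a digest could be used to detect that robots.txt changed between scans. The function names and the sample body are hypothetical, and the sample will not reproduce the hash listed above.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Return the hex SHA-256 digest of a fetched robots.txt body.

    Assumption: the scanner's Hash field is a SHA-256 digest; the report
    itself does not say which algorithm is used.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_digest: str, body: bytes) -> bool:
    """Compare a stored digest with the digest of a freshly fetched body."""
    return fingerprint(body) != previous_digest


# Hypothetical sample body, not the real file contents.
sample = b"User-agent: *\nDisallow: /User\n"
digest = fingerprint(sample)

print(len(digest))                          # digest length in hex characters
print(has_changed(digest, sample))          # same bytes, so no change
print(has_changed(digest, sample + b"#"))   # one extra byte changes the digest
```

An exact digest only answers whether the bytes are identical; the separate SimHash field presumably serves to estimate how similar two versions are, which this sketch does not attempt.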

Groups

*

Rule Path
Disallow /User
Disallow /Dateien
Disallow /Nachrichten/Suche
Disallow /ScriptResource
Disallow /WebResource
Disallow /Verlag/Datenschutz
Disallow /Marktplatz
Disallow /Verlag/OAA-gesperrt

Other Records

Field Value
crawl-delay 2
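
As an illustration of how a crawler might apply the group rules and the crawl-delay record listed above, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt lines are reconstructed from the tables in this report, so their order and exact spelling in the real file are assumptions, and the user agent name "ExampleBot" is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the Groups and Other Records tables in this report.
# The exact layout of the real file is an assumption.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /User",
    "Disallow: /Dateien",
    "Disallow: /Nachrichten/Suche",
    "Disallow: /ScriptResource",
    "Disallow: /WebResource",
    "Disallow: /Verlag/Datenschutz",
    "Disallow: /Marktplatz",
    "Disallow: /Verlag/OAA-gesperrt",
    "Crawl-delay: 2",
]

BASE_URL = "https://www.tagblatt.de"
USER_AGENT = "ExampleBot"  # hypothetical crawler name

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)  # parse in-memory lines, no network request


def is_allowed(path: str) -> bool:
    """Return True if the reconstructed rules permit fetching this path."""
    return parser.can_fetch(USER_AGENT, BASE_URL + path)


# Seconds to wait between requests, taken from the crawl-delay record.
crawl_delay = parser.crawl_delay(USER_AGENT)

for path in ("/Nachrichten/", "/User/Login", "/Marktplatz"):
    print(path, "allowed" if is_allowed(path) else "disallowed")
print("crawl delay:", crawl_delay)
```

The standard-library parser matches rules by path prefix, so /User/Login is covered by the Disallow /User rule. Because the only group is for *, the same rules and delay apply to any user agent name passed in.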

Comments

  • Robots.txt for crawler
  • Disallow Crawler
  • Crawler often creates invalid script/webresource resource request
  • Max crawler Time per page in sec
  • Sitemap
  • Sitemap: https://www.tagblatt.de/Sitemap_Index.xml.gz