murrhardter-zeitung.de
robots.txt

Robots Exclusion Standard data for murrhardter-zeitung.de

Resource Scan

Scan Details

Site Domain murrhardter-zeitung.de
Base Domain murrhardter-zeitung.de
Scan Status Ok
Last Scan 2024-10-06T12:28:20+00:00
Next Scan 2024-10-13T12:28:20+00:00

Last Scan

Scanned 2024-10-06T12:28:20+00:00
URL https://murrhardter-zeitung.de/robots.txt
Redirect https://www.murrhardter-zeitung.de/robots.txt
Redirect Domain www.murrhardter-zeitung.de
Redirect Base murrhardter-zeitung.de
Domain IPs 217.182.184.199
Redirect IPs 217.182.184.199
Response IP 217.182.184.199
Found Yes
Hash b447d531dbcad76a54178598cd2fb2e98767f5a60d9b8f8c5e38e52a593a869e
SimHash 31495d10c72c

Groups

*

Rule Path
Disallow /User
Disallow /Dateien
Disallow /Nachrichten/Suche
Disallow /ScriptResource
Disallow /WebResource

Other Records

Field Value
crawl-delay 2

Other Records (outside any group)

Field Value
sitemap https://www.bkz.de/Sitemap_Index.xml.gz

Comments

  • Robots.txt for crawler
  • Disallow Crawler
  • Crawler often creates invalid script/webresource resource request
  • Max crawler Time per page in sec
  • Sitemap
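
Example

The records above can be checked against a crawler's intended requests. The following is a minimal sketch, assuming the file looks roughly like the reconstruction in ROBOTS_TXT: the directive order and layout are rebuilt from the scan records and are not the verbatim file. The user agent name "ExampleBot" and the sample paths are hypothetical. It uses Python's standard urllib.robotparser (site_maps() needs Python 3.8 or later). Note that the sitemap record points to www.bkz.de, not to the scanned domain.

```python
from urllib.robotparser import RobotFileParser

# Reconstruction from the scan records above (assumption: the real file's
# exact layout, ordering, and comment placement may differ).
ROBOTS_TXT = """\
User-agent: *
Disallow: /User
Disallow: /Dateien
Disallow: /Nachrichten/Suche
Disallow: /ScriptResource
Disallow: /WebResource
Crawl-delay: 2
Sitemap: https://www.bkz.de/Sitemap_Index.xml.gz
"""

BASE = "https://www.murrhardter-zeitung.de"  # redirect target from the scan
AGENT = "ExampleBot"  # hypothetical user agent, matched by the "*" group


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical sample paths: one unrestricted, two under disallowed prefixes.
for path in ("/", "/Nachrichten/Suche?q=test", "/Dateien/bild.jpg"):
    allowed = parser.can_fetch(AGENT, BASE + path)
    print(f"{path}: {'allowed' if allowed else 'disallowed'}")

# Seconds to wait between requests, as declared for the "*" group.
print("crawl-delay:", parser.crawl_delay(AGENT))

# Sitemap URLs declared in the file.
print("sitemaps:", parser.site_maps())
```

Disallow rules are prefix matches, so /Nachrichten/Suche also covers any query string or subpath that starts with it. A polite crawler would sleep for the crawl-delay value between requests to this host.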