gn-online.de
robots.txt

Robots Exclusion Standard data for gn-online.de

Resource Scan

Scan Details

Site Domain gn-online.de
Base Domain gn-online.de
Scan Status Ok
Last Scan 2024-06-28T03:39:23+00:00
Next Scan 2024-07-05T03:39:23+00:00

Last Scan

Scanned 2024-06-28T03:39:23+00:00
URL https://gn-online.de/robots.txt
Redirect https://www.gn-online.de/robots.txt
Redirect Domain www.gn-online.de
Redirect Base gn-online.de
Domain IPs 217.182.187.117
Redirect IPs 217.182.187.117
Response IP 217.182.187.117
Found Yes
Hash 07467a8ee547029093ae8b1ed1c609793ee99eff79090da41641e94b04dd6454
SimHash 33731d10cfaf

Groups

gptbot

Rule Path
Disallow /
Disallow /

ccbot

Rule Path
Disallow /

*

Rule Path
Disallow /User
Disallow /Dateien
Disallow /Nachrichten/Suche
Disallow /ScriptResource
Disallow /WebResource

Other Records

Field Value
crawl-delay 2

Other Records

Field Value
sitemap http://www.gn-online.de/Sitemap_Index.xml.gz
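To see how a crawler might interpret the groups and records listed above, here is a minimal sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` text is a hypothetical reconstruction from the report tables, not the scanned file itself: the original line order is unknown, the repeated `Disallow /` rule for gptbot is collapsed to one line, and placing `Crawl-delay` inside the `*` group is an assumption. `ExampleBot` is an illustrative user agent name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction from the report tables above; the real file's
# line order, comments, and the position of Crawl-delay are assumptions.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /User
Disallow: /Dateien
Disallow: /Nachrichten/Suche
Disallow: /ScriptResource
Disallow: /WebResource
Crawl-delay: 2

Sitemap: http://www.gn-online.de/Sitemap_Index.xml.gz
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
base = "https://www.gn-online.de"

# GPTBot and CCBot are blocked from the whole site.
print(parser.can_fetch("GPTBot", base + "/"))
print(parser.can_fetch("CCBot", base + "/"))

# Any other agent falls under the "*" group (ExampleBot is a made-up name).
print(parser.can_fetch("ExampleBot", base + "/"))
print(parser.can_fetch("ExampleBot", base + "/Nachrichten/Suche?q=test"))

# Crawl delay for the "*" group and the declared sitemap (Python 3.8+).
print(parser.crawl_delay("ExampleBot"))
print(parser.site_maps())
```

Rule matching in `urllib.robotparser` is a simple path-prefix check, so `/Nachrichten/Suche?q=test` is refused while `/` stays allowed for agents in the `*` group. Other parsers may differ in details such as how they treat a crawl delay.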

Comments

  • Robots.txt for crawler
  • Disallow Crawler
  • Crawler often creates invalid script/webresource resource request
  • Max crawler Time per page in sec
  • Sitemap
  • Legal notice: gn-online.de expressly reserves the right to use its content for commercial text and data mining (§ 44b UrhG).
  • The use of robots or other automated means to access gn-online.de or collect or mine data without the express permission of gn-online.de is strictly prohibited.
  • If you would like to apply for permission to crawl gn-online.de, collect or use data, please contact datenschutz@gn-online.de

Warnings

  • `user agent` is not a known field.
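
A warning like this typically arises when a line uses a field name the scanner does not recognize, for example `User agent` written with a space instead of `User-agent`. The sketch below shows one way such a check could work. The `KNOWN_FIELDS` set and the `sample` lines are assumptions for illustration; the scanner's actual field list and the offending line in the scanned file are not shown in this report.

```python
# Assumed set of recognized field names; the scanner's real list is unknown.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def unknown_fields(text: str) -> list[str]:
    """Return unrecognized field names in order of first appearance."""
    found: list[str] = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        # The field name is everything before the first colon.
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS and field not in found:
            found.append(field)
    return found


# Hypothetical input: the first line misspells the field with a space.
sample = "User agent: *\nDisallow: /User\nUser-agent: CCBot\nDisallow: /\n"

for name in unknown_fields(sample):
    print(f"`{name}` is not a known field.")
```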