tageblatt.de
robots.txt

Robots Exclusion Standard data for tageblatt.de

Resource Scan

Scan Details

Site Domain	tageblatt.de
Base Domain	tageblatt.de
Scan Status	Ok
Last Scan	2024-05-29T09:46:50+00:00
Next Scan	2024-06-05T09:46:50+00:00

Last Scan

Scanned	2024-05-29T09:46:50+00:00
URL	https://tageblatt.de/robots.txt
Redirect	https://www.tageblatt.de/robots.txt
Redirect Domain	www.tageblatt.de
Redirect Base	tageblatt.de
Domain IPs	213.182.13.36
Redirect IPs	51.77.171.42
Response IP	51.77.171.42
Found	Yes
Hash	67ba98e69aeb89d1a5e622194fd26a2e21e5a9af1a84331714e3cc7c6d57c6ec
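The Hash value has 64 hexadecimal digits, which is consistent with a SHA-256 digest of the fetched file. Assuming that algorithm, and assuming the scanner hashes the raw response body without normalization (the report states neither), a change check could be sketched like this; `robots_hash` and `unchanged_since_scan` are hypothetical helper names:

```python
import hashlib

# Value copied from the "Hash" field of the last scan.
RECORDED_HASH = "67ba98e69aeb89d1a5e622194fd26a2e21e5a9af1a84331714e3cc7c6d57c6ec"


def robots_hash(body: bytes) -> str:
    """Hex SHA-256 digest of a robots.txt body (the algorithm is an assumption)."""
    return hashlib.sha256(body).hexdigest()


def unchanged_since_scan(body: bytes) -> bool:
    """True if the given body hashes to the recorded value."""
    return robots_hash(body) == RECORDED_HASH
```

A mismatch only shows that the bytes differ from whatever was hashed; if the scanner normalizes line endings or whitespace first, even an unchanged file could fail this comparison.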
SimHash	135d5950c334

Groups

*

Rule	Path
Disallow	/User
Disallow	/Dateien
Disallow	/Nachrichten/Suche
Disallow	/ScriptResource
Disallow	/WebResource
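Rules in the `*` group are prefix matches on the URL path, so `/User` also covers paths such as `/Users` or `/User/Login`, and `/Nachrichten/Suche` covers search URLs that carry a query string. The sketch below reconstructs only this group (the exact file text, including the user-agent line, is an assumption) and checks a few paths with Python's standard `urllib.robotparser`; the agent name `ExampleBot` and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of the "*" group from the table above;
# the live file may differ in spacing, comments and ordering.
STAR_GROUP = """\
User-agent: *
Disallow: /User
Disallow: /Dateien
Disallow: /Nachrichten/Suche
Disallow: /ScriptResource
Disallow: /WebResource
"""

parser = RobotFileParser()
parser.parse(STAR_GROUP.splitlines())  # parse() also marks the rules as loaded

BASE = "https://www.tageblatt.de"
for path in ["/", "/Nachrichten/Suche?q=test", "/Nachrichten/Lokales", "/WebResource.axd"]:
    allowed = parser.can_fetch("ExampleBot", BASE + path)
    print(f"{path:30} {'allowed' if allowed else 'blocked'}")
```

Here `/` and `/Nachrichten/Lokales` come out allowed, while the search URL and `/WebResource.axd` are blocked because their paths start with a disallowed prefix.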

google-extended

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/
Disallow	/

ccbot

Rule	Path
Disallow	/
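The `google-extended`, `gptbot` and `ccbot` groups each disallow `/`, which blocks the whole site for those agents, while crawlers that match only `*` stay subject to the five path rules of the first group. The repeated `Disallow: /` in the `gptbot` group is redundant but harmless. A minimal check, again assuming a reconstructed file text and a hypothetical generic agent `ExampleBot`; note that group matching is case-insensitive, so `GPTBot` selects the `gptbot` group:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of all groups listed in this report.
ROBOTS = """\
User-agent: *
Disallow: /User
Disallow: /Dateien
Disallow: /Nachrichten/Suche
Disallow: /ScriptResource
Disallow: /WebResource

User-agent: google-extended
Disallow: /

User-agent: gptbot
Disallow: /
Disallow: /

User-agent: ccbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

HOME = "https://www.tageblatt.de/"
for agent in ["google-extended", "GPTBot", "CCBot", "ExampleBot"]:
    # The first matching named group wins; otherwise the "*" group applies.
    print(agent, parser.can_fetch(agent, HOME))
```

Only `ExampleBot` may fetch the home page; the three named agents are refused every path.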

Other Records

Field	Value
crawl-delay	2
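`crawl-delay` is not part of the Robots Exclusion Protocol standard (RFC 9309), and support varies by crawler; Googlebot, for example, ignores it. The report lists it under Other Records, so it is apparently not attached to any user-agent group. The sketch below assumes, hypothetically, that the directive sits inside the `*` group, reads the value with `urllib.robotparser`, and spaces requests two seconds apart; `polite_schedule`, `ExampleBot` and the sample paths are illustrative names only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical placement: Crawl-delay inside the "*" group. On the live
# file the scan reports it outside any group, where parsers may ignore it.
ROBOTS = """\
User-agent: *
Crawl-delay: 2
Disallow: /User
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())


def polite_schedule(paths, agent="ExampleBot"):
    """Return (start offset in seconds, path) pairs spaced by the crawl delay."""
    delay = parser.crawl_delay(agent) or 0  # crawl_delay() returns None if unset
    return [(i * delay, path) for i, path in enumerate(paths)]


print(parser.crawl_delay("ExampleBot"))
print(polite_schedule(["/", "/Nachrichten", "/Sport"]))
```

With a 2-second delay, fetching N pages one after another needs at least 2(N - 1) seconds of waiting; 100 pages would take at least 198 seconds.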

Comments

Robots.txt for crawler
Disallow Crawler
Crawler often creates invalid script/webresource resource request
Max crawler Time per page in sec
Sitemap
Sitemap: http://www.funkinform.de/Sitemap_Index.xml.gz
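The Sitemap line appears among the comments, which suggests it is commented out in the file; the report does not show the exact comment prefix, so `# ` is assumed here. Parsers discard everything after `#`, so a commented-out directive advertises no sitemap at all. This sketch compares the commented form with a hypothetical uncommented one using `RobotFileParser.site_maps()` (Python 3.8 or later); `sitemaps_for` is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

SITEMAP = "http://www.funkinform.de/Sitemap_Index.xml.gz"


def sitemaps_for(robots_text: str):
    """Return the sitemap URLs a parser would advertise, or None if there are none."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.site_maps()


# As listed under Comments: the directive is commented out (assumed "# " prefix).
commented = "User-agent: *\nDisallow: /User\n# Sitemap: " + SITEMAP + "\n"
# Hypothetical active form of the same directive.
active = "User-agent: *\nDisallow: /User\nSitemap: " + SITEMAP + "\n"

print(sitemaps_for(commented))  # None
print(sitemaps_for(active))
```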

Warnings

`user agent` is not a known field.
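The warning means the file contains a record whose field name is `user agent` (with a space) instead of the standard `user-agent`, so parsers skip that line and any directive meant to follow it may end up attached to no group. A minimal linter sketch follows; the set of accepted field names is an assumption (RFC 9309 defines `user-agent`, `allow` and `disallow`; `sitemap` and `crawl-delay` are common extensions), and both `unknown_fields` and the sample input are hypothetical.

```python
# Assumed set of recognized field names (compared case-insensitively).
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def unknown_fields(robots_text: str) -> list:
    """Return (line number, field name) for each record with an unrecognized field."""
    problems = []
    for lineno, raw in enumerate(robots_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue  # blank or comment-only lines carry no field
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            problems.append((lineno, field))
    return problems


# Hypothetical input reproducing the reported warning.
sample = "User agent: *\nCrawl-delay: 2\n"
print(unknown_fields(sample))  # [(1, 'user agent')]
```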
