taz-bremen.de
robots.txt

Robots Exclusion Standard data for taz-bremen.de

Resource Scan

Scan Details

Site Domain taz-bremen.de
Base Domain taz-bremen.de
Scan Status Ok
Last Scan 2024-09-28T04:44:04+00:00
Next Scan 2024-10-05T04:44:04+00:00

Last Scan

Scanned 2024-09-28T04:44:04+00:00
URL https://taz-bremen.de/robots.txt
Redirect https://taz.de/robots.txt
Redirect Domain taz.de
Redirect Base taz.de
Domain IPs 193.104.220.23
Redirect IPs 193.104.220.23, 2001:67c:13c::7a2:de
Response IP 193.104.220.23
Found Yes
Hash d90390ae043e9c37c8217e9188ee651329867f568c2dec81c3eb87ddc5305f48
SimHash 70200d58c99d

Groups

slurp

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 60

*

Rule Path
Disallow /openads
Disallow /48f6543196cbbdefb88f247a0a8e4375
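To see how the slurp and * groups interact, the sketch below feeds a hypothetical reconstruction of those records to Python's standard urllib.robotparser. The raw robots.txt text (exact casing, ordering, comments) is not shown in this report, so the lines are an assumption. ExampleBot and /openads/banner are invented for illustration. That parser uses simple prefix matching and substring user-agent matching, so real crawlers may interpret the file differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of the "slurp" and "*" groups listed in this report;
# the original file's exact text is not shown.
ROBOTS_LINES = [
    "User-agent: slurp",
    "Crawl-delay: 60",
    "",
    "User-agent: *",
    "Disallow: /openads",
    "Disallow: /48f6543196cbbdefb88f247a0a8e4375",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)  # parse() also records the load time, so can_fetch() works without read()

# slurp has its own group with no Disallow rules, so the "*" rules do not apply to it.
print(parser.can_fetch("Slurp", "https://taz.de/openads/banner"))  # True
print(parser.crawl_delay("Slurp"))  # 60

# An agent with no named group falls back to the "*" group.
print(parser.can_fetch("ExampleBot", "https://taz.de/openads/banner"))  # False
print(parser.can_fetch("ExampleBot", "https://taz.de/"))  # True
print(parser.crawl_delay("ExampleBot"))  # None (the "*" group sets no crawl-delay)
```

Because a crawler follows only the most specific group that matches it, slurp gets the crawl-delay but none of the * disallows.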

gptbot

Rule Path
Disallow /
Disallow /
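A similar sketch for the gptbot group, again using a hypothetical reconstruction of the records and Python's urllib.robotparser. The paths /some/article and the agent name ExampleBot are invented for illustration. The repeated Disallow / line is kept once here, since an identical second rule cannot change the result.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of the "*" and "gptbot" groups listed in this report.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /openads",
    "Disallow: /48f6543196cbbdefb88f247a0a8e4375",
    "",
    "User-agent: GPTBot",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)

# "/" is a prefix of every path, so GPTBot is blocked site-wide.
paths = ["/", "/some/article", "/openads"]
print([parser.can_fetch("GPTBot", "https://taz.de" + p) for p in paths])  # [False, False, False]

# Other agents only inherit the narrower "*" rules.
print(parser.can_fetch("ExampleBot", "https://taz.de/some/article"))  # True
```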

Other Records

Field Value
sitemap https://taz.de/sitemap-google-news.xml
sitemap https://taz.de/sitemap-index.xml
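Sitemap records are not tied to any user-agent group (the sitemaps.org protocol states the directive is independent of the user-agent line), even though this report lists them under gptbot. The sketch below uses a hypothetical reconstruction of those lines and urllib.robotparser's site_maps() method (Python 3.8 and later).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction: the Sitemap lines follow the gptbot group, as in this report.
ROBOTS_LINES = [
    "User-agent: GPTBot",
    "Disallow: /",
    "Sitemap: https://taz.de/sitemap-google-news.xml",
    "Sitemap: https://taz.de/sitemap-index.xml",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)

# Sitemap URLs are collected file-wide, in order, regardless of the surrounding group.
print(parser.site_maps())
# ['https://taz.de/sitemap-google-news.xml', 'https://taz.de/sitemap-index.xml']

# Listing a sitemap does not grant access to it: GPTBot's "Disallow: /" still covers it.
print(parser.can_fetch("GPTBot", "https://taz.de/sitemap-index.xml"))  # False
```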

Comments

  • as per https://platform.openai.com/docs/gptbot
  • Legal notice: taz.de expressly reserves the right to use its content for commercial text and data mining (§ 44b UrhG).
  • The use of robots or other automated means to access taz.de or collect or mine data without the express permission of taz.de is strictly prohibited.
  • taz.de may, in its discretion, permit certain automated access to certain taz.de pages.
  • If you would like to apply for permission to crawl taz.de, collect or use data, please email lizenzen@taz.de.

Warnings

  • `useragent` is not a known field.