
Robots Exclusion Standard data for tu.berlin

Resource Scan

Scan Details

Site Domain tu.berlin
Base Domain tu.berlin
Scan Status Ok
Last Scan 2024-11-07T18:15:36+00:00
Next Scan 2024-12-07T18:15:36+00:00
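
The two timestamps above imply a fixed rescan interval. A minimal sketch of that arithmetic, assuming the values are ISO 8601 timestamps as shown (the variable names are illustrative and not part of the report):

```python
from datetime import datetime

# Timestamps copied from the Scan Details block.
last_scan = datetime.fromisoformat("2024-11-07T18:15:36+00:00")
next_scan = datetime.fromisoformat("2024-12-07T18:15:36+00:00")

# Difference between the scheduled and the completed scan.
interval = next_scan - last_scan
print(interval.days)  # 30
```

Whether the scanner always uses a 30-day schedule cannot be told from a single pair of dates.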

Last Scan

Scanned 2024-11-07T18:15:36+00:00
URL https://tu.berlin/robots.txt
Redirect https://www.tu.berlin/robots.txt
Redirect Domain www.tu.berlin
Redirect Base tu.berlin
Domain IPs 141.23.73.70
Redirect IPs 141.23.73.70
Response IP 141.23.73.70
Found Yes
Hash e5501c120138cb96ca976362e167ab5ddd91bbe2d8843b81c0099ee7a81eb73d
SimHash 700033b29760
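
The Hash value is 64 hexadecimal characters long, which is consistent with a SHA-256 digest of the fetched file. The report does not name the algorithm, so that is an assumption. Under that assumption, a content fingerprint for change detection could be computed roughly like this (`content_fingerprint` and the sample body are hypothetical, not taken from the report):

```python
import hashlib

def content_fingerprint(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt response body."""
    return hashlib.sha256(body).hexdigest()

# Illustrative body only; the real file's exact bytes are not in the report.
sample = b"User-agent: *\nDisallow: /typo3/\n"
digest = content_fingerprint(sample)

# A SHA-256 hex digest always has 64 characters, like the Hash field.
print(len(digest))  # 64
```

An exact hash changes completely on any one-byte edit. The SimHash field presumably serves the complementary purpose of detecting near-duplicate versions, but its construction is not described here.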

Groups

*

Rule      Path                                    Comment
Allow     /                                       -
Disallow  /typo3/                                 -
Disallow  /typo3conf/                             -
Allow     /typo3conf/ext/                         -
Allow     /typo3temp/                             -
Disallow  /*?id=*                                 non speaking URLs
Disallow  /*%26id%3D*                             non speaking URLs
Disallow  /*tx_solr*                              search parameters
Disallow  /*cHash                                 no cHash
Disallow  /*tx_powermail_pi1                      no powermail thanks pages
Disallow  /*tx_tubdownloadlist*                   Download lists
Disallow  /*tx_tubstudypaths_studypathlist*       Studypath filtering
Disallow  /*tx_tubevents_event                    Events filtering
Disallow  /*tx_tubbasepackage_protectedpagelogin  Protected page filtering
Disallow  /Shibboleth.sso                         -
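
Several rules in this group overlap (for example `/typo3conf/` is disallowed while `/typo3conf/ext/` is allowed), and many use the `*` wildcard. The sketch below shows one way such rules can be evaluated, assuming the longest-match precedence described in RFC 9309, with Allow winning ties. It is not necessarily how any particular crawler behaves, it skips percent-encoding normalisation, and `pattern_to_regex` and `is_allowed` are illustrative names:

```python
import re

# Rules of the "*" group, copied from the table above.
RULES = [
    ("allow", "/"),
    ("disallow", "/typo3/"),
    ("disallow", "/typo3conf/"),
    ("allow", "/typo3conf/ext/"),
    ("allow", "/typo3temp/"),
    ("disallow", "/*?id=*"),
    ("disallow", "/*%26id%3D*"),
    ("disallow", "/*tx_solr*"),
    ("disallow", "/*cHash"),
    ("disallow", "/*tx_powermail_pi1"),
    ("disallow", "/*tx_tubdownloadlist*"),
    ("disallow", "/*tx_tubstudypaths_studypathlist*"),
    ("disallow", "/*tx_tubevents_event"),
    ("disallow", "/*tx_tubbasepackage_protectedpagelogin"),
    ("disallow", "/Shibboleth.sso"),
]

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")  # "$" pins the match to the end
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path, rules=RULES):
    """Apply the longest matching rule; Allow wins on equal length."""
    best_len, allowed = -1, True  # no matching rule means allowed
    for verdict, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = verdict == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

print(is_allowed("/typo3conf/ext/site/style.css"))  # True
print(is_allowed("/index.php?id=42"))               # False
```

Under this reading, the blanket `Allow /` only takes effect when no longer pattern matches, which is why the more specific Disallow entries still apply.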

amazonbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

omgili

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

scrapy

Rule Path
Disallow /
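
Each named group above blocks one crawler from the whole site, while all other agents fall back to the `*` group. A minimal sketch of that group selection using Python's standard `urllib.robotparser` follows. The excerpt is reconstructed from this report rather than copied from the live file, it keeps only one wildcard-free rule for the `*` group because this parser applies rules in file order and does not interpret `*` inside paths, and `may_fetch` is a hypothetical helper:

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt; not the verbatim robots.txt.
EXCERPT = """\
User-agent: *
Disallow: /typo3/

User-agent: gptbot
Disallow: /

User-agent: ccbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(EXCERPT.splitlines())

def may_fetch(agent, path):
    """Ask the parser whether `agent` may request `path` on the site."""
    return parser.can_fetch(agent, "https://www.tu.berlin" + path)

print(may_fetch("GPTBot", "/studium/"))       # False: its own group blocks everything
print(may_fetch("SomeBrowser", "/studium/"))  # True: falls back to the "*" group
```

Agent names are compared case-insensitively here, which matches the lowercased group names shown in this report.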

Comments

  • folders
  • parameters
  • Shibboleth
  • Disallow rules for bots
  • https://developer.amazon.com/support/amazonbot
  • ByteDance
  • http://commoncrawl.org
  • http://openai.com/bot
  • http://openai.com/gptbot
  • ClaudeBot
  • https://brandwatch.com/legal/magpie-crawler/
  • https://webz.io/
  • https://www.zyte.com/