koeln.de
robots.txt

Robots Exclusion Standard data for koeln.de

Resource Scan

Scan Details

Site Domain koeln.de
Base Domain koeln.de
Scan Status Ok
Last Scan 2024-11-09T20:54:44+00:00
Next Scan 2024-11-16T20:54:44+00:00

Last Scan

Scanned 2024-11-09T20:54:44+00:00
URL https://koeln.de/robots.txt
Redirect https://www.koeln.de/robots.txt
Redirect Domain www.koeln.de
Redirect Base koeln.de
Domain IPs 81.173.246.122
Redirect IPs 2001:4dd0:100:1023:80:15:0:1, 81.173.246.120
Response IP 81.173.246.120
Found Yes
Hash 90a9db5a32e3e4595dcf65c646dd7f3c96815b0be0a6bb09b720c61b5328f419
SimHash 249bb1102a75

Groups

googlebot
googlebot-mobile

Rule Path
Allow /*?page=
Allow /jobs/jobsuche.html?
Allow /misc/*.js
Allow /misc/*.css
Allow /misc/*.png
Allow /sites/*.js
Allow /sites/*.css
Disallow /*?
Disallow /aframe
Disallow /apps/kinokalender
Disallow /apps/wahlomat
Disallow /archiv
Disallow /artikel
Allow /bilder/data/pictures/*/normal
Allow /bilder/data/pictures/*/thumb
Disallow /bilder/data
Disallow /bilder/picture
Disallow /bilder/vorschau
Disallow /branchenfuehrer
Disallow /branchen/api
Disallow /cgi-bin
Disallow /cms
Disallow /content
Disallow /filter/
Disallow /forum
Disallow /hotspot
Disallow /index.php
Disallow /intern/
Disallow /_includes
Disallow /koeln/nachrichten/24hticker
Disallow /koeln/nachrichten/nrwticker
Disallow /koeln/nachrichten/wetter
Disallow /koeln/was_ist_los/kino/
Disallow /markt
Disallow /mein_koeln
Disallow /mobil
Disallow /nachrichten
Disallow /service/mail/support
Disallow /tourismus/stadtfuehrungen/rent-a-guide/booking-step*
Disallow /tourismus/stadtfuehrungen/rent-a-guide/tour/*/2*
Disallow /admin
Disallow /category
Disallow /comment
Disallow /contact
Disallow /database
Disallow /filter
Disallow /flag_content
Disallow /includes
Disallow /login
Disallow /logout
Disallow /misc
Disallow /modules
Disallow /node
Disallow /profiles
Disallow /scripts
Disallow /search
Disallow /suche
Disallow /algolia
Disallow /taxonomy
Disallow /updates
Disallow /user
Allow /
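The googlebot group above mixes Allow and Disallow rules that contain `*` wildcards. The following is a minimal sketch of how such rules might be evaluated, assuming the longest-match precedence described in RFC 9309 (the most specific matching pattern wins, and Allow wins a tie). Crawlers that evaluate rules in file order may decide differently. The names `pattern_to_regex`, `is_allowed`, and `RULES` are illustrative, `RULES` is only a subset of the group, and the example paths in the usage note are made up.

```python
import re

# Subset of the googlebot group listed above, as (verb, path pattern) pairs.
RULES = [
    ("allow", "/*?page="),
    ("allow", "/jobs/jobsuche.html?"),
    ("allow", "/misc/*.js"),
    ("disallow", "/*?"),
    ("allow", "/bilder/data/pictures/*/normal"),
    ("disallow", "/bilder/data"),
    ("disallow", "/misc"),
    ("allow", "/"),
]


def pattern_to_regex(path_pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally as a prefix.
    """
    anchored = path_pattern.endswith("$")
    if anchored:
        path_pattern = path_pattern[:-1]
    body = ".*".join(re.escape(part) for part in path_pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, url_path):
    """Return True if url_path may be fetched under the given rules.

    Assumption: the longest matching pattern wins, Allow wins ties,
    and a path matched by no rule is allowed.
    """
    best_len = -1
    allowed = True
    for verb, pattern in rules:
        if pattern_to_regex(pattern).match(url_path):
            is_allow = verb.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed
```

Under these assumptions, `is_allowed(RULES, "/tourismus?page=2")` is true because `/*?page=` is longer than `/*?`, while `is_allowed(RULES, "/tourismus?sort=asc")` is false because only `/*?` and `/` match.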

mediapartners-google*

Rule Path
Allow /*?page=
Disallow /*?
Disallow /admin
Disallow /markt
Disallow /mein_koeln
Disallow /forum
Disallow /admin
Disallow /apps/kinokalender
Disallow /apps/wahlomat
Disallow /archiv
Disallow /category
Disallow /comment
Disallow /contact
Disallow /database
Disallow /flag_content
Disallow /filter
Disallow /hotspot
Disallow /includes
Disallow /intern/
Disallow /login
Disallow /logout
Disallow /misc
Disallow /modules
Disallow /node
Disallow /profiles
Disallow /scripts
Disallow /search
Disallow /tourismus/stadtfuehrungen/rent-a-guide/booking-step*
Disallow /tourismus/stadtfuehrungen/rent-a-guide/tour/*/*
Disallow /taxonomy
Disallow /updates
Disallow /user

bingbot
msnbot
msnbot-newsblogs
msnbot-media
msnbot-udiscovery

Rule Path
Allow /*?page=
Allow /jobs/jobsuche.html?
Disallow /aframe
Disallow /artikel
Disallow /apps/kinokalender
Disallow /apps/strassen/strassensuche
Disallow /apps/wahlomat
Disallow /archiv
Disallow /bilder/data
Disallow /bilder/picture
Disallow /bilder/vorschau
Disallow /branchen/api
Disallow /branchenfuehrer
Disallow /cgi-bin
Disallow /cms
Disallow /content
Disallow /forum
Disallow /hotspot
Disallow /index.php
Disallow /intern/
Disallow /_includes
Disallow /markt
Disallow /mein_koeln
Disallow /koeln/nachrichten/24hticker
Disallow /koeln/nachrichten/nrwticker
Disallow /koeln/nachrichten/wetter
Disallow /mobil
Disallow /nachrichten
Disallow /service/mail/support
Disallow /admin
Disallow /category
Disallow /comment
Disallow /contact
Disallow /database
Disallow /filter
Disallow /flag_content
Disallow /includes
Disallow /login
Disallow /logout
Disallow /misc
Disallow /modules
Disallow /node
Disallow /profiles
Disallow /scripts
Disallow /search
Disallow /sites
Disallow /suche
Disallow /algolia
Disallow /taxonomy
Disallow /tourismus/stadtfuehrungen/rent-a-guide/booking-step1
Disallow /updates
Disallow /user
Disallow /*?

Other Records

Field Value
crawl-delay 20

yandex

Rule Path
Disallow /

mj12bot

Rule Path
Allow /branchen
Disallow /

proximic

Rule Path
Disallow /

unisterbot

Rule Path
Disallow /

*

Rule Path
Disallow /_includes
Disallow /aframe
Disallow /archiv
Disallow /apps/kinokalender
Disallow /apps/strassen/strassensuche
Disallow /apps/wahlomat
Disallow /artikel
Allow /bilder/data/pictures/*/normal/
Allow /bilder/data/pictures/*/thumb/
Disallow /bilder/data
Disallow /bilder/picture
Disallow /bilder/vorschau
Disallow /branchen/api
Disallow /branchenfuehrer
Disallow /category
Disallow /cgi-bin
Disallow /cms
Disallow /comment
Disallow /contact
Disallow /database
Disallow /filter
Disallow /five-star
Disallow /flag_content
Disallow /forum
Disallow /hotspot
Disallow /index.php
Disallow /includes
Disallow /intern/
Disallow /koeln/nachrichten/24hticker
Disallow /koeln/nachrichten/nrwticker
Disallow /koeln/nachrichten/wetter
Disallow /login
Disallow /logout
Disallow /markt
Disallow /mein_koeln
Disallow /misc
Disallow /modules
Disallow /nachrichten
Disallow /node
Disallow /profiles
Disallow /scripts
Disallow /search
Disallow /sites
Disallow /suche
Disallow /algolia
Disallow /taxonomy
Disallow /tourismus/stadtfuehrungen/rent-a-guide/booking-step1
Disallow /updates
Disallow /user
Disallow /*?

Other Records

Field Value
crawl-delay 2

Other Records

Field Value
sitemap https://www.koeln.de/sitemap.xml
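The crawl-delay and sitemap records listed above can be read with Python's standard `urllib.robotparser`. The sketch below is illustrative only: `SAMPLE` is a heavily shortened stand-in for the real file (it is not a copy of it), `examplebot` is a made-up user agent, and `site_maps()` requires Python 3.8 or later. Note that `urllib.robotparser` does not interpret `*` wildcards inside paths, so it is used here only for the non-path records.

```python
from urllib.robotparser import RobotFileParser

# Shortened, illustrative stand-in for the scanned file (assumption).
SAMPLE = """\
User-agent: bingbot
User-agent: msnbot
Disallow: /admin
Crawl-delay: 20

User-agent: *
Disallow: /admin
Crawl-delay: 2

Sitemap: https://www.koeln.de/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# Delay for a crawler named in its own group.
bing_delay = parser.crawl_delay("bingbot")

# Delay for a crawler that falls back to the "*" group.
default_delay = parser.crawl_delay("examplebot")

# Sitemap URLs declared anywhere in the file.
sitemaps = parser.site_maps()

print(bing_delay, default_delay, sitemaps)
```

A polite fetcher could sleep for the returned number of seconds between requests; how a given crawler actually honours the value is up to that crawler.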

Comments

  • robots.txt
  • $Id: robots.txt 5918 2024-03-26 15:21:36Z koelndeweb $
  • (repeat all generic directives for the specific user agents)
  • On updates: test with Google Webmaster Tools, adjust the sitemap generator
  • for generic User-agent:* see below
  • Google-specific: with wildcard
  • http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
  • specific rules first: allow page=... first, then disallow the rest
  • wikiID GOOROBTXT:
  • Drupal-URLs:
  • until further notice: everything under /user
  • Disallow: /user/login
  • Disallow: /user/password
  • Disallow: /user/register
  • just to be safe :-)
  • we allow everything that is not disallowed
  • allow specific URLs via wildcard for AdSense
  • Drupal-URLs:
  • bingbot and its numerous siblings (which are, in part, not documented :-(( )
  • require specific (and slower) crawl rate settings because they simply ignore
  • the crawl delay settings. so if you actually want a crawl delay of, say, 2 then there
  • must be a crawl delay setting of 20 to 30 - according to bing support.
  • Drupal-URLs:
  • according to bing docs, wildcards are accepted ... let's see :
  • Bot Operators: If you are listed here, please contact us.
  • And yes, we know that multiple User-Agent-Lines with
  • identical rules could be folded. But honestly: who would trust
  • a rogue bot operator to implement parsing correctly?
  • Test 202208-202209: will Mj12bot accept Allow Syntax and will it behave
  • Next ones in line for getting blocked: nachtschatten, neofonie
  • And now the final fallback instructions for the rest of the bots
  • at least some bots might understand wildcards and crawl delay ...
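The comments above note that several User-agent lines sharing identical rules can be folded into one group, which is how this report lists, for example, googlebot together with googlebot-mobile. A minimal sketch of that grouping step follows, assuming that consecutive User-agent lines belong to the same group until the first rule line appears. The function name `parse_groups` and the `FRAGMENT` text are illustrative and not taken from the actual file.

```python
def parse_groups(text):
    """Split robots.txt text into (agents, rules) groups.

    Consecutive User-agent lines share one group; a User-agent line that
    follows at least one rule starts a new group. Only Allow and Disallow
    are collected; other fields are ignored in this sketch.
    """
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # rules already seen, so the previous group is complete
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


# Illustrative fragment (assumption), not a copy of the scanned file.
FRAGMENT = """\
# folded group
User-agent: Googlebot
User-agent: Googlebot-Mobile
Allow: /*?page=
Disallow: /*?

User-agent: Yandex
Disallow: /
"""

groups = parse_groups(FRAGMENT)
for agents, rules in groups:
    print(agents, rules)
```

Keeping the agents as a list per group mirrors the layout of the Groups section in this report, where each rule table is preceded by every user agent it applies to.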