blaetter.de
robots.txt

Robots Exclusion Standard data for blaetter.de

Resource Scan

Scan Details

Site Domain	blaetter.de
Base Domain	blaetter.de
Scan Status	Ok
Last Scan	2024-11-17T02:06:30+00:00
Next Scan	2024-12-17T02:06:30+00:00
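
The two timestamps above are exactly 30 days apart, which suggests (but does not confirm) a fixed 30-day rescan interval. A minimal sketch of that arithmetic follows; `RESCAN_INTERVAL` and `next_scan` are hypothetical names, and the interval is inferred from the data rather than taken from any documented scanner setting.

```python
from datetime import datetime, timedelta

# Assumption: inferred from the Last Scan / Next Scan values in this report,
# not from a documented configuration value.
RESCAN_INTERVAL = timedelta(days=30)


def next_scan(last_scan_iso: str) -> str:
    """Return the ISO 8601 timestamp one rescan interval after last_scan_iso."""
    last_scan = datetime.fromisoformat(last_scan_iso)
    return (last_scan + RESCAN_INTERVAL).isoformat()


print(next_scan("2024-11-17T02:06:30+00:00"))  # 2024-12-17T02:06:30+00:00
```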

Last Scan

Scanned	2024-11-17T02:06:30+00:00
URL	https://blaetter.de/robots.txt
Redirect	https://www.blaetter.de/robots.txt
Redirect Domain	www.blaetter.de
Redirect Base	blaetter.de
Domain IPs	2a01:4f8:c01f:e::1, 78.47.204.102
Redirect IPs	2a01:4f8:c01f:e::1, 78.47.204.102
Response IP	78.47.204.102
Found	Yes
Hash	3220ff1719d262d489b6eafcbfada8b0073916005ead7e86ed2b35a45ae25500
SimHash	38d0ad00a568
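
The Hash value is 64 hexadecimal characters long, which is consistent with a SHA-256 digest of the fetched file, though the report does not name the algorithm. Assuming SHA-256, a change detector could be sketched as below; `body_digest` and `has_changed` are hypothetical helpers, and the sample body is a stand-in rather than the real file. The SimHash field is not covered because the scanner's fingerprinting scheme is not described.

```python
import hashlib


def body_digest(body: bytes) -> str:
    """Hex digest of the raw robots.txt bytes (assumed here to be SHA-256)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_digest: str, body: bytes) -> bool:
    """True when the freshly fetched body no longer matches the stored digest."""
    return body_digest(body) != previous_digest


# Stand-in content, not the actual blaetter.de file.
sample = b"User-agent: *\nDisallow: /admin/\n"
stored = body_digest(sample)

print(len(stored))                          # 64
print(has_changed(stored, sample))          # False
print(has_changed(stored, sample + b"#"))   # True
```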

Groups

*

Rule	Path
Allow	/core/*.css$
Allow	/core/*.css?
Allow	/core/*.js$
Allow	/core/*.js?
Allow	/core/*.gif
Allow	/core/*.jpg
Allow	/core/*.jpeg
Allow	/core/*.png
Allow	/core/*.svg
Allow	/profiles/*.css$
Allow	/profiles/*.css?
Allow	/profiles/*.js$
Allow	/profiles/*.js?
Allow	/profiles/*.gif
Allow	/profiles/*.jpg
Allow	/profiles/*.jpeg
Allow	/profiles/*.png
Allow	/profiles/*.svg
Disallow	/core/
Disallow	/profiles/
Disallow	/README.md
Disallow	/composer/Metapackage/README.txt
Disallow	/composer/Plugin/ProjectMessage/README.md
Disallow	/composer/Plugin/Scaffold/README.md
Disallow	/composer/Plugin/VendorHardening/README.txt
Disallow	/composer/Template/README.txt
Disallow	/modules/README.txt
Disallow	/sites/README.txt
Disallow	/themes/README.txt
Disallow	/web.config
Disallow	/admin/
Disallow	/comment/reply/
Disallow	/filter/tips
Disallow	/node/add/
Disallow	/search/
Disallow	/user/register
Disallow	/user/password
Disallow	/user/login
Disallow	/user/logout
Disallow	/media/oembed
Disallow	/*/media/oembed
Disallow	/index.php/admin/
Disallow	/index.php/comment/reply/
Disallow	/index.php/filter/tips
Disallow	/index.php/node/add/
Disallow	/index.php/search/
Disallow	/index.php/user/password
Disallow	/index.php/user/register
Disallow	/index.php/user/login
Disallow	/index.php/user/logout
Disallow	/index.php/media/oembed
Disallow	/index.php/*/media/oembed
Disallow	/ausgabe/2024/januar/amerika-vor-der-trump-diktatur
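
The rules above mix Allow and Disallow entries that use the `*` wildcard and the `$` end anchor. The sketch below shows one way to evaluate them, following the longest-match precedence described in RFC 9309 (the most specific matching pattern wins, Allow wins ties, and no match means allowed). It is a simplified illustration: `pattern_to_regex`, `is_allowed`, and the abbreviated `RULES` list are names chosen here, and percent-encoding normalization is omitted.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Every other character, including '?', is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Apply longest-match precedence to a list of (kind, pattern) pairs."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "Allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# A few of the rules from the table above, not the full group.
RULES = [
    ("Allow", "/core/*.css$"),
    ("Allow", "/core/*.js?"),
    ("Disallow", "/core/"),
    ("Disallow", "/admin/"),
    ("Disallow", "/*/media/oembed"),
]

print(is_allowed("/core/themes/style.css", RULES))    # True
print(is_allowed("/core/install.php", RULES))         # False
print(is_allowed("/core/misc/drupal.js?v=9", RULES))  # True
print(is_allowed("/de/media/oembed?url=x", RULES))    # False
print(is_allowed("/ausgabe/2024/november", RULES))    # True
```

The pairing of a broad `Disallow /core/` with narrower `Allow` patterns works because the longer Allow patterns outrank the shorter Disallow prefix for stylesheet, script, and image URLs.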

Other Records

Field	Value
crawl-delay	10
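
A crawler that wants to honor the crawl-delay record and the per-bot blocks listed below could read them with Python's standard `urllib.robotparser`. The following is a sketch under stated assumptions: `ROBOTS_TXT` is a short hand-written excerpt modelled on this report, not the real file, and `ExampleBot` is a hypothetical user agent. Note that this parser matches rule paths by prefix and does not expand the `*` and `$` wildcards used in the `*` group, so it is shown here only for the delay and the whole-site blocks.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated stand-in for the real file; layout is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 10

User-agent: AhrefsBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# ExampleBot has no group of its own, so it falls back to the '*' group.
delay = parser.crawl_delay("ExampleBot")
blocked = not parser.can_fetch("AhrefsBot", "https://www.blaetter.de/")

print(delay, blocked)  # 10 True
```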

sistrix

Rule	Path
Disallow	/

sistrix crawler

Rule	Path
Disallow	/

sistrix

Rule	Path
Disallow	/

seokicks-robot

Rule	Path
Disallow	/

jobs.de-robot

Rule	Path
Disallow	/

ahrefsbot

Rule	Path
Disallow	/

unisterbot

Rule	Path
Disallow	/

dotbot

Rule	Path
Disallow	/

dotbot

Rule	Path
Disallow	/

searchmetricsbot

Rule	Path
Disallow	/

mj12bot

Rule	Path
Disallow	/

surveybot

Rule	Path
Disallow	/

seodiver

Rule	Path
Disallow	/

spbot

Rule	Path
Disallow	/

wotbox

Rule	Path
Disallow	/

meanpathbot

Rule	Path
Disallow	/

backlinkcrawler

Rule	Path
Disallow	/

magpie-crawler

Rule	Path
Disallow	/

obot

Rule	Path
Disallow	/

fr-crawler

Rule	Path
Disallow	/

blexbot

Rule	Path
Disallow	/

megaindex.ru

Rule	Path
Disallow	/

megaindex.com

Rule	Path
Disallow	/

cloudservermarketspider

Rule	Path
Disallow	/

trendictionbot

Rule	Path
Disallow	/

exabot

Rule	Path
Disallow	/

careerbot

Rule	Path
Disallow	/

lipperhey-kaus-australis

Rule	Path
Disallow	/

seoscanners.net

Rule	Path
Disallow	/

metajobbot

Rule	Path
Disallow	/

spiderbot

Rule	Path
Disallow	/

linkstats

Rule	Path
Disallow	/

jobboersebot

Rule	Path
Disallow	/

iccrawler

Rule	Path
Disallow	/

plista

Rule	Path
Disallow	/

domain re-animator bot

Rule	Path
Disallow	/

lipperhey-kaus-australis

Rule	Path
Disallow	/

turnitinbot

Rule	Path
Disallow	/

coccoc

Rule	Path
Disallow	/

um-ic

Rule	Path
Disallow	/

mindupbot

Rule	Path
Disallow	/

sg-orbiter

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

qwantify

Rule	Path
Disallow	/

kraken

Rule	Path
Disallow	/

plukkie

Rule	Path
Disallow	/

safednsbot

Rule

Path

Disallow

haosouspider

Rule

Path

Disallow

rogerbot

Rule

Path

Disallow

openhosebot

Rule

Path

Disallow

screaming frog seo spider

Rule

Path

Disallow

thumbsniper

Rule

Path

Disallow

r6_commentreader

Rule

Path

Disallow

implisensebot

Rule

Path

Disallow

cliqzbot

Rule

Path

Disallow

aihitbot

Rule

Path

Disallow

trendictionbot

Rule

Path

Disallow

adscanner

Rule

Path

Disallow

crawler4j

Rule

Path

Disallow

wbsearchbot

Rule

Path

Disallow

python/3.5 aiohttp

Rule

Path

Disallow

toweya.com

Rule

Path

Disallow

netestate

Rule

Path

Disallow

bubing

Rule

Path

Disallow

linguee

Rule

Path

Disallow

semrushbot

Rule

Path

Disallow

semrushbot-sa

Rule

Path

Disallow

sentibot

Rule

Path

Disallow

sentibot

Rule

Path

Disallow

velenpublicwebcrawler

Rule

Path

Disallow

domaincrawler

Rule

Path

Disallow

rogerbot

Rule

Path

Disallow

indeedbot

Rule

Path

Disallow

garlikcrawler

Rule

Path

Disallow

gosign-security-crawler

Rule

Path

Disallow

siteliner

Rule

Path

Disallow

sabsimbot

Rule

Path

Disallow

ltx71

Rule

Path

Disallow

Comments

robots.txt
This file is to prevent the crawling and indexing of certain parts
of your site by web crawlers and spiders run by sites like Yahoo!
and Google. By telling these "robots" where not to go on your site,
you save bandwidth and server resources.
This file will be ignored unless it is at the root of your host:
Used: http://example.com/robots.txt
Ignored: http://example.com/site/robots.txt
For more information about the robots.txt standard, see:
http://www.robotstxt.org/robotstxt.html
CSS, JS, Images
Directories
Files
Paths (clean URLs)
Paths (no clean URLs)
Disallow restricted content
www.robotstxt.org/
www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156449
Slow down bots
no user agent here as this is appended to the root robot.txt
Disallow: Sistrix
Disallow: Sistrix
Disallow: Sistrix
Disallow: SEOkicks-Robot
Disallow: jobs.de-Robot
Backlink Analysis
Bot of the Leipzig-based Unister Holding GmbH
http://www.opensiteexplorer.org/dotbot
http://www.searchmetrics.com
http://www.majestic12.co.uk/projects/dsearch/mj12bot.php
http://www.domaintools.com/webmasters/surveybot.php
http://www.seodiver.com/bot
http://openlinkprofiler.org/bot
http://www.wotbox.com/bot/
http://www.meanpath.com/meanpathbot.html
http://www.backlinktest.com/crawler.html
http://www.brandwatch.com/magpie-crawler/
http://filterdb.iss.net/crawler/
http://webmeup-crawler.com
https://megaindex.com/crawler
http://www.cloudservermarket.com
http://www.trendiction.de/de/publisher/bot
http://www.exalead.com
http://www.career-x.de/bot.html
https://www.lipperhey.com/en/about/
https://www.lipperhey.com/en/about/
https://turnitin.com/robot/crawlerinfo.html
http://help.coccoc.com/
ubermetrics-technologies.com
datenbutler.de
http://searchgears.de/uber-uns/crawling-faq.html
http://commoncrawl.org/faq/
https://www.qwant.com/
http://linkfluence.net/
http://www.botje.com/plukkie.htm
https://www.safedns.com/searchbot
http://www.haosou.com/help/help_3_2.html
http://www.haosou.com/help/help_3_2.html
http://www.moz.com/dp/rogerbot
http://www.openhose.org/bot.html
http://www.screamingfrog.co.uk/seo-spider/
http://thumbsniper.com
http://www.radian6.com/crawler
http://cliqz.com/company/cliqzbot
https://www.aihitdata.com/about
http://www.trendiction.com/en/publisher/bot
http://seocompany.store
https://github.com/yasserg/crawler4j/
http://warebay.com/bot.html
http://www.website-datenbank.de/
http://law.di.unimi.it/BUbiNG.html
http://www.linguee.com/bot; bot@linguee.com
https://www.semrush.com/bot/
www.sentibot.eu
http://velen.io
https://moz.com/help/guides/moz-procedures/what-is-rogerbot
http://www.garlik.com
https://www.gosign.de/typo3-extension/typo3-sicherheitsmonitor/
http://www.siteliner.com/bot
https://sabsim.com
http://ltx71.com/
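
The header comment in the list above states that robots.txt is only honored at the root of a host (`http://example.com/robots.txt` is used, `http://example.com/site/robots.txt` is ignored). A minimal sketch of how a crawler might derive that root location from any page URL; `robots_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://example.com/site/page.html"))
# http://example.com/robots.txt
```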

Warnings

2 invalid lines.
