thueringer-allgemeine.de
robots.txt
Robots Exclusion Standard data for thueringer-allgemeine.de
Resource Scan
Scan Details
Field | Value |
---|---|
Site Domain | thueringer-allgemeine.de |
Base Domain | thueringer-allgemeine.de |
Scan Status | Ok |
Last Scan | 2024-11-09T16:43:29+00:00 |
Next Scan | 2024-11-16T16:43:29+00:00 |
Last Scan
Field | Value |
---|---|
Scanned | 2024-11-09T16:43:29+00:00 |
URL | https://thueringer-allgemeine.de/robots.txt |
Redirect | https://www.thueringer-allgemeine.de:443/robots.txt |
Redirect Domain | www.thueringer-allgemeine.de |
Redirect Base | thueringer-allgemeine.de |
Domain IPs | 18.185.81.127, 18.196.221.37, 3.72.121.83 |
Redirect IPs | 13.35.238.112, 13.35.238.43, 13.35.238.69, 13.35.238.72, 2600:9000:2085:3a00:0:747c:e140:93a1, 2600:9000:2085:3e00:0:747c:e140:93a1, 2600:9000:2085:5800:0:747c:e140:93a1, 2600:9000:2085:6600:0:747c:e140:93a1, 2600:9000:2085:7a00:0:747c:e140:93a1, 2600:9000:2085:b800:0:747c:e140:93a1, 2600:9000:2085:e400:0:747c:e140:93a1, 2600:9000:2085:fa00:0:747c:e140:93a1 |
Response IP | 13.35.238.72 |
Found | Yes |
Hash | e4301d5e1a3caabb66b9b3682bd3b5fd202a41d213d70f599390006f2bdd8625 |
SimHash | 5c0b9052c621 |
Groups
*
Rule | Path |
---|---|
Allow | /static/*/client.js |
Allow | /static/*/main.css |
Allow | /static/*/favicon.png |
Disallow | /stats/* |
Disallow | /*?config* |
Disallow | /*.xmli* |
Disallow | /*?service=Ajax |
Disallow | /*?service=ajax |
Disallow | /config/* |
Disallow | /test/* |
Disallow | /Test/* |
Disallow | /template/* |
Disallow | /*?*token=* |
Disallow | /*?*eventId=* |
Disallow | /static/* |
Disallow | /migration_import_no_section/* |
Disallow | /secure/ |
Disallow | /socialmedia/* |
Disallow | *reader_id%3DREADER_ID* |
Disallow | /suche/* |
Disallow | /*?widgetid= |
Disallow | /newsletter-result/ |
Disallow | *tpcc%3D* |
Disallow | /resources/ |
Disallow | /bin/ |
Disallow | /downloads/ |
Disallow | /service/newsletter-adconsent |
Disallow | /pagespeed_static/ |
Disallow | /resources/img/*icon*pagespeed |
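
The `*` group mixes `Allow` and `Disallow` rules that overlap under `/static/`. The following is a minimal sketch of how a crawler could resolve them, assuming the longest-match precedence described in RFC 9309 (the longest matching pattern wins, and `Allow` wins a tie). The names `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical, and only a subset of the rules above is included.

```python
import re

# Subset of the "*" group rules from the scan above: (rule, path pattern).
RULES = [
    ("allow", "/static/*/client.js"),
    ("allow", "/static/*/main.css"),
    ("disallow", "/static/*"),
    ("disallow", "/*?service=ajax"),
    ("disallow", "/suche/*"),
    ("disallow", "/secure/"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    "*" matches any run of characters; a trailing "$" anchors the end.
    Everything else is matched literally as a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be fetched under `rules`.

    Assumption: the longest matching pattern decides, and "allow"
    wins when two matching patterns have equal length. A path that
    matches no rule is allowed.
    """
    best_len = -1
    best_rule = "allow"
    for rule, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and rule == "allow"):
                best_len, best_rule = length, rule
    return best_rule == "allow"
```

Under these assumptions, `/static/abc123/client.js` stays fetchable because the `Allow` pattern is longer than `Disallow | /static/*`, while any other path under `/static/` is blocked.
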
semrushbot-sa
ahrefsbot
backlinkcrawler
linkchecker
dataforseobot
deepcrawl
majestic
majestic12
mj12bot
onpagebot
optimizer
rytebot
semrushbot
semrushbot-si
seobility
seodiver
seokicks
seokicks-robot
sistrix
openindexspider
sistrix optimizer
sistrix
sistrix crawler
siteauditbot
Rule | Path |
---|---|
Disallow | / |
amazonbot
anthropic-ai
applebot-extended
archive.org_bot
bytespider
ccbot
chatgpt-user
claudebot
claude-web
cohere-ai
diffbot
facebookbot
friendlycrawler
google-extended
googleother
gptbot
ia_archiver
img2dataset
omgili
omgilibot
peer39_crawler
peer39_crawler/1.0
perplexitybot
youbot
meta-externalagent
imagesiftbot
Rule | Path |
---|---|
Disallow | / |
arquivo-web-crawler
arquivo.pt
barkrowler
blexbot
browsertrix
brozzler
builtwith
cincraw
coccocbot
contao/crawler
dmbot
domainstatsbot
dotbot
fluid
haosouspider
happywing
harsilbot
hatena antenna
heritrix
imagesiftbot
kazbtbot
kraken
linkdebot
linkfluence yak bot
mail.ru_bot
metajobbot
monsidobot
netestate
ogdwctcxcrawler
petalbot
researchbot
riddler
sentibot
rogerbot
semanticbot
semanticscholarbot
sirdatabot
spbot
special_archiver
splitsignalbot
tag-crawler
testcrawler
thinkers-bot
toplistbot
uipbot/1.0
urlsuma
user-agent
vsusearchspider
weborama-fetcher
wiseguys robot
wpbot
yeti
Rule | Path |
---|---|
Disallow | / |
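
The three named groups above all carry a single `Disallow | /` rule, while every other crawler falls back to the `*` group. Below is a minimal sketch of that group selection, assuming case-insensitive matching on the crawler's product token as described in RFC 9309. `GROUPS` and `rules_for` are hypothetical names, and only a few user agents and rules from each group are included for brevity.

```python
# A few user agents from each group in the scan, mapped to that group's rules.
GROUPS = {
    ("semrushbot", "ahrefsbot", "mj12bot", "sistrix"): [("disallow", "/")],
    ("gptbot", "ccbot", "claudebot", "google-extended"): [("disallow", "/")],
    ("petalbot", "dotbot", "blexbot", "yeti"): [("disallow", "/")],
    # Fallback group; only two of its rules are repeated here.
    ("*",): [("disallow", "/suche/*"), ("disallow", "/secure/")],
}


def rules_for(product_token):
    """Return the rules that apply to a crawler's product token.

    Assumption: tokens are compared case-insensitively, rules from all
    groups naming the token are combined, and the "*" group is used
    only when no named group matches.
    """
    token = product_token.lower()
    matched = []
    for agents, rules in GROUPS.items():
        if token in agents:
            matched.extend(rules)
    if matched:
        return matched
    return list(GROUPS[("*",)])
```

With this sketch, `rules_for("GPTBot")` yields the blanket `Disallow | /` rule, whereas a crawler not named in any group, such as `Googlebot`, receives the `*` group's rules.
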
Other Records
Field | Value |
---|---|
sitemap | https://www.thueringer-allgemeine.de/sitemaps/news.xml |