shop.hettich.com
robots.txt

Robots Exclusion Standard data for shop.hettich.com

Resource Scan

Scan Details

Site Domain shop.hettich.com
Base Domain hettich.com
Scan Status Ok
Last Scan 2024-11-03T09:19:07+00:00
Next Scan 2024-12-03T09:19:07+00:00

Last Scan

Scanned 2024-11-03T09:19:07+00:00
URL https://shop.hettich.com/robots.txt
Domain IPs 65.52.130.191
Response IP 65.52.130.191
Found Yes
Hash 1d25142442d5ffa1144e7d9672007a3941bf688acd9b5aedbb7ca7465216b610
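The Hash field makes it possible to tell whether robots.txt has changed since the last scan. The sketch below assumes it is a SHA-256 hex digest of the raw response body. That is inferred only from its 64-hex-digit length and is not confirmed by the scan data. The helper names are hypothetical.

```python
import hashlib

# Assumption: "Hash" is the SHA-256 hex digest of the raw robots.txt body.
# This is inferred from the 64-hex-digit length, not documented by the scanner.
RECORDED_HASH = "1d25142442d5ffa1144e7d9672007a3941bf688acd9b5aedbb7ca7465216b610"


def sha256_hex(body: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of a response body."""
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """True if the body no longer matches the recorded digest."""
    return sha256_hex(body) != recorded.lower()
```

To compare against the live file, pass `urllib.request.urlopen("https://shop.hettich.com/robots.txt").read()` to `has_changed`. A mismatch may also come from a different hashing scheme rather than a changed file.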
SimHash 7d0467584a96

Groups

*

Rule Path
Disallow /*/login
Disallow /*/cart
Disallow /*/checkout
Disallow /*/my-account
Disallow /*/media/
Disallow /*/customers/
Disallow /*/*jsessionid*
Disallow /*/productlists/*
Disallow /*/comparison
Disallow /*/quickOrder
Disallow /*/import/csv/saved-cart
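The `*` rules above rely on wildcards. Under RFC 9309, `*` matches any run of characters, a trailing `$` anchors the end, and a rule applies when it matches the start of the URL path, compared case-sensitively. The following minimal sketch applies this group's Disallow rules under those assumptions. Percent-encoding normalization is omitted, and the function names are hypothetical.

```python
import re

# Disallow rules of the "*" group, copied from the scan above.
DISALLOW = [
    "/*/login", "/*/cart", "/*/checkout", "/*/my-account", "/*/media/",
    "/*/customers/", "/*/*jsessionid*", "/*/productlists/*", "/*/comparison",
    "/*/quickOrder", "/*/import/csv/saved-cart",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' becomes '.*', a trailing '$' anchors the end, everything else is
    literal. re.match anchors at the start, giving prefix-match semantics.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, rules=DISALLOW) -> bool:
    """True if any Disallow rule matches the start of the path.

    This group has no Allow rules, so longest-match precedence never matters.
    """
    return any(pattern_to_regex(rule).match(path) for rule in rules)
```

For example, `/de_DE/login` and `/en/p/123;jsessionid=ABC` are blocked, while `/login` is not, because `/*/login` requires at least two slashes. Likewise `/en/quickorder` is not blocked, because paths are matched case-sensitively.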

cazoodlebot
petalbot
ahrefsbot
ezbasebot
slurp
linkedinbot
python-urllib
python-requests
libwww-perl
httpunit
nutch
go-http-client
phpcrawl
msnbot
jyxobot
fast-webcrawler
fast enterprise crawler
biglotron
teoma
convera
seekbot
gigabot
gigablast
exabot
ia_archiver
gingercrawler
webmon
httrack
grub.org
usinenouvellecrawler
antibot
netresearchserver
speedy
fluffy
findlink
msrbot
panscient
yacybot
aisearchbot
ips-agent
tagoobot
mj12bot
woriobot
yanga
buzzbot
mlbot
purebot
linguee bot
cyberpatrol
voilabot
citeseerxbot
spbot
twengabot
postrank
turnitinbot
scribdbot
page2rss
sitebot
linkdex
adidxbot
ezooms
dotbot
mail.ru_bot
discobot
heritrix
findthatfile
europarchive.org
nerdbynature.bot
sistrix crawler
ahrefs
fuelbot
crunchbot
indeedbot
mappydata
woobot
zoominfobot
privacyawarebot
multiviewbot
swimgbot
grobbot
eright
apercite
semanticbot
aboundex
domaincrawler
wbsearchbot
summify
ccbot
edisterbot
seznambot
ec2linkfinder
gslfbot
aihitbot
intelium_bot
facebookexternalhit
yeti
retrevopageanalyzer
lb-spider
lssbot
careerbot
wotbox
wocbot
ichiro
lssrocketcrawler
drupact
webcompanycrawler
acoonbot
openindexspider
gnam gnam spider
web-archive-net.com.bot
backlinkcrawler
coccoc
integromedb
content crawler spider
toplistbot
it2media-domain-crawler
ip-web-crawler.com
siteexplorer.info
elisabot
proximic
changedetection
arabot
wesee:search
niki-bot
crystalsemanticsbot
rogerbot
psbot
interfaxscanbot
cc metadata scaper
g00g1e.net
grapeshotcrawler
urlappendbot
brainobot
fr-crawler
binlar
simplecrawler
twitterbot
cxensebot
smtbot
bnf.fr_bot
a6-indexer
admantx
facebot
orangebot
memorybot
advbot
megaindex
semanticscholarbot
ltx71
nerdybot
xovibot
bubing
qwantify
archive.org_bot
tweetmemebot
crawler4j
findxbot
semrushbot
yoozbot
lipperhey
y!j
domain re-animator bot
addthis
screaming frog seo spider
metauri
scrapy
livelap[bb]ot
openhosebot
capsulechecker
collection@infegy.com
istellabot
deusu
betabot
cliqzbot
mojeekbot
netestate ne crawler
safesearch microdata crawler
gluten free crawler
sonic
sysomos
trove
deadlinkchecker
slack-imgproxy
embedly
rankactivelinkbot
iskanie
safednsbot
skypeuripreview
veoozbot
slackbot
redditbot
datagnionbot
adbeat_bot
whatsapp
contxbot
pinterest.com.bot
electricmonk
garlikcrawler
bingpreview
vebidoobot
femtosearchbot
yahoo link preview
metajobbot
domainstatsbot
mindupbot
daum
jugendschutzprogramm-crawler
xenu link sleuth
pcore-http
moatbot
kosmiobot
pingdom
appinsights
phantomjs
gowikibot
piplbot
discordbot
telegrambot
jetslide
newsharecounts
james bot
bark[rr]owler
tineye
socialrankiobot
trendictionbot
ocarinabot
epicbot
primalbot
gnowitnewsbot
leikibot
linkarchiver
yak
paperlibot
digg deeper
dcrawl
snacktory
anderspinkbot
fyrebot
everyonesocialbot
mediatoolkitbot
luminator-robots
extlinksbot
surveybot
ning
okhttp
nuzzel
omgili
pocketparser
yisouspider
um-ln
toutiaospider
muckrack
jamie's spider
ahc
netcraftsurveyagent
laserlikebot
appengine-google
jetty
upflow
thinklab
traackr.com
twurly
mastodon
http_get
dnyzbot
botify
behloolbot
brandverity
check_http
bdcbot
zumbot
ezid
icc-crawler
archivebot
filterdb.iss.netcrawler
blp_bbot
bomborabot
buck
companybook-crawler
genieo
magpie-crawler
meltwaternews
moreover
newspaper
scoutjet
storygizebot
uptimerobot
outclicksbot
seoscanners
hatena
mauibot
alphabot
sbl-bot
ias crawler
adscanner
netvibes
acapbot
baidu-yunguance
bitlybot
blogmurabot
bot.araturka.com
bot-pge.chlooe.com
boxcarbot
btwebclient
contextad bot
digincore bot
disqus
feedly
fetch
fever
flamingo_searchengine
flipboardproxy
g2reader-bot
g2 web services
imrbot
k7mlwcbot
kemvibot
landau-media-spider
linkapediabot
vkshare
siteimprove.com
blexbot
dareboost
zuperlistbot
miniflux
feedspot
diffbot
seokicks
tracemyfile
nimbostratus-bot
zgrab
pr-cy.ru
adstxtcrawler
datafeedwatch
zabbix
tangibleebot
google-xrawler
axios
amazon cloudfront
pulsepoint
cloudflare-alwaysonline
wordupinfosearch
webdatastats
httpurlconnection
seekport crawler
zoombot
velenpublicwebcrawler
moodlebot
jpg-newsbot
outbrain
validator.nu
feedvalidator
blackboard
icbot
bazqux
twingly
rivva
experibot
awesomecrawler
dataprovider.com
grouphigh
theoldreader.com
anyevent
uptimebot.org
nmap scripting engine
clickagy
caliperbot
mbcrawler
online-webceo-bot
b2b bot
addsearchbot
hubspot
chrome-lighthouse
headlesschrome
checkmarknetwork
www.uptime.com
streamline3bot
serpstatbot
mixnodecache
simplescraper
rssingbot
jooblebot
fedoraplanet
friendica
nextcloud
tiny tiny rss
regionstuttgartbot
bytespider
datanyze
trendsmapresolver
tweetedtimes
ntentbot
gwene
simplepie
searchatlas
superfeedr
feedbot
ut-dorkbot
amazonbot
serendeputybot
eyeotabot
officestorebot
neticle crawler
surdotlybot
linkisbot
awariosmartbot
awariorssbot
rytebot
freewebmonitoring sitechecker
aspiegelbot
naver blog rssbot
zenback bot
sentibot
domains project
pandalytics
vkrobot
bidswitchbot
tigerbot
nixstatsbot
atom feed robot
curebot
pagepeeker
vigil
rssbot
startmebot
jobboersebot
seewithkids
ninja bot
cutbot
bublupbot
brandonbot
ridderbot
taboolabot
dubbotbot
finditanswersbot
infoobot
refindbot
blogtrafficd.d+ feed-fetcher
seobilitybot
cincraw
dragonbot
voluumdsp-content-bot
freshrss
bitbot

Rule Path
Disallow /
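A crawler obeys a single group. Under RFC 9309 it compares its product token case-insensitively against the User-agent names and falls back to the `*` group only if none match. `Disallow /` is a prefix of every path, so any crawler listed above is shut out of the whole site. The minimal sketch below assumes exact token matching and uses only a small subset of the listed names. Real crawlers differ, and some match substrings of their full User-Agent string. Names are hypothetical.

```python
# A small subset of the tokens in the blocked group above (not the full list).
BLOCKED_AGENTS = frozenset(
    {"ahrefsbot", "semrushbot", "mj12bot", "ccbot", "bytespider", "amazonbot"}
)

# The blocked group has a single rule: "Disallow: /".
BLOCKED_GROUP_RULES = ["/"]


def select_group(product_token: str) -> str:
    """Return which group a crawler must obey: "blocked" or the "*" fallback.

    Assumes exact, case-insensitive product-token matching (RFC 9309 style).
    """
    return "blocked" if product_token.lower() in BLOCKED_AGENTS else "*"


def blocked_group_disallows(path: str) -> bool:
    """'Disallow: /' is a prefix of every valid path, so it blocks everything."""
    return any(path.startswith(rule) for rule in BLOCKED_GROUP_RULES)
```

`select_group("AhrefsBot")` returns `"blocked"`. `select_group("Googlebot")` returns `"*"`, because Googlebot is not among the listed names, so it is subject only to the path rules of the `*` group.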

Comments

  • For all robots
  • Block access to specific groups of pages
  • Request-rate: 1/10 # maximum rate is one page every 10 seconds
  • Crawl-delay: 10 # 10 seconds between page requests
  • Visit-time: 0400-0800 # only visit between 04:00 and 08:00 UTC
  • Allow search crawlers to discover the sitemap
  • Sitemap: https://shop.hettich.com/sitemap.xml
  • Block the BadBots
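The pacing lines above are listed as comments, not as parsed rules, and Request-rate, Crawl-delay and Visit-time are not part of RFC 9309. A crawler that chose to honor them could work out its schedule as in the sketch below. It takes the stricter of the two rate limits and treats the visit window as UTC, including 04:00 and excluding 08:00. That end-point choice is an assumption, and all names are hypothetical.

```python
from datetime import datetime, time, timezone

# Values from the comment lines above; none of these directives is in RFC 9309.
CRAWL_DELAY_S = 10                        # "Crawl-delay: 10"
REQUEST_RATE = (1, 10)                    # "Request-rate: 1/10": 1 page per 10 s
VISIT_WINDOW = (time(4, 0), time(8, 0))   # "Visit-time: 0400-0800" (UTC)


def min_interval_seconds(crawl_delay: float = CRAWL_DELAY_S,
                         request_rate: tuple[int, int] = REQUEST_RATE) -> float:
    """Seconds to wait between requests: the stricter of the two limits."""
    pages, seconds = request_rate
    return max(crawl_delay, seconds / pages)


def in_visit_window(now: datetime, window=VISIT_WINDOW) -> bool:
    """True if an aware datetime falls in the UTC window [start, end)."""
    t = now.astimezone(timezone.utc).time()
    start, end = window
    return start <= t < end


def max_pages_per_window(window=VISIT_WINDOW) -> int:
    """Upper bound on pages fetched in one window at the minimum interval."""
    start, end = window
    span = (end.hour * 3600 + end.minute * 60) - (start.hour * 3600 + start.minute * 60)
    return int(span // min_interval_seconds())
```

Both rate directives give the same 10-second interval, so a four-hour window allows at most 14400 / 10 = 1440 pages. The scan time of 09:19:07 UTC falls outside that window.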

Warnings

  • 8 invalid lines.