Robots Exclusion Standard data for arb.com

Resource Scan

Scan Details

Site Domain arb.com
Base Domain arb.com
Scan Status Ok
Last Scan 2024-09-17T00:50:29+00:00
Next Scan 2024-10-17T00:50:29+00:00

Last Scan

Scanned 2024-09-17T00:50:29+00:00
URL https://arb.com/robots.txt
Redirect https://www.arb.com/robots.txt
Redirect Domain www.arb.com
Redirect Base arb.com
Domain IPs 172.66.41.7, 172.66.42.249, 2606:4700:3108::ac42:2907, 2606:4700:3108::ac42:2af9
Redirect IPs 172.66.41.7, 172.66.42.249, 2606:4700:3108::ac42:2907, 2606:4700:3108::ac42:2af9
Response IP 172.66.42.249
Found Yes
Hash 57077cd0c76b2b4fd2a89e796cc8c33f29f76db45d89a70ffed86baf906f2c2d
SimHash 6f3797d3d2dc

Groups

*
adsbot
alexa
alexa site audit
alexabot
aspiegelbot
ccbot
dataforseobot
dotbot
gsa-crawler
heritrix
ia_archiver
infotigerbot
magpie-crawler
majestic
majestic12
mauibot
mj12bot
mojeekbot
nutch
petalbot
seekport
semrushbot
semrushbot-ba
xovibot

Rule Path
Disallow /$
Disallow /
Disallow /*
Allow /robots.txt
Allow /robots.txt$
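
The rules in this group block everything except the robots.txt file itself. The following is a minimal sketch of how a crawler might evaluate them, assuming the longest-match precedence described in RFC 9309 (the most specific matching pattern wins, and Allow wins a tie). The helper names to_regex and is_allowed and the BLOCKED_GROUP list are illustrative, not part of the scan data or of any library.

```python
import re

def to_regex(pattern):
    """Compile a robots.txt path pattern: '*' matches any run of
    characters and a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    Assumption: the longest matching pattern wins and Allow wins ties.
    """
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for verb, pattern in rules:
        if pattern and to_regex(pattern).match(path):
            is_allow = verb == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# The five rules listed for this group, as (verb, pattern) pairs.
BLOCKED_GROUP = [
    ("disallow", "/$"),
    ("disallow", "/"),
    ("disallow", "/*"),
    ("allow", "/robots.txt"),
    ("allow", "/robots.txt$"),
]

for path in ("/", "/robots.txt", "/products/"):
    print(path, is_allowed(BLOCKED_GROUP, path))
```

Under these assumptions only /robots.txt is crawlable for the agents in this group, because both Allow patterns are longer than any of the Disallow patterns.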

adidxbot
adsbot-google
adsbot-google-mobile
adsbot-google-mobile-apps
ahrefsbot
ahrefssiteaudit
apis-google
appengine-google
apple-pubsub
applebot
applenewsbot
aspiegelbot
baiduspider
baiduspider-ads
baiduspider-cpro
baiduspider-honeycomb
baiduspider-image
baiduspider-mobile
baiduspider-news
baiduspider-render
baiduspider-video
bingbot
bingpreview
bublupbot
ccbot
cliqzbot
coccoc
coccocbot-image
coccocbot-web
daumoa
dazoobot
deusu
discordbot
duckduckbot
duckduckgo-favicons-bot
duplexweb-google
euripbot
exabot
exploratodo
facebookexternalhit
facebot
feedfetcher-google
feedly
findxbot
google favicon
google-adwords-instant
google-read-aloud
google-speakr
googlebot
googlebot-image
googlebot-mobile
googlebot-news
googlebot-video
haosouspider
ichiro
istellabot
jikespider
librabot
linkedinbot
loader.io
lycos
mail.ru
mail.ru_bot
mediapartners-google
msnbot
msnbot-media
msnbot-newsblogs
msnbot-udiscovery
naver
naverbot
neevabot
onpagebot
orangebot
pinterest
plukkie
qwantify
railgun
rambler
redditbot
rytebot
seznambot
slack-imgproxy
slackbot
slackbot-linkexpanding
slurp
sogou
sogou blog
sogou head spider
sogou inst spider
sogou news spider
sogou orion spider
sogou spider2
sogou web spider
sogou-test-spider
sosospider
sputnikbot
teoma
twitterbot
whatsapp
wotbox
yacybot
yadirectfetcher
yahoo-blogs
yahoo-mmcrawler
yandex
yandexaccessibilitybot
yandexblogs
yandexbot
yandexcalendar
yandexdirect
yandexdirectdyn
yandexfavicons
yandeximageresizer
yandeximages
yandexmarket
yandexmedia
yandexmetrika
yandexmobilebot
yandexnews
yandexpagechecker
yandexscreenshotbot
yandexsearchshop
yandexsitelinks
yandexverticals
yandexvertis
yandexvideo
yandexvideoparser
yandexwebmaster
yeti
yioopbot
yisouspider
yoozbot
youdaobot

Rule Path
Disallow
Allow /$
Allow /
Allow /*
Disallow *.env/*
Disallow *.env/$
Disallow *.env$
Disallow */.env/*
Disallow */.env/$
Disallow */.env$
Disallow */.git/*
Disallow */.git/$
Disallow */.git$
Disallow */.svn/*
Disallow */.svn/$
Disallow */.svn$
Disallow */*.env/*
Disallow */CVS/*
Disallow */CVS/$
Disallow */CVS$
Disallow /*__proto__*
Disallow /*.cfg$
Disallow /*.conf$
Disallow /*.config$
Disallow /*.csv$
Disallow /*.doc$
Disallow /*.docx$
Disallow /*.ppt$
Disallow /*.pptx$
Disallow /*.txt$
Disallow /*.xls$
Disallow /*.xlsx$
Disallow /account-confirmation/*
Disallow /account/*
Disallow /admin.php?*
Disallow /admin.php$
Disallow /admin/*
Disallow /administrator/*
Disallow /lost-password/*
Disallow /proxy.php$
Disallow /robomail/*
Disallow /robomail/$
Disallow /robomail$
Disallow /search/*
Disallow /wp-admin/*
Disallow /wp-content/plugins/dzs-videogallery/bridge.php$
Disallow /wp-content/plugins/dzs-videogallery/bridge.php*
Disallow /wp-content/plugins/vimeography/lib/shared/assets/
Disallow /wp-includes/wlwmanifest.xml*
Disallow /wp-login.php*
Disallow /wp-json/*
Disallow /wp-signup.php*
Disallow /xmlrpc.php*
Disallow /?*
Disallow /*?*
Disallow *?s=
Disallow *?*&s=
Disallow *?*&s=*&x=*&y=*
Allow /?faq-group=*
Allow /wp-admin/admin-ajax.php
Allow /wp-json/*.css*
Allow /wp-json/*.js*
Allow /ads.txt$
Allow /google*.html
Allow /naver*.html
Allow /seznam-wmt-*.txt
Allow /.well-known/*
Allow /humans.txt$
Allow /robots.txt$
Allow /sitemap.xml$
Allow /sitemap_index.xml$
Allow /*-sitemap.xml$
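
Several rules in this group overlap, for example Disallow /wp-admin/* with Allow /wp-admin/admin-ajax.php, or Disallow /*.txt$ with Allow /robots.txt$. The sketch below shows which rule would decide a given path, assuming longest-match precedence as in RFC 9309 with Allow winning ties. Only a subset of the rules above is included, and the names matches, winning_rule and ALLOWED_GROUP_SUBSET are hypothetical. Individual crawlers may resolve conflicts differently.

```python
import re

def matches(pattern, path):
    """True if a robots.txt pattern ('*' wildcard, optional trailing '$')
    matches the start of `path`."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.match(body + (r"\Z" if anchored else ""), path) is not None

def winning_rule(rules, path):
    """Return the (verb, pattern) pair that decides `path`, or None.

    Assumption: the longest pattern wins; on equal length Allow wins.
    """
    candidates = [(verb, pat) for verb, pat in rules if pat and matches(pat, path)]
    if not candidates:
        return None
    return max(candidates, key=lambda rule: (len(rule[1]), rule[0] == "allow"))

# A subset of the rules listed above, chosen to show the overlaps.
ALLOWED_GROUP_SUBSET = [
    ("allow", "/"),
    ("disallow", "*/.git/*"),
    ("disallow", "/*.txt$"),
    ("disallow", "/wp-admin/*"),
    ("disallow", "/*?*"),
    ("allow", "/?faq-group=*"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("allow", "/robots.txt$"),
]

for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php",
             "/robots.txt", "/notes.txt", "/?faq-group=general"):
    print(path, "->", winning_rule(ALLOWED_GROUP_SUBSET, path))
```

Because rule order does not matter under longest-match precedence, the order in which the scan lists the rules does not change the outcome of this sketch.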

Other Records

Field Value
crawl-delay 1
sitemap https://www.arb.com/sitemap_index.xml
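
A crawler could read these two records with Python's standard urllib.robotparser module, roughly as sketched below. The scan does not show which group the crawl-delay line belongs to, so the ROBOTS_FRAGMENT text is a hypothetical reconstruction that places it under googlebot purely for illustration. The paced helper is also an assumption, not part of any library. Note that crawl-delay is not part of RFC 9309 and some crawlers ignore it.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical fragment: only the crawl-delay value and the sitemap URL
# come from the scan; the group placement is assumed for illustration.
ROBOTS_FRAGMENT = """\
User-agent: googlebot
Allow: /
Crawl-delay: 1

Sitemap: https://www.arb.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_FRAGMENT.splitlines())

delay = parser.crawl_delay("googlebot")  # seconds, or None if not set
sitemaps = parser.site_maps()            # list of URLs, or None (Python 3.8+)

def paced(urls, delay_seconds, sleep=time.sleep):
    """Yield URLs one at a time, pausing between consecutive items."""
    for index, url in enumerate(urls):
        if index and delay_seconds:
            sleep(delay_seconds)
        yield url

print(delay, sitemaps)
```

The sleep parameter of paced is injectable so the pacing logic can be exercised without actually waiting.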

Comments

  • robots for https://www.arb.com
  • Remember, this only affects crawling, not indexing.
  • 1. Catch-all / Everything else (*)
  • 2. Explicitly Allowed Spiders, Named to be Apparent
  • 3. Sitemaps
  • 4. Directives not globally supported
  • 1. Prohibited Crawlers Catch All;
  • Grey bots that will obey robots.txt
  • 2. Explicitly Allowed Crawlers
  • 3. Sitemaps
  • 4. Directives not globally supported

Warnings

  • 3 invalid lines.
  • `clean-param` is not a known field.
  • `host` is not a known field.
  • `request-rate` is not a known field.