arb.com
robots.txt

Robots Exclusion Standard data for arb.com

Resource Scan

Scan Details

Site Domain	arb.com
Base Domain	arb.com
Scan Status	Ok
Last Scan	2024-09-17T00:50:29+00:00
Next Scan	2024-10-17T00:50:29+00:00

Last Scan

Scanned	2024-09-17T00:50:29+00:00
URL	https://arb.com/robots.txt
Redirect	https://www.arb.com/robots.txt
Redirect Domain	www.arb.com
Redirect Base	arb.com
Domain IPs	172.66.41.7, 172.66.42.249, 2606:4700:3108::ac42:2907, 2606:4700:3108::ac42:2af9
Redirect IPs	172.66.41.7, 172.66.42.249, 2606:4700:3108::ac42:2907, 2606:4700:3108::ac42:2af9
Response IP	172.66.42.249
Found	Yes
Hash	57077cd0c76b2b4fd2a89e796cc8c33f29f76db45d89a70ffed86baf906f2c2d
SimHash	6f3797d3d2dc

Groups

*
adsbot
alexa
alexa site audit
alexabot
aspiegelbot
ccbot
dataforseobot
dotbot
gsa-crawler
heritrix
ia_archiver
infotigerbot
magpie-crawler
majestic
majestic12
mauibot
mj12bot
mojeekbot
nutch
petalbot
seekport
semrushbot
semrushbot-ba
xovibot

Rule	Path
Disallow	/$
Disallow	/
Disallow	/*
Allow	/robots.txt
Allow	/robots.txt$
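The net effect of the five rules above can be checked with a small matcher. What follows is a minimal sketch, assuming the precedence used by RFC 9309 (the longest matching pattern wins, and Allow wins a tie) and the usual meaning of `*` (any run of characters) and a trailing `$` (end of path). The helper names `pattern_to_regex` and `is_allowed` are hypothetical and are not taken from any crawler's actual code.

```python
import re

# Rules of the first group, copied from the table above.
RULES = [
    ("disallow", "/$"),
    ("disallow", "/"),
    ("disallow", "/*"),
    ("allow", "/robots.txt"),
    ("allow", "/robots.txt$"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + regex + (r"\Z" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under `rules`.

    Assumed precedence: longest matching pattern wins, Allow wins ties.
    A path matched by no rule is allowed.
    """
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


for sample in ("/", "/about", "/robots.txt"):
    print(sample, is_allowed(sample))
```

Under these assumptions only `/robots.txt` is crawlable for the agents in this group; every other path is caught by one of the three Disallow patterns.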

adidxbot
adsbot-google
adsbot-google-mobile
adsbot-google-mobile-apps
ahrefsbot
ahrefssiteaudit
apis-google
appengine-google
apple-pubsub
applebot
applenewsbot
aspiegelbot
baiduspider
baiduspider-ads
baiduspider-cpro
baiduspider-honeycomb
baiduspider-image
baiduspider-mobile
baiduspider-news
baiduspider-render
baiduspider-video
bingbot
bingpreview
bublupbot
ccbot
cliqzbot
coccoc
coccocbot-image
coccocbot-web
daumoa
dazoobot
deusu
discordbot
duckduckbot
duckduckgo-favicons-bot
duplexweb-google
euripbot
exabot
exploratodo
facebookexternalhit
facebot
feedfetcher-google
feedly
findxbot
google favicon
google-adwords-instant
google-read-aloud
google-speakr
googlebot
googlebot-image
googlebot-mobile
googlebot-news
googlebot-video
haosouspider
ichiro
istellabot
jikespider
librabot
linkedinbot
loader.io
lycos
mail.ru
mail.ru_bot
mediapartners-google
msnbot
msnbot-media
msnbot-newsblogs
msnbot-udiscovery
naver
naverbot
neevabot
onpagebot
orangebot
pinterest
plukkie
qwantify
railgun
rambler
redditbot
rytebot
seznambot
slack-imgproxy
slackbot
slackbot-linkexpanding
slurp
sogou
sogou blog
sogou head spider
sogou inst spider
sogou news spider
sogou orion spider
sogou spider2
sogou web spider
sogou-test-spider
sosospider
sputnikbot
teoma
twitterbot
whatsapp
wotbox
yacybot
yadirectfetcher
yahoo-blogs
yahoo-mmcrawler
yandex
yandexaccessibilitybot
yandexblogs
yandexbot
yandexcalendar
yandexdirect
yandexdirectdyn
yandexfavicons
yandeximageresizer
yandeximages
yandexmarket
yandexmedia
yandexmetrika
yandexmobilebot
yandexnews
yandexpagechecker
yandexscreenshotbot
yandexsearchshop
yandexsitelinks
yandexverticals
yandexvertis
yandexvideo
yandexvideoparser
yandexwebmaster
yeti
yioopbot
yisouspider
yoozbot
youdaobot

Rule	Path
Disallow
Allow	/$
Allow	/
Allow	/*
Disallow	*.env/*
Disallow	*.env/$
Disallow	*.env$
Disallow	*/.env/*
Disallow	*/.env/$
Disallow	*/.env$
Disallow	*/.git/*
Disallow	*/.git/$
Disallow	*/.git$
Disallow	*/.svn/*
Disallow	*/.svn/$
Disallow	*/.svn$
Disallow	*/*.env/*
Disallow	*/CVS/*
Disallow	*/CVS/$
Disallow	*/CVS$
Disallow	/*__proto__*
Disallow	/*.cfg$
Disallow	/*.conf$
Disallow	/*.config$
Disallow	/*.csv$
Disallow	/*.doc$
Disallow	/*.docx$
Disallow	/*.ppt$
Disallow	/*.pptx$
Disallow	/*.txt$
Disallow	/*.xls$
Disallow	/*.xlsx$
Disallow	/account-confirmation/*
Disallow	/account/*
Disallow	/admin.php?*
Disallow	/admin.php$
Disallow	/admin/*
Disallow	/administrator/*
Disallow	/lost-password/*
Disallow	/proxy.php$
Disallow	/robomail/*
Disallow	/robomail/$
Disallow	/robomail$
Disallow	/search/*
Disallow	/wp-admin/*
Disallow	/wp-content/plugins/dzs-videogallery/bridge.php$
Disallow	/wp-content/plugins/dzs-videogallery/bridge.php*
Disallow	/wp-content/plugins/vimeography/lib/shared/assets/
Disallow	/wp-includes/wlwmanifest.xml*
Disallow	/wp-login.php*
Disallow	/wp-json/*
Disallow	/wp-signup.php*
Disallow	/xmlrpc.php*
Disallow	/?*
Disallow	/*?*
Disallow	*?s=
Disallow	*?*&s=
Disallow	*?*&s=*&x=*&y=*
Allow	/?faq-group=*
Allow	/wp-admin/admin-ajax.php
Allow	/wp-json/*.css*
Allow	/wp-json/*.js*
Allow	/ads.txt$
Allow	/google*.html
Allow	/naver*.html
Allow	/seznam-wmt-*.txt
Allow	/.well-known/*
Allow	/humans.txt$
Allow	/robots.txt$
Allow	/sitemap.xml$
Allow	/sitemap_index.xml$
Allow	/*-sitemap.xml$
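Which of the two groups above applies to a given crawler depends on its user-agent product token. The sketch below is one possible reading, assuming case-insensitive exact token matching with a fallback to `*`, as described in RFC 9309; real crawlers may match more loosely. The agent sets are abbreviated samples of the full lists above, and the labels `blocked` and `allowed` and the function `matching_groups` are hypothetical names chosen for illustration.

```python
# Abbreviated samples of the two agent lists in this report.
# "blocked" is the group with Disallow /, "allowed" the group with Allow /.
GROUPS = {
    "blocked": {"*", "ccbot", "aspiegelbot", "semrushbot", "mj12bot"},
    "allowed": {"googlebot", "bingbot", "ccbot", "aspiegelbot", "yandexbot"},
}


def matching_groups(product_token):
    """Return the labels of every group that names `product_token`.

    Falls back to the group containing '*' when no group names the token.
    """
    token = product_token.lower()
    named = [label for label, agents in GROUPS.items() if token in agents]
    if named:
        return named
    return [label for label, agents in GROUPS.items() if "*" in agents]


for agent in ("Googlebot", "CCBot", "examplebot"):
    print(agent, matching_groups(agent))
```

Note that `ccbot` and `aspiegelbot` are listed in both groups. RFC 9309 says the rules of all matching groups are combined in that case, so the outcome for those two agents depends on how the individual crawler merges the rules.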

Other Records

Field	Value
crawl-delay	1

Other Records

Field	Value
sitemap	https://www.arb.com/sitemap_index.xml
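The crawl-delay and sitemap records can be read with Python's standard `urllib.robotparser`. This is a minimal sketch: the fragment below is hypothetical, since the scan does not say which group the crawl-delay record belongs to, and placing it under `User-agent: *` is an assumption made only so the parser has a group to attach it to. `examplebot` is likewise a made-up agent name. `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical fragment: only the crawl-delay value and the sitemap URL
# come from the scan; the grouping under "User-agent: *" is assumed.
FRAGMENT = """\
User-agent: *
Crawl-delay: 1
Disallow:

Sitemap: https://www.arb.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(FRAGMENT.splitlines())

delay = parser.crawl_delay("examplebot")  # seconds to wait between requests
sitemaps = parser.site_maps()             # list of sitemap URLs, or None

print(delay)
print(sitemaps)
```

A polite crawler would sleep for `delay` seconds between requests and use the sitemap URL as its starting point for discovery.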

Comments

robots for https://www.arb.com
Remember, this only affects crawling, not indexing.
1. Catch-all / Everything else (*)
2. Explicitly Allowed Spiders, Named to be Apparent
3. Sitemaps
4. Directives not globally supported
1. Prohibited Crawlers Catch All;
Grey bots that will obey robots.txt
2. Explicitly Allowed Crawlers
3. Sitemaps
4. Directives not globally supported

Warnings

3 invalid lines.
`clean-param` is not a known field.
`host` is not a known field.
`request-rate` is not a known field.
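Warnings like these can be reproduced by comparing each line's field name against a list of recognized fields. The sketch below assumes the recognized set is the five fields this report parses without complaint (`user-agent`, `allow`, `disallow`, `sitemap`, `crawl-delay`); the scanner's real list is not shown. The scan does not include the text of the three invalid lines, so the values in `SAMPLE` are hypothetical placeholders; only the field names come from the warnings.

```python
# Assumed set of recognized fields; the scanner's actual list is not shown.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}

# Hypothetical lines: the field names come from the warnings above,
# the values are invented placeholders.
SAMPLE = """\
User-agent: *
Disallow: /
Host: www.arb.com
Clean-param: example /
Request-rate: 1/5
"""


def unknown_fields(text):
    """Return the distinct unrecognized field names, in order of appearance."""
    found = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS and field not in found:
            found.append(field)
    return found


for name in sorted(unknown_fields(SAMPLE)):
    print(f"`{name}` is not a known field.")
```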
