harrysarmysurplus.net
robots.txt

Robots Exclusion Standard data for harrysarmysurplus.net

Resource Scan

Scan Details

Site Domain	harrysarmysurplus.net
Base Domain	harrysarmysurplus.net
Scan Status	Failed
Failure Stage	Fetching resource.
Failure Reason	Server returned a client error.
Last Scan	2024-10-02T01:51:15+00:00
Next Scan	2024-12-31T01:51:15+00:00
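
The Last Scan and Next Scan timestamps are ISO 8601 values with an explicit UTC offset. A minimal sketch for measuring the gap between them with Python's standard library follows; reading that gap as a fixed rescan interval is an assumption, since the report does not state its scheduling policy.

```python
from datetime import datetime

# Timestamps copied from the Scan Details table.
last_scan = datetime.fromisoformat("2024-10-02T01:51:15+00:00")
next_scan = datetime.fromisoformat("2024-12-31T01:51:15+00:00")

# Gap between the most recent scan and the scheduled one.
interval = next_scan - last_scan
print(interval.days)  # 90
```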

Last Successful Scan

Scanned	2023-08-15T21:07:29+00:00
URL	https://harrysarmysurplus.net/robots.txt
Domain IPs	104.19.177.121, 104.19.178.121
Response IP	104.19.177.121
Found	Yes
Hash	d8c3e492e1a8351dee5ecd553df1f84467afa72a1a6561c6a1aa29b9b5dcf074
SimHash	8a5f75c0eb20
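
The Hash field is 64 hexadecimal characters long, which matches the length of a SHA-256 digest. The report does not name the algorithm, so the sketch below treats it as SHA-256 over the raw response body purely as an assumption; `robots_digest` and the sample body are hypothetical and the real file contents are not reproduced here.

```python
import hashlib


def robots_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt response body."""
    return hashlib.sha256(body).hexdigest()


# Hypothetical body, used only to show the shape of the result.
sample = b"User-agent: *\nDisallow: /checkout.asp\n"
digest = robots_digest(sample)

# Hash value copied from the Last Successful Scan table.
reported = "d8c3e492e1a8351dee5ecd553df1f84467afa72a1a6561c6a1aa29b9b5dcf074"
print(len(digest) == len(reported))  # True: both are 64 hex characters
```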

Groups

*

Rule	Path
Disallow	/checkout.asp
Disallow	/add_cart.asp
Disallow	/view_cart.asp
Disallow	/error.asp
Disallow	/shipquote.asp
Disallow	/rssfeed.asp
Disallow	/mobile/
Disallow	/AccountSettings.asp
Disallow	/checkout.asp
Disallow	/crm.asp
Disallow	/EmailaFriend.asp
Disallow	/Email_Me_When_Back_In_Stock.asp
Disallow	/giftregistry_home.asp
Disallow	/login.asp
Disallow	/myaccount.asp
Disallow	/reviewhelpful.asp
Disallow	/SearchResults.asp
Disallow	/ShoppingCart.asp
Disallow	/ticket_new.asp
Disallow	/view_cart.asp
Disallow	/stats/
Disallow	/3droi/
Disallow	/size-charts/
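
As a rough illustration of how a crawler would apply the `*` group, the sketch below rewrites a few of the rules above in robots.txt syntax and checks sample URLs with Python's `urllib.robotparser`. The exact layout of the original file and the `ExampleBot` agent name are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# A subset of the "*" group, rewritten in robots.txt syntax (assumed layout).
lines = [
    "User-agent: *",
    "Disallow: /checkout.asp",
    "Disallow: /view_cart.asp",
    "Disallow: /mobile/",
    "Disallow: /stats/",
]

parser = RobotFileParser()
parser.parse(lines)

base = "https://harrysarmysurplus.net"
# "ExampleBot" is a hypothetical crawler with no group of its own.
print(parser.can_fetch("ExampleBot", base + "/checkout.asp"))      # False
print(parser.can_fetch("ExampleBot", base + "/mobile/index.asp"))  # False
print(parser.can_fetch("ExampleBot", base + "/"))                  # True
```

Rules are matched as path prefixes, so `/mobile/` covers everything beneath that directory.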

googlebot

Rule	Path
Disallow
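
The empty Path cell is read here as a `Disallow:` line with no value, which conventionally blocks nothing. Because a crawler follows the group that names it rather than the `*` group, such a group would lift the generic restrictions for that agent. The sketch below shows the effect with `urllib.robotparser`; the two-group file is a reduced, assumed reconstruction and `ExampleBot` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reduced reconstruction: one generic rule plus the googlebot group.
lines = [
    "User-agent: *",
    "Disallow: /checkout.asp",
    "",
    "User-agent: googlebot",
    "Disallow:",
]

parser = RobotFileParser()
parser.parse(lines)

url = "https://harrysarmysurplus.net/checkout.asp"
print(parser.can_fetch("Googlebot", url))   # True: its own group blocks nothing
print(parser.can_fetch("ExampleBot", url))  # False: falls back to the "*" group
```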

googlebot-image

Rule	Path
Disallow

ccbot

Rule	Path
Disallow	/

ahrefsbot

Rule	Path
Disallow	/

mediapartners-google

Rule	Path
Disallow	/

proximic

Rule	Path
Disallow	/

semrushbot

Rule	Path
Disallow	/

petalbot

Rule	Path
Disallow	/

yandex

Rule	Path
Disallow	/

yeti

Rule	Path
Disallow	/

nextgensearchbot

Rule	Path
Disallow	/

baiduspider

Rule	Path
Disallow	/

picscout

Rule	Path
Disallow	/

mj12bot

Rule	Path
Disallow	/

blexbot crawler

Rule	Path
Disallow	/

tineye

Rule	Path
Disallow	/

sogou spider

Rule	Path
Disallow	/

exabot

Rule	Path
Disallow	/

nutch

Rule	Path
Disallow	/

mj12bot

Rule	Path
Disallow	/

python-urllib

Rule	Path
Disallow	/

dotbot

Rule	Path
Disallow	/

seokicks-robot

Rule	Path
Disallow	/

blexbot

Rule	Path
Disallow	/

sistrix crawler

Rule	Path
Disallow	/

uptimerobot/2.0

Rule	Path
Disallow	/

ezooms robot

Rule	Path
Disallow	/

perl lwp

Rule	Path
Disallow	/

netestate ne crawler (+http://www.website-datenbank.de/)

Rule	Path
Disallow	/

wiseguys robot

Rule	Path
Disallow	/

turnitin robot

Rule	Path
Disallow	/

heritrix

Rule	Path
Disallow	/

pimonster

Rule	Path
Disallow	/

pimonster

Rule	Path
Disallow	/

pi-monster

Rule	Path
Disallow	/

eccp/1.0 (search@eniro.com)

Rule	Path
Disallow	/

yandex

Rule	Path
Disallow	/

baiduspider
baiduspider-video
baiduspider-image

Rule	Path
Disallow	/
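
This group lists three user agents above a single rule, which corresponds to consecutive `User-agent` lines sharing one rule set. A minimal sketch of that behavior with `urllib.robotparser` follows; the file layout is an assumed reconstruction and `ExampleBot` is a hypothetical agent that appears in no group.

```python
from urllib.robotparser import RobotFileParser

# Consecutive User-agent lines share the rules that follow them.
lines = [
    "User-agent: baiduspider",
    "User-agent: baiduspider-video",
    "User-agent: baiduspider-image",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(lines)

url = "https://harrysarmysurplus.net/"
agents = ("baiduspider", "baiduspider-video", "baiduspider-image")
blocked = [parser.can_fetch(agent, url) for agent in agents]
print(blocked)                              # [False, False, False]
print(parser.can_fetch("ExampleBot", url))  # True: no group applies in this fragment
```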

sogou spider

Rule	Path
Disallow	/

psbot

Rule	Path
Disallow	/

youdaobot

Rule	Path
Disallow	/

blexbot

Rule	Path
Disallow	/

naverbot
yeti

Rule	Path
Disallow	/

psbot

Rule	Path
Disallow	/

zbot

Rule	Path
Disallow	/

vagabondo

Rule	Path
Disallow	/

linkwalker

Rule	Path
Disallow	/

xenu link sleuth

Rule

Path

Disallow

simplepie

Rule

Path

Disallow

wget

Rule

Path

Disallow

pixray-seeker

Rule

Path

Disallow

boardreader

Rule

Path

Disallow

unknown bot

Rule

Path

Disallow

yandexdirect

Rule

Path

Disallow

yandexdirectdyn

Rule

Path

Disallow

yandexmedia

Rule

Path

Disallow

yandeximages

Rule

Path

Disallow

yadirectfetcher

Rule

Path

Disallow

yandexblogs

Rule

Path

Disallow

yandexnews

Rule

Path

Disallow

yandexpagechecker

Rule

Path

Disallow

yandexmetrika

Rule

Path

Disallow

yandexcalendar

Rule

Path

Disallow

ia_archiver

Rule

Path

Disallow

Other Records

Field	Value
sitemap	https://www.harrysarmysurplus.net/sitemap.xml
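
The sitemap record sits outside any user-agent group. As a small sketch, `urllib.robotparser` (Python 3.8 or later) exposes such records through `site_maps()`; the single-line input below is an assumed reconstruction of the corresponding robots.txt line.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt line behind the sitemap record.
lines = ["Sitemap: https://www.harrysarmysurplus.net/sitemap.xml"]

parser = RobotFileParser()
parser.parse(lines)

# site_maps() returns a list of sitemap URLs, or None when there are none.
sitemaps = parser.site_maps()
print(sitemaps)  # ['https://www.harrysarmysurplus.net/sitemap.xml']
```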

Comments

Disallow all crawlers access to certain pages.
Block Yandex from crawling site
Block Yeti
Block NextGenSearchBot
Block Baiduspider from crawling site
Block PicScout Crawler from crawling site
Block MJ12bot from crawling site
Block 008 from crawling site
Block BLEXBot Crawler from crawling site
Block TinEye from crawling site
Block Sogou Spider from crawling site
Block Exabot from crawling site
Block Nutch from crawling site
Block MJ12bot as it is just noise
Block Python-urllib
Block dotbot
Block SEOkicks
Block BlexBot
Block SISTRIX
Block Uptime robot
Block Ezooms Robot
Block Perl LWP
Block netEstate NE Crawler (+http://www.website-datenbank.de/)
Block WiseGuys Robot
Block Turnitin Robot
Block Heritrix
Block pricepi
Block Eniro
Block YandexBot
Block Baidu
Block SoGou
Block Psbot
Block Youdao
BLEXBot
Block NaverBot
Block Psbot
Block ZBot
Block Vagabondo
Block LinkWalker
Block Xenu Link Sleuth
Block SimplePie
Block Wget
Block Pixray-Seeker
Block BoardReader
Block Unknown Bot
Block
Block YandexDirectDyn
Block YandexMedia
Block YandexImages
Block YaDirectFetcher
Block YandexBlogs
Block YandexNews
Block YandexPagechecker
Block YandexMetrika
Block YandexCalendar
Block Archive Org

Warnings

2 invalid lines.
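
The report does not say which lines were invalid or how validity was decided. One plausible check, sketched below as an assumption, flags any non-comment line that lacks a `field: value` shape; `invalid_lines` and the sample text are hypothetical and do not reproduce the scanned file.

```python
def invalid_lines(text: str) -> list[str]:
    """Return lines that are neither blank, comments, nor 'field: value'."""
    bad = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # blank or comment-only line
        field, sep, _ = line.partition(":")
        if not sep or not field.strip():
            bad.append(raw)
    return bad


# Hypothetical sample with two malformed lines.
sample = (
    "User-agent: *\n"
    "# a comment\n"
    "Disallow /stats/\n"
    "BLEXBot\n"
    "Disallow: /3droi/\n"
)
print(invalid_lines(sample))  # ['Disallow /stats/', 'BLEXBot']
```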
