ewhois.org
robots.txt

Robots Exclusion Standard data for ewhois.org

Resource Scan

Scan Details

Site Domain ewhois.org
Base Domain ewhois.org
Scan Status Ok
Last Scan 2024-10-20T00:03:16+00:00
Next Scan 2024-11-19T00:03:16+00:00

Last Scan

Scanned 2024-10-20T00:03:16+00:00
URL https://ewhois.org/robots.txt
Domain IPs 104.21.90.138, 172.67.156.196, 2606:4700:3031::ac43:9cc4, 2606:4700:3037::6815:5a8a
Response IP 172.67.156.196
Found Yes
Hash be4c06d36e89cb7fcbb2b8c1cb738005d6b19de7492a2f20285efd11bff7f432
SimHash 7f9453cbcdf4

Groups

ahrefsbot
ahrefssiteaudit
adbeat_bot
alexibot
appengine
aqua_products
asterias
b2w/0.1
backdoorbot/1.0
becomebot
blekkobot
blexbot
blowfish/1.0
bookmark search tool
botalot
builtbottough
bullseye/1.0
bunnyslippers
ccbot
cheesebot
cherrypicker
cherrypickerelite/1.0
cherrypickerse/1.0
chroot
copernic
copyrightcheck
cosmos
crescent
crescent internet toolpak http ole control v.1.0
dittospyder
dotbot
dumbot
emailcollector
emailsiphon
emailwolf
enterprise_search
enterprise_search/1.0
erocrawler
es
exabot
extractorpro
fairad client
flaming attackbot
foobot
gaisbot
getright/4.2
gigabot
grub
grub-client
go-http-client
harvest/1.5
hatena antenna
hloader
http://www.searchengineworld.com bot
http://www.webmasterworld.com bot
httplib
humanlinks
infonavirobot
iron33/1.0.2
jamesbot
jennybot
jetbot
jetbot/1.0
jorgee
kenjin spider
keyword density/0.9
lexibot
libweb/clshttp
linkextractorpro
linkpadbot
linkscan/8.1a unix
linkwalker
lnspiderguy
looksmart
lwp-trivial
lwp-trivial/1.34
mata hari
megalodon
microsoft url control
microsoft url control - 5.01.4511
microsoft url control - 6.00.8169
miixpc
miixpc/4.2
mister pix
moget
moget/2.1
naver
nerdybot
netants
netmechanic
nicerspro
nutch
openbot
openfind
openfind data gathere
oracle ultra search
perman
propowerbot/2.14
prowebwalker
psbot
python-urllib
queryn metasearch
radiation retriever 1.1
repomonkey
repomonkey bait & tackle/v1.01
rma
rogerbot
scooter
searchpreview
semrushbot
semrushbot-sa
seokicks-robot
sootle
spankbot
spanner
spbot
stanford
stanford comp sci
stanford compclub
stanford compsciclub
stanford spiderboys
surveybot
surveybot_ignoreip
suzuran
szukacz/1.4
telesoft
teoma
the intraformant
thenomad
tocrawl/urldispatcher
true_robot
true_robot/1.0
turingos
typhoeus
url control
url_spider_pro
urly warning
vci
vci webviewer vci webviewer win32
web image collector
webauto
webbandit
webbandit/3.50
webenhancer
webmasterworld extractor
webmasterworldforumbot
websauger
website quester
webster pro
webvac
www-collector-e
zeus
zeus 32297 webster pro v2.9 win32
zeus link scout

Rule Path
Disallow /

ubicrawler
doc
zao
gsa-crawler

Rule Path
Disallow /

sitecheck.internetseer.com
zealbot
msiecrawler
sitesnagger
webstripper
webcopier
fetch
offline explorer
teleport
teleportpro
webzip
webzip/4.0
linko
httrack
microsoft.url.control
xenu
xenu's
xenu's link sleuth 1.1c
larbin
libwww
zyborg
download ninja

Rule Path
Disallow /

wget
wget/1.11.4
wget/1.13.4
wget/1.12
wget/1.5.3
wget/1.6

Rule Path
Disallow /

webreaper
cncdialer
maxthon
mj12bot
slurp
screaming frog seo spider

Rule Path
Disallow /
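As a rough illustration of how a well-behaved crawler would read the groups above, the sketch below feeds a short excerpt to Python's standard `urllib.robotparser`. The excerpt is reconstructed from the scan listing rather than copied from the live file, it includes only a few of the listed user agents, and `Googlebot` is used purely as a hypothetical example of an agent that does not appear in any group. The helper name `is_allowed` is likewise an assumption made for this example.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated excerpt, reconstructed from the groups in the scan listing.
# The live robots.txt may differ in case, order, and layout.
ROBOTS_TXT = """\
User-agent: ahrefsbot
User-agent: semrushbot
Disallow: /

User-agent: wget
Disallow: /

User-agent: mj12bot
User-agent: screaming frog seo spider
Disallow: /
"""


def is_allowed(user_agent, url="https://ewhois.org/"):
    """Return True if the excerpt above lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "Googlebot" is a hypothetical agent not named in any listed group.
    for agent in ("Wget/1.21", "AhrefsBot/7.0", "Googlebot"):
        print(agent, is_allowed(agent))
```

Because every listed group carries a single `Disallow /` rule, any agent matched by a group is blocked from the whole site. An agent matched by no group falls through as allowed, at least under the rules shown in this scan.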

Comments

  • robots.txt
  • Specific bots settings
  • Crawlers that are kind enough to obey, but which we'd rather not have unless they're feeding search engines.
  • Some bots are known to be trouble, particularly those designed to copy entire sites.
  • wget in its recursive mode is a frequent problem.
  • There is a wait option you can use to set the delay between hits for instance.
  • A capture bot, downloads gazillions of pages with no public benefit
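The comment about a wait option appears to refer to wget's `--wait` setting, which pauses between successive requests. The same idea can be sketched in Python as a small throttling helper. This is only an illustration of the technique, not anything taken from the scanned site. The names `fetch_politely`, `fetch`, and `delay_seconds` are assumptions made for the example, and the caller supplies the actual download function.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing delay_seconds between hits.

    fetch is a caller-supplied function (hypothetical here), and sleep is
    injectable so the pacing can be checked without actually waiting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # Stand-in fetch function that only echoes the URL it was given.
    pages = fetch_politely(
        ["https://example.com/a", "https://example.com/b"],
        fetch=lambda url: "fetched " + url,
        delay_seconds=0.5,
    )
    print(pages)
```

A fixed delay is the simplest form of rate limiting. It does not replace honoring the `Disallow` rules themselves.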