dcc-servers.net
robots.txt

Robots Exclusion Standard data for dcc-servers.net

Resource Scan

Scan Details

Site Domain dcc-servers.net
Base Domain dcc-servers.net
Scan Status Ok
Last Scan 2024-09-09T03:27:53+00:00
Next Scan 2024-10-09T03:27:53+00:00

Last Scan

Scanned 2024-09-09T03:27:53+00:00
URL https://dcc-servers.net/robots.txt
Redirect https://www.dcc-servers.net/robots.txt
Redirect Domain www.dcc-servers.net
Redirect Base dcc-servers.net
Domain IPs 2001:470:1f05:10ed::49, 72.18.213.49
Redirect IPs 2001:470:1f05:10ed::49, 72.18.213.49
Response IP 72.18.213.49
Found Yes
Hash 8e879278093ecedb52b6d2b633a58c77ab0986f72098f4272188399c73bf48a8
SimHash ba9149108c72
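
A minimal sketch of how the Found and Hash fields above might be reproduced, assuming the Hash is the SHA-256 digest of the fetched file body (the scanner's exact method is not documented here). urllib follows the redirect to https://www.dcc-servers.net/robots.txt automatically:

    import hashlib
    import urllib.request

    # Fetch the robots.txt; the request is redirected to
    # https://www.dcc-servers.net/robots.txt as recorded above.
    with urllib.request.urlopen("https://dcc-servers.net/robots.txt") as resp:
        body = resp.read()
        final_url = resp.geturl()  # URL after any redirects

    # Assumed to correspond to the report's Hash field (SHA-256 of the
    # response body); this is an interpretation, not confirmed by the scan.
    print(final_url)
    print(hashlib.sha256(body).hexdigest())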

Groups

*

Rule Path
Disallow /icons

*

Rule Path
Disallow /.well-known

*

Rule Path
Disallow /dcc/private

*

Rule Path
Disallow /dcc-demo-cgi-bin

baiduspider

Rule Path
Disallow /

googlebot-image

Rule Path
Disallow /

*

Rule Path
Disallow /badbottrap

purebot

Rule Path
Disallow /

ezooms

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

surveybot

Rule Path
Disallow /

domaintools

Rule Path
Disallow /

sitebot

Rule Path
Disallow /

dotnetdotcom

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

solomonobot

Rule Path
Disallow /

zmeu

Rule Path
Disallow /

morfeus

Rule Path
Disallow /

snoopy

Rule Path
Disallow /

wbsearchbot

Rule Path
Disallow /

exabot

Rule Path
Disallow /

findlinks

Rule Path
Disallow /

aihitbot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

dinoping

Rule Path
Disallow /

panopta.com

Rule Path
Disallow /

searchmetrics

Rule Path
Disallow /

lipperhey

Rule Path
Disallow /

dataprovider.com

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

sosospider

Rule Path
Disallow /

discoverybot

Rule Path
Disallow /

yandex

Rule Path
Disallow /

www.integromedb.org/crawler

Rule Path
Disallow /

yamanalab-robot

Rule Path
Disallow /

ip-web-crawler.com

Rule Path
Disallow /

aboundex

Rule Path
Disallow /

aboundexbot

Rule Path
Disallow /

yunyun

Rule Path
Disallow /

masscan

Rule Path
Disallow /

escan

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

typhoeus

Rule Path
Disallow /
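
How a crawler would interpret the groups above can be checked with Python's standard urllib.robotparser. This is only an illustrative sketch ("ExampleBot" is a placeholder name, not a real crawler), and note that the standard-library parser keeps only the first "*" group (at least in current CPython), so the later wildcard rules (/.well-known, /dcc/private, /dcc-demo-cgi-bin, /badbottrap) may not be reflected the way a parser that merges wildcard groups would report them.

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://www.dcc-servers.net/robots.txt")
    rp.read()

    # Named groups: these crawlers are disallowed everywhere.
    print(rp.can_fetch("MJ12bot", "https://www.dcc-servers.net/"))      # expected: False
    print(rp.can_fetch("SemrushBot", "https://www.dcc-servers.net/"))   # expected: False

    # First wildcard group: /icons is disallowed for everyone else.
    print(rp.can_fetch("ExampleBot",
                       "https://www.dcc-servers.net/icons/a.png"))      # expected: False
    print(rp.can_fetch("ExampleBot", "https://www.dcc-servers.net/"))   # expected: True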

Comments

  • /icons/ only causes noise in the error log
  • spiders don't use authentication
  • no need for Chinese or Russian searches
  • no need to index images
  • firewall anything that goes here
  • the following should also be in badbots
  • The editorial comments for each of the following entries are only opinions provoked by the behavior of the associated 'spiders' as seen in local HTTP server logs.
  • stupid bot
  • seems to only search for non-existent pages.
  • See ezooms.bot@gmail.com and wowrack.com
  • http://www.majestic12.co.uk/bot.php?+ follows many bogus and corrupt links and so generates a lot of error log noise.
  • It does us no good and is a waste of our bandwidth.
  • There is no need to waste bandwidth on an outfit trying to monetize our web pages. $50 for data scraped from the web is too much.
  • never bothers fetching robots.txt
  • See http://www.domaintools.com
  • too many mangled links and implausible home page
  • cutesy story is years stale and no longer excuses bad crawling
  • cutesy story is years stale and no longer excuses bad crawling
  • At best another broken spider that thinks all URLs are at the top level.
  • At worst, a malware scanner.
  • Never fetches robots.txt, contrary to http://www.warebay.com/bot.html.
  • See SolomonoBot/1.02 (http://www.solomono.ru)
  • evil
  • evil
  • evil
  • Yet another claimed search engine that generates bad links from plain text.
  • It fetches and then ignores robots.txt
  • 188.138.48.235 http://www.warebay.com/bot.html
  • monetizers of other people's bandwidth.
  • monetizers of other people's bandwidth.
  • monetizers of other people's bandwidth.
  • monetizer of other people's bandwidth.
  • It ignores robots.txt.
  • Yet another monetizer of other people's bandwidth that hits selected pages every few seconds from about a dozen HTTP clients around the world without let, leave, hindrance, or notice.
  • There is no apparent way to ask them to stop. One DinoPing agent at support@edis.at responded to a request to stop with "just use iptables" on 2012/08/13.
  • They're blind to the irony that one of their targets is <A HREF="that-which-we-dont.html">http://www.rhyolite.com/anti-spam/that-which-we-dont.html</A>
  • unprovoked, unasked for "monitoring" and "checking"
  • "The World's Experts in Search Analytics" is yet another SEO outfit that hammers HTTP servers without permission and without benefit for at least some HTTP server operators.
  • claimed SEO; ignores robots.txt
  • claimed SEO
  • SEO
  • http://www.semrush.com/bot.html suggests its results are for users:
  • "Well, the real question is why do you not want the bot visiting your page? Most bots are both harmless and quite beneficial. Bots like Googlebot discover sites by following links from page to page. This bot is crawling your page to help parse the content, so that the relevant information contained within your site is easily indexed and made more readily available to users searching for the content you provide."
  • ignores robots.txt
  • no apparent reason to spend bandwidth or attention on its bad URLs in logs
  • no need for Russian searches and they fetch but ignore robots.txt
  • no "biomedical, biochemical, drug, health and disease related data" here.
  • 192.31.21.179 switched from www.integromedb.org/Crawler to "Java/1.6.0_20" and "-" after integromedb was added to robots.txt
  • does not handle protocol relative links. It does not fetch robots.txt.
  • does not handle protocol relative links.
  • does not know the difference between a hyperlink <A HREF="..."></A> and anchors that are not links such as <A NAME="..."></A>
  • ambulance chasers with a stupid spider that hits the bad spider trap.
  • ignores rel="nofollow" in links
  • parses ...href='asdf' onclick='... (single quote (') instead of double (")) as if " onclick=..." were part of the URL.
  • It fetches robots.txt and then ignores it
  • fetches robots.txt for only some domains.
  • It searches for non-existent but often abused URLs such as .../contact.cgi
  • waste of bandwidth
  • waste of bandwidth
  • no need to "[assist] internet marketers", especially given the bad URLs
  • no need to allow site sucking or other tests from Kaspersky Lab
  • the preceding should also be in the badbots ACL

Warnings

  • 4 invalid lines.
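
The invalid lines are not itemized in the report. One plausible way a scanner might count them, treating anything that is not blank, not a comment, and not a "field: value" pair with a recognized field name as invalid, is sketched below; the field list and the counting rule are assumptions, not the scanner's documented behavior.

    # Count lines that are not blank, not comments, and not "field: value"
    # pairs with a recognized field name.  KNOWN_FIELDS is an assumption.
    KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

    def count_invalid(robots_txt: str) -> int:
        invalid = 0
        for raw in robots_txt.splitlines():
            line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
            if not line:
                continue
            field, sep, _value = line.partition(":")
            if not sep or field.strip().lower() not in KNOWN_FIELDS:
                invalid += 1
        return invalid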