dcc-servers.net
robots.txt

Robots Exclusion Standard data for dcc-servers.net

Resource Scan

Scan Details

Site Domain dcc-servers.net
Base Domain dcc-servers.net
Scan Status Ok
Last Scan 2024-09-09T03:27:53+00:00
Next Scan 2024-10-09T03:27:53+00:00

Last Scan

Scanned 2024-09-09T03:27:53+00:00
URL https://dcc-servers.net/robots.txt
Redirect https://www.dcc-servers.net/robots.txt
Redirect Domain www.dcc-servers.net
Redirect Base dcc-servers.net
Domain IPs 2001:470:1f05:10ed::49, 72.18.213.49
Redirect IPs 2001:470:1f05:10ed::49, 72.18.213.49
Response IP 72.18.213.49
Found Yes
Hash 8e879278093ecedb52b6d2b633a58c77ab0986f72098f4272188399c73bf48a8
SimHash ba9149108c72
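
A minimal sketch of how the Found and Hash fields above might be reproduced, assuming the Hash is the SHA-256 digest of the fetched file body (the scanner's exact method is not documented here). urllib follows the redirect to https://www.dcc-servers.net/robots.txt automatically:

    import hashlib
    import urllib.request

    # Fetch the robots.txt; the request is redirected to
    # https://www.dcc-servers.net/robots.txt as recorded above.
    with urllib.request.urlopen("https://dcc-servers.net/robots.txt") as resp:
        body = resp.read()
        final_url = resp.geturl()  # URL after any redirects

    # Assumed to correspond to the report's Hash field (SHA-256 of the
    # response body); this is an interpretation, not confirmed by the scan.
    print(final_url)
    print(hashlib.sha256(body).hexdigest())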

Groups

*

Rule Path
Disallow /icons

*

Rule Path
Disallow /.well-known

*

Rule Path
Disallow /dcc/private

*

Rule Path
Disallow /dcc-demo-cgi-bin

baiduspider

Rule Path
Disallow /

googlebot-image

Rule Path
Disallow /

*

Rule Path
Disallow /badbottrap

purebot

Rule Path
Disallow /

ezooms

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

surveybot

Rule Path
Disallow /

domaintools

Rule Path
Disallow /

sitebot

Rule Path
Disallow /

dotnetdotcom

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

solomonobot

Rule Path
Disallow /

zmeu

Rule Path
Disallow /

morfeus

Rule Path
Disallow /

snoopy

Rule Path
Disallow /

wbsearchbot

Rule Path
Disallow /

exabot

Rule Path
Disallow /

findlinks

Rule Path
Disallow /

aihitbot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

dinoping

Rule Path
Disallow /

panopta.com

Rule Path
Disallow /

searchmetrics

Rule Path
Disallow /

lipperhey

Rule Path
Disallow /

dataprovider.com

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

sosospider

Rule Path
Disallow /

discoverybot

Rule Path
Disallow /

yandex

Rule Path
Disallow /

www.integromedb.org/crawler

Rule Path
Disallow /

yamanalab-robot

Rule Path
Disallow /

ip-web-crawler.com

Rule Path
Disallow /

aboundex

Rule Path
Disallow /

aboundexbot

Rule Path
Disallow /

yunyun

Rule Path
Disallow /

masscan

Rule Path
Disallow /

escan

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

typhoeus

Rule Path
Disallow /
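
How a crawler would interpret the groups above can be checked with Python's standard urllib.robotparser. This is only an illustrative sketch ("ExampleBot" is a placeholder name, not a real crawler), and note that the standard-library parser keeps only the first "*" group (at least in current CPython), so the later wildcard rules (/.well-known, /dcc/private, /dcc-demo-cgi-bin, /badbottrap) may not be reflected the way a parser that merges wildcard groups would report them.

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://www.dcc-servers.net/robots.txt")
    rp.read()

    # Named groups: these crawlers are disallowed everywhere.
    print(rp.can_fetch("MJ12bot", "https://www.dcc-servers.net/"))      # expected: False
    print(rp.can_fetch("SemrushBot", "https://www.dcc-servers.net/"))   # expected: False

    # First wildcard group: /icons is disallowed for everyone else.
    print(rp.can_fetch("ExampleBot",
                       "https://www.dcc-servers.net/icons/a.png"))      # expected: False
    print(rp.can_fetch("ExampleBot", "https://www.dcc-servers.net/"))   # expected: True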

Comments

  • /icons/ only causes noise in the error log
  • spiders don't use authentication
  • no need for Chinese or Russian searches
  • no need to index images
  • firewall anything that goes here
  • the following should also be in badbots
  • The editorial comments for each of the following entries are only opinions provoked by the behavior of the associated 'spiders' as seen in local HTTP server logs.
  • stupid bot
  • seems to only search for non-existent pages.
  • See ezooms.bot@gmail.com and wowrack.com
  • http://www.majestic12.co.uk/bot.php?+ follows many bogus and corrupt links and so generates a lot of error log noise.
  • It does us no good and is a waste of our bandwidth.
  • There is no need to waste bandwidth on an outfit trying to monetize our web pages. $50 for data scraped from the web is too much.
  • never bothers fetching robots.txt
  • See http://www.domaintools.com
  • too many mangled links and implausible home page
  • cutesy story is years stale and no longer excuses bad crawling
  • cutesy story is years stale and no longer excuses bad crawling
  • At best another broken spider that thinks all URLs are at the top level.
  • At worst, a malware scanner.
  • Never fetches robots.txt, contrary to http://www.warebay.com/bot.html.
  • See SolomonoBot/1.02 (http://www.solomono.ru)
  • evil
  • evil
  • evil
  • Yet another claimed search engine that generates bad links from plain text.
  • It fetches and then ignores robots.txt
  • 188.138.48.235 http://www.warebay.com/bot.html
  • monetizers of other people's bandwidth.
  • monetizers of other people's bandwidth.
  • monetizers of other people's bandwidth.
  • monetizer of other people's bandwidth.
  • It ignores robots.txt.
  • Yet another monetizer of other people's bandwidth that hits selected pages every few seconds from about a dozen HTTP clients around the world without let, leave, hindrance, or notice.
  • There is no apparent way to ask them to stop. One DinoPing agent at support@edis.at responded to a request to stop with "just use iptables" on 2012/08/13.
  • They're blind to the irony that one of their targets is <A HREF="that-which-we-dont.html">http://www.rhyolite.com/anti-spam/that-which-we-dont.html</A>
  • unprovoked, unasked for "monitoring" and "checking"
  • "The World's Experts in Search Analytics" is yet another SEO outfit that hammers HTTP servers without permission and without benefit for at least some HTTP server operators.
  • claimed SEO; ignores robots.txt
  • claimed SEO
  • SEO
  • http://www.semrush.com/bot.html suggests its results are for users:
  • "Well, the real question is why do you not want the bot visiting your page? Most bots are both harmless and quite beneficial. Bots like Googlebot discover sites by following links from page to page. This bot is crawling your page to help parse the content, so that the relevant information contained within your site is easily indexed and made more readily available to users searching for the content you provide."
  • ignores robots.txt
  • no apparent reason to spend bandwidth or attention on its bad URLs in logs
  • no need for Russian searches and they fetch but ignore robots.txt
  • no "biomedical, biochemical, drug, health and disease related data" here.
  • 192.31.21.179 switched from www.integromedb.org/Crawler to "Java/1.6.0_20" and "-" after integromedb was added to robots.txt
  • does not handle protocol relative links. It does not fetch robots.txt.
  • does not handle protocol relative links.
  • does not know the difference between a hyperlink <A HREF="..."></A> and anchors that are not links such as <A NAME="..."></A>
  • ambulance chasers with a stupid spider that hits the bad spider trap.
  • ignores rel="nofollow" in links
  • parses ...href='asdf' onclick='... (single quote (') instead of double (")) as if " onclick=..." were part of the URL.
  • It fetches robots.txt and then ignores it
  • fetches robots.txt for only some domains.
  • It searches for non-existent but often abused URLs such as .../contact.cgi
  • waste of bandwidth
  • waste of bandwidth
  • no need to "[assist] internet marketers", especially given the bad URLs
  • no need to allow site sucking or other tests from Kaspersky Lab
  • the preceding should also be in the badbots ACL

Warnings

  • 4 invalid lines.
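
The invalid lines are not itemized in the report. One plausible way a scanner might count them, treating anything that is not blank, not a comment, and not a "field: value" pair with a recognized field name as invalid, is sketched below; the field list and the counting rule are assumptions, not the scanner's documented behavior.

    # Count lines that are not blank, not comments, and not "field: value"
    # pairs with a recognized field name.  KNOWN_FIELDS is an assumption.
    KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

    def count_invalid(robots_txt: str) -> int:
        invalid = 0
        for raw in robots_txt.splitlines():
            line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
            if not line:
                continue
            field, sep, _value = line.partition(":")
            if not sep or field.strip().lower() not in KNOWN_FIELDS:
                invalid += 1
        return invalid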