harlan.lib.ia.us
robots.txt

Robots Exclusion Standard data for harlan.lib.ia.us

Resource Scan

Scan Details

Site Domain harlan.lib.ia.us
Base Domain harlan.lib.ia.us
Scan Status Ok
Last Scan 2025-10-05T06:27:39+00:00
Next Scan 2025-11-04T06:27:39+00:00

Last Scan

Scanned 2025-10-05T06:27:39+00:00
URL https://harlan.lib.ia.us/robots.txt
Domain IPs 20.221.230.93
Response IP 20.221.230.93
Found Yes
Hash 55c1cac744f20bbe498223ff42fb993712b6b36dd3a53ac1fdb7558a5edc20be
SimHash b69a51108cf2

Groups

*

Rule Path
Disallow /application/attributes
Disallow /application/authentication
Disallow /application/bootstrap
Disallow /application/config
Disallow /application/controllers
Disallow /application/elements
Disallow /application/helpers
Disallow /application/jobs
Disallow /application/languages
Disallow /application/mail
Disallow /application/models
Disallow /application/page_types
Disallow /application/single_pages
Disallow /application/tools
Disallow /application/views
Disallow /ccm/system/captcha/picture
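The wildcard group above can be checked programmatically. A minimal sketch using Python's standard urllib.robotparser, with the group reconstructed from a representative subset of the Disallow rules listed above (the path /application/config/site.php is a hypothetical example URL, not taken from the scan):

```python
from urllib.robotparser import RobotFileParser

# Subset of the "*" group from the scan, rewritten in robots.txt syntax.
robots_txt = """\
User-agent: *
Disallow: /application/config
Disallow: /application/controllers
Disallow: /ccm/system/captcha/picture
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Disallow rules are prefix matches: anything under a listed path is blocked.
print(rp.can_fetch("anybot", "https://harlan.lib.ia.us/application/config/site.php"))  # False
print(rp.can_fetch("anybot", "https://harlan.lib.ia.us/about"))  # True
```

Note that Disallow paths match by prefix, so the single rule for /application/config blocks every URL beneath that directory.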

bingbot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10

ahrefsbot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10
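The bingbot and ahrefsbot groups above carry no Disallow rules, only a Crawl-delay of 10 seconds. A sketch of how a compliant crawler could read that value with urllib.robotparser (Python 3.6+ exposes crawl_delay()):

```python
from urllib.robotparser import RobotFileParser

# The bingbot group from the scan: no paths blocked, 10-second crawl delay.
robots_txt = """\
User-agent: bingbot
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("bingbot", "https://harlan.lib.ia.us/"))  # True: no Disallow rules
print(rp.crawl_delay("bingbot"))  # 10
```

A polite crawler would sleep at least this many seconds between requests to the host; crawl_delay() returns None for agents whose group defines no delay.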

baiduspider

Rule Path
Disallow /

purebot

Rule Path
Disallow /

ezooms

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

surveybot

Rule Path
Disallow /

domaintools

Rule Path
Disallow /

sitebot

Rule Path
Disallow /

dotnetdotcom

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

solomonobot

Rule Path
Disallow /

zmeu

Rule Path
Disallow /

morfeus

Rule Path
Disallow /

snoopy

Rule Path
Disallow /

wbsearchbot

Rule Path
Disallow /

exabot

Rule Path
Disallow /

findlinks

Rule Path
Disallow /

aihitbot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

dinoping

Rule Path
Disallow /

panopta.com

Rule Path
Disallow /

linkchecker.sourceforge.net

Rule Path
Disallow /

linkcheck

Rule Path
Disallow /

searchmetrics

Rule Path
Disallow /

lipperhey

Rule Path
Disallow /

dataprovider.com

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

sosospider

Rule Path
Disallow /

discoverybot

Rule Path
Disallow /

yandex

Rule Path
Disallow /

www.integromedb.org/crawler

Rule Path
Disallow /

yamanalab-robot

Rule Path
Disallow /

ip-web-crawler.com

Rule Path
Disallow /

aboundex

Rule Path
Disallow /

aboundexbot

Rule Path
Disallow /

yunyun

Rule Path
Disallow /

masscan

Rule Path
Disallow /
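Each of the agents listed above gets its own group with a single "Disallow: /", which blocks the entire site for that agent. A sketch with two of those groups (note this fragment omits the "*" group, so in this sketch unlisted agents fall through to the parser's default of "allowed"; in the real file they would match the wildcard rules instead):

```python
from urllib.robotparser import RobotFileParser

# Two of the blanket-disallow groups from the scan.
robots_txt = """\
User-agent: baiduspider
Disallow: /

User-agent: mj12bot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("mj12bot", "https://harlan.lib.ia.us/"))   # False: whole site blocked
print(rp.can_fetch("otherbot", "https://harlan.lib.ia.us/"))  # True: no group matches this sketch
```

As the Comments section below observes, several of these agents fetch robots.txt and then ignore it, so "Disallow: /" is only effective against crawlers that choose to comply.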

Comments

  • no need for Chinese or Russian searches
  • the following should also be in badbots
  • The editorial comments for each of the following entries are
  • only opinions provoked by the behavior of the associated
  • 'spiders' as seen in local HTTP server logs.
  • stupid bot
  • seems to only search for non-existent pages.
  • See ezooms.bot@gmail.com and wowrack.com
  • http://www.majestic12.co.uk/bot.php?+ follows many bogus and corrupt links
  • and so generates a lot of error log noise.
  • It does us no good and is a waste of our bandwidth.
  • There is no need to waste bandwidth on an outfit trying to monetize our
  • web pages. $50 for data scraped from the web is too much
  • never bothers fetching robots.txt
  • See http://www.domaintools.com
  • too many mangled links and implausible home page
  • cutesy story is years stale and no longer excuses bad crawling
  • cutesy story is years stale and no longer excuses bad crawling
  • At best another broken spider that thinks all URLs are at the top level.
  • At worst, a malware scanner.
  • Never fetches robots.txt, contrary to http://www.warebay.com/bot.html
  • See SolomonoBot/1.02 (http://www.solomono.ru)
  • evil
  • evil
  • evil
  • Yet another supposed search engine that generates bad links from plain text
  • It fetches and then ignores robots.txt
  • 188.138.48.235 http://www.warebay.com/bot.html
  • monetizers of other people's bandwidth.
  • monetizers of other people's bandwidth.
  • monetizers of other people's bandwidth.
  • monetizer of other people's bandwidth.
  • As is common with such, it ignores robots.txt.
  • Yet another monetizer of other people's bandwidth that hits selected
  • pages every few seconds from about a dozen HTTP clients around the
  • world without let, leave, hindrance, or notice.
  • There is no apparent way to ask them to stop. One DinoPing agent at
  • support@edis.at responded to a request to stop with "just use iptables"
  • on 2012/08/13.
  • They're blind to the irony that one of their targets is
  • <A HREF="that-which-we-dont.html">http://www.rhyolite.com/anti-spam/that-which-we-dont.html</A>
  • unprovoked, unasked for "monitoring" and "checking"
  • checks much too fast and too much including traps
  • There is no need for third parties to check our links, thank you very much.
  • "The World's Experts in Search Analytics"
  • is yet another SEO outfit that hammers HTTP servers without permission
  • and without benefit for at least some HTTP server operators.
  • (supposed) SEO
  • (supposed) SEO
  • SEO
  • http://www.semrush.com/bot.html suggests its results are
  • for users:
  • Well, the real question is why do you not want the bot visiting
  • your page? Most bots are both harmless and quite beneficial. Bots
  • like Googlebot discover sites by following links from page to page.
  • This bot is crawling your page to help parse the content, so that
  • the relevant information contained within your site is easily indexed
  • and made more readily available to users searching for the content
  • you provide.
  • ignores robots.txt
  • no apparent reason to spend bandwidth or attention on its bad URLs in logs
  • no need for Russian searches and they fetch but ignore robots.txt
  • no "biomedical, biochemical, drug, health and disease related data" here.
  • 192.31.21.179 switch from www.integromedb.org/Crawler to "Java/1.6.0_20"
  • and "-" after integromedb was added to robots.txt
  • does not handle protocol relative links. Does not even fetch robots.txt
  • does not handle protocol relative links.
  • does not know the difference between a hyperlink <A HREF="..."></A> and
  • anchors that are not links such as <A NAME="..."></A>
  • ambulance chasers with a stupid spider that hits the evil spider trap
  • ignores rel="nofollow" in links
  • as if " onclick=..." were part of the URL
  • fetches robots.txt and then ignores it
  • fetches robots.txt for only some domains
  • searches for non-existent but often-abused URLs such as .../contact.cgi
  • waste of bandwidth

Warnings

  • 4 invalid lines.