www.dspace.uce.edu.ec
robots.txt

Robots Exclusion Standard data for www.dspace.uce.edu.ec

Resource Scan

Scan Details

Site Domain www.dspace.uce.edu.ec
Base Domain uce.edu.ec
Scan Status Ok
Last Scan 2025-02-18T00:44:45+00:00
Next Scan 2025-03-20T00:44:45+00:00
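The Last Scan and Next Scan timestamps are exactly 30 days apart. Whether the scanner always reschedules on a fixed 30-day cycle is an assumption, since the report shows only this one pair. A quick check with Python's standard datetime module:

```python
from datetime import datetime, timedelta

# Timestamps copied from the Scan Details table (ISO 8601 with a UTC offset).
last_scan = datetime.fromisoformat("2025-02-18T00:44:45+00:00")
next_scan = datetime.fromisoformat("2025-03-20T00:44:45+00:00")

# Subtracting two timezone-aware datetimes yields a timedelta.
interval = next_scan - last_scan
print(interval.days)  # 30
```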

Last Scan

Scanned 2025-02-18T00:44:45+00:00
URL https://www.dspace.uce.edu.ec/robots.txt
Response IP 54.39.87.130
Found Yes
Hash 1309ff16892542661938a838ae30362bc0e6a675fe037d84382df4e1ae82f058
SimHash ef14d53bc1b5
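The Hash field is 64 hexadecimal digits, which is the length of a SHA-256 digest. Assuming it is SHA-256 over the raw robots.txt bytes (the report does not name the algorithm), a client could tell whether the file changed between scans by comparing digests, as in this sketch. The function names are hypothetical, and the SimHash field, a similarity fingerprint for near-duplicate detection, is not reproduced here.

```python
import hashlib


def robots_digest(body: bytes) -> str:
    """Hex SHA-256 digest of a robots.txt body (SHA-256 is an assumed choice)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_digest: str, body: bytes) -> bool:
    """True if a freshly fetched body differs from the one behind previous_digest."""
    return robots_digest(body) != previous_digest


# Hash value from the Last Scan table; compare it against newly fetched bytes.
LAST_HASH = "1309ff16892542661938a838ae30362bc0e6a675fe037d84382df4e1ae82f058"
```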

Groups

*

Rule Path
Disallow /search
Disallow /admin/*
Disallow /processes
Disallow /submit
Disallow /workspaceitems
Disallow /profile
Disallow /workflowitems
Disallow /simple-search
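The rules above use the path-matching syntax standardised in RFC 9309: a plain path matches as a prefix, "*" matches any run of characters, a trailing "$" anchors the end, and when several rules match, the longest pattern wins, with Allow preferred on a tie. The sketch below applies those rules to this group. The helper names are hypothetical, and percent-encoding normalisation is left out for brevity.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end of the
    path; everything else is literal. re.match() anchors the start, so plain
    patterns behave as prefixes.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide access for one URL path using RFC 9309 precedence.

    The matching rule with the longest pattern wins; if an Allow and a Disallow
    pattern tie in length, Allow wins. A path that matches no rule is allowed.
    """
    best_len, best_allow = -1, True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty value restricts nothing
        if rule_to_regex(pattern).match(path):
            allow = kind.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


# The eight Disallow rules of the first "*" group.
STAR_RULES = [("Disallow", p) for p in (
    "/search", "/admin/*", "/processes", "/submit",
    "/workspaceitems", "/profile", "/workflowitems", "/simple-search",
)]
```

One consequence worth noticing: "/admin/*" behaves like the prefix "/admin/", so the bare path "/admin" is not covered by it.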

googlebot

Rule Path
Allow /
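Under RFC 9309 a crawler obeys the group whose user-agent line matches its product token (compared case-insensitively), merging all such groups, and falls back to the "*" groups only when no group names it. This report contains two "*" groups, which would be merged. The following sketch models that selection on a simplified subset of the groups; the data layout and function name are hypothetical, and individual crawlers implement the details differently.

```python
def select_rules(product_token: str,
                 groups: list[tuple[list[str], list[tuple[str, str]]]]) -> list[tuple[str, str]]:
    """Return the rules a crawler should obey, in the spirit of RFC 9309.

    Groups whose user-agent list names the crawler's product token (ignoring
    case) are merged. Only when none match are the '*' groups used, and those
    are merged too.
    """
    token = product_token.lower()
    chosen = [rules for agents, rules in groups
              if any(agent.lower() == token for agent in agents)]
    if not chosen:
        chosen = [rules for agents, rules in groups if "*" in agents]
    return [rule for rules in chosen for rule in rules]


# Simplified model of this report: the first "*" group is cut down to two rules,
# and the second "*" group carries only crawl-delay, so it adds no path rules.
GROUPS = [
    (["*"], [("Disallow", "/search"), ("Disallow", "/admin/*")]),
    (["googlebot"], [("Allow", "/")]),
    (["*"], []),
    (["wget"], [("Disallow", "/")]),
]
```

Read this way, a crawler identifying as googlebot receives only "Allow /" and is not bound by the "/search" or "/admin/*" rules of the first "*" group.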

*

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10
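Crawl-delay is a non-standard extension (RFC 9309 does not define it), and crawler support varies. A crawler that honours the value of 10 in this group would leave at least 10 seconds between requests to the host. The class below is a hypothetical sketch of that throttling; the clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class PoliteFetcher:
    """Enforce a minimum gap between requests to one host (e.g. crawl-delay 10)."""

    def __init__(self, delay_seconds: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.delay = delay_seconds
        self.clock = clock
        self.sleep = sleep
        self.last_request: Optional[float] = None  # time of the previous request

    def wait_turn(self) -> float:
        """Sleep until the delay has elapsed since the last request; return seconds slept."""
        now = self.clock()
        waited = 0.0
        if self.last_request is not None:
            waited = max(0.0, self.last_request + self.delay - now)
            if waited:
                self.sleep(waited)
        self.last_request = now + waited
        return waited
```

A crawler would call wait_turn() immediately before each request to the host.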

mediapartners-google*

Rule Path
Disallow /

ubicrawler

Rule Path
Disallow /

doc

Rule Path
Disallow /

zao

Rule Path
Disallow /

sitecheck.internetseer.com

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

fetch

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

webzip

Rule Path
Disallow /

linko

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

xenu

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

fast

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /

lockss

Rule Path
Disallow /

spide,

Rule Path
Disallow /

cfnetwork|checkbot

Rule Path
Disallow /

webmetrics

Rule Path
Disallow /

robot

Rule Path
Disallow /

yahoo

Rule Path
Disallow /

feedfetcher\-google

Rule Path
Disallow /

baiduspide,

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

slurp

Rule Path
Disallow /

crawl

Rule Path
Disallow /

crawle,

Rule Path
Disallow /

java

Rule Path
Disallow /

java\/

Rule Path
Disallow /

feedburne,

Rule Path
Disallow /

yandex

Rule Path
Disallow /

bspide,

Rule Path
Disallow /

python

Rule Path
Disallow /

ichiro

Rule Path
Disallow /

urllib

Rule Path
Disallow /

python\-urllib

Rule Path
Disallow /

alexa

Rule Path
Disallow /

urlaliasbuilde,

Rule Path
Disallow /

rss

Rule Path
Disallow /

sogou

Rule Path
Disallow /

exabot

Rule Path
Disallow /

scirus

Rule Path
Disallow /

msnbotnagios

Rule Path
Disallow /

libwww

Rule Path
Disallow /

libwww\-perl

Rule Path
Disallow /

bbot

Rule Path
Disallow /

wget

Rule Path
Disallow /

lwp

Rule Path
Disallow /

docomo

Rule Path
Disallow /

commons\-httpclient

Rule Path
Disallow /

robots

Rule Path
Disallow /

moto,

Rule Path
Disallow /

wordpress

Rule Path
Disallow /

lwp\:\:simple

Rule Path
Disallow /

ia_archive,

Rule Path
Disallow /

y!j

Rule Path
Disallow /

custo

Rule Path
Disallow /

mail.ru

Rule Path
Disallow /

linkcheck

Rule Path
Disallow /

voila

Rule Path
Disallow /

archive\.org_bot

Rule Path
Disallow /

core

Rule Path
Disallow /

yodaobot

Rule Path
Disallow /

lwp\-trivial

Rule Path
Disallow /

nutch

Rule Path
Disallow /

heritrix

Rule Path
Disallow /

ourbrowse,

Rule Path
Disallow /

jeeves

Rule Path
Disallow /

surveybot

Rule Path
Disallow /

arks

Rule Path
Disallow /

yahoofeedseeke,

Rule Path
Disallow /

daumoa

Rule Path
Disallow /

powermarks

Rule Path
Disallow /

linkbot

Rule Path
Disallow /

seznambot

Rule Path
Disallow /

sunrise

Rule Path
Disallow /

ramble,

Rule Path
Disallow /

wanadoo

Rule Path
Disallow /

linkscan

Rule Path
Disallow /

yacy

Rule Path
Disallow /

httrack

Rule Path
Disallow /

linkchecke,

Rule Path
Disallow /

goldfire(\s|\+)serve,

Rule Path
Disallow /

xenu(\s|\+)link(\s|\+)sleuth

Rule Path
Disallow /

xenu

Rule Path
Disallow /

htmlparse,

Rule Path
Disallow /

findlinks

Rule Path
Disallow /

microsoft(\s|\+)url(\s|\+)control

Rule Path
Disallow /

msiecrawle,

Rule Path
Disallow /

ultraseek

Rule Path
Disallow /

larbin

Rule Path
Disallow /

dsurf

Rule Path
Disallow /

teoma

Rule Path
Disallow /

fetch(\s|\+)api(\s|\+)request

Rule Path
Disallow /

mediapartners\-google

Rule Path
Disallow /

isilox

Rule Path
Disallow /

webcopie,

Rule Path
Disallow /

spiderman

Rule Path
Disallow /

girafabot

Rule Path
Disallow /

alexandria(\s|\+)prototype(\s|\+)project

Rule Path
Disallow /

allentrack

Rule Path
Disallow /

arachmo

Rule Path
Disallow /

brutus\/aet

Rule Path
Disallow /

china\slocal\sbrowse\s2\.6

Rule Path
Disallow /

code\ssample\sweb\sclient

Rule Path
Disallow /

contentsmartz

Rule Path
Disallow /

datacha0s\/2\.0

Rule Path
Disallow /

demo\sbot

Rule Path
Disallow /

emailsiphon

Rule Path
Disallow /

emailwolf

Rule Path
Disallow /

fdm(\s|\+)1

Rule Path
Disallow /

getright

Rule Path
Disallow /

milbot

Rule Path
Disallow /

muscatferre

Rule Path
Disallow /

nabot

Rule Path
Disallow /

naverbot

Rule Path
Disallow /

offline(\s|\+)navigato,

Rule Path
Disallow /

readpape,

Rule Path
Disallow /

stride,

Rule Path
Disallow /

t\-h\-u\-n\-d\-e\-r\-s\-t\-o\-n\-e

Rule Path
Disallow /

teleport(\s|\+)pro

Rule Path
Disallow /

web(\s|\+)downloade,

Rule Path
Disallow /

webclone,

Rule Path
Disallow /

webreape,

Rule Path
Disallow /

webstrippe,

Rule Path
Disallow /

webzip

Rule Path
Disallow /

webinato,

Rule Path
Disallow /

acme\.spide,

Rule Path
Disallow /

almaden

Rule Path
Disallow /

appie

Rule Path
Disallow /

architext

Rule Path
Disallow /

asterias

Rule Path
Disallow /

atomz

Rule Path
Disallow /

autoemailspide,

Rule Path
Disallow /

awbot

Rule Path
Disallow /

biadu

Rule Path
Disallow /

biglotron

Rule Path
Disallow /

bjaaland

Rule Path
Disallow /

blaiz\-bee

Rule Path
Disallow /

bloglines

Rule Path
Disallow /

blogpulse

Rule Path
Disallow /

boitho\.com\-dc

Rule Path
Disallow /

bookmark\-manage,

Rule Path
Disallow /

bwh3_user_agent

Rule Path
Disallow /

celestial

Rule Path
Disallow /

combine

Rule Path
Disallow /

contentmatch

Rule Path
Disallow /

curso,

Rule Path
Disallow /

dtsearchspide,

Rule Path
Disallow /

dumbot

Rule Path
Disallow /

easydl

Rule Path
Disallow /

fast-webcrawle,

Rule Path
Disallow /

favorg

Rule Path
Disallow /

ferret

Rule Path
Disallow /

gaisbot

Rule Path
Disallow /

geturl

Rule Path
Disallow /

gigabot

Rule Path
Disallow /

gnodspide,

Rule Path
Disallow /

grub

Rule Path
Disallow /

gullive,

Rule Path
Disallow /

harvest

Rule Path
Disallow /

hl_ftien_spide,

Rule Path
Disallow /

holmes

Rule Path
Disallow /

htdig

Rule Path
Disallow /

httpget\-5\.2\.2

Rule Path
Disallow /

httpget\?5\.2\.2

Rule Path
Disallow /

iktomi

Rule Path
Disallow /

ilse

Rule Path
Disallow /

internetsee,

Rule Path
Disallow /

intute

Rule Path
Disallow /

jobo

Rule Path
Disallow /

kyluka

Rule Path
Disallow /

lilina

Rule Path
Disallow /

linkwalke,

Rule Path
Disallow /

livejournal\.com

Rule Path
Disallow /

lmspide,

Rule Path
Disallow /

lwp\-request

Rule Path
Disallow /

lwp\-tivial

Rule Path
Disallow /

lycos[_+]

Rule Path
Disallow /

megite

Rule Path
Disallow /

milbot

Rule Path
Disallow /

mimas

Rule Path
Disallow /

mnogosearch

Rule Path
Disallow /

moget

Rule Path
Disallow /

mojeekbot

Rule Path
Disallow /

momspide,

Rule Path
Disallow /

myweb

Rule Path
Disallow /

netcraft

Rule Path
Disallow /

netluchs

Rule Path
Disallow /

ng\/2\.

Rule Path
Disallow /

no_user_agent

Rule Path
Disallow /

nomad

Rule Path
Disallow /

ocelli

Rule Path
Disallow /

onetszukaj

Rule Path
Disallow /

perman

Rule Path
Disallow /

pionee,

Rule Path
Disallow /

playmusic\.com

Rule Path
Disallow /

playstarmusic\.com

Rule Path
Disallow /

psbot

Rule Path
Disallow /

qihoobot

Rule Path
Disallow /

redalert|robozilla

Rule Path
Disallow /

scan4mail

Rule Path
Disallow /

scientificcommons

Rule Path
Disallow /

scoote,

Rule Path
Disallow /

seekbot

Rule Path
Disallow /

shoutcast

Rule Path
Disallow /

speedy

Rule Path
Disallow /

spiderview

Rule Path
Disallow /

superbot

Rule Path
Disallow /

tailrank

Rule Path
Disallow /

technoratibot

Rule Path
Disallow /

titan

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

twicele,

Rule Path
Disallow /

ucsd

Rule Path
Disallow /

virus[_+]detecto,

Rule Path
Disallow /

w3c\-checklink

Rule Path
Disallow /

webcollage

Rule Path
Disallow /

weblayers

Rule Path
Disallow /

webmirro,

Rule Path
Disallow /

webreape,

Rule Path
Disallow /

worm

Rule Path
Disallow /

yahoo\-mmcrawle,

Rule Path
Disallow /

yahooseeke,

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

zeus

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

jikespide,

Rule Path
Disallow /
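Many of the user-agent values above, such as xenu(\s|\+)link(\s|\+)sleuth or cfnetwork|checkbot, are written as regular expressions, consistent with the file's comment that they come from the projectcounter.org list. The robots.txt format defines no regex syntax for user-agent lines, so how a crawler treats them depends on its parser; some parsers compare only a leading token of the value. The sketch below contrasts a whole-value, case-insensitive comparison (one plausible reading, assumed here) with a regex search over a full User-Agent header. Neither function is taken from a specific crawler, and the header string in the test is illustrative.

```python
import re


def literal_match(group_agent: str, crawler_token: str) -> bool:
    """Compare the whole user-agent value with the crawler's token, ignoring case."""
    return group_agent.strip().lower() == crawler_token.lower()


def regex_match(group_agent: str, user_agent_header: str) -> bool:
    """Treat the value as a case-insensitive regex searched in a full User-Agent header."""
    return re.search(group_agent, user_agent_header, re.IGNORECASE) is not None
```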

Other Records

Field Value
sitemap http://www.dspace.uce.edu.ec/sitemap_index.xml
sitemap http://www.dspace.uce.edu.ec/sitemap_index.html
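Sitemap records sit outside any user-agent group. Both listed sitemap URLs use http://, whereas the robots.txt itself was fetched over https://. A crawler would typically collect these lines and then read the child sitemaps listed in the index; the sketch below does both with the Python standard library, following the sitemaps.org index format. The sample index is invented for illustration, since the actual contents of sitemap_index.xml are not part of this report.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(robots_txt: str) -> list[str]:
    """Collect the values of Sitemap: lines, ignoring field case and # comments."""
    urls = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")  # split on the first colon only
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def child_sitemaps(index_xml: str) -> list[str]:
    """Return the <loc> URLs listed in a sitemaps.org sitemap index document."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip()
            for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
            if loc.text]


# Hypothetical index body; the real sitemap_index.xml is not reproduced here.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.org/sitemap0.xml</loc></sitemap>
  <sitemap><loc>https://example.org/sitemap1.xml</loc></sitemap>
</sitemapindex>"""
```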

Comments

  • The FULL URL to the DSpace sitemaps
  • The https://www.dspace.uce.edu.ec
  • XML sitemap is listed first as it is preferred by most search engines
  • Sitemap: /sitemap_index.xml
  • Sitemap: /sitemap_index.html
  • Default Access Group
  • (NOTE: blank lines are not allowable in a group record)
  • Disable access to Discovery search and filters; admin pages; processes; submission; workspace; workflow & profile page
  • Optionally uncomment the following line ONLY if sitemaps are working
  • and you have verified that your site is being indexed correctly.
  • Disallow: /browse
  • If you have configured DSpace (Solr-based) Statistics to be publicly
  • accessible, then you may not want this content to be indexed
  • Disallow: /statistics
  • You also may wish to disallow access to the following paths, in order
  • to stop web spiders from accessing user-based content
  • Disallow: /contact
  • Disallow: /feedback
  • Disallow: /forgot
  • Disallow: /login
  • Disallow: /register
  • User-agent: SemrushBot
  • Crawl-Delay: 60
  • User-agent: AhrefsBot
  • Crawl-Delay: 60
  • Section for misbehaving bots
  • The following directives to block specific robots were borrowed from Wikipedia's robots.txt
  • advertising-related bots:
  • Crawlers that are kind enough to obey, but which we'd rather not have
  • unless they're feeding search engines.
  • Some bots are known to be trouble, particularly those designed to copy
  • entire sites. Please obey robots.txt.
  • Misbehaving: requests much too fast:
  • If your DSpace is going down because of someone using recursive wget,
  • you can activate the following rule.
  • If your own faculty is bringing down your dspace with recursive wget,
  • you can advise them to use the --wait option to set the delay between hits.
  • User-agent: wget
  • Disallow: /
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/
  • User-agent:
  • Disallow: /
  • The rest on the projectcounter.org list.
  • A couple robots PLOS blocks, not on the counter list. (Added by rcave.)
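The Comments list is what remains of the file's "#" lines once they are separated from the records (commented-out directives such as "Sitemap: /sitemap_index.xml" survive only as comments). The sketch below shows one way, assumed here rather than taken from the scanner, to split robots.txt text into groups and comments: consecutive user-agent lines share a group, a user-agent line that follows a rule starts a new group, and blank lines are skipped. As we read it, RFC 9309's grammar permits empty lines inside a group, whereas the file's own note follows the older convention that blank lines end a record.

```python
def parse_robots(text: str):
    """Split robots.txt text into (user_agents, rules) groups plus a list of comments."""
    groups, comments = [], []
    agents, rules = [], []
    for raw in text.splitlines():
        line, hash_mark, comment = raw.partition("#")
        if hash_mark and comment.strip():
            comments.append(comment.strip())
        field, sep, value = line.partition(":")
        if not sep:
            continue  # blank, comment-only, or malformed line
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if rules:  # a rule has already closed the current group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("allow", "disallow", "crawl-delay") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups, comments
```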

Warnings

  • 4 invalid lines.
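The report flags 4 invalid lines without saying which lines or by what criterion. A plausible check, assumed here, is to flag any non-blank, non-comment line that lacks a "field: value" shape or uses a field outside a small set of known names. The field list in this sketch is an assumption and may differ from the scanner's.

```python
# Assumed set of recognised fields; the scanner's own list is not documented.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}


def invalid_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that are neither blank, comments, nor known fields."""
    bad = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        field, sep, _ = line.partition(":")
        if not sep or field.strip().lower() not in KNOWN_FIELDS:
            bad.append((number, raw))
    return bad
```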