leganews.es
robots.txt

Robots Exclusion Standard data for leganews.es

Resource Scan

Scan Details

Site Domain leganews.es
Base Domain leganews.es
Scan Status Ok
Last Scan 2024-11-09T03:39:58+00:00
Next Scan 2024-11-16T03:39:58+00:00

Last Scan

Scanned 2024-11-09T03:39:58+00:00
URL https://leganews.es/robots.txt
Domain IPs 157.90.124.213
Response IP 157.90.124.213
Found Yes
Hash 1001dc2c9342702b569117c01575de1d696748e63b4b6842017d3e2780b2aeeb
SimHash 731629124cf1

Groups

*

Rule Path
Allow /wp-admin/admin-ajax.php
Disallow /wp-login
Disallow /wp-admin
Disallow /*/feed/
Disallow /*/attachment/
Disallow /author/
Disallow *?replytocom
Disallow /tag/*/page/
Disallow /tag/*/feed/
Disallow /comments/
Disallow /xmlrpc.php
Disallow /*?s=
Disallow /*/*/*/feed.xml
Disallow /?attachment_id*
Disallow /search
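
The paths in this group use the `*` wildcard, and the googlebot group further down also uses the `$` end anchor. The following is a minimal sketch of how such patterns are commonly evaluated, assuming the matching behaviour described in RFC 9309 (longest matching pattern wins, Allow wins ties, no match means allowed). The names RULES, pattern_to_regex and is_allowed are illustrative and are not part of the scan; real crawlers may differ in details, for example in how they treat a pattern such as `*?replytocom` that does not start with `/`.

```python
import re

# Rules copied from the first "*" group in the listing above.
RULES = [
    ("allow", "/wp-admin/admin-ajax.php"),
    ("disallow", "/wp-login"),
    ("disallow", "/wp-admin"),
    ("disallow", "/*/feed/"),
    ("disallow", "/*/attachment/"),
    ("disallow", "/author/"),
    ("disallow", "*?replytocom"),
    ("disallow", "/tag/*/page/"),
    ("disallow", "/tag/*/feed/"),
    ("disallow", "/comments/"),
    ("disallow", "/xmlrpc.php"),
    ("disallow", "/*?s="),
    ("disallow", "/*/*/*/feed.xml"),
    ("disallow", "/?attachment_id*"),
    ("disallow", "/search"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex matched from the start of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if the path may be crawled under the given rules."""
    best_len, allowed = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            # Longest pattern wins; on equal length, Allow beats Disallow.
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, kind == "allow"
    return allowed
```

Under these assumptions, is_allowed("/wp-admin/admin-ajax.php") is true because the Allow pattern is longer than "/wp-admin", while is_allowed("/wp-admin/options.php") is false.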

googlebot

Rule Path
Allow /*.css$
Allow /*.js$

*

Rule Path
Disallow /?s=
Disallow /search

*

Rule Path
Disallow /trackback
Disallow /*trackback
Disallow /*trackback*
Disallow /*/trackback

msiecrawler

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

libwww

Rule Path
Disallow /

orthogaffe

Rule Path
Disallow /

ubicrawler

Rule Path
Disallow /

doc

Rule Path
Disallow /

zao

Rule Path
Disallow /

sitecheck.internetseer.com

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

fetch

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

webzip

Rule Path
Disallow /

linko

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

xenu

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

wget

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /

mauibot

Rule Path
Disallow /

semrushbot/6~bl

Rule Path
Disallow /

grapeshotcrawler/2.0

Rule Path
Disallow /

trident/7.0

Rule Path
Disallow /

bingbot/2.0

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

semrushbot-sa

Rule Path
Disallow /

msnbot

Rule Path
Disallow /

bingbot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

petalbot

Rule Path
Disallow /

blexbot/1.0

Rule Path
Disallow /

facebookexternalhit/1.1

Rule Path
Disallow /

googlebot-image/1.0

Rule Path
Disallow /

aboundexbot

Rule Path
Disallow /

adbeat_bot

Rule Path
Disallow /

addthis

Rule Path
Disallow /

advbot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

aihitbot

Rule Path
Disallow /

alphaseobot

Rule Path
Disallow /

alphaseobot-sa

Rule Path
Disallow /

audisto

Rule Path
Disallow /

audisto-essential

Rule Path
Disallow /

awariorssbot

Rule Path
Disallow /

awariosmartbot

Rule Path
Disallow /

backlinkcrawler

Rule Path
Disallow /

begunadvertising

Rule Path
Disallow /

betabot

Rule Path
Disallow /

bitlybot

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

booglebot

Rule Path
Disallow /

bubing

Rule Path
Disallow /

buckyohare

Rule Path
Disallow /

careerbot

Rule Path
Disallow /

ca-crawler

Rule Path
Disallow /

calculon

Rule Path
Disallow /

catchbot

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

checkmarknetwork/1.0 (+http://www.checkmarknetwork.com/spider.html)

Rule Path
Disallow /

cms crawler

Rule Path
Disallow /

compspybot

Rule Path
Disallow /

crawler4j

Rule Path
Disallow /

crazywebcrawler-spider

Rule Path
Disallow /

cukbot

Rule Path
Disallow /

dataprovider

Rule Path
Disallow /

deepcrawl

Rule Path
Disallow /

der-bot

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

dlvr.it

Rule Path
Disallow /

domainappender

Rule Path
Disallow /

domainstatsbot

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

dubaiindex

Rule Path
Disallow /

electricmonk

Rule Path
Disallow /

envolk

Rule Path
Disallow /

evc-batch/2.0

Rule Path
Disallow /

expertsearchspider

Rule Path
Disallow /

extlinksbot

Rule Path
Disallow /

fatbot

Rule Path
Disallow /

findxbot

Rule Path
Disallow /

fr-crawler

Rule Path
Disallow /

garlikcrawler

Rule Path
Disallow /

genieo

Rule Path
Disallow /

gloomarbot

Rule Path
Disallow /

grapeshot

Rule Path
Disallow /

httrack

Rule Path
Disallow /

huaweisymantecspider

Rule Path
Disallow /

hubspot crawler 1.0 http://www.hubspot.com/

Rule Path
Disallow /

hubspot links crawler 1.0 http://www.hubspot.com/

Rule Path
Disallow /

hypercrawl

Rule Path
Disallow /

hypestat

Rule Path
Disallow /

ias_crawler

Rule Path
Disallow /

iccrawler - icjobs

Rule Path
Disallow /

idmarch

Rule Path
Disallow /

implisensebot

Rule Path
Disallow /

ips-agent

Rule Path
Disallow /

irlbot

Rule Path
Disallow /

jamesbot

Rule Path
Disallow /

jobboersebot

Rule Path
Disallow /

jobdiggerspider

Rule Path
Disallow /

kalooga

Rule Path
Disallow /

kraken

Rule Path
Disallow /

larbin

Rule Path
Disallow /

linkdexbot

Rule Path
Disallow /

linkpadbot

Rule Path
Disallow /

lipperhey

Rule Path
Disallow /

lssrocketcrawler

Rule Path
Disallow /

ltx71

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

mappy

Rule Path
Disallow /

mauibot

Rule Path
Disallow /

mbcrawler

Rule Path
Disallow /

meanpathbot

Rule Path
Disallow /

megaindex.ru 2.0

Rule Path
Disallow /

metajobbot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

nbot

Rule Path
Disallow /

neofonie

Rule Path
Disallow /

nerdybot

Rule Path
Disallow /

netcraftsurveyagent

Rule Path
Disallow /

netpeakspiderbot

Rule Path
Disallow /

netseer

Rule Path
Disallow /

nextgensearchbot

Rule Path
Disallow /

nutch

Rule Path
Disallow /

obot

Rule Path
Disallow /

openindexspider

Rule Path
Disallow /

panscient.com

Rule Path
Disallow /

piplbot

Rule Path
Disallow /

plukkie

Rule Path
Disallow /

pockey-gethtml

Rule Path
Disallow /

proximic

Rule Path
Disallow /

pu_in

Rule Path
Disallow /

pub-crawler

Rule Path
Disallow /

radian6

Rule Path
Disallow /

ranksonicsiteauditor

Rule Path
Disallow /

ravencrawler

Rule Path
Disallow /

red

Rule Path
Disallow /

redesscrapy

Rule Path
Disallow /

riddler

Rule Path
Disallow /

rogerbot

Rule Path
Disallow /

safednsbot

Rule Path
Disallow /

sbsearch

Rule Path
Disallow /

scoutjet

Rule Path
Disallow /

screaming frog seo spider

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

sentibot

Rule Path
Disallow /

seocharger-robot

Rule Path
Disallow /

seokicks-robot

Rule Path
Disallow /

seolyticscrawler

Rule Path
Disallow /

seoscanners.net/1

Rule Path
Disallow /

sg-orbiter

Rule Path
Disallow /

siteexplorer

Rule Path
Disallow /

sitesucker

Rule Path
Disallow /

smabblerbot

Rule Path
Disallow /

smtbot

Rule Path
Disallow /

spbot

Rule Path
Disallow /

spiderling

Rule Path
Disallow /

spiderlytics

Rule Path
Disallow /

ssearch_bot

Rule Path
Disallow /

ssearch crawler

Rule Path
Disallow /

steeler

Rule Path
Disallow /

stq_bot

Rule Path
Disallow /

surdotlybot

Rule Path
Disallow /

surveybot

Rule Path
Disallow /

swiftbot

Rule Path
Disallow /

the knowledge ai

Rule Path
Disallow /

thumbshots-de-bot

Rule Path
Disallow /

thumbsniper (http://thumbsniper.com)

Rule Path
Disallow /

thunderstone

Rule Path
Disallow /

tineye

Rule Path
Disallow /

toscrawler

Rule Path
Disallow /

trendictionbot

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

ucrawler

Rule Path
Disallow /

uptimebot

Rule Path
Disallow /

vagabondo

Rule Path
Disallow /

vebidoobot

Rule Path
Disallow /

velenpublicwebcrawler

Rule Path
Disallow /

voltron

Rule Path
Disallow /

wbsearchbot

Rule Path
Disallow /

webmeasurement-bot

Rule Path
Disallow /

wesee

Rule Path
Disallow /

wevikabot

Rule Path
Disallow /

wget

Rule Path
Disallow /

wikido

Rule Path
Disallow /

wonderbot

Rule Path
Disallow /

wotbox

Rule Path
Disallow /

x28-job-bot

Rule Path
Disallow /

xovibot

Rule Path
Disallow /

zoombot

Rule Path
Disallow /

unnecessarybot

Rule Path
Disallow /

googlebot

Rule Path
Allow /
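
Several user agents, including `*` and googlebot, appear in more than one group in the listing above. A rough sketch of how a parser might combine them, assuming the RFC 9309 behaviour that groups naming the same user agent are merged and that a crawler with its own group does not apply the `*` group. RECORDS is a small hand-picked excerpt of the listing, and merge_groups and rules_for are hypothetical helper names. Exact, case-insensitive name lookup is a simplification of how real crawlers match their product token.

```python
from collections import defaultdict

# A small excerpt of the groups listed above, as (user-agent, rule, path) rows
# in file order. The full listing has many more "Disallow /" groups.
RECORDS = [
    ("*", "disallow", "/wp-admin"),
    ("*", "disallow", "/search"),
    ("googlebot", "allow", "/*.css$"),
    ("googlebot", "allow", "/*.js$"),
    ("*", "disallow", "/?s="),
    ("*", "disallow", "/trackback"),
    ("ahrefsbot", "disallow", "/"),
    ("googlebot", "allow", "/"),
]


def merge_groups(records):
    """Combine all rules that name the same user agent, keeping file order."""
    merged = defaultdict(list)
    for agent, kind, path in records:
        merged[agent.lower()].append((kind, path))
    return dict(merged)


def rules_for(product_token, groups):
    """Return the rules of the group named after the crawler, else the "*" group."""
    return groups.get(product_token.lower(), groups.get("*", []))


GROUPS = merge_groups(RECORDS)
```

With this excerpt, rules_for("Googlebot", GROUPS) contains only the three Allow rules, so under the stated assumptions the `*` disallows would not apply to googlebot, while an unlisted crawler falls back to the merged `*` rules.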

Other Records

Field Value
sitemap https://www.leganews.es/sitemap.xml

Comments

  • Second block
  • Search blocking
  • Trackback blocking
  • Blocking of unhelpful bots and crawlers
  • http://crawler.007ac9.net/
  • http://www.aboundex.com/crawler/
  • https://www.adbeat.com/operation_policy
  • http://advbot.org/bot.html
  • https://ahrefs.com/robot/
  • http://alphaseobot.com/bot.html
  • http://alphaseobot.com/bot.html
  • https://audisto.com/help/crawler/bot
  • https://audisto.com/help/crawler/bot
  • https://awario.com/bots.html
  • https://awario.com/bots.html
  • http://www.backlinktest.com/crawler.html
  • (http://begun.ru/advertiser/technologies/indexer.php)
  • (https://bitly.com/)
  • http://webmeup-crawler.com/
  • http://law.di.unimi.it/BUbiNG.html
  • https://hypefactors.com/webcrawler
  • no more active!?
  • http://catchbot.com/ (via archive.org)
  • http://commoncrawl.org/big-picture/frequently-asked-questions/
  • http://www.cmscrawler.com/ (ignores robots.txt)
  • http://www.crazywebcrawler.com/
  • https://www.companiesintheuk.co.uk/bot.html
  • https://www.dataprovider.com/spider/
  • https://www.deepcrawl.com/bot/
  • https://benbernardblog.com/der-bot/
  • https://support.dlvrit.com/hc/en-us/articles/200402934-How-do-I-block-dlvr-it-from-retrieving-the-feeds-on-my-site-
  • (http://www.profound.net/domainappender)
  • http://domainstats.io/our-bot
  • https://moz.com/researchtools/ose/dotbot (previously Ezooms bot!?)
  • http://adressendeutschland.de/ (obeys specific rule!?)
  • https://www.duedil.com/our-crawler
  • http://www.envolk.com/envolkspiderinfo.html
  • http://www.expertsearch.nl/spider (redirects to JobdiggerSpider)
  • https://extlinks.com/Bot.html
  • http://www.findxbot.com/
  • (http://www.garlik.com)
  • http://www.genieo.com/webfilter.html
  • https://www.gloomar.com/bot
  • http://www.grapeshot.com/crawler/
  • http://www.huaweisymantec.com/en/IRL/spider/ (via archive.org)
  • https://knowledge.hubspot.com/articles/kcs_article/reports/why-do-i-get-an-error-in-page-performance-that-the-crawler-is-blocked-by-robots-txt-for-my-hubspot-staging-domain
  • https://knowledge.hubspot.com/articles/kcs_article/reports/why-do-i-get-an-error-in-page-performance-that-the-crawler-is-blocked-by-robots-txt-for-my-hubspot-staging-domain
  • http://www.seograph.net/bot.html
  • http://www.hypestat.com/bot
  • https://integralads.com/site-indexing-policy/
  • https://www.icjobs.de/bot.htm
  • http://www.idmarch.org/bot.html
  • (http://implisense.com/)
  • http://irl.cs.tamu.edu/crawler/
  • https://cognitiveseo.com/bot.html
  • https://www.jobbörse.com/bot.htm
  • http://www.jobdigger.nl/spider/
  • http://static.kalooga.com/legal/crawler.html
  • (http://linkfluence.net/)
  • (http://larbin.sourceforge.net/index-eng.html)
  • https://www.linkdex.com/en-us/about/bots/ (doesn't obey specific rule)
  • http://www.linkpad.ru
  • https://www.lipperhey.com/en/about/website-spider/ (doesn't obey specific rule)
  • http://ltx71.com/
  • https://www.brandwatch.com/magpie-crawler/
  • http://mappydata.net/#eng
  • http://www.meanpath.com/meanpathbot.html
  • http://www.metajob.de/the/crawler
  • http://www.majestic12.co.uk/projects/dsearch/mj12bot.php
  • https://www.neofonie.de/spider/
  • http://nerdybot.com/
  • http://www.netseer.com/crawler/
  • http://www.zoominfo.com/About/misc/NextGenSearchBot.aspx (via archive.org)
  • http://nutch.apache.org/bot.html
  • http://filterdb.iss.net/crawler/
  • https://www.openindex.io/saas/about-our-spider/
  • http://www.panscient.com/faq.htm
  • https://pipl.com/bot/
  • http://www.botje.com/plukkie.htm
  • https://www.comscore.com/proximic-spider
  • https://www.semanticjuice.com/web-crawler.php
  • (bixocrawler)
  • R6_FeedFetcher + R6_CommentReader (www.radian6.com/crawler)
  • https://ranksonic.com/ranksonic_sab.html
  • https://raven.zendesk.com/hc/en-us/articles/203221440-How-do-I-slow-down-Site-Auditor-s-crawl-of-my-website-
  • (https://raventools.com/)
  • https://redbot.org/about/
  • (http://g2pi.tsc.uc3m.es/)
  • http://riddler.io/about
  • http://moz.com/help/pro/what-is-rogerbot- (via archive.org)
  • https://www.safedns.com/searchbot
  • http://www.secretsearchenginelabs.com/secret-web-crawler.php
  • http://scoutjet.com/
  • (Blekkobot)
  • https://www.screamingfrog.co.uk/seo-spider/user-guide/general/
  • http://de.semrush.com/bot/
  • http://sentibot.eu/
  • https://seocharger.com/robot
  • http://www.seokicks.de/robot.html
  • http://crawler.seolytics.net/
  • http://crawler.seolytics.net/
  • http://www.searchgears.com/ueber-uns/crawling-faq.html
  • http://crawler.sistrix.net/
  • User-agent: sistrix
  • Disallow: /
  • http://siteexplorer.info/about.html
  • http://www.sitesucker.us/mac/limitations.html
  • https://smabbler.com/en/Home/About
  • https://www.similartech.com/smtbot
  • http://www.openlinkprofiler.org/bot
  • http://nlp.fi.muni.cz/projects/biwec/
  • http://www.tkl.iis.u-tokyo.ac.jp/~crawler/
  • http://sur.ly/bot.html
  • https://swiftype.com/swiftbot
  • http://www.thumbshots.de/content-39-seite_auszuschliessen.html
  • https://thumbsniper.com/cat/news/ (via Google cache)
  • http://search.thunderstone.com/texis/websearch/about.html
  • (https://www.tineye.com/crawler.html)
  • http://www.toshiba.co.jp/rdc/about/crawl_info_en.htm
  • http://www.trendiction.com/de/publisher/bot
  • https://turnitin.com/robot/crawlerinfo.html
  • https://blog.ucoz.ru/upolicy/
  • https://uptime.com/uptimebot
  • http://www.wise-guys.nl/webcrawler.php
  • (https://blog.vebidoo.de/vebidoobot/)
  • https://velen.io/
  • http://80legs.com/the-80legs-web-crawler/
  • http://warebay.com/bot.html (expired)
  • http://rvs.informatik.uni-leipzig.de/bot.php
  • http://www.wesee.com/bot (via archive.org)
  • http://www.wikido.com/wikido.php
  • http://www.wotbox.com/bot/
  • http://x28.ch/bot.html
  • http://www.xovibot.net/
  • https://suite.seozoom.it/bot.html
  • Under normal conditions this is the sitemap

Warnings

  • 2 invalid lines.