diariolalibertad.com
robots.txt

Robots Exclusion Standard data for diariolalibertad.com

Resource Scan

Scan Details

Site Domain diariolalibertad.com
Base Domain diariolalibertad.com
Scan Status Ok
Last Scan 2024-09-29T23:32:09+00:00
Next Scan 2024-10-06T23:32:09+00:00

Last Scan

Scanned 2024-09-29T23:32:09+00:00
URL https://diariolalibertad.com/robots.txt
Domain IPs 198.136.58.130
Response IP 198.136.58.130
Found Yes
Hash d21b906594e173a6b911afe97e8e93c5e050fe37acf3986063e5f2cec0aa068c
SimHash 285014416068
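
The Hash value above is a 64-character hex string, consistent with a SHA-256 digest of the fetched file. A minimal Python sketch for recomputing it against the live URL, assuming the file is still served unchanged and that the scanner hashes the raw response body rather than a normalized form:

    # Recompute the SHA-256 digest of the live robots.txt and compare it with
    # the digest recorded by the scan.
    import hashlib
    import urllib.request

    SCANNED_HASH = "d21b906594e173a6b911afe97e8e93c5e050fe37acf3986063e5f2cec0aa068c"

    with urllib.request.urlopen("https://diariolalibertad.com/robots.txt") as resp:
        body = resp.read()

    digest = hashlib.sha256(body).hexdigest()
    print("live digest:", digest)
    print("scan digest:", SCANNED_HASH)
    print("unchanged  :", digest == SCANNED_HASH)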

Groups

*

Rule Path
Disallow /

googlebot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 10

googlebot-news

Rule Path
Allow /

Other Records

Field Value
crawl-delay 10

googlebot-image

Rule Path
Allow /

Other Records

Field Value
crawl-delay 10

googlebot-video

Rule Path
Allow /

Other Records

Field Value
crawl-delay 10

googlebot-mobile

Rule Path
Allow /

Other Records

Field Value
crawl-delay 10

mediapartners-google

Rule Path
Allow /

Other Records

Field Value
crawl-delay 10

cxensebot

Rule Path
Allow /

Other Records

Field Value
crawl-delay 10

*

Rule Path
Allow /

*

Rule Path
Disallow /wp-admin/admin-ajax.php
Disallow /wp-login
Disallow /wp-admin
Disallow /*/feed/
Disallow /*/trackback/
Disallow /*/attachment/
Disallow /author/
Disallow *?replytocom
Disallow /tag/*/page/
Disallow /tag/*/feed/
Disallow /comments/
Disallow /xmlrpc.php
Disallow /*?s=
Disallow /*/*/*/feed.xml
Disallow /?attachment_id*
Disallow /search
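
To see how a parser applies rules like these, the sketch below feeds a hand-copied excerpt of this group to Python's standard-library urllib.robotparser and checks a few illustrative paths. Note that robotparser matches rules as literal path prefixes and does not implement the * and $ wildcard extensions used by several entries above, so the excerpt keeps only the plain-prefix rules; the "ExampleBot" token and the sample paths are invented for the demonstration.

    # Parse a hand-copied excerpt of the group above and test a few paths.
    # robotparser only does literal prefix matching, so wildcard rules such as
    # "Disallow: /*/feed/" are omitted from this excerpt.
    from urllib.robotparser import RobotFileParser

    EXCERPT = [
        "User-agent: *",
        "Disallow: /wp-login",
        "Disallow: /wp-admin",
        "Disallow: /author/",
        "Disallow: /comments/",
        "Disallow: /xmlrpc.php",
        "Disallow: /search",
    ]

    rp = RobotFileParser()
    rp.parse(EXCERPT)

    for path in ("/wp-admin/options.php", "/author/jane/", "/2024/09/some-article/"):
        verdict = "allowed" if rp.can_fetch("ExampleBot", path) else "blocked"
        print(path, "->", verdict)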

googlebot

Rule Path
Allow /*.css$
Allow /*.js$

facebookexternalhit

Rule Path
Disallow

semrushbot

Rule Path
Disallow /

bingbot

Rule Path
Disallow /

*

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10

sistrix

Rule Path
Disallow /

sistrix crawler

Rule Path
Disallow /

sistrix

Rule Path
Disallow /

seokicks-robot

Rule Path
Disallow /

jobs.de-robot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

unisterbot

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

searchmetricsbot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

surveybot

Rule Path
Disallow /

seodiver

Rule Path
Disallow /

spbot

Rule Path
Disallow /

wotbox

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

meanpathbot

Rule Path
Disallow /

backlinkcrawler

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

obot

Rule Path
Disallow /

fr-crawler

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

megaindex.ru

Rule Path
Disallow /

megaindex.com

Rule Path
Disallow /

cloudservermarketspider

Rule Path
Disallow /

trendictionbot

Rule Path
Disallow /

exabot

Rule Path
Disallow /

careerbot

Rule Path
Disallow /

lipperhey-kaus-australis

Rule Path
Disallow /

seoscanners.net

Rule Path
Disallow /

metajobbot

Rule Path
Disallow /

spiderbot

Rule Path
Disallow /

linkstats

Rule Path
Disallow /

jobboersebot

Rule Path
Disallow /

iccrawler

Rule Path
Disallow /

plista

Rule Path
Disallow /

domain re-animator bot

Rule Path
Disallow /

lipperhey-kaus-australis

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

coccoc

Rule Path
Disallow /

um-ic

Rule Path
Disallow /

mindupbot

Rule Path
Disallow /

sg-orbiter

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

qwantify

Rule Path
Disallow /

kraken

Rule Path
Disallow /

plukkie

Rule Path
Disallow /

safednsbot

Rule Path
Disallow /

haosouspider

Rule Path
Disallow /

rogerbot

Rule Path
Disallow /

openhosebot

Rule Path
Disallow /

screaming frog seo spider

Rule Path
Disallow /

thumbsniper

Rule Path
Disallow /

r6_commentreader

Rule Path
Disallow /

implisensebot

Rule Path
Disallow /

cliqzbot

Rule Path
Disallow /

aihitbot

Rule Path
Disallow /

trendictionbot

Rule Path
Disallow /

wbsearchbot

Rule Path
Disallow /

companybook-crawler

Rule Path
Disallow /

companybook

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

sogou spider

Rule Path
Disallow /

seokicks-robot

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

sistrix crawler

Rule Path
Disallow /

uptimerobot/2.0

Rule Path
Disallow /

ezooms robot

Rule Path
Disallow /

perl lwp

Rule Path
Disallow /

blexbot

Rule Path
Disallow /

netestate ne crawler (+http://www.website-datenbank.de/)

Rule Path
Disallow /

wiseguys robot

Rule Path
Disallow /

turnitin robot

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

turnitin bot

Rule Path
Disallow /

turnitinbot/3.0 (http://www.turnitin.com/robot/crawlerinfo.html)

Rule Path
Disallow /

turnitinbot/3.0

Rule Path
Disallow /

heritrix

Rule Path
Disallow /

pimonster

Rule Path
Disallow /

pimonster

Rule Path
Disallow /

searchmetricsbot

Rule Path
Disallow /

eccp/1.0 (search@eniro.com)

Rule Path
Disallow /

yandex

Rule Path
Disallow /

baiduspider
baiduspider-video
baiduspider-image
mozilla/5.0 (compatible; baiduspider/2.0; +http://www.baidu.com/search/spider.html)
mozilla/5.0 (compatible; baiduspider/3.0; +http://www.baidu.com/search/spider.html)
mozilla/5.0 (compatible; baiduspider/4.0; +http://www.baidu.com/search/spider.html)
mozilla/5.0 (compatible; baiduspider/5.0; +http://www.baidu.com/search/spider.html)
baiduspider/2.0
baiduspider/3.0
baiduspider/4.0
baiduspider/5.0

Rule Path
Disallow /

sogou spider

Rule Path
Disallow /

youdaobot

Rule Path
Disallow /

gsa-crawler (enterprise; t4-knhh62cdkc2w3; gsa_manage@nikon-sys.co.jp)

Rule Path
Disallow /

megaindex.ru/2.0

Rule Path
Disallow /

megaindex.ru

Rule Path
Disallow /

megaindex.ru

Rule Path
Disallow /

Other Records

Field Value
sitemap https://diariolalibertad.com/sitio/sitemap_index.xml
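
The same module can also query the live file for the crawl-delay and sitemap records reported in this scan. A short sketch, assuming the file is still reachable at the scanned URL; Googlebot and the site root are chosen only as examples:

    # Load the live robots.txt and query a few of the records reported above.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("https://diariolalibertad.com/robots.txt")
    rp.read()

    print("Googlebot may fetch /:", rp.can_fetch("Googlebot", "https://diariolalibertad.com/"))
    print("Googlebot crawl-delay:", rp.crawl_delay("Googlebot"))
    print("Sitemaps:", rp.site_maps())  # site_maps() requires Python 3.8+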

Comments

  • robots.txt
  • This file is to prevent the crawling and indexing of certain parts
  • of your site by web crawlers and spiders run by sites like Yahoo!
  • and Google. By telling these "robots" where not to go on your site,
  • you save bandwidth and server resources.
  • This file will be ignored unless it is at the root of your host:
  • Used: http://example.com/robots.txt
  • Ignored: http://example.com/site/robots.txt
  • For more information about the robots.txt standard, see:
  • http://www.robotstxt.org/robotstxt.html
  • Agents
  • Slow down bots
  • Disallow: Sistrix
  • Disallow: Sistrix
  • Disallow: Sistrix
  • Disallow: SEOkicks-Robot
  • Disallow: jobs.de-Robot
  • Backlink Analysis
  • Bot of the Leipzig-based Unister Holding GmbH
  • http://moz.com/products
  • http://www.searchmetrics.com
  • http://www.majestic12.co.uk/projects/dsearch/mj12bot.php
  • http://www.domaintools.com/webmasters/surveybot.php
  • http://www.seodiver.com/bot
  • http://openlinkprofiler.org/bot
  • http://www.wotbox.com/bot/
  • http://www.opensiteexplorer.org/dotbot
  • http://moz.com/researchtools/ose/dotbot
  • http://www.meanpath.com/meanpathbot.html
  • http://www.backlinktest.com/crawler.html
  • http://www.brandwatch.com/magpie-crawler/
  • http://filterdb.iss.net/crawler/
  • http://webmeup-crawler.com
  • https://megaindex.com/crawler
  • http://www.cloudservermarket.com
  • http://www.trendiction.de/de/publisher/bot
  • http://www.exalead.com
  • http://www.career-x.de/bot.html
  • https://www.lipperhey.com/en/about/
  • https://www.lipperhey.com/en/about/
  • https://turnitin.com/robot/crawlerinfo.html
  • http://help.coccoc.com/
  • ubermetrics-technologies.com
  • datenbutler.de
  • http://searchgears.de/uber-uns/crawling-faq.html
  • http://commoncrawl.org/faq/
  • https://www.qwant.com/
  • http://linkfluence.net/
  • http://www.botje.com/plukkie.htm
  • https://www.safedns.com/searchbot
  • http://www.haosou.com/help/help_3_2.html
  • http://www.haosou.com/help/help_3_2.html
  • http://www.moz.com/dp/rogerbot
  • http://www.openhose.org/bot.html
  • http://www.screamingfrog.co.uk/seo-spider/
  • http://thumbsniper.com
  • http://www.radian6.com/crawler
  • http://cliqz.com/company/cliqzbot
  • https://www.aihitdata.com/about
  • http://www.trendiction.com/en/publisher/bot
  • http://warebay.com/bot.html
  • Block Companybook-Crawler
  • Block Companybook-Crawler
  • Block MJ12bot as it is just noise
  • Block Ahrefs
  • Block Sogou
  • Block SEOkicks
  • Block BlexBot
  • Block SISTRIX
  • Block Uptime robot
  • Block Ezooms Robot
  • Block Perl LWP
  • Block BlexBot
  • Block netEstate NE Crawler (+http://www.website-datenbank.de/)
  • Block WiseGuys Robot
  • Block Turnitin Robot
  • Block Heritrix
  • Block pricepi
  • Block Searchmetrics Bot
  • Block Eniro
  • Block YandexBot
  • Block Baidu
  • Block SoGou
  • Block Youdao
  • Block Nikon JP Crawler
  • Block MegaIndex.ru

Warnings

  • 4 invalid lines.
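
The report does not identify which four lines failed to parse. A rough sketch of this kind of check, flagging non-blank, non-comment lines whose field name is not a commonly recognized robots.txt directive; the scanner's actual validation rules are not published, so the count found this way may differ from the four reported above:

    # Heuristic pass over the live robots.txt: report lines that are neither
    # blank, comments, nor "field: value" pairs with a recognized field name.
    import urllib.request

    KNOWN_FIELDS = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap", "host"}

    with urllib.request.urlopen("https://diariolalibertad.com/robots.txt") as resp:
        text = resp.read().decode("utf-8", errors="replace")

    suspect = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not stripped:
            continue
        field, sep, _ = stripped.partition(":")
        if not sep or field.strip().lower() not in KNOWN_FIELDS:
            suspect.append((lineno, line))

    print(len(suspect), "possibly invalid lines")
    for lineno, line in suspect:
        print(f"  line {lineno}: {line!r}")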