curriculumvitaeplantilla.es
robots.txt

Robots Exclusion Standard data for curriculumvitaeplantilla.es

Resource Scan

Scan Details

Site Domain	curriculumvitaeplantilla.es
Base Domain	curriculumvitaeplantilla.es
Scan Status	Ok
Last Scan	2024-07-04T06:18:44+00:00
Next Scan	2024-07-11T06:18:44+00:00

Last Scan

Scanned	2024-07-04T06:18:44+00:00
URL	https://curriculumvitaeplantilla.es/robots.txt
Redirect	https://www.curriculumvitaeplantilla.es/robots.txt
Redirect Domain	www.curriculumvitaeplantilla.es
Redirect Base	curriculumvitaeplantilla.es
Domain IPs	185.2.151.38
Redirect IPs	185.2.151.38
Response IP	185.2.151.38
Found	Yes
Hash	92ad65fc6bb88f905be77092c55a827ee63926f08a8d353b352a8ad931ba70ed
SimHash	887cd2000413

Groups

*

Rule	Path
Disallow	/?s=
Disallow	/search
Allow	/feed/$
Disallow	/feed
Disallow	/comments/feed
Disallow	/*/feed/$
Disallow	/*/feed/rss/$
Disallow	/*/trackback/$
Disallow	/*/*/feed/$
Disallow	/*/*/feed/rss/$
Disallow	/*/*/trackback/$
Disallow	/*/*/*/feed/$
Disallow	/*/*/*/feed/rss/$
Disallow	/*/*/*/trackback/$
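
The interaction between `Allow /feed/$` and `Disallow /feed` in the `*` group depends on how a crawler resolves overlapping patterns. The sketch below assumes the longest-match convention described in RFC 9309 (the most specific matching pattern wins, and Allow wins a tie); individual crawlers may behave differently. Only the first eight rules of the group are included, and the function names and sample paths are hypothetical, chosen for illustration.

```python
import re

# First eight rules of the "*" group, in file order.
RULES = [
    ("disallow", "/?s="),
    ("disallow", "/search"),
    ("allow", "/feed/$"),
    ("disallow", "/feed"),
    ("disallow", "/comments/feed"),
    ("disallow", "/*/feed/$"),
    ("disallow", "/*/feed/rss/$"),
    ("disallow", "/*/trackback/$"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Without '$' the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under the assumed convention."""
    best_len, best_allow = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            # Longer pattern wins; on equal length, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


# Hypothetical sample paths.
for sample in ("/feed/", "/feed/rss/", "/mi-post/feed/", "/?s=cv", "/plantillas/"):
    print(sample, is_allowed(sample))
```

Under these assumptions the main feed at `/feed/` stays crawlable because the 7-character Allow pattern is longer than the 5-character `/feed` prefix, while per-post feeds, trackbacks, and search URLs are blocked.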

msiecrawler

Rule	Path
Disallow	/

webcopier

Rule	Path
Disallow	/

httrack

Rule	Path
Disallow	/

microsoft.url.control

Rule	Path
Disallow	/

libwww

Rule	Path
Disallow	/

ezooms

Rule	Path
Disallow	/

baiduspider

Rule	Path
Disallow	/

ahrefsbot

Rule	Path
Disallow	/

yandeximages

Rule	Path
Disallow	/

yandexbot

Rule	Path
Disallow	/

sogou

Rule	Path
Disallow	/

mj12bot

Rule	Path
Disallow	/

seznambot

Rule	Path
Disallow	/

wbsearchbot

Rule	Path
Disallow	/

exabot

Rule	Path
Disallow	/

sistrix

Rule	Path
Disallow	/

jikespider

Rule	Path
Disallow	/

sosospider

Rule	Path
Disallow	/

proximic

Rule	Path
Disallow	/

noxtrumbot

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	50

msnbot

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	30

slurp

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	10

mediapartners-google

Rule	Path
Disallow	(empty value: nothing is disallowed)

Other Records

Field	Value
sitemap	http://curriculumvitaeplantilla.es/sitemap.xml
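
The crawl-delay and sitemap records listed above can be read back with Python's standard `urllib.robotparser`. This is a minimal sketch: the `ROBOTS_TXT` string is a hand-reconstructed fragment based on the groups in this report, not the site's actual file, and `site_maps()` requires Python 3.8 or later. Note that this parser does not interpret `*` or `$` wildcards in paths, so it is used here only for whole-site rules, delays, and the sitemap.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed fragment (an assumption, not the real file contents).
ROBOTS_TXT = """\
User-agent: httrack
Disallow: /

User-agent: noxtrumbot
Crawl-delay: 50

User-agent: msnbot
Crawl-delay: 30

User-agent: slurp
Crawl-delay: 10

Sitemap: http://curriculumvitaeplantilla.es/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Per-agent crawl delay in seconds, as declared in each group.
delays = {
    agent: parser.crawl_delay(agent)
    for agent in ("noxtrumbot", "msnbot", "slurp")
}

# A group with "Disallow: /" blocks the whole site for that agent.
blocked = not parser.can_fetch(
    "httrack", "https://www.curriculumvitaeplantilla.es/"
)

# Sitemap lines are independent of any user-agent group.
sitemaps = parser.site_maps()

print(delays)
print(blocked)
print(sitemaps)
```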

Comments

Sitemap allowed, searches not.
We allow the general feed for Google Blogsearch.
We prevent permalink/feed/ from being indexed, because the
comments feed tends to rank in place of the
post itself and confuses users.
The same goes for URLs ending in /trackback/, which only
serve as the Trackback URI (and are duplicate content).
From here on it is optional but recommended.
List of bots that usually respect robots.txt but rarely
make good use of the site and abuse it quite a bit...
Add more to taste...

Warnings

4 invalid lines.
