diariodeibiza.com
robots.txt

Robots Exclusion Standard data for diariodeibiza.com

Resource Scan

Scan Details

Site Domain	diariodeibiza.com
Base Domain	diariodeibiza.com
Scan Status	Ok
Last Scan	2024-05-26T20:33:45+00:00
Next Scan	2024-06-02T20:33:45+00:00

Last Scan

Scanned	2024-05-26T20:33:45+00:00
URL	https://www.diariodeibiza.com/robots.txt
Domain IPs	199.232.194.133, 199.232.198.133
Response IP	146.75.94.133
Found	Yes
Hash	ce98aeab5341ef6b68be1179853432a0bfb6ad81b820be8898a31536b98a4f35
SimHash	a2f68e10477a

Groups

*

Rule	Path
Disallow	/wp-admin
Allow	/wp-admin/admin-ajax.php
Allow	/di-contenido/cache/
Disallow	/*?s=
Disallow	/?s=
Disallow	/search
Disallow	/?filter_by=*
Disallow	/page/
Allow	/feed/$
Disallow	*/feed$
Disallow	/comments/feed
Disallow	/*/feed/rss/$
Disallow	/*/trackback/$
Disallow	/*/*/feed/rss/$
Disallow	/*/*/trackback/$
Disallow	/*/*/*/feed/rss/$
Disallow	/*/*/*/trackback/$
Allow	/*.js$
Allow	/*.css$
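
To make the effect of the wildcard (`*`) and end-of-path (`$`) rules above easier to follow, here is a minimal Python sketch that evaluates a path against the `*` group. It assumes the longest-match precedence described in RFC 9309 (the most specific pattern wins, and Allow wins a tie); individual crawlers may behave differently. The names `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical helpers introduced only for this illustration.

```python
import re

# Rules transcribed from the "*" group of this report.
RULES = [
    ("disallow", "/wp-admin"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("allow", "/di-contenido/cache/"),
    ("disallow", "/*?s="),
    ("disallow", "/?s="),
    ("disallow", "/search"),
    ("disallow", "/?filter_by=*"),
    ("disallow", "/page/"),
    ("allow", "/feed/$"),
    ("disallow", "*/feed$"),
    ("disallow", "/comments/feed"),
    ("disallow", "/*/feed/rss/$"),
    ("disallow", "/*/trackback/$"),
    ("disallow", "/*/*/feed/rss/$"),
    ("disallow", "/*/*/trackback/$"),
    ("disallow", "/*/*/*/feed/rss/$"),
    ("disallow", "/*/*/*/trackback/$"),
    ("allow", "/*.js$"),
    ("allow", "/*.css$"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    "*" matches any run of characters; a trailing "$" anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under longest-match precedence."""
    best_len, verdict = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        # re.match anchors at the start of the path, like a prefix rule.
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, verdict = len(pattern), allow
    return verdict
```

Under these assumptions, `is_allowed("/feed/")` is true because only `Allow /feed/$` matches, while `is_allowed("/noticias/feed")` is false because `*/feed$` matches; the path `/noticias/feed` is an invented example.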

googlebot-image

Rule	Path
Allow	/di-contenido/files/

adsbot-google

Rule	Path
Allow	/

googlebot-mobile

Rule	Path
Allow	/

msiecrawler

Rule	Path
Disallow	/

webcopier

Rule	Path
Disallow	/

httrack

Rule	Path
Disallow	/

microsoft.url.control

Rule	Path
Disallow	/

libwww

Rule	Path
Disallow	/

noxtrumbot

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	50

msnbot

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	30

slurp

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	10

Other Records

Field	Value
sitemap	https://www.diariodeibiza.com/sitemap_index.xml
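
The per-agent records above (blanket `Disallow /` groups, `crawl-delay` values, and the sitemap) can be read programmatically. The following is a rough sketch using Python's standard `urllib.robotparser`. The raw `ROBOTS_EXCERPT` text is an assumption: it is reconstructed from the values in this report, not copied from the live file. Note that this stdlib parser does not implement `*` or `$` wildcards, so the sketch is limited to groups with plain prefix rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of part of the file, based on this report.
ROBOTS_EXCERPT = """\
User-agent: httrack
Disallow: /

User-agent: noxtrumbot
Crawl-delay: 50

User-agent: msnbot
Crawl-delay: 30

User-agent: slurp
Crawl-delay: 10

Sitemap: https://www.diariodeibiza.com/sitemap_index.xml
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is made.
parser.parse(ROBOTS_EXCERPT.splitlines())

home = "https://www.diariodeibiza.com/"
httrack_blocked = not parser.can_fetch("httrack", home)
msnbot_delay = parser.crawl_delay("msnbot")   # seconds between requests
slurp_delay = parser.crawl_delay("slurp")
sitemaps = parser.site_maps()                 # available in Python 3.8+
```

A polite crawler would typically sleep for the returned delay between requests to the host; how strictly each bot honors `crawl-delay` is up to its operator.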

Comments

Sitemap allowed.
Searches not allowed.
In case any crawler runs searches (they can produce duplicate content):
We allow the general feed for Google Blogsearch.
We prevent permalink/feed/ from being indexed, since the
comments feed tends to rank in place of
the entry and confuses users.
The same applies to URLs ending in /trackback/, which only
serve as Trackback URIs (and are duplicate content).
We avoid blocking CSS and JS.
List of bots that should be allowed.
List of bots that usually respect robots.txt but rarely
make good use of the site.
Slurp (Yahoo!), Noxtrum, and the MSN bot, which tend to generate excessive requests.
