irph.org
robots.txt

Robots Exclusion Standard data for irph.org

Resource Scan

Scan Details

Site Domain	irph.org
Base Domain	irph.org
Scan Status	Ok
Last Scan	2024-11-16T02:29:22+00:00
Next Scan	2024-11-23T02:29:22+00:00

Last Scan

Scanned	2024-11-16T02:29:22+00:00
URL	https://irph.org/robots.txt
Redirect	https://www.irph.org/robots.txt
Redirect Domain	www.irph.org
Redirect Base	irph.org
Domain IPs	104.21.17.220, 172.67.178.154, 2606:4700:3035::6815:11dc, 2606:4700:3035::ac43:b29a
Redirect IPs	104.21.17.220, 172.67.178.154, 2606:4700:3035::6815:11dc, 2606:4700:3035::ac43:b29a
Response IP	104.21.17.220
Found	Yes
Hash	9acf0251f102a85cb6a8a354291b3726c2ee7faaf7f9f64ea1465f8b7e1c7d45
SimHash	887c52040413
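
The Hash value above is 64 hexadecimal characters, which is consistent with a SHA-256 digest, although the scan does not name the algorithm. As a minimal sketch under that assumption, a scanner could detect changes between weekly scans by comparing digests of the raw response body. The function names `content_fingerprint` and `has_changed` are hypothetical and are not part of the scan tool.

```python
import hashlib


def content_fingerprint(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """True when the newly fetched body no longer matches the stored digest."""
    return content_fingerprint(body) != previous_hash


if __name__ == "__main__":
    old_body = b"User-agent: *\nDisallow: /search\n"
    new_body = b"User-agent: *\nDisallow: /search\nDisallow: /feed\n"
    stored = content_fingerprint(old_body)
    print(has_changed(stored, old_body))  # same bytes, so no change is reported
    print(has_changed(stored, new_body))  # one extra rule changes the digest
```

An exact digest only reports that something changed; the separate SimHash field suggests the scan also records a similarity fingerprint, which this sketch does not attempt to reproduce.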

Groups

*

Rule	Path
Disallow	/?s=
Disallow	/search
Allow	/feed/$
Disallow	/feed
Disallow	/comments/feed
Disallow	/*/feed/$
Disallow	/*/feed/rss/$
Disallow	/*/trackback/$
Disallow	/*/*/feed/$
Disallow	/*/*/feed/rss/$
Disallow	/*/*/trackback/$
Disallow	/*/*/*/feed/$
Disallow	/*/*/*/feed/rss/$
Disallow	/*/*/*/trackback/$
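
The `*` group mixes an `Allow /feed/$` rule with a broader `Disallow /feed` rule, so the outcome depends on how a crawler resolves overlapping matches. The following is a minimal sketch, assuming the longest-match precedence described in RFC 9309 (the longest matching pattern wins, and Allow wins a tie), with `*` as a wildcard and a trailing `$` as an end anchor. The names `RULES`, `pattern_to_regex`, and `is_allowed` are illustrative; individual crawlers may evaluate these rules differently.

```python
import re

# Rules of the "*" group, in the order listed in the scan.
RULES = [
    ("disallow", "/?s="),
    ("disallow", "/search"),
    ("allow", "/feed/$"),
    ("disallow", "/feed"),
    ("disallow", "/comments/feed"),
    ("disallow", "/*/feed/$"),
    ("disallow", "/*/feed/rss/$"),
    ("disallow", "/*/trackback/$"),
    ("disallow", "/*/*/feed/$"),
    ("disallow", "/*/*/feed/rss/$"),
    ("disallow", "/*/*/trackback/$"),
    ("disallow", "/*/*/*/feed/$"),
    ("disallow", "/*/*/*/feed/rss/$"),
    ("disallow", "/*/*/*/trackback/$"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape literal text and turn each "*" into "match any characters".
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """Apply longest-match precedence; a path with no matching rule is allowed."""
    best = None  # (pattern length, is_allow); tuple order makes Allow win ties
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    for path in ["/feed/", "/feed/rss/", "/2024/post/feed/", "/about/", "/?s=test"]:
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Under these assumptions only the site-wide `/feed/` URL stays crawlable, while per-post feeds, trackback URLs, and search results are blocked, which matches the intent stated in the file's comments.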

msiecrawler

Rule	Path
Disallow	/

webcopier

Rule	Path
Disallow	/

httrack

Rule	Path
Disallow	/

microsoft.url.control

Rule	Path
Disallow	/

libwww

Rule	Path
Disallow	/

ezooms

Rule	Path
Disallow	/

baiduspider

Rule	Path
Disallow	/

ahrefsbot

Rule	Path
Disallow	/

yandeximages

Rule	Path
Disallow	/

yandexbot

Rule	Path
Disallow	/

sogou

Rule	Path
Disallow	/

mj12bot

Rule	Path
Disallow	/

seznambot

Rule	Path
Disallow	/

wbsearchbot

Rule	Path
Disallow	/

exabot

Rule	Path
Disallow	/

sistrix

Rule	Path
Disallow	/

jikespider

Rule	Path
Disallow	/

sosospider

Rule	Path
Disallow	/

proximic

Rule	Path
Disallow	/

noxtrumbot

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	50

msnbot

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	30

slurp

No rules defined. All paths allowed.

Other Records

Field	Value
crawl-delay	10
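
The crawl-delay values recorded for noxtrumbot, msnbot, and slurp can be turned into an upper bound on request volume. The sketch below assumes the common interpretation of `crawl-delay` as the minimum number of seconds between successive requests; the directive is not part of RFC 9309, and each crawler decides whether and how to honor it. The names `CRAWL_DELAYS` and `max_fetches_per_day` are illustrative.

```python
# crawl-delay values taken from the scan, assumed to be in seconds.
CRAWL_DELAYS = {"noxtrumbot": 50, "msnbot": 30, "slurp": 10}

SECONDS_PER_DAY = 24 * 60 * 60


def max_fetches_per_day(delay_seconds: int) -> int:
    """Upper bound on requests per day when waiting delay_seconds between them."""
    return SECONDS_PER_DAY // delay_seconds


if __name__ == "__main__":
    for agent, delay in CRAWL_DELAYS.items():
        print(f"{agent}: at most {max_fetches_per_day(delay)} requests/day")
```

With these values the ceiling is 1728 requests per day for noxtrumbot, 2880 for msnbot, and 8640 for slurp.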

mediapartners-google

Rule	Path
Disallow

Other Records

Field	Value
sitemap	http://irph.org/sitemap.xml

Comments

Sitemap allowed, searches not.
We allow the general feed for Google Blogsearch.
We prevent permalink/feed/ from being indexed, since the
comments feed tends to rank in place of
the post and confuses users.
The same goes for URLs ending in /trackback/, which
serve only as the Trackback URI (and are duplicate content).
From here on is optional but recommended.
List of bots that usually respect robots.txt but rarely
make good use of the site and abuse it quite a bit...
Add to taste...

Warnings

4 invalid lines.
