republica.gt
robots.txt

Robots Exclusion Standard data for republica.gt

Resource Scan

Scan Details

Site Domain republica.gt
Base Domain republica.gt
Scan Status Ok
Last Scan 2024-11-09T16:35:17+00:00
Next Scan 2024-11-16T16:35:17+00:00

Last Scan

Scanned 2024-11-09T16:35:17+00:00
URL https://republica.gt/robots.txt
Domain IPs 104.21.34.56, 172.67.155.30, 2606:4700:3031::ac43:9b1e, 2606:4700:3033::6815:2238
Response IP 104.21.34.56
Found Yes
Hash 47285b24038cd86771177a3f4c0d8b8c50599b695490c3d7def5ab202643417e
SimHash 1820d9612560

Groups

*

Rule Path
Allow /
Disallow /partido_detalle/
Disallow /portadas/
Disallow /buscar/
Disallow /search/
Disallow /wp-content/
Disallow /b/
Disallow /registro
Disallow /login
Disallow /registro
Disallow /recuperar-password
Disallow /i_cuenta_regresiva
Disallow /i_buscador_automotriz
Disallow /*.pdf$
Disallow /*?utm_*
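
How a crawler might evaluate the rules in the "*" group can be sketched as follows. This is a minimal illustration, assuming the longest-match precedence described in RFC 9309 (the most specific matching path wins, and Allow wins a tie), with "*" as a wildcard and a trailing "$" as an end-of-URL anchor. The names RULES, _pattern_to_regex, and is_allowed are hypothetical and not part of the scan; a real crawler may behave differently.

```python
import re

# Rules copied from the "*" group above as (directive, path) pairs.
# "/registro" appears twice in the scan; listing it once gives the same result.
RULES = [
    ("allow", "/"),
    ("disallow", "/partido_detalle/"),
    ("disallow", "/portadas/"),
    ("disallow", "/buscar/"),
    ("disallow", "/search/"),
    ("disallow", "/wp-content/"),
    ("disallow", "/b/"),
    ("disallow", "/registro"),
    ("disallow", "/login"),
    ("disallow", "/recuperar-password"),
    ("disallow", "/i_cuenta_regresiva"),
    ("disallow", "/i_buscador_automotriz"),
    ("disallow", "/*.pdf$"),
    ("disallow", "/*?utm_*"),
]


def _pattern_to_regex(path):
    """Translate a robots.txt path pattern into a compiled regex."""
    # A trailing '$' anchors the pattern to the end of the URL path.
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(url_path, rules=RULES):
    """Return True if url_path may be fetched under the given rules."""
    best_len, allowed = -1, True  # no matching rule means allowed
    for directive, path in rules:
        # re.match anchors at the start, giving prefix semantics.
        if _pattern_to_regex(path).match(url_path):
            is_allow = directive == "allow"
            if len(path) > best_len or (len(path) == best_len and is_allow):
                best_len, allowed = len(path), is_allow
    return allowed


print(is_allowed("/economia/nota"))           # regular article path
print(is_allowed("/login"))                   # blocked by prefix rule
print(is_allowed("/docs/informe.pdf"))        # blocked by "/*.pdf$"
print(is_allowed("/nota?utm_source=boletin"))  # blocked by "/*?utm_*"
```

Because matching is by prefix, a rule such as "/registro" without a trailing slash also covers paths like "/registro-evento" under this interpretation.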

google-extended

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

piplbot

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

ia_archiver

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /
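
The named groups above each carry a single "Disallow /" rule, so a crawler that identifies with one of those tokens is excluded from the whole site, while any other crawler falls back to the "*" group. A minimal sketch of that group selection, assuming case-insensitive matching on the user-agent product token (as in RFC 9309); BLOCKED_AGENTS and group_for are hypothetical names used only for illustration.

```python
# User-agent tokens whose group consists solely of "Disallow /" in this scan.
BLOCKED_AGENTS = {
    "google-extended",
    "ccbot",
    "gptbot",
    "chatgpt-user",
    "piplbot",
    "omgilibot",
    "ia_archiver",
    "magpie-crawler",
}


def group_for(product_token):
    """Return the name of the group that applies to a crawler's token."""
    token = product_token.lower()
    # A crawler uses its own group if one exists, otherwise the "*" group.
    return token if token in BLOCKED_AGENTS else "*"


def is_fully_blocked(product_token):
    """True if the crawler's group disallows every path on the site."""
    return group_for(product_token) != "*"


print(group_for("GPTBot"))            # selects the "gptbot" group
print(is_fully_blocked("Googlebot"))  # falls back to "*", so not fully blocked
```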

Other Records

Field Value
sitemap https://republica.gt/sitemap.xml
sitemap https://republica.gt/sitemap_lite.xml
sitemap https://republica.gt/sitemap-news.xml
sitemap https://republica.gt/category-sitemap.xml
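
Sitemap records are independent of the user-agent groups and can be collected from the raw file regardless of where they appear. The sketch below shows one way to extract them, assuming the usual "Sitemap: <url>" line syntax with a case-insensitive field name. SAMPLE is an abbreviated, hypothetical excerpt rather than the full file, and sitemap_urls is an illustrative helper.

```python
# Abbreviated, hypothetical excerpt of a robots.txt file for illustration.
SAMPLE = """\
User-agent: *
Allow: /
Sitemap: https://republica.gt/sitemap.xml
sitemap: https://republica.gt/sitemap-news.xml
"""


def sitemap_urls(robots_text):
    """Collect the values of all sitemap records in robots.txt text."""
    urls = []
    for line in robots_text.splitlines():
        # Drop trailing comments (this also cuts URL fragments, which are rare here).
        line = line.split("#", 1)[0].strip()
        # Split on the first colon only, so the URL's own colons survive.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


for url in sitemap_urls(SAMPLE):
    print(url)
```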

Comments

  • robots.txt file for https://republica.gt *
  • Last updated: 06/06/2024