roma.repubblica.it
robots.txt

Robots Exclusion Standard data for roma.repubblica.it

Resource Scan

Scan Details

Site Domain roma.repubblica.it
Base Domain repubblica.it
Scan Status Ok
Last Scan 2024-04-30T22:21:40+00:00
Next Scan 2024-05-07T22:21:40+00:00

Last Scan

Scanned 2024-04-30T22:21:40+00:00
URL https://roma.repubblica.it/robots.txt
Domain IPs 13.33.30.108, 13.33.30.124, 13.33.30.33, 13.33.30.96
Response IP 13.33.30.33
Found Yes
Hash de379122dba0985483c3558373f18d7547f6a71210a05592d46ead1da32e5a4f
SimHash 628c192321b3

Groups

*

Rule Path
Allow /
Disallow /ristoranti/
Disallow /multimedia/
Disallow /dettaglio/
Disallow /dettaglio-news/
Disallow /cronaca/2023/04/19/news/ciampino_appalti_aeronautica_militare_generale_giovanni_fantuzzi-396713005
Disallow /cronaca/2022/02/18/news/stalking_regione_lazio_denuncia_fabio_desideri-338283754/
Disallow /cronaca/2017/03/16/news/roma_mazzette_alle_asl_dai_centri_privati_norme_a_gettone-160663480/
Disallow /cronaca/2012/08/02/news/l_autista_di_piccolo_e_i_casamonica_fermatomentre_comprava_la_droga-40181764/
Disallow /cronaca/2012/07/25/news/false_fideiussioni_per_584mln_sette_misure_cautelari_e_sequestri-39686307
Disallow /blaize/datalayer
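
The `*` group combines a blanket `Allow /` with narrower `Disallow` paths, so the outcome depends on how a crawler resolves overlapping rules. The sketch below assumes the longest-match precedence described in RFC 9309 (the most specific matching path wins, and `Allow` wins a tie). The `RULES` list and the `is_allowed` helper are illustrative names, not part of the scan; only a subset of the listed rules is copied, and wildcard (`*`, `$`) handling is left out because none of the listed paths use it.

```python
# Subset of the "*" group rules listed above, in the order they appear.
RULES = [
    ("allow", "/"),
    ("disallow", "/ristoranti/"),
    ("disallow", "/multimedia/"),
    ("disallow", "/dettaglio/"),
    ("disallow", "/dettaglio-news/"),
    ("disallow", "/blaize/datalayer"),
]


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under longest-match precedence.

    The rule with the longest matching prefix decides; on equal length
    an Allow rule wins; a path matched by no rule is allowed.
    """
    best_len, allowed = -1, True
    for kind, prefix in rules:
        if path.startswith(prefix):
            n = len(prefix)
            if n > best_len or (n == best_len and kind == "allow"):
                best_len, allowed = n, (kind == "allow")
    return allowed


for p in ["/cronaca/", "/ristoranti/pizza", "/ristoranti", "/dettaglio-news/x"]:
    print(p, is_allowed(p))
```

Under this reading, `/ristoranti/pizza` is blocked while `/ristoranti` (no trailing slash) is not, because only the prefix `/ristoranti/` is disallowed. Parsers that apply rules in file order instead (Python's `urllib.robotparser` is one) would stop at `Allow /` and treat every path as allowed.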

gptbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /
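
The per-agent groups above each carry a single `Disallow /`, which blocks the named crawler from the whole site. As a rough check, the sketch below rebuilds just those groups as robots.txt lines and queries them with Python's standard `urllib.robotparser`. The rebuilt lines are an assumption based on the scan output (the live file may differ in casing or ordering), and the `*` group is deliberately omitted, so agents not listed here fall through to "allowed".

```python
from urllib.robotparser import RobotFileParser

# Agent names as shown in the scan (lowercased by the scanner).
AI_AGENTS = [
    "gptbot", "google-extended", "anthropic-ai", "ccbot",
    "cohere-ai", "chatgpt-user", "facebookbot", "omgilibot",
]

# Rebuild one "User-agent / Disallow: /" group per agent.
lines = []
for agent in AI_AGENTS:
    lines += [f"User-agent: {agent}", "Disallow: /", ""]

parser = RobotFileParser()
parser.parse(lines)  # parse in-memory lines; no network request is made

url = "https://roma.repubblica.it/"
blocked = {agent: not parser.can_fetch(agent, url) for agent in AI_AGENTS}

for agent, is_blocked in blocked.items():
    print(f"{agent}: {'blocked' if is_blocked else 'allowed'}")
```

Every listed agent should come back as blocked for any path, since `/` is a prefix of every URL path.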

Other Records

Field Value
sitemap https://roma.repubblica.it/sitemap-n.xml