
Robots Exclusion Standard data for gazzettino.it

Resource Scan

Scan Details

Site Domain gazzettino.it
Base Domain gazzettino.it
Scan Status Ok
Last Scan 2024-06-22T02:12:00+00:00
Next Scan 2024-06-29T02:12:00+00:00

Last Scan

Scanned 2024-06-22T02:12:00+00:00
URL https://gazzettino.it/robots.txt
Redirect https://www.ilgazzettino.it/robots.txt
Redirect Domain www.ilgazzettino.it
Redirect Base ilgazzettino.it
Response IP 34.149.236.87
Found Yes
Hash 481ae36050a9e0c73f4e8f488403e5578479152f8b5bd12b60c45f0fd68be1a9
SimHash b20c5374811f

Groups

*

Rule Path
Disallow /cache/
Disallow /includes/
Disallow /query_cache/
Disallow /cgi-bin/
Disallow *sez%3DJSON*
Disallow *sez%3DAJAX*
Disallow /view.php*
Disallow /view*
Disallow /ELEZIONI2014
Disallow /ANSAviewnews2.php*
Disallow /ANSAviewnews.php*
Disallow /articolo.php*
Disallow /articoloins.php
Disallow /articolo_app.php
Disallow /tag.php*
Disallow /ricerca.php*
Disallow /aprifoto.php*
Disallow /articolo_app.php*
Disallow /mobile/*
Disallow /foto.php*
Disallow /fotogallery.php*
Disallow /video.php*
Disallow /sondaggio.php*
Disallow /*.aspx
Disallow *p%3Dflashnews*
Disallow /twitter_share.php
Disallow /box_tuttomercato/index_tm.php
Disallow /casa/*
Disallow /box_ajax*
Disallow /dump_database.php?db=all
Disallow /admin_login.php
Disallow /sicurezza_stradale*
Disallow /dettaglio.php*
Disallow /box_pl*
Disallow /diretta_europei.php*
Disallow /mobile/
Disallow /38681514/
Disallow /flashnews/
Disallow *p%3Dall_news*
Disallow *?p=search*
Disallow /native_*
Disallow /speciale_eni-joule.html
Disallow /ultimissime_adn/*
Disallow /XXXleggitutte*
Disallow /u/*
Disallow /index.php/*
Disallow /sport/stats/*
Disallow /megapress/*
Disallow /?p=single_module*
Disallow /index.php?p=single_module*
Disallow /index.php?p=single_module_owl*
Disallow /*track_shop_event.php*
Disallow /monitor.php
Disallow /video/askanews/*
Disallow /video/adnkronos/*
Disallow *?p=informazioni_legali
Disallow /ricerca/*
Disallow /ansa_press_release*
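Many patterns in the `*` group use the `*` wildcard (for example `/view*`, `*sez%3DJSON*` and `/*.aspx`), which Python's standard `urllib.robotparser` does not interpret. The following is a minimal sketch of a matcher following the RFC 9309 rules as we understand them: `*` matches any run of characters, a trailing `$` anchors the end, the longest matching pattern wins, and `Allow` wins a tie. It assumes patterns are compared literally against the path plus query string, with no percent-encoding normalization, so `*sez%3DJSON*` only matches URLs that contain the encoded text `sez%3DJSON`. The names `pattern_to_regex`, `is_allowed` and `RULES` are illustrative, not part of any library.

```python
from __future__ import annotations

import re


def pattern_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    # A trailing '$' means the pattern must match up to the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text and turn each '*' into '.*' (any run of characters).
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide access for `path` (path plus query) under (directive, pattern) rules."""
    best_len = -1
    best_allow = True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path) is None:
            continue
        allow = directive.lower() == "allow"
        # The longest matching pattern wins; on a tie, Allow wins.
        if len(pattern) > best_len or (len(pattern) == best_len and allow):
            best_len = len(pattern)
            best_allow = allow
    return best_allow


# A few Disallow patterns copied from the "*" group above.
RULES = [
    ("disallow", "/view*"),
    ("disallow", "/mobile/"),
    ("disallow", "*sez%3DJSON*"),
    ("disallow", "/*.aspx"),
    ("disallow", "/dump_database.php?db=all"),
]
```

Because every pattern is already a prefix match, the trailing `*` in `/view*` is redundant, and the rule also blocks unrelated paths such as `/viewer`. Conversely, `/dump_database.php?db=all` does not block `/dump_database.php` itself.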

umbot

Rule Path
Disallow /

umbot-ln

Rule Path
Disallow /

umbot-ic

Rule Path
Disallow /

arianna

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 5

arianna news

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 5

arianna web

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 5

arianna robot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 5
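The four `arianna` groups allow every path but request a `crawl-delay` of 5. `crawl-delay` is not part of RFC 9309, so honoring it is left to each crawler. A minimal sketch, assuming the groups can be reproduced as plain robots.txt lines (the `User-agent:` spelling below is inferred from this report, not copied from the raw file), reads the delay with `RobotFileParser.crawl_delay()` from Python's `urllib.robotparser` and spaces requests accordingly. `delay_for` and `PoliteScheduler` are hypothetical helpers.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt of the "arianna" groups (not the verbatim file).
ROBOTS_EXCERPT = """\
User-agent: arianna
Crawl-delay: 5

User-agent: arianna news
Crawl-delay: 5
"""


def delay_for(agent: str, robots_txt: str) -> float:
    """Return the Crawl-delay (seconds) that applies to `agent`, or 0.0 if none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else 0.0


class PoliteScheduler:
    """Hypothetical helper that spaces requests at least `delay` seconds apart."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self.last_request: Optional[float] = None  # time of the previous request

    def wait_time(self, now: float) -> float:
        """Seconds to wait at time `now` before the next request is allowed."""
        if self.last_request is None:
            return 0.0
        return max(0.0, self.last_request + self.delay - now)

    def record(self, now: float) -> None:
        """Remember that a request was sent at time `now`."""
        self.last_request = now
```

In a crawler loop, `now` would typically come from `time.monotonic()`, followed by `time.sleep(scheduler.wait_time(now))` before each fetch. Passing `now` in explicitly keeps the scheduler easy to test.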

gptbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

Other Records

Field Value
sitemap http://www.ilgazzettino.it/?sez=XML&p=MapNews
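The AI-crawler groups (`gptbot` through `chatgpt-user`) each disallow `/`, while every other agent falls back to the `*` group. The sketch below checks this with Python's `urllib.robotparser`, using a reconstructed excerpt rather than the verbatim file. It keeps only `/` and `/cache/` as paths because `RobotFileParser` does plain prefix matching without wildcard support. It checks named groups before falling back to `*` and matches agent names case-insensitively as substrings of the product token (the part before `/`). The `Sitemap` line is independent of any group, even though the report lists it under the last one. `build_excerpt`, `can_fetch` and `BASE` are illustrative names.

```python
from urllib.robotparser import RobotFileParser

# Agent names as listed in this report (matching is case-insensitive).
AI_AGENTS = ["gptbot", "google-extended", "anthropic-ai", "ccbot",
             "omgilibot", "cohere-ai", "chatgpt-user"]

BASE = "https://www.ilgazzettino.it"


def build_excerpt() -> str:
    """Reconstructed excerpt: one "*" rule, the AI-crawler groups, the sitemap."""
    lines = ["User-agent: *", "Disallow: /cache/", ""]
    for agent in AI_AGENTS:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Sitemap lines are not tied to any User-agent group.
    lines.append("Sitemap: http://www.ilgazzettino.it/?sez=XML&p=MapNews")
    return "\n".join(lines)


parser = RobotFileParser()
parser.parse(build_excerpt().splitlines())


def can_fetch(agent: str, path: str) -> bool:
    """True if `agent` may fetch BASE + path; named groups win over "*"."""
    return parser.can_fetch(agent, BASE + path)
```

With this excerpt, `GPTBot/1.0` is refused everywhere, while an unlisted agent is only kept out of `/cache/`. `parser.site_maps()` returns the sitemap URL regardless of where the line appears.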

Comments

  • crawlers of no use to us
  • crawlers to keep for now