Robots Exclusion Standard data for ilmattino.it

Resource Scan

Scan Details

Site Domain ilmattino.it
Base Domain ilmattino.it
Scan Status Ok
Last Scan 2024-09-23T07:43:43+00:00
Next Scan 2024-09-30T07:43:43+00:00

Last Scan

Scanned 2024-09-23T07:43:43+00:00
URL https://ilmattino.it/robots.txt
Redirect https://www.ilmattino.it/robots.txt
Redirect Domain www.ilmattino.it
Redirect Base ilmattino.it
Domain IPs 34.149.236.87
Redirect IPs 34.149.236.87
Response IP 34.149.236.87
Found Yes
Hash 653fd51e21986b3c1734f2fb7b90d78a007476590fde8fb875db3479441ed751
SimHash 390c727ca013
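The report does not say how Hash is computed. Its 64 hexadecimal digits are consistent with a SHA-256 digest, so the sketch below assumes the scanner hashes the raw response body and compares digests between the weekly scans (Last Scan and Next Scan are seven days apart) to detect changes. `body_digest` and `has_changed` are hypothetical helpers, not the scanner's own code.

```python
import hashlib


def body_digest(body: bytes) -> str:
    """Hex SHA-256 of a robots.txt body (assumed to correspond to the Hash field)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_digest: str, body: bytes) -> bool:
    """True if a newly fetched body differs from the one that produced previous_digest."""
    return body_digest(body) != previous_digest
```

A plain cryptographic hash only says whether the file changed; the separate SimHash field is presumably meant to estimate how similar two versions are, which SHA-256 cannot do.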

Groups

*

Rule Path
Disallow /cache/
Disallow /includes/
Disallow /query_cache/
Disallow /cgi-bin/
Disallow *sez%3DJSON*
Disallow *sez%3DAJAX*
Disallow /view.php
Disallow /view
Disallow /view.php
Disallow /tag.php
Disallow /ELEZIONI2014
Disallow /ANSAviewnews.php*
Disallow /ANSAviewnews2.php*
Disallow /articolo.php*
Disallow /articoloins.php
Disallow /articolo_app.php
Disallow /tag.php*
Disallow /ricerca.php*
Disallow *?p=leggitutte*
Disallow /fotogallery.php
Disallow /video.php
Disallow /sondaggio.php
Disallow /aprifoto.php
Disallow /articoloins.php
Disallow /articolo_app.php
Disallow /elezioni.php
Disallow /stampa_articolo.php
Disallow /home_blog.php
Disallow /appello_cittadini.php
Disallow /webvoto.php
Disallow /PrimaPagina.php
Disallow /dump_database.php?db=all
Disallow /admin_login.php
Disallow *?p=sondaggio*
Disallow *?p=sondaggio
Disallow *?p=print*
Disallow /casa/*
Disallow /CASA/*
Disallow /flashnews/*
Disallow /video.php*
Disallow /sondaggio.php*
Disallow /aprifoto.php*
Disallow /articoloins_app.php*
Disallow /stampa_articoloins.php*
Disallow /twitter_share.php*
Disallow /specialemondiali.php*
Disallow /box_ajax_pl.php*
Disallow /ModuloRicerca.php
Disallow /twitter_share.php
Disallow /box_ajax_pc.php
Disallow /box_ajax_pl.php
Disallow /boxanordest.php
Disallow /home_meteo.php
Disallow /Informazioni/
Disallow /casa/
Disallow /*?p=flashnews&n=*
Disallow /u/*
Disallow /Mattino/*
Disallow /fisco/*
Disallow /focus/*
Disallow /norme/*
Disallow /guide/*
Disallow *BANNER_SHOP*
Disallow /mobile/
Disallow /docs/
Disallow /posta.php
Disallow /meetic.php
Disallow /boxnewspl.php
Disallow /home_blog.php
Disallow /articoloins.php
Disallow /commenti.php
Disallow /aprifoto.php*
Disallow /specialemondiali.php
Disallow /sondaggionew.php
Disallow /chisiamo.php
Disallow /dilloalmessaggero.php
Disallow /articoloins.php
Disallow /contatti.php
Disallow /contatti
Disallow /ricerca_arc.php*
Disallow /tag/*
Disallow /leggitutte*
Disallow /include/*
Disallow /home_page*
Disallow /mobile/
Disallow /tetractis/*
Disallow /sport/messaggero/*
Disallow /registrazione.html
Disallow /?p=leggitutte*
Disallow /index.php?p=leggitutte*
Disallow /index.php?p=search*
Disallow /index.php?p=print*
Disallow /sport/stats/*
Disallow /megapress/*
Disallow /?p=single_module*
Disallow /index.php?p=single_module*
Disallow /index.php?p=single_module_owl*
Disallow /*.shtml%20$
Disallow /38681514/
Disallow /home_*
Disallow /index.php?p=search*
Disallow /?p=search*
Disallow /ricerca/*
Disallow *?p=search*
Disallow *?p=print*
Disallow /native_*
Disallow /speciale_eni-joule.html
Disallow /ultimissime_adn/*
Disallow /index.php/*
Disallow /t/*/0*
Disallow /t/*/1*
Disallow /t/*/2*
Disallow /t/*/3*
Disallow /t/*/4*
Disallow /t/*/5*
Disallow /t/*/6*
Disallow /t/*/7*
Disallow /t/*/8*
Disallow /t/*/9*
Disallow /t/*/a*
Disallow /t/*/b*
Disallow /t/*/c*
Disallow /t/*/d*
Disallow /t/*/e*
Disallow /t/*/f*
Disallow /t/*/g*
Disallow /t/*/h*
Disallow /t/*/i*
Disallow /t/*/j*
Disallow /t/*/k*
Disallow /t/*/l*
Disallow /t/*/m*
Disallow /t/*/n*
Disallow /t/*/o*
Disallow /t/*/p*
Disallow /t/*/q*
Disallow /t/*/r*
Disallow /t/*/s*
Disallow /t/*/t*
Disallow /t/*/u*
Disallow /t/*/v*
Disallow /t/*/w*
Disallow /t/*/x*
Disallow /t/*/y*
Disallow /t/*/z*
Disallow /t/*/A*
Disallow /t/*/B*
Disallow /t/*/C*
Disallow /t/*/D*
Disallow /t/*/E*
Disallow /t/*/F*
Disallow /t/*/G*
Disallow /t/*/H*
Disallow /t/*/I*
Disallow /t/*/J*
Disallow /t/*/K*
Disallow /t/*/L*
Disallow /t/*/M*
Disallow /t/*/N*
Disallow /t/*/O*
Disallow /t/*/P*
Disallow /t/*/Q*
Disallow /t/*/R*
Disallow /t/*/S*
Disallow /t/*/T*
Disallow /t/*/U*
Disallow /t/*/V*
Disallow /t/*/W*
Disallow /t/*/X*
Disallow /t/*/Y*
Disallow /t/*/Z*
Disallow /t/*/?*
Disallow /t/*/%*
Disallow /*track_shop_event.php*
Disallow /monitor.php
Disallow /video/askanews/*
Disallow /video/adnkronos/*
Disallow *?p=informazioni_legali
Disallow /ricerca/*
Disallow /ansa_press_release*
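The `*` group mixes plain prefixes such as `/cache/` with wildcard patterns such as `*sez%3DJSON*`, `/t/*/a*` and `/*.shtml%20$`. Under RFC 9309, `*` matches any run of characters and a trailing `$` anchors the pattern at the end; rules are typically compared against the URL path together with its query string, which is what lets `*?p=print*` work. The following is a minimal sketch of that matching, assuming the path is compared exactly as written (no further percent-encoding normalization) and handling Disallow only, since this group has no Allow lines. `RULES`, `pattern_to_regex` and `is_disallowed` are illustrative names.

```python
from __future__ import annotations

import re

# Illustrative subset of the Disallow patterns in the "*" group above.
RULES = ["/cache/", "*sez%3DJSON*", "/t/*/a*", "/*.shtml%20$", "*?p=print*"]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*' (any run of characters, possibly empty); a trailing '$'
    anchors the pattern at the end of the string. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_disallowed(path_and_query: str, rules: list[str]) -> bool:
    """True if any pattern matches at the start of path_and_query.

    Only Disallow rules are handled; if Allow lines were present, the
    longest matching rule would have to decide instead.
    """
    return any(pattern_to_regex(rule).match(path_and_query) for rule in rules)


print(is_disallowed("/t/napoli/abc", RULES))  # True (matches /t/*/a*)
print(is_disallowed("/t/napoli/", RULES))     # False
```

Taken together, the 64 `/t/*/…` rules block any `/t/` URL in which a later slash is followed by a letter, digit, `?` or `%`, while a bare `/t/napoli/` stays crawlable.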

umbot

Rule Path
Disallow /

umbot-ln

Rule Path
Disallow /

umbot-ic

Rule Path
Disallow /
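The single-rule groups `umbot`, `umbot-ln` and `umbot-ic` only take effect if a crawler selects them instead of the `*` group. RFC 9309 has a crawler match its product token against the `User-agent` names case-insensitively and fall back to the `*` group when nothing matches. A minimal sketch of that selection, assuming exact token matches and leaving out the RFC's merging of several matching groups; `GROUPS` (abbreviated) and `select_rules` are illustrative names:

```python
from __future__ import annotations

# Abbreviated groups from the listing above, keyed by lower-cased
# user-agent name; values are each group's Disallow patterns.
GROUPS: dict[str, list[str]] = {
    "*": ["/cache/", "/includes/", "/cgi-bin/"],  # shortened for the sketch
    "umbot": ["/"],
    "umbot-ln": ["/"],
    "umbot-ic": ["/"],
    "arianna": [],  # no rules: every path allowed
}


def select_rules(product_token: str) -> list[str]:
    """Return the rules for a crawler's product token.

    Matching is case-insensitive and exact; when no named group matches,
    the "*" group applies. Merging several matching groups, which RFC 9309
    requires, is omitted because the dictionary keys are unique.
    """
    return GROUPS.get(product_token.lower(), GROUPS["*"])
```

So a crawler identifying as `UMBOT` gets `Disallow /`, while an unlisted crawler falls through to the long `*` list.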

arianna

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 5

arianna news

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 5

arianna web

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 5

arianna robot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 5
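`crawl-delay` is not part of RFC 9309, and crawlers differ in whether and how they honour it. A crawler that reads the `crawl-delay 5` records for the arianna groups as "wait at least five seconds between requests" might throttle itself roughly as follows. `PoliteThrottle` is a hypothetical helper; the clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class PoliteThrottle:
    """Keep at least `delay` seconds between consecutive requests to one host."""

    def __init__(
        self,
        delay: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # when the previous request started

    def wait(self) -> float:
        """Block until the next request may start; return the seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


# Usage with the value from the arianna groups (5 seconds):
# throttle = PoliteThrottle(5)
# for url in urls:          # `urls` and `fetch` are hypothetical
#     throttle.wait()
#     fetch(url)
```

Using `time.monotonic` rather than wall-clock time keeps the gap correct even if the system clock is adjusted during a crawl.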

gptbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

youbot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

imagesiftbot

Rule Path
Disallow /

meltwater

Rule Path
Disallow /

seekr

Rule Path
Disallow /

Other Records

Field Value
sitemap https://www.ilmattino.it/?sez=XML&p=MapNews
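Python's standard `urllib.robotparser` can read the sitemap URL, the crawl delay and blanket `Disallow: /` rules such as the one for `gptbot`. The excerpt below is hand-written from this listing rather than fetched, so its exact layout is an assumption. As far as we can tell, this parser does plain prefix matching and does not expand `*` or `$`, so it is a poor fit for the wildcard rules in the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Hand-written excerpt based on the listing above; the real file at
# https://www.ilmattino.it/robots.txt is longer and may be laid out differently.
EXCERPT = """\
User-agent: arianna
Crawl-delay: 5

User-agent: GPTBot
Disallow: /

Sitemap: https://www.ilmattino.it/?sez=XML&p=MapNews
"""

parser = RobotFileParser()
parser.parse(EXCERPT.splitlines())

print(parser.site_maps())                                        # ['https://www.ilmattino.it/?sez=XML&p=MapNews']
print(parser.crawl_delay("arianna"))                             # 5
print(parser.can_fetch("GPTBot", "https://www.ilmattino.it/"))   # False
```

The `Sitemap` line is independent of any user-agent group, which is why the parser collects it separately even though the scan report lists it under the last group.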

Comments

  • crawlers of no use to us
  • crawlers to keep for now