latin.it
robots.txt

Robots Exclusion Standard data for latin.it

Resource Scan

Scan Details

Site Domain latin.it
Base Domain latin.it
Scan Status Ok
Last Scan 2024-09-20T20:58:11+00:00
Next Scan 2024-09-27T20:58:11+00:00

Last Scan

Scanned 2024-09-20T20:58:11+00:00
URL https://latin.it/robots.txt
Redirect https://www.latin.it/robots.txt
Redirect Domain www.latin.it
Redirect Base latin.it
Domain IPs 188.11.177.203
Redirect IPs 188.11.177.203
Response IP 188.11.177.203
Found Yes
Hash 69bc5746b2da3ec0bc509ebda112b2c501af70ae98e5f5e86aa2ce19591b6f9d
SimHash 38065025647d

Groups

googlebot

Rule Path
Disallow

mediapartners-google

Rule Path
Disallow

bingbot

Rule Path
Disallow

msnbot

Rule Path
Disallow

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

admantx

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

yandex

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

*

Rule Path
Disallow /cgi-bin/
Disallow /css.php
Disallow /search.htm
Disallow /app
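
Assembled from the groups above, the scanned file is equivalent to the robots.txt below. This is a reconstruction from the report's data, not the verbatim file (the original also carries the comments listed in the next section):

    User-agent: googlebot
    Disallow:

    User-agent: mediapartners-google
    Disallow:

    User-agent: bingbot
    Disallow:

    User-agent: msnbot
    Disallow:

    User-agent: gptbot
    Disallow: /

    # ...and likewise a single "Disallow: /" group for each of:
    # chatgpt-user, google-extended, ccbot, omgili, omgilibot,
    # facebookbot, admantx, semrushbot, yandex, dotbot

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /css.php
    Disallow: /search.htm
    Disallow: /app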

Comments

  • AI
  • GPTBot is OpenAI's web crawler;
  • ChatGPT-User is the bot used by ChatGPT plugins.
  • Google-Extended blocks Google from scraping your site for Bard and Vertex AI.
  • CCBot is the bot used by Common Crawl. Its data has been used by ChatGPT, Bard, and others to train a number of models.
  • Omgilibot is from webz.io. I noticed The New York Times was blocking them and discovered that they sell data for training LLMs.
  • FacebookBot is Meta's bot that crawls public web pages to improve language models for their speech recognition technology. It is not what Facebook uses to fetch the image and snippet shown when you post a link there.
  • Crawl-delay: 3 # rule ignored # wait 3 seconds after the last successfully downloaded page
  • Request-rate: 1/5 # Visit at most one page every 5 seconds # syntax not understood
  • Visit-time: 2100-0500 # only visit between 21:00 (9PM) and 05:00 (5AM) UTC (GMT)
  • Visit-time: 0100-0745 # Only visit between 1:00 AM and 7:45 AM UT (GMT); G interr. 14:41; B and Y do not consider it
  • Crawl-delay: defines how many seconds to wait after each successful crawl.
  • Request-rate: defines the pages/seconds crawl ratio; 1/30 means 1 page every 30 seconds.
  • Visit-time: defines the hours between which you want your pages to be crawled. Example: 1500-1700, which means pages will be crawled between 03:00 PM and 05:00 PM GMT.
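
For reference, Python's standard-library urllib.robotparser can evaluate this file, including the non-standard Crawl-delay and Request-rate extensions (Visit-time is not supported). A minimal sketch, assuming the live file at https://www.latin.it/robots.txt still matches this scan; "mybot" is a hypothetical crawler name used to exercise the catch-all group:

    from urllib import robotparser

    # Fetch and parse the scanned file directly from the site.
    rp = robotparser.RobotFileParser()
    rp.set_url("https://www.latin.it/robots.txt")
    rp.read()

    # gptbot's group is "Disallow: /", so nothing may be fetched...
    print(rp.can_fetch("gptbot", "https://www.latin.it/app"))       # False

    # ...while googlebot's empty Disallow permits everything.
    print(rp.can_fetch("googlebot", "https://www.latin.it/app"))    # True

    # Unlisted crawlers ("mybot" is hypothetical) fall under "*"
    # and its four path rules.
    print(rp.can_fetch("mybot", "https://www.latin.it/cgi-bin/x"))  # False
    print(rp.can_fetch("mybot", "https://www.latin.it/index.htm"))  # True

    # crawl_delay()/request_rate() read the non-standard directives;
    # both likely return None here, since this report lists Crawl-delay
    # and Request-rate only among the file's comments.
    print(rp.crawl_delay("*"))
    print(rp.request_rate("*"))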