latin.it
robots.txt

Robots Exclusion Standard data for latin.it

Resource Scan

Scan Details

Site Domain latin.it
Base Domain latin.it
Scan Status Ok
Last Scan 2024-09-20T20:58:11+00:00
Next Scan 2024-09-27T20:58:11+00:00

Last Scan

Scanned 2024-09-20T20:58:11+00:00
URL https://latin.it/robots.txt
Redirect https://www.latin.it/robots.txt
Redirect Domain www.latin.it
Redirect Base latin.it
Domain IPs 188.11.177.203
Redirect IPs 188.11.177.203
Response IP 188.11.177.203
Found Yes
Hash 69bc5746b2da3ec0bc509ebda112b2c501af70ae98e5f5e86aa2ce19591b6f9d
SimHash 38065025647d

Groups

googlebot

Rule Path
Disallow

mediapartners-google

Rule Path
Disallow

bingbot

Rule Path
Disallow

msnbot

Rule Path
Disallow

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

omgili

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

admantx

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

yandex

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

*

Rule Path
Disallow /cgi-bin/
Disallow /css.php
Disallow /search.htm
Disallow /app
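
Assembled from the groups above, the scanned file is equivalent to the robots.txt below. This is a reconstruction from the report's data, not the verbatim file (the original also carries the comments listed in the next section):

    User-agent: googlebot
    Disallow:

    User-agent: mediapartners-google
    Disallow:

    User-agent: bingbot
    Disallow:

    User-agent: msnbot
    Disallow:

    User-agent: gptbot
    Disallow: /

    # ...and likewise a single "Disallow: /" group for each of:
    # chatgpt-user, google-extended, ccbot, omgili, omgilibot,
    # facebookbot, admantx, semrushbot, yandex, dotbot

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /css.php
    Disallow: /search.htm
    Disallow: /app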

Comments

  • AI
  • GPTBot is OpenAI's web crawler;
  • ChatGPT-User is the bot used by ChatGPT plugins.
  • Google-Extended blocks Google from scraping your site for Bard and Vertex AI.
  • CCBot is the bot used by Common Crawl. Its data has been used by ChatGPT, Bard, and others to train a number of models.
  • Omgilibot is from webz.io. I noticed The New York Times was blocking them and discovered that they sell data for training LLMs.
  • FacebookBot is Meta's bot that crawls public web pages to improve language models for their speech recognition technology. It is not what Facebook uses to fetch the image and snippet shown when you post a link there.
  • Crawl-delay: 3 # rule ignored # wait 3 seconds after the last successfully downloaded page
  • Request-rate: 1/5 # Visit at most one page every 5 seconds # syntax not understood
  • Visit-time: 2100-0500 # only visit between 21:00 (9PM) and 05:00 (5AM) UTC (GMT)
  • Visit-time: 0100-0745 # Only visit between 1:00 AM and 7:45 AM UT (GMT); G interr. 14:41; B and Y do not consider it
  • Crawl-delay: defines how many seconds to wait after each successful crawl.
  • Request-rate: defines the pages/seconds crawl ratio; 1/30 means 1 page every 30 seconds.
  • Visit-time: defines the hours between which you want your pages to be crawled. Example: 1500-1700, which means pages will be crawled between 03:00 PM and 05:00 PM GMT.
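
For reference, Python's standard-library urllib.robotparser can evaluate this file, including the non-standard Crawl-delay and Request-rate extensions (Visit-time is not supported). A minimal sketch, assuming the live file at https://www.latin.it/robots.txt still matches this scan; "mybot" is a hypothetical crawler name used to exercise the catch-all group:

    from urllib import robotparser

    # Fetch and parse the scanned file directly from the site.
    rp = robotparser.RobotFileParser()
    rp.set_url("https://www.latin.it/robots.txt")
    rp.read()

    # gptbot's group is "Disallow: /", so nothing may be fetched...
    print(rp.can_fetch("gptbot", "https://www.latin.it/app"))       # False

    # ...while googlebot's empty Disallow permits everything.
    print(rp.can_fetch("googlebot", "https://www.latin.it/app"))    # True

    # Unlisted crawlers ("mybot" is hypothetical) fall under "*"
    # and its four path rules.
    print(rp.can_fetch("mybot", "https://www.latin.it/cgi-bin/x"))  # False
    print(rp.can_fetch("mybot", "https://www.latin.it/index.htm"))  # True

    # crawl_delay()/request_rate() read the non-standard directives;
    # both likely return None here, since this report lists Crawl-delay
    # and Request-rate only among the file's comments.
    print(rp.crawl_delay("*"))
    print(rp.request_rate("*"))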