
Robots Exclusion Standard data for unil.ch

Resource Scan

Scan Details

Site Domain unil.ch
Base Domain unil.ch
Scan Status Ok
Last Scan 2025-10-03T01:07:39+00:00
Next Scan 2025-11-02T01:07:39+00:00
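
The Last Scan and Next Scan timestamps are exactly 30 days apart. The short check below computes that gap from the two values. Reading it as a fixed 30-day rescan interval is an assumption, since the report does not state its schedule.

```python
from datetime import datetime, timedelta

# Timestamps copied from the "Last Scan" and "Next Scan" fields above.
last_scan = datetime.fromisoformat("2025-10-03T01:07:39+00:00")
next_scan = datetime.fromisoformat("2025-11-02T01:07:39+00:00")

# Subtracting two timezone-aware datetimes gives a timedelta.
interval = next_scan - last_scan
print(interval)  # 30 days, 0:00:00
```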

Last Scan

Scanned 2025-10-03T01:07:39+00:00
URL https://unil.ch/robots.txt
Redirect https://www.unil.ch/robots.txt
Redirect Domain www.unil.ch
Redirect Base unil.ch
Domain IPs 192.42.183.101
Redirect IPs 192.42.183.101
Response IP 192.42.183.101
Found Yes
Hash d5329db51cc66714473d8dbd2b28693ed9854d9d2c60101e25ea2ee0e47628be
SimHash 66904913e385
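
The Hash value is 64 hexadecimal digits, which is the length of a SHA-256 digest. The report does not say which algorithm or input it uses, so this sketch assumes SHA-256 over the raw bytes of the robots.txt response. The helper name content_hash is hypothetical. Comparing the digest with the previous scan's value is a cheap way to tell whether the file changed.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Return the hex SHA-256 digest of a robots.txt body (assumed hash scheme)."""
    return hashlib.sha256(body).hexdigest()


# Hypothetical usage: in practice 'body' would be the raw bytes served
# at https://www.unil.ch/robots.txt after following the redirect.
digest = content_hash(b"User-agent: *\nAllow: /\n")
print(len(digest))  # 64
```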

Groups

*

Rule Path
Disallow /0000-*/
Disallow /accueil/
Disallow /arobasque/
Disallow /bas*/
Disallow /*jahia/
Disallow /ex*/
Disallow /Jahia/
Disallow /refonte-*/
Disallow /testfr/
Disallow /uneseconde/
Disallow /webdav/site/*/groups/
Disallow /*?*actunilMenuParam=*
Disallow /*?*actunilParam=*
Disallow /*?*c=*
Disallow /*?*cl=*
Disallow /*?*doLogin=*
Disallow /*?*matrix=*
Disallow /*?*pubsIdParam=*
Disallow /*?*redirect=*
Disallow /*?*rememberme=*
Disallow /*?*set_language=*
Disallow /*?*showActu=*
Disallow /*?*showFrom=*
Disallow /*?*site=*
Disallow /*?*url_params=*
Disallow /*?*url=*
Disallow /*?*utm_campaign=*
Disallow /*?*utm_medium=*
Disallow /*?*utm_source_platform=*
Disallow /*?*utm_source=*
Disallow /*?*CSRFTOKEN=*
Disallow /*?*channelIds=*
Disallow /*?*sortedBy=*
Disallow /*?*status=*
Disallow /*?*publicationStatus=*
Disallow /*?*summarize=*
Disallow /*?*languages=*
Disallow /*?*size=*
Disallow /*?*windowDays=*
Disallow /*?*resourceType=*
Disallow /*?*resourceId=*
Disallow /*?*eco=*
Disallow /*?*beginEventDate=*
Disallow /*?*endEventDate=*
Disallow /*?*nodeIdK=*
Disallow /*?*parentNodeIdK=*
Disallow /*mobileMenu.do*
Disallow /*resourcesProxy.do*
Disallow /*generateEventIcs.do*
Disallow /*newsMostViewedProxy.do*
Disallow /*newsProxy.do*
Disallow /*eventsProxy.do*
Allow /
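
The rules in the * group use * as a wildcard that matches any run of characters. Each pattern matches as a prefix of the URL path plus query. Under RFC 9309 the most specific (longest) matching rule wins and Allow wins a tie, so the catch-all Allow / only applies to URLs that no Disallow pattern matches. The sketch below implements that precedence in simplified form, without percent-encoding normalization. The helper names and example paths are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the start.

    '*' matches any character sequence; a trailing '$' anchors the end.
    Without '$' the pattern only needs to match a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Apply the longest matching rule; Allow wins a length tie; no match means allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, kind == "allow"
    return allowed


# A subset of the '*' group above.
rules = [
    ("disallow", "/bas*/"),
    ("disallow", "/ex*/"),
    ("disallow", "/*?*c=*"),
    ("disallow", "/*?*utm_source=*"),
    ("allow", "/"),
]

print(is_allowed("/index.html", rules))         # True: only "Allow /" matches
print(is_allowed("/exams/2025/", rules))        # False: "/ex*/" is longer than "/"
print(is_allowed("/news?utm_source=x", rules))  # False
print(is_allowed("/news?topic=ai", rules))      # False: "topic=" contains "c="
```

The last case shows a side effect of the /*?*c=* rule. The pattern only requires the text c= somewhere after the ?, so it also blocks any query parameter whose name ends in c, such as topic=.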

claudebot
claude-user
claude-searchbot
ccbot
googlebot-extended
applebot-extended
facebookbot
meta-externalagent
meta-externalfetcher
diffbot
perplexitybot
perplexity‑user
omgili
omgilibot
webzio-extended
imagesiftbot
bytespider
tiktokspider
amazonbot
youbot
semrushbot-ocob
petalbot
velenpublicwebcrawler
turnitinbot
timpibot
oai-searchbot
icc-crawler
ai2bot
ai2bot-dolma
dataforseobot
awariobot
awariosmartbot
awariorssbot
google-cloudvertexbot
pangubot
kangaroo bot
sentibot
img2dataset
meltwater
seekr
peer39_crawler
cohere-ai
cohere-training-data-crawler
duckassistbot
scrapy
cotoyogi
aihitbot
factset_spyderbot
firecrawlagent

Rule Path
Disallow /
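
Every user agent listed in the second group shares its single Disallow / rule. A crawler that finds its own group ignores the * group entirely. RFC 9309 requires case-insensitive matching of the crawler's product token against these names and falls back to * when none match. The sketch below shows that selection for a subset of the listed agents. It is simplified because it does not merge multiple matching groups, and select_group is a hypothetical helper.

```python
def select_group(product_token: str, groups: dict[tuple[str, ...], list[str]]) -> list[str]:
    """Return the rules of the group naming this crawler, else the '*' group.

    Simplified RFC 9309 behaviour: case-insensitive token comparison,
    first matching group only (multiple matching groups are not merged).
    """
    token = product_token.lower()
    for agents, rules in groups.items():
        if any(token == agent.lower() for agent in agents):
            return rules
    # No named group matched: fall back to the wildcard group, if any.
    return groups.get(("*",), [])


# A subset of the two groups above; the '*' rules are abbreviated.
groups = {
    ("*",): ["Disallow: /webdav/site/*/groups/", "Allow: /"],
    ("claudebot", "ccbot", "amazonbot", "perplexity\u2011user"): ["Disallow: /"],
}

print(select_group("ClaudeBot", groups))        # ['Disallow: /']
print(select_group("Googlebot", groups))        # falls back to the '*' rules
print(select_group("Perplexity-User", groups))  # ASCII hyphen: also falls back to '*'
```

The last case matters for this file. The perplexity‑user entry, as written in the report, appears to use a non-ASCII hyphen (U+2011) instead of "-". Under plain token comparison, a crawler identifying as perplexity-user with an ASCII hyphen would therefore not match that entry and would get the * rules instead.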

Other Records

Field Value
sitemap https://www.unil.ch/sitemap_www.xml

Comments

  • 28-May-25: all user agents
  • 26-Feb-24: add sitemaps index
  • 11-Mar-25: exclude some ai bots
  • Block all known AI crawlers and assistants
  • from using content for training AI models.
  • Source: https://robotstxt.com/ai

Warnings

  • `disallowaitraining` is not a known field.
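
The warning means the file contains a disallowaitraining line. That is not a standard robots.txt field, so parsers that follow the standard will typically ignore it. A validator can flag such lines by comparing each field name against a known set. The sketch below assumes the RFC 9309 fields plus sitemap as that set, although real scanners may also accept fields such as crawl-delay. It lowercases field names the way the warning text suggests. The sample input is hypothetical because the report does not show the original line.

```python
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap"}  # assumed field set


def unknown_field_warnings(robots_txt: str) -> list[str]:
    """Flag 'field: value' lines whose field name is not in KNOWN_FIELDS."""
    warnings = []
    for line in robots_txt.splitlines():
        # Drop '#' comments and surrounding whitespace before parsing.
        line = line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            warnings.append(f"`{field}` is not a known field.")
    return warnings


# Hypothetical sample; the real file's line is not shown in the report.
sample = """User-agent: claudebot
DisallowAITraining: /
Disallow: /
"""
print(unknown_field_warnings(sample))  # ['`disallowaitraining` is not a known field.']
```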