cds.unistra.fr
robots.txt

Robots Exclusion Standard data for cds.unistra.fr

Resource Scan

Scan Details

Site Domain cds.unistra.fr
Base Domain unistra.fr
Scan Status Ok
Last Scan 2024-09-27T19:06:09+00:00
Next Scan 2024-10-27T19:06:09+00:00

Last Scan

Scanned 2024-09-27T19:06:09+00:00
URL https://cds.unistra.fr/robots.txt
Domain IPs 130.79.128.30
Response IP 130.79.128.30
Found Yes
Hash 1c9f8427abec73e5cd99148eeea9ef735696ff0217ceb647e5d8448c3817262e
SimHash 74185912c4c4

Groups

*

Rule Path
Disallow /*?
Disallow /twikiAIDA/
Disallow /twikiDCA/

Other Records

Field Value
crawl-delay 10
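
The `Disallow /*?` rule above uses the `*` wildcard, a widely supported extension to the original Robots Exclusion Standard (it is part of RFC 9309 and Google's implementation, but Python's `urllib.robotparser`, for instance, treats `*` literally). A minimal sketch of how such a wildcard rule matches a URL path, assuming Google-style semantics (`*` matches any character sequence, `$` anchors the end):

```python
import re

def robots_rule_matches(rule: str, path: str) -> bool:
    """Check a Google-style robots.txt rule against a URL path.

    "*" matches any character sequence; a trailing "$" anchors the
    end of the path. Rules without wildcards match as prefixes.
    """
    # Escape everything except "*", which becomes ".*".
    regex = ".*".join(re.escape(part) for part in rule.split("*"))
    # A trailing "$" in the rule becomes a real end-of-string anchor.
    if regex.endswith(re.escape("$")):
        regex = regex[: -len(re.escape("$"))] + "$"
    # re.match already anchors at the start, giving prefix semantics.
    return re.match(regex, path) is not None

print(robots_rule_matches("/*?", "/search?q=star"))              # True: URL has GET parameters
print(robots_rule_matches("/*?", "/about"))                      # False: no "?" in path
print(robots_rule_matches("/twikiAIDA/", "/twikiAIDA/WebHome"))  # True: plain prefix match
```

This illustrates why the site's comment says pages with GET parameters "can be seen as new URLs": `/*?` disallows every path containing a query string in one rule.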

ai2bot
ai2bot-dolma
amazonbot
applebot
applebot-extended
bytespider
ccbot
chatgpt-user
claude-web
claudebot
diffbot
facebookbot
friendlycrawler
gptbot
google-extended
googleother
googleother-image
googleother-video
icc-crawler
imagesiftbot
meta-externalagent
meta-externalfetcher
oai-searchbot
perplexitybot
petalbot
scrapy
timpibot
velenpublicwebcrawler
webzio-extended
youbot
anthropic-ai
cohere-ai
facebookexternalhit
img2dataset
omgili
omgilibot

Rule Path
Disallow /

Other Records

Field Value
sitemap https://cds.unistra.fr/sitemap.xml

Comments

  • Pages with GET parameters can be seen as new URLs.
  • The following rule disallows all of them:
  • Disallow all legacy Twikis:
  • Crawlers supporting this option will fetch pages at most once every 10 seconds:
  • (source: https://robots-txt.com/ressources/robots-txt-crawl-delay/)
  • Disallow all of the following (AI) bots:
  • (copy-paste from https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.txt ;
  • version: 3 September 2024 / commit: fb5c995243c74389117589ed2a2b6d68abbb9a72)
  • Help robots to index our pages:
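
The groups above can be checked programmatically with Python's standard-library `urllib.robotparser`. A minimal sketch, using a partial reconstruction of the scanned file (the `/*?` wildcard rule is omitted because `urllib.robotparser` does not expand `*` inside paths, and the bot names here are a small illustrative subset of the full list):

```python
from urllib.robotparser import RobotFileParser

# Partial reconstruction of the scanned robots.txt: the default group
# blocks the legacy Twikis and sets a crawl delay; a second group
# blocks a subset of the listed AI bots entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /twikiAIDA/
Disallow: /twikiDCA/
Crawl-delay: 10

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# An ordinary crawler falls under "*": Twikis are blocked, the rest is open.
print(rp.can_fetch("SomeCrawler", "https://cds.unistra.fr/twikiAIDA/WebHome"))  # False
print(rp.can_fetch("SomeCrawler", "https://cds.unistra.fr/"))                   # True
# A listed AI bot matches its own group and is blocked everywhere.
print(rp.can_fetch("GPTBot", "https://cds.unistra.fr/about"))                   # False
# The crawl delay applies to the default "*" group.
print(rp.crawl_delay("SomeCrawler"))                                            # 10
```

Note that `RobotFileParser` matches consecutive `User-agent` lines as one group, which is how the long AI-bot list above forms a single `Disallow /` group.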