cds.unistra.fr
robots.txt

Robots Exclusion Standard data for cds.unistra.fr

Resource Scan

Scan Details

Site Domain cds.unistra.fr
Base Domain unistra.fr
Scan Status Ok
Last Scan 2024-09-27T19:06:09+00:00
Next Scan 2024-10-27T19:06:09+00:00

Last Scan

Scanned 2024-09-27T19:06:09+00:00
URL https://cds.unistra.fr/robots.txt
Domain IPs 130.79.128.30
Response IP 130.79.128.30
Found Yes
Hash 1c9f8427abec73e5cd99148eeea9ef735696ff0217ceb647e5d8448c3817262e
SimHash 74185912c4c4

Groups

*

Rule Path
Disallow /*?
Disallow /twikiAIDA/
Disallow /twikiDCA/

Other Records

Field Value
crawl-delay 10
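
The `Disallow /*?` rule above uses the `*` wildcard, a widely supported extension to the original Robots Exclusion Standard (it is part of RFC 9309 and Google's implementation, but Python's `urllib.robotparser`, for instance, treats `*` literally). A minimal sketch of how such a wildcard rule matches a URL path, assuming Google-style semantics (`*` matches any character sequence, `$` anchors the end):

```python
import re

def robots_rule_matches(rule: str, path: str) -> bool:
    """Check a Google-style robots.txt rule against a URL path.

    "*" matches any character sequence; a trailing "$" anchors the
    end of the path. Rules without wildcards match as prefixes.
    """
    # Escape everything except "*", which becomes ".*".
    regex = ".*".join(re.escape(part) for part in rule.split("*"))
    # A trailing "$" in the rule becomes a real end-of-string anchor.
    if regex.endswith(re.escape("$")):
        regex = regex[: -len(re.escape("$"))] + "$"
    # re.match already anchors at the start, giving prefix semantics.
    return re.match(regex, path) is not None

print(robots_rule_matches("/*?", "/search?q=star"))              # True: URL has GET parameters
print(robots_rule_matches("/*?", "/about"))                      # False: no "?" in path
print(robots_rule_matches("/twikiAIDA/", "/twikiAIDA/WebHome"))  # True: plain prefix match
```

This illustrates why the site's comment says pages with GET parameters "can be seen as new URLs": `/*?` disallows every path containing a query string in one rule.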

ai2bot
ai2bot-dolma
amazonbot
applebot
applebot-extended
bytespider
ccbot
chatgpt-user
claude-web
claudebot
diffbot
facebookbot
friendlycrawler
gptbot
google-extended
googleother
googleother-image
googleother-video
icc-crawler
imagesiftbot
meta-externalagent
meta-externalfetcher
oai-searchbot
perplexitybot
petalbot
scrapy
timpibot
velenpublicwebcrawler
webzio-extended
youbot
anthropic-ai
cohere-ai
facebookexternalhit
img2dataset
omgili
omgilibot

Rule Path
Disallow /

Other Records

Field Value
sitemap https://cds.unistra.fr/sitemap.xml

Comments

  • Pages with GET parameters can be seen as new URLs.
  • The following rule disallows all of them:
  • Disallow all legacy Twikis:
  • Crawlers supporting this option will fetch pages at most once every 10 seconds:
  • (source: https://robots-txt.com/ressources/robots-txt-crawl-delay/)
  • Disallow all of the following (AI) bots:
  • (copy-paste from https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.txt ;
  • version: 3 September 2024 / commit: fb5c995243c74389117589ed2a2b6d68abbb9a72)
  • Help robots to index our pages:
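
The groups above can be checked programmatically with Python's standard-library `urllib.robotparser`. A minimal sketch, using a partial reconstruction of the scanned file (the `/*?` wildcard rule is omitted because `urllib.robotparser` does not expand `*` inside paths, and the bot names here are a small illustrative subset of the full list):

```python
from urllib.robotparser import RobotFileParser

# Partial reconstruction of the scanned robots.txt: the default group
# blocks the legacy Twikis and sets a crawl delay; a second group
# blocks a subset of the listed AI bots entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /twikiAIDA/
Disallow: /twikiDCA/
Crawl-delay: 10

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# An ordinary crawler falls under "*": Twikis are blocked, the rest is open.
print(rp.can_fetch("SomeCrawler", "https://cds.unistra.fr/twikiAIDA/WebHome"))  # False
print(rp.can_fetch("SomeCrawler", "https://cds.unistra.fr/"))                   # True
# A listed AI bot matches its own group and is blocked everywhere.
print(rp.can_fetch("GPTBot", "https://cds.unistra.fr/about"))                   # False
# The crawl delay applies to the default "*" group.
print(rp.crawl_delay("SomeCrawler"))                                            # 10
```

Note that `RobotFileParser` matches consecutive `User-agent` lines as one group, which is how the long AI-bot list above forms a single `Disallow /` group.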