cln.sh
robots.txt

Robots Exclusion Standard data for cln.sh

Scan Details

Site Domain	cln.sh
Base Domain	cln.sh
Scan Status	Ok
Last Scan	2024-10-22T04:43:17+00:00
Next Scan	2024-11-21T04:43:17+00:00

Last Scan

Scanned	2024-10-22T04:43:17+00:00
URL	https://cln.sh/robots.txt
Domain IPs	108.157.254.46, 108.157.254.68, 108.157.254.96, 108.157.254.99
Response IP	108.157.254.99
Found	Yes
Hash	0ecf1ed7db4568747ea3caff16bbc729be11ce86bd9d133f29193fa0615ff457
SimHash	1810f350cff3

Groups

User-agent	Rule	Path
*	Disallow	/
googlebot	Allow	/
ia_archiver	Disallow	/
archive.org_bot	Disallow	/
facebookexternalhit	Allow	/
twitterbot	Allow	/
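As a sanity check, the groups above can be evaluated with Python's standard-library urllib.robotparser. This is a minimal sketch under stated assumptions: the robots.txt text below is reconstructed from the parsed groups with the comments left out, so it is not byte-identical to the scanned file and will not reproduce the recorded hash. The names build_parser and SomeOtherBot are hypothetical; SomeOtherBot stands in for any crawler the file does not name. Also note that robotparser picks the first group whose name appears in the agent token, which is simpler than Google's matching rules. For this file, where every path is / and the agent names are distinct, the outcome should be the same.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the scanned groups (comments omitted); an assumption,
# not the exact file behind the recorded hash.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: googlebot
Allow: /

User-agent: ia_archiver
Disallow: /

User-agent: archive.org_bot
Disallow: /

User-agent: facebookexternalhit
Allow: /

User-agent: twitterbot
Allow: /
"""


def build_parser(text: str = ROBOTS_TXT) -> RobotFileParser:
    """Parse robots.txt text locally, without fetching anything over the network."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; blank lines separate the groups.
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser()
    # "SomeOtherBot" is a hypothetical crawler that matches no named group,
    # so it falls through to the "*" group.
    for agent in ["Googlebot", "facebookexternalhit", "Twitterbot",
                  "ia_archiver", "archive.org_bot", "SomeOtherBot"]:
        print(f"{agent:20} {rp.can_fetch(agent, 'https://cln.sh/')}")
```

With these rules, Googlebot, facebookexternalhit and Twitterbot are allowed to fetch any page, while ia_archiver, archive.org_bot and every unlisted crawler are refused.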

Comments

Block all crawlers by default
Allow Googlebot so it can read the <meta name="robots" content="noindex"> tag and drop the page from search results
Blocking Googlebot alone does not keep pages out of search results; they can still appear, just without a description
See https://developers.google.com/search/docs/crawling-indexing/robots/intro#what-is-a-robots.txt-file-used-for
Make sure web archives cannot save any page
Allow Facebook link previews
Allow Twitter link previews
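The Googlebot exception works only because the crawler is allowed to fetch the page and see its robots meta tag. Below is a minimal sketch of that check using Python's html.parser. RobotsMetaParser and has_noindex are hypothetical names. The sketch looks only at <meta name="robots"> and treats "none" as implying noindex. It ignores crawler-specific tags such as <meta name="googlebot"> and the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex, nofollow".
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noindex(html: str) -> bool:
    """True if the page asks robots not to index it ("none" implies noindex)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives or "none" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(has_noindex(page))
```

A crawler that robots.txt blocks never downloads the HTML, so it never runs a check like this. That is why the file allows Googlebot instead of disallowing it.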

Back to top