Robots Exclusion Standard data for cln.sh

Resource Scan

Scan Details

Site Domain: cln.sh
Base Domain: cln.sh
Scan Status: Ok
Last Scan: 2024-10-22T04:43:17+00:00
Next Scan: 2024-11-21T04:43:17+00:00

Last Scan

Scanned: 2024-10-22T04:43:17+00:00
URL: https://cln.sh/robots.txt
Domain IPs: 108.157.254.46, 108.157.254.68, 108.157.254.96, 108.157.254.99
Response IP: 108.157.254.99
Found: Yes
Hash: 0ecf1ed7db4568747ea3caff16bbc729be11ce86bd9d133f29193fa0615ff457
SimHash: 1810f350cff3

Groups

User-agent: *

Rule      Path
Disallow  /

User-agent: googlebot

Rule      Path
Allow     /

User-agent: ia_archiver

Rule      Path
Disallow  /

User-agent: archive.org_bot

Rule      Path
Disallow  /

User-agent: facebookexternalhit

Rule      Path
Allow     /

User-agent: twitterbot

Rule      Path
Allow     /
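To check how these groups resolve for a particular crawler, the sketch below rebuilds the file from the Groups listing and queries it with Python's standard-library `urllib.robotparser`. The file text is a reconstruction: directive casing, group order, and the absence of comments are assumptions. The page path `/some-page` is hypothetical, and `bingbot` is used only as an example of a crawler that has no group of its own. `RobotFileParser` is a simplified matcher. It compares only the product token before any `/`, takes the first named group that matches, and falls back to the `*` group last. Other parsers, such as Google's, may apply different matching rules, but for rules this simple they should reach the same verdicts.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the scan's "Groups" listing; the exact original text
# (casing, ordering, comments) is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: googlebot
Allow: /

User-agent: ia_archiver
Disallow: /

User-agent: archive.org_bot
Disallow: /

User-agent: facebookexternalhit
Allow: /

User-agent: twitterbot
Allow: /
"""

parser = RobotFileParser()
# parse() takes an iterable of lines and marks the rules as loaded.
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical page on the site; every group uses the path "/", so any path behaves the same.
PAGE = "https://cln.sh/some-page"

# Pass product tokens: RobotFileParser keeps only the text before the first "/".
for agent in ["googlebot", "facebookexternalhit/1.1", "twitterbot/1.0",
              "ia_archiver", "archive.org_bot", "bingbot"]:
    verdict = "allowed" if parser.can_fetch(agent, PAGE) else "blocked"
    print(f"{agent:<25} {verdict}")
```

Running the sketch should report googlebot, facebookexternalhit and twitterbot as allowed. It should report ia_archiver, archive.org_bot and bingbot as blocked; bingbot is blocked because it falls back to the `*` group.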

Comments

  • Block all crawlers by default
  • Allow Googlebot so it can read the <meta name="robots" content="noindex"> tag and drop the page from its index
  • Blocking Googlebot would not keep pages out of search results; they could still appear, just without a description
  • See https://developers.google.com/search/docs/crawling-indexing/robots/intro#what-is-a-robots.txt-file-used-for
  • Make sure web archives cannot save any page
  • Allow Facebook link previews
  • Allow Twitter link previews
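The noindex rationale in these comments depends on Googlebot actually fetching the page. A crawler blocked by robots.txt never downloads the HTML, so it never sees the tag. The minimal sketch below shows what an allowed crawler would look for, using the standard-library `html.parser`. The class and function names are hypothetical. The sketch only inspects `<meta name="robots">` tags; it ignores, for example, the `X-Robots-Tag` HTTP header and crawler-specific meta names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # content is a comma-separated list, e.g. "noindex, nofollow".
            for token in attr_map.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print(has_noindex(page))  # True
```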