linux.die.net
robots.txt

Robots Exclusion Standard data for linux.die.net

Resource Scan

Scan Details

Site Domain	linux.die.net
Base Domain	die.net
Scan Status	Ok
Last Scan	2024-06-13T11:44:31+00:00
Next Scan	2024-07-13T11:44:31+00:00

Last Scan

Scanned	2024-06-13T11:44:31+00:00
URL	https://linux.die.net/robots.txt
Domain IPs	104.26.0.94, 104.26.1.94, 172.67.69.187, 2606:4700:20::681a:15e, 2606:4700:20::681a:5e, 2606:4700:20::ac43:45bb
Response IP	172.67.69.187
Found	Yes
Hash	ef8f3e7c23633a5c33d3e68d1e6269bb6f53a25616b902217203591c51738ae7
SimHash	2005519ac7a1

Groups

mediapartners-google
adsbot-google

Rule	Path
Disallow	(empty)

architextspider
baiduspider
duckduckbot
googlebot
googlebot-image
googlebot-mobile
ia_archiver
mj12bot
msnbot
msnbot-media
robozilla
scooter
slurp
teoma
yahoo-mmcrawler
yahoo-blogs
yandex

Rule	Path
Disallow	/include/
Disallow	/icons/
Disallow	/this-is-a-bad-url/

*

Rule	Path
Disallow	/
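
To illustrate how the three groups above interact, here is a minimal sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` text is a reconstruction from this scan, not a copy of the real file: its casing, ordering, and whitespace are assumptions. The helper `allowed`, the agent name `examplebot`, and the paths being tested are hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumption: rebuilt from the scan's groups, sitemap record, and comments.
# The actual file on linux.die.net may differ in casing and layout.
ROBOTS_TXT = """\
# Serve relevant ads on any page:
User-agent: mediapartners-google
User-agent: adsbot-google
Disallow:

# Big, public search engines can access most of the site:
User-agent: architextspider
User-agent: baiduspider
User-agent: duckduckbot
User-agent: googlebot
User-agent: googlebot-image
User-agent: googlebot-mobile
User-agent: ia_archiver
User-agent: mj12bot
User-agent: msnbot
User-agent: msnbot-media
User-agent: robozilla
User-agent: scooter
User-agent: slurp
User-agent: teoma
User-agent: yahoo-mmcrawler
User-agent: yahoo-blogs
User-agent: yandex
Disallow: /include/
Disallow: /icons/
Disallow: /this-is-a-bad-url/

# Everyone else is not welcome to crawl:
User-agent: *
Disallow: /

# And here's where to find everything:
Sitemap: http://linux.die.net/sitemap.xml.gz
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Hypothetical helper: may `agent` fetch `path` on linux.die.net?"""
    return parser.can_fetch(agent, "https://linux.die.net" + path)


# An empty Disallow places no restriction on the ad crawlers.
print(allowed("adsbot-google", "/include/example.h"))
# Listed search engines may crawl everything except the three paths.
print(allowed("googlebot", "/man/1/ls"))
print(allowed("googlebot", "/include/example.h"))
# Any agent not named in the first two groups falls through to "*".
print(allowed("examplebot", "/man/1/ls"))
# Sitemap records are exposed separately (Python 3.8 or later).
print(parser.site_maps())
```

Note that `urllib.robotparser` matches agent names by substring and returns the first matching group, which is one of several possible matching strategies; other crawlers may resolve groups differently.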

Other Records

Field	Value
sitemap	http://linux.die.net/sitemap.xml.gz

Comments

Serve relevant ads on any page:
Big, public search engines can access most of the site:
Everyone else is not welcome to crawl:
And here's where to find everything:
