linux.die.net
robots.txt

Robots Exclusion Standard data for linux.die.net

Resource Scan

Scan Details

Site Domain linux.die.net
Base Domain die.net
Scan Status Ok
Last Scan 2024-06-13T11:44:31+00:00
Next Scan 2024-07-13T11:44:31+00:00

Last Scan

Scanned 2024-06-13T11:44:31+00:00
URL https://linux.die.net/robots.txt
Domain IPs 104.26.0.94, 104.26.1.94, 172.67.69.187, 2606:4700:20::681a:15e, 2606:4700:20::681a:5e, 2606:4700:20::ac43:45bb
Response IP 172.67.69.187
Found Yes
Hash ef8f3e7c23633a5c33d3e68d1e6269bb6f53a25616b902217203591c51738ae7
SimHash 2005519ac7a1
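
The Hash value above is 64 hexadecimal digits, which is the length of a SHA-256 digest. The report does not name the algorithm, so the following is only a sketch under the assumption that it is SHA-256 over the raw response body. The helper names `sha256_hex` and `matches_recorded` are hypothetical, introduced here for illustration.

```python
import hashlib

# Hash value recorded in the scan report above.
RECORDED_HASH = "ef8f3e7c23633a5c33d3e68d1e6269bb6f53a25616b902217203591c51738ae7"


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of a response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def matches_recorded(body: bytes) -> bool:
    """Compare a fetched robots.txt body against the recorded hash.

    Assumption: the scanner hashed the unmodified response bytes with SHA-256.
    """
    return sha256_hex(body) == RECORDED_HASH
```

If the assumption holds, a later fetch of the same URL whose body no longer matches would indicate that the file changed since the scan.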

Groups

mediapartners-google
adsbot-google

Rule Path
Disallow (empty)

architextspider
baiduspider
duckduckbot
googlebot
googlebot-image
googlebot-mobile
ia_archiver
mj12bot
msnbot
msnbot-media
robozilla
scooter
slurp
teoma
yahoo-mmcrawler
yahoo-blogs
yandex

Rule Path
Disallow /include/
Disallow /icons/
Disallow /this-is-a-bad-url/

*

Rule Path
Disallow /
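
The three groups above can be checked with Python's standard `urllib.robotparser`. This is a minimal sketch: it rebuilds robots.txt lines from the groups as listed in this report rather than fetching the live file, so the exact layout of the original file is an assumption. The crawler name `examplecrawler` and the path `/man/` are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Groups as listed in the scan: (user agents, disallowed paths).
# An empty path stands for "Disallow:" with no value, which blocks nothing.
GROUPS = [
    (["mediapartners-google", "adsbot-google"], [""]),
    (
        [
            "architextspider", "baiduspider", "duckduckbot", "googlebot",
            "googlebot-image", "googlebot-mobile", "ia_archiver", "mj12bot",
            "msnbot", "msnbot-media", "robozilla", "scooter", "slurp",
            "teoma", "yahoo-mmcrawler", "yahoo-blogs", "yandex",
        ],
        ["/include/", "/icons/", "/this-is-a-bad-url/"],
    ),
    (["*"], ["/"]),
]


def build_robots_lines(groups):
    """Turn (agents, paths) pairs into robots.txt lines, one group per block."""
    lines = []
    for agents, paths in groups:
        lines += [f"User-agent: {agent}" for agent in agents]
        lines += [f"Disallow: {path}".rstrip() for path in paths]
        lines.append("")  # a blank line ends the group
    return lines


parser = RobotFileParser()
parser.parse(build_robots_lines(GROUPS))


def allowed(agent: str, path: str) -> bool:
    """Ask the parser whether `agent` may fetch `path` on linux.die.net."""
    return parser.can_fetch(agent, "https://linux.die.net" + path)


if __name__ == "__main__":
    for agent, path in [
        ("Mediapartners-Google", "/include/"),
        ("Googlebot", "/include/"),
        ("Googlebot", "/man/"),
        ("examplecrawler", "/man/"),
    ]:
        print(agent, path, allowed(agent, path))
```

Under these rules the ad crawlers are unrestricted, the listed search engines are kept out of only the three named directories, and any crawler not named in a group falls through to the `*` group and is blocked everywhere.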

Other Records

Field Value
sitemap http://linux.die.net/sitemap.xml.gz
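
The sitemap record is independent of the user-agent groups. As a small sketch, assuming Python 3.8 or later (where `RobotFileParser.site_maps()` is available), it can be read back like this; the single input line is reconstructed from the record above.

```python
from urllib.robotparser import RobotFileParser

# Only the sitemap record from the report; no user-agent groups are needed.
parser = RobotFileParser()
parser.parse(["Sitemap: http://linux.die.net/sitemap.xml.gz"])

# site_maps() returns None when no sitemap records were found.
sitemaps = parser.site_maps() or []

if __name__ == "__main__":
    for url in sitemaps:
        print(url)
```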

Comments

  • Serve relevant ads on any page:
  • Big, public search engines can access most of the site:
  • Everyone else is not welcome to crawl:
  • And here's where to find everything: