linustechtips.com
robots.txt

Robots Exclusion Standard data for linustechtips.com

Resource Scan

Scan Details

Site Domain linustechtips.com
Base Domain linustechtips.com
Scan Status Ok
Last Scan 2024-05-28T05:27:10+00:00
Next Scan 2024-06-11T05:27:10+00:00
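
The gap between the two timestamps above can be checked with a few lines of Python. This is only a sketch of the arithmetic. The report does not say how the next scan time is chosen, so a fixed rescan interval is an assumption.

```python
from datetime import datetime

# Timestamps copied from the Scan Details fields above.
last_scan = datetime.fromisoformat("2024-05-28T05:27:10+00:00")
next_scan = datetime.fromisoformat("2024-06-11T05:27:10+00:00")

# Subtracting two timezone-aware datetimes yields a timedelta.
interval = next_scan - last_scan
print(interval.days)  # 14
```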

Last Scan

Scanned 2024-05-28T05:27:10+00:00
URL https://linustechtips.com/robots.txt
Domain IPs 104.26.12.25, 104.26.13.25, 172.67.75.68, 2606:4700:20::681a:c19, 2606:4700:20::681a:d19, 2606:4700:20::ac43:4b44
Response IP 104.26.13.25
Found Yes
Hash 2db1ec86b2b15e101037b71c760ef98e78b4af9c4d3f8da2fc4b509ad7804f1c
SimHash 18304913cecc
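
A content hash like the one above is typically used to tell whether robots.txt changed between scans. The sketch below assumes the Hash field is a SHA-256 digest of the response body. The 64 hex characters are consistent with that, but the report does not name the algorithm. The function names and the sample body are hypothetical.

```python
import hashlib

def content_fingerprint(body: bytes) -> str:
    """Return the SHA-256 hex digest of a response body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()

def has_changed(previous_hash: str, body: bytes) -> bool:
    """Compare a stored digest with the digest of a freshly fetched body."""
    return content_fingerprint(body) != previous_hash

# Hypothetical sample body, not the real file contents.
sample = b"User-agent: *\nDisallow: /search/\n"
first = content_fingerprint(sample)

print(len(first))  # 64 hex characters, the same length as the Hash field
print(has_changed(first, sample))
print(has_changed(first, sample + b"Disallow: /tags/\n"))
```

An exact hash flips on any byte-level difference. The SimHash field is presumably there to measure near-duplicate similarity instead, which this sketch does not attempt.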

Groups

*

Rule Path
Disallow /startTopic/
Disallow /*?do=add
Disallow /*?do=submit
Disallow /discover/unread/
Disallow /markallread/
Disallow /staff/
Disallow /online/
Disallow /discover/
Disallow /leaderboard/
Disallow /search/
Disallow /*?advancedSearchForm=
Disallow /register/
Disallow /lostpassword/
Disallow /login/
Disallow /*?sortby=
Disallow /*?filter=
Disallow /*?tab=comments
Disallow /*?do=findComment
Disallow /*?do=getLastComment
Disallow /*?do=getNewComment
Disallow /profile/
Disallow /tags/
Disallow *csrfKey%3D*
Disallow /status/
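
Several of the paths above use `*` wildcards, so a plain prefix comparison is not enough to evaluate them. The following is a minimal sketch of wildcard matching in the style of RFC 9309, where `*` matches any run of characters and a trailing `$` anchors the end of the path. The helper names are hypothetical, and only a subset of the rules is included. Because this group has no Allow rules, the sketch skips longest-match precedence between Allow and Disallow.

```python
import re

def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_disallowed(path, disallow_rules):
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)

# A subset of the Disallow paths listed for the '*' group.
RULES = ["/startTopic/", "/*?do=add", "/*?sortby=", "/profile/", "*csrfKey%3D*"]

print(is_disallowed("/profile/12345-someone/", RULES))
print(is_disallowed("/topic/1-example/?do=add", RULES))
print(is_disallowed("/topic/1-example/", RULES))
```

The example paths are made up for illustration. Real crawlers may also normalize percent-encoding before comparing, which this sketch leaves out.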

sogou web spider

Rule Path
Disallow /

sogou inst spider

Rule Path
Disallow /

Other Records

Field Value
sitemap https://linustechtips.com/sitemap.php
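
The Groups and Other Records sections above could be derived from the raw file roughly as follows. This is a sketch, not the scanner's actual parser. `parse_robots` is a hypothetical helper, and `SAMPLE` is an abbreviated reconstruction built from values in this report, so the real file's line order and comment placement may differ.

```python
def parse_robots(text):
    """Split robots.txt text into per-agent groups and other records."""
    groups = {}          # lowercased user-agent -> list of (rule, path)
    other = []           # (field, value) pairs outside groups, e.g. sitemap
    current_agents = []
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # Consecutive User-agent lines share one group; otherwise start anew.
            if not last_was_agent:
                current_agents = []
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            last_was_agent = True
        elif field in ("allow", "disallow"):
            for agent in current_agents:
                groups[agent].append((field, value))
            last_was_agent = False
        else:
            other.append((field, value))
    return groups, other

# Abbreviated, hypothetical reconstruction of the scanned file.
SAMPLE = """\
# Rules for all user agents
User-agent: *
Disallow: /startTopic/
Disallow: /profile/

Sitemap: https://linustechtips.com/sitemap.php

# The naughty list
User-agent: sogou web spider
Disallow: /

User-agent: sogou inst spider
Disallow: /
"""

groups, other = parse_robots(SAMPLE)
print(sorted(groups))
print(other)
```

Stripping everything after `#` is a simplification that suits this sample. A production parser would need more care with values that legitimately contain that character.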

Comments

  • Rules for all user agents
  • IPS Default (https://remoteservices.invisionpower.com/docs/robots_txt_info/)
  • Block pages with no unique content
  • Block faceted pages and 301 redirect pages
  • Block profile pages as these have little unique value, consume a lot of crawl time and contain hundreds of 301 links
  • Sitemap URL
  • Additional rules
  • The naughty list