
Robots Exclusion Standard data for cls.cn

Resource Scan

Scan Details

Site Domain cls.cn
Base Domain cls.cn
Scan Status Ok
Last Scan 2024-10-29T01:25:34+00:00
Next Scan 2024-11-28T01:25:34+00:00

Last Scan

Scanned 2024-10-29T01:25:34+00:00
URL https://www.cls.cn/robots.txt
Domain IPs 103.143.19.17, 240e:940:e009:1e0::18f
Response IP 103.143.19.17
Found Yes
Hash fb65f0409dcd8486d15dc87eb42220657fb6c102dd6ff40d12c913aaf543fceb
SimHash a838d4c00990

Groups

*

Rule Path
Disallow /hwwebscan_verify.html
Disallow /static/
Disallow .jpg$
Disallow .jpeg$
Disallow .gif$
Disallow .png$
Disallow .bmp$
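The last five rules end in `$`, which RFC 9309 defines as an end-of-path anchor (`*` matches any run of characters). The RFC's grammar also requires a path pattern to begin with `/`, and matching starts at the first character of the path, so a pattern such as `.jpg$` cannot match a path like `/img/a.jpg`. The minimal sketch below assumes RFC 9309 semantics and translates a rule into a regular expression. `rule_matches` is a hypothetical helper, and `/*.jpg$` appears only as an assumed comparison, not as a rule in this file.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumes RFC 9309 semantics: '*' matches any run of characters, a
    trailing '$' requires the match to end at the end of the path, and
    matching always begins at the first character of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore '*' as "any characters".
    regex = re.compile(re.escape(pattern).replace(r"\*", ".*"))
    # match() anchors at the start only; fullmatch() also anchors the end.
    found = regex.fullmatch(path) if anchored else regex.match(path)
    return found is not None


if __name__ == "__main__":
    print(rule_matches("/static/", "/static/app.js"))  # True
    print(rule_matches(".jpg$", "/img/a.jpg"))         # False: no leading '/'
    print(rule_matches("/*.jpg$", "/img/a.jpg"))       # True (hypothetical rule)
```

Crawlers that deviate from RFC 9309 may treat these patterns differently. The sketch models the standard's behavior, not any particular crawler's.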

Other Records

Field Value
sitemap https://cls.cn/map.xml
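The sitemap record and the rule group can be checked together with Python's standard `urllib.robotparser` module (`site_maps()` requires Python 3.8 or later). The file text below is an assumption: it is reconstructed from the Groups and Other Records tables, and the live file may differ in ordering, comments, or spacing. `ROBOTS_TXT` and `build_parser` are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed (assumed) robots.txt, based on the scan tables above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /hwwebscan_verify.html
Disallow: /static/
Disallow: .jpg$
Disallow: .jpeg$
Disallow: .gif$
Disallow: .png$
Disallow: .bmp$
Sitemap: https://cls.cn/map.xml
"""


def build_parser(text: str) -> RobotFileParser:
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    # Record a fetch time so can_fetch() does not treat the file as unread.
    rp.modified()
    return rp


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    print(rp.site_maps())  # ['https://cls.cn/map.xml']
    print(rp.can_fetch("*", "https://www.cls.cn/static/app.js"))  # False
    print(rp.can_fetch("*", "https://www.cls.cn/img/logo.png"))   # True
```

The image URL is reported as fetchable because the `.png$` rule has no leading `/`, so it never matches a path that begins with `/`.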

Comments

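  • User-agent:* //specifies which crawler the rules apply to; '*' means all search engines
  • Disallow: / (blocks crawlers from every directory on the site; "/" denotes the root directory)
  • Allow: (defines the pages or subdirectories that crawlers are permitted to crawl)
  • Sitemap: tells crawlers where the XML sitemap is located.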
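The comments above describe `Disallow` and `Allow` separately. When both match the same path, RFC 9309 applies the most specific (longest) matching rule, and an `allow` rule wins a tie. The sketch below illustrates that precedence under a simplifying assumption of plain prefix matching (`*` and `$` are not interpreted). `is_allowed` and the example rules are hypothetical and do not come from this site's file.

```python
from typing import List, Tuple

# (directive, path prefix) pairs, e.g. ("allow", "/news/").
Rule = Tuple[str, str]


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Decide access with RFC 9309's most-specific-match rule.

    Simplification: each rule value is treated as a plain prefix.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty value matches nothing
        longer = len(prefix) > best_len
        tie_goes_to_allow = len(prefix) == best_len and directive == "allow"
        if longer or tie_goes_to_allow:
            best_len = len(prefix)
            allowed = directive == "allow"
    return allowed


if __name__ == "__main__":
    rules = [("disallow", "/"), ("allow", "/news/")]
    print(is_allowed(rules, "/news/today.html"))  # True: "/news/" is longer
    print(is_allowed(rules, "/about.html"))       # False: only "/" matches
```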

Warnings

  • 1 invalid line.