sipmv.com
robots.txt

Robots Exclusion Standard data for sipmv.com

Resource Scan

Scan Details

Site Domain sipmv.com
Base Domain sipmv.com
Scan Status Ok
Last Scan 2025-12-08T22:13:54+00:00
Next Scan 2026-01-07T22:13:54+00:00

Last Scan

Scanned 2025-12-08T22:13:54+00:00
URL https://sipmv.com/robots.txt
Domain IPs 104.21.90.55, 172.67.196.10, 2606:4700:3030::ac43:c40a, 2606:4700:3036::6815:5a37
Response IP 172.67.196.10
Found Yes
Hash 9b621ecd55d7b5099a1af9383c3067f351c0f543dba3be65f28599b2c0480761
SimHash af37d8628572

Groups

baiduspider

Rule Path
Disallow /about/
Disallow /search/*

sogou web spider

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

newsai

Rule Path
Disallow /

bingbot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 5
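
The groups and the crawl-delay record above can be checked programmatically. The following is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt text in it is an assumed reconstruction from this report, not the file as served: the casing, the ordering, and the placement of Crawl-delay under the bingbot group are guesses based on the group list and the comments. The name ASSUMED_ROBOTS_TXT and the helper build_parser are hypothetical. The standard-library parser does plain prefix matching and does not expand the * wildcard, so the /search/* rule is left out of the checks.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of the scanned file. The real robots.txt may differ
# in casing, ordering, and in which group the Crawl-delay line belongs to.
ASSUMED_ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /about/
Disallow: /search/*

User-agent: Sogou web spider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: newsai
Disallow: /

User-agent: bingbot
Crawl-delay: 5
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ASSUMED_ROBOTS_TXT)

# Print the allow/deny decision for a few (user agent, URL) pairs.
checks = [
    ("baiduspider", "https://sipmv.com/about/team"),
    ("baiduspider", "https://sipmv.com/"),
    ("sogou web spider", "https://sipmv.com/"),
    ("amazonbot", "https://sipmv.com/products/"),
    ("newsai", "https://sipmv.com/"),
    ("bingbot", "https://sipmv.com/products/"),
]
for agent, url in checks:
    print(agent, url, parser.can_fetch(agent, url))

# A group with no Allow/Disallow lines leaves every path allowed;
# the crawl delay is reported separately.
print("bingbot crawl-delay:", parser.crawl_delay("bingbot"))
```

Parsing from a string keeps the check offline and repeatable. To test against the live file instead, RobotFileParser also offers set_url() and read().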

Comments

  • ==========================
  • Block the Sogou crawler (the one that shows up most in your logs)
  • ==========================
  • ==========================
  • Block Amazonbot
  • ==========================
  • ==========================
  • Block AI/news crawlers such as newsai/1.0
  • (its UA name is newsai/1.0; the part before it is disguised as Chrome)
  • ==========================
  • ==========================
  • Optional: go easier on Bing (rate-limit only, no block)
  • ==========================
  • For example, to keep it from crawling paginated product listings, you can use:
  • Disallow: /products/microscope/*page
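
The wildcard rules in this file (/search/* and the commented-out /products/microscope/*page) depend on pattern matching that not every parser implements. As a rough illustration, the sketch below translates a robots.txt path pattern into a regular expression, following the common convention that * matches any run of characters and a trailing $ anchors the end of the path. The function names robots_pattern_to_regex and path_matches are hypothetical, and the sample paths are made up for the example. Real crawlers may differ in details such as percent-encoding and rule precedence.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    end of the path. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and join the pieces with '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def path_matches(pattern: str, path: str) -> bool:
    """Return True if the path (with query string) matches the pattern.

    re.match anchors at the start, so the pattern acts as a prefix rule
    unless it ends with '$'.
    """
    return robots_pattern_to_regex(pattern).match(path) is not None


# Hypothetical sample paths, checked against the two patterns from the report.
samples = [
    ("/products/microscope/*page", "/products/microscope/list?page=2"),
    ("/products/microscope/*page", "/products/microscope/model-x"),
    ("/search/*", "/search/lens"),
    ("/search/*", "/about/"),
]
for pattern, path in samples:
    print(pattern, path, path_matches(pattern, path))
```

Under this convention /search/* behaves the same as /search/, since rules already match by prefix; the trailing * is harmless but redundant.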