haoxin-as.com
robots.txt

Robots Exclusion Standard data for haoxin-as.com

Resource Scan

Scan Details

Site Domain haoxin-as.com
Base Domain haoxin-as.com
Scan Status Ok
Last Scan 2026-02-07T08:35:56+00:00
Next Scan 2026-03-09T08:35:56+00:00
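
The two timestamps above are exactly 30 days apart, which suggests (but does not confirm) a fixed 30-day rescan interval. A minimal sketch of that arithmetic, using only the values shown in this report; the variable names are illustrative:

```python
from datetime import datetime, timedelta

# Timestamps copied from the Scan Details listing.
last_scan = datetime.fromisoformat("2026-02-07T08:35:56+00:00")
next_scan = datetime.fromisoformat("2026-03-09T08:35:56+00:00")

# Difference between the scheduled next scan and the last scan.
interval = next_scan - last_scan
print(interval.days)

# Assumption: the scanner schedules rescans at a fixed interval.
assert last_scan + timedelta(days=30) == next_scan
```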

Last Scan

Scanned 2026-02-07T08:35:56+00:00
URL http://haoxin-as.com/robots.txt
Domain IPs 183.136.138.177
Response IP 183.136.138.177
Found Yes
Hash 5ac1837e0340a489365d3c913dff3d1172c4593f2d781eeeac5c7aa3800832f2
SimHash 33164e20c2a6
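
The Hash value is 64 hexadecimal digits long, which is the length of a SHA-256 digest. The report does not say which algorithm is used or which bytes are hashed, so the following is only a sketch of how a content hash could be used to detect changes between scans. The function name and the sample body are hypothetical; the sample is not the real file, so its digest is not expected to match.

```python
import hashlib

# Hash value copied from this report.
reported = "5ac1837e0340a489365d3c913dff3d1172c4593f2d781eeeac5c7aa3800832f2"

def body_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a response body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()

# Placeholder body for illustration only, not the actual robots.txt content.
sample_body = b"User-agent: *\nAllow: /\n"
digest = body_digest(sample_body)

# Same length as the reported hash; a differing value would signal a change.
same_length = len(digest) == len(reported)
changed = digest != reported
print(same_length, changed)
```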

Groups

*

Rule Path
Allow /

Other Records

Field Value
crawl-delay 2

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

openai

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

google-ai

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

anthropic

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

meta-ai

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

ai21

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

applebot

Rule Path
Disallow /

petalbot

Rule Path
Disallow /

bytespider

Product Comment
bytespider TikTok
Rule Path
Disallow /
Disallow /admin/
Disallow /private/
Disallow /api/
Disallow /ajax/
Disallow /user-data/

Other Records

Field Value
sitemap https://haoxin-as.com/sitemap.xml
sitemap https://haoxin-as.com/news-sitemap.xml
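
To illustrate how the groups above would be evaluated by a crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text is a partial reconstruction from the listings in this report (only a few of the groups, with assumed capitalization and layout), not the literal file, and the `/products` path is a hypothetical example URL.

```python
from urllib.robotparser import RobotFileParser

# Partial reconstruction from the report; the real file's layout is unknown.
ROBOTS_EXCERPT = """\
User-agent: *
Allow: /
Crawl-delay: 2

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /

Sitemap: https://haoxin-as.com/sitemap.xml
Sitemap: https://haoxin-as.com/news-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

# Hypothetical page URL used only to exercise the rules.
page = "http://haoxin-as.com/products"

for agent in ("Googlebot", "GPTBot", "ClaudeBot", "Bytespider"):
    print(agent, parser.can_fetch(agent, page))

# The wildcard group's crawl-delay applies to agents without their own group.
print(parser.crawl_delay("Googlebot"))
print(parser.site_maps())
```

Agents with their own group are matched by name and blocked site-wide, while any other agent falls back to the `*` group, which allows everything with a crawl delay of 2.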

Comments

  • Allow all search engine crawlers to access public content
  • ======== Block AI training crawlers ========
  • OpenAI
  • Google AI
  • Anthropic (Claude)
  • Common Crawl
  • Facebook/Meta AI
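  • Other AI/data-collection crawlers
  • ======== Directory restrictions ========
  • Optional: restrict specific directories
  • Sitemaps
  • Additional directives
  • Ask crawlers not to cache pages
  • Restrict use for AI training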

Warnings

  • `cache-control` is not a known field.
  • `host` is not a known field.
  • `x-robots-tag` is not a known field.
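
Warnings like these arise when a robots.txt line uses a field name the parser does not recognize; `cache-control` and `x-robots-tag` are HTTP response headers rather than robots.txt fields. A minimal sketch of such a check follows. The set of known fields is an assumption (the scanner's actual list is not shown in this report), and the field values in the sample are placeholders, since the report lists only the field names.

```python
# Assumed set of recognized fields; the scanner's real list is not documented here.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}

def unknown_fields(text: str) -> list[str]:
    """Return unrecognized field names in order of first appearance."""
    found = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS and field not in found:
            found.append(field)
    return found

# Placeholder values; only the field names come from the warnings above.
sample = """\
User-agent: *
Allow: /
Cache-Control: placeholder
Host: placeholder
X-Robots-Tag: placeholder
"""

for name in unknown_fields(sample):
    print(f"`{name}` is not a known field.")
```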