haoxin-as.com
robots.txt

Robots Exclusion Standard data for haoxin-as.com

Resource Scan

Scan Details

Site Domain	haoxin-as.com
Base Domain	haoxin-as.com
Scan Status	Ok
Last Scan	2026-02-07T08:35:56+00:00
Next Scan	2026-03-09T08:35:56+00:00
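The gap between the two timestamps above can be checked directly. This is a minimal sketch using only the values shown in the report; it does not assume anything about how the scanner schedules its runs.

```python
from datetime import datetime

# Timestamps copied from the Scan Details table.
last_scan = datetime.fromisoformat("2026-02-07T08:35:56+00:00")
next_scan = datetime.fromisoformat("2026-03-09T08:35:56+00:00")

# Subtracting two aware datetimes yields a timedelta.
interval = next_scan - last_scan
print(interval.days)  # 30
```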

Last Scan

Scanned	2026-02-07T08:35:56+00:00
URL	http://haoxin-as.com/robots.txt
Domain IPs	183.136.138.177
Response IP	183.136.138.177
Found	Yes
Hash	5ac1837e0340a489365d3c913dff3d1172c4593f2d781eeeac5c7aa3800832f2
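The report does not name the hash algorithm. The 64 hexadecimal characters are consistent with SHA-256, so the sketch below assumes SHA-256 over the raw response body; `sha256_hex` and `sample_body` are hypothetical names, and the real robots.txt bytes are not part of this report.

```python
import hashlib


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of a response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


# Hash value copied from the Last Scan table.
reported = "5ac1837e0340a489365d3c913dff3d1172c4593f2d781eeeac5c7aa3800832f2"

# Hypothetical body used only to show the comparison; not the site's actual file.
sample_body = b"User-agent: *\nAllow: /\n"
digest = sha256_hex(sample_body)

# A digest computed this way has the same length as the reported value.
print(len(digest) == len(reported))
# Equality would hold only if the fetched bytes were identical to the scanned ones.
print(digest == reported)
```

Comparing a freshly computed digest with the stored one is a simple way to detect whether the file changed between scans, assuming the algorithm guess is right.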
SimHash	33164e20c2a6

Groups

*

Rule	Path
Allow	/

Other Records

Field	Value
crawl-delay	2
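As an illustration of how a crawler might read the `*` group, the sketch below feeds a reconstructed fragment to Python's standard `urllib.robotparser`. The fragment's exact text is an assumption based on the scan tables, and `ExampleCrawler` is a hypothetical user agent.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the scan report; the original file's exact text is assumed.
WILDCARD_GROUP = """\
User-agent: *
Allow: /
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(WILDCARD_GROUP.splitlines())

agent = "ExampleCrawler"  # hypothetical user agent with no group of its own

# Agents without a dedicated group fall back to the "*" group.
print(parser.can_fetch(agent, "http://haoxin-as.com/any/page"))  # True
print(parser.crawl_delay(agent))  # 2
```

Note that `crawl-delay` is a non-standard field, so not every crawler honors it.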

gptbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

openai

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

google-ai

Rule	Path
Disallow	/

claude-web

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

anthropic

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

meta-ai

Rule	Path
Disallow	/

cohere-ai

Rule	Path
Disallow	/

ai21

Rule	Path
Disallow	/

amazonbot

Rule	Path
Disallow	/

applebot

Rule	Path
Disallow	/

petalbot

Rule	Path
Disallow	/

bytespider

Product	Comment
bytespider	TikTok

Rule	Path
Disallow	/
Disallow	/admin/
Disallow	/private/
Disallow	/api/
Disallow	/ajax/
Disallow	/user-data/
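To see the combined effect of the groups listed above, the following sketch rebuilds an equivalent rule set and checks it with `urllib.robotparser`. The layout of the reconstructed file is an assumption (the scanner shows tokens in lowercase and does not reproduce the original text), and `ExampleCrawler` is a hypothetical user agent.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens as listed in the scan report.
BLOCKED_AGENTS = [
    "gptbot", "chatgpt-user", "openai", "google-extended", "google-ai",
    "claude-web", "claudebot", "anthropic", "ccbot", "facebookbot",
    "meta-ai", "cohere-ai", "ai21", "amazonbot", "applebot", "petalbot",
    "bytespider",
]
# Extra paths the scan attributes to the bytespider group.
EXTRA_BYTESPIDER_PATHS = ["/admin/", "/private/", "/api/", "/ajax/", "/user-data/"]

# Reconstructed layout; the original file's exact text and ordering are assumed.
lines = ["User-agent: *", "Allow: /", ""]
for token in BLOCKED_AGENTS:
    lines += [f"User-agent: {token}", "Disallow: /"]
    if token == "bytespider":
        lines += [f"Disallow: {path}" for path in EXTRA_BYTESPIDER_PATHS]
    lines.append("")

parser = RobotFileParser()
parser.parse(lines)

url = "http://haoxin-as.com/"
still_allowed = [a for a in BLOCKED_AGENTS if parser.can_fetch(a, url)]
print(still_allowed)  # []
print(parser.can_fetch("ExampleCrawler", url))  # True
print(parser.can_fetch("bytespider", "http://haoxin-as.com/admin/"))  # False
```

Because `Disallow: /` already matches every path by prefix, the additional paths under bytespider do not change the outcome for that group.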

Other Records

Field	Value
sitemap	https://haoxin-as.com/sitemap.xml
sitemap	https://haoxin-as.com/news-sitemap.xml
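A crawler could pick up these sitemap records with the standard library as well. This sketch assumes the records appear in the file as plain `Sitemap:` lines and relies on `RobotFileParser.site_maps()`, which is available in Python 3.8 and later.

```python
from urllib.robotparser import RobotFileParser

# Sitemap URLs copied from the scan report; the line format is assumed.
SITEMAP_RECORDS = """\
Sitemap: https://haoxin-as.com/sitemap.xml
Sitemap: https://haoxin-as.com/news-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SITEMAP_RECORDS.splitlines())

# site_maps() returns a list of URLs, or None when no sitemap is declared.
sitemaps = parser.site_maps()
for sitemap_url in sitemaps or []:
    print(sitemap_url)
```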

Comments

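Allow all search engine crawlers to access public content
======== Block AI training crawlers ========
OpenAI
Google AI
Anthropic (Claude)
Common Crawl
Facebook/Meta AI
Other AI/data-collection crawlers
======== Directory restrictions ========
Optional: restrict specific directories
Sitemaps
Additional directives
Advise crawlers not to cache pages
Restrict use for AI training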

Warnings

`cache-control` is not a known field.
`host` is not a known field.
`x-robots-tag` is not a known field.
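Warnings like these can be produced by comparing each field name against a list of recognized fields. The sketch below is one way to do that; `KNOWN_FIELDS` is an assumed list (the scanner's actual list is not shown), `unknown_fields` is a hypothetical helper, and the sample values are placeholders because the report does not include the original lines.

```python
# Assumed set of recognized fields; the scanner's real list is not in the report.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def unknown_fields(robots_text: str) -> list[str]:
    """Return unrecognized field names, lowercased, in first-seen order."""
    seen: list[str] = []
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line, nothing to classify
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS and field not in seen:
            seen.append(field)
    return seen


# Hypothetical lines: field names come from the warnings, values are placeholders.
sample = """\
User-agent: *
Allow: /
Cache-Control: placeholder
Host: placeholder
X-Robots-Tag: placeholder
"""

for field in unknown_fields(sample):
    print(f"`{field}` is not a known field.")
```

`Cache-Control` and `X-Robots-Tag` are HTTP response headers rather than robots.txt fields, which is likely why a parser would flag them here.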
