halalive.com
robots.txt

Robots Exclusion Standard data for halalive.com

Resource Scan

Scan Details

Site Domain halalive.com
Base Domain halalive.com
Scan Status Ok
Last Scan 2025-10-18T16:15:44+00:00
Next Scan 2025-11-01T16:15:44+00:00

Last Scan

Scanned 2025-10-18T16:15:44+00:00
URL https://www.halalive.com/robots.txt
Domain IPs 104.21.67.228, 172.67.182.120, 2606:4700:3033::6815:43e4, 2606:4700:3036::ac43:b678
Response IP 104.21.67.228
Found Yes
Hash 4f216cd35a778a746b672aaa8a3cd80c24e4162bc2ce90337aaf0bc88be8844e
SimHash 055d185566b5

Groups

googlebot

Rule Path
Allow /

google-extended

Rule Path
Allow /

googleother

Rule Path
Allow /

bingbot

Rule Path
Allow /

gptbot

Rule Path
Allow /

deepseekbot

Rule Path
Allow /

grok

Rule Path
Allow /

perplexity

Rule Path
Allow /

llama

Rule Path
Allow /

claude

Rule Path
Allow /

facebookbot

Rule Path
Allow /

applebot-extended

Rule Path
Allow /

*

Rule Path
Disallow /lp/
Disallow /feedback/
Disallow /langtest/
Disallow /*?sessionid=
Disallow /*?sort=
Disallow /*%26filter%3D
Disallow /admin/
Disallow /login/
Disallow /user/
Allow /
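
The rules in the * group can be checked against sample paths with a small matcher. The following is a minimal sketch, not the site's or any crawler's actual implementation. It assumes the longest-match behaviour described in RFC 9309 (the most specific matching rule wins, and Allow wins a tie) and treats * as "any run of characters" and a trailing $ as an end anchor. The names STAR_GROUP, pattern_to_regex, and is_allowed are hypothetical, and percent-encoding is compared literally rather than normalized.

```python
import re

# Rules copied from the "*" group in the scan above.
STAR_GROUP = [
    ("disallow", "/lp/"),
    ("disallow", "/feedback/"),
    ("disallow", "/langtest/"),
    ("disallow", "/*?sessionid="),
    ("disallow", "/*?sort="),
    ("disallow", "/*%26filter%3D"),
    ("disallow", "/admin/"),
    ("disallow", "/login/"),
    ("disallow", "/user/"),
    ("allow", "/"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the path; everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    The longest matching pattern wins; on equal length, Allow wins.
    A path that matches no rule is allowed.
    """
    best_len = -1
    best_allow = True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len = len(pattern)
                best_allow = allow
    return best_allow


if __name__ == "__main__":
    for sample in ["/products/1", "/admin/settings", "/list?sort=price",
                   "/list?page=2&sort=price"]:
        print(sample, is_allowed(sample, STAR_GROUP))
```

Under these assumptions, /list?sort=price is blocked but /list?page=2&sort=price is not, because the pattern /*?sort= only matches when sort is the first query parameter.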

Other Records

Field Value
sitemap https://www.halalive.com/google_sitemap_index.xml

Comments

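  • Block crawling by unofficial or unauthorized crawlers; apply a unified rule set
  • Block URLs carrying session IDs, sort parameters, or filter parameters to prevent duplicate crawling
  • Block admin, login, and user-privacy paths
  • Allow crawling of all other site content
  • Sitemap file location, to help search engines discover the site structure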
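
One consequence of the group layout is that a crawler matching a named group (for example gptbot) is governed only by that group's Allow / rule, so the Disallow paths in the * group apply only to crawlers that have no group of their own. The sketch below illustrates this with Python's standard urllib.robotparser. ROBOTS_LINES is a hand-written reconstruction of a subset of the scanned groups, not the site's actual file, and "examplebot" is a made-up agent name. The wildcard rules are left out on purpose because, as far as can be told, this parser only does plain prefix matching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of two of the scanned groups.
# Only plain-prefix rules are included (no "*" wildcards in paths).
ROBOTS_LINES = [
    "User-agent: gptbot",
    "Allow: /",
    "",
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /login/",
    "Disallow: /user/",
    "Allow: /",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)


def check(agent, path):
    """Return True if `agent` may fetch `path` on the scanned host."""
    return parser.can_fetch(agent, "https://www.halalive.com" + path)


if __name__ == "__main__":
    print("gptbot     /admin/", check("gptbot", "/admin/"))
    print("examplebot /admin/", check("examplebot", "/admin/"))
    print("examplebot /about ", check("examplebot", "/about"))
```

If the intent is to keep every crawler out of /admin/, /login/, and /user/, the Disallow lines would need to be repeated in each named group as well.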