randompokegen.cc
robots.txt

Robots Exclusion Standard data for randompokegen.cc

Resource Scan

Scan Details

Site Domain randompokegen.cc
Base Domain randompokegen.cc
Scan Status Ok
Last Scan 2025-10-30T21:39:42+00:00
Next Scan 2025-11-06T21:39:42+00:00

Last Scan

Scanned 2025-10-30T21:39:42+00:00
URL https://randompokegen.cc/robots.txt
Domain IPs 104.21.23.23, 172.67.208.110, 2606:4700:3033::ac43:d06e, 2606:4700:3036::6815:1717
Response IP 104.21.23.23
Found Yes
Hash ab53e63c6026db9195c8f1eda3d5d6f7c9eb1bdadb7f572cf1ee23382a925180
SimHash 63144b21eab6

Groups

*

Rule Path
Allow /

gptbot
claude-web
anthropic-ai
perplexitybot
googleother
duckassistbot
bard
bard-google
chatgpt-user
cohere-ai
ccbot
facebookbot
omgilibot
omgili
bytespider
claudebot
geminibot
geminicrawler
bingbot
bytedance-spider
baiduspider
youbot
metabot
sogou-spider

Rule Path
Allow /llms.txt
Allow /llms-full.txt
Allow /

Other Records

Field Value
sitemap https://randompokegen.cc/sitemap.xml
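
The groups and sitemap record above can be checked with Python's standard `urllib.robotparser`. The sketch below is only an approximation: the `RECONSTRUCTED` text is assembled from the scan fields, not copied from the site's actual file, and it lists just a few of the named user agents for brevity. The scan does not show the original line order or letter casing, so both are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction from the scan data above; the real file's exact
# layout, casing, and comments are not shown in the report.
RECONSTRUCTED = """\
User-agent: *
Allow: /

User-agent: gptbot
User-agent: claudebot
User-agent: anthropic-ai
User-agent: perplexitybot
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /

Sitemap: https://randompokegen.cc/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RECONSTRUCTED.splitlines())

# A crawler named in the second group may fetch llms.txt.
ai_allowed = parser.can_fetch("gptbot", "https://randompokegen.cc/llms.txt")

# A crawler not named anywhere falls back to the "*" group.
other_allowed = parser.can_fetch("examplebot", "https://randompokegen.cc/any/page")

# site_maps() requires Python 3.8 or later.
sitemaps = parser.site_maps()

print(ai_allowed, other_allowed, sitemaps)
```

Because every group ends in `Allow /`, both checks succeed. The explicit `Allow` lines for `/llms.txt` and `/llms-full.txt` do not change access; they appear to act as a pointer for AI crawlers.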

Comments

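  • robots.txt basic settings
  • General search engine rules
  • Sitemap
  • AI crawler-specific rules - one shared configuration for all AI crawlers
  • Direct AI crawlers to llms.txt - it contains information especially useful to AI crawlers
  • If certain paths need to be restricted in the future, uncomment and modify the lines below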
  • Disallow: /user-content/
  • Disallow: /admin/
  • Disallow: /private/
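
The three `Disallow` lines in the comments are inactive in the scanned file. As a rough sketch of what enabling them might do, the example below puts them in the `*` group of a hypothetical file. Their placement before `Allow: /` is an assumption: Python's `urllib.robotparser` applies the first matching rule in file order, so a `Disallow` placed after `Allow: /` would never take effect in this parser. The `examplebot` agent and the paths under the three directories are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical variant of the "*" group with the commented-out rules enabled.
# Rule order here is an assumption, chosen so first-match parsers honor them.
HYPOTHETICAL = """\
User-agent: *
Disallow: /user-content/
Disallow: /admin/
Disallow: /private/
Allow: /
"""

hypothetical_parser = RobotFileParser()
hypothetical_parser.parse(HYPOTHETICAL.splitlines())

# Illustrative URLs; these paths are not confirmed to exist on the site.
admin_ok = hypothetical_parser.can_fetch(
    "examplebot", "https://randompokegen.cc/admin/settings"
)
home_ok = hypothetical_parser.can_fetch("examplebot", "https://randompokegen.cc/")

print(admin_ok, home_ok)
```

With this ordering the restricted prefixes are blocked and the rest of the site stays open. Crawlers that use longest-match precedence would reach the same result regardless of line order.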