app.aws.org
robots.txt

Robots Exclusion Standard data for app.aws.org

Resource Scan

Scan Details

Site Domain app.aws.org
Base Domain aws.org
Scan Status Ok
Last Scan 2025-10-31T08:21:28+00:00
Next Scan 2025-11-30T08:21:28+00:00

Last Scan

Scanned 2025-10-31T08:21:28+00:00
URL https://app.aws.org/robots.txt
Domain IPs 69.164.196.231
Response IP 69.164.196.231
Found Yes
Hash 7bd2ca180919d5b93c7e43567cc0c492b1267bc95e06ac04295ea3f9c0b11464
SimHash 001eca50a5ca
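
The report does not name the algorithm behind the Hash field. Its 64 hexadecimal digits are consistent with a SHA-256 digest, so, purely as an assumption, a fingerprint of that shape could be computed from the fetched bytes as sketched below. The function name `fingerprint` and the sample body are hypothetical, and no attempt is made to reproduce the SimHash value.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Return the hex SHA-256 digest of the raw robots.txt bytes.

    Assumption: the scanner hashes the response body with SHA-256.
    The report itself does not say which algorithm it uses.
    """
    return hashlib.sha256(body).hexdigest()


# Hypothetical sample body, not the actual file served by app.aws.org.
sample = b"User-agent: GPTBot\nDisallow: /\n"
digest = fingerprint(sample)
print(len(digest))  # 64 hex digits, the same length as the Hash field
```

Comparing such a digest between two scans would only show whether the file changed byte for byte, not what changed.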

Groups

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /

amazonbot

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

microsoftpreview

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

dataforseobot

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

googlebot

Rule Path
Disallow

bingbot

Rule Path
Disallow

slurp

Product Comment
slurp Yahoo
Rule Path
Disallow

duckduckbot

Rule Path
Disallow

baiduspider

Rule Path
Disallow

yandexbot

Rule Path
Disallow

*

Rule Path
Disallow

Comments

  • robots.txt to block AI/data training crawlers
  • but allow normal search indexing
  • Updated September 2025
  • ==== AI/LLM CRAWLERS ====
  • OpenAI
  • Anthropic (Claude)
  • Perplexity AI
  • Google AI (Gemini training)
  • Apple AI training
  • Facebook/Meta AI
  • Amazon AI
  • ByteDance/TikTok AI
  • Microsoft AI (Bing AI/CoPilot previews)
  • Common meta/AI-related scrapers
  • ==== SEO / SCRAPING BOTS TO BLOCK ====
  • ==== NORMAL SEARCH ENGINES ALLOWED ====
  • (Do NOT disallow these if you want indexing)
  • ==== CATCH-ALL ====
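
The groups and comments above describe a policy where AI and SEO crawlers get `Disallow /` while search engines and the catch-all get an empty `Disallow`, which permits everything. A minimal sketch of how that policy evaluates, using Python's standard `urllib.robotparser`, follows. The excerpt is a reconstruction from a few of the groups listed in this report, not the verbatim file; the agent name casing, the helper `build_parser`, and the sample URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt, assuming the groups in the report map back to
# plain User-agent/Disallow lines. Not the verbatim file.
ROBOTS_EXCERPT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: Googlebot
Disallow:

User-agent: *
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)
page = "https://app.aws.org/some/page"  # hypothetical URL path

for agent in ("GPTBot", "ClaudeBot", "AhrefsBot", "Googlebot", "SomeOtherBot"):
    verdict = "allowed" if parser.can_fetch(agent, page) else "blocked"
    print(f"{agent}: {verdict}")
```

Under these assumptions the three crawlers with `Disallow /` are reported as blocked, while Googlebot and any agent that falls through to the `*` group are reported as allowed. Other parsers may match user-agent names differently, so this only illustrates the standard-library behaviour.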