app.aws.org
robots.txt

Robots Exclusion Standard data for app.aws.org

Resource Scan

Scan Details

Site Domain	app.aws.org
Base Domain	aws.org
Scan Status	Ok
Last Scan	2025-10-31T08:21:28+00:00
Next Scan	2025-11-30T08:21:28+00:00

Last Scan

Scanned	2025-10-31T08:21:28+00:00
URL	https://app.aws.org/robots.txt
Domain IPs	69.164.196.231
Response IP	69.164.196.231
Found	Yes
Hash	7bd2ca180919d5b93c7e43567cc0c492b1267bc95e06ac04295ea3f9c0b11464
SimHash	001eca50a5ca
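
The Hash field is 64 hexadecimal digits, which is the length of a SHA-256 digest. The page does not name the algorithm, so the sketch below assumes the value is the SHA-256 of the raw robots.txt body. Under that assumption, a client could check whether the file has changed since this scan. The names body_digest, changed_since_scan, and fetch_robots are illustrative only.

```python
import hashlib
import urllib.request

# "Hash" value from the last scan. ASSUMPTION: it is the SHA-256 hex digest of
# the raw robots.txt body; 64 hex digits fits SHA-256, but the page does not say.
LAST_SCAN_HASH = "7bd2ca180919d5b93c7e43567cc0c492b1267bc95e06ac04295ea3f9c0b11464"


def body_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt body."""
    return hashlib.sha256(body).hexdigest()


def changed_since_scan(body: bytes, last_hash: str = LAST_SCAN_HASH) -> bool:
    """True if the body no longer matches the recorded digest."""
    return body_digest(body) != last_hash.lower()


def fetch_robots(url: str = "https://app.aws.org/robots.txt") -> bytes:
    """Download the raw robots.txt bytes (hypothetical helper; needs network)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()
```

Under the SHA-256 assumption, changed_since_scan(fetch_robots()) returns False only when the live file is byte-for-byte what was scanned. If the scanner hashes something else, such as a normalized copy, the check will always report a change.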

Groups

gptbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

perplexitybot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

applebot-extended

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

meta-externalagent

Rule	Path
Disallow	/

amazonbot

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

microsoftpreview

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

dataforseobot

Rule	Path
Disallow	/

diffbot

Rule	Path
Disallow	/

ahrefsbot

Rule	Path
Disallow	/

semrushbot

Rule	Path
Disallow	/

dotbot

Rule	Path
Disallow	/

googlebot

Rule	Path
Disallow	(empty; all paths allowed)

bingbot

Rule	Path
Disallow	(empty; all paths allowed)

slurp

Product	Comment
slurp	Yahoo

Rule	Path
Disallow	(empty; all paths allowed)

duckduckbot

Rule	Path
Disallow	(empty; all paths allowed)

baiduspider

Rule	Path
Disallow	(empty; all paths allowed)

yandexbot

Rule	Path
Disallow	(empty; all paths allowed)

*

Rule	Path
Disallow	(empty; all paths allowed)

Comments

robots.txt to block AI/data training crawlers
but allow normal search indexing
Updated September 2025
==== AI/LLM CRAWLERS ====
OpenAI
Anthropic (Claude)
Perplexity AI
Google AI (Gemini training)
Apple AI training
Facebook/Meta AI
Amazon AI
ByteDance/TikTok AI
Microsoft AI (Bing AI/CoPilot previews)
Common meta/AI-related scrapers
==== SEO / SCRAPING BOTS TO BLOCK ====
==== NORMAL SEARCH ENGINES ALLOWED ====
(Do NOT disallow these if you want indexing)
==== CATCH-ALL ====
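
Taken together, the groups give the AI and SEO crawlers "Disallow: /". The named search engines and the catch-all "*" get an empty Disallow, which permits every path. The scan does not show the file's exact layout (comments, blank lines, group order), so the sketch below assumes one group per agent. It rebuilds an equivalent file and uses Python's standard urllib.robotparser to confirm the stated intent. The names BLOCKED, ALLOWED, build_robots_txt, and make_parser are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens copied from the scanned groups. Splitting them into two
# lists is an illustration; the real file's layout is not shown by the scan.
BLOCKED = [
    "gptbot", "chatgpt-user", "claudebot", "perplexitybot",
    "google-extended", "applebot-extended", "facebookbot",
    "meta-externalagent", "amazonbot", "bytespider", "microsoftpreview",
    "ccbot", "dataforseobot", "diffbot", "ahrefsbot", "semrushbot", "dotbot",
]
ALLOWED = [
    "googlebot", "bingbot", "slurp", "duckduckbot",
    "baiduspider", "yandexbot", "*",
]


def build_robots_txt() -> str:
    """Rebuild an equivalent robots.txt with one group per agent (hypothetical layout)."""
    groups = [f"User-agent: {ua}\nDisallow: /" for ua in BLOCKED]
    # An empty Disallow value disallows nothing, i.e. every path is allowed.
    groups += [f"User-agent: {ua}\nDisallow:" for ua in ALLOWED]
    return "\n\n".join(groups) + "\n"


def make_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching it over the network."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    # Record a fetch time; can_fetch() answers False until one is set.
    rp.modified()
    return rp


if __name__ == "__main__":
    rp = make_parser(build_robots_txt())
    for agent in ["GPTBot", "ClaudeBot", "CCBot", "Googlebot", "Bingbot", "SomeOtherBot"]:
        verdict = "allowed" if rp.can_fetch(agent, "https://app.aws.org/") else "blocked"
        print(f"{agent:<13} {verdict}")
```

urllib.robotparser lowercases the part of the agent string before the first "/" and checks whether a group's name occurs in it. Pass a product token such as "GPTBot", not a full User-Agent header like "Mozilla/5.0 (compatible; GPTBot/1.0)", which it would read as "mozilla". Agents not named in any group fall through to the "*" group and are allowed.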
