pretlak.com
robots.txt

Robots Exclusion Standard data for pretlak.com

Resource Scan

Scan Details

Site Domain pretlak.com
Base Domain pretlak.com
Scan Status Ok
Last Scan 2025-11-20T19:45:36+00:00
Next Scan 2025-12-20T19:45:36+00:00

Last Scan

Scanned 2025-11-20T19:45:36+00:00
URL https://pretlak.com/robots.txt
Domain IPs 104.21.3.170, 172.67.130.247, 2606:4700:3031::ac43:82f7, 2606:4700:3034::6815:3aa
Response IP 172.67.130.247
Found Yes
Hash faac7bf63d2d27c046d59b9232888d56b52a195e02b1efda5814e81ad8855568
SimHash 888c98530c72
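
The Hash value above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest, although the report does not name the algorithm. As a minimal sketch under that assumption, a fetched copy of the file could be compared against the recorded value as follows. The function names `sha256_hex` and `matches_recorded_hash` are hypothetical, and the scanner may normalise the body before hashing, so a mismatch would not by itself prove the file has changed.

```python
import hashlib

# Value copied from the "Hash" field of the scan report.
RECORDED_HASH = "faac7bf63d2d27c046d59b9232888d56b52a195e02b1efda5814e81ad8855568"


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of the raw response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def matches_recorded_hash(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """Compare a body's digest with the recorded one (assumes SHA-256)."""
    return sha256_hex(body) == recorded.lower()


if __name__ == "__main__":
    # Placeholder bytes; in practice this would be the fetched robots.txt body.
    sample = b"User-agent: *\nDisallow:\n"
    print(sha256_hex(sample))
    print(matches_recorded_hash(sample))
```

The SimHash field is a separate similarity fingerprint and is not covered by this sketch.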

Groups

*

Rule Path
Disallow (empty path)

gptbot

Rule Path
Allow /

google-extended

Rule Path
Allow /

microsoft-extended

Rule Path
Allow /

claudebot

Rule Path
Allow /

perplexitybot

Rule Path
Allow /

applebot-extended

Rule Path
Allow /

youbot

Rule Path
Allow /

bytespider

Rule Path
Disallow /

omgili

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /
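
To illustrate how the groups above would be evaluated, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt text is a hypothetical reconstruction from the listed groups; the real file's ordering, capitalisation, and comments are not shown in this report. The helper names `build_robots_txt` and `can_fetch` are likewise assumptions made for the example. An empty Disallow in the `*` group blocks nothing, so agents without a group of their own are allowed.

```python
from urllib.robotparser import RobotFileParser

# Agent names taken from the groups in the report (capitalisation from the
# Comments section). Grouping into two lists is only for the reconstruction.
ALLOWED = ["GPTBot", "Google-Extended", "Microsoft-Extended", "ClaudeBot",
           "PerplexityBot", "Applebot-Extended", "YouBot"]
BLOCKED = ["Bytespider", "Omgili", "AhrefsBot", "CCBot", "Meta-ExternalAgent"]


def build_robots_txt() -> str:
    """Rebuild an approximate robots.txt from the scanned groups (assumed layout)."""
    lines = ["User-agent: *", "Disallow:", ""]
    for agent in ALLOWED:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    for agent in BLOCKED:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    lines.append("Sitemap: https://pretlak.com/sitemap.xml")
    return "\n".join(lines)


# Parse the reconstructed text directly; no network request is made.
parser = RobotFileParser()
parser.parse(build_robots_txt().splitlines())


def can_fetch(agent: str, url: str = "https://pretlak.com/") -> bool:
    """Return True if the reconstructed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ALLOWED + BLOCKED + ["SomeOtherBot"]:
        print(f"{agent}: {'allowed' if can_fetch(agent) else 'blocked'}")
```

Other parsers may resolve overlapping groups differently from `urllib.robotparser`, so this is a rough check, not a statement of how any particular crawler behaves.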

Other Records

Field Value
sitemap https://pretlak.com/sitemap.xml

Comments

  • AI JSON sitemap for LLMs
  • AI-Sitemap: https://pretlak.com/sitemap-ai.json
  • AI crawler guidance (see also /ai.txt)
  • GPTBot (OpenAI)
  • Google-Extended (Google)
  • Microsoft-Extended (Microsoft)
  • ClaudeBot (Anthropic)
  • PerplexityBot (Perplexity)
  • Applebot-Extended (Apple)
  • YouBot (You.com)
  • Disallow selected bots
  • Bytespider (ByteDance)
  • Omgili (webhose.io)
  • AhrefsBot (Ahrefs)
  • Disallow high-volume training/bulk crawlers
  • CCBot (Common Crawl)
  • Meta-ExternalAgent (Meta)