coffeegreenbeans.com
robots.txt

Robots Exclusion Standard data for coffeegreenbeans.com

Resource Scan

Scan Details

Site Domain coffeegreenbeans.com
Base Domain coffeegreenbeans.com
Scan Status Ok
Last Scan 2026-03-13T02:50:12+00:00
Next Scan 2026-03-27T02:50:12+00:00

Last Scan

Scanned 2026-03-13T02:50:12+00:00
URL https://coffeegreenbeans.com/robots.txt
Domain IPs 23.227.38.66, 2620:127:f00f:6::
Response IP 23.227.38.66
Found Yes
Hash 5ba3527ab4106452f4cc3a90cd462ba26b8a27576184d72e74ebe0c365ff1486
SimHash 7774905b9f9c

Groups

*

Rule Path
Disallow /a/downloads/-/*
Disallow /admin
Disallow /cart
Disallow /orders
Disallow /checkouts/
Disallow /checkout
Disallow /55392960561/checkouts
Disallow /55392960561/orders
Disallow /carts
Disallow /account
Disallow /collections/*sort_by*
Disallow /*/collections/*sort_by*
Disallow /collections/*%2B*
Disallow /collections/*%2B*
Disallow /collections/*%2B*
Disallow /*/collections/*%2B*
Disallow /*/collections/*%2B*
Disallow /*/collections/*%2B*
Disallow */collections/*filter*%26*filter*
Disallow /*?*oseid=*
Disallow /*preview_theme_id*
Disallow /*preview_script_id*
Disallow /policies/
Disallow /*/policies/
Disallow /*/*?*ls=*&ls=*
Disallow /*/*?*ls%3D*%3Fls%3D*
Disallow /*/*?*ls%3D*%3Fls%3D*
Disallow /search
Disallow /apple-app-site-association
Disallow /.well-known/shopify/monorail
Disallow /cdn/wpm/*.js
Disallow /recommendations/products
Disallow /*/recommendations/products
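Many of the rules in this group use robots.txt wildcards: '*' matches any run of characters, and a rule matches as a prefix of the request path. As a rough sketch of how a crawler might evaluate such patterns (the helper names and sample paths are illustrative, not taken from the report):

```python
import re

def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a robots.txt path rule into a regex.
    '*' matches any run of characters; a trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = re.escape(body).replace(r"\*", ".*")
    return re.compile("^" + pattern + ("$" if anchored else ""))

def is_disallowed(path: str, disallow_rules: list) -> bool:
    """True if any Disallow rule matches the start of the path."""
    return any(rule_to_regex(r).match(path) for r in disallow_rules)

# A few rules from the '*' group above; the collection name is made up.
rules = ["/collections/*sort_by*", "/*?*oseid=*", "/cdn/wpm/*.js"]
print(is_disallowed("/collections/ethiopia?sort_by=price-ascending", rules))  # True
print(is_disallowed("/collections/ethiopia", rules))                          # False
```

Note this is a simplification: the full precedence rules of RFC 9309 (longest-match wins between Allow and Disallow) are not modeled here.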

adsbot-google

Rule Path
Disallow /checkouts/
Disallow /checkout
Disallow /carts
Disallow /orders
Disallow /55392960561/checkouts
Disallow /55392960561/orders
Disallow /*?*oseid=*
Disallow /*preview_theme_id*
Disallow /*preview_script_id*
Disallow /cdn/wpm/*.js

nutch

Rule Path
Disallow /

ahrefsbot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10

ahrefssiteaudit

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10

mj12bot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10

pinterest

Rule Path
Disallow /checkouts/*
Disallow /checkouts/cn/*
Disallow /*/*?*filter.p.m.*

Other Records

Field Value
crawl-delay 1
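Several bots above are asked to observe a crawl-delay (10 seconds for the Ahrefs bots and MJ12bot, 1 second for Pinterest). A minimal sketch of how a polite crawler might honor that record, with injectable clock and sleep functions so the timing logic is testable (all names here are illustrative):

```python
import time

def polite_fetch(urls, fetch, crawl_delay=10.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Fetch URLs sequentially, waiting at least `crawl_delay`
    seconds between the starts of consecutive requests."""
    last_start = None
    results = []
    for url in urls:
        if last_start is not None:
            remaining = crawl_delay - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)
        last_start = clock()
        results.append(fetch(url))
    return results
```

A real crawler would read the delay from the user-agent group that matches its own name rather than hard-coding it.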

gptbot

Rule Path
Allow /

google-extended

Rule Path
Allow /

claudebot

Rule Path
Allow /

perplexitybot

Rule Path
Allow /

bingbot

Rule Path
Allow /

ccbot

Rule Path
Allow /

Other Records

Field Value
sitemap https://coffeegreenbeans.com/sitemap.xml
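The sitemap record points crawlers at the site's XML sitemap. A small sketch of extracting the URL entries from such a file (the sample XML and its URLs are made up for illustration; a real crawler would first fetch https://coffeegreenbeans.com/sitemap.xml, which for Shopify stores is typically a sitemap index):

```python
import xml.etree.ElementTree as ET

# The sitemaps.org namespace used by <urlset> documents.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text: str) -> list:
    """Return the <loc> value of every <url> entry in a sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.findall("sm:url/sm:loc", NS)]

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://coffeegreenbeans.com/</loc></url>
  <url><loc>https://coffeegreenbeans.com/collections/all</loc></url>
</urlset>"""

print(sitemap_urls(SAMPLE))
```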

Comments

  • We use Shopify as our e-commerce platform.
  • Disallow common Shopify paths to prevent indexing of transactional pages, admin areas, and duplicate content.
  • Disallow sorting and filtering parameters that create duplicate content.
  • Disallow preview and tracking parameters.
  • Disallow policy pages (often duplicate or low SEO value).
  • Disallow session/language parameters that can cause duplication.
  • Disallow the internal search page (indexing the individual result pages themselves is preferred over search listings).
  • Disallow the apple-app-site-association file and Shopify's Monorail tracking endpoint.
  • Disallow product-recommendation paths, which are often dynamically generated and of low SEO value.
  • Location of the XML Sitemap, essential for crawling and indexing.
  • Specific rules for individual bots
  • Google AdsBot
  • Nutch (open-source crawler, often used for analysis or research)
  • AhrefsBot (Ahrefs' bot for link and site analysis)
  • General User-agent: * directives apply by default.
  • Crawl-delay is maintained to be server-friendly.
  • AhrefsSiteAudit (Ahrefs' bot for site audits)
  • General User-agent: * directives apply by default.
  • Crawl-delay is maintained to be server-friendly.
  • MJ12bot (Majestic SEO's bot for link analysis)
  • Crawl-delay is applied to be server-friendly.
  • Pinterest (for crawling pins and content discovery)
  • Crawl-delay is maintained.
  • The "Disallow: /en-ca*" directive was removed as an e-commerce and blog typically wants Pinterest to index all language versions.
  • Specific checkout exclusions are kept.
  • Rules for Artificial Intelligence (AI) crawlers
  • GPTBot (OpenAI - for training and improving AI models)
  • Allow full access to public content so OpenAI can use it in their models.
  • This is beneficial for the content's visibility in AI-powered tools.
  • Google-Extended (Google - for improving search and training AI models like Bard/Gemini)
  • Allow full access to public content to maximize visibility in the Google ecosystem.
  • ClaudeBot (Anthropic - for training and improving AI models like Claude)
  • Allow full access to public content.
  • PerplexityBot (Perplexity AI - for conversational answers and summaries)
  • Allow full access to public content to be included in Perplexity's responses.
  • Bingbot (Microsoft Copilot / Bing Chat)
  • Although Bingbot is already covered by User-agent: *, it is listed explicitly to ensure it can crawl the site and surface content in Copilot.
  • Ensure Bingbot/Copilot can access all unrestricted content.
  • CCBot (Common Crawl - crawler for research datasets and AI training)
  • Consider carefully: disallow it if large-scale data collection raises privacy or server-resource concerns.
  • If the goal is maximum content dissemination for research or public AI training, it could be allowed.
  • For most e-commerce/blogs, allowing is acceptable if server impact is not problematic.
  • In this case, we will allow it to maximize content visibility for AI projects and analytics.
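The per-bot comments above rest on standard robots.txt precedence: a crawler obeys the group whose user-agent token matches its own name, and falls back to the '*' group otherwise. A simplified sketch of that selection (group contents abbreviated from the report; the matching here is a rough substring heuristic, not the full RFC 9309 algorithm):

```python
def select_group(user_agent: str, groups: dict) -> list:
    """Pick the Disallow list for a crawler: the longest user-agent
    token appearing in the agent string wins; '*' is the fallback."""
    agent = user_agent.lower()
    matches = [token for token in groups if token != "*" and token in agent]
    if matches:
        return groups[max(matches, key=len)]
    return groups.get("*", [])

# Abbreviated groups from the report above.
groups = {
    "*": ["/admin", "/cart", "/search"],
    "gptbot": [],        # Allow / — no Disallow rules
    "nutch": ["/"],      # blocked entirely
}
print(select_group("GPTBot/1.0", groups))    # [] — full access
print(select_group("Nutch-2.4", groups))     # ["/"]
print(select_group("SomeOtherBot", groups))  # falls back to the '*' group
```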