coffeegreenbeans.com
robots.txt

Robots Exclusion Standard data for coffeegreenbeans.com

Resource Scan

Scan Details

Site Domain coffeegreenbeans.com
Base Domain coffeegreenbeans.com
Scan Status Ok
Last Scan 2026-03-13T02:50:12+00:00
Next Scan 2026-03-27T02:50:12+00:00

Last Scan

Scanned 2026-03-13T02:50:12+00:00
URL https://coffeegreenbeans.com/robots.txt
Domain IPs 23.227.38.66, 2620:127:f00f:6::
Response IP 23.227.38.66
Found Yes
Hash 5ba3527ab4106452f4cc3a90cd462ba26b8a27576184d72e74ebe0c365ff1486
SimHash 7774905b9f9c

Groups

*

Rule Path
Disallow /a/downloads/-/*
Disallow /admin
Disallow /cart
Disallow /orders
Disallow /checkouts/
Disallow /checkout
Disallow /55392960561/checkouts
Disallow /55392960561/orders
Disallow /carts
Disallow /account
Disallow /collections/*sort_by*
Disallow /*/collections/*sort_by*
Disallow /collections/*%2B*
Disallow /collections/*%2B*
Disallow /collections/*%2B*
Disallow /*/collections/*%2B*
Disallow /*/collections/*%2B*
Disallow /*/collections/*%2B*
Disallow */collections/*filter*%26*filter*
Disallow /*?*oseid=*
Disallow /*preview_theme_id*
Disallow /*preview_script_id*
Disallow /policies/
Disallow /*/policies/
Disallow /*/*?*ls=*&ls=*
Disallow /*/*?*ls%3D*%3Fls%3D*
Disallow /*/*?*ls%3D*%3Fls%3D*
Disallow /search
Disallow /apple-app-site-association
Disallow /.well-known/shopify/monorail
Disallow /cdn/wpm/*.js
Disallow /recommendations/products
Disallow /*/recommendations/products
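Many of the rules in this group use robots.txt wildcards: '*' matches any run of characters, and a rule matches as a prefix of the request path. As a rough sketch of how a crawler might evaluate such patterns (the helper names and sample paths are illustrative, not taken from the report):

```python
import re

def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a robots.txt path rule into a regex.
    '*' matches any run of characters; a trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = re.escape(body).replace(r"\*", ".*")
    return re.compile("^" + pattern + ("$" if anchored else ""))

def is_disallowed(path: str, disallow_rules: list) -> bool:
    """True if any Disallow rule matches the start of the path."""
    return any(rule_to_regex(r).match(path) for r in disallow_rules)

# A few rules from the '*' group above; the collection name is made up.
rules = ["/collections/*sort_by*", "/*?*oseid=*", "/cdn/wpm/*.js"]
print(is_disallowed("/collections/ethiopia?sort_by=price-ascending", rules))  # True
print(is_disallowed("/collections/ethiopia", rules))                          # False
```

Note this is a simplification: the full precedence rules of RFC 9309 (longest-match wins between Allow and Disallow) are not modeled here.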

adsbot-google

Rule Path
Disallow /checkouts/
Disallow /checkout
Disallow /carts
Disallow /orders
Disallow /55392960561/checkouts
Disallow /55392960561/orders
Disallow /*?*oseid=*
Disallow /*preview_theme_id*
Disallow /*preview_script_id*
Disallow /cdn/wpm/*.js

nutch

Rule Path
Disallow /

ahrefsbot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10

ahrefssiteaudit

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10

mj12bot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10

pinterest

Rule Path
Disallow /checkouts/*
Disallow /checkouts/cn/*
Disallow /*/*?*filter.p.m.*

Other Records

Field Value
crawl-delay 1
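Several bots above are asked to observe a crawl-delay (10 seconds for the Ahrefs bots and MJ12bot, 1 second for Pinterest). A minimal sketch of how a polite crawler might honor that record, with injectable clock and sleep functions so the timing logic is testable (all names here are illustrative):

```python
import time

def polite_fetch(urls, fetch, crawl_delay=10.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Fetch URLs sequentially, waiting at least `crawl_delay`
    seconds between the starts of consecutive requests."""
    last_start = None
    results = []
    for url in urls:
        if last_start is not None:
            remaining = crawl_delay - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)
        last_start = clock()
        results.append(fetch(url))
    return results
```

A real crawler would read the delay from the user-agent group that matches its own name rather than hard-coding it.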

gptbot

Rule Path
Allow /

google-extended

Rule Path
Allow /

claudebot

Rule Path
Allow /

perplexitybot

Rule Path
Allow /

bingbot

Rule Path
Allow /

ccbot

Rule Path
Allow /

Other Records

Field Value
sitemap https://coffeegreenbeans.com/sitemap.xml
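The sitemap record points crawlers at the site's XML sitemap. A small sketch of extracting the URL entries from such a file (the sample XML and its URLs are made up for illustration; a real crawler would first fetch https://coffeegreenbeans.com/sitemap.xml, which for Shopify stores is typically a sitemap index):

```python
import xml.etree.ElementTree as ET

# The sitemaps.org namespace used by <urlset> documents.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text: str) -> list:
    """Return the <loc> value of every <url> entry in a sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.findall("sm:url/sm:loc", NS)]

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://coffeegreenbeans.com/</loc></url>
  <url><loc>https://coffeegreenbeans.com/collections/all</loc></url>
</urlset>"""

print(sitemap_urls(SAMPLE))
```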

Comments

  • We use Shopify as our e-commerce platform.
  • Disallow common Shopify paths to prevent indexing of transactional pages, admin areas, and duplicate content.
  • Disallow sorting and filtering parameters that create duplicate content.
  • Disallow preview and tracking parameters.
  • Disallow policy pages (often duplicate or low SEO value).
  • Disallow session/language parameters that can cause duplication.
  • Disallow the internal search page (indexing the individual result pages themselves is preferred over search listings).
  • Disallow the apple-app-site-association file and Shopify's Monorail tracking endpoint.
  • Disallow product-recommendation paths, which are often dynamically generated and of low SEO value.
  • Location of the XML Sitemap, essential for crawling and indexing.
  • Specific rules for individual bots
  • Google AdsBot
  • Nutch (open-source crawler, often used for analysis or research)
  • AhrefsBot (Ahrefs' bot for link and site analysis)
  • General User-agent: * directives apply by default.
  • Crawl-delay is maintained to be server-friendly.
  • AhrefsSiteAudit (Ahrefs' bot for site audits)
  • General User-agent: * directives apply by default.
  • Crawl-delay is maintained to be server-friendly.
  • MJ12bot (Majestic SEO's bot for link analysis)
  • Crawl-delay is applied to be server-friendly.
  • Pinterest (for crawling pins and content discovery)
  • Crawl-delay is maintained.
  • The "Disallow: /en-ca*" directive was removed as an e-commerce and blog typically wants Pinterest to index all language versions.
  • Specific checkout exclusions are kept.
  • Rules for Artificial Intelligence (AI) crawlers
  • GPTBot (OpenAI - for training and improving AI models)
  • Allow full access to public content so OpenAI can use it in their models.
  • This is beneficial for the content's visibility in AI-powered tools.
  • Google-Extended (Google - for improving search and training AI models like Bard/Gemini)
  • Allow full access to public content to maximize visibility in the Google ecosystem.
  • ClaudeBot (Anthropic - for training and improving AI models like Claude)
  • Allow full access to public content.
  • PerplexityBot (Perplexity AI - for conversational answers and summaries)
  • Allow full access to public content to be included in Perplexity's responses.
  • Bingbot (Microsoft Copilot / Bing Chat)
  • Although Bingbot is already covered by User-agent: *, it is listed explicitly to ensure it can crawl the site and surface content in Copilot.
  • Ensure Bingbot/Copilot can access all unrestricted content.
  • CCBot (Common Crawl - crawler for research datasets and AI training)
  • Consider carefully: disallow it if large-scale data collection raises privacy or server-resource concerns.
  • If the goal is maximum content dissemination for research or public AI training, it could be allowed.
  • For most e-commerce/blogs, allowing is acceptable if server impact is not problematic.
  • In this case, we will allow it to maximize content visibility for AI projects and analytics.
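The per-bot comments above rest on standard robots.txt precedence: a crawler obeys the group whose user-agent token matches its own name, and falls back to the '*' group otherwise. A simplified sketch of that selection (group contents abbreviated from the report; the matching here is a rough substring heuristic, not the full RFC 9309 algorithm):

```python
def select_group(user_agent: str, groups: dict) -> list:
    """Pick the Disallow list for a crawler: the longest user-agent
    token appearing in the agent string wins; '*' is the fallback."""
    agent = user_agent.lower()
    matches = [token for token in groups if token != "*" and token in agent]
    if matches:
        return groups[max(matches, key=len)]
    return groups.get("*", [])

# Abbreviated groups from the report above.
groups = {
    "*": ["/admin", "/cart", "/search"],
    "gptbot": [],        # Allow / — no Disallow rules
    "nutch": ["/"],      # blocked entirely
}
print(select_group("GPTBot/1.0", groups))    # [] — full access
print(select_group("Nutch-2.4", groups))     # ["/"]
print(select_group("SomeOtherBot", groups))  # falls back to the '*' group
```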