conservationevidence.com
robots.txt

Robots Exclusion Standard data for conservationevidence.com

Resource Scan

Scan Details

Site Domain	conservationevidence.com
Base Domain	conservationevidence.com
Scan Status	Ok
Last Scan	2026-02-19T23:31:53+00:00
Next Scan	2026-03-21T23:31:53+00:00

Last Scan

Scanned	2026-02-19T23:31:53+00:00
URL	https://conservationevidence.com/robots.txt
Domain IPs	104.26.8.213, 104.26.9.213, 172.67.72.167, 2606:4700:20::681a:8d5, 2606:4700:20::681a:9d5, 2606:4700:20::ac43:48a7
Response IP	172.67.72.167
Found	Yes
Hash	89208bcb1a73645cc46c9f62c7fd45e861e1b8695d496b89c56542f0625c51a0
SimHash	c634c251c7d5
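
The `Hash` value above is 64 hexadecimal characters, which is consistent with a SHA-256 digest, although the scan does not name the algorithm. Assuming it is SHA-256 over the raw response body, a change check between scans might be sketched as follows. The helper names `content_hash` and `has_changed` are hypothetical and not part of the scanner.

```python
import hashlib

# Digest reported by the scan of 2026-02-19 (copied from the table above).
RECORDED_HASH = "89208bcb1a73645cc46c9f62c7fd45e861e1b8695d496b89c56542f0625c51a0"


def content_hash(body: bytes) -> str:
    """Return the SHA-256 hex digest of a robots.txt response body.

    Assumption: the scanner hashes the raw bytes with SHA-256; the
    report itself does not say which algorithm it uses.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """Report whether a freshly fetched body differs from the recorded digest."""
    return content_hash(body) != recorded.lower()
```

If the assumption holds, fetching `https://conservationevidence.com/robots.txt` and passing the bytes to `has_changed` would indicate whether the file was edited since the last scan.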

Groups

*

Rule	Path
Allow	/

amazonbot

Rule	Path
Disallow	/

applebot-extended

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

meta-externalagent

Rule	Path
Disallow	/

*

Rule	Path
Disallow

Other Records

Field	Value
crawl-delay	30

googlebot

Rule	Path
Disallow

Other Records

Field	Value
crawl-delay	30

bingbot

Rule	Path
Disallow

Other Records

Field	Value
crawl-delay	30

blexbot

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

oai-searchbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

claude-web

Rule	Path
Disallow	/

claude-user

Rule	Path
Disallow	/

claude-searchbot

Rule	Path
Disallow	/

perplexitybot

Rule	Path
Disallow	/

perplexity-user

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

applebot-extended

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

amazonbot

Rule	Path
Disallow	/
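
The groups listed above can be checked programmatically. The following is a minimal sketch using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` string is a hypothetical, simplified reconstruction of a few groups from this scan, not the live file, and `is_allowed` is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, simplified reconstruction of a few groups from the scan.
# The live robots.txt contains more groups and comment text.
ROBOTS_TXT = """\
User-agent: *
Disallow:
Crawl-delay: 30

User-agent: Googlebot
Disallow:
Crawl-delay: 30

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(agent: str, url: str = "https://conservationevidence.com/") -> bool:
    """Return True if the parsed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "GPTBot", "ClaudeBot", "SomeOtherBot"):
        print(agent, is_allowed(agent), parser.crawl_delay(agent))
```

An empty `Disallow` path, as shown for `*`, `googlebot`, and `bingbot`, places no restriction on crawling, while `Disallow /` blocks the whole site for that user agent. In this sketch `crawl_delay` returns the `crawl-delay` value for groups that declare one and `None` for groups that do not.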

Comments

As a condition of accessing this website, you agree to abide by the following
content signals:
(a) If a Content-Signal = yes, you may collect content for the corresponding
use.
(b) If a Content-Signal = no, you may not collect content for the
corresponding use.
(c) If the website operator does not include a Content-Signal for a
corresponding use, the website operator neither grants nor restricts
permission via Content-Signal with respect to the corresponding use.
The content signals and their meanings are:
search: building a search index and providing search results (e.g., returning
hyperlinks and short excerpts from your website's contents). Search does not
include providing AI-generated search summaries.
ai-input: inputting content into one or more AI models (e.g., retrieval
augmented generation, grounding, or other real-time taking of content for
generative AI search answers).
ai-train: training or fine-tuning AI models.
ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF
RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.
BEGIN Cloudflare Managed content
END Cloudflare Managed Content
.__________________________.
| .___________________. |==|
| | ................. | | |
| | ::[ Dear robot ]: | | |
| | ::::[ be nice ]:: | | |
| | ::::::::::::::::: | | |
| | ::::::::::::::::: | | |
| | ::::::::::::::::: | | |
| | ::::::::::::::::: | | ,|
| !___________________! |(c|
!_______________________!__!
/ \
/ [][][][][][][][][][][][][] \
/ [][][][][][][][][][][][][][] \
( [][][][][____________][][][][] )
\ ------------------------------ /
\______________________________/
Last updated: 2025-12-12 by Ibrahim Alhas.
--------------------------------------------------------------------
Cloudflare / Content Signals policy (human-readable explanation)
--------------------------------------------------------------------
As a condition of accessing this website, you agree to abide by the
following content signals:
(a) If a content-signal = yes, you may collect content for the
corresponding use.
(b) If a content-signal = no, you may not collect content for the
corresponding use.
(c) If the website operator does not include a content signal for a
corresponding use, the website operator neither grants nor
restricts permission via content signal with respect to that use.
The content signals and their meanings are:
search: building a search index and providing search results
(e.g., returning hyperlinks and short excerpts from the
website's contents). Search does not include providing
AI-generated search summaries.
ai-input: inputting content into one or more AI models (e.g.,
retrieval augmented generation, grounding, or other
real-time use of content for generative AI search answers).
ai-train: training or fine-tuning AI models.
ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS
RESERVATIONS OF RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION
DIRECTIVE 2019/790 ON COPYRIGHT AND RELATED RIGHTS IN THE DIGITAL
SINGLE MARKET.
--------------------------------------------------------------------
1. Default rules – allow normal crawling & search indexing
--------------------------------------------------------------------
We allow standard search indexing, but do NOT permit use of our
content for AI input or AI training.
Explicitly restate for Google web search; AI training is controlled
separately via Google-Extended below.
Explicitly restate for Bing web search.
Aggressive generic crawler we wish to block entirely.
--------------------------------------------------------------------
2. AI / LLM-specific crawlers – blocked
--------------------------------------------------------------------
These user-agents are commonly associated with AI training or AI
search services. We do not permit crawling or reuse of our content
by these bots.
OpenAI
Anthropic (Claude)
Perplexity
Google AI training (separate from standard search indexing)
CommonCrawl (widely used in AI training corpora)
Apple AI training
Meta / Facebook
ByteDance
Amazon
--------------------------------------------------------------------
3. Notes
--------------------------------------------------------------------
- Standard search engines that respect robots.txt (Googlebot,
Bingbot, etc.) are allowed to crawl under the default rules.
- Content-Signal values indicate that traditional search indexing
is permitted, but AI input and AI training uses are not.
- robots.txt is an advisory mechanism: compliant crawlers will
respect it; hostile or disguised scrapers may ignore it and
must be handled via other measures (e.g. Cloudflare Bot
Management, WAF, rate limiting).
- Additional AI/LLM crawlers can be added to the blocked list
as the ecosystem evolves.

Warnings

`content-signal` is not a known field.
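
The scanner does not recognise `content-signal`, but the comments quoted above define its semantics: `yes` grants a use, `no` reserves it, and an absent signal neither grants nor restricts. A minimal sketch of reading such a value follows. The example string `search=yes, ai-input=no, ai-train=no` is an assumption inferred from the Notes section, since the scan does not show the actual directive line, and the function names are hypothetical.

```python
from typing import Optional


def parse_content_signal(value: str) -> dict[str, bool]:
    """Parse a value such as 'search=yes, ai-input=no' into a dict.

    Entries that are not of the form key=yes or key=no are skipped.
    """
    signals: dict[str, bool] = {}
    for part in value.split(","):
        key, sep, flag = part.strip().partition("=")
        flag = flag.strip().lower()
        if sep and key.strip() and flag in ("yes", "no"):
            signals[key.strip().lower()] = flag == "yes"
    return signals


def may_collect(signals: dict[str, bool], use: str) -> Optional[bool]:
    """Return True or False for a declared use, or None when the operator
    neither grants nor restricts it (clause (c) of the policy text)."""
    return signals.get(use)


if __name__ == "__main__":
    # Assumed value; the real directive line is not shown in the scan.
    declared = parse_content_signal("search=yes, ai-input=no, ai-train=no")
    for use in ("search", "ai-input", "ai-train"):
        print(use, may_collect(declared, use))
```

Returning `None` for an undeclared use keeps the three policy outcomes distinct rather than collapsing "not stated" into either permission or refusal.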
