ns.editeur.org
robots.txt

Robots Exclusion Standard data for ns.editeur.org

Resource Scan

Scan Details

Site Domain ns.editeur.org
Base Domain editeur.org
Scan Status Ok
Last Scan 2025-08-31T00:39:23+00:00
Next Scan 2025-09-30T00:39:23+00:00

Last Scan

Scanned 2025-08-31T00:39:23+00:00
URL https://ns.editeur.org/robots.txt
Domain IPs 213.48.238.227
Response IP 213.48.238.227
Found Yes
Hash 1dac9b6083df5709c81a41a1070a0d087a93b0a9224e094b4cef3cc2c15d6f90
SimHash 22379915a7f4

Groups

*

Rule Path
Disallow /

googlebot
bingbot
slurp
duckduckbot
baiduspider
yeti
ia_archiver
applebot
oai-searchbot

Rule Path
Allow /bic_categories/
Disallow /bisac_categories/
Allow /onix/
Disallow /onix36/
Allow /thema/
Disallow /thema10/
Disallow /thema11/
Disallow /thema12/
Disallow /thema13/
Disallow /thema14/
Disallow /thema15/
Disallow /thema16/
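
These path rules open /bic_categories/, /onix/ and /thema/ to the allowlisted search bots while keeping /bisac_categories/ and the versioned /onix36/ and /thema10/ through /thema16/ paths blocked. As a minimal sketch of how a compliant crawler evaluates such a group, the snippet below feeds rules reconstructed from this scan (trimmed to one agent, nothing fetched live) into Python's standard urllib.robotparser. That parser applies the first matching rule in file order, while RFC 9309 crawlers prefer the longest matching path; for these paths the two readings agree.

    from urllib import robotparser

    # A sketch using rules reconstructed from the scan above,
    # trimmed to one search bot; nothing is fetched over the network.
    rp = robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /",
        "",
        "User-agent: googlebot",
        "Allow: /bic_categories/",
        "Disallow: /bisac_categories/",
        "Allow: /onix/",
        "Disallow: /onix36/",
        "Allow: /thema/",
        "Disallow: /thema10/",
    ])

    base = "https://ns.editeur.org"
    print(rp.can_fetch("googlebot", base + "/thema/"))    # True:  allowed by "Allow: /thema/"
    print(rp.can_fetch("googlebot", base + "/thema10/"))  # False: blocked by "Disallow: /thema10/"
    print(rp.can_fetch("googlebot", base + "/onix36/"))   # False: blocked by "Disallow: /onix36/"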

amazonbot

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

ai2bot

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

pangubot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /

meta-externalfetcher

Rule Path
Disallow /

mistralai-user

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

chatgpt-user/2.0

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

perplexity-user

Rule Path
Disallow /

bytedance

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

omgili

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

youbot

Rule Path
Disallow /
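
Each of these per-agent blocks stands alone under the Robots Exclusion Protocol: a crawler honours only the group that names it most specifically, and an agent not named anywhere falls through to the * group, which already disallows everything. A short continuation of the sketch above illustrates the fallthrough; "somebot" is a hypothetical crawler that does not appear in the file.

    from urllib import robotparser

    # Each blanket block stands alone; any agent not named anywhere
    # falls through to the * group. "somebot" is hypothetical.
    rp = robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /",
        "",
        "User-agent: gptbot",
        "Disallow: /",
    ])

    print(rp.can_fetch("gptbot", "https://ns.editeur.org/"))   # False: its own group
    print(rp.can_fetch("somebot", "https://ns.editeur.org/"))  # False: the * group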

Other Records

Field Value
sitemap https://ns.editeur.org/sitemap.xml
sitemap https://ns.editeur.org/thema/sitemap.xml
sitemap https://ns.editeur.org/onix/sitemap.xml
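
Sitemap records sit outside any user-agent group, so every crawler sees them regardless of which rules apply to it. Under the same assumptions as the sketches above, Python 3.8+ exposes them through site_maps():

    from urllib import robotparser

    # Parsing the Sitemap records from the scan; nothing fetched live.
    rp = robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /",
        "Sitemap: https://ns.editeur.org/sitemap.xml",
        "Sitemap: https://ns.editeur.org/thema/sitemap.xml",
        "Sitemap: https://ns.editeur.org/onix/sitemap.xml",
    ])

    print(rp.site_maps())
    # ['https://ns.editeur.org/sitemap.xml',
    #  'https://ns.editeur.org/thema/sitemap.xml',
    #  'https://ns.editeur.org/onix/sitemap.xml']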

Comments

  • See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
  • 1. Ban MOST spiders from the entire site:
  • 2. Allow CERTAIN search spiders limited access:
  • User-agent: YandexBot
  • 3. Point to the sitemaps
  • 4. Specifically disallow bots associated with scraping AI training data
  • or acting as an agent on behalf of a real user
  • Amazon
  • Anthropic AI
  • Apple
  • Cohere
  • Common Crawl (Allen Institute)
  • Diff
  • Google Bard
  • Huawei
  • Meta
  • Mistral
  • OpenAI
  • Perplexity AI
  • Bytedance (won't work but shows our intent)
  • Webz.io
  • You.com
  • disallow all the above AI bots
  • contact EDItEUR via info@editeur.org