ns.editeur.org
robots.txt

Robots Exclusion Standard data for ns.editeur.org

Resource Scan

Scan Details

Site Domain ns.editeur.org
Base Domain editeur.org
Scan Status Ok
Last Scan 2025-08-31T00:39:23+00:00
Next Scan 2025-09-30T00:39:23+00:00

Last Scan

Scanned 2025-08-31T00:39:23+00:00
URL https://ns.editeur.org/robots.txt
Domain IPs 213.48.238.227
Response IP 213.48.238.227
Found Yes
Hash 1dac9b6083df5709c81a41a1070a0d087a93b0a9224e094b4cef3cc2c15d6f90
SimHash 22379915a7f4

Groups

*

Rule Path
Disallow /

googlebot
bingbot
slurp
duckduckbot
baiduspider
yeti
ia_archiver
applebot
oai-searchbot

Rule Path
Allow /bic_categories/
Disallow /bisac_categories/
Allow /onix/
Disallow /onix36/
Allow /thema/
Disallow /thema10/
Disallow /thema11/
Disallow /thema12/
Disallow /thema13/
Disallow /thema14/
Disallow /thema15/
Disallow /thema16/
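
These path rules open /bic_categories/, /onix/ and /thema/ to the allowlisted search bots while keeping /bisac_categories/ and the versioned /onix36/ and /thema10/ through /thema16/ paths blocked. As a minimal sketch of how a compliant crawler evaluates such a group, the snippet below feeds rules reconstructed from this scan (trimmed to one agent, nothing fetched live) into Python's standard urllib.robotparser. That parser applies the first matching rule in file order, while RFC 9309 crawlers prefer the longest matching path; for these paths the two readings agree.

    from urllib import robotparser

    # A sketch using rules reconstructed from the scan above,
    # trimmed to one search bot; nothing is fetched over the network.
    rp = robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /",
        "",
        "User-agent: googlebot",
        "Allow: /bic_categories/",
        "Disallow: /bisac_categories/",
        "Allow: /onix/",
        "Disallow: /onix36/",
        "Allow: /thema/",
        "Disallow: /thema10/",
    ])

    base = "https://ns.editeur.org"
    print(rp.can_fetch("googlebot", base + "/thema/"))    # True:  allowed by "Allow: /thema/"
    print(rp.can_fetch("googlebot", base + "/thema10/"))  # False: blocked by "Disallow: /thema10/"
    print(rp.can_fetch("googlebot", base + "/onix36/"))   # False: blocked by "Disallow: /onix36/"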

amazonbot

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

ai2bot

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

pangubot

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /

meta-externalfetcher

Rule Path
Disallow /

mistralai-user

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

chatgpt-user/2.0

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /

perplexity-user

Rule Path
Disallow /

bytedance

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

omgili

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

youbot

Rule Path
Disallow /
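
Each of these per-agent blocks stands alone under the Robots Exclusion Protocol: a crawler honours only the group that names it most specifically, and an agent not named anywhere falls through to the * group, which already disallows everything. A short continuation of the sketch above illustrates the fallthrough; "somebot" is a hypothetical crawler that does not appear in the file.

    from urllib import robotparser

    # Each blanket block stands alone; any agent not named anywhere
    # falls through to the * group. "somebot" is hypothetical.
    rp = robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /",
        "",
        "User-agent: gptbot",
        "Disallow: /",
    ])

    print(rp.can_fetch("gptbot", "https://ns.editeur.org/"))   # False: its own group
    print(rp.can_fetch("somebot", "https://ns.editeur.org/"))  # False: the * group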

Other Records

Field Value
sitemap https://ns.editeur.org/sitemap.xml
sitemap https://ns.editeur.org/thema/sitemap.xml
sitemap https://ns.editeur.org/onix/sitemap.xml
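
Sitemap records sit outside any user-agent group, so every crawler sees them regardless of which rules apply to it. Under the same assumptions as the sketches above, Python 3.8+ exposes them through site_maps():

    from urllib import robotparser

    # Parsing the Sitemap records from the scan; nothing fetched live.
    rp = robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /",
        "Sitemap: https://ns.editeur.org/sitemap.xml",
        "Sitemap: https://ns.editeur.org/thema/sitemap.xml",
        "Sitemap: https://ns.editeur.org/onix/sitemap.xml",
    ])

    print(rp.site_maps())
    # ['https://ns.editeur.org/sitemap.xml',
    #  'https://ns.editeur.org/thema/sitemap.xml',
    #  'https://ns.editeur.org/onix/sitemap.xml']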

Comments

  • See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
  • 1. Ban MOST spiders from the entire site:
  • 2. Allow CERTAIN search spiders limited access:
  • User-agent: YandexBot
  • 3. Point to the sitemaps
  • 4. Specifically disallow bots associated with scraping AI training data
  • or acting as an agent on behalf of a real user
  • Amazon
  • Anthropic AI
  • Apple
  • Cohere
  • Common Crawl (Allen Institute)
  • Diff
  • Google Bard
  • Huawei
  • Meta
  • Mistral
  • OpenAI
  • Perplexity AI
  • Bytedance (won't work but shows our intent)
  • Webz.io
  • You.com
  • disallow all the above AI bots
  • contact EDItEUR via info@editeur.org