craterian.org
robots.txt

Robots Exclusion Standard data for craterian.org

Resource Scan

Scan Details

Site Domain craterian.org
Base Domain craterian.org
Scan Status Ok
Last Scan 2025-06-04T20:08:37+00:00
Next Scan 2025-07-04T20:08:37+00:00

Last Scan

Scanned 2025-06-04T20:08:37+00:00
URL https://craterian.org/robots.txt
Domain IPs 104.207.159.106
Response IP 104.207.159.106
Found Yes
Hash 5005e4cc1e72fe7af06b93d561a9c4c8cb8862698637a932de839e9372049e62
SimHash 786499198cd0
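
The Hash value is 64 hex digits, which is the length of a SHA-256 digest. Assuming the scanner hashes the raw bytes of the fetched robots.txt with SHA-256 (the report states neither the algorithm nor the exact input), a later copy of the file could be checked against it like this. The helper names `sha256_hex` and `has_changed` are hypothetical.

```python
import hashlib

# Hash value copied from the scan report above.
REPORTED_HASH = "5005e4cc1e72fe7af06b93d561a9c4c8cb8862698637a932de839e9372049e62"


def sha256_hex(body: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of a byte string."""
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, reported: str = REPORTED_HASH) -> bool:
    """True if `body` does not hash to the reported value.

    Assumption: the scanner used SHA-256 over the unmodified response body.
    If it normalizes the text first, this comparison will report false changes.
    """
    return sha256_hex(body) != reported.lower()
```

To use it, fetch the file (for example with `urllib.request.urlopen(url).read()`) and pass the bytes to `has_changed`.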

Groups

agentic
ahrefsbot
ahrefssiteaudit
ai article writer
ai content detector
ai dungeon
ai search engine
ai seo crawler
ai writer
ai21 labs
ai2bot
ai2bot-dolma
aibot
aihitbot
aimatrix
aisearchbot
ai training
aitraining
alexa
alpha ai
alphaai
amazon bedrock
amazon-kendra
amazon lex
amazon comprehend
amazon sagemaker
amazon silk
amazon textract
amazonbot
amelia
anderspinkbot
anthropic
anthropic-ai
anypicker
anyword
applebot
applebot-extended
aria browse
articoolo
automated writer
awariorssbot
awariosmartbot
azure
baidu
bardbot
barkrowler
brave leo
brightbot 1.0
bytedance
bytespider
catboost
cc-crawler
ccbot
chatglm
chatglm-spider
chatgpt-user
chinchilla
claude
claude-web
claudebot
clearscope
coccocbot-web
cohere
cohere-ai
cohere-training-data-crawler
common crawl
commoncrawl
content harmony
content king
content optimizer
content samurai
contentatscale
contentbot
contentedge
conversion ai
copilot
copyai
copymatic
copyscape
cotoyogi
crawlq ai
crawlspace
crew ai
crewai
dall-e
dataforseobot
dataprovider
deepai
deepl
deepmind
deepseek
diffbot
doubao ai
duckassistbot
facebookbot
facebookexternalhit
factset_spyderbot
falcon
firecrawl
firecrawlagent
flyriver
frase ai
friendlycrawler
geedobot
gemma
genai
genspark
glm
googlebot-image
googleextended
google-extended
googleother
googleother-image
googleother-video
goose
gptbot
grammarly
grendizer
grok
gt bot
gtbot
hemingway editor
hugging face
hypotenuse ai
iaskspider
iaskspider/2.0
icc-crawler
imagegen
imagesiftbot
img2dataset
imgproxy
ink editor
inkforall
intelliseek
inferkit
isscyberriskcrawler
jasperai
kafkai
kangaroo
kangaroo bot
keyword density ai
knowledge
komobot
llama
llms
magpie-crawler
marketmuse
meltwater
metaai
meta ai
meta-ai
meta-external
meta-externalagent
meta-externalfetcher
metatagbot
neevabot
mistral
mj12bot
narrative
neural text
neuralseo
nova act
novaact
oai-searchbot
omgili
omgilibot
open ai
openai
openbot
opentext ai
operator
outwrite
page analyzer ai
pangubot
paperlibot
paraphraser.io
perplexity
perplexity-user
perplexitybot
petalbot
phindbot
piplbot
prowritingaid
quillbot
robotspider
rytr
saplingai
scalenut
scraper
scrapy
scriptbook
semrushbot
semrushbot-ba
semrushbot-ct
semrushbot-coub
semrushbot-ocob
semrushbot-si
semrushbot-swa
seo content machine
seo robot
sentibot
serpstatbot
sidetrade
sidetrade indexer bot
simplified ai
siteauditbot
sitefinity
skydancer
slickwrite
sonic
sosospider
spin rewriter
splitsignalbot
spinbot
stability
stablediffusionbot
sudowrite
super agent
surfer ai
text blaze
textcortex
the knowledge ai
tiktokspider
timpibot
velenpublicwebcrawler
vidnami ai
webzio
webzio-extended
whisper
wordai
wordtune
wormsgtp
wpbot
writecream
writerzen
writescope
writesonic
xai
xbot
yandex
youbot
zero gtp
zerochat
zhipu
zimm

Rule Path
Disallow /
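
Every name in the list above shares the single rule `Disallow /`. RFC 9309 has a crawler compare its product token case-insensitively against the user-agent lines, combine the rules of every group that matches, and fall back to the `*` group only when no named group matches. The sketch below applies that selection to a small, hypothetical subset of this file's groups. RFC 9309 limits product tokens to letters, underscores and hyphens, so entries above that contain spaces, dots or slashes (such as `ai article writer` or `iaskspider/2.0`) are unlikely to match any crawler under a strict comparison like this one.

```python
# Hypothetical subset of this file's groups: (user-agent names, rules),
# where each rule is a (directive, path pattern) pair.
GROUPS = [
    (["GPTBot", "ClaudeBot", "CCBot", "Bytespider", "googlebot-image"],
     [("disallow", "/")]),
    (["*"], [("disallow", "/wp-admin/*"), ("allow", "/wp-content/uploads/*")]),
]


def rules_for(product_token, groups=GROUPS):
    """Return the combined rules that apply to a crawler's product token.

    Named groups are matched case-insensitively; all matching groups are
    merged; the '*' group is used only if no named group matched.
    """
    token = product_token.lower()
    matched = [rules for agents, rules in groups
               if token in (a.lower() for a in agents)]
    if not matched:
        matched = [rules for agents, rules in groups if "*" in agents]
    # Flatten the matching groups' rule lists into one list.
    return [rule for rules in matched for rule in rules]
```

For example, `rules_for("GPTBot")` returns the site-wide `Disallow /`, while `rules_for("Googlebot")` falls through to the `*` group because only `googlebot-image`, not plain `googlebot`, appears in the list.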

*

Rule Path
Disallow /wp-admin/*
Disallow /wp-login.php
Disallow /wp-includes/*
Disallow /wp-content/*
Disallow /trackback
Disallow /feed
Disallow */comments
Disallow */comments-page-*
Disallow */trackback
Disallow */feed
Disallow /?s=*
Disallow /search/*
Disallow /readme.html
Disallow /refer/*
Disallow /search?tags%5B%5D=*
Disallow /search?search_type=*
Disallow /search?tags*
Disallow /?eventDate=
Disallow /?post_type=tribe_events
Disallow /?tribe-bar-date=
Disallow /?related-series=
Disallow *post_type%3Dtribe_events*
Disallow *hide_subsequent_recurrences%3D*
Disallow *tribe-bar-date%3D*
Disallow *tribe-venue%3D*
Disallow *eventDisplay%3D*
Disallow *eventDate%3D*
Disallow *paged%3D*
Disallow *pagename%3D*
Disallow *shortcode%3D*
Disallow *ical%3D*
Disallow *outlook-ical%3D*
Disallow *related_series%3D*
Disallow *tribe_geofence%3D*
Disallow /events/*
Disallow /events/list/
Disallow /events/page/
Disallow /events/series/
Disallow /events/summary/
Disallow */calendar/*
Disallow *Any?query=*
Allow /wp-content/cache/*
Allow /wp-content/uploads/*
Allow /wp-includes/js/*
Allow /wp-includes/css/*
Allow /event/*
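
The `*` group mixes broad Disallow patterns with narrower Allow exceptions, such as `/wp-content/*` versus `/wp-content/uploads/*`. Under RFC 9309 the most specific matching rule wins, commonly implemented (as Google documents) as the longest pattern, and Allow wins a tie. The sketch below assumes that reading, treats `*` as "any characters" and a trailing `$` as an end-of-URL anchor, and compares paths case-sensitively. It compares raw strings and does not normalize percent-encoding such as the `%3D` used in several rules above.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' matches any run of characters, a trailing '$' anchors the end of the
    URL, and everything else is matched literally and case-sensitively.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path: str, rules) -> bool:
    """Decide whether `path` may be crawled under (directive, pattern) rules.

    The longest matching pattern wins, Allow beats Disallow on a tie, and a
    path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty value matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# A few rules copied from the '*' group above.
RULES = [
    ("disallow", "/wp-content/*"),
    ("allow", "/wp-content/uploads/*"),
    ("disallow", "*/comments"),
    ("disallow", "/events/*"),
    ("allow", "/event/*"),
]
```

With these rules, an image under `/wp-content/uploads/` stays crawlable because its Allow pattern is longer than the Disallow for `/wp-content/*`, and `/event/...` pages remain open while `/events/...` listings are blocked.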

Other Records

Field Value
sitemap https://craterian.org/sitemap_index.xml

Comments

  • Robots Exclusion Protocol
  • Filename: robots.txt
  • Author: Mark Garrison (markg@projecta.com)
  • Created: 2007/10/16
  • Updated: 2025/5/7 to add additional User-Agents from Ultimate AI Block List v1.4
  • Set directories & files to be Disallow/Allow
  • NOTE: the paths in Disallow/Allow directives are case-sensitive!
  • Use $ to anchor the match to the end of a URL string, e.g. when disallowing or allowing files with a particular extension
  • See http://www.robotstxt.org/wc/norobots.html for full specifications
  • Includes Ultimate AI Block List v1.4 20250417
  • https://perishablepress.com/ultimate-ai-block-list/
  • For WordPress sites consider automatically updating with Dark Visitors plugin
  • https://wordpress.org/plugins/dark-visitors
  • !!!IMPORTANT!!!
  • Validate all changes before using
  • https://www.google.com/search?q=validate+robots.txt+online
  • Standard WordPress Disallows
  • For Events Calendar
  • START YOAST BLOCK
  • END YOAST BLOCK

Warnings

  • 4 invalid lines.
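
The report's Groups, Other Records and Warnings sections correspond to the three kinds of line a robots.txt parser encounters: user-agent/rule lines, standalone records such as `sitemap`, and lines it cannot interpret. The scanner does not say which four lines it counted as invalid, so the criteria in this minimal sketch are assumptions: a line is counted as invalid if, after its `#` comment is removed, it is non-empty but has no `:` separator, or if it is an Allow/Disallow rule that appears before any user-agent line. The function name `parse_robots` is hypothetical.

```python
from collections import defaultdict


def parse_robots(text: str):
    """Split robots.txt text into groups, other records and an invalid-line count.

    Returns (groups, records, invalid) where groups is a list of
    (user_agents, rules) and records maps field names such as 'sitemap'
    to their values. The 'invalid' criteria are assumptions (see above).
    """
    groups = []
    records = defaultdict(list)
    invalid = 0
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            invalid += 1
            continue
        field, value = (s.strip() for s in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a user-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            if not agents:
                invalid += 1  # rule outside any group
            else:
                rules.append((field, value))
        else:
            records[field].append(value)  # e.g. sitemap
    if agents:
        groups.append((agents, rules))
    return groups, dict(records), invalid
```

Splitting on the first `:` only keeps URLs such as the sitemap address intact.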