
Robots Exclusion Standard data for thespacechannel.org

Resource Scan

Scan Details

Site Domain thespacechannel.org
Base Domain thespacechannel.org
Scan Status Ok
Last Scan 2026-02-03T09:12:46+00:00
Next Scan 2026-02-10T09:12:46+00:00

Last Scan

Scanned 2026-02-03T09:12:46+00:00
URL https://thespacechannel.org/robots.txt
Domain IPs 72.52.178.18
Response IP 72.52.178.18
Found Yes
Hash 0ac42b9bb06276b079499abfdbb8c7372f728bf70d08f29173196bf23cbe31a3
SimHash 107d702087c8

Groups

friendlycrawler
orbbot
phpcrawl
nutch
magpie-crawler
heritrix
indy library
go-http-client

Rule Path
Disallow /

ahrefsbot
semrushbot
mj12bot
blexbot
dotbot
zoominfobot
maxpointcrawler
dataforseobot
seoscanners.net

Rule Path
Disallow /

gptbot
claudebot
bytespider
petalbot
barkrowler

Rule Path
Disallow /

yandexbot
seznambot
mail.ru
coccocbot-image

Rule Path
Disallow /

amazonbot
admantx
buzzbot
trendictionbot
rogerbot
zumbot
alphabot
aspiegelbot

Rule Path
Disallow /

duckduckbot

Rule Path
Disallow /

Comments

  • Aggressive crawlers & scrapers
  • SEO & data mining bots
  • AI / LLM crawlers
  • Regional / foreign search engines
  • Social / marketing crawlers
  • Privacy search engines
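
Example

The groups and rules listed above can be checked programmatically. The following is a minimal sketch using Python's standard urllib.robotparser. The ROBOTS_TXT text is a reconstruction, not the scanned file: this report lists groups, rules, and comments separately, so the exact layout, casing, and the pairing of each comment with a group (assumed here to follow listing order) are assumptions. Only two of the six groups are reproduced, and the helper name is_allowed is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt (an assumption): the report does not show the
# original file text, so layout and comment placement may differ.
ROBOTS_TXT = """\
# AI / LLM crawlers
User-agent: gptbot
User-agent: claudebot
User-agent: bytespider
User-agent: petalbot
User-agent: barkrowler
Disallow: /

# Privacy search engines
User-agent: duckduckbot
Disallow: /
"""


def is_allowed(user_agent: str, url: str = "https://thespacechannel.org/") -> bool:
    """Return True if the reconstructed rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "DuckDuckBot", "Googlebot"):
        print(agent, "allowed" if is_allowed(agent) else "disallowed")
```

In this sketch, agents named in a group are disallowed for every path, while an agent that appears in no group (Googlebot here) is allowed, since the report lists no wildcard group. Python's parser compares agent names case-insensitively, which is why "GPTBot" matches the lowercase "gptbot" entry.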