hyperborea.org
robots.txt

Robots Exclusion Standard data for hyperborea.org

Resource Scan

Scan Details

Site Domain hyperborea.org
Base Domain hyperborea.org
Scan Status Ok
Last Scan 2025-06-25T14:15:35+00:00
Next Scan 2025-06-26T14:15:35+00:00

Last Scan

Scanned 2025-06-25T14:15:35+00:00
URL https://hyperborea.org/robots.txt
Domain IPs 67.205.23.249
Response IP 67.205.23.249
Found Yes
Hash da93f1640b3c27b0dedf812b33ed41f2f80339204bf3966331ed98b6c011630c
SimHash ac9c4b11c6d0
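
The Hash value above is 64 hexadecimal characters long, which matches the length of a SHA-256 digest. The report does not say which algorithm the scanner uses, so the following is only a sketch, assuming SHA-256 over the raw response body, of how such a change-detection fingerprint might be computed. The function name `robots_fingerprint` and the sample body are hypothetical; the SimHash value is not reproduced here because its parameters are not stated.

```python
import hashlib


def robots_fingerprint(body: bytes) -> str:
    """Return the hex SHA-256 digest of a robots.txt body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()


# Hypothetical body for illustration, not the real hyperborea.org file.
sample = b"User-agent: *\nDisallow: /temp/\n"
digest = robots_fingerprint(sample)

# Any change to the body, even a single byte, yields a different digest.
changed = robots_fingerprint(sample + b"# edited\n")
print(len(digest), digest != changed)
```

Comparing the digest from one scan with the digest from the next is enough to tell whether the file changed, though not what changed.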

Groups

lmspider

Rule Path
Disallow /

becomebot

Rule Path
Disallow /

ai2bot
ai2bot-dolma
aihitbot
amazonbot
andibot
anthropic-ai
applebot
applebot-extended
bedrockbot
brightbot 1.0
bytespider
ccbot
chatgpt-user
claude-searchbot
claude-user
claude-web
claudebot
cohere-ai
cohere-training-data-crawler
cotoyogi
crawlspace
diffbot
duckassistbot
echoboxbot
facebookbot
factset_spyderbot
firecrawlagent
friendlycrawler
google-cloudvertexbot
google-extended
googleother
googleother-image
googleother-video
gptbot
iaskspider/2.0
icc-crawler
imagesiftbot
img2dataset
isscyberriskcrawler
kangaroo bot
meta-externalagent
meta-externalfetcher
mistralai-user/1.0
mycentralaiscraperbot
novaact
oai-searchbot
omgili
omgilibot
operator
pangubot
panscient
panscient.com
perplexity-user
perplexitybot
petalbot
phindbot
poseidon research crawler
qualifiedbot
quillbot
quillbot.com
sbintuitionsbot
scrapy
semrushbot
semrushbot-ba
semrushbot-ct
semrushbot-ocob
semrushbot-si
semrushbot-swa
sidetrade indexer bot
tiktokspider
timpibot
velenpublicwebcrawler
webzio-extended
wpbot
yandexadditional
yandexadditionalbot
youbot

Rule Path
Disallow /journal
Disallow /writing
Disallow /temp/
Disallow /stuff/
Disallow /utils/
Disallow /usage/
Disallow /cgi-bin/
Disallow /selling/
Disallow /ebay/
Disallow /flash/bigimage.php
Disallow /flash/image.php
Disallow /flash/drzoom.cgi
Disallow /flash/drzoom.php
Disallow /journal/archives
Disallow /journal/wp-comments-popup.php
Disallow /journal/wp-commentsrss2.php
Disallow /journal/wp-trackback.php
Disallow /journal/wp-login.php
Disallow /journal/wp-mobile.php
Disallow /mirror/
Disallow /latest.html

*

Rule Path
Disallow /temp/
Disallow /stuff/
Disallow /utils/
Disallow /usage/
Disallow /cgi-bin/
Disallow /selling/
Disallow /ebay/
Disallow /flash/bigimage.php
Disallow /flash/image.php
Disallow /flash/drzoom.cgi
Disallow /flash/drzoom.php
Disallow /journal/archives
Disallow /journal/wp-comments-popup.php
Disallow /journal/wp-commentsrss2.php
Disallow /journal/wp-trackback.php
Disallow /journal/wp-login.php
Disallow /journal/wp-mobile.php
Disallow /mirror/
Disallow /latest.html
Disallow /journal/b2login
Disallow /journal/b2comments
Disallow /flash/%28
Disallow /flash/0
Disallow /flash/1
Disallow /flash/2
Disallow /flash/3
Disallow /flash/4
Disallow /flash/5
Disallow /flash/6
Disallow /flash/7
Disallow /flash/8
Disallow /flash/9
Disallow /humor/comiccon2004.phtml/
Disallow /journal/index.php?social_controller
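
The groups above can be exercised with Python's standard `urllib.robotparser`. This is a rough sketch, not the site's actual file: `ROBOTS_TXT` is an abbreviated reconstruction from the report (only two of the listed agents and a handful of rules), and `examplebot` and the helper `allowed` are hypothetical names. It illustrates two points: paths are matched by prefix, and a crawler that matches a named group is evaluated against that group only, not against the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated reconstruction of two groups from the report (assumption:
# the real file uses standard "User-agent:" / "Disallow:" syntax).
ROBOTS_TXT = """\
User-agent: gptbot
User-agent: ccbot
Disallow: /journal
Disallow: /writing
Disallow: /temp/

User-agent: *
Disallow: /temp/
Disallow: /journal/archives
Disallow: /flash/0
"""

BASE = "https://hyperborea.org"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Hypothetical helper: may `agent` fetch `path` under these rules?"""
    return parser.can_fetch(agent, BASE + path)


# A listed AI crawler is kept out of the whole /journal tree.
print(allowed("gptbot", "/journal/"))
# An unlisted crawler falls through to the * group, which only blocks
# specific paths under /journal.
print(allowed("examplebot", "/journal/"))
print(allowed("examplebot", "/journal/archives"))
# Prefix matching: "/flash/0" covers any path that starts with it.
print(allowed("examplebot", "/flash/0123.html"))
```

Note that in this sketch `gptbot` is not subject to the `/flash/0` rule, because that rule appears only in the `*` group; the same holds in the report, where the numeric `/flash/` rules are listed only under `*`.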

Other Records

Field Value
sitemap https://hyperborea.org/sitemap.xml
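
The sitemap record can be read back with the same standard-library parser. A minimal sketch, assuming Python 3.8 or later (where `RobotFileParser.site_maps()` is available) and a hypothetical four-line excerpt rather than the full file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt; only the Sitemap line is taken from the report.
lines = [
    "User-agent: *",
    "Disallow: /temp/",
    "",
    "Sitemap: https://hyperborea.org/sitemap.xml",
]

parser = RobotFileParser()
parser.parse(lines)

# site_maps() returns None when no Sitemap lines are present,
# so fall back to an empty list.
sitemaps = parser.site_maps() or []
print(sitemaps)
```

Sitemap records are not tied to any user-agent group, which is why the report lists them separately under Other Records.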

Comments

  • It's a shopping search engine. Nothing's relevant except maybe the comics list
  • AI bots
  • URLs that keep getting requested despite being invalid. Causes include:
      - Broken spiders that don't handle links/image maps/scripts/base correctly
      - Typos in links on other sites
      - Old removed files that don't need redirects
      - Typos and bugs on this site that have since been corrected
      - Most importantly, crawlers that won't remove bad URLs from their database.
  • Google has picked up a strange typo... and Apache just serves up the file, relative links and all.
  • Googlebot is trying to load the Social login buttons. This should take care of it.