forum.starmen.net
robots.txt

Robots Exclusion Standard data for forum.starmen.net

Resource Scan

Scan Details

Site Domain	forum.starmen.net
Base Domain	starmen.net
Scan Status	Ok
Last Scan	2025-11-29T02:13:46+00:00
Next Scan	2025-12-29T02:13:46+00:00
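
The Last Scan and Next Scan timestamps are exactly 30 days apart. That suggests a fixed 30-day rescan interval, but the report does not confirm it. The minimal Python sketch below shows that arithmetic under the assumption that the interval is constant. `SCAN_INTERVAL` and `next_scan` are hypothetical names and are not part of the scanner.

```python
from datetime import datetime, timedelta

# Assumption: the scanner waits a fixed 30 days between scans. This is
# inferred from the Last Scan / Next Scan values, not documented anywhere.
SCAN_INTERVAL = timedelta(days=30)


def next_scan(last_scan_iso: str) -> str:
    """Return the ISO 8601 timestamp of the next scan (hypothetical helper)."""
    last = datetime.fromisoformat(last_scan_iso)  # keeps the +00:00 offset
    return (last + SCAN_INTERVAL).isoformat()


print(next_scan("2025-11-29T02:13:46+00:00"))  # 2025-12-29T02:13:46+00:00
```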

Last Scan

Scanned	2025-11-29T02:13:46+00:00
URL	https://forum.starmen.net/robots.txt
Domain IPs	104.21.77.50, 172.67.204.166, 2606:4700:3031::ac43:cca6, 2606:4700:3037::6815:4d32
Response IP	104.21.77.50
Found	Yes
Hash	35458a5381d903d28e54964a3f5b7b8cb681cbdd20e833fd307e19954c8310cf
SimHash	c6f72950c5d4
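
The Hash field is 64 hex digits, which is consistent with a SHA-256 digest. The report does not say exactly what is hashed. The sketch below assumes SHA-256 over the raw response body and shows how a client could detect whether robots.txt changed since a recorded scan. `robots_fingerprint` and `changed_since` are hypothetical names. The SimHash value is not reproduced because its parameters are unknown.

```python
import hashlib


def robots_fingerprint(body: bytes) -> str:
    """Hex SHA-256 digest of the raw robots.txt body (assumed hashing scheme)."""
    return hashlib.sha256(body).hexdigest()


def changed_since(body: bytes, previous_hash: str) -> bool:
    """True if the fetched body no longer matches a previously recorded hash."""
    return robots_fingerprint(body) != previous_hash.strip().lower()


# Hash recorded by the scan above; `body` would be the freshly fetched file.
PREVIOUS = "35458a5381d903d28e54964a3f5b7b8cb681cbdd20e833fd307e19954c8310cf"
body = b"User-agent: *\nAllow: /\n"  # placeholder content, not the real file
print(changed_since(body, PREVIOUS))
```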

Groups

*

Rule	Path
Allow	/

amazonbot

Rule	Path
Disallow	/

applebot-extended

Rule	Path
Disallow	/

bytespider

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

claudebot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

meta-externalagent

Rule	Path
Disallow	/

*

Rule	Path
Disallow	/forum/Secret/

*

Rule	Path
Disallow	/login/

*

Rule	Path
Disallow	/oauth/

mj12bot

Rule	Path
Disallow	/

ccbot

Rule	Path
Disallow	/

chatgpt-user

Rule	Path
Disallow	/

gptbot

Rule	Path
Disallow	/

google-extended

Rule	Path
Disallow	/

anthropic-ai

Rule	Path
Disallow	/

omgilibot

Rule	Path
Disallow	/

omgili

Rule	Path
Disallow	/

facebookbot

Rule	Path
Disallow	/

imagesiftbot

Rule	Path
Disallow	/

meta-externalagent

Rule	Path
Disallow	/

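Several groups above share a user agent. `*` appears four times, and `ccbot`, `gptbot`, `google-extended` and `meta-externalagent` each appear twice.

RFC 9309 says a crawler must combine every group that matches its user agent, and fall back to the `*` groups only when no specific group matches. It then applies the rule whose path matches the most octets, preferring `Allow` on a tie.

The minimal Python sketch below illustrates that logic; it is not the scanner's implementation. It ignores the `*` and `$` path wildcards, which none of the rules here use. `parse_groups` and `is_allowed` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    agents: list[str] = field(default_factory=list)
    rules: list[tuple[str, str]] = field(default_factory=list)  # (directive, path)


def parse_groups(text: str) -> list[Group]:
    """Split robots.txt text into groups of user-agent lines followed by rules."""
    groups: list[Group] = []
    current: Group | None = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # A user-agent line that follows rules starts a new group.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key, value))
    return groups


def is_allowed(groups: list[Group], agent: str, path: str) -> bool:
    """Merge all groups naming `agent` (else all '*' groups), then longest match wins."""
    agent = agent.lower()
    rules = [r for g in groups if agent in g.agents for r in g.rules]
    if not rules:
        rules = [r for g in groups if "*" in g.agents for r in g.rules]
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, rule_path in rules:
        if rule_path and path.startswith(rule_path):
            longer = len(rule_path) > best_len
            tie_allow = len(rule_path) == best_len and directive == "allow"
            if longer or tie_allow:
                best_len, allowed = len(rule_path), directive == "allow"
    return allowed


# A reduced excerpt of the groups listed above.
ROBOTS = """User-agent: *
Allow: /

User-agent: gptbot
Disallow: /

User-agent: *
Disallow: /login/
"""
groups = parse_groups(ROBOTS)
print(is_allowed(groups, "GPTBot", "/forum/"))      # False
print(is_allowed(groups, "ExampleBot", "/login/"))  # False
print(is_allowed(groups, "ExampleBot", "/forum/"))  # True
```

Under these assumptions, a generic crawler may fetch most pages because of `Allow: /`. `/login/` is still blocked because `Disallow: /login/` is the longer match.
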
Comments

As a condition of accessing this website, you agree to abide by the following
content signals:
(a) If a content-signal = yes, you may collect content for the corresponding
use.
(b) If a content-signal = no, you may not collect content for the
corresponding use.
(c) If the website operator does not include a content signal for a
corresponding use, the website operator neither grants nor restricts
permission via content signal with respect to the corresponding use.
The content signals and their meanings are:
search: building a search index and providing search results (e.g., returning
hyperlinks and short excerpts from your website's contents). Search does not
include providing AI-generated search summaries.
ai-input: inputting content into one or more AI models (e.g., retrieval
augmented generation, grounding, or other real-time taking of content for
generative AI search answers).
ai-train: training or fine-tuning AI models.
ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF
AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.
BEGIN Cloudflare Managed content
END Cloudflare Managed Content
GENERAL NOTICE TO ALL BOTS
If for some strange reason your bot wants to log in and do things, that's cool.
However, your bot MUST respect session cookies like a normal browser. Otherwise, I will prevent your bot from getting a cookie.
Because the user agents below do not handle standard cookies, they are prevented from getting a session, and therefore a cookie, and therefore cannot log in.
This is not a punishment so much as a way to avoid clogging the session store with search engines that don't care about cookies, so why waste the space? :)
'msnbot', # Microsoft Bing search engine
'yahoo! slurp', # Yahoo! search engine
'googlebot', # Google search engine
'mediapartners', # Google AdSense spider
'feedfetcher-google', # Google Feedfetcher RSS fetcher
'teoma', # Ask Jeeves search engine
'wordpress', # Wordpress pingbacks
'baiduspider', # Chinese search/MP3 engine
'sparkflare', # Sparkflare feed fetcher for Campfire
'bingbot', # Bing changed
'downscout',
'ltx71',
'magpie-crawler', #Brandwatch? Dunno but they are MASSIVELY annoying and 100 times bigger than the next bot
'ahrefsbot',
'oauth',
'360spider',
'psbot'
See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
To ban all spiders from the entire site uncomment the next two lines:
User-Agent: *
Disallow: /
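
The comments describe a server-side filter: requests whose User-Agent matches one of the listed crawlers never get a session. The minimal Python sketch below shows such a filter. It assumes case-insensitive substring matching, which the lowercase entries suggest but the comments do not state. `NO_SESSION_AGENTS` and `should_create_session` are hypothetical names, and the forum's actual code is not shown. A broad token such as `oauth` would also match any other client whose User-Agent happens to contain that string.

```python
# Hypothetical reimplementation of the "no session" list from the comments above.
# Assumption: entries are matched as case-insensitive substrings of the
# User-Agent header; the forum's real matching code is not shown.
NO_SESSION_AGENTS = (
    "msnbot", "yahoo! slurp", "googlebot", "mediapartners",
    "feedfetcher-google", "teoma", "wordpress", "baiduspider",
    "sparkflare", "bingbot", "downscout", "ltx71", "magpie-crawler",
    "ahrefsbot", "oauth", "360spider", "psbot",
)


def should_create_session(user_agent: str) -> bool:
    """Return False for clients on the list, so no session cookie is issued."""
    ua = user_agent.lower()
    return not any(token in ua for token in NO_SESSION_AGENTS)


print(should_create_session(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # False
print(should_create_session(
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"))  # True
```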

Warnings

`content-signal` is not a known field.
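
This warning means the scanner's parser does not recognise the `Content-Signal` field, so it ignores it. According to the comments, each signal is yes, no, or absent, and an absent signal neither grants nor restricts the corresponding use.

The minimal Python sketch below shows a client that does honour the field. It assumes the value is a list of comma-separated `name=value` pairs. The scan does not show this file's actual `Content-Signal` line, so the input below is hypothetical, as are the function names.

```python
from __future__ import annotations


def parse_content_signal(value: str) -> dict[str, bool]:
    """Parse comma-separated 'name=yes|no' pairs; other values are skipped."""
    signals: dict[str, bool] = {}
    for item in value.split(","):
        if "=" not in item:
            continue
        name, setting = (part.strip().lower() for part in item.split("=", 1))
        if setting in ("yes", "no"):
            signals[name] = setting == "yes"
    return signals


def permission(signals: dict[str, bool], use: str) -> bool | None:
    """True = may collect (rule a), False = may not (rule b), None = no signal (rule c)."""
    return signals.get(use)


# Hypothetical field value; the scan does not show the forum's actual line.
sig = parse_content_signal("search=yes, ai-train=no")
print(permission(sig, "search"), permission(sig, "ai-train"), permission(sig, "ai-input"))
# True False None
```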
