gnlug.org
robots.txt

Robots Exclusion Standard data for gnlug.org

Resource Scan

Scan Details

Site Domain	gnlug.org
Base Domain	gnlug.org
Scan Status	Ok
Last Scan	2025-10-24T06:56:36+00:00
Next Scan	2025-11-23T06:56:36+00:00
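
The two timestamps above are exactly 30 days apart. A minimal sketch of that arithmetic follows, assuming the scanner schedules its next scan at a fixed 30-day interval. That interval is inferred from these two values only, and `next_scan` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Assumption: a fixed 30-day rescan interval, inferred from the two
# timestamps in the Scan Details table (not documented by the scanner).
SCAN_INTERVAL = timedelta(days=30)


def next_scan(last_scan_iso: str) -> str:
    """Return the ISO 8601 timestamp one scan interval after the last scan."""
    last_scan = datetime.fromisoformat(last_scan_iso)
    return (last_scan + SCAN_INTERVAL).isoformat()


if __name__ == "__main__":
    print(next_scan("2025-10-24T06:56:36+00:00"))  # 2025-11-23T06:56:36+00:00
```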

Last Scan

Scanned	2025-10-24T06:56:36+00:00
URL	https://gnlug.org/robots.txt
Domain IPs	2a01:4ff:f0:dd3a::1, 5.161.180.234
Response IP	5.161.180.234
Found	Yes
Hash	dcb3ab319c4f046f0f5e95308842d281aa8390f386b8bd43ad2e6a49f901fb22
SimHash	76715a00c1c0
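
The Hash value above is 64 hexadecimal characters long, which is consistent with a SHA-256 digest. The sketch below shows how such a digest could be computed over a robots.txt response body. That the scanner uses SHA-256 over the raw body is an assumption, and `body_digest` is a hypothetical helper. The sample body is illustrative and is not the actual file content, so it will not reproduce the hash listed here.

```python
import hashlib


def body_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a response body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()


if __name__ == "__main__":
    # Illustrative body only; the real file content is not reproduced here.
    sample = b"User-agent: *\nDisallow: /panel/\n"
    digest = body_digest(sample)
    print(len(digest), digest)
```

Comparing digests between scans is a cheap way to detect that a file changed at all, while a similarity hash such as the SimHash field is typically used to judge how much it changed.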

Groups

*

Rule	Path
Disallow	/panel/
Disallow	/bin/
Disallow	/conf/
Disallow	/data/
Disallow	/inc/
Disallow	/lib/
Disallow	/vendor/
Disallow	/.htaccess
Disallow	/.htaccess.dist
Disallow	/COPYING
Disallow	/README
Disallow	/SECURITY.md
Disallow	/VERSION
Disallow	/alrojovivo.html
Disallow	/composer.json
Disallow	/composer.lock
Disallow	/index_old.html
Disallow	/mc-legacy.html
Disallow	/mc-player-counter.min.js
Disallow	/mc.html
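
To see how a crawler might evaluate the rules above, here is a minimal sketch using Python's standard `urllib.robotparser`. It rewrites a subset of the listed rules as robots.txt text. The `is_allowed` helper and the test paths `/panel/login` and `/panel` are hypothetical, and real crawlers may apply different matching logic.

```python
from urllib.robotparser import RobotFileParser

# A subset of the "*" group rules from the table above, as robots.txt text.
ROBOTS_TXT = """\
User-agent: *
Disallow: /panel/
Disallow: /conf/
Disallow: /.htaccess
Disallow: /composer.json
Disallow: /mc.html
"""


def is_allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Check a path against the rules using prefix matching (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "https://gnlug.org" + path)


if __name__ == "__main__":
    for path in ("/", "/panel/login", "/panel", "/mc.html", "/.htaccess.dist"):
        print(path, is_allowed(path))
```

Because matching is by path prefix, `/.htaccess.dist` is already covered by the `/.htaccess` rule, while `/panel` without the trailing slash is not covered by `/panel/`.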

gptbot
claudebot
claude-web
ccbot
googlebot-extended
applebot-extended
facebookbot
meta-externalagent
meta-externalfetcher
diffbot
perplexitybot
omgili
omgilibot
webzio-extended
imagesiftbot
bytespider
amazonbot
youbot
semrushbot-ocob
petalbot
velenpublicwebcrawler
turnitinbot
timpibot
oai-searchbot
icc-crawler
ai2bot
ai2bot-dolma
dataforseobot
awariobot
awariosmartbot
awariorssbot
google-cloudvertexbot
pangubot
kangaroo bot
sentibot
img2dataset
meltwater
seekr
peer39_crawler
cohere-ai
cohere-training-data-crawler
duckassistbot
scrapy

Rule	Path
Disallow	/
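
A group that names several user agents and ends with `Disallow: /` blocks each of them from the whole site. The sketch below rebuilds such a group from a few of the agents listed above and checks it with Python's standard `urllib.robotparser`. The helper names and the `ExampleBrowser` agent are hypothetical, and the sketch deliberately omits the separate `*` group.

```python
from urllib.robotparser import RobotFileParser

# A few of the user agents from the group above (not the full list).
BLOCKED_AGENTS = ["gptbot", "claudebot", "ccbot", "bytespider", "perplexitybot"]


def build_group(agents: list[str]) -> list[str]:
    """Return robots.txt lines for one group that disallows everything."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Disallow: /")
    return lines


parser = RobotFileParser()
parser.parse(build_group(BLOCKED_AGENTS))


def may_fetch(agent: str, path: str = "/") -> bool:
    """Check whether an agent may fetch a path under the group above."""
    return parser.can_fetch(agent, "https://gnlug.org" + path)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot/1.0", "ExampleBrowser"):
        print(agent, may_fetch(agent))
```

Agent names are compared case-insensitively here, so `GPTBot` matches the lowercase `gptbot` entry. An agent that matches no group is allowed by default.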

*

No rules defined. All paths allowed.

Comments

Block all known AI crawlers and assistants
from using content for training AI models.
Block any non-specified AI crawlers (e.g., new
or unknown bots) from using content for training
AI models. This directive is still experimental
and may not be supported by all AI crawlers.

Warnings

`disallowaitraining` is not a known field.
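
A warning like this can be produced by comparing each field name in the file against a set of recognized fields. The following is a minimal sketch of that check, not the scanner's actual logic. The `KNOWN_FIELDS` set, the `unknown_fields` helper, and the sample line `DisallowAITraining: /` (including its casing and value) are assumptions made for illustration.

```python
# Assumption: a small set of commonly recognized robots.txt fields.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def unknown_fields(robots_txt: str) -> list[str]:
    """Return lowercase field names that are not in KNOWN_FIELDS, in file order."""
    found: list[str] = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS and field not in found:
            found.append(field)
    return found


if __name__ == "__main__":
    sample = """\
# Block any non-specified AI crawlers
User-agent: *
DisallowAITraining: /
"""
    for field in unknown_fields(sample):
        print(f"`{field}` is not a known field.")
```

Parsers that do not recognize a field generally skip it, which would explain why the second `*` group is reported with no rules defined.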
