gnlug.org
robots.txt

Robots Exclusion Standard data for gnlug.org

Resource Scan

Scan Details

Site Domain gnlug.org
Base Domain gnlug.org
Scan Status Ok
Last Scan 2025-10-24T06:56:36+00:00
Next Scan 2025-11-23T06:56:36+00:00

Last Scan

Scanned 2025-10-24T06:56:36+00:00
URL https://gnlug.org/robots.txt
Domain IPs 2a01:4ff:f0:dd3a::1, 5.161.180.234
Response IP 5.161.180.234
Found Yes
Hash dcb3ab319c4f046f0f5e95308842d281aa8390f386b8bd43ad2e6a49f901fb22
SimHash 76715a00c1c0

Groups

*

Rule Path
Disallow /panel/
Disallow /bin/
Disallow /conf/
Disallow /data/
Disallow /inc/
Disallow /lib/
Disallow /vendor/
Disallow /.htaccess
Disallow /.htaccess.dist
Disallow /COPYING
Disallow /README
Disallow /SECURITY.md
Disallow /VERSION
Disallow /alrojovivo.html
Disallow /composer.json
Disallow /composer.lock
Disallow /index_old.html
Disallow /mc-legacy.html
Disallow /mc-player-counter.min.js
Disallow /mc.html

gptbot
claudebot
claude-web
ccbot
googlebot-extended
applebot-extended
facebookbot
meta-externalagent
meta-externalfetcher
diffbot
perplexitybot
omgili
omgilibot
webzio-extended
imagesiftbot
bytespider
amazonbot
youbot
semrushbot-ocob
petalbot
velenpublicwebcrawler
turnitinbot
timpibot
oai-searchbot
icc-crawler
ai2bot
ai2bot-dolma
dataforseobot
awariobot
awariosmartbot
awariorssbot
google-cloudvertexbot
pangubot
kangaroo bot
sentibot
img2dataset
meltwater
seekr
peer39_crawler
cohere-ai
cohere-training-data-crawler
duckassistbot
scrapy

Rule Path
Disallow /

*

No rules defined. All paths allowed.
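
The effect of the groups above can be checked with Python's standard-library `urllib.robotparser`. This is a minimal sketch under stated assumptions: the raw robots.txt is not reproduced in this report, so the text below is a partial reconstruction containing only a few of the listed paths and user agents, and `ExampleBot` and the `allowed` helper are hypothetical names used for illustration.

```python
from urllib.robotparser import RobotFileParser

# Partial, assumed reconstruction of the scanned groups. Only a subset of
# the listed paths and user agents is included to keep the sketch short.
ROBOTS_TXT = """\
User-agent: *
Disallow: /panel/
Disallow: /conf/
Disallow: /mc.html

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, "https://gnlug.org" + path)


if __name__ == "__main__":
    for agent, path in [
        ("ExampleBot", "/"),             # hypothetical crawler, falls back to the * group
        ("ExampleBot", "/panel/login"),  # path is under a disallowed prefix
        ("GPTBot", "/"),                 # named in the AI-crawler group
    ]:
        print(agent, path, allowed(agent, path))
```

Under these assumptions, a crawler named in the second group should be refused every path, while any other crawler is only refused the paths listed in the `*` group.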

Comments

  • Block all known AI crawlers and assistants from using content for training AI models.
  • Block any non-specified AI crawlers (e.g., new or unknown bots) from using content for training AI models. This directive is still experimental and may not be supported by all AI crawlers.

Warnings

  • `disallowaitraining` is not a known field.
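
The warning above, together with the second `*` group that has no rules, is consistent with a `User-agent: *` block whose only line uses an unrecognised field. The following sketch illustrates how a scanner could arrive at that result. It is not the scanner's actual implementation: the `DisallowAITraining: /` line, the `KNOWN_FIELDS` set, and the `scan_robots` function are all assumptions made for illustration.

```python
# Fields this hypothetical scanner recognises (assumed set).
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def scan_robots(text):
    """Split robots.txt text into groups and collect unknown-field warnings."""
    groups, warnings = [], []
    current = None            # group currently being built
    expecting_agents = False  # True while consecutive User-agent lines are read
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if current is None or not expecting_agents:
                current = {"agents": [], "rules": []}
                groups.append(current)
                expecting_agents = True
            current["agents"].append(value.lower())
        elif field in ("allow", "disallow"):
            expecting_agents = False
            if current is not None:
                current["rules"].append((field, value))
        elif field in KNOWN_FIELDS:
            expecting_agents = False  # e.g. sitemap: recognised, not a path rule
        else:
            expecting_agents = False
            warnings.append(f"`{field}` is not a known field.")
    return groups, warnings


# Assumed sample; the real file is not shown in this report.
SAMPLE = """\
User-agent: *
Disallow: /panel/

User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
DisallowAITraining: /
"""

groups, warnings = scan_robots(SAMPLE)
```

With this sample, the last group keeps its `*` agent but ends up with an empty rule list, which a report would render as "No rules defined. All paths allowed."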