tug.org
robots.txt

Robots Exclusion Standard data for tug.org

Resource Scan

Scan Details

Site Domain tug.org
Base Domain tug.org
Scan Status Ok
Last Scan 2024-05-21T07:29:41+00:00
Next Scan 2024-06-20T07:29:41+00:00

Last Scan

Scanned 2024-05-21T07:29:41+00:00
URL https://tug.org/robots.txt
Domain IPs 46.4.94.215
Response IP 46.4.94.215
Found Yes
Hash ed3a5949ed00e50f18494ddbc1965ed3d803e04e1ab284b875d01a33b3b0dc6b
SimHash a884746b44b3

Groups

*

Rule Path
Disallow /RCS/
Disallow /cgi-bin
Disallow /devsrc/
Disallow /ftp/
Disallow /i-packages/
Disallow /images/
Disallow /proTeXt/
Disallow /svn/
Disallow /teTeX/
Disallow /tetex/release
Disallow /tetex/tetex-src
Disallow /tetex/tetex-texmfdist
Disallow /tetex/tetex-texmfmain
Disallow /tetex/texmf
Disallow /tetex/texmf-dist
Disallow /tetex/texmf-local
Disallow /texlive/Contents/
Disallow /texlive/Images/
Disallow /texlive/devsrc/
Disallow /texlive/svn/
Disallow /texmf/
Disallow /texmf-dist/
Disallow /twg/MacTeX/
Disallow /viewvc-docroot/
Disallow /~rlevien/scan/

Other Records

Field Value
crawl-delay 1
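
To illustrate how a crawler might read the wildcard group above, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text is reconstructed from a subset of the rules listed in this report, not copied from the live file, and the agent name `ExampleBot` is hypothetical. Note that this parser uses simple prefix matching, so other crawlers may interpret the same rules slightly differently.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt of the "*" group (an assumption: only a few of the
# listed Disallow paths are included, and the layout of the real file may differ).
WILDCARD_GROUP = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /svn/
Disallow: /texlive/svn/
Disallow: /texmf-dist/
Crawl-delay: 1
"""

wildcard_parser = RobotFileParser()
wildcard_parser.parse(WILDCARD_GROUP.splitlines())

# "ExampleBot" is a hypothetical agent with no group of its own,
# so it falls back to the "*" group.
for url in (
    "https://tug.org/texlive/",
    "https://tug.org/texlive/svn/trunk",
    "https://tug.org/cgi-bin/search",
):
    print(url, wildcard_parser.can_fetch("ExampleBot", url))

# Seconds the parser reports as the requested delay between fetches.
print("crawl-delay:", wildcard_parser.crawl_delay("ExampleBot"))
```

Because `Disallow /cgi-bin` has no trailing slash, it matches any path that begins with `/cgi-bin`, while `Disallow /texlive/svn/` leaves the rest of `/texlive/` open.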

java/1.4.1_04

Rule Path
Disallow /

java/1.5.0_06

Rule Path
Disallow /

mozilla/4.0 (compatible ; msie 6.0; windows nt 5.1)

Rule Path
Disallow /

mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; .net

Rule Path
Disallow /

java/1.5.0_04

Rule Path
Disallow /

mozilla/4.0 (compatible; msie 6.0; windows 98)

Rule Path
Disallow /

java/1.5.0_02

Rule Path
Disallow /

linkwalker

Rule Path
Disallow /

java/1.4.2_03

Rule Path
Disallow /

java/1.5.0_03

Rule Path
Disallow /

java/1.5.0_05

Rule Path
Disallow /

mozilla/4.0 (compatible; msie 6.0; windows nt 5.1)

Rule Path
Disallow /

mozilla/4.0 (compatible; msie 5.0; windows nt)

Rule Path
Disallow /

java/1.5.0_07

Rule Path
Disallow /

mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; sv1)

Rule Path
Disallow /

java/1.4.2_05

Rule Path
Disallow /

java/1.5.0_01

Rule Path
Disallow /

java/1.5.0_09

Rule Path
Disallow /

isc systems irc search 2.1

Rule Path
Disallow /

java/1.4.2_04

Rule Path
Disallow /

java/1.5.0

Rule Path
Disallow /

mozilla/4.0 (compatible; msie 6.0; windows nt 5.1; sv1;)

Rule Path
Disallow /

java/1.4.2_01

Rule Path
Disallow /

java/1.5.0_10

Rule Path
Disallow /

mozilla/4.0 (compatible; msie 5.01; windows nt 5.0)

Rule Path
Disallow /

crawler (cometsearch@cometsystems.com)

Rule Path
Disallow /

http://www.live.com/

Rule Path
Disallow /

http://search.msn.com/msnbot.htm

Rule Path
Disallow /

http://www.almaden.ibm.com/cs/crawler

Rule Path
Disallow /

arribapacketrat

Rule Path
Disallow /

autoemailspider

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

bilbo

Rule Path
Disallow /

digext

Rule Path
Disallow /

dloader(naverrobot)/1.0

Rule Path
Disallow /

dittospyder

Rule Path
Disallow /

dts agent

Rule Path
Disallow /

fast

Rule Path
Disallow /

getleft 1.1b2

Rule Path
Disallow /

girafa

Rule Path
Disallow /

gigabot/1.0

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

htmlab

Rule Path
Disallow /

httrack

Rule Path
Disallow /

ia_archiver

Rule Path
Disallow /

imagevampire

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

mail sweeper

Rule Path
Disallow /

msie 6.0

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

netcaptor

Rule Path
Disallow /

nitle blog spider/0.01

Rule Path
Disallow /

npbot

Rule Path
Disallow /

nutch

Rule Path
Disallow /

obot

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

psbot

Rule Path
Disallow /

quepasacreep v0.9.13

Rule Path
Disallow /

searchpreview

Rule Path
Disallow /

scooter/3.3

Rule Path
Disallow /

sitecheck.internetseer.com

Rule Path
Disallow /

spiderku/0.9

Rule Path
Disallow /

steeler

Rule Path
Disallow /

surveybot/2.3

Rule Path
Disallow /

szukacz

Rule Path
Disallow /

szukacz/1.9

Rule Path
Disallow /

szukacz/1.10.2

Rule Path
Disallow /

teoma

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

vagabondo/2.1

Rule Path
Disallow /

vischeck_spiderbot/0.1libwww-perl/5.48

Rule Path
Disallow /

vscooter

Rule Path
Disallow /

webcopier v3.3

Rule Path
Disallow /

webcopier v3.2a

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

webcrawler

Rule Path
Disallow /

web downloader/4.9

Rule Path
Disallow /

web downloader/5.8

Rule Path
Disallow /

webgather 3.0

Rule Path
Disallow /

webstripper/2.56

Rule Path
Disallow /

webzip/3.65

Rule Path
Disallow /

webzip

Rule Path
Disallow /

yahooseeker/m1a1-r2d2

Rule Path
Disallow /

yahoo! slurp

Rule Path
Disallow /

yahoo-mmcrawler

Rule Path
Disallow /

yeti

Rule Path
Disallow /

zao

Rule Path
Disallow /

zeus 2.6

Rule Path
Disallow /

converacrawler

Rule Path
Disallow /

msnbot-media/1.0

Rule Path
Disallow /

megaindex

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /
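
The agent-specific groups above each carry a single `Disallow /` rule, which excludes the named agent from the whole site. A minimal sketch of that behavior with Python's standard `urllib.robotparser` follows. The robots.txt text is reconstructed from this report (agent names appear here in lowercase as the report shows them; the casing in the real file is not known), and `ExampleBot` is a hypothetical agent that is not listed.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed excerpt (an assumption): one wildcard rule plus two of the
# agent-specific groups listed in the report.
AGENT_GROUPS = """\
User-agent: *
Disallow: /svn/

User-agent: semrushbot
Disallow: /

User-agent: httrack
Disallow: /
"""

agent_parser = RobotFileParser()
agent_parser.parse(AGENT_GROUPS.splitlines())

# Listed agents match their own group and are excluded everywhere.
# An unlisted agent falls back to the "*" group.
for agent in ("semrushbot", "httrack", "ExampleBot"):
    allowed = agent_parser.can_fetch(agent, "https://tug.org/texlive/")
    print(agent, "allowed" if allowed else "blocked")
```

This parser compares agent names case-insensitively and by substring, so `SemrushBot` would match the `semrushbot` group as well.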

Comments

  • $Id: robots.txt,v 1.9 2024/02/05 23:03:35 karl Exp $
  • Robots.txt file
  • validate at http://www.sxw.org.uk/computing/robots/check.html
  • always disallow certain big or useless directories, and assorted symlinks.
  • Mostly from Project Honey Pot
  • User-agent: Wget
  • Disallow: /

Warnings

  • 16 invalid lines.