alpine-club.org.uk
robots.txt

Robots Exclusion Standard data for alpine-club.org.uk

Resource Scan

Scan Details

Site Domain alpine-club.org.uk
Base Domain alpine-club.org.uk
Scan Status Ok
Last Scan 2024-09-19T10:23:29+00:00
Next Scan 2024-10-19T10:23:29+00:00

Last Scan

Scanned 2024-09-19T10:23:29+00:00
URL http://alpine-club.org.uk/robots.txt
Domain IPs 79.170.44.88
Response IP 79.170.44.88
Found Yes
Hash da734f9442e0969e41f382a320cbcb5f7ec0c55531673cbcfaa5e9ab112b4286
SimHash 021cdd590c7b

Groups
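
Each group below is rendered from a User-agent block in the raw file; the report drops the field names and shows only the rule paths. A blanket block such as the panscient.com group, reconstructed into robots.txt syntax, would read:

    User-agent: panscient.com
    Disallow: /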

panscient.com

Rule Path
Disallow /

vscooter

Rule Path
Disallow /

psbot

Rule Path
Disallow /

ia_archiver

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

twiceler

Rule Path
Disallow /

yandex

Rule Path
Disallow /

taptubot

Rule Path
Disallow /

googlebot-image

Rule Path
Disallow /

twengabot

Rule Path
Disallow /

sitebot

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

ezooms

Rule Path
Disallow /

sistrix

Rule Path
Disallow /

aihitbot

Rule Path
Disallow /

infopath

Rule Path
Disallow /

infopath.2

Rule Path
Disallow /

swebot

Rule Path
Disallow /

ec2linkfinder

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

searchmetericsbot

Rule Path
Disallow /

wbsearchbot

Rule Path
Disallow /

exabot

Rule Path
Disallow /

sosospider

Rule Path
Disallow /

ip-web-crawler.com

Rule Path
Disallow /

slurp

Rule Path
Disallow /*.jpg
Disallow /*.JPG
Disallow /*.png
Disallow /*.PDF
Disallow /*.pdf
Disallow /disclaimer.html
Disallow /security.html
Disallow /poweredby.html
Disallow /about_smythies.html
Disallow /unused_link.html
Disallow /old_pages.html
Disallow /index_0*
Disallow /administrator/
Disallow /bin/
Disallow /cache/
Disallow /cli/
Disallow /components/
Disallow /includes/
Disallow /installation/
Disallow /language/
Disallow /layouts/
Disallow /libraries/
Disallow /logs/
Disallow /modules/
Disallow /plugins/
Disallow /tmp/
Disallow /digital_camera/
Disallow /lab/
Disallow /xmas_*
Disallow /~doug/archives/
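
The slurp group mixes plain path prefixes with wildcard patterns. Under the widely adopted extension of the exclusion standard (documented by Google and Bing, later codified in RFC 9309), `*` matches any run of characters and a rule applies when the pattern matches a prefix of the URL path, so an unanchored pattern like /*.jpg blocks any path containing ".jpg", not only paths ending in it. A reconstruction, with hypothetical example URLs in the comments:

    User-agent: slurp
    Disallow: /*.jpg           # prefix match: blocks /a.jpg and /a.jpg.html alike
    Disallow: /*.JPG           # matching is case-sensitive, hence the separate rule
    Disallow: /index_0*        # blocks /index_0001.html, /index_0002.html, ...
    Disallow: /digital_camera/ # plain prefix: the directory and everything below it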

googlebot

Rule Path
Disallow /*.jpg$
Disallow /*.JPG$
Disallow /*.png$
Disallow /*.PDF$
Disallow /*.pdf$
Disallow /index_0*$
Disallow /*index_0*$
Disallow /xmas_*
Disallow /~doug/archives/
Disallow /~doug/2010.01.23/
Disallow /~doug/2007.11.20/
Disallow /~doug/2004.06.26/
Disallow /digital_camera/
Disallow /old_pages.html
Disallow /unused_link.html
Disallow /disclaimer.html
Disallow /security.html
Disallow /about_smythies.html
Disallow /poweredby.html
Disallow /*.MOV
Disallow /*.mov
Disallow /*.AVI
Disallow /*.avi
Disallow /DSCN*.htm
Disallow /lectures/
Disallow /library/
Disallow /join/
Disallow /alpineclub/
Disallow /publications/
Disallow /notices/
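
Unlike the slurp group, googlebot's file-extension rules are anchored with `$`, which in the same extension pins the pattern to the end of the URL. Note that in /index_0*$ the anchor is redundant: a trailing `*` already matches any remainder. An illustration with hypothetical paths:

    User-agent: googlebot
    Disallow: /*.jpg$    # blocks /photos/a.jpg but not /photos/a.jpg.html
    Disallow: /*.jpg     # the unanchored form would block both of the above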

msnbot

Rule Path
Disallow /*.jpg$
Disallow /*.JPG
Disallow /*.png$
Disallow /*.PDF$
Disallow /*.pdf$
Disallow /disclaimer.html
Disallow /security.html
Disallow /poweredby.html
Disallow /about_smythies.html
Disallow /unused_link.html
Disallow /old_pages.html
Disallow /index_0*
Disallow /*index_0*$
Disallow /digital_camera/
Disallow /lab/
Disallow /xmas_*
Disallow /~doug/archives/

*

Rule Path
Disallow /*.jpg
Disallow /*.JPG
Disallow /*.png
Disallow /*.PDF
Disallow /*.pdf
Disallow /disclaimer.html
Disallow /security.html
Disallow /poweredby.html
Disallow /about_smythies.html
Disallow /unused_link.html
Disallow /old_pages.html
Disallow /index_0*
Disallow /*index_0*$
Disallow /digital_camera/
Disallow /lab/
Disallow /xmas_*
Disallow /~doug/archives/

*

Rule Path
Disallow /_mm/
Disallow /_notes/
Disallow /_baks/
Disallow /MMWIP/
Disallow /*.LCK
Disallow /*.bak
Disallow /*.csi
Disallow /*.mno

googlebot

Rule Path
Disallow *.csi

*

Rule Path
Disallow /administrator/
Disallow /bin/
Disallow /cache/
Disallow /cli/
Disallow /components/
Disallow /includes/
Disallow /installation/
Disallow /language/
Disallow /layouts/
Disallow /libraries/
Disallow /logs/
Disallow /modules/
Disallow /plugins/
Disallow /tmp/
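
The scanner reports three separate `*` groups. Under RFC 9309, groups naming the same agent are combined into a single rule set, so a conforming crawler effectively sees one merged group, along the lines of:

    User-agent: *
    Disallow: /*.jpg          # from the first * group
    Disallow: /_mm/           # from the second (Dreamweaver working files)
    Disallow: /administrator/ # from the third (Joomla directories)
    # ...and so on for the remaining rules above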

Other Records

Field Value
crawl-delay 2
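
Crawl-delay is a non-standard field; the 2007.03.13 entry below notes that its units were never documented, though crawlers that honor it (Bing and Yandex, for example) read it as seconds to wait between requests. The report does not say which group carries the value, but a typical placement would be:

    User-agent: slurp
    Crawl-delay: 2    # at most one request every 2 seconds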

Comments

  • robots.txt 2013.04.25
  • disallow ip-web-crawler.com. It crawls way too fast, and while
  • it claims to obey robots.txt directives, it does not.
  • If it doesn't obey the disallow, then an iptables drop of
  • 50.31.96.6 - 50.31.96.12 could be used.
  • robots.txt 2013.04.17
  • add some disallow directives for specific file extensions.
  • Somehow I missed it before.
  • robots.txt 2013.04.04
  • disallow Sosospider. Any web crawler that is too stupid to know the
  • difference between upper and lower case is not worthy.
  • robots.txt 2013.02.28
  • disallow Exabot. I wonder if the resulting search engine
  • database is the reason I get so many forged referrer
  • hits.
  • robots.txt 2012.10.08
  • disallow WBSearchBot.
  • robots.txt 2012.09.02
  • disallow SearchmetricsBot. It is mentally challenged.
  • robots.txt 2012.05.03
  • disallow TurnitinBot. It is mentally challenged.
  • robots.txt 2012.03.29
  • disallow EC2LinkFinder. I do not know if it obeys robots.txt, but I will try.
  • For sure it ignores most robots.txt directives. It copies everything, hogging
  • bandwidth.
  • It is time to think of a generic deny, to cover all these new bots.
  • robots.txt 2012.03.13
  • disallow SWEBot. It is not polite and disobeys the robots.txt file.
  • robots.txt 2012.01.29
  • disallow aiHitBot
  • Try a user-agent "InfoPath" and "InfoPath.2" disallow. (Another MS thing.)
  • I am trying to get rid of what appears to be a tracking site.
  • 80.40.134.103, .104, .120, seem to track 92.9.131.199 and 92.9.150.29 and ...
  • 80.40.134.XXX does read the robots.txt file.
  • robots.txt 2012.01.04
  • SISTRIX crawler does not behave well. It ignores meta tags and some robots.txt directives.
  • Disallow it.
  • robots.txt 2011.12.01
  • Try to get rid of Ezooms bot, although it is not clear what its exact user agent name is.
  • (Days later: "User-agent: Ezooms" seems to work, but it takes a few days.)
  • It ignores meta tags, and has become generally annoying.
  • robots.txt 2011.09.26
  • Until now I have allowed Baiduspider. But it has gone mental and also ignores some meta tags.
  • Disallow it.
  • A new robot, AhrefsBot, does not behave or obey meta tags.
  • Disallow it.
  • robots.txt 2011.06.19
  • robots.txt 2011.04.12
  • Googlebot is so very very severely mentally challenged.
  • It ignores the NOFOLLOW meta tag.
  • Try to block useless content from being indexed via yet another
  • block command.
  • It is still looking for pages that haven't been there for over a year now.
  • (see 2010.04.29)
  • robots.txt 2010.10.14
  • Eliminate crawl delay for Yahoo slurp (see 2007.03.13)
  • robots.txt 2010.09.20
  • TwengaBot is severely mentally challenged. Try global disallow for it.
  • Googlebot is still annoying and accessing pages it shouldn't.
  • robots.txt 2010.04.29
  • Googlebot is very severely mentally challenged.
  • Add disallow directives for directories that are not even there,
  • and haven't been for over 5 weeks now.
  • This is merely to try to get around having my request to delete the
  • non-existent directories from the search database being denied.
  • robots.txt 2010.04.16
  • Add specific directives for exabot, including a crawl delay.
  • Reduce the slurp (Yahoo) crawl delay (which it doesn't seem to obey anyhow).
  • Disallow googlebot-image.
  • robots.txt 2010.04.13
  • disallow taptubot, the mobile device crawler
  • robots.txt 2010.04.01
  • Yet another attempt to get web crawlers not to index old versions of index.html files.
  • All old versions are called index_0???.html.
  • robots.txt 2010.03.19
  • Archives have been moved to a separate directory. Add a disallow directive.
  • robots.txt 2010.02.10
  • The Yandex web crawler behaves in a very strange manner. Block it.
  • Ask Robots not to copy PDF files.
  • robots.txt 2009.12.07
  • Fix some syntax based on feedback from http://tool.motoricerca.info/robots-checker.phtml
  • robots.txt 2009.12.04
  • There are still issues with googlebot. I don't want old versions of index.html
  • type pages indexed, but I do want the Photoshop Elements generated pages indexed.
  • Try some new directives.
  • robots.txt 2009.09.09
  • Googlebot is not ignoring the rebuilt directory and is obtaining .MOV videos.
  • Add some more googlebot specific directives.
  • robots.txt 2009.07.27
  • Googlebot directives are case sensitive. Add .JPG to .jpg ignore directives.
  • Googlebot is not ignoring old index pages as the global directive indicates. Try a googlebot-specific directive.
  • robots.txt 2009.04.12
  • Some robots, for example googlebot, obey global directives as well as googlebot-specific directives.
  • Other robots, for example slurp (Yahoo) and msnbot, obey only their specific directives (see the sketch after this list).
  • The robots.txt standard is rather weak, incomplete, and generally annoying.
  • Add tons of the same specific directives to each robot area.
  • Try changing the no-index rules for the Christmas pages to include a wildcard.
  • robots.txt 2008.12.03
  • ser-agent: *
  • Block the Cuil (twiceler) robot entirely.
  • robots.txt 2008.11.23
  • The majestic robot comes in bursts at a high rate. Just block it.
  • The Cuil robot comes too often. Try to slow it down.
  • robots.txt 2008.07.03
  • Now msnbot has started to grab images. Try to stop it.
  • Googlebot is grabbing PNG files. Try to stop it.
  • robots.txt 2007.11.20
  • Try to disallow the panscient.com web crawler.
  • ser-agent: *
  • robots.txt 2007.08.23
  • Search engine pages still do not agree with the contents of the robots.txt file.
  • Add specific disallow for ~doug/rebuilt.
  • - put global user agent lines after specific ones.
  • - next will be to repeat global lines in each specific agent area.
  • robots.txt 2007.05.03
  • Now Googlebot has started to grab images. Try to stop it.
  • For whatever reason, Google is mainly showing my rebuilt directory. It
  • never seems to go back to the higher level page that now has meta tags
  • telling it not to index those pages. Put in a global disallow.
  • Add some other global disallows that I got behind on.
  • robots.txt 2007.03.13
  • Stupid Yahoo slurp comes all the time now. It supports a non-standard delay command,
  • so add the command. The web site doesn't state the units of measure.
  • robots.txt 2007.02.11
  • Yahoo's slurp now seems to obey the non-standard ignore-this-file-type wildcard usage;
  • try it.
  • robots.txt 2006.12.29
  • Delete instructions for directories that don't exist anymore.
  • robots.txt 2004.12.21
  • Try to eliminate yahoo.com grabbing images.
  • Can only think of a global deny.
  • Cannot find the Yahoo name; try the one shown below.
  • robots.txt 2004.11.16
  • Try to eliminate alexa.com grabbing images.
  • InkTomi comes too often; can them entirely.
  • robots.txt 2004.07.16
  • Try to eliminate picsearch.com grabbing images.
  • robots.txt 2004.07.09
  • Try to eliminate altavista grabbing images.
  • robots.txt for www.smythies.com 2003.12.21
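
The 2009.04.12 entry above describes the inconsistency that RFC 9309 later resolved: a crawler follows only the single group whose user-agent line matches it most specifically and ignores the rest, including the global `*` group, which is why this file duplicates the global rules into each named group. A minimal sketch, with paths taken from the groups above purely for illustration:

    User-agent: googlebot
    Disallow: /lectures/    # googlebot applies only this group...

    User-agent: *
    Disallow: /library/     # ...and skips this one entirely, so shared
                            # rules must be repeated in the group above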

Warnings

  • 29 invalid lines.
  • `ser-agent` is not a known field.