alpine-club.org.uk
robots.txt

Robots Exclusion Standard data for alpine-club.org.uk

Resource Scan

Scan Details

Site Domain alpine-club.org.uk
Base Domain alpine-club.org.uk
Scan Status Ok
Last Scan 2024-09-19T10:23:29+00:00
Next Scan 2024-10-19T10:23:29+00:00

Last Scan

Scanned 2024-09-19T10:23:29+00:00
URL http://alpine-club.org.uk/robots.txt
Domain IPs 79.170.44.88
Response IP 79.170.44.88
Found Yes
Hash da734f9442e0969e41f382a320cbcb5f7ec0c55531673cbcfaa5e9ab112b4286
SimHash 021cdd590c7b

Groups
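
Each group below is rendered from a User-agent block in the raw file; the report drops the field names and shows only the rule paths. A blanket block such as the panscient.com group, reconstructed into robots.txt syntax, would read:

    User-agent: panscient.com
    Disallow: /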

panscient.com

Rule Path
Disallow /

vscooter

Rule Path
Disallow /

psbot

Rule Path
Disallow /

ia_archiver

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

twiceler

Rule Path
Disallow /

yandex

Rule Path
Disallow /

taptubot

Rule Path
Disallow /

googlebot-image

Rule Path
Disallow /

twengabot

Rule Path
Disallow /

sitebot

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

ezooms

Rule Path
Disallow /

sistrix

Rule Path
Disallow /

aihitbot

Rule Path
Disallow /

infopath

Rule Path
Disallow /

infopath.2

Rule Path
Disallow /

swebot

Rule Path
Disallow /

ec2linkfinder

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

searchmetericsbot

Rule Path
Disallow /

wbsearchbot

Rule Path
Disallow /

exabot

Rule Path
Disallow /

sosospider

Rule Path
Disallow /

ip-web-crawler.com

Rule Path
Disallow /

slurp

Rule Path
Disallow /*.jpg
Disallow /*.JPG
Disallow /*.png
Disallow /*.PDF
Disallow /*.pdf
Disallow /disclaimer.html
Disallow /security.html
Disallow /poweredby.html
Disallow /about_smythies.html
Disallow /unused_link.html
Disallow /old_pages.html
Disallow /index_0*
Disallow /administrator/
Disallow /bin/
Disallow /cache/
Disallow /cli/
Disallow /components/
Disallow /includes/
Disallow /installation/
Disallow /language/
Disallow /layouts/
Disallow /libraries/
Disallow /logs/
Disallow /modules/
Disallow /plugins/
Disallow /tmp/
Disallow /digital_camera/
Disallow /lab/
Disallow /xmas_*
Disallow /~doug/archives/
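
The slurp group mixes plain path prefixes with wildcard patterns. Under the widely adopted extension of the exclusion standard (documented by Google and Bing, later codified in RFC 9309), `*` matches any run of characters and a rule applies when the pattern matches a prefix of the URL path, so an unanchored pattern like /*.jpg blocks any path containing ".jpg", not only paths ending in it. A reconstruction, with hypothetical example URLs in the comments:

    User-agent: slurp
    Disallow: /*.jpg           # prefix match: blocks /a.jpg and /a.jpg.html alike
    Disallow: /*.JPG           # matching is case-sensitive, hence the separate rule
    Disallow: /index_0*        # blocks /index_0001.html, /index_0002.html, ...
    Disallow: /digital_camera/ # plain prefix: the directory and everything below it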

googlebot

Rule Path
Disallow /*.jpg$
Disallow /*.JPG$
Disallow /*.png$
Disallow /*.PDF$
Disallow /*.pdf$
Disallow /index_0*$
Disallow /*index_0*$
Disallow /xmas_*
Disallow /~doug/archives/
Disallow /~doug/2010.01.23/
Disallow /~doug/2007.11.20/
Disallow /~doug/2004.06.26/
Disallow /digital_camera/
Disallow /old_pages.html
Disallow /unused_link.html
Disallow /disclaimer.html
Disallow /security.html
Disallow /about_smythies.html
Disallow /poweredby.html
Disallow /*.MOV
Disallow /*.mov
Disallow /*.AVI
Disallow /*.avi
Disallow /DSCN*.htm
Disallow /lectures/
Disallow /library/
Disallow /join/
Disallow /alpineclub/
Disallow /publications/
Disallow /notices/
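
Unlike the slurp group, googlebot's file-extension rules are anchored with `$`, which in the same extension pins the pattern to the end of the URL. Note that in /index_0*$ the anchor is redundant: a trailing `*` already matches any remainder. An illustration with hypothetical paths:

    User-agent: googlebot
    Disallow: /*.jpg$    # blocks /photos/a.jpg but not /photos/a.jpg.html
    Disallow: /*.jpg     # the unanchored form would block both of the above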

msnbot

Rule Path
Disallow /*.jpg$
Disallow /*.JPG
Disallow /*.png$
Disallow /*.PDF$
Disallow /*.pdf$
Disallow /disclaimer.html
Disallow /security.html
Disallow /poweredby.html
Disallow /about_smythies.html
Disallow /unused_link.html
Disallow /old_pages.html
Disallow /index_0*
Disallow /*index_0*$
Disallow /digital_camera/
Disallow /lab/
Disallow /xmas_*
Disallow /~doug/archives/

*

Rule Path
Disallow /*.jpg
Disallow /*.JPG
Disallow /*.png
Disallow /*.PDF
Disallow /*.pdf
Disallow /disclaimer.html
Disallow /security.html
Disallow /poweredby.html
Disallow /about_smythies.html
Disallow /unused_link.html
Disallow /old_pages.html
Disallow /index_0*
Disallow /*index_0*$
Disallow /digital_camera/
Disallow /lab/
Disallow /xmas_*
Disallow /~doug/archives/

*

Rule Path
Disallow /_mm/
Disallow /_notes/
Disallow /_baks/
Disallow /MMWIP/
Disallow /*.LCK
Disallow /*.bak
Disallow /*.csi
Disallow /*.mno

googlebot

Rule Path
Disallow *.csi

*

Rule Path
Disallow /administrator/
Disallow /bin/
Disallow /cache/
Disallow /cli/
Disallow /components/
Disallow /includes/
Disallow /installation/
Disallow /language/
Disallow /layouts/
Disallow /libraries/
Disallow /logs/
Disallow /modules/
Disallow /plugins/
Disallow /tmp/
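
The scanner reports three separate `*` groups. Under RFC 9309, groups naming the same agent are combined into a single rule set, so a conforming crawler effectively sees one merged group, along the lines of:

    User-agent: *
    Disallow: /*.jpg          # from the first * group
    Disallow: /_mm/           # from the second (Dreamweaver working files)
    Disallow: /administrator/ # from the third (Joomla directories)
    # ...and so on for the remaining rules above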

Other Records

Field Value
crawl-delay 2
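
Crawl-delay is a non-standard field; the 2007.03.13 entry below notes that its units were never documented, though crawlers that honor it (Bing and Yandex, for example) read it as seconds to wait between requests. The report does not say which group carries the value, but a typical placement would be:

    User-agent: slurp
    Crawl-delay: 2    # at most one request every 2 seconds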

Comments

  • robots.txt 2013.04.25
  • disallow ip-web-crawler.com. It crawls way too fast, and while
  • it claims to obey robots.txt directives, it does not.
  • If it doesn't obey the disallow, then an iptables drop of
  • 50.31.96.6 - 50.31.96.12 could be used.
  • robots.txt 2013.04.17
  • add some disallow directives for specific file extensions.
  • Somehow I missed it before.
  • robots.txt 2013.04.04
  • disallow Sosospider. Any web crawler that is too stupid to know the
  • difference between upper and lower case is not worthy.
  • robots.txt 2013.02.28
  • disallow Exabot. I wonder if the resulting search engine
  • database is the reason I get so many forged referrer
  • hits.
  • robots.txt 2012.10.08
  • disallow WBSearchBot.
  • robots.txt 2012.09.02
  • disallow SearchmetricsBot. It is mentally challenged.
  • robots.txt 2012.05.03
  • disallow TurnitinBot. It is mentally challenged.
  • robots.txt 2012.03.29
  • disallow EC2LinkFinder. I do not know if it obeys robots.txt, but I will try.
  • For sure it ignores most robots.txt directives. It copies everything, hogging
  • bandwidth.
  • It is time to think of a generic deny, to cover all these new bots.
  • robots.txt 2012.03.13
  • disallow SWEBot. It is not polite and disobeys the robots.txt file.
  • robots.txt 2012.01.29
  • disallow aiHitBot
  • Try a user-agent "InfoPath" and "InfoPath.2" disallow. (Another MS thing.)
  • I am trying to get rid of what appears to be a tracking site.
  • 80.40.134.103, .104, .120, seem to track 92.9.131.199 and 92.9.150.29 and ...
  • 80.40.134.XXX does read the robots.txt file.
  • robots.txt 2012.01.04
  • SISTRIX crawler does not behave well. It ignores meta tags and some robots.txt directives.
  • Disallow it.
  • robots.txt 2011.12.01
  • Try to get rid of Ezooms bot, although it is not clear what its exact user agent name is.
  • (Days later: "User-agent: Ezooms" seems to work, but it takes a few days.)
  • It ignores meta tags, and has become generally annoying.
  • robots.txt 2011.09.26
  • Until now I have allowed Baiduspider. But it has gone mental and also ignores some meta tags.
  • Disallow it.
  • A new robot, AhrefsBot, does not behave or obey meta tags.
  • Disallow it.
  • robots.txt 2011.06.19
  • robots.txt 2011.04.12
  • Googlebot is so very very severely mentally challenged.
  • It ignores the NOFOLLOW meta tag.
  • Try to block useless content from being indexed via yet another
  • block command.
  • It is still looking for pages that haven't been there for over a year now.
  • (see 2010.04.29)
  • robots.txt 2010.10.14
  • Eliminate crawl delay for Yahoo slurp (see 2007.03.13)
  • robots.txt 2010.09.20
  • TwengaBot is severely mentally challenged. Try global disallow for it.
  • Googlebot is still annoying and accessing pages it shouldn't.
  • robots.txt 2010.04.29
  • Googlebot is very severely mentally challenged.
  • Add disallow directives for directories that are not even there,
  • and haven't been for over 5 weeks now.
  • This is merely to try to get around having my request to delete the
  • non-existent directories from the search database being denied.
  • robots.txt 2010.04.16
  • Add specific directives for exabot, including a crawl delay.
  • Reduce the slurp (Yahoo) crawl delay (which it doesn't seem to obey anyhow).
  • Disallow googlebot-image.
  • robots.txt 2010.04.13
  • disallow taptubot, the mobile device crawler
  • robots.txt 2010.04.01
  • Yet another attempt to get web crawlers not to index old versions of index.html files.
  • All old versions are called index_0???.html.
  • robots.txt 2010.03.19
  • Archives have been moved to a separate directory. Add a disallow directive.
  • robots.txt 2010.02.10
  • The Yandex web crawler behaves in a very strange manner. Block it.
  • Ask Robots not to copy PDF files.
  • robots.txt 2009.12.07
  • Fix some syntax based on feedback from http://tool.motoricerca.info/robots-checker.phtml
  • robots.txt 2009.12.04
  • There are still issues with googlebot. I don't want old versions of index.html
  • type pages indexed, but I do want the Photoshop Elements generated pages indexed.
  • Try some new directives.
  • robots.txt 2009.09.09
  • Googlebot is not ignoring the rebuilt directory and is obtaining .MOV videos.
  • Add some more googlebot specific directives.
  • robots.txt 2009.07.27
  • Googlebot directives are case sensitive. Add .JPG to .jpg ignore directives.
  • Googlebot is not ignoring old index pages as the global directive indicates. Try a googlebot-specific directive.
  • robots.txt 2009.04.12
  • Some robots, for example googlebot, obey global directives as well as googlebot-specific directives.
  • Other robots, for example slurp (Yahoo) and msnbot, obey only their specific directives (see the sketch after this list).
  • The robots.txt standard is rather weak, incomplete, and generally annoying.
  • Add tons of the same specific directives to each robot area.
  • Try changing the no-index rules for the Christmas pages to include a wildcard.
  • robots.txt 2008.12.03
  • ser-agent: *
  • Block the Cuil (twiceler) robot entirely.
  • robots.txt 2008.11.23
  • The majestic robot comes in bursts at a high rate. Just block it.
  • The Cuil robot comes too often. Try to slow it down.
  • robots.txt 2008.07.03
  • Now msnbot has started to grab images. Try to stop it.
  • Googlebot is grabbing PNG files. Try to stop it.
  • robots.txt 2007.11.20
  • Try to disallow the panscient.com web crawler.
  • ser-agent: *
  • robots.txt 2007.08.23
  • Search engine pages still do not agree with the contents of the robots.txt file.
  • Add specific disallow for ~doug/rebuilt.
  • - put global user agent lines after specific ones.
  • - next will be to repeat global lines in each specific agent area.
  • robots.txt 2007.05.03
  • Now Googlebot has started to grab images. Try to stop it.
  • For whatever reason, Google is mainly showing my rebuilt directory. It
  • never seems to go back to the higher level page that now has meta tags
  • telling it not to index those pages. Put in a global disallow.
  • Add some other global disallows that I got behind on.
  • robots.txt 2007.03.13
  • Stupid Yahoo slurp comes all the time now. It supports a non-standard delay command,
  • so add the command. The web site doesn't state the units of measure.
  • robots.txt 2007.02.11
  • Yahoo's slurp now seems to obey the non-standard ignore-this-file-type wildcard usage;
  • try it.
  • robots.txt 2006.12.29
  • Delete instructions for directories that don't exist anymore.
  • robots.txt 2004.12.21
  • Try to eliminate yahoo.com grabbing images.
  • Can only think of a global deny.
  • Cannot find the Yahoo name; try the one shown below.
  • robots.txt 2004.11.16
  • Try to eliminate alexa.com grabbing images.
  • InkTomi comes too often; can them entirely.
  • robots.txt 2004.07.16
  • Try to eliminate picsearch.com grabbing images.
  • robots.txt 2004.07.09
  • Try to eliminate altavista grabbing images.
  • robots.txt for www.smythies.com 2003.12.21
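
The 2009.04.12 entry above describes the inconsistency that RFC 9309 later resolved: a crawler follows only the single group whose user-agent line matches it most specifically and ignores the rest, including the global `*` group, which is why this file duplicates the global rules into each named group. A minimal sketch, with paths taken from the groups above purely for illustration:

    User-agent: googlebot
    Disallow: /lectures/    # googlebot applies only this group...

    User-agent: *
    Disallow: /library/     # ...and skips this one entirely, so shared
                            # rules must be repeated in the group above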

Warnings

  • 29 invalid lines.
  • `ser-agent` is not a known field.