cool-web.de
robots.txt

Robots Exclusion Standard data for cool-web.de

Resource Scan

Scan Details

Site Domain cool-web.de
Base Domain cool-web.de
Scan Status Ok
Last Scan 2024-10-24T01:57:12+00:00
Next Scan 2024-11-23T01:57:12+00:00

Last Scan

Scanned 2024-10-24T01:57:12+00:00
URL https://cool-web.de/robots.txt
Domain IPs 89.58.38.147
Response IP 89.58.38.147
Found Yes
Hash f11a56a0aeb59ebc9e77264ca5053e3841a8cad61c1b66f681ef4ac38563af91
SimHash 3a563d364be6
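
A content hash like the one above is typically used to tell whether robots.txt changed between scans. The report does not name the algorithm; the 64 hex digits are consistent with SHA-256, so the sketch below assumes SHA-256 over the raw response body. The function names are hypothetical and not part of the scanner.

```python
import hashlib


def content_fingerprint(body: bytes) -> str:
    """Return the hex SHA-256 digest of a robots.txt body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, previous_hex: str) -> bool:
    """Compare a freshly fetched body against the digest stored from the last scan."""
    return content_fingerprint(body) != previous_hex.lower()
```

The SimHash value is a separate, similarity-preserving fingerprint and is not reproduced by this sketch.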

Groups

*

Rule Path
Disallow /admin/
Disallow /cgi-bin/
Disallow /css/
Disallow /domains/
Disallow /exchange/
Disallow /files/
Disallow /forms/
Disallow /fonts/
Disallow /gc/
Disallow /images/
Disallow /img/
Disallow /intern/
Disallow /internal/
Disallow /js/
Disallow /mailsystem/
Disallow /perl/
Disallow /php/
Disallow /privat/
Disallow /private/
Disallow /profil/
Disallow /temp/
Disallow /TEMP/
Disallow /tmp/
Disallow /test/
Disallow /shops/
Disallow /veraltet/
Disallow /_veraltet/
Disallow /_*/
Disallow /tools/
Disallow /uploads/
Disallow /users/
Disallow /webmail/
Disallow /*.swf
Disallow /*.log
Disallow /*.bak
Disallow /*.sid
Disallow /*.mod
Disallow /*.mid
Disallow *BotDetectCaptcha.ashx*
Disallow /WebResource*
Disallow /%28*
Disallow *WebResource.axd*
Disallow *usersendpass.aspx*
Disallow */base.aspx
Disallow */sitemap.ashx
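
Several of the paths above use the `*` wildcard (for example `/*.swf` or `*WebResource.axd*`). Python's standard `urllib.robotparser` only does prefix matching, so the following is a minimal sketch of wildcard matching in the style of RFC 9309, where `*` matches any run of characters and a trailing `$` anchors the end of the path. The helper names and the sample paths are assumptions made for illustration.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where "*" appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, patterns: list) -> bool:
    """True if any pattern matches the path, starting at its first character."""
    return any(pattern_to_regex(p).match(path) for p in patterns)


# A few patterns copied from the "*" group above.
SAMPLE_PATTERNS = ["/admin/", "/temp/", "/_*/", "/*.swf",
                   "*BotDetectCaptcha.ashx*", "*/base.aspx"]
```

Matching is case-sensitive, which is presumably why the file lists both `/temp/` and `/TEMP/`.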

googlebot

Rule Path
Allow /css/
Disallow /js/
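
Under the Robots Exclusion Protocol a crawler follows only the most specific group that matches its name, so the googlebot group replaces the `*` group instead of extending it. The sketch below illustrates this with Python's standard `urllib.robotparser` on a shortened excerpt of the rules; the excerpt, the sample paths, and the `build_parser` helper are assumptions for illustration, and this parser does plain prefix matching without wildcard support.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the groups listed in this report.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /admin/
Disallow: /css/
Disallow: /js/

User-agent: googlebot
Allow: /css/
Disallow: /js/

User-agent: wget
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)

# googlebot is governed by its own group only.
googlebot_css = parser.can_fetch("googlebot", "/css/site.css")
googlebot_admin = parser.can_fetch("googlebot", "/admin/")
# Any crawler without its own group falls back to "*".
other_admin = parser.can_fetch("somebot", "/admin/")
```

One consequence worth noting: because the googlebot group lists only `/css/` and `/js/`, paths such as `/admin/` that are disallowed for `*` are not disallowed for googlebot in this excerpt.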

turnitinbot

Rule Path
Disallow /

slysearch

Rule Path
Disallow /

findlinks

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

pixray-seeker

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

ezooms

Rule Path
Disallow /

ahrefsbot

Rule Path
Disallow /

lb-spider

Rule Path
Disallow /

wbsearchbot

Rule Path
Disallow /

psbot

Rule Path
Disallow /

huaweisymantecspider

Rule Path
Disallow /

sistrix

Rule Path
Disallow /

ec2linkfinder

Rule Path
Disallow /

htdig

Rule Path
Disallow /

semrushbot

Rule Path
Disallow /

discobot

Rule Path
Disallow /

linkdex.com

Rule Path
Disallow /

seznambot

Rule Path
Disallow /

edisterbot

Rule Path
Disallow /

swebot

Rule Path
Disallow /

picmole

Rule Path
Disallow /

yeti

Rule Path
Disallow /

yeti-mobile

Rule Path
Disallow /

pagepeeker

Rule Path
Disallow /

catchbot

Rule Path
Disallow /

yacybot

Rule Path
Disallow /

netestate ne crawler

Rule Path
Disallow /

surveybot

Rule Path
Disallow /

comodo ssl checker

Rule Path
Disallow /

comodo-certificates-spider

Rule Path
Disallow /

gonzo

Rule Path
Disallow /

schrein

Rule Path
Disallow /

backlinkcrawler

Rule Path
Disallow /

afilias web mining tool

Rule Path
Disallow /

seokicks

Rule Path
Disallow /

seokicks-robot

Rule Path
Disallow /

suggybot

Rule Path
Disallow /

bdbrandprotect

Rule Path
Disallow /

bpimagewalker

Rule Path
Disallow /

bpimagewalker*

Rule Path
Disallow /

updownerbot

Rule Path
Disallow /

lex

Rule Path
Disallow /

content crawler

Rule Path
Disallow /

dcpbot

Rule Path
Disallow /

kaloogabot

Rule Path
Disallow /

mlbot

Rule Path
Disallow /

icjobs

Rule Path
Disallow /

obot

Rule Path
Disallow /

webmastercoffee

Rule Path
Disallow /

qualidator*

Rule Path
Disallow /

webinator

Rule Path
Disallow /

scooter

Rule Path
Disallow /

larbin

Rule Path
Disallow /

opidoobot

Rule Path
Disallow /

ips-agent

Rule Path
Disallow /

unisterbot

Rule Path
Disallow /

unister*

Rule Path
Disallow /

reverseget

Rule Path
Disallow /

wget

Rule Path
Disallow /

libwww-perl

Rule Path
Disallow /

curl

Rule Path
Disallow /

java

Rule Path
Disallow /

Comments

  • Content that should never be indexed by any bot:
  • Temporary test exception for Google so that Webmaster Tools works properly
  • AFTER TESTING, COMMENT JS OUT AGAIN, OTHERWISE POSSIBLE NEGATIVE
  • EFFECTS ON RANKING (E.G. DUE TO SPEED)
  • (there is nothing in /js/ that would need to be indexed anyway)
  • Allow: /js/
  • Allow: /images/
  • unwanted bots that do request robots.txt
  • "TurnitinBot/2.1 (http://www.turnitin.com/robot/crawlerinfo.html)"
  • "findlinks/2.1.5 (+http://wortschatz.uni-leipzig.de/findlinks/)"
  • "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"
  • http://www.majestic12.co.uk/projects/dsearch/mj12bot.php
  • http://www.80legs.com/webcrawler.html - if 008 is crawling your website, it means that one or more 80legs users created a web crawl
  • "Mozilla/5.0 (compatible; AhrefsBot/2.0; +http://ahrefs.com/robot/)"
  • "lb-spider/Mozilla/5.0 Gecko/20100101 Firefox/10.0.2 (lb-spider; http://www.linkbutler.de/spider; spider@linkbutler.de)"
  • "Mozilla/5.0 (compatible; WBSearchBot/1.1; +http://www.warebay.com/bot.html)"
  • "psbot/0.1 (+http://www.picsearch.com/bot.html)"
  • "HuaweiSymantecSpider/1.0+DSE-support@huaweisymantec.com+(compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR ; http://www.huaweisymantec.com/en/IRL/spider)"
  • "Mozilla/5.0 (compatible; SISTRIX Crawler; http://crawler.sistrix.net/)"
  • "EC2LinkFinder"
  • "http://SiteIntel.net Bot"
  • "htdig"
  • "SemrushBot/0.91" - http://de.semrush.com/? - professional software for SEO & SEM professionals?
  • "Mozilla/5.0 (compatible; discobot/2.0; +http://discoveryengine.com/discobot.html)" - we sell no wine before its time != trustworthy
  • "linkdex.com/v2.0" - SEO
  • "SeznamBot/3.0 (+http://fulltext.sblog.cz/)" - sz-SEO
  • "EdisterBot (http://www.edister.com/bot.html)"
  • "Mozilla/5.0 (compatible; SWEBot/1.0; +http://swebot-crawler.net)" - tries to reply to postings in the forum
  • from here on, still to be added to htaccess
  • "Mozilla/5.0 (compatible;picmole/1.0 +http://www.picmole.com)"
  • "Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)"
  • "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) (compatible; Yeti-Mobile/0.1; +http://help.naver.com/robots/)"
  • "PagePeeker.com (info: http://pagepeeker.com/robots)"
  • "CatchBot/1.0; +http://www.catchbot.com"
  • "yacybot (freeworld/global; amd64 Linux 3.2.1-gentoo-r2; java 1.6.0_24; Europe/de) http://yacy.net/bot.html"
  • "netEstate NE Crawler (+http://www.sengine.info/)"
  • "Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.9.0.13) Gecko/2009073022 Firefox/3.5.2 (.NET CLR 3.5.30729) SurveyBot/2.3 (DomainTools)"
  • "COMODO SSL Checker"
  • "Comodo-Certificates-Spider"
  • "gonzo2[p] (+http://www.suchen.de/faq.html)" (business search)
  • "crawler schrein, crawler@schrein.nl id-4"
  • "BacklinkCrawler (http://www.backlinktest.com/crawler.html)"
  • "Mozilla/5.0 (compatible; Afilias Web Mining Tool 1.0; +http://www.afilias.info; awmt@afilias.info)"
  • "Mozilla/5.0 (compatible; SEOkicks-Robot +http://www.seokicks.de/robot.html)"
  • "Mozilla/5.0 (compatible; suggybot v0.01a, http://blog.suggy.com/was-ist-suggy/suggy-webcrawler/)"
  • "http://www.bdbrandprotect.com" "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1)"
  • "Updownerbot (+http://www.updowner.com/bot)"
  • "lex/1.0"
  • "Content Crawler"
  • "Mozilla/5.0 (compatible; DCPbot/1.1; +http://domains.checkparams.com/)"
  • "Mozilla/5.0 (compatible; KaloogaBot; http://kalooga.com/crawler)"
  • "MLBot (www.metadatalabs.com/mlbot)"
  • "Mozilla/5.0 (X11; U; Linux i686; de; rv:1.9.0.1; compatible; iCjobs Stellenangebote Jobs; http://www.icjobs.de) Gecko/20100401 iCjobs/3.2.3"
  • "Mozilla/5.0 (compatible; oBot/2.3.1; +http://filterdb.iss.net/crawler/)"
  • "Mozilla/5.0 (compatible; WebmasterCoffee/0.7; +http://webmastercoffee.com/about)"
  • "Mozilla/5.0 (compatible; Qualidator.com Bot 1.0;)" (http://www.qualidator.com/Web/de/Support/robotstxt_Hinweise.htm)
  • "Mozilla/4.0 (compatible; http://search.thunderstone.com/texis/websearch/about.html)" (http://www.thunderstone.com/site/gw25man/page_exclusion_and_robots_txt.html)
  • "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) (larbin2.6.3@unspecified.mail)"
  • "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.24; ips-agent) Gecko/20111107 Ubuntu/10.04 (lucid) Firefox/3.6.24"
  • "Mozilla/5.0 (compatible; UnisterBot; crawler@unister.de)"
  • "Mozilla/5.0 (compatible; en-US; ReverseGet/1.0; http://reverseget.com/; robot@reverseget.com)"
  • robots run via Linux tools that do not identify themselves properly and
  • that can be blocked by tool ID in case of excessive use
  • "Wget/1.9"
  • "libwww-perl/5.837"
  • "curl/7.21.3 (amd64-portbld-freebsd7.2) libcurl/7.21.3 OpenSSL/0.9.8e zlib/1.2.3"
  • "Java/1.6.0_29"
  • to_check:
  • "\"Mozilla/5.0"
  • "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2)"
  • "Mozilla/4.0 (compatible; MSIE 6.0; MSN 2.5; Windows 98; Win 9x 4.90; FDM)"
  • watch:
  • "ssearch_bot (sSearch Crawler; http://www.semantissimo.de)"
  • "Mozilla/5.0 (compatible; Plukkie/1.4; http://www.botje.com/plukkie.htm)"
  • "Mozilla/5.0 (compatible; lemurwebcrawler admin@lemurproject.org; +http://boston.lti.cs.cmu.edu/crawler_12/)"
  • unwanted bots that do NOT request robots.txt should be blocked via rewrite rules if necessary:
  • "Mozilla/5.0+(compatible;+PiplBot;++http://www.pipl.com/bot/)" IGNORES ROBOTS.TXT
  • "Mozilla/5.0 (compatible; TweetmemeBot/2.11; +http://tweetmeme.com/)" IGNORES ROBOTS.TXT
  • okay:
  • "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
  • "Googlebot-Image/1.0"
  • "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
  • "Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)"
  • "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
  • "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
  • "Mozilla/5.0 (compatible; archive.org_bot +http://www.archive.org/details/archive.org_bot)"
  • "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)" (also associated with archive.org)
  • "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
  • "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"
  • "Mozilla/5.0 (compatible; OpenindexDeepSpider/Nutch-1.5-dev; +http://www.openindex.io/en/webmasters/spider.html)"
  • "CloudACL/Nutch-1.4"
  • "webcrawler (compatible; heritrix/1.14.4 ++http://www.onb.ac.at/about/webarchivierung.htm)"
  • "Mail.RU/2.0" (Russian search engine)
  • "Sosospider+(+http://help.soso.com/webspider.htm)" (Chinese search engine)
  • "Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/)"
  • "Eurobot/1.1 (http://eurobot.ayell.eu)"
  • "Mozilla/5.0 (compatible; MSIE or Firefox mutant; not on Windows server; +http://ws.daum.net/aboutWebSearch.html) Daumoa/2.0" (Korean search engine)
  • "Acoon v4.10.3 (www.acoon.de)"
  • "DoCoMo/2.0 P900i(c100;TB;W24H11) (compatible; ichiro/mobile goo; +http://search.goo.ne.jp/option/use/sub4/sub4-1/)" (Japanese search engine)
  • "ichiro/3.0 (http://help.goo.ne.jp/help/article/1142)"
  • "frogl-bot (Version: 1.06, powered by www.frogl.de +http://www.frogl.de/pfadzurbotseite/bot.html)"
  • "Mozilla/5.0 (compatible; NerdByNature.Bot; http://www.nerdbynature.net/bot)"
  • "Agent-SharewarePlazaBot/3.0+(+http://www.SharewarePlaza.com)" IGNORES ROBOTS.TXT
  • "Wotbox/2.0 (bot@wotbox.com; http://www.wotbox.com)" IGNORES ROBOTS.TXT
  • "www.freefileszone.com PadPollbot/1.1b (+http://www.freefileszone.com/)" IGNORES ROBOTS.TXT
  • "Mozilla/5.0 (compatible; Sitedomain-Bot 1.0; Headers only; +http://www.sitedomain.de/sitedomain-bot/)" IGNORES ROBOTS.TXT - checks for deleted domains - only requests the home page
  • "emefgebot/beta (+http://emefge.de/bot.html)" IGNORES ROBOTS.TXT
  • "TinEye/1.1 (http://tineye.com/crawler.html)" #User-agent: TinEye
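
The comments note that bots which never request robots.txt have to be blocked on the server side instead, for example by a rewrite rule. As a rough sketch of the same idea in application code, the check below does a case-insensitive substring match of the User-Agent header against a blocklist. The token list is an assumption drawn from two entries the comments flag as ignoring robots.txt, and `should_block` is a hypothetical helper, not part of the site's configuration.

```python
# Tokens taken from comment entries flagged as ignoring robots.txt (illustrative subset).
BLOCKED_TOKENS = ("piplbot", "tweetmemebot")


def should_block(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a blocked bot token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)
```

A request handler could call `should_block(...)` on the incoming header and answer with HTTP 403 when it returns True. Since User-Agent strings are trivially spoofed, this only deters bots that identify themselves honestly.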

Warnings

  • 4 invalid lines.