tifaware.com
robots.txt

Robots Exclusion Standard data for tifaware.com

Resource Scan

Scan Details

Site Domain tifaware.com
Base Domain tifaware.com
Scan Status Ok
Last Scan 2024-07-01T14:54:32+00:00
Next Scan 2024-07-15T14:54:32+00:00

Last Scan

Scanned 2024-07-01T14:54:32+00:00
URL https://tifaware.com/robots.txt
Domain IPs 2001:4b98:dc0:41:216:3eff:fef0:ddd2, 92.243.1.63
Response IP 92.243.1.63
Found Yes
Hash c2b2f4e8a80d669d3ce2886ff53c2c7b459a10fe758519b4517c105948d832a6
SimHash 731a89428976
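
The Hash above is 64 hex characters, consistent with a SHA-256 digest of the fetched file, while the SimHash is presumably a similarity fingerprint used to spot near-duplicate revisions between scans. Below is a minimal sketch of reproducing the content hash, assuming the Hash field really is a plain SHA-256 hex digest of the raw response body; note the live file may have changed since this scan.

    import hashlib
    import urllib.request

    # Fetch robots.txt exactly as served and hash the raw bytes.
    # Assumption: the scan's "Hash" field is a SHA-256 hex digest.
    with urllib.request.urlopen("https://tifaware.com/robots.txt") as resp:
        body = resp.read()

    print(hashlib.sha256(body).hexdigest())
    # Should print c2b2f4e8a80d669d3ce2886ff53c2c7b459a10fe758519b4517c105948d832a6
    # if the file is unchanged since the 2024-07-01 scan.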

Groups

appie
aspiegelbot
barkrowler
bingbot
boitho.com-dc
clarabot
dataforseobot
fast
gaisbot
galaxybot
gigabot
googlebot
gowikibot
linespider
linguee
mercator
mj12bot
mogimogi
mojeekbot
mozdex
msnbot
ng
nutch
obot
pompos
quepasacreep
safednsbot
seekportbot
seznambot
scooter
slurp
vias
voilabot
vuhuvbot
wellknownbot
yacybot
yandexbot
yeti
zaldamosearchbot
zao
zeno

Rule Path
Disallow /administrate
Disallow /cgi-bin
Disallow /hidden
Disallow /icons
Disallow /nogo
Disallow /zips
Disallow /~theall/bookmarks
Disallow /~theall/wedding

applebot

Rule Path
Disallow /cgi-bin
Disallow /hidden
Disallow /icons
Disallow /nogo
Disallow /zips
Disallow /~theall/bookmarks
Disallow /~theall/wedding

ccbot

Rule Path
Disallow /cgi-bin
Disallow /hidden
Disallow /icons
Disallow /nogo
Disallow /zips
Disallow /~theall/bookmarks
Disallow /~theall/wedding

http://www.almaden.ibm.com/cs/crawler

Rule Path
Disallow /cgi-bin
Disallow /hidden
Disallow /icons
Disallow /nogo
Disallow /zips
Disallow /~theall/bookmarks
Disallow /~theall/wedding

idg/eu

Rule Path
Disallow /cgi-bin
Disallow /hidden
Disallow /icons
Disallow /nogo
Disallow /zips
Disallow /~theall/bookmarks
Disallow /~theall/wedding

ia_archiver

Rule Path
Disallow /cgi-bin
Disallow /hidden
Disallow /icons
Disallow /nogo
Disallow /zips
Disallow /~theall/bookmarks
Disallow /~theall/wedding

linkwalker

Rule Path
Disallow /cgi-bin
Disallow /hidden
Disallow /icons
Disallow /nogo
Disallow /zips
Disallow /~theall/bookmarks
Disallow /~theall/wedding

steeler

Rule Path
Disallow /cgi-bin
Disallow /hidden
Disallow /icons
Disallow /nogo
Disallow /zips
Disallow /~theall/bookmarks
Disallow /~theall/wedding

*

Rule Path
Disallow /
Disallow /nogo
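
In the underlying robots.txt, each group above is a stanza of one or more User-agent lines followed by the Disallow rules listed under it, and the final * group catches every agent not named elsewhere, shutting it out of the whole site. A quick way to see how these rules resolve for a given agent is Python's standard urllib.robotparser; this is only an illustrative sketch (the agent names and paths are examples, and the live file may differ from this scan).

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://tifaware.com/robots.txt")
    rp.read()

    # googlebot sits in the large permissive group, so only the
    # paths listed for that group are off limits.
    print(rp.can_fetch("googlebot", "/"))         # expected: True
    print(rp.can_fetch("googlebot", "/hidden/"))  # expected: False

    # An agent named nowhere in the file falls through to "*",
    # which disallows everything.
    print(rp.can_fetch("SomeRandomBot", "/"))     # expected: False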

Comments

  • See <http://www.robotstxt.org/wc/exclusion.html#robotstxt> for detailed info on excluding robots from a site.
  • See <http://www.searchengineworld.com/cgi-bin/robotcheck.cgi> for a way to validate the contents of this file.
  • updated: 2023-06-03, George A. Theall
  • Selected search engine 'bots get pretty much free rein.
  • nb:
  • appie => Walhello, http://www.walhello.com/
  • AspiegelBot => Huawei search engine
  • Barkrowler => Babbar.Tech / Exensa, https://babbar.tech/crawler
  • bingbot => Bing, http://www.bing.com/
  • boitho.com-dc => Boitho, http://www.boitho.com/, Norwegian search engine
  • Clarabot => http://www.clarabot.info/bots (domain not found) and https://hu.wikipedia.org/wiki/Clarabot, Hungarian search engine
  • DataForSeoBot => https://dataforseo.com/dataforseo-bot, builds / maintains a database of backlinks.
  • fast => Fastsearch (used by alltheweb.com)
  • gaisbot => Gais, http://gais.cs.ccu.edu.tw/, Taiwanese search engine
  • GalaxyBot => Galaxy, http://www.galaxy.com/
  • Gigabot => Gigablast, https://www.gigablast.com/
  • Googlebot => Google
  • Gowikibot => Gowiki, https://www.gowiki.com/
  • Linespider => https://help2.line.me/linesearchbot/web/?contentId=50006055&lang=en, Japanese search engine.
  • Linguee Bot => Linguee, https://www.linguee.com/bot, a multilingual text search engine.
  • Mercator + Scooter => AltaVista
  • Mj12bot => Majestic-12, http://www.majestic12.co.uk/projects/dsearch/mj12bot.php, a distributed search engine.
  • mogimogi => http://www.goo.ne.jp/, Japanese search engine.
  • MojeekBot => https://www.mojeek.com/bot.html
  • mozDex => http://www.mozdex.com/, an open source search engine
  • msnbot => MSN Search.
  • NG => Exalead, http://www.exalead.com/, French search engine
  • Nutch => http://www.nutch.org/, open-source search engine
  • oBot => https://www.xforce-security.com/crawler/, Content Security Division of IBM Germany Research & Development
  • Pompos => dir.com, http://dir.com, French search engine
  • QuepasaCreep => quepasa.com, Latin American portal / search engine
  • SafeDNSBot => Safe DNS, https://www.safedns.com/en/searchbot/
  • SeekportBot => https://bot.seekport.com/, Seekport search
  • SeznamBot => https://napoveda.seznam.cz/en/seznamcz-web-search/, Czech search engine
  • Slurp => Inktomi (includes MSN Search and HotBot)
  • VIAS => http://vias.ncsa.uiuc.edu/viasarchivinginformation.html
  • VoilaBot => http://www.voila.com (French search engine)
  • vuhuvBot => http://vuhuv.com/bot.html (Turkish search engine)
  • WellKnownBot => https://well-known.dev/, scans for selected .well-known resources.
  • yacybot => https://yacy.net/bot.html (Decentralized web search)
  • YandexBot => https://yandex.com/support/webmaster/robot-workings/robot.html, Yandex search
  • Yeti => http://naver.me/spd (NAVER search engine)
  • ZaldamoSearchBot => https://www.zaldamo.com/search.html
  • Zao => Kototai, http://www.kototai.org/, Japanese search engine research project
  • Zeno => Internet Archive (even though it doesn't support robots.txt)
  • ZyBorg => WiseNut, http://www.wisenut.com/, and Looksmart
  • NB: starting in January 2005, LookSmart seems to have switched from WiseNut to grub for its crawler. The latter doesn't bother requesting robots.txt and doesn't seem to understand 403 response codes. So should WiseNut ever come back, screw 'em.
  • User-agent: Zyborg
  • Other 'bots that I'm ok with.
  • o Applebot (https://support.apple.com/en-us/HT204683)
  • nb: while it retrieves robots.txt, it has not respected the rules in it, at least when it was not explicitly listed in the file.
  • o CCBot, https://commoncrawl.org/big-picture/frequently-asked-questions/
  • o IBM Almaden Research Center.
  • o IDG/EU => http://spaziodati.eu/, a European company building a knowledge graph.
  • o The Internet Archive, http://www.archive.org/.
  • o LinkWalker, http://www.seventwentyfour.com/, for checking links.
  • o A research project from Kitsuregawa Laboratory, The University of Tokyo.
  • All robots are excluded by default. Please direct requests to allow access to webmaster@tifaware.com.
  • 'bots I know about but don't want to bother with
  • o 3w24bot, https://3w24.com/addYourSite
  • Appears to be for a search engine, although the main page on the server only talks about the crawler itself.
  • o Adsbot, https://seostar.co/robot/
  • "Seostar collects link data from the web and shares it with thousands of digital marketers." I'll pass.
  • o AhrefsBot, https://ahrefs.com/robot/
  • Quoting from the description of their bot, "Link data collected by Ahrefs Bot from the web is used by thousands of digital marketers around the world to plan, execute, and monitor their online marketing campaigns." Count me out.
  • o arquivo-web-crawler, http://arquivo.pt
  • Similar to the Internet Archive, but focused on the Portuguese web. Although it more or less respects robots.txt, I don't think the sites I host fit the bot's coverage area.
  • o BLEXBot, http://webmeup-crawler.com/
  • "BLEXBot assists internet marketers to get information on the link structure of sites and their interlinking on the web, to avoid any technical and possible legal issues and improve overall online experience." Count me out.
  • o BuiltWith (aka BW), https://builtwith.com/biup (bit.ly/2W6Px8S)
  • Tracks technology used by web sites.
  • o CheckMarkNetwork, http://www.checkmarknetwork.com/spider.html/
  • Used by CheckMark, which describes itself as [offering] "Complete Brand Protection".
  • o DF Bot
  • I have not yet found any info about it.
  • o DomainStatsBot, https://domainstats.com/pages/our-bot
  • Used for marketing SEO services.
  • o DomCopBot, https://www.domcop.com/bot
  • Used by DomCop, an expired domain search tool.
  • o DotBot, http://www.opensiteexplorer.org/dotbot
  • I would be ok with this if it wouldn't seemingly invent URLs on my site that don't exist; eg, /perl/describe-openvas-plugins and /perl/update-openvas-plugins
  • o evc-batch
  • Operated by eVenture Capital Partners and reportedly scans for ads.txt (https://en.wikipedia.org/wiki/Ads.txt). I have no interest in supporting advertising here.
  • o filibot, https://filibot.com/
  • Used to analyze "SEO signals" of a site.
  • o Girafabot
  • Used by girafa.com to visualize search results. I'd be ok with this if only they'd respect robots.txt.
  • o grub-client, http://grub.org/html/documents.php?op=robots-faq
  • Distributed crawler for the grub search engine. I'd be ok with this if only they'd respect robots.txt.
  • o IonCrawl, https://www.ionos.de/terms-gtc/faq-crawler-en/
  • Although it seems to respect restrictions in robots.txt, its purpose ("improve and expand our [Ionos'] world-class hosting services") isn't one that interests me.
  • o ips-agent
  • Reportedly operated by Verisign to produce periodic reports on expiring domains and their associated web traffic.
  • o The Knowledge AI
  • While it seems to respect restrictions in robots.txt, I haven't turned up any authoritative info about it, and what info there is suggests it doesn't support https (eg, https://www.webmasterworld.com/search_engine_spiders/4983886.htm).
  • o lachesis, ftp://ftp.imag.fr/pub/labo-LSR/DRAKKAR/internet-performance/lachesis/
  • Supposedly an Intel tool for measuring ISP latency, although after examining it I think it's mis-identified.
  • o larbin, http://larbin.sourceforge.net/index-eng.html
  • Multi-purpose web crawler.
  • o MauiBot
  • While it seems to respect restrictions in robots.txt, I haven't turned up any authoritative info about it, such as what it's for.
  • o Mb2345Browser
  • Browser used by Chinese web directory 2345.com according to <http://john.cuppi.net/blocking-aggressive-chinese-mobile-browser-bots/>. It seems to respect robots.txt, at least from what I've observed here.
  • o MixnodeCache, https://www.mixnode.com/
  • Used to scan the web and store the results in a database. I'd be happy to support this if it were available to some extent at no cost.
  • o Mozilla/4.0 (efp@gmx.net)
  • Spammer tool to scrape email addresses.
  • o MTRobot, https://metrics-tools.de/robot.html
  • Used for SEO analysis.
  • o netEstate NE Crawler, http://www.website-datenbank.de/
  • Some sites consider this crawler malicious and badly behaved, so for now it's blocked.
  • o NetpeakCheckerBot, https://netpeaksoftware.com/checker
  • Yet another bot used for marketing.
  • o NPBot, http://www.nameprotect.com/botinfo.html
  • Used by NameProtect to scan for brand / IP violations.
  • o PagePeeker, https://pagepeeker.com/robots/
  • Used for a "website thumbnailing service", whatever that means.
  • o PageThing.com, https://www.specialnoise.com/about/labs/pagething/
  • Seems to respect robots.txt, but the specialnoise.com page doesn't really explain its purpose.
  • o Pandalytics/1.0, https://domainsbot.com/pandalytics/
  • While it seems to respect restrictions in robots.txt, it is operated by a company that studies the market for domain names, which I have no interest in supporting.
  • o PetalBot, https://aspiegel.com/petalbot, Huawei search engine
  • I would be ok with this if it wouldn't seemingly invent URLs on my site that don't exist; eg, /perl/describe-openvas-plugins and /perl/update-openvas-plugins
  • o Prlog, https://prlog.ru/
  • Seems to be operated by a Russian SEO company for analysing sites.
  • o Psbot, http://www.picsearch.com/bot.html
  • Used by Picsearch to index pictures. I don't really have any pictures here that I want indexed.
  • o Screaming Frog SEO Spider, https://www.screamingfrog.co.uk/seo-spider/
  • Free / commercial software for crawling a site, primarily for SEO. Seems to respect robots.txt, albeit with requests for the top-level root document.
  • o Seekport Crawler, http://seekport.com/
  • Maintained by SISTRIX, which focuses on digital marketing.
  • o SemrushBot*, https://www.semrush.com/bot/
  • Used by SEMrush primarily for marketing.
  • o serpstatbot, https://serpstatbot.com/
  • Used by Serpstat for "planning and monitoring marketing campaigns." It claims to respect robots.txt and, from what I've seen so far, it does.
  • o tchelebi, https://tchelebi.io/
  • Used by Black Kite (formerly NormShield) to perform Internet-wide scanning. While it does not request robots.txt, it claims to be non-intrusive and has so far only requested top-level pages here.
  • o Teoma
  • Used by the AskJeeves search engine. I'd be ok with it if only it would respect exclusions in robots.txt.
  • o TurnitinBot, http://www.turnitin.com/robot/crawlerinfo.html
  • Used by Turnitin.com to prevent plagiarism.
  • o Vagabondo, https://www.wise-guys.nl/
  • Requests robots.txt but does not respect the exclusions in it.
  • o webtechbot, https://www.webtechsurvey.com/bot
  • "Collects web technology information detected on the websites."
  • o ZoominfoBot, https://www.zoominfo.com/
  • Used for B2B marketing.