4cheaters.de
robots.txt

Robots Exclusion Standard data for 4cheaters.de

Resource Scan

Scan Details

Site Domain 4cheaters.de
Base Domain 4cheaters.de
Scan Status Ok
Last Scan 2025-09-24T11:35:06+00:00
Next Scan 2025-10-01T11:35:06+00:00

Last Scan

Scanned 2025-09-24T11:35:06+00:00
URL https://www.4cheaters.de/robots.txt
Domain IPs 159.69.107.133
Response IP 159.69.107.133
Found Yes
Hash 3b7efc2088599aefa317d0899cc3bafe8d68e8a16ed25aa31edd486d75b03332
SimHash 82d26359cac7
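
The Hash field is 64 hex digits, consistent with a SHA-256 digest of the fetched file, while the SimHash is a locality-sensitive fingerprint useful for spotting near-duplicate revisions between scans. A minimal sketch (assuming the Hash really is the SHA-256 of the raw response body, which the report does not state explicitly) for checking a freshly fetched copy against the recorded digest:

```python
import hashlib
import urllib.request

# Fetch the live robots.txt and hash the raw bytes of the body.
with urllib.request.urlopen("https://www.4cheaters.de/robots.txt") as resp:
    body = resp.read()

digest = hashlib.sha256(body).hexdigest()
print(digest)
# True only if the file is unchanged since the scan recorded above:
print(digest == "3b7efc2088599aefa317d0899cc3bafe8d68e8a16ed25aa31edd486d75b03332")
```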

Groups

*

Rule Path
Disallow /cgi-bin
Disallow /wp-admin/
Disallow /wp-includes/
Disallow /wp-
Disallow /trackback/
Disallow /linkex
Disallow /admin
Disallow /amazon
Disallow /banner
Disallow /i
Disallow /sys
Disallow /modules
Disallow /cdn
Disallow /js
Disallow /libs
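
Rules in this group are plain Robots Exclusion Protocol prefix matches, so Disallow /wp- also covers /wp-content/, and the short Disallow /i covers anything starting with /i, such as /images/. A minimal sketch that reconstructs the * group above as raw robots.txt text and checks it with Python's standard-library parser (the agent name ExampleBot and the test paths are hypothetical, not from the scan):

```python
import urllib.robotparser

# The "*" group above, reconstructed in raw robots.txt syntax.
rules = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /trackback/
Disallow: /linkex
Disallow: /admin
Disallow: /amazon
Disallow: /banner
Disallow: /i
Disallow: /sys
Disallow: /modules
Disallow: /cdn
Disallow: /js
Disallow: /libs
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Prefix semantics: /wp- also blocks /wp-content/, /i also blocks /images/.
for path in ("/wp-content/uploads/x.png", "/images/logo.png", "/news/"):
    print(path, "->", rp.can_fetch("ExampleBot", path))
# -> False, False, True
```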

googlebot

Rule Path
Disallow /*.js$
Disallow /*.inc$
Disallow /*.css$
Disallow /*.gz$
Disallow /*.wmv$
Disallow /*.cgi$
Disallow /*.xhtml$
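
Unlike the * group, the googlebot group relies on Googlebot's pattern extensions: * matches any run of characters and a trailing $ anchors the match at the end of the URL path. Python's urllib.robotparser does not implement these wildcards, so here is a minimal sketch of that matching done directly with re (the helper names are ours, not part of any library):

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    # Escape regex metacharacters, then restore the two robots.txt wildcards.
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"   # trailing $ anchors the end of the path
    return re.compile("^" + pattern)

disallows = ["/*.js$", "/*.inc$", "/*.css$", "/*.gz$",
             "/*.wmv$", "/*.cgi$", "/*.xhtml$"]

def blocked_for_googlebot(path: str) -> bool:
    return any(rule_to_regex(r).match(path) for r in disallows)

print(blocked_for_googlebot("/static/app.js"))      # True: matches /*.js$
print(blocked_for_googlebot("/static/app.js?v=1"))  # False: "$" requires the path to end in .js
```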

fasterfox

Rule Path
Disallow /

ubicrawler

Rule Path
Disallow /

doc

Rule Path
Disallow /

zao

Rule Path
Disallow /

dotbot

Rule Path
Disallow /

discobot

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

voilabot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

yanga

Rule Path
Disallow /

speedy

Rule Path
Disallow /

duggmirror

Rule Path
Disallow /

similarpages

Rule Path
Disallow /

nutch

Rule Path
Disallow /

sitecheck.internetseer.com

Rule Path
Disallow /

zealbot

Rule Path
Disallow /

msiecrawler

Rule Path
Disallow /

sitesnagger

Rule Path
Disallow /

webstripper

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

fetch

Rule Path
Disallow /

offline explorer

Rule Path
Disallow /

teleport

Rule Path
Disallow /

teleportpro

Rule Path
Disallow /

webzip

Rule Path
Disallow /

linko

Rule Path
Disallow /

httrack

Rule Path
Disallow /

microsoft.url.control

Rule Path
Disallow /

xenu

Rule Path
Disallow /

larbin

Rule Path
Disallow /

libwww

Rule Path
Disallow /

zyborg

Rule Path
Disallow /

download ninja

Rule Path
Disallow /

psbot

Rule Path
Disallow /

python-urllib

Rule Path
Disallow /

lwp-trivial/1.34

Rule Path
Disallow /

lwp-trivial

Rule Path
Disallow /

lwp-request

Rule Path
Disallow /

cazoodle

Rule Path
Disallow /

wget

Rule Path
Disallow /

grub-client

Rule Path
Disallow /

k2spider

Rule Path
Disallow /

npbot

Rule Path
Disallow /

webreaper

Rule Path
Disallow /

sbider

Rule Path
Disallow /

asterias

Rule Path
Disallow /

tecomac-crawler

Rule Path
Disallow /

custo

Rule Path
Disallow /

ichiro

Rule Path
Disallow /

sensis web crawler

Rule Path
Disallow /

irlbot

Rule Path
Disallow /

exabot

Rule Path
Disallow /

cfnetwork

Rule Path
Disallow /

tencenttraveler

Rule Path
Disallow /

gaisbot

Rule Path
Disallow /

sosospider

Rule Path
Disallow /

nutch

Rule Path
Disallow /

caretbyte

Rule Path
Disallow /

dblbot

Rule Path
Disallow /

charlotte

Rule Path
Disallow /

thunderstone

Rule Path
Disallow /

catchbot

Rule Path
Disallow /

sogou

Rule Path
Disallow /

scoutjet

Rule Path
Disallow /

yodaobot

Rule Path
Disallow /

naverbot

Rule Path
Disallow /

yeti

Rule Path
Disallow /

feedsky

Rule Path
Disallow /

botonparade

Rule Path
Disallow /

tagoobot

Rule Path
Disallow /

aihitbot

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

mlbot

Rule Path
Disallow /

linguee

Rule Path
Disallow /

biwec

Rule Path
Disallow /

blexbot

Rule Path
Disallow /
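
Every group from fasterfox down to blexbot above is a blanket Disallow / aimed at one named crawler. Which group applies to a given crawler is decided by its product token (the part of the User-Agent string before the first /), matched case-insensitively; CPython's parser matches group names as case-insensitive substrings of that token. A minimal sketch with a trimmed reconstruction of two of the groups above:

```python
import urllib.robotparser

# Two of the blanket-disallow groups above plus a "*" fallback,
# reconstructed (trimmed) as raw robots.txt text.
rules = """\
User-agent: mj12bot
Disallow: /

User-agent: wget
Disallow: /

User-agent: *
Disallow: /admin
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("MJ12bot/1.4", "/news/"))       # False: mj12bot group
print(rp.can_fetch("Wget/1.21", "/news/"))         # False: wget group
print(rp.can_fetch("SomeOtherBot/1.0", "/news/"))  # True: falls back to *
```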

Comments

  • robots.txt
  • http://www.4cheaters.de/
  • Sitemap: /sitemap.xml
  • Prefetching
  • Crawlers that are kind enough to obey, but which we'd rather not have
  • unless they're feeding search engines.
  • We are just a few Seattle based guys trying to figure out how to make internet data as open as possible.
  • http://www.dotnetdotcom.org/
  • Some bots are known to be trouble, particularly those designed to copy
  • entire sites. Please obey robots.txt.
  • Sorry, wget in its recursive mode is a frequent problem.
  • Please read the man page and use it properly; there is a
  • --wait option you can use to set the delay between hits,
  • for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/
  • Don't know these, but don't want them...
  • http://www.exabot.com/go/robot
  • 2009-06-03
  • http://gais.cs.ccu.edu.tw/robot.php
  • http://help.soso.com/webspider.htm
  • http://lucene.apache.org/nutch/bot.html
  • http://CaretByte.com
  • http://www.dontbuylists.com/
  • http://www.dontbuylists.com/DBLBot_Explained.pdf
  • Kalooga is an internet search engine for photo albums and image galleries
  • http://www.kalooga.com/info.html?page=crawler
  • Mozilla/5.0 (compatible; KaloogaBot; http://www.kalooga.com/info.html?page=crawler)
  • User-agent: kalooga
  • Allow: /
  • Charlotte is a spider that is indexing the web for sites to include in its search engine index.
  • http://www.searchme.com/support/spider/
  • This is one of several experimental search engines produced by Thunderstone's R&D group whose mission is to advance our overall technology leadership.
  • http://search.thunderstone.com/texis/websearch/about.html
  • http://help.live.com/help.aspx?project=wl_webmasters
  • User-agent: MSNbot
  • Allow: /
  • CatchBot investigates websites for publicly available information about companies, such as a company’s name, address, telephone number and keyword data about a company’s products and services.
  • http://www.catchbot.com
  • Startup based out of San Francisco called Topsy Labs. Building something cool.
  • http://labs.topsy.com/butterfly.html
  • User-agent: butterfly
  • Allow: /
  • http://www.sogou.com/docs/help/webmasters.htm#07
  • Allow only specific directories
  • http://www.scoutjet.com/
  • http://www.youdao.com/help/webmaster/robot/003/
  • http://help.naver.com/robots/
  • http://www.feedsky.com
  • 2009-08-03
  • http://www.bots-on-para.de/bot.html
  • Tagoobot/3.0; +http://www.tagoo.ru
  • 2009-12-02
  • aiHitBot-DS/1.0; +http://www.aihit.com/
  • 2009-12-17
  • Mozilla/5.0 (compatible; 008/0.83; http://www.80legs.com/spider.html;) Gecko/2008032620
  • 2010-01-04
  • TurnitinBot - http://www.turnitin.com/robot/crawlerinfo.html
  • TurnitinBot/2.1 (http://www.turnitin.com/robot/crawlerinfo.html)
  • MLBot (www.metadatalabs.com/mlbot)
  • 2010-02-02
  • Linguee Bot (http://www.linguee.com/bot; bot@linguee.com)
  • 2010-04-23
  • BiWeC: Big Web Corpus http://nlp.fi.muni.cz/projekty/biwec/
  • 2013-07-07
  • BLEXBot/1.0; +http://webmeup.com/crawler.html

Warnings

  • 2 invalid lines.