gearback.com
robots.txt

Robots Exclusion Standard data for gearback.com

Resource Scan

Scan Details

Site Domain gearback.com
Base Domain gearback.com
Scan Status Ok
Last Scan 2026-02-28T21:49:14+00:00
Next Scan 2026-03-14T21:49:14+00:00

Last Scan

Scanned 2026-02-28T21:49:14+00:00
URL https://gearback.com/robots.txt
Redirect https://www.gearback.com/robots.txt
Redirect Domain www.gearback.com
Redirect Base gearback.com
Domain IPs 104.21.39.232, 172.67.171.211, 2606:4700:3034::ac43:abd3, 2606:4700:3036::6815:27e8
Redirect IPs 104.21.39.232, 172.67.171.211, 2606:4700:3034::ac43:abd3, 2606:4700:3036::6815:27e8
Response IP 104.21.39.232
Found Yes
Hash 5d6d088719ebb6999576f31679ac691fd5097d705d9002b423c49cc20b609128
SimHash b692511b67d7
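
The Hash above is 64 hex characters, which matches a SHA-256 digest. Below is a minimal sketch of reproducing the fetch and hash with the Python standard library, assuming the scanner hashes the raw response body; the User-Agent string is a placeholder, not the scanner's real one:

    import hashlib
    import urllib.request

    # Fetch the file; urllib follows the redirect to www.gearback.com
    # automatically, so geturl() should match the Redirect field above.
    req = urllib.request.Request(
        "https://gearback.com/robots.txt",
        headers={"User-Agent": "example-scanner/1.0"},  # hypothetical UA
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        print("final url:", resp.geturl())
        print("sha256:", hashlib.sha256(body).hexdigest())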

Groups

User Agent                    Rule      Path
orthogaffe                    Disallow  /
ubicrawler                    Disallow  /
doc                           Disallow  /
zao                           Disallow  /
sitecheck.internetseer.com    Disallow  /
zealbot                       Disallow  /
msiecrawler                   Disallow  /
sitesnagger                   Disallow  /
webstripper                   Disallow  /
webcopier                     Disallow  /
fetch                         Disallow  /
teleport                      Disallow  /
teleportpro                   Disallow  /
webzip                        Disallow  /
linko                         Disallow  /
httrack                       Disallow  /
microsoft.url.control         Disallow  /
xenu                          Disallow  /
larbin                        Disallow  /
libwww                        Disallow  /
zyborg                        Disallow  /
download ninja                Disallow  /
wget                          Disallow  /
grub-client                   Disallow  /
k2spider                      Disallow  /
npbot                         Disallow  /
webreaper                     Disallow  /
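
A short sketch of testing these groups with the standard library's urllib.robotparser. Since every listed user agent is disallowed from the whole site, can_fetch() should return False for each of them, while an agent matching no group falls through to the (absent) default rules and is allowed:

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://www.gearback.com/robots.txt")
    rp.read()  # fetches and parses the live file

    for agent in ("wget", "httrack", "sitesnagger"):
        # Expect False: each matches a "Disallow: /" group above.
        print(agent, rp.can_fetch(agent, "https://www.gearback.com/"))

    # Expect True: no group names this agent and there is no "*" group.
    print("SomeOtherBot", rp.can_fetch("SomeOtherBot", "https://www.gearback.com/"))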

Other Records

Field    Value
sitemap  https://www.gearback.com/sitemap.xml
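
A hedged sketch of reading the sitemap record with the standard library, assuming the file is an ordinary urlset document in the sitemaps.org 0.9 namespace (a sitemap index would use <sitemap> elements instead):

    import urllib.request
    import xml.etree.ElementTree as ET

    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    with urllib.request.urlopen("https://www.gearback.com/sitemap.xml") as resp:
        tree = ET.parse(resp)

    # Print every <loc> URL listed in the sitemap.
    for loc in tree.iterfind(".//sm:loc", NS):
        print(loc.text)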

Comments

  • robots.txt
  • This part is generated by Horuph
  • Please note: There could be a lot of pages on this site, and there are
  • some misbehaved spiders out there that go way too fast...
  • Crawlers that are kind enough to obey, but which we'd rather not have
  • unless they're feeding search engines.
  • Some bots are known to be trouble, particularly those designed to copy
  • entire sites. Please obey robots.txt.
  • Sorry, wget in its recursive mode is a frequent problem.
  • Please read the man page and use it properly; there is a
  • --wait option you can use to set the delay between hits,
  • for instance.
  • The 'grub' distributed client has been *very* poorly behaved.
  • Doesn't follow robots.txt anyway, but...
  • Hits many times per second, not acceptable
  • http://www.nameprotect.com/botinfo.html
  • A capture bot, downloads gazillions of pages with no public benefit
  • http://www.webreaper.net/
  • ----------------------------------------------------------
  • <!-- Please do not remove the space at the start of this line, it breaks the rendering. http://www.robotstxt.org/orig.html says spaces before comments are OK. --><syntaxhighlight lang="robots">
  • Localisable part of robots.txt
  • Please check any changes using a syntax validator such as http://tool.motoricerca.info/robots-checker.phtml
  • Enter http://<yourdomain>/robots.txt as the URL to check.
  • </syntaxhighlight>
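
The comments ask crawlers to rate-limit themselves, citing wget's --wait flag. A minimal polite-fetch loop in Python looks like the following; the two-second delay and the URL list are illustrative assumptions, not values taken from the site:

    import time
    import urllib.request

    URLS = [
        "https://www.gearback.com/",       # hypothetical page list
        "https://www.gearback.com/about",
    ]
    DELAY_SECONDS = 2  # analogous to wget --wait=2

    for url in URLS:
        with urllib.request.urlopen(url) as resp:
            print(url, resp.status, len(resp.read()))
        time.sleep(DELAY_SECONDS)  # pause between hits, as the comments ask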

Warnings

  • 12 invalid lines.
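
One plausible way a scanner could arrive at a count like this is to flag every non-blank, non-comment line that is not a field:value pair; the exact rules this scanner applies are an assumption. In this file, the leaked <syntaxhighlight> wiki markup quoted in the Comments section is the likely source of such lines:

    import re
    import urllib.request

    # A robots.txt line is taken as valid here if it is blank, a comment,
    # or "Field: value" -- an assumed definition, not the scanner's own.
    VALID = re.compile(r"^\s*[A-Za-z-]+\s*:")

    with urllib.request.urlopen("https://www.gearback.com/robots.txt") as resp:
        text = resp.read().decode("utf-8", errors="replace")

    invalid = [
        line for line in text.splitlines()
        if line.strip()
        and not line.lstrip().startswith("#")
        and not VALID.match(line)
    ]
    print(len(invalid), "invalid lines")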