rebrickable.com
robots.txt

Robots Exclusion Standard data for rebrickable.com

Resource Scan

Scan Details

Site Domain rebrickable.com
Base Domain rebrickable.com
Scan Status Ok
Last Scan 2024-06-06T22:23:49+00:00
Next Scan 2024-06-13T22:23:49+00:00

Last Scan

Scanned 2024-06-06T22:23:49+00:00
URL https://rebrickable.com/robots.txt
Domain IPs 104.26.12.40, 104.26.13.40, 172.67.69.201, 2606:4700:20::681a:c28, 2606:4700:20::681a:d28, 2606:4700:20::ac43:45c9
Response IP 104.26.12.40
Found Yes
Hash cfabe910fd4bd3b1b14c3b21ff43368a3ee6e82adebbbea1513de16fbdd2a087
SimHash c02dbb198111

Groups

googlebot

Rule Path Comment
Allow /mocs/?* google ranked #1 for star wars, but other bots abusing it
Disallow /api/v3/lego/* -
Disallow /api/v3/lego/*/* -
Disallow /api/v3/lego/*/*/* -
Disallow /api/v3/lego/*/*/*/* -
Disallow /api/v3/users/* -
Disallow /mocs/?format=* -
Disallow /users/*/*/*/parts/?format=* -
Disallow /mocs/?tag=* -
Disallow /mocs/?include_accessory=* -
Disallow /sets/alternates/?* -
Disallow /inventory/*/ -
Disallow /inventory/*/parts/ -
Disallow /inventory/*/parts/? -
Disallow /sets/*/alts/ -
Disallow /mocs/*/designermocs/ -
Disallow /mocs/*/relatedpremium/ -
Disallow /mocs/*/viewer/* -
Disallow /mocs/*/coupons/apply/ sigh
Disallow /changes/?* -
Disallow /parts/*/*/*/?* paging through part/color sets/mocs
Disallow /users/*/likedmocs/* -
Disallow /parts/bricklink/* -
Disallow /users/*/allparts/* -
Disallow /users/*/setlists/* -
Disallow /users/*/partlists/* -
Disallow /users/*/lostparts/* -
Disallow /users/*/minifigs/* -
Disallow /users/*/mocs/?* -
Disallow /users/*/mocs/photos/?* -
Disallow /users/*/lists/*/?* -
Disallow /users/*/mocs/purchases/download/*/*/*/ -
Disallow /users/*/mocs/purchases/download/*/*/*/* -
Disallow /external/* -
Disallow /external/*/* -
Disallow /login/?next= -
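
The googlebot group mixes one broad Allow (/mocs/?*) with narrower Disallow rules under the same prefix. The sketch below shows one way such conflicts could be resolved, assuming the precedence described in Google's robots.txt documentation: the longest matching pattern wins, and Allow wins a tie. The helper names (pattern_to_regex, is_allowed) and the GOOGLEBOT_RULES subset are illustrative and not part of the scan.

```python
import re

# Illustrative subset of the googlebot group listed above.
GOOGLEBOT_RULES = [
    ("allow", "/mocs/?*"),
    ("disallow", "/mocs/?format=*"),
    ("disallow", "/mocs/?tag=*"),
    ("disallow", "/inventory/*/"),
    ("disallow", "/login/?next="),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters, a trailing '$' anchors the end of
    the URL, and every other character (including '?') is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    Assumed precedence: longest matching pattern wins; on equal length,
    allow beats disallow. A path that matches no rule is allowed.
    """
    best = None  # (pattern length, is_allow)
    for verb, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), verb == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]
```

Under these assumptions, /mocs/?page=2 is allowed because only the Allow rule matches it, while /mocs/?tag=star-wars is blocked because the Disallow pattern /mocs/?tag=* is longer than /mocs/?*.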

*

Rule Path Comment
Disallow */slow/ -
Disallow /api/v3/lego/* -
Disallow /api/v3/users/* -
Disallow /build/* -
Disallow /build/set/* -
Disallow /users/*/comments/ -
Disallow *.pdf -
Disallow /instructions/*/*/download/?* set instruction files
Disallow /users/*/likedmocs/* -
Disallow /parts/bricklink/* -
Disallow /users/*/allparts/* -
Disallow /users/*/setlists/* -
Disallow /users/*/partlists/* -
Disallow /users/*/lostparts/* -
Disallow /users/*/minifigs/* -
Disallow /users/*/mocs/?* -
Disallow /users/*/mocs/photos/?* -
Disallow /users/*/lists/*/?* -
Disallow /users/*/mocs/purchases/download/*/*/*/ -
Disallow /users/*/mocs/purchases/download/*/*/*/* -
Disallow /search/*? -
Disallow /search/?* -
Disallow /search?* -
Disallow /parts/*? -
Disallow /parts/?* -
Disallow /users/*? -
Disallow /changes/?* -
Disallow /compare/*? -
Disallow /sets/compare/?* -
Disallow /sets/*? -
Disallow /sets/?* -
Disallow /sets/alternates/?* -
Disallow /mocs/?page=* -
Disallow /mocs/?format=* -
Disallow /users/*/*/*/parts/?format=* -
Disallow /mocs/?tag=* -
Disallow /mocs/?* dont want non-google bots hitting random search param combinations
Disallow /inventory/*/ -
Disallow /inventory/*/parts/ -
Disallow /inventory/*/parts/? -
Disallow /sets/*/alts/ -
Disallow /mocs/*/designermocs/ -
Disallow /mocs/*/relatedpremium/ -
Disallow /mocs/*/viewer/* -
Disallow /mocs/*/coupons/apply/ sigh
Disallow /login/* -
Disallow /external/* -
Disallow /external/*/* -
Disallow /login/?next= -
Disallow /mocs/*/*/*/? -
Disallow /stores/search/sets/slow/ -
Disallow /stores/search/parts/slow/ -
Disallow /stores/search/parts/single/slow/ -
Disallow /oldforum/* -
Disallow /parts/*/*/*/ -

Other Records

Field Value
crawl-delay 5
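
A crawler that falls under the * group would be expected to wait 5 seconds between requests. As a rough sketch, Python's standard urllib.robotparser can read that value. The excerpt below is a reconstruction rather than the raw file: the wildcards are trimmed because urllib.robotparser only does plain prefix matching, and "examplebot" is a hypothetical crawler name.

```python
from urllib.robotparser import RobotFileParser

# Simplified reconstruction of part of the '*' group (wildcards trimmed,
# since urllib.robotparser treats paths as plain prefixes).
ROBOTS_EXCERPT = """\
User-agent: *
Crawl-delay: 5
Disallow: /build/
Disallow: /oldforum/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

# "examplebot" is a made-up user agent that matches no named group,
# so it falls through to the '*' entry.
delay = parser.crawl_delay("examplebot")
build_ok = parser.can_fetch("examplebot", "/build/set/123/")
home_ok = parser.can_fetch("examplebot", "/")
```

A polite fetch loop could then call time.sleep(delay) between requests whenever delay is not None.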

yandex
mj12bot
ccbot
megaindex
megaindex.ru/2.0
megaindex.ru/
mozilla/5.0 (compatible; megaindex.ru/2.0; +http://megaindex.com/crawler)
admantx
peer39_crawler
grapeshot
proximic
hyscore
netseer
mauibot
ahrefsbot
sogou spider
blexbot
semrushbot
gumgum-bot
petalbot
weborama-fetcher
demandbasepublisheranalyzer
criteobot/0.1
qwantify
dataforseobot

Rule Path
Disallow /

Comments

  • https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt
  • bots only look at a single section, so need to duplicate rules
  • Prevent stupid bots from scanning every damn combination of form parameters
  • Ajax requests don't want appearing in search results
  • Annoying bots that ignore some of the above rules
  • Don't index oldforum
  • Don't drill down into part-color pages, too many of them (allow google?)
  • Yandex is brutal and provides very little traffic
  • Seems to ignore rules or ignore 499 responses
  • Crawlers I don't need and just wasting resources
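
The comment that bots only look at a single section explains why so many rules appear in both the googlebot and * groups: a crawler picks the one group that best matches its user agent and ignores the rest. A minimal sketch of that selection follows, assuming case-insensitive substring matching on the group name with the longest match preferred. Real crawlers may match product tokens differently, and select_group and the GROUPS mapping are hypothetical.

```python
# Hypothetical mapping from group name to a short description of its rules.
GROUPS = {
    "googlebot": "googlebot rules",
    "yandex": "Disallow /",
    "ahrefsbot": "Disallow /",
    "*": "default rules, crawl-delay 5",
}


def select_group(groups, user_agent):
    """Pick the single group a crawler would obey.

    Assumption: a named group applies when its name occurs in the
    lower-cased user agent string; the longest such name wins, and
    '*' is used only when no named group matches.
    """
    ua = user_agent.lower()
    matches = [name for name in groups if name != "*" and name in ua]
    if matches:
        return max(matches, key=len)
    return "*" if "*" in groups else None
```

Because Googlebot selects only the googlebot group, any rule in the * group that should also bind it has to be repeated there.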

Warnings

  • 1 invalid line.