gw.lightinthebox.com
robots.txt

Robots Exclusion Standard data for gw.lightinthebox.com

Resource Scan

Scan Details

Site Domain gw.lightinthebox.com
Base Domain lightinthebox.com
Scan Status Ok
Last Scan 2024-05-17T12:24:06+00:00
Next Scan 2024-05-31T12:24:06+00:00

Last Scan

Scanned 2024-05-17T12:24:06+00:00
URL https://gw.lightinthebox.com/robots.txt
Domain IPs 23.48.107.42, 23.48.107.64
Response IP 23.44.4.145
Found Yes
Hash 3e6c1bef5f1050b8ad92bed559aa72051811e897eba1cdffd13fa0687cc9dd20
SimHash d37197fec631

Groups

*

Rule Path
Disallow /cache/
Disallow /api/
Disallow /plugins/
Disallow /newproducttags/
Disallow /ns/
Disallow /*/ns/
Disallow /narrow/
Disallow /n/
Disallow /*/n/
Disallow /index.php?main_page=login*
Disallow /*/index.php?main_page=login*
Disallow /index.php?main_page=shopping_cart*
Disallow /*/index.php?main_page=shopping_cart*
Disallow /index.php?main_page=shopping_cart_add*
Disallow /*/index.php?main_page=shopping_cart_add*
Allow /*%26litb_from%3Dpaid_adwords_shopping
Allow /*%26litb_from%3Dbing_shopping
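
The rules in the `*` group mix plain path prefixes with `*` wildcards, and the two Allow rules can override a Disallow for the same URL. The following is a minimal sketch of how a crawler might evaluate them, assuming the longest-match precedence described in RFC 9309 (the most specific matching pattern wins, and Allow wins a tie). The names `RULES`, `pattern_to_regex`, and `is_allowed` are hypothetical and are not part of the scanned site or of any library. The sketch compares paths literally and does not normalize percent-encoding.

```python
import re

# Rules copied from the "*" group above, as (rule type, path pattern) pairs.
RULES = [
    ("disallow", "/cache/"),
    ("disallow", "/api/"),
    ("disallow", "/plugins/"),
    ("disallow", "/newproducttags/"),
    ("disallow", "/ns/"),
    ("disallow", "/*/ns/"),
    ("disallow", "/narrow/"),
    ("disallow", "/n/"),
    ("disallow", "/*/n/"),
    ("disallow", "/index.php?main_page=login*"),
    ("disallow", "/*/index.php?main_page=login*"),
    ("disallow", "/index.php?main_page=shopping_cart*"),
    ("disallow", "/*/index.php?main_page=shopping_cart*"),
    ("disallow", "/index.php?main_page=shopping_cart_add*"),
    ("disallow", "/*/index.php?main_page=shopping_cart_add*"),
    ("allow", "/*%26litb_from%3Dpaid_adwords_shopping"),
    ("allow", "/*%26litb_from%3Dbing_shopping"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally, starting at the beginning of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if the path (including any query string) may be fetched."""
    best = None  # (pattern length, is_allow) of the best match so far
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            # Tuple comparison: longer pattern wins; on a tie, allow wins.
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the path is allowed by default.
    return True if best is None else best[1]


if __name__ == "__main__":
    for path in [
        "/cache/img.png",
        "/en/ns/dresses",
        "/index.php?main_page=login&next=1",
        "/p/some-product.html",
        "/api/feed?x=1%26litb_from%3Dbing_shopping",
    ]:
        print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Under these assumptions, the last example path is allowed even though it sits under `/api/`, because the Allow pattern is longer than the `/api/` Disallow pattern that also matches it.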

pinterest/0.2 (+http://www.pinterest.com/)

Rule Path
Allow /

almaden

Rule Path
Disallow /

aspseek

Rule Path
Disallow /

axmo

Rule Path
Disallow /

baiduspider

Rule Path
Disallow /

booch

Rule Path
Disallow /

dts agent

Rule Path
Disallow /

downloader

Rule Path
Disallow /

emailcollector

Rule Path
Disallow /

emailsiphon

Rule Path
Disallow /

emailwolf

Rule Path
Disallow /

expired domain sleuth

Rule Path
Disallow /

franklin locator

Rule Path
Disallow /

gaisbot

Rule Path
Disallow /

grub

Rule Path
Disallow /

hughcrawler

Rule Path
Disallow /

iaea.org

Rule Path
Disallow /

lcabotaccept

Rule Path
Disallow /

iconsurf

Rule Path
Disallow /

iltrovatore-setaccio

Rule Path
Disallow /

indy library

Rule Path
Disallow /

iupui

Rule Path
Disallow /

kittiecentral

Rule Path
Disallow /

iaea.org

Rule Path
Disallow /

larbin

Rule Path
Disallow /

lwp-trivial

Rule Path
Disallow /

metatagrobot

Rule Path
Disallow /

missigua locator

Rule Path
Disallow /

netresearchserver

Rule Path
Disallow /

nextgensearch

Rule Path
Disallow /

npbot

Rule Path
Disallow /

nutch

Rule Path
Disallow /

objectssearch

Rule Path
Disallow /

oracle ultra search

Rule Path
Disallow /

peerbot

Rule Path
Disallow /

pictureofinternet

Rule Path
Disallow /

plantynet

Rule Path
Disallow /

quepasacreep

Rule Path
Disallow /

scspider

Rule Path
Disallow /

soft411

Rule Path
Disallow /

spider.acont.de

Rule Path
Disallow /

sqworm

Rule Path
Disallow /

ssm agent

Rule Path
Disallow /

tamu

Rule Path
Disallow /

theusefulbot

Rule Path
Disallow /

turnitinbot

Rule Path
Disallow /

tutorial crawler

Rule Path
Disallow /

tutorgig

Rule Path
Disallow /

webcopier

Rule Path
Disallow /

webzip

Rule Path
Disallow /

zipppbot

Rule Path
Disallow /

xenu

Rule Path
Disallow /

wotbox

Rule Path
Disallow /

wget

Rule Path
Disallow /

mozdex

Rule Path
Disallow /

sosospider

Rule Path
Disallow /

Other Records

Field Value
sitemap https://www.lightinthebox.com/sitemap.xml