kayme.com
robots.txt

Robots Exclusion Standard data for kayme.com

Resource Scan

Scan Details

Site Domain kayme.com
Base Domain kayme.com
Scan Status Ok
Last Scan 2024-05-31T00:02:41+00:00
Next Scan 2024-06-30T00:02:41+00:00

Last Scan

Scanned 2024-05-31T00:02:41+00:00
URL https://kayme.com/robots.txt
Domain IPs 192.124.249.19
Response IP 192.124.249.19
Found Yes
Hash 65b87dfbcfc59b785594253b809c24f72aa84c518e712a43e4b5087584b1c2c5
SimHash 880a9f5767f5
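
The Hash value has 64 hexadecimal digits, which is consistent with a SHA-256 digest, although the report does not name the algorithm. Assuming it is SHA-256 over the raw response body, a change check between scans could be sketched as follows. The helper names body_digest and has_changed are hypothetical and not part of the scanner.

```python
import hashlib


def body_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a fetched robots.txt body.

    Assumption: the report's Hash field is SHA-256 of the raw bytes.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, recorded_hash: str) -> bool:
    """Compare a freshly fetched body against a previously recorded hash."""
    return body_digest(body) != recorded_hash.lower()
```

If the assumption holds, has_changed(fetched_bytes, "65b87dfb...") would return False for an unchanged file; the SimHash field is a separate fuzzy fingerprint and is not covered by this sketch.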

Groups

googlebot-image

Rule Path
Disallow (empty path, nothing is disallowed)

googlebot

Rule Path
Disallow (empty path, nothing is disallowed)

yandexbot

Rule Path
Allow /*?p=
Disallow /*?p=*&
Disallow /*?

Other Records

Field Value
crawl-delay 20
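
The three yandexbot rules overlap, so the outcome for a URL depends on which rule takes precedence. A minimal sketch of how they resolve, assuming the longest-match rule from RFC 9309 (the matching pattern with the most characters wins, and Allow wins a tie) and the common wildcard extensions (* for any sequence, a trailing $ as an end anchor). Yandex's own precedence logic may differ in detail; the function names are illustrative only.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into an anchored regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any sequence of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Apply longest-match precedence; a path with no matching rule is allowed."""
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty path matches nothing
        if pattern_to_regex(pattern).match(path):
            key = (len(pattern), kind == "allow")
            if best is None or key > best:
                best = key
    return True if best is None else best[1]


# The yandexbot group as listed in this report.
YANDEX_RULES = [
    ("allow", "/*?p="),
    ("disallow", "/*?p=*&"),
    ("disallow", "/*?"),
]

print(is_allowed(YANDEX_RULES, "/rings?p=2"))            # plain paging
print(is_allowed(YANDEX_RULES, "/rings?p=2&color=red"))  # paging plus a filter
print(is_allowed(YANDEX_RULES, "/rings?color=red"))      # filter only
```

Under these assumptions only the plain paging URL is allowed: Allow /*?p= (5 characters) outranks Disallow /*? (3 characters), while Disallow /*?p=*& (7 characters) outranks both once a second parameter follows the page number.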

linguee

Rule Path
Disallow /

barkrowler

Rule Path
Disallow /

*

Rule Path
Allow /*?p=
Disallow /404/
Disallow /app/
Disallow /cgi-bin/
Disallow /downloader/
Disallow /errors/
Disallow /includes/
Disallow /magento/
Disallow /media/captcha/
Disallow /media/customer/
Disallow /media/dhl/
Disallow /media/downloadable/
Disallow /media/import/
Disallow /media/pdf/
Disallow /media/sales/
Disallow /media/tmp/
Disallow /media/xmlconnect/
Disallow /pkginfo/
Disallow /report/
Disallow /scripts/
Disallow /shell/
Disallow /stats/
Disallow /var/
Disallow */index.php/
Disallow */catalog/product_compare/
Disallow */catalog/category/view/
Disallow */catalog/product/view/
Disallow */catalog/product/gallery/
Disallow */catalogsearch/
Disallow */control/
Disallow */contacts/
Disallow */customer/
Disallow */customize/
Disallow */newsletter/
Disallow */poll/
Disallow */review/
Disallow */sendfriend/
Disallow */tag/
Disallow */wishlist/
Disallow */checkout/
Disallow */onestepcheckout/
Disallow /cron.php
Disallow /cron.sh
Disallow /error_log
Disallow /install.php
Disallow /LICENSE.html
Disallow /LICENSE.txt
Disallow /LICENSE_AFL.txt
Disallow /STATUS.txt
Disallow /*?___from_store=*
Disallow /*?___store=*
Disallow /*?cat=*
Disallow /*?q=*
Disallow /*?price=*
Disallow /*?availability=*
Disallow /*?brand=*
Disallow /*?kayme_size=*
Disallow /*?ring_size=*
Disallow /*?color=*
Disallow /*?kayme_pattern=*
Disallow /*?category_filter=*
Disallow /*?size_filter=*
Disallow /*?p=*&
Disallow /*.php$
Disallow /*?SID=

Other Records

Field Value
sitemap https://kayme.com/media/google_sitemap_1.xml
sitemap https://kayme.com/media/google_sitemap_11.xml

Comments

  • (original version from 2015, edited in 2017 to add filter query parameter disallow samples + some wildcards,
  • edited in 2018 to add query params blocking to Yandex as named User-agent does not read *)
  • based on:
  • http://inchoo.net/ecommerce/ultimate-magento-robots-txt-file-examples/
  • http://www.byte.nl/blog/magento-robots-txt/
  • https://astrio.net/blog/optimize-robots-txt-for-magento/
  • comment and clone at https://gist.github.com/petskratt/016c9dbf159a81b9d6aa
  • Keep in mind that by standard robots.txt should NOT contain empty lines, except between UA blocks!
  • Sitemap (uncomment, change and add language/shop specific sitemaps, if running on multiple domains
  • keep in mind sitemap can only point to own domain so something like sitemapindex.php is needed)
  • Sitemap: http://example.com/sitemap.xml
  • Google Image Crawler Setup - having crawler-specific sections makes it ignore generic groups, e.g. *
  • Google Bot for Google Merchants
  • Yandex tends to be rather aggressive, may be worth keeping them at arm's length
  • Problem is mostly related to layered nav and query params, allow only paging
  • The Linguee bot is a web scraping bot that will scan the content of any website it encounters to search for multilingual text. It does not harvest email addresses, and it won’t index content that is not multilingual.
  • Barkrowler is exensa's experimental and very fresh version of the BUbiNG crawler. Supposed to respect robots.txt but received several reports
  • Crawlers Setup
  • Allow paging (unless paging inside a listing with more params, as disallowed below)
  • Directories
  • Disallow: /media/
  • Disallow: /media/catalog/
  • Disallow: /media/wysiwyg/
  • Disallow: /skin/
  • Paths (if using shop id in URL must prefix with * or copy for each)
  • Files
  • Do not crawl sub category pages that are sorted or filtered.
  • This would be very broad, could hurt (incl. SEO).
  • Disallow: /*?*
  • These are more specific, pick what you need - and do not forget to add your custom filters!
  • Disallow: /*?dir*
  • Disallow: /*?limit*
  • Disallow: /*?mode*
  • Custom filters
  • Paths that can be safely ignored (no clean URLs)
  • English sitemap
  • Japanese sitemap
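
Several comments above note that a crawler with its own named group ignores the generic * group, which is why the query parameter rules were repeated for Yandex. A simplified sketch of that selection step, assuming a case-insensitive exact match on the crawler's product token with a fallback to *. Real crawlers may apply extra rules (for example, Google documents a fallback from googlebot-image to googlebot), and select_group is a hypothetical helper.

```python
from typing import Optional, Sequence

# Group names as listed in this report.
GROUP_NAMES = ["googlebot-image", "googlebot", "yandexbot", "linguee", "barkrowler", "*"]


def select_group(product_token: str, group_names: Sequence[str] = GROUP_NAMES) -> Optional[str]:
    """Pick the one group a crawler would obey.

    Simplification: exact, case-insensitive token match; otherwise '*'.
    Rules from different groups are never merged.
    """
    token = product_token.lower()
    for name in group_names:
        if name != "*" and name.lower() == token:
            return name
    return "*" if "*" in group_names else None


print(select_group("YandexBot"))  # named group, so the * rules are not applied
print(select_group("bingbot"))    # no named group, falls back to *
```

Because only one group applies, any rule from the * group that a named crawler should also follow has to be copied into that crawler's group.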

Warnings

  • 1 invalid line.