harryhall.com
robots.txt

Robots Exclusion Standard data for harryhall.com

Resource Scan

Scan Details

Site Domain harryhall.com
Base Domain harryhall.com
Scan Status Ok
Last Scan 2025-11-26T20:45:35+00:00
Next Scan 2025-12-26T20:45:35+00:00

Last Scan

Scanned 2025-11-26T20:45:35+00:00
URL https://harryhall.com/robots.txt
Domain IPs 104.26.14.188, 104.26.15.188, 172.67.74.27, 2606:4700:20::681a:ebc, 2606:4700:20::681a:fbc, 2606:4700:20::ac43:4a1b
Response IP 172.67.74.27
Found Yes
Hash d8eaa3120643903738a8ab812882d3b09603de2a25ea6dcd0429dcaa0a404532
SimHash ad20eb3346e3
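The scan page does not say how Hash is computed. Its 64 hex characters match the length of a SHA-256 digest, so the sketch below assumes (unconfirmed) that it is SHA-256 over the raw robots.txt response body. Under that assumption, a site owner could check whether the live file has changed since the last scan. The names `RECORDED_HASH`, `body_digest` and `changed_since_scan` are hypothetical.

```python
import hashlib

# Value reported under "Hash" in the scan details above.
RECORDED_HASH = "d8eaa3120643903738a8ab812882d3b09603de2a25ea6dcd0429dcaa0a404532"


def body_digest(body: bytes) -> str:
    """SHA-256 hex digest of a robots.txt body (the algorithm is an assumption)."""
    return hashlib.sha256(body).hexdigest()


def changed_since_scan(body: bytes, recorded: str = RECORDED_HASH) -> bool:
    """True if the body no longer hashes to the recorded value."""
    return body_digest(body) != recorded.lower()


# Usage with the standard library (needs network access):
#   from urllib.request import urlopen
#   body = urlopen("https://harryhall.com/robots.txt").read()
#   print(changed_since_scan(body))
```

A mismatch means either the file has changed or the SHA-256-over-raw-bytes assumption is wrong. The comparison alone cannot tell these cases apart.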

Groups

*

Rule Path
Disallow /*?cat*
Disallow /*?product*
Disallow /*?price*
Disallow /*?brand*
Disallow /*?colour*
Disallow /*?contains*
Disallow /*?size*
Disallow /*?fit*
Disallow /*%26cat*
Disallow /*%26product*
Disallow /*%26price*
Disallow /*%26brand*
Disallow /*%26colour*
Disallow /*%26contains*
Disallow /*%26size*
Disallow /*%26fit*
Disallow /CVS
Disallow /*.svn$
Disallow /*.idea$
Disallow /*.sql$
Disallow /*.tgz$
Disallow /*.disabled$
Disallow /admin/
Disallow /prod_export/
Disallow /blog/press/
Disallow /stores/store/redirect/
Disallow /app/
Disallow /downloader/
Disallow /errors/
Disallow /includes/
Disallow /lib/
Disallow /pkginfo/
Disallow /shell/
Disallow /var/
Disallow /static/
Disallow /export/
Disallow /dpdab/
Disallow /api.php
Disallow /cron.php
Disallow /cron.sh
Disallow /error_log
Disallow /get.php
Disallow /install.php
Disallow /LICENSE.html
Disallow /LICENSE.txt
Disallow /LICENSE_AFL.txt
Disallow /README.txt
Disallow /RELEASE_NOTES.txt
Disallow /*?dir*
Disallow /*?dir=desc
Disallow /*?dir=asc
Disallow /*?limit=all
Disallow /*?mode*
Disallow /bcp/samples/
Disallow /*?SID=
Disallow /checkout/
Disallow /onestepcheckout/
Disallow /customer/
Disallow /customer/account/
Disallow /customer/account/login/
Disallow /wishlist/
Disallow /catalogsearch/
Disallow /catalog/product_compare/
Disallow /catalog/category/view/
Disallow /catalog/product/view/
Disallow /welcome/
Disallow /cgi-bin/
Disallow /cleanup.php
Disallow /apc.php
Disallow /memcache.php
Disallow /phpinfo.php
Disallow /wp
Disallow /wp/
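The rules in the `*` group rely on robots.txt path matching as described in RFC 9309. `*` matches any run of characters, a trailing `$` anchors the match to the end of the path, and every other character, including `?`, is literal and matched from the start of the path plus query string. The sketch below compiles a subset of the rules above into regular expressions. It assumes the input path is already in the form a crawler would compare, and it does no percent-encoding normalization. `RULES`, `rule_to_regex` and `is_disallowed` are illustrative names.

```python
import re
from typing import List

# A subset of the Disallow paths from the "*" group above.
RULES: List[str] = [
    "/*?cat*", "/*?brand*", "/*%26cat*", "/CVS",
    "/*.svn$", "/*.sql$", "/checkout/", "/*?SID=", "/wp",
]


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile one robots.txt path pattern into a start-anchored regex.

    '*' matches any sequence of characters (including none), a trailing '$'
    pins the match to the end of the path, and every other character,
    including '?', is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each escaped '*' into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(path: str, rules: List[str] = RULES) -> bool:
    """True if any Disallow pattern matches the path (plus query string).

    The group has only Disallow lines, so no Allow/Disallow precedence
    is needed: any match blocks the URL.
    """
    # re.match anchors at the start of the string, as robots.txt paths do.
    return any(rule_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    for path in [
        "/jackets?cat=12",
        "/jackets?sort=price&cat=12",
        "/jackets?sort=price%26cat=12",
        "/backup/dump.sql",
        "/backup/dump.sql.gz",
        "/repo/.svn/entries",
        "/wp-content/uploads/a.jpg",
    ]:
        print(path, "->", "blocked" if is_disallowed(path) else "allowed")
```

Because `?` is literal, `/*?cat*` only catches `cat` when it is the first query parameter. `/jackets?sort=price&cat=12` slips past it, and the `%26` variants only catch the percent-encoded form `%26cat`. Likewise, `/*.svn$` blocks paths ending in `.svn` but not files inside a `.svn/` directory. `/wp` is a plain prefix, so it also covers `/wp-content/`.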

ahrefsbot

Rule Path
Disallow /

ahrefssiteaudit

Rule Path
Disallow /

petalbot

Rule Path
Disallow /

scrapy/2.11.2 (+https://scrapy.org)

Rule Path
Disallow /
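Each of the four named groups above disallows `/`, which blocks those crawlers from the whole site. RFC 9309 has a crawler compare its product token case-insensitively against the User-agent values and merge all matching groups. If nothing matches, the crawler falls back to the `*` group, and if there is no `*` group, no rules apply. The last group's value, `scrapy/2.11.2 (+https://scrapy.org)`, is a full user-agent string rather than a product token: RFC 9309 limits product tokens to letters, underscores and hyphens, and parsers differ in how they treat such a value. The sketch below assumes whole-value comparison. `GROUPS` and `select_rules` are hypothetical names, and the `*` rules are abbreviated.

```python
from typing import Dict, List

# Groups from the scan above; the "*" rules are abbreviated here.
GROUPS: Dict[str, List[str]] = {
    "*": ["/checkout/", "/customer/", "/wp"],
    "ahrefsbot": ["/"],
    "ahrefssiteaudit": ["/"],
    "petalbot": ["/"],
    "scrapy/2.11.2 (+https://scrapy.org)": ["/"],
}


def select_rules(product_token: str, groups: Dict[str, List[str]] = GROUPS) -> List[str]:
    """Return the rules a crawler with this product token should obey.

    Named groups are compared case-insensitively against the whole
    User-agent value, and all matching groups are merged. With no match,
    the "*" group applies; with no "*" group, no rules apply.
    """
    token = product_token.lower()
    matched: List[str] = []
    for name, rules in groups.items():
        if name != "*" and name.lower() == token:
            matched.extend(rules)
    if matched:
        return matched
    return list(groups.get("*", []))
```

Under this whole-value assumption, a crawler identifying itself as `Scrapy` never selects the `scrapy/2.11.2 ...` group and falls through to `*`. A parser that trims the value to its leading token would instead apply `Disallow /`.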

Comments

  • robots.txt for Harry Hall
  • GENERAL SETTINGS
  • Enable robots.txt rules for all crawlers
  • Magento sitemap: uncomment and replace the URL to your Magento sitemap file
  • Sitemap: http://testdev.harryhall.co.uk/sitemap.xml
  • DEVELOPMENT RELATED SETTINGS
  • Do not crawl development files and folders: CVS, svn directories and dump files
  • GENERAL MAGENTO SETTINGS
  • Do not crawl Magento admin page
  • Do not crawl Product Download page
  • Do not crawl common Magento technical folders
  • Do not crawl common Magento files
  • MAGENTO SEO IMPROVEMENTS
  • Do not crawl sub category pages that are sorted or filtered.
  • Do not crawl fabric samples.
  • Do not crawl 2nd home page copy (example.com/index.php/). Uncomment it only if you activated Magento SEO URLs.
  • Disallow: /index.php/
  • Do not crawl links with session IDs
  • Do not crawl PPC landing pages
  • Disallow: /welcome/
  • Do not crawl checkout, user account or wishlist pages
  • Do not crawl search pages and not-SEO optimized catalog links
  • SERVER SETTINGS
  • Do not crawl common server technical folders and files