harvardapparatus.com
robots.txt

Robots Exclusion Standard data for harvardapparatus.com

Resource Scan

Scan Details

Site Domain harvardapparatus.com
Base Domain harvardapparatus.com
Scan Status Ok
Last Scan 2024-06-20T02:50:12+00:00
Next Scan 2024-07-04T02:50:12+00:00

Last Scan

Scanned 2024-06-20T02:50:12+00:00
URL https://harvardapparatus.com/robots.txt
Redirect https://www.harvardapparatus.com/robots.txt
Redirect Domain www.harvardapparatus.com
Redirect Base harvardapparatus.com
Domain IPs 18.161.6.125, 18.161.6.38, 18.161.6.54, 18.161.6.65
Redirect IPs 108.139.10.19, 108.139.10.49, 108.139.10.77, 108.139.10.8
Response IP 3.160.246.54
Found Yes
Hash a961c5f37bb731bf5a0edd0cb6e6ba4c28758882e6da7a4816683f6ab6e8d414
SimHash 6124f942c182

Groups

*

Rule Path
Disallow /*?
Disallow /index.php/
Disallow /wishlist/
Disallow /admin/
Disallow /catalogsearch/
Disallow /onestepcheckout/
Disallow /review/product/
Disallow /sendfriend/
Disallow /enable-cookies/
Disallow /LICENSE.txt
Disallow /LICENSE.html
Disallow /skin/
Disallow /js/
Disallow /directory/
Disallow /checkout/
Disallow /onestepcheckout/
Disallow /customer/
Disallow /customer/account/
Disallow /customer/account/login/
Disallow /catalogsearch/
Disallow /catalog/product_compare/
Disallow /catalog/category/view/
Disallow /catalog/product/view/
Disallow /*?dir*
Disallow /*?dir=desc
Disallow /*?dir=asc
Disallow /*?limit=all
Disallow /*?mode*
Disallow /app/
Disallow /bin/
Disallow /dev/
Disallow /lib/
Disallow /phpserver/
Disallow /pub/
Disallow /tag/
Disallow /review/
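
Several of the rules above use the `*` wildcard (for example `/*?` and `/*?dir=asc`). Python's standard `urllib.robotparser` does not expand wildcards, so the sketch below shows one way such patterns could be checked by hand. It is a minimal illustration, not the scanner's or any crawler's actual matcher. The function names and the sample paths are hypothetical, and only a subset of the listed rules is included. Because this group has no Allow rules, any matching Disallow is treated as a block.

```python
import re

# A subset of the Disallow patterns listed for the "*" group above.
DISALLOW = [
    "/*?",
    "/index.php/",
    "/checkout/",
    "/customer/",
    "/catalog/product/view/",
    "/*?dir=asc",
]

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally as a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.compile(regex)

def is_disallowed(path, rules=DISALLOW):
    """Return True if any Disallow pattern matches from the start of path."""
    return any(pattern_to_regex(rule).match(path) for rule in rules)

# Hypothetical paths, chosen only to exercise the patterns.
for sample in ["/checkout/cart/", "/products?dir=asc", "/products/pumps.html"]:
    print(sample, is_disallowed(sample))
```

Under these assumptions, the first rule `/*?` already blocks every URL that carries a query string, so the later `/*?dir*`, `/*?limit=all` and `/*?mode*` entries are redundant for a matcher that honors wildcards.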

Comments

  • Robots.txt
  • Prevent crawl of user and login pages
  • Block native product URL (Only crawl by url key)
  • Block crawl of filters in pages
  • Block CMS directories
  • Block duplicate content