
Robots Exclusion Standard data for thewhitecompany.com

Resource Scan

Scan Details

Site Domain thewhitecompany.com
Base Domain thewhitecompany.com
Scan Status Ok
Last Scan 2025-03-10T06:09:07+00:00
Next Scan 2025-03-24T06:09:07+00:00
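The two timestamps above imply a fixed rescan interval. A minimal sketch of the arithmetic, assuming the scanner simply adds a constant interval to the last scan time (the report does not state its scheduling policy):

```python
from datetime import datetime, timedelta

# Timestamps copied from the Scan Details above (ISO 8601 with UTC offset)
last_scan = datetime.fromisoformat("2025-03-10T06:09:07+00:00")
next_scan = datetime.fromisoformat("2025-03-24T06:09:07+00:00")

# Interval between scans; assumed to stay constant from scan to scan
interval = next_scan - last_scan
print(interval)  # 14 days, 0:00:00

# Under that assumption, the following scan would fall one interval later
print((next_scan + interval).isoformat())  # 2025-04-07T06:09:07+00:00
```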

Last Scan

Scanned 2025-03-10T06:09:07+00:00
URL https://www.thewhitecompany.com/robots.txt
Domain IPs 2600:1413:b000:6::17d5:2bc4, 2600:1413:b000:6::17d5:2bc8, 96.17.96.17, 96.17.96.25
Response IP 96.17.96.25
Found Yes
Hash 93538b3f7c0c3513772207624d6f783d2654fb33c3bd65e026c0535207c12f55
SimHash 627067bf8ff1
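The `Hash` value is 64 hexadecimal characters, which is the length of a SHA-256 digest. Assuming it is the SHA-256 of the raw robots.txt body (the report does not say exactly what is hashed), a rescan could detect changes as sketched below; `body_changed` is a hypothetical helper and the sample body is a placeholder, not the real file.

```python
import hashlib

# Hash recorded for the 2025-03-10 scan (copied from the report above)
RECORDED_HASH = "93538b3f7c0c3513772207624d6f783d2654fb33c3bd65e026c0535207c12f55"

def body_changed(body: bytes, previous_hash: str) -> bool:
    """Return True if the SHA-256 of `body` differs from the stored hex digest.

    Assumption: the scanner hashes the raw response bytes with SHA-256.
    """
    return hashlib.sha256(body).hexdigest() != previous_hash.lower()

# A SHA-256 hex digest is 2 * 32 = 64 characters, matching the recorded value
assert len(RECORDED_HASH) == 2 * hashlib.sha256().digest_size

# Usage with a placeholder body
sample = b"User-agent: *\nDisallow: /quickView\n"
print(body_changed(sample, hashlib.sha256(sample).hexdigest()))          # False
print(body_changed(sample + b"\n", hashlib.sha256(sample).hexdigest()))  # True
```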

Groups

*

Rule Path
Disallow /uk/bag
Disallow /uk/checkout
Disallow /uk/my-account
Disallow /uk/account
Disallow /us/bag
Disallow /us/checkout
Disallow /us/my-account
Disallow /us/account
Disallow /quickView
Disallow /*?page=
Allow /*?page=1$
Allow /*?page=2$
Allow /*?page=3$
Allow /*?page=4$
Allow /*?page=5$
Allow /*?page=6$
Allow /*?page=7$
Allow /*?page=8$
Allow /*?page=9$
Allow /*?page=10$
Allow /*?page=11$
Allow /*?page=12$
Allow /*?page=13$
Allow /*?page=14$
Allow /*?page=15$
Allow /*?page=16$
Allow /*?page=17$
Allow /*?page=18$
Allow /*?page=19$
Disallow *?q=
Disallow */api/common
Disallow */twccmsservice
Disallow */currentCountry
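RFC 9309 resolves conflicting Allow and Disallow rules by taking the most specific match, meaning the matching rule with the longest path, and prefers Allow when the two are equally long. Under that reading, `Allow /*?page=1$` (10 characters) outranks `Disallow /*?page=` (8 characters), so paginated URLs up to page 19 stay crawlable while page 20 and beyond are blocked. A minimal sketch of that evaluation, assuming the path and query string are matched together as one string; `rule_to_regex` and `is_allowed` are hypothetical helpers, not part of any crawler:

```python
import re

# Rules of the "*" group as listed above, in file order
PAGE_ALLOWS = [("Allow", f"/*?page={n}$") for n in range(1, 20)]
RULES = [
    ("Disallow", "/uk/bag"), ("Disallow", "/uk/checkout"),
    ("Disallow", "/uk/my-account"), ("Disallow", "/uk/account"),
    ("Disallow", "/us/bag"), ("Disallow", "/us/checkout"),
    ("Disallow", "/us/my-account"), ("Disallow", "/us/account"),
    ("Disallow", "/quickView"), ("Disallow", "/*?page="),
    *PAGE_ALLOWS,
    ("Disallow", "*?q="), ("Disallow", "*/api/common"),
    ("Disallow", "*/twccmsservice"), ("Disallow", "*/currentCountry"),
]

def rule_to_regex(path: str) -> re.Pattern:
    """'*' matches any run of characters and a trailing '$' anchors the end;
    without '$' the rule is a prefix match (re.match anchors the start)."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_allowed(url_path: str, rules=RULES) -> bool:
    """Longest matching rule wins; on a tie, Allow beats Disallow.
    A path that matches no rule is allowed."""
    best_len, allowed = -1, True
    for verb, path in rules:
        if rule_to_regex(path).match(url_path):
            if len(path) > best_len or (len(path) == best_len and verb == "Allow"):
                best_len, allowed = len(path), verb == "Allow"
    return allowed

for p in ["/uk/bedding?page=2", "/uk/bedding?page=19", "/uk/bedding?page=20",
          "/uk/checkout/delivery", "/uk/search?q=towels", "/uk/bedding"]:
    print(p, is_allowed(p))
```

Pages 2 and 19 and the plain category path print `True`; page 20, the checkout path and the `?q=` search URL print `False`.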

Other Records

Field Value Comment
crawl-delay 5 5 seconds between page requests
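`crawl-delay` is not defined by RFC 9309, and crawler support for it varies. Python's standard `urllib.robotparser` does read it, so a polite client could honour the 5-second value as sketched below. The robots.txt text is a hand-written partial reconstruction, not the original file, and `PoliteScheduler` is a hypothetical helper:

```python
import time
from urllib.robotparser import RobotFileParser

# Partial reconstruction of the "*" group (assumption: not the verbatim file)
ROBOTS_TXT = """\
User-agent: *
Disallow: /uk/checkout
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An agent without its own group falls back to "*", which sets 5 seconds
delay = parser.crawl_delay("ExampleBot") or 0
print(delay)  # 5

class PoliteScheduler:
    """Hypothetical helper: keeps requests at least `delay` seconds apart."""

    def __init__(self, delay: float, clock=time.monotonic):
        self.delay = delay
        self.clock = clock
        self.last = None  # time of the most recent request, if any

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed."""
        if self.last is None:
            return 0.0
        return max(0.0, self.last + self.delay - self.clock())

    def record_request(self) -> None:
        """Call right after each request is sent."""
        self.last = self.clock()
```

A fetch loop would call `time.sleep(scheduler.wait_time())` before each request and `scheduler.record_request()` after it; injecting the clock keeps the helper testable without real sleeps.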

cazoodlebot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

dotbot/1.0

Rule Path
Disallow /

gigabot

Rule Path
Disallow /
Disallow *.pdf$
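A crawler obeys only the group that names it and ignores the `*` group, so cazoodlebot, mj12bot, dotbot and gigabot are excluded from the whole site regardless of the `*` rules. Gigabot's `*.pdf$` rule is redundant because `Disallow /` already covers every path. RFC 9309 asks crawlers to match the product token case-insensitively and restricts tokens to letters, underscores and hyphens, so a group line such as `dotbot/1.0` is not a valid token and crawlers may treat it differently. A minimal sketch of group selection, assuming the crawler ignores a `/version` suffix on the group line; `GROUPS`, `product_token`, `select_group` and `blocks_whole_site` are hypothetical names:

```python
# Groups from the report above; the "*" rules are abbreviated
GROUPS = {
    "*": [("Disallow", "/uk/checkout"), ("Disallow", "/quickView")],
    "cazoodlebot": [("Disallow", "/")],
    "mj12bot": [("Disallow", "/")],
    "dotbot/1.0": [("Disallow", "/")],
    "gigabot": [("Disallow", "/"), ("Disallow", "*.pdf$")],
}

def product_token(value: str) -> str:
    """Lower-case the token and drop any '/version' suffix (an assumption)."""
    return value.split("/", 1)[0].strip().lower()

def select_group(crawler: str) -> str:
    """Return the key of the group that applies to `crawler`, else '*'."""
    wanted = product_token(crawler)
    for agent in GROUPS:
        if agent != "*" and product_token(agent) == wanted:
            return agent
    return "*"

def blocks_whole_site(agent: str) -> bool:
    """True if the group disallows '/' and has no Allow rule as an exception."""
    rules = GROUPS[agent]
    return ("Disallow", "/") in rules and all(v != "Allow" for v, _ in rules)

for crawler in ["MJ12bot", "DotBot", "Gigabot", "Googlebot"]:
    group = select_group(crawler)
    print(crawler, "->", group, blocks_whole_site(group))
```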

Other Records

Field Value
sitemap https://www.thewhitecompany.com/uk/sitemap.xml
sitemap https://www.thewhitecompany.com/us/sitemap.xml
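`sitemap` records are not tied to any user-agent group; they apply to the file as a whole, which is why they appear under their own Other Records table. Python's `urllib.robotparser` (3.8 and later) exposes them through `site_maps()`. A minimal sketch using a hand-written fragment rather than the real file:

```python
from urllib.robotparser import RobotFileParser

# Assumed fragment: one blocked group followed by the two sitemap lines
ROBOTS_TXT = """\
User-agent: mj12bot
Disallow: /

Sitemap: https://www.thewhitecompany.com/uk/sitemap.xml
Sitemap: https://www.thewhitecompany.com/us/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the URLs in file order, or None when there are none
for url in parser.site_maps() or []:
    print(url)
```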

Comments

  • For all robots
  • Block access to specific groups of pages
  • Allow search crawlers to discover the sitemap
  • Remove duplication caused by URL facets
  • Block unwanted API calls
  • Block CazoodleBot as it does not send correct Accept headers
  • Block MJ12bot as it is just noise
  • Block dotbot as it cannot parse base URLs properly
  • Block Gigabot
  • Block PDF

Warnings

  • `request-rate` is not a known field.
  • `visit-time` is not a known field.
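These warnings presumably mean the raw file contains `request-rate` and `visit-time` lines that the scanner does not recognise; neither field is defined by RFC 9309, and the report does not show their values. A minimal sketch of a lint pass that produces the same kind of warning, assuming the known set is the three RFC 9309 fields plus the common `crawl-delay` and `sitemap` extensions. `unknown_fields` is a hypothetical helper, and the sample values are placeholders, not the site's:

```python
# Assumed known set: RFC 9309 fields plus two widely used extensions
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}

def unknown_fields(robots_txt: str) -> list[str]:
    """Return unrecognised field names in file order, without duplicates."""
    found = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS and field not in found:
            found.append(field)
    return found

# Placeholder input: the real request-rate/visit-time values are not in the report
SAMPLE = """\
User-agent: *
Crawl-delay: 5
Request-rate: 1/5        # placeholder value
Visit-time: 0000-0600    # placeholder value
"""

for field in unknown_fields(SAMPLE):
    print(f"`{field}` is not a known field.")
```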