aldi.ie
robots.txt

Robots Exclusion Standard data for aldi.ie

Resource Scan

Scan Details

Site Domain aldi.ie
Base Domain aldi.ie
Scan Status Ok
Last Scan 2025-06-14T06:31:50+00:00
Next Scan 2025-06-28T06:31:50+00:00

Last Scan

Scanned 2025-06-14T06:31:50+00:00
URL https://aldi.ie/robots.txt
Redirect https://www.aldi.ie/robots.txt
Redirect Domain www.aldi.ie
Redirect Base aldi.ie
Domain IPs 23.45.207.71, 23.45.207.83, 2600:1413:5000:12::1737:27e6, 2600:1413:5000:12::1737:27ec
Redirect IPs 125.56.219.2, 2600:1413:5000:12::1737:27f2, 2600:1413:5000:12::1737:27f4
Response IP 23.59.168.97
Found Yes
Hash 6faae7e8560b7e44ee45713e51423dee9788e63edd7f1116c225604f7e6f766b
SimHash 2256f792ecfc

Groups

*

Rule Path
Disallow /results$
Disallow /results?
Disallow /*?page=
Disallow /*%26page%3D
Disallow /*?utm_
Disallow /*%26utm_
Disallow /*?cid=
Disallow /*%26cid%3D
Disallow /*?bid=
Disallow /*%26bid%3D
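
The rules in the `*` group rely on the `*` wildcard and the `$` end anchor. As a minimal sketch of how such patterns are commonly interpreted, the Python below turns each path into a regular expression and tests a few made-up URLs. The helper names (`pattern_to_regex`, `is_disallowed`) and the sample paths are assumptions for illustration, and the sketch performs no percent-encoding normalisation, so a real crawler may behave differently.

```python
import re

# Disallow paths copied from the "*" group in this scan.
DISALLOW_PATTERNS = [
    "/results$",
    "/results?",
    "/*?page=",
    "/*%26page%3D",
    "/*?utm_",
    "/*%26utm_",
    "/*?cid=",
    "/*%26cid%3D",
    "/*?bid=",
    "/*%26bid%3D",
]


def pattern_to_regex(pattern):
    """Translate one robots.txt path pattern into a compiled regex.

    Assumed semantics: "*" matches any run of characters, a trailing "$"
    anchors the end of the URL, and every other character is literal
    (including "?", which is not a wildcard in robots.txt).
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_disallowed(path_and_query):
    """Return True if any pattern matches from the start of the path."""
    return any(
        pattern_to_regex(p).match(path_and_query) for p in DISALLOW_PATTERNS
    )


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise the patterns.
    for sample in ["/results", "/results?q=milk", "/results/page",
                   "/products?page=2", "/products?sort=asc&page=2"]:
        print(sample, is_disallowed(sample))
```

Under these assumed semantics the `?page=` pattern only matches when `page` is the first query parameter, and the `%26page%3D` pattern only matches the percent-encoded form of `&page=`.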

cazoodlebot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

dotbot/1.0

Rule Path
Disallow /

gigabot

Rule Path
Disallow /

Other Records

Field Value
sitemap https://www.aldi.ie/sitemap.xml
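
The per-bot groups and the sitemap record can be checked with Python's standard `urllib.robotparser`. The sketch below is a reconstruction built from the groups and records listed in this scan, not the verbatim file served by the site, and the helper `may_crawl_home` is a hypothetical name. Note that `urllib.robotparser` does not implement `*` or `$` wildcards, so it is used here only for the blanket `Disallow: /` groups; how a given parser treats the versioned `dotbot/1.0` token may vary, so that agent is left out of the checks. `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the scan data above (assumption: group order and
# layout mirror the listing; comments from the original file are omitted).
ROBOTS_TXT = """\
User-agent: *
Disallow: /results$
Disallow: /results?
Disallow: /*?page=
Disallow: /*%26page%3D
Disallow: /*?utm_
Disallow: /*%26utm_
Disallow: /*?cid=
Disallow: /*%26cid%3D
Disallow: /*?bid=
Disallow: /*%26bid%3D

User-agent: cazoodlebot
Disallow: /

User-agent: mj12bot
Disallow: /

User-agent: dotbot/1.0
Disallow: /

User-agent: gigabot
Disallow: /

Sitemap: https://www.aldi.ie/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl_home(agent):
    """Return True if the agent may fetch the site root under these rules."""
    return parser.can_fetch(agent, "https://www.aldi.ie/")


if __name__ == "__main__":
    for agent in ["MJ12bot", "CazoodleBot", "Gigabot", "Googlebot"]:
        print(agent, may_crawl_home(agent))
    print(parser.site_maps())
```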

Comments

  • For all robots
  • Block access to specific groups of pages
  • Block excessive pagination to prevent crawl budget waste
  • Block tracking parameters to reduce duplicate URLs
  • Allow search crawlers to discover the sitemap
  • Block CazoodleBot as it does not present correct accept content headers
  • Block MJ12bot as it is just noise
  • Block dotbot as it cannot parse base URLs properly
  • Block Gigabot

Warnings

  • `categories sitemap` is not a known field.