haribo.com
robots.txt

Robots Exclusion Standard data for haribo.com

Resource Scan

Scan Details

Site Domain haribo.com
Base Domain haribo.com
Scan Status Ok
Last Scan 2024-04-28T07:21:11+00:00
Next Scan 2024-05-28T07:21:11+00:00

Last Scan

Scanned 2024-04-28T07:21:11+00:00
URL https://www.haribo.com/robots.txt
Domain IPs 23.209.46.89, 23.209.46.92, 2600:1413:b000:14::b857:c14d, 2600:1413:b000:14::b857:c152
Response IP 42.99.140.137
Found Yes
Hash 980f2f35f61d20c8c6eb4c55f0cbf21da595a3dba7fadbfafa2cc880a748c510
SimHash 61349d563591
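
A content hash like the one above lets a scanner tell whether robots.txt has changed between scans without comparing the files line by line. The report does not say which digest is used; the 64-hex-digit length is consistent with SHA-256, so that is assumed in this minimal sketch. The function names `fingerprint` and `has_changed` are hypothetical, and the SimHash value is not reproduced because its algorithm is not stated.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Return a hex digest of the raw robots.txt response body.

    Assumption: SHA-256 over the unmodified bytes. The scanner's actual
    algorithm and any normalisation it applies are not documented here.
    """
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, previous_hash: str) -> bool:
    """Compare a freshly fetched body against the hash from the last scan."""
    return fingerprint(body) != previous_hash.lower()


if __name__ == "__main__":
    # Placeholder body for illustration only, not the real file contents.
    sample = b"User-agent: *\nDisallow: /admin/\n"
    stored = fingerprint(sample)
    print(has_changed(sample, stored))
    print(has_changed(sample + b"Disallow: /cache/\n", stored))
```

Because the digest covers the exact bytes, even a whitespace-only edit to the file would register as a change under this assumption.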

Groups

*

Rule Path
Disallow /cpresources/
Disallow /vendor/
Disallow /.env
Disallow /cache/
Disallow /admin/
Disallow /overview/
Disallow /website-manual/
Disallow /social-media-styleguide/
Disallow /playbook-e-commerce/
Disallow /robots.txt

Other Records

Field Value
sitemap https://www.haribo.com/sitemap.xml
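
The records above can be checked against concrete URLs with Python's standard `urllib.robotparser`. The sketch below rebuilds a robots.txt from the listed group and sitemap record; the exact layout of the live file is an assumption, and the `ExampleBot` user agent and the `is_allowed` helper are hypothetical names used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the records listed above. The live file's exact
# layout, ordering, and comments are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cpresources/
Disallow: /vendor/
Disallow: /.env
Disallow: /cache/
Disallow: /admin/
Disallow: /overview/
Disallow: /website-manual/
Disallow: /social-media-styleguide/
Disallow: /playbook-e-commerce/
Disallow: /robots.txt

Sitemap: https://www.haribo.com/sitemap.xml
"""

BASE_URL = "https://www.haribo.com"

# Parse the text directly instead of fetching it over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, BASE_URL + path)


if __name__ == "__main__":
    for path in ("/", "/admin/", "/vendor/autoload.php", "/.env", "/robots.txt"):
        print(path, is_allowed(path))
    # site_maps() requires Python 3.8 or later.
    print(parser.site_maps())
```

The standard-library parser matches rules by simple path prefix, so `/admin/` is blocked while `/admin` without the trailing slash is not. Other crawlers may interpret the same rules differently.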

Comments

  • robots.txt for https://www.haribo.com/
  • live - don't allow web crawlers to index cpresources/ or vendor/