help.ubuntu.com
robots.txt

Robots Exclusion Standard data for help.ubuntu.com

Resource Scan

Scan Details

Site Domain help.ubuntu.com
Base Domain ubuntu.com
Scan Status Ok
Last Scan 2025-10-16T19:25:58+00:00
Next Scan 2025-11-15T19:25:58+00:00

Last Scan

Scanned 2025-10-16T19:25:58+00:00
URL https://help.ubuntu.com/robots.txt
Domain IPs 185.125.190.17, 185.125.190.18, 2001:67c:1562::23, 2001:67c:1562::24, 2620:2d:4000:1::2a, 2620:2d:4000:1::2b, 91.189.91.48, 91.189.91.49
Response IP 185.125.190.17
Found Yes
Hash ced53d4bce85b697762b69e2d4e1513c584168412fff1216c17026a4ac994a58
SimHash 3e3a8910fdf0

Groups

*

Rule Path
Disallow /img/
Disallow /libs/
Disallow /14.04/
Disallow /16.04/
Disallow /16.10/
Disallow /17.04/
Disallow /17.10/
Disallow /18.04/
Disallow /18.10/
Disallow /19.04/
Disallow /19.10/
Disallow /20.04/
Disallow /20.10/
Disallow /21.04/
Disallow /21.10/
Disallow /22.04/
Disallow /22.10/
Disallow /23.04/
Disallow /23.10/
Disallow /24.04/
Disallow /24.10/
Disallow /25.04/
Disallow /25.10/
Disallow /26.04/
Disallow /26.10/
Disallow /lts/ubuntu-help/
Disallow /stable/installation-guide/
Disallow /stable/serverguide/
Disallow /dev/
Disallow /stable/clouddocs/
Disallow /lts/clouddocs/
Disallow /community/*?action=
Disallow /lts/serverguide/
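
The rule table above can be checked against a URL path with a small matcher. The following is a minimal sketch, not the crawler logic any particular bot uses: it assumes the common robots.txt convention that `*` matches any run of characters and a trailing `$` anchors the end of the path, and it only handles Disallow rules because this group lists no Allow rules. The names `DISALLOW`, `rule_to_regex`, and `is_allowed` are hypothetical, and `DISALLOW` holds only a subset of the paths listed above.

```python
import re

# Subset of the Disallow paths from the "*" group above (illustrative only).
DISALLOW = [
    "/img/",
    "/libs/",
    "/24.04/",
    "/dev/",
    "/lts/serverguide/",
    "/community/*?action=",
]


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the path; everything else is literal.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, disallow=DISALLOW) -> bool:
    """Return False if any non-empty Disallow pattern matches from the start of `path`."""
    # re.match anchors at the beginning, which gives prefix semantics.
    return not any(rule_to_regex(r).match(path) for r in disallow if r)


if __name__ == "__main__":
    for p in ("/community/Installation", "/community/Installation?action=edit", "/24.04/"):
        print(p, is_allowed(p))
```

Under these assumptions, a plain wiki page under /community/ stays crawlable, while the same page with an `?action=` query string is blocked by the wildcard rule.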

Other Records

Field Value
crawl-delay 5
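
A crawler that honors the crawl-delay record above would space its requests at least 5 seconds apart. The sketch below shows one way to do that; it assumes the value is interpreted as seconds between successive requests, which is the usual reading but is not defined by the record itself. The `Throttle` class and its `clock` and `sleep` parameters are hypothetical names introduced for illustration.

```python
import time

CRAWL_DELAY = 5.0  # seconds, taken from the crawl-delay record above


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self.clock = clock    # injectable so the logic can be tested without real waiting
        self.sleep = sleep
        self.last = None      # time of the previous request, None before the first one

    def wait(self):
        """Block until at least `delay` seconds have passed since the last call."""
        if self.last is not None:
            remaining = self.delay - (self.clock() - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()


if __name__ == "__main__":
    throttle = Throttle(CRAWL_DELAY)
    for path in ("/", "/community/"):
        throttle.wait()
        print("would fetch", path)  # a real crawler would issue the request here
```

Calling `wait()` before each request keeps the pacing logic in one place, and the first request is never delayed.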

Comments

  • https://help.ubuntu.com/robots.txt
  • Notes:
  • 2022.04.20: Because we always forget, with the last edit we put in several years' worth of Disallow directives. Now we need some more.
  • 2016.11.23: We, the docs team, do not actually have access to the web server access logs, which would help in improving this robots.txt file. Some things we do herein are based on experience from other web servers, where we do have access to the access logs. Always keep in mind that crawlers, and Google in particular, seem to keep looking for deleted content for a very long time after it is gone.
  • 2016.11.23: Canonical is moving the CloudDocs elsewhere. A disallow for lts/clouddocs is being added to help web crawlers realize that they should delete that content. It should be left for at least a year, probably two.
  • For this type of server, a crawl delay of 20 is too long; changing to 5, which might still be a little long.
  • Leave the following two lines at least until 2017.12.01, preferably 2018.12.01.
  • 2022.04.20: left these.
  • As of 20.04 the serverguide has moved, and bots should find and index it from its new location.
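
The 2022.04.20 note describes adding several years' worth of per-release Disallow directives in one edit. A rough sketch of how such a batch could be generated is shown below. It assumes the April/October (`.04`/`.10`) release pattern visible in the rule table from 16.04 onward; the function name `release_disallow_lines` is hypothetical and is not something the docs team is known to use.

```python
def release_disallow_lines(first_year: int, last_year: int) -> list[str]:
    """Build 'Disallow: /YY.MM/' lines for the .04 and .10 releases of each year.

    Years are two-digit values such as 16 for 2016. Assumes two releases
    per year, matching the pattern in the rule table from 16.04 onward.
    """
    lines = []
    for year in range(first_year, last_year + 1):
        for month in ("04", "10"):
            lines.append(f"Disallow: /{year:02d}.{month}/")
    return lines


if __name__ == "__main__":
    # Print the directives for the 27.xx and 28.xx releases as an example.
    print("\n".join(release_disallow_lines(27, 28)))
```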