www.gov.uk
robots.txt

Robots Exclusion Standard data for www.gov.uk

Resource Scan

Scan Details

Site Domain www.gov.uk
Base Domain www.gov.uk
Scan Status Ok
Last Scan 2024-06-01T04:31:49+00:00
Next Scan 2024-06-15T04:31:49+00:00

Last Scan

Scanned 2024-06-01T04:31:49+00:00
URL https://www.gov.uk/robots.txt
Domain IPs 151.101.0.144, 151.101.128.144, 151.101.192.144, 151.101.64.144, 2a04:4e42:200::144, 2a04:4e42:400::144, 2a04:4e42:600::144, 2a04:4e42::144
Response IP 199.232.44.144
Found Yes
Hash 047b1ef95795017de52c48a860efb7adebbeebcaa9d55f071a41532ed8a9fa99
SimHash 6e1c9c5d95d3

Groups

*

Rule Path
Disallow /*/print$
Disallow /info/*
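The two rules in the global group use the common wildcard extensions to the original robots.txt syntax: `*` matches any run of characters and a trailing `$` anchors the rule to the end of the path. As a sketch (not part of the scan data), this matching can be expressed by translating a rule into a regular expression:

```python
import re

def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a robots.txt rule path with '*' and trailing '$'
    wildcards into an equivalent compiled regular expression."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape regex metacharacters, then let '*' match any characters.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))

# '/*/print$' blocks print views, but only when '/print' ends the path.
print(bool(rule_to_regex("/*/print$").match("/browse/tax/print")))   # True
print(bool(rule_to_regex("/*/print$").match("/browse/tax/printers")))  # False

# '/info/*' blocks everything under /info/ (the "user needs" pages).
print(bool(rule_to_regex("/info/*").match("/info/some-page")))  # True
```

The example paths here are hypothetical; the scan records only the rule patterns themselves.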

ahrefsbot

No rules defined. All paths allowed.

Other Records

Field Value
crawl-delay 10

deepcrawl

Rule Path
Disallow /

ms search 6.0 robot

Rule Path
Disallow /

Other Records

Field Value
sitemap https://www.gov.uk/sitemap.xml

Comments

  • Don't allow indexing of user needs pages
  • https://ahrefs.com/robot/ crawls the site frequently
  • https://www.deepcrawl.com/bot/ makes lots of requests. Ideally we'd slow it down rather than blocking it, but it doesn't mention whether or not it supports crawl-delay.
  • Complaints of 429 'Too many requests' seem to be coming from SharePoint servers (https://social.msdn.microsoft.com/Forums/en-US/3ea268ed-58a6-4166-ab40-d3f4fc55fef4). The robot doesn't recognise its User-Agent string; see the MS support article: https://support.microsoft.com/en-us/help/3019711/the-sharepoint-server-crawler-ignores-directives-in-robots-txt
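The groups recorded above can be reassembled into a robots.txt file and checked with Python's stdlib `urllib.robotparser`. This is a sketch: the original casing of the agent names is an assumption (the scanner normalises them to lowercase), and the stdlib parser implements only the original prefix-matching rules, so it does not interpret the `*`/`$` wildcards in the global group.

```python
from urllib import robotparser

# Reconstructed from the groups in the scan above; agent-name casing
# is an assumption, as the scan report lowercases user-agent names.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*/print$
Disallow: /info/*

User-agent: AhrefsBot
Crawl-delay: 10

User-agent: deepcrawl
Disallow: /

User-agent: MS Search 6.0 Robot
Disallow: /

Sitemap: https://www.gov.uk/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# deepcrawl is blocked from the whole site; AhrefsBot gets a 10s delay.
print(rp.can_fetch("deepcrawl", "https://www.gov.uk/browse/tax"))  # False
print(rp.crawl_delay("AhrefsBot"))  # 10
```

Matching is case-insensitive on the agent name, so the lowercase `deepcrawl` and `ms search 6.0 robot` forms shown in the scan resolve to the same groups.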