thefirstpost.co.uk
robots.txt

Robots Exclusion Standard data for thefirstpost.co.uk

Resource Scan

Scan Details

Site Domain thefirstpost.co.uk
Base Domain thefirstpost.co.uk
Scan Status Ok
Last Scan 2025-06-17T16:22:15+00:00
Next Scan 2025-06-24T16:22:15+00:00

Last Scan

Scanned 2025-06-17T16:22:15+00:00
URL https://thefirstpost.co.uk/robots.txt
Redirect https://theweek.com/robots.txt
Redirect Domain theweek.com
Redirect Base theweek.com
Domain IPs 178.79.178.218, 2a01:7e00:e000:3f7::
Redirect IPs 199.232.194.114, 199.232.198.114
Response IP 199.232.198.114
Found Yes
Hash d52c65ee6ed8b7ed40abed79b8e4f1a44292eb56cec9af27f9e46a02012853d4
SimHash 2424c480ad99

Groups

*

Rule Path
Disallow */deals/compare
Disallow */html/
Disallow */p/*/embed/captioned
Disallow *searchTerm%3D*
Disallow *sortBy%3D*
Disallow *productBrand%3D*
Disallow *%7B*%7D*
Disallow /infinite-scroll-article/*
Disallow /infinite-scroll-review/*
Disallow /infinite-scroll-recipe/*

*

Rule Path
Disallow /search/
Disallow /359/
Disallow /content/
Disallow /blaize/datalayer
Disallow /*?*xhr=*

*

No rules defined. All paths allowed.
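
The three `*` groups above all address the same user agent, so a crawler following RFC 9309 would treat their rules as one combined group. As a rough illustration of how the wildcard patterns behave, the sketch below translates each Disallow path into a regular expression and reports the first pattern that matches a given URL path. The names `pattern_to_regex` and `blocking_rule` are hypothetical helpers, not part of any library, and the sample paths in the usage note are invented for demonstration. The sketch assumes `*` matches any run of characters and a trailing `$` anchors the end of the path; it does not normalise percent-encoding, and it skips Allow/Disallow precedence because this file defines no Allow rules.

```python
import re

# Disallow patterns copied from the three `*` groups listed above.
DISALLOW = [
    "*/deals/compare",
    "*/html/",
    "*/p/*/embed/captioned",
    "*searchTerm%3D*",
    "*sortBy%3D*",
    "*productBrand%3D*",
    "*%7B*%7D*",
    "/infinite-scroll-article/*",
    "/infinite-scroll-review/*",
    "/infinite-scroll-recipe/*",
    "/search/",
    "/359/",
    "/content/",
    "/blaize/datalayer",
    "/*?*xhr=*",
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: `*` matches any run of characters, a trailing `$`
    anchors the end of the path, and everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with `.*` for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


RULES = [(p, pattern_to_regex(p)) for p in DISALLOW]


def blocking_rule(path):
    """Return the first Disallow pattern matching `path`, or None if allowed."""
    for pattern, regex in RULES:
        # re.match anchors at the start of the path, as robots.txt rules do.
        if regex.match(path):
            return pattern
    return None
```

For example, `blocking_rule("/uk/deals/compare")` returns `"*/deals/compare"` (the leading `*` absorbs the localisation prefix), `blocking_rule("/article?page=2&xhr=1")` returns `"/*?*xhr=*"`, and `blocking_rule("/news/world")` returns `None`, meaning the path is allowed.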

Other Records

Field Value
sitemap https://theweek.com/sitemap.xml
sitemap https://theweek.com/uk/sitemap.xml
sitemap https://theweek.com/sitemap-news.xml
sitemap https://theweek.com/uk/sitemap-news.xml

Comments

  • Vanilla-wide rules
  • Common path patterns (* prefix to handle localisation)
  • Common query string patterns
  • Infinite scroll paths
  • Site-specific rules
  • Search
  • Prevent crawling DFP tags
  • Prevent crawling content with default url aliases
  • Prevent crawling blaize datalayer
  • POL-47
  • Sitemaps