thesundaytimes.co.uk
robots.txt

Robots Exclusion Standard data for thesundaytimes.co.uk

Resource Scan

Scan Details

Site Domain thesundaytimes.co.uk
Base Domain thesundaytimes.co.uk
Scan Status Ok
Last Scan 2024-09-19T01:03:29+00:00
Next Scan 2024-09-26T01:03:29+00:00

Last Scan

Scanned 2024-09-19T01:03:29+00:00
URL https://thesundaytimes.co.uk/robots.txt
Redirect https://www.thetimes.com/robots.txt
Redirect Domain www.thetimes.com
Redirect Base thetimes.com
Domain IPs 34.240.28.43, 52.208.17.106, 54.76.240.177
Redirect IPs 13.33.88.20, 13.33.88.30, 13.33.88.5, 13.33.88.95, 2600:9000:223b:3600:a:1602:de80:93a1, 2600:9000:223b:3800:a:1602:de80:93a1, 2600:9000:223b:4800:a:1602:de80:93a1, 2600:9000:223b:8600:a:1602:de80:93a1, 2600:9000:223b:8c00:a:1602:de80:93a1, 2600:9000:223b:9400:a:1602:de80:93a1, 2600:9000:223b:9600:a:1602:de80:93a1, 2600:9000:223b:9e00:a:1602:de80:93a1
Response IP 13.33.88.95
Found Yes
Hash 43261a821d0bea71a5c3b5fb2917438858745e4fef8916e261657feb98ebd8e5
SimHash 3d50194b4fc4

Groups

*

Rule Path
Disallow /login.thetimes.com/user/logout
Disallow /feeds.thetimes.com/puzzles/
Disallow /feeds.thetimes.com/timescrossword/
Disallow /archive/page/*
Disallow /archive/article/*
Disallow /interactives/*
Disallow /*?s=*
Disallow /*%26s%3D*
Disallow /*?p=*
Disallow /*?filter=*
Allow /past-six-days/$
Allow /past-six-days$
Disallow /past-six-days/*
Disallow /topic/bbc
Disallow /tto/*
Disallow /player/brightcove/
Disallow /my-articles
Disallow /my-articles/
Disallow /edition/null/
Disallow /goto
Disallow /?region=
Disallow /?_ga
Disallow /?CMP
Disallow /?ExternalDataReference
Disallow /article/category/
Disallow /article/this-article-has-been-deleted*
Disallow /article/this-article-has-been-removed*
Disallow /article/this-article-is-no-longer-available*
Disallow /search?*
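
The wildcard (`*`) and end-of-path (`$`) rules in this group, such as the /past-six-days entries, can be easier to read with a small evaluator. The following is a minimal sketch, assuming the longest-match precedence described in RFC 9309 (the most specific matching pattern wins, and Allow wins a tie). The helper names `pattern_to_regex` and `is_allowed` are hypothetical, only a few of the rules listed above are included, and real crawlers may evaluate rules differently.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the path start."""
    anchored = pattern.endswith("$")      # a trailing "$" pins the end of the path
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Return True if `path` may be fetched under `rules` (assumed longest-match logic)."""
    best_len = -1
    allowed = True                        # no matching rule means the path is allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            # A longer pattern is more specific; on a tie, prefer Allow.
            if length > best_len or (length == best_len and kind == "allow"):
                best_len = length
                allowed = kind == "allow"
    return allowed

# A subset of the "*" group above, for illustration only.
RULES = [
    ("allow", "/past-six-days/$"),
    ("allow", "/past-six-days$"),
    ("disallow", "/past-six-days/*"),
    ("disallow", "/search?*"),
]

print(is_allowed("/past-six-days/", RULES))
print(is_allowed("/past-six-days/2024-09-18/news", RULES))
```

Under these assumptions, the landing page /past-six-days/ stays crawlable while everything beneath it is blocked, which appears to be the intent of pairing the two Allow lines with the wildcard Disallow.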

newsnow

Rule Path
Disallow /

omgili

Rule Path
Disallow /

webvac

Rule Path
Disallow /

webzip

Rule Path
Disallow /

psbot

Rule Path
Disallow /

ia_archiver

Rule Path
Disallow /

meltwater

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

piplbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

anthropic-aibytespider

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

news-please

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

perplexitybot

Rule Path
Disallow /
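
How a crawler picks which of the groups above applies to it can be sketched as follows. This is a minimal illustration, assuming case-insensitive, exact matching of the crawler's product token with a fallback to the `*` group, in the spirit of RFC 9309. The `GROUPS` mapping is an abbreviated, hand-copied subset of this report, and `select_group` is a hypothetical helper, not part of any library.

```python
# Abbreviated subset of the groups in this report: token -> disallowed paths.
GROUPS = {
    "*": ["/search?*", "/goto"],
    "ccbot": ["/"],
    "claudebot": ["/"],
    "anthropic-aibytespider": ["/"],
}

def select_group(product_token: str, groups: dict[str, list[str]]) -> str:
    """Return the key of the group that applies to a crawler's product token.

    Assumption: tokens are compared case-insensitively and must match exactly;
    a crawler with no group of its own falls back to "*".
    """
    token = product_token.lower()
    return token if token in groups else "*"

print(select_group("ClaudeBot", GROUPS))
print(select_group("Googlebot", GROUPS))
print(select_group("Bytespider", GROUPS))
```

Under exact-token matching, a crawler identifying as Bytespider would not match the fused `anthropic-aibytespider` entry and would fall back to the `*` group; parsers that use substring matching may behave differently.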

Other Records

Field Value
sitemap https://www.thetimes.com/sitemaps/sitemap.xml

Comments

  • This is the robots.txt file for thetimes.com
  • The Times does not permit the unlicensed use of our content for large language models. Contact enquiries@newslicensing.com for assistance
  • Agent Specific Disallowed Sections