thetimes.com
robots.txt

Robots Exclusion Standard data for thetimes.com

Resource Scan

Scan Details

Site Domain thetimes.com
Base Domain thetimes.com
Scan Status Ok
Last Scan 2024-06-30T23:44:43+00:00
Next Scan 2024-07-07T23:44:43+00:00

Last Scan

Scanned 2024-06-30T23:44:43+00:00
URL https://thetimes.com/robots.txt
Redirect https://www.thetimes.com/robots.txt
Redirect Domain www.thetimes.com
Redirect Base thetimes.com
Domain IPs 34.240.28.43, 52.208.17.106, 54.76.240.177
Redirect IPs 108.157.254.125, 108.157.254.4, 108.157.254.69, 108.157.254.93
Response IP 108.157.254.125
Found Yes
Hash ccd2b234a8c7146e9763c15eb43e20b4ebaf74e53603e514e09b3fc768c96467
SimHash 3d50194b6fc6
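
The Hash value above is 64 hexadecimal digits, the length of a SHA-256 digest. The report does not name the algorithm or say exactly which bytes are hashed, so the following is only a sketch of how such a fingerprint could be used to detect changes between scans, assuming SHA-256 over the raw response body. The names `fingerprint` and `has_changed` are hypothetical.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    """Return a hex digest of a robots.txt body (assumption: SHA-256 over raw bytes)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(previous_hash: str, body: bytes) -> bool:
    """Compare a stored digest against a freshly fetched body."""
    return fingerprint(body) != previous_hash


# Illustrative bodies, not the real file contents.
old_body = b"User-agent: *\nDisallow: /goto\n"
new_body = b"User-agent: *\nDisallow: /goto\nDisallow: /search?*\n"

stored = fingerprint(old_body)
print(has_changed(stored, old_body))  # False
print(has_changed(stored, new_body))  # True
```

An exact digest only reports that something changed. The shorter SimHash field is presumably a similarity hash for judging how much changed, but the report does not describe how it is computed.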

Groups

*

Rule Path
Disallow /login.thetimes.com/user/logout
Disallow /feeds.thetimes.com/puzzles/
Disallow /feeds.thetimes.com/timescrossword/
Disallow /archive/page/*
Disallow /archive/article/*
Disallow /*?s=*
Disallow /*%26s%3D*
Disallow /*?p=*
Disallow /*?filter=*
Allow /past-six-days/$
Allow /past-six-days$
Disallow /past-six-days/*
Disallow /topic/bbc
Disallow /tto/*
Disallow /player/brightcove/
Disallow /my-articles
Disallow /my-articles/
Disallow /edition/null/
Disallow /goto
Disallow /?region=
Disallow /?_ga
Disallow /?CMP
Disallow /?ExternalDataReference
Disallow /article/category/
Disallow /article/this-article-has-been-deleted*
Disallow /article/this-article-has-been-removed*
Disallow /article/this-article-is-no-longer-available*
Disallow /search?*
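
The rules in this group mix plain prefixes with `*` wildcards and `$` end anchors, and the `/past-six-days` entries only make sense once precedence is applied. A minimal sketch of the longest-match rule from RFC 9309 follows, using a handful of the rules listed above. The helper names `pattern_to_regex` and `is_allowed` are hypothetical, and individual crawlers may resolve edge cases differently.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text, then let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Longest matching pattern wins; on a tie, Allow beats Disallow."""
    best = None  # (pattern length, is_allow)
    for verb, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), verb == "Allow")
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the path is allowed.
    return True if best is None else best[1]


# A subset of the '*' group rules from the listing above.
RULES = [
    ("Allow", "/past-six-days/$"),
    ("Allow", "/past-six-days$"),
    ("Disallow", "/past-six-days/*"),
    ("Disallow", "/*?s=*"),
    ("Disallow", "/search?*"),
    ("Disallow", "/goto"),
]

print(is_allowed(RULES, "/past-six-days/"))                  # True
print(is_allowed(RULES, "/past-six-days/2024-06-29/front"))  # False
print(is_allowed(RULES, "/article/foo?s=bar"))               # False
print(is_allowed(RULES, "/article/some-story"))              # True
```

For `/past-six-days/`, the patterns `/past-six-days/$` and `/past-six-days/*` both match and have the same length, so the tie-break in favour of Allow is what keeps the index page crawlable while everything beneath it stays blocked.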

newsnow

Rule Path
Disallow /

omgili

Rule Path
Disallow /

webvac

Rule Path
Disallow /

webzip

Rule Path
Disallow /

psbot

Rule Path
Disallow /

ia_archiver

Rule Path
Disallow /

meltwater

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

piplbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

anthropic-aibytespider

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

magpie-crawler

Rule Path
Disallow /

news-please

Rule Path
Disallow /

facebookbot

Rule Path
Disallow /
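
Each agent-specific group above carries a single `Disallow /`, which blocks the named crawler from the whole site, while any other crawler falls back to the `*` group. A short sketch with Python's standard `urllib.robotparser` shows that behaviour on a hand-written excerpt. The excerpt is reconstructed from the listing rather than copied from the live file, and `ExampleBot` is a made-up agent name.

```python
from urllib.robotparser import RobotFileParser

# Hand-written excerpt based on the groups listed above (not the full file).
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /goto
Disallow: /my-articles

User-agent: ccbot
Disallow: /

User-agent: claudebot
Disallow: /

User-agent: google-extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

BASE = "https://www.thetimes.com"

# Named agents are blocked everywhere.
print(parser.can_fetch("CCBot/2.0", BASE + "/"))            # False
print(parser.can_fetch("ClaudeBot", BASE + "/article/x"))   # False

# An unlisted agent (hypothetical) falls back to the '*' group.
print(parser.can_fetch("ExampleBot", BASE + "/article/x"))  # True
print(parser.can_fetch("ExampleBot", BASE + "/goto"))       # False
```

The standard-library parser treats rule paths as plain prefixes, so it is suitable for the `Disallow /` groups but would not interpret the `*` and `$` patterns in the `*` group the way wildcard-aware crawlers do.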

Other Records

Field Value
sitemap https://www.thetimes.com/sitemaps/sitemap.xml
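
The sitemap record is independent of any user-agent group. As a small sketch, the standard `urllib.robotparser` module (Python 3.8 or later) can read it back from a robots.txt body. The two-line body below is an illustrative stand-in for the real file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative body containing only the sitemap record shown above.
body = """\
User-agent: *
Sitemap: https://www.thetimes.com/sitemaps/sitemap.xml
"""

sitemap_parser = RobotFileParser()
sitemap_parser.parse(body.splitlines())

# site_maps() returns a list of sitemap URLs, or None if there are none.
print(sitemap_parser.site_maps())
# ['https://www.thetimes.com/sitemaps/sitemap.xml']
```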

Comments

  • This is the robots.txt file for thetimes.com
  • The Times does not permit the unlicensed use of our content for large language models. Contact enquiries@newslicensing.com for assistance
  • Agent Specific Disallowed Sections