newscientist.com
robots.txt

Robots Exclusion Standard data for newscientist.com

Resource Scan

Scan Details

Site Domain newscientist.com
Base Domain newscientist.com
Scan Status Ok
Last Scan 2024-10-28T06:29:02+00:00
Next Scan 2024-11-04T06:29:02+00:00

Last Scan

Scanned 2024-10-28T06:29:02+00:00
URL https://newscientist.com/robots.txt
Redirect https://www.newscientist.com/robots.txt
Redirect Domain www.newscientist.com
Redirect Base newscientist.com
Domain IPs 199.232.26.217
Redirect IPs 151.101.130.217, 151.101.194.217, 151.101.2.217, 151.101.66.217
Response IP 199.232.46.217
Found Yes
Hash 875e5785bb5f26cdc17a70dbaa228b635dc8df5e67274d1fd9bf6361d951f4c7
SimHash ed6ddc20b481

Groups

piplbot

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

claude-web

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

perplexity-ai

Rule Path
Disallow /

seekr

Rule Path
Disallow /

meltwater

Rule Path
Disallow /

diffbot

Rule Path
Disallow /

*

Rule Path
Disallow /21632812681/
Disallow /activate-subscription/
Disallow /feed/
Disallow /login/
Disallow /logout/
Disallow /lost-password/
Disallow /my-account/
Disallow /registration/
Disallow /search/
Disallow /upfront/
Disallow /wp-admin/
Disallow /api/
Disallow /build/
Disallow /nsj/jobsjson/
Disallow /nsj/logon/
Disallow /nsj/newalert/
Disallow /nsj/analytics/
Disallow /nsj/apply-profile/
Disallow /nsj/document/
Disallow /nsj/emailjob/
Disallow /nsj/external-redirect-registration/
Disallow /nsj/invalid-request/
Disallow /nsj/jbequicksignup/
Disallow /nsj/jobsrss/
Disallow /nsj/previewjob/
Disallow /nsj/searchjobs/
Disallow /nsj/session-img/
Disallow /nsj/your-jobs/
Disallow /nsj/profile/
Disallow /nsj/remindme/
Disallow /nsj/searchjobs/
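
The groups above can be checked programmatically. What follows is a minimal sketch, not the scanner's own method: it feeds a short excerpt, rebuilt from the table above rather than copied from the raw file, into Python's standard urllib.robotparser. The agent name "examplebot" and the helper names build_parser and is_allowed are hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Excerpt rebuilt from the Groups table above. This is an assumption about
# how the raw file is laid out, not a copy of the file itself.
ROBOTS_EXCERPT = """\
User-agent: gptbot
Disallow: /

User-agent: claude-web
Disallow: /

User-agent: *
Disallow: /login/
Disallow: /search/
Disallow: /api/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, agent: str, url: str) -> bool:
    """Return True if the parsed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_EXCERPT)
    home = "https://www.newscientist.com/"
    search = "https://www.newscientist.com/search/"
    # "examplebot" is a hypothetical agent that falls through to the * group.
    for agent, url in [("gptbot", home), ("examplebot", search), ("examplebot", home)]:
        print(agent, url, is_allowed(parser, agent, url))
```

Under these rules a named agent such as gptbot is blocked from every path, while an agent with no group of its own is blocked only from the paths listed under *.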

Other Records

Field Value
crawl-delay 10
sitemap https://www.newscientist.com/sitemap.xml
sitemap https://www.newscientist.com/nsj/sitemapindex.xml
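
The crawl-delay and sitemap records can be read with the same standard-library parser. This is a sketch under one loudly stated assumption: the scan lists crawl-delay under "Other Records" and does not say which group it belongs to, so the excerpt below places it in the * group purely for illustration. The agent name "examplebot" and the helper read_extras are hypothetical. RobotFileParser.site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: the scan does not show where Crawl-delay sits in the raw file.
# It is attached to the * group here only so the parser can associate it
# with an agent.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /search/
Crawl-delay: 10

Sitemap: https://www.newscientist.com/sitemap.xml
Sitemap: https://www.newscientist.com/nsj/sitemapindex.xml
"""


def read_extras(text: str, agent: str):
    """Return (crawl delay in seconds or None, list of sitemap URLs or None)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser.crawl_delay(agent), parser.site_maps()


if __name__ == "__main__":
    # "examplebot" is a hypothetical agent matched by the * group.
    delay, sitemaps = read_extras(ROBOTS_EXCERPT, "examplebot")
    print("crawl delay:", delay)
    for url in sitemaps or []:
        print("sitemap:", url)
```

A polite crawler would wait at least the returned number of seconds between requests and would use the sitemap URLs as its starting points for discovery.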