Robots Exclusion Standard data for newscientist.com
Resource Scan
Scan Details
Field | Value |
---|---|
Site Domain | newscientist.com |
Base Domain | newscientist.com |
Scan Status | Ok |
Last Scan | 2024-11-11T12:24:58+00:00 |
Next Scan | 2024-11-18T12:24:58+00:00 |
Last Scan
Field | Value |
---|---|
Scanned | 2024-11-11T12:24:58+00:00 |
URL | https://newscientist.com/robots.txt |
Redirect | https://www.newscientist.com/robots.txt |
Redirect Domain | www.newscientist.com |
Redirect Base | newscientist.com |
Domain IPs | 199.232.26.217 |
Redirect IPs | 151.101.130.217, 151.101.194.217, 151.101.2.217, 151.101.66.217 |
Response IP | 199.232.46.217 |
Found | Yes |
Hash | 3bcb5c0a0c6903733e5244e98aa2a056ecfd370d9a05ad521d862c49e262b517 |
SimHash | 6c6dd8018585 |
Groups
ccbot
ai2bot
ai2bot-dolma
amazonbot
anthropic-ai
applebot
applebot-extended
bytespider
ccbot
chatgpt-user
claude-web
claudebot
cohere-ai
cohere-ai
diffbot
facebookbot
facebookexternalhit
friendlycrawler
google-extended
googleother
googleother-image
googleother-video
gptbot
iaskspider/2.0
icc-crawler
imagesiftbot
img2dataset
isscyberriskcrawler
kangaroo bot
meltwater
meta-externalagent
meta-externalfetcher
oai-searchbot
omgili
omgilibot
piplbot
perplexity-ai
perplexitybot
petalbot
scrapy
seekr
sidetrade indexer bot
timpibot
velenpublicwebcrawler
webzio-extended
youbot
Rule | Path |
---|---|
Disallow | / |
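As a rough illustration of how a crawler might interpret this group, the sketch below feeds an abridged reconstruction of it to Python's standard `urllib.robotparser`. The robots.txt text is rebuilt from the scan data on this page (only three of the listed user agents are included), so its exact layout and capitalisation are assumptions rather than a copy of the live file.

```python
from urllib.robotparser import RobotFileParser

# Abridged reconstruction of the blocked group: three of the listed
# user agents sharing a single "Disallow: /" rule. The exact layout of
# the live file is an assumption.
ROBOTS_TXT = """\
User-agent: CCBot
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
site = "https://www.newscientist.com/"

# Agents named in the group are denied everything under "/".
gptbot_allowed = parser.can_fetch("GPTBot", site)
claudebot_allowed = parser.can_fetch("ClaudeBot", site)
print(gptbot_allowed, claudebot_allowed)
```

On the real site, an agent that is not named in this group would fall through to the `*` group rather than being allowed everywhere.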
*
Rule | Path |
---|---|
Disallow | /21632812681/ |
Disallow | /activate-subscription/ |
Disallow | /feed/ |
Disallow | /login/ |
Disallow | /logout/ |
Disallow | /lost-password/ |
Disallow | /my-account/ |
Disallow | /registration/ |
Disallow | /search/ |
Disallow | /upfront/ |
Disallow | /wp-admin/ |
Disallow | /api/ |
Disallow | /build/ |
Disallow | /nsj/jobsjson/ |
Disallow | /nsj/logon/ |
Disallow | /nsj/newalert/ |
Disallow | /nsj/analytics/ |
Disallow | /nsj/apply-profile/ |
Disallow | /nsj/document/ |
Disallow | /nsj/emailjob/ |
Disallow | /nsj/external-redirect-registration/ |
Disallow | /nsj/invalid-request/ |
Disallow | /nsj/jbequicksignup/ |
Disallow | /nsj/jobsrss/ |
Disallow | /nsj/previewjob/ |
Disallow | /nsj/searchjobs/ |
Disallow | /nsj/session-img/ |
Disallow | /nsj/your-jobs/ |
Disallow | /nsj/profile/ |
Disallow | /nsj/remindme/ |
Disallow | /nsj/searchjobs/ |
Other Records
Field | Value |
---|---|
crawl-delay | 10 |
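The `*` group can be checked the same way. This is a minimal sketch using Python's standard `urllib.robotparser` with a handful of the paths listed for `*`; it assumes the `crawl-delay` record belongs to the `*` group, which the order of the tables suggests but does not prove. `ExampleCrawler` is a hypothetical agent name used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# A few of the paths from the "*" table, plus the crawl-delay record.
# Placing crawl-delay inside this group is an assumption based on the
# order in which the scan reports its tables.
WILDCARD_GROUP = """\
User-agent: *
Disallow: /login/
Disallow: /search/
Disallow: /wp-admin/
Disallow: /nsj/searchjobs/
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(WILDCARD_GROUP.splitlines())

AGENT = "ExampleCrawler"  # hypothetical agent name, not from the scan
base = "https://www.newscientist.com"

# Paths are matched by prefix, so the site root stays fetchable while
# the listed directories are blocked.
home_allowed = rp.can_fetch(AGENT, base + "/")
search_allowed = rp.can_fetch(AGENT, base + "/search/")
jobs_search_allowed = rp.can_fetch(AGENT, base + "/nsj/searchjobs/")

# Seconds a polite crawler would wait between requests.
delay = rp.crawl_delay(AGENT)
print(home_allowed, search_allowed, jobs_search_allowed, delay)
```

`Crawl-delay` is a widely recognised extension rather than part of the core Robots Exclusion Protocol, so not every crawler honours it.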
Other Records
Field | Value |
---|---|
sitemap | https://www.newscientist.com/sitemap.xml |
sitemap | https://www.newscientist.com/nsj/sitemapindex.xml |
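The sitemap records can be read programmatically as well. The sketch below assumes they appear in the file as plain `Sitemap:` lines and uses `RobotFileParser.site_maps()`, which is available in Python 3.8 and later.

```python
from urllib.robotparser import RobotFileParser

# The two sitemap records from the scan, written as robots.txt lines.
SITEMAP_RECORDS = """\
Sitemap: https://www.newscientist.com/sitemap.xml
Sitemap: https://www.newscientist.com/nsj/sitemapindex.xml
"""

sm_parser = RobotFileParser()
sm_parser.parse(SITEMAP_RECORDS.splitlines())

# site_maps() returns None when no Sitemap lines are present,
# so fall back to an empty list.
sitemaps = sm_parser.site_maps() or []
for url in sitemaps:
    print(url)
```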