cn.wsj.com
robots.txt

Robots Exclusion Standard data for cn.wsj.com

Resource Scan

Scan Details

Site Domain cn.wsj.com
Base Domain wsj.com
Scan Status Ok
Last Scan 2024-04-17T16:07:15+00:00
Next Scan 2024-05-17T16:07:15+00:00

Last Scan

Scanned 2024-04-17T16:07:15+00:00
URL https://cn.wsj.com/robots.txt
Domain IPs 18.155.68.40, 18.155.68.44, 18.155.68.70, 18.155.68.71, 2600:9000:2200:2e00:3:bbf5:9440:93a1, 2600:9000:2200:4a00:3:bbf5:9440:93a1, 2600:9000:2200:7600:3:bbf5:9440:93a1, 2600:9000:2200:8400:3:bbf5:9440:93a1, 2600:9000:2200:9400:3:bbf5:9440:93a1, 2600:9000:2200:c00:3:bbf5:9440:93a1, 2600:9000:2200:e800:3:bbf5:9440:93a1, 2600:9000:2200:ec00:3:bbf5:9440:93a1
Response IP 18.155.68.70
Found Yes
Hash 67e0c50a85c84cdfd170a0f3136c377231519291dad04da5bef28b101228daa6
SimHash 68a960fa21b1
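The Hash field is 64 hexadecimal digits long. Assuming it is a SHA-256 digest of the raw robots.txt response body (the report does not say which algorithm or input it uses), a saved copy of the file could be checked against it as sketched below. `matches_reported_hash` is a hypothetical helper, not part of any scanner's API.

```python
import hashlib

# Copied from the scan report. Treating it as SHA-256 over the raw response
# body is an assumption based only on its length (64 hex digits = 256 bits).
REPORTED_HASH = "67e0c50a85c84cdfd170a0f3136c377231519291dad04da5bef28b101228daa6"


def matches_reported_hash(body: bytes, expected_hex: str = REPORTED_HASH) -> bool:
    """Hypothetical helper: compare the SHA-256 of `body` with a hex digest."""
    return hashlib.sha256(body).hexdigest() == expected_hex.lower()


# Usage, assuming robots.txt was saved locally byte for byte:
# with open("robots.txt", "rb") as f:
#     print(matches_reported_hash(f.read()))
```

If the comparison fails, the file may simply have changed since the scan, or the scanner may hash a normalized form of the body rather than the raw bytes.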

Groups

*
*

Rule Path
Disallow /article_email/
Disallow /article_print/
Disallow /PA2VJBNA4R/
Disallow /home/
Disallow /advanced_search/
Disallow /login/
Disallow /acct/
Disallow /msgcenter/
Disallow /setup/
Disallow /marketing/
Disallow /public/article/
Disallow /search/
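The wildcard group above applies to any crawler that has no group of its own. As a rough sketch, assuming the group can be rebuilt as standard `User-agent`/`Disallow` lines (this report shows parsed rules, not the raw file), Python's standard-library `urllib.robotparser` can evaluate it offline. `ExampleBot` and the `/articles/example-slug` path are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of the "*" group from the scan report; the raw
# robots.txt may differ in ordering, casing, or comments.
WILDCARD_GROUP = """\
User-agent: *
Disallow: /article_email/
Disallow: /article_print/
Disallow: /PA2VJBNA4R/
Disallow: /home/
Disallow: /advanced_search/
Disallow: /login/
Disallow: /acct/
Disallow: /msgcenter/
Disallow: /setup/
Disallow: /marketing/
Disallow: /public/article/
Disallow: /search/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rp = build_parser(WILDCARD_GROUP)

# "ExampleBot" matches no named group, so the "*" group decides.
for path in ["/home/", "/search/", "/articles/example-slug"]:
    print(path, rp.can_fetch("ExampleBot", "https://cn.wsj.com" + path))
# /home/ False
# /search/ False
# /articles/example-slug True
```

`RobotFileParser` treats each Disallow value as a plain path prefix, so `/home/` blocks everything beneath that directory but not a path such as `/homepage`.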

msnptc/1.0

Rule Path
Disallow /article_email/
Disallow /article_print/
Disallow /PA2VJBNA4R/
Disallow /advanced_search/
Disallow /login/
Disallow /acct/
Disallow /msgcenter/
Disallow /setup/
Disallow /marketing/
Disallow /public/article/
Disallow /search/
Disallow /static_html_files/
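The msnptc/1.0 group differs from the wildcard group in two places: it omits /home/ and adds /static_html_files/. Under RFC 9309, a crawler obeys the group that names it and falls back to `*` only when no group matches, so the `*` rules are not added on top of a named group. The minimal sketch below models that selection with plain prefix matching. `GROUPS`, `select_group`, and `is_allowed` are hypothetical names, and wildcards and Allow rules are deliberately left out.

```python
# Simplified model of two groups from the report. Rules are plain path
# prefixes; "*"/"$" wildcards and Allow lines are not modeled.
GROUPS: dict[str, list[str]] = {
    "*": ["/article_email/", "/article_print/", "/PA2VJBNA4R/", "/home/",
          "/advanced_search/", "/login/", "/acct/", "/msgcenter/", "/setup/",
          "/marketing/", "/public/article/", "/search/"],
    "msnptc/1.0": ["/article_email/", "/article_print/", "/PA2VJBNA4R/",
                   "/advanced_search/", "/login/", "/acct/", "/msgcenter/",
                   "/setup/", "/marketing/", "/public/article/", "/search/",
                   "/static_html_files/"],
}


def select_group(user_agent: str) -> list[str]:
    """Return the group named after this agent (case-insensitive), else '*'.

    The '*' rules are not combined with a named group: the named group
    replaces them entirely for that agent.
    """
    return GROUPS.get(user_agent.lower(), GROUPS["*"])


def is_allowed(user_agent: str, path: str) -> bool:
    """Allowed unless a Disallow prefix in the selected group matches."""
    return not any(path.startswith(prefix) for prefix in select_group(user_agent))


for agent in ["msnptc/1.0", "ExampleBot"]:
    print(agent, is_allowed(agent, "/home/"),
          is_allowed(agent, "/static_html_files/x.html"))
# msnptc/1.0 True False
# ExampleBot False True
```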

ccbot

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

chatgpt-user

Rule Path
Disallow /

anthropic-ai

Rule Path
Disallow /

cohere-ai

Rule Path
Disallow /

ia_archiver

Rule Path
Disallow /

omgili

Rule Path
Disallow /

omgilibot

Rule Path
Disallow /

mj12bot

Rule Path
Disallow /

piplbot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /
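Each group from ccbot through google-extended disallows `/`, which blocks the named agent from the entire site while other crawlers stay on the wildcard rules. A sketch with `urllib.robotparser`, using a hypothetical excerpt that keeps only three of these groups and a trimmed `*` group; the article URL is made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt: three of the full-site blocks from the report plus a
# trimmed "*" group. The real file contains more groups and rules.
ROBOTS_EXCERPT = """\
User-agent: *
Disallow: /search/

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_EXCERPT.splitlines())

article = "https://cn.wsj.com/articles/example-slug"  # hypothetical URL
for agent in ["GPTBot", "CCBot", "Google-Extended", "ExampleBot"]:
    print(agent, rp.can_fetch(agent, article))
# GPTBot False
# CCBot False
# Google-Extended False
# ExampleBot True
```

`urllib.robotparser` checks the named groups in file order and consults the `*` group only when none of them applies to the agent.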

Other Records

Field Value
sitemap https://cn.wsj.com/sitemap.xml
sitemap https://cn.wsj.com/sitemaps/web/wsj-cn/zh-cn/sitemap_wsj-cn_zh-cn_index.xml
sitemap https://cn.wsj.com/sitemaps/web/wsj-cn/zh-hant/sitemap_wsj-cn_zh-hant_index.xml
sitemap https://cn.wsj.com/wsj_cn_google_news.xml
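Sitemap records sit outside any user-agent group and apply to the file as a whole. `RobotFileParser.site_maps()` (available since Python 3.8) collects them. The lines below are a hypothetical reconstruction built from the four URLs in the table plus a trimmed `*` group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction: a trimmed "*" group plus the four Sitemap
# records listed in the report.
LINES = [
    "User-agent: *",
    "Disallow: /search/",
    "",
    "Sitemap: https://cn.wsj.com/sitemap.xml",
    "Sitemap: https://cn.wsj.com/sitemaps/web/wsj-cn/zh-cn/sitemap_wsj-cn_zh-cn_index.xml",
    "Sitemap: https://cn.wsj.com/sitemaps/web/wsj-cn/zh-hant/sitemap_wsj-cn_zh-hant_index.xml",
    "Sitemap: https://cn.wsj.com/wsj_cn_google_news.xml",
]

rp = RobotFileParser()
rp.parse(LINES)

# site_maps() returns the Sitemap URLs in file order, or None if there are none.
for url in rp.site_maps() or []:
    print(url)
```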

Comments

  • cn.wsj.com/robots.txt content here

Warnings

  • `acap-crawler` is not a known field.
  • `acap-disallow-crawl` is not a known field.
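The two warnings refer to lines whose field names the scanner does not recognize. The `acap-` prefix suggests the Automated Content Access Protocol (ACAP), a publisher-backed robots.txt extension, though the report does not say so. Parsers that implement only the standard fields skip such lines. The sketch below uses `urllib.robotparser`, which ignores unrecognized fields; the ACAP values are hypothetical, since the report gives only the field names.

```python
from urllib.robotparser import RobotFileParser

# The acap-* values are made up; only the field names appear in the report.
LINES = [
    "User-agent: *",
    "Disallow: /search/",
    "ACAP-crawler: *",
    "ACAP-disallow-crawl: /example-path/",
]

rp = RobotFileParser()
rp.parse(LINES)

# The unrecognized ACAP lines have no effect, so /example-path/ stays
# fetchable, while the standard Disallow rule still applies.
print(rp.can_fetch("ExampleBot", "https://cn.wsj.com/example-path/"))  # True
print(rp.can_fetch("ExampleBot", "https://cn.wsj.com/search/"))        # False
```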