Robots Exclusion Standard data for cn.wsj.com
Resource Scan
Scan Details
Field | Value |
---|---|
Site Domain | cn.wsj.com |
Base Domain | wsj.com |
Scan Status | Ok |
Last Scan | 2024-04-17T16:07:15+00:00 |
Next Scan | 2024-05-17T16:07:15+00:00 |
Last Scan
Field | Value |
---|---|
Scanned | 2024-04-17T16:07:15+00:00 |
URL | https://cn.wsj.com/robots.txt |
Domain IPs | 18.155.68.40, 18.155.68.44, 18.155.68.70, 18.155.68.71, 2600:9000:2200:2e00:3:bbf5:9440:93a1, 2600:9000:2200:4a00:3:bbf5:9440:93a1, 2600:9000:2200:7600:3:bbf5:9440:93a1, 2600:9000:2200:8400:3:bbf5:9440:93a1, 2600:9000:2200:9400:3:bbf5:9440:93a1, 2600:9000:2200:c00:3:bbf5:9440:93a1, 2600:9000:2200:e800:3:bbf5:9440:93a1, 2600:9000:2200:ec00:3:bbf5:9440:93a1 |
Response IP | 18.155.68.70 |
Found | Yes |
Hash | 67e0c50a85c84cdfd170a0f3136c377231519291dad04da5bef28b101228daa6 |
SimHash | 68a960fa21b1 |
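The report does not say how the Hash value is computed. Its 64 hexadecimal digits are consistent with SHA-256, so the sketch below assumes it is a SHA-256 digest of the raw robots.txt bytes. The helper names `robots_sha256` and `matches_report` are hypothetical.

```python
import hashlib

# Hash value copied from the scan report above.
REPORTED_HASH = "67e0c50a85c84cdfd170a0f3136c377231519291dad04da5bef28b101228daa6"


def robots_sha256(body: bytes) -> str:
    """Return the hex SHA-256 digest of a raw robots.txt body.

    Assumption: the report's Hash field is SHA-256 over the exact bytes
    served at /robots.txt; the scanner's actual method is not documented.
    """
    return hashlib.sha256(body).hexdigest()


def matches_report(body: bytes, reported: str) -> bool:
    # Lower-case the reported digest so upper- and lower-case hex compare equal.
    return robots_sha256(body) == reported.lower()
```

To check a live copy, pass the bytes from `urllib.request.urlopen("https://cn.wsj.com/robots.txt").read()` as `body`. A mismatch would mean either that the file has changed since the scan or that the SHA-256 assumption is wrong.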
Groups
User-agent: *
Rule | Path |
---|---|
Disallow | /article_email/ |
Disallow | /article_print/ |
Disallow | /PA2VJBNA4R/ |
Disallow | /home/ |
Disallow | /advanced_search/ |
Disallow | /login/ |
Disallow | /acct/ |
Disallow | /msgcenter/ |
Disallow | /setup/ |
Disallow | /marketing/ |
Disallow | /public/article/ |
Disallow | /search/ |
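The Disallow rules above apply to every crawler (`*`). As a rough sketch, Python's standard `urllib.robotparser` can be fed a reconstruction of this group to check individual URLs. The reconstructed lines and the user-agent string `ExampleBot` are assumptions for illustration, not the verbatim file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical reconstruction of the "*" group from the table above;
# the live file may contain further lines (e.g. the acap-* fields).
DISALLOWED = [
    "/article_email/", "/article_print/", "/PA2VJBNA4R/", "/home/",
    "/advanced_search/", "/login/", "/acct/", "/msgcenter/", "/setup/",
    "/marketing/", "/public/article/", "/search/",
]


def build_parser(paths):
    """Build a parser holding one 'User-agent: *' group with the given Disallow paths."""
    lines = ["User-agent: *"] + [f"Disallow: {p}" for p in paths]
    rp = RobotFileParser()
    rp.parse(lines)  # parse() also marks the rules as freshly loaded
    return rp


parser = build_parser(DISALLOWED)

# "ExampleBot" is a made-up agent name; it falls back to the "*" group.
for url in (
    "https://cn.wsj.com/search/?query=test",
    "https://cn.wsj.com/home/",
    "https://cn.wsj.com/home",
    "https://cn.wsj.com/articles/example",
):
    print(url, parser.can_fetch("ExampleBot", url))
```

Matching is a plain path-prefix test, so `/home/` is blocked while `/home` is not. `urllib.robotparser` applies the first matching rule in file order rather than the longest match described in RFC 9309. With Disallow-only rules, as here, both approaches give the same answer.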
Other Records
Field | Value |
---|---|
sitemap | https://cn.wsj.com/sitemap.xml |
sitemap | https://cn.wsj.com/sitemaps/web/wsj-cn/zh-cn/sitemap_wsj-cn_zh-cn_index.xml |
sitemap | https://cn.wsj.com/sitemaps/web/wsj-cn/zh-hant/sitemap_wsj-cn_zh-hant_index.xml |
sitemap | https://cn.wsj.com/wsj_cn_google_news.xml |
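Sitemap records are independent of any User-agent group and can appear anywhere in the file. The following minimal sketch shows how a parser collects them. It assumes Python 3.8 or later, where `RobotFileParser.site_maps()` was added, and uses lines reconstructed from the table above rather than the verbatim file.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the "Other Records" table; the real file may place
# these lines elsewhere, which does not change how they are collected.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /search/",
    "Sitemap: https://cn.wsj.com/sitemap.xml",
    "Sitemap: https://cn.wsj.com/sitemaps/web/wsj-cn/zh-cn/sitemap_wsj-cn_zh-cn_index.xml",
    "Sitemap: https://cn.wsj.com/sitemaps/web/wsj-cn/zh-hant/sitemap_wsj-cn_zh-hant_index.xml",
    "Sitemap: https://cn.wsj.com/wsj_cn_google_news.xml",
]

sitemap_parser = RobotFileParser()
sitemap_parser.parse(ROBOTS_LINES)

# site_maps() returns the Sitemap URLs in file order, or None if there are none.
for url in sitemap_parser.site_maps() or []:
    print(url)
```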
Warnings
- `acap-crawler` is not a known field.
- `acap-disallow-crawl` is not a known field.
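These warnings refer to ACAP (Automated Content Access Protocol) extension fields, which are not part of the Robots Exclusion Standard. A hypothetical lint check in the same spirit might look like the sketch below. The set of known fields and the sample values for the `acap-*` lines are assumptions, since the report shows neither.

```python
# Assumed set of recognized fields; the scanner's actual list is not shown.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def unknown_fields(text: str) -> list:
    """Return unrecognized field names, lower-cased, in order of first appearance."""
    found = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue  # blank or malformed line, no field to check
        field = line.split(":", 1)[0].strip().lower()  # field names are case-insensitive
        if field not in KNOWN_FIELDS and field not in found:
            found.append(field)
    return found


# Hypothetical sample; the acap-* values are placeholders, not the real file.
SAMPLE = (
    "User-agent: *\n"
    "ACAP-crawler: *\n"
    "ACAP-disallow-crawl: /search/\n"
    "Disallow: /search/\n"
    "Sitemap: https://cn.wsj.com/sitemap.xml\n"
)

print(unknown_fields(SAMPLE))
```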
Comments