Robots Exclusion Standard data for cn.wsj.com
Resource Scan
Scan Details
Field | Value |
---|---|
Site Domain | cn.wsj.com |
Base Domain | wsj.com |
Scan Status | Ok |
Last Scan | 2024-05-17T16:10:16+00:00 |
Next Scan | 2024-06-16T16:10:16+00:00 |
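The Last Scan and Next Scan timestamps are exactly 30 days apart. Treating that as a fixed rescan interval is an assumption drawn from this single pair of values. Python's `datetime` confirms the arithmetic:

```python
from datetime import datetime, timedelta

# Timestamps copied from the Scan Details table.
last_scan = datetime.fromisoformat("2024-05-17T16:10:16+00:00")
next_scan = datetime.fromisoformat("2024-06-16T16:10:16+00:00")

interval = next_scan - last_scan
print(interval)                           # 30 days, 0:00:00
print(interval == timedelta(days=30))     # True
```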
Last Scan
Field | Value |
---|---|
Scanned | 2024-05-17T16:10:16+00:00 |
URL | https://cn.wsj.com/robots.txt |
Domain IPs | 18.155.68.40, 18.155.68.44, 18.155.68.70, 18.155.68.71, 2600:9000:23d2:400:3:bbf5:9440:93a1, 2600:9000:23d2:5a00:3:bbf5:9440:93a1, 2600:9000:23d2:6a00:3:bbf5:9440:93a1, 2600:9000:23d2:7c00:3:bbf5:9440:93a1, 2600:9000:23d2:7e00:3:bbf5:9440:93a1, 2600:9000:23d2:a00:3:bbf5:9440:93a1, 2600:9000:23d2:a400:3:bbf5:9440:93a1, 2600:9000:23d2:c200:3:bbf5:9440:93a1 |
Response IP | 18.155.68.40 |
Found | Yes |
Hash | 67e0c50a85c84cdfd170a0f3136c377231519291dad04da5bef28b101228daa6 |
SimHash | 68a960fa21b1 |
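The Hash value has 64 hexadecimal characters, which is consistent with a SHA-256 digest. The report does not say which algorithm it uses or which bytes it hashes (raw body, decoded text, or normalized rules). The sketch below assumes SHA-256 over the raw response body. `sha256_hex` and `matches_recorded_hash` are hypothetical helper names. The SimHash is presumably a similarity fingerprint that changes only slightly for small edits. Its parameters aren't stated, so it isn't reproduced here.

```python
import hashlib
import urllib.request

# Hash value from the Last Scan table (assumed to be SHA-256 of the raw body).
EXPECTED = "67e0c50a85c84cdfd170a0f3136c377231519291dad04da5bef28b101228daa6"

def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()

def matches_recorded_hash(url: str, expected: str = EXPECTED) -> bool:
    """Fetch url and compare its body's SHA-256 with the recorded value."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read()
    return sha256_hex(body) == expected
```

Calling `matches_recorded_hash("https://cn.wsj.com/robots.txt")` requires network access. It returns `False` if the file has changed since the scan or if the assumption about what is hashed is wrong.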
Groups
*
Rule | Path |
---|---|
Disallow | /article_email/ |
Disallow | /article_print/ |
Disallow | /PA2VJBNA4R/ |
Disallow | /home/ |
Disallow | /advanced_search/ |
Disallow | /login/ |
Disallow | /acct/ |
Disallow | /msgcenter/ |
Disallow | /setup/ |
Disallow | /marketing/ |
Disallow | /public/article/ |
Disallow | /search/ |
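The table above is a single group that applies to all user agents (`*`). As a sketch, the group can be rebuilt as robots.txt lines and checked with Python's standard `urllib.robotparser`. The bot name `ExampleBot` and the helper `allowed` are hypothetical. Each rule is a plain path prefix, so `/home/` blocks `/home/` and everything beneath it but not `/homepage`.

```python
from urllib.robotparser import RobotFileParser

# Disallow paths copied from the group for User-agent: *.
DISALLOWED = [
    "/article_email/", "/article_print/", "/PA2VJBNA4R/", "/home/",
    "/advanced_search/", "/login/", "/acct/", "/msgcenter/",
    "/setup/", "/marketing/", "/public/article/", "/search/",
]

# Rebuild the group as robots.txt lines and parse them (no network needed).
lines = ["User-agent: *"] + [f"Disallow: {path}" for path in DISALLOWED]
parser = RobotFileParser()
parser.parse(lines)

def allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if agent may fetch path on cn.wsj.com under these rules."""
    return parser.can_fetch(agent, "https://cn.wsj.com" + path)

print(allowed("/search/?q=x"))  # False
print(allowed("/home/"))        # False
print(allowed("/homepage"))     # True: "/home/" is not a prefix of "/homepage"
```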
Other Records
Field | Value |
---|---|
sitemap | https://cn.wsj.com/sitemap.xml |
sitemap | https://cn.wsj.com/sitemaps/web/wsj-cn/zh-cn/sitemap_wsj-cn_zh-cn_index.xml |
sitemap | https://cn.wsj.com/sitemaps/web/wsj-cn/zh-hant/sitemap_wsj-cn_zh-hant_index.xml |
sitemap | https://cn.wsj.com/wsj_cn_google_news.xml |
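`Sitemap` records are independent of any user-agent group. `urllib.robotparser` collects them separately, and `site_maps()` (Python 3.8+) returns them in file order. The following minimal sketch uses the four URLs listed above:

```python
from urllib.robotparser import RobotFileParser

# Sitemap URLs copied from the Other Records table.
SITEMAPS = [
    "https://cn.wsj.com/sitemap.xml",
    "https://cn.wsj.com/sitemaps/web/wsj-cn/zh-cn/sitemap_wsj-cn_zh-cn_index.xml",
    "https://cn.wsj.com/sitemaps/web/wsj-cn/zh-hant/sitemap_wsj-cn_zh-hant_index.xml",
    "https://cn.wsj.com/wsj_cn_google_news.xml",
]

# Sitemap lines may appear anywhere in the file; no User-agent line is required.
parser = RobotFileParser()
parser.parse([f"Sitemap: {url}" for url in SITEMAPS])

for url in parser.site_maps():
    print(url)
```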
Warnings
- `acap-crawler` is not a known field.
- `acap-disallow-crawl` is not a known field.
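ACAP (Automated Content Access Protocol) fields like these are not part of the Robots Exclusion Standard. Parsers that implement only the standard typically skip them. Python's `urllib.robotparser`, for example, silently ignores any line whose field it doesn't recognize. The sketch below flags such fields in the same way as the warnings above. The `KNOWN_FIELDS` set is an assumption, because scanners differ on which extensions they accept. The field values in the sample are hypothetical placeholders, not the site's actual lines.

```python
# Field names this sketch treats as known (an assumption; scanners differ).
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}

def unknown_fields(robots_txt: str) -> list[str]:
    """Return unrecognized field names, lowercased, in order of first appearance."""
    found: list[str] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if ":" not in line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS and field not in found:
            found.append(field)
    return found

# Hypothetical sample: field names match the warnings, values are placeholders.
sample = """\
User-agent: *
Disallow: /search/
ACAP-crawler: *
ACAP-disallow-crawl: /example/
"""
print(unknown_fields(sample))  # ['acap-crawler', 'acap-disallow-crawl']
```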
Comments