cancer.org
robots.txt
Robots Exclusion Standard data for cancer.org
Resource Scan
Scan Details
Field | Value |
---|---|
Site Domain | cancer.org |
Base Domain | cancer.org |
Scan Status | Ok |
Last Scan | 2024-04-23T22:10:15+00:00 |
Next Scan | 2024-05-23T22:10:15+00:00 |
Last Scan
Field | Value |
---|---|
Scanned | 2024-04-23T22:10:15+00:00 |
URL | https://cancer.org/robots.txt |
Redirect | https://www.cancer.org/robots.txt |
Redirect Domain | www.cancer.org |
Redirect Base | cancer.org |
Domain IPs | 40.71.250.191 |
Redirect IPs | 13.107.213.59, 13.107.246.59, 2620:1ec:46::59, 2620:1ec:bdf::59 |
Response IP | 13.107.246.59 |
Found | Yes |
Hash | 08741fa6637e92c8d240836751cabde2bd10be3ea67ff72691d683ff8a3dbbf9 |
SimHash | 600408884893 |
Groups
*
Rule | Path |
---|---|
Disallow | */content/cancer/en/cancer/understanding-cancer/glossary/glossary/ |
Disallow | */content/cancer/es/cancer/understanding-cancer/glossary/glossary/ |
Disallow | */cancer/understanding-cancer/glossary/glossary/ |
Disallow | */content/cancer/en/ |
Disallow | */content/cancer-staging/ |
Disallow | */content/dam/CRC/ |
Disallow | *?q= |
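
Every path in this group starts with `*`, so each rule can match at any depth of a URL. The following is a minimal sketch of how a crawler might test a URL against these rules, assuming the common wildcard semantics in which `*` matches any run of characters and a trailing `$` anchors the end of the URL. The names `pattern_to_regex` and `is_disallowed` are hypothetical helpers for illustration, not part of any library, and the sketch ignores `Allow` precedence because the group lists no `Allow` rules.

```python
import re

# Disallow paths copied from the "*" group above.
DISALLOW = [
    "*/content/cancer/en/cancer/understanding-cancer/glossary/glossary/",
    "*/content/cancer/es/cancer/understanding-cancer/glossary/glossary/",
    "*/cancer/understanding-cancer/glossary/glossary/",
    "*/content/cancer/en/",
    "*/content/cancer-staging/",
    "*/content/dam/CRC/",
    "*?q=",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any sequence of characters and a
    trailing '$' anchors the match at the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str) -> bool:
    """Return True if any Disallow pattern matches from the start of the URL path."""
    return any(pattern_to_regex(p).match(path_and_query) for p in DISALLOW)
```

Under these assumptions, `is_disallowed("/search?q=lung")` and `is_disallowed("/content/cancer/en/index.html")` are both true, while `is_disallowed("/cancer/types/breast-cancer.html")` is false.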
Other Records
Field | Value |
---|---|
sitemap | https://www.cancer.org/sitemap.xml |
sitemap | https://www.cancer.org/es/sitemap.xml |
sitemap | https://www.cancer.org/old-legacy-redirects-xml-sitemap.xml |
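
As a rough illustration of how the parsed records above relate to the file itself, the sketch below renders them back into robots.txt syntax and reads the sitemap entries with Python's standard `urllib.robotparser`. The layout it produces is an assumption: the original file's ordering, comments, and whitespace are not shown in this scan, so the rendered text would not be expected to reproduce the recorded hash. `render_robots_txt` is a hypothetical helper, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Records copied from the scan above.
DISALLOW_PATHS = [
    "*/content/cancer/en/cancer/understanding-cancer/glossary/glossary/",
    "*/content/cancer/es/cancer/understanding-cancer/glossary/glossary/",
    "*/cancer/understanding-cancer/glossary/glossary/",
    "*/content/cancer/en/",
    "*/content/cancer-staging/",
    "*/content/dam/CRC/",
    "*?q=",
]
SITEMAPS = [
    "https://www.cancer.org/sitemap.xml",
    "https://www.cancer.org/es/sitemap.xml",
    "https://www.cancer.org/old-legacy-redirects-xml-sitemap.xml",
]


def render_robots_txt(user_agent, disallow_paths, sitemaps):
    """Render one user-agent group plus sitemap records as robots.txt text."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow_paths]
    lines.append("")  # blank line closes the group
    lines += [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines) + "\n"


text = render_robots_txt("*", DISALLOW_PATHS, SITEMAPS)

# Feed the rendered lines to the standard-library parser and read back
# the sitemap URLs; no network access is involved.
parser = RobotFileParser()
parser.parse(text.splitlines())
found_sitemaps = parser.site_maps()
```

Here `found_sitemaps` holds the three sitemap URLs in the order listed. The sketch uses the standard-library parser only to read the sitemap records, not to evaluate the wildcard `Disallow` paths.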