cancer.org
robots.txt

Robots Exclusion Standard data for cancer.org

Resource Scan

Scan Details

Site Domain cancer.org
Base Domain cancer.org
Scan Status Ok
Last Scan 2024-04-23T22:10:15+00:00
Next Scan 2024-05-23T22:10:15+00:00

Last Scan

Scanned 2024-04-23T22:10:15+00:00
URL https://cancer.org/robots.txt
Redirect https://www.cancer.org/robots.txt
Redirect Domain www.cancer.org
Redirect Base cancer.org
Domain IPs 40.71.250.191
Redirect IPs 13.107.213.59, 13.107.246.59, 2620:1ec:46::59, 2620:1ec:bdf::59
Response IP 13.107.246.59
Found Yes
Hash 08741fa6637e92c8d240836751cabde2bd10be3ea67ff72691d683ff8a3dbbf9
SimHash 600408884893
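
The Hash value above is 64 hexadecimal characters, which is consistent with a SHA-256 digest of the fetched robots.txt body. The scan does not name the algorithm, so that is an assumption. Under that assumption, a later fetch could be compared against the recorded value roughly as follows; `body_digest` and `matches_scan_hash` are illustrative names, not part of any scanner API.

```python
import hashlib

# Hash recorded by the scan on 2024-04-23.
SCAN_HASH = "08741fa6637e92c8d240836751cabde2bd10be3ea67ff72691d683ff8a3dbbf9"


def body_digest(body: bytes) -> str:
    """Return the hex SHA-256 digest of a robots.txt body (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()


def matches_scan_hash(body: bytes, expected: str = SCAN_HASH) -> bool:
    """True when the body hashes to the value recorded by the scan."""
    return body_digest(body) == expected.lower()
```

A mismatch would mean either that the file changed since the scan or that the scanner hashes the content differently (for example after normalizing line endings).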

Groups

*

Rule Path
Disallow */content/cancer/en/cancer/understanding-cancer/glossary/glossary/
Disallow */content/cancer/es/cancer/understanding-cancer/glossary/glossary/
Disallow */cancer/understanding-cancer/glossary/glossary/
Disallow */content/cancer/en/
Disallow */content/cancer-staging/
Disallow */content/dam/CRC/
Disallow *?q=
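
Every rule in this group starts with `*`, so each one matches its path fragment anywhere in the URL rather than only at the start. The sketch below shows one way to evaluate such rules, assuming the common wildcard semantics (`*` matches any run of characters, a trailing `$` anchors the end of the URL). Python's standard `urllib.robotparser` does not expand these wildcards, so the patterns are translated to regular expressions here. The helper names and the sample paths in the usage note are hypothetical.

```python
import re

# Disallow patterns from the "*" group listed above.
DISALLOW_PATTERNS = [
    "*/content/cancer/en/cancer/understanding-cancer/glossary/glossary/",
    "*/content/cancer/es/cancer/understanding-cancer/glossary/glossary/",
    "*/cancer/understanding-cancer/glossary/glossary/",
    "*/content/cancer/en/",
    "*/content/cancer-staging/",
    "*/content/dam/CRC/",
    "*?q=",
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with ".*" where "*" appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, patterns=DISALLOW_PATTERNS) -> bool:
    """True when any Disallow pattern matches from the start of the path."""
    return any(pattern_to_regex(p).match(path_and_query) for p in patterns)
```

With these rules, a hypothetical search URL such as `/search?q=lung` is blocked by `*?q=`, while a plain content path such as `/cancer/breast-cancer.html` is not matched by any pattern. The group contains no Allow rules, so no precedence handling is needed in this sketch.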

Other Records

Field Value
sitemap https://www.cancer.org/sitemap.xml
sitemap https://www.cancer.org/es/sitemap.xml
sitemap https://www.cancer.org/old-legacy-redirects-xml-sitemap.xml
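
Taken together, the group and sitemap records describe a robots.txt file that can be approximated from the scan data. The following sketch rebuilds such a file; it is a reconstruction from the records above, not the verbatim file served by the site, and `render_robots_txt` is an illustrative helper name.

```python
# Records copied from the scan: one group for all user agents, three sitemaps.
GROUPS = {
    "*": [
        ("Disallow", "*/content/cancer/en/cancer/understanding-cancer/glossary/glossary/"),
        ("Disallow", "*/content/cancer/es/cancer/understanding-cancer/glossary/glossary/"),
        ("Disallow", "*/cancer/understanding-cancer/glossary/glossary/"),
        ("Disallow", "*/content/cancer/en/"),
        ("Disallow", "*/content/cancer-staging/"),
        ("Disallow", "*/content/dam/CRC/"),
        ("Disallow", "*?q="),
    ],
}

SITEMAPS = [
    "https://www.cancer.org/sitemap.xml",
    "https://www.cancer.org/es/sitemap.xml",
    "https://www.cancer.org/old-legacy-redirects-xml-sitemap.xml",
]


def render_robots_txt(groups, sitemaps) -> str:
    """Render groups and sitemap records as robots.txt text."""
    lines = []
    for user_agent, rules in groups.items():
        lines.append(f"User-agent: {user_agent}")
        lines.extend(f"{directive}: {path}" for directive, path in rules)
        lines.append("")  # blank line ends the group
    lines.extend(f"Sitemap: {url}" for url in sitemaps)
    return "\n".join(lines) + "\n"


ROBOTS_TXT = render_robots_txt(GROUPS, SITEMAPS)
```

Because ordering, whitespace, and comments in the original file are not recorded by the scan, the rendered text should not be expected to reproduce the Hash value listed under Last Scan.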