cguardian.com
robots.txt

Robots Exclusion Standard data for cguardian.com

Resource Scan

Scan Details

Site Domain cguardian.com
Base Domain cguardian.com
Scan Status Ok
Last Scan 5/19/2025, 1:20:56 PM
Next Scan 6/2/2025, 1:20:56 PM

Last Scan

Scanned 5/19/2025, 1:20:56 PM
URL https://www.cguardian.com/robots.txt
Domain IPs 43.174.14.43, 43.174.14.45, 43.174.14.46, 43.174.32.114, 43.174.32.193, 43.174.32.194, 43.174.32.211, 43.174.32.212, 43.174.32.88, 43.174.51.192, 43.175.138.218, 43.175.139.55, 43.175.139.72, 43.175.139.86, 43.175.141.63
Response IP 43.174.51.192
Found Yes
Hash 5893aeeba8d01d09a3182e82161f55d07faf6691063ee103168de33bb4e1cb72
SimHash 695c8ff103a3

Groups

googlebot

Rule Path
Allow /$
Allow /en/$
Allow /discover
Allow /discover/$
Allow /about/about-us/$
Allow /*.jpg$
Disallow /
Disallow /auction*
Disallow /Auction*
Disallow /*.shtml$
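The googlebot group combines a blanket Disallow / with narrower Allow and Disallow rules, so the result for a given URL depends on precedence. The sketch below evaluates this group under longest-match precedence, as described in RFC 9309 and in Google's robots.txt documentation. It makes three assumptions: the longest matching pattern wins; Allow wins a tie between patterns of equal length; and specificity is measured as the raw character length of the pattern. The names GOOGLEBOT_RULES, pattern_to_regex and is_allowed are hypothetical helpers written for illustration, not part of any crawler's code.

```python
import re
from typing import List, Tuple

# Rules of the "googlebot" group, transcribed from the table above.
GOOGLEBOT_RULES: List[Tuple[str, str]] = [
    ("allow", "/$"),
    ("allow", "/en/$"),
    ("allow", "/discover"),
    ("allow", "/discover/$"),
    ("allow", "/about/about-us/$"),
    ("allow", "/*.jpg$"),
    ("disallow", "/"),
    ("disallow", "/auction*"),
    ("disallow", "/Auction*"),
    ("disallow", "/*.shtml$"),
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    Matching is case-sensitive, so /auction* and /Auction* are distinct rules.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path: str, rules: List[Tuple[str, str]]) -> bool:
    """Longest-match precedence (assumed): the longest matching pattern wins,
    Allow beats Disallow on equal length, and a path matching no rule is allowed."""
    best_len = -1
    allowed = True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


for path in ["/", "/en/page", "/discover/story", "/auction/photo.jpg"]:
    # Prints True, False, True, False respectively.
    print(path, is_allowed(path, GOOGLEBOT_RULES))
```

Under these assumptions, the home page and anything under /discover remain crawlable because their Allow patterns are longer than Disallow /. An image under /auction/ is blocked because /auction* (9 characters) outranks /*.jpg$ (7 characters).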

googlebot-news

Rule Path
Allow /discover/$
Disallow /

googlebot-mobile

Rule Path
Disallow /

*

Rule Path
Disallow /
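The file defines four groups, and a crawler obeys only one of them. The sketch below shows a simplified version of that selection, in the spirit of RFC 9309: the crawler's product token is compared case-insensitively against the group names, and the "*" group is used when nothing matches. It does not model any crawler-specific fallback that a search engine may apply on top of this. GROUPS and select_group are hypothetical names used only for illustration.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str]

# Groups as listed in the scan, keyed by lowercase user-agent token.
GROUPS: Dict[str, List[Rule]] = {
    "googlebot": [
        ("allow", "/$"), ("allow", "/en/$"), ("allow", "/discover"),
        ("allow", "/discover/$"), ("allow", "/about/about-us/$"),
        ("allow", "/*.jpg$"), ("disallow", "/"), ("disallow", "/auction*"),
        ("disallow", "/Auction*"), ("disallow", "/*.shtml$"),
    ],
    "googlebot-news": [("allow", "/discover/$"), ("disallow", "/")],
    "googlebot-mobile": [("disallow", "/")],
    "*": [("disallow", "/")],
}


def select_group(product_token: str, groups: Dict[str, List[Rule]]) -> List[Rule]:
    """Return the rules for an exact (case-insensitive) token match,
    otherwise the rules of the catch-all "*" group (empty if none exists)."""
    token = product_token.lower()
    if token in groups:
        return groups[token]
    return groups.get("*", [])


print(select_group("Googlebot-News", GROUPS))  # the two googlebot-news rules
print(select_group("Bingbot", GROUPS))         # falls back to the "*" group
```

Under this simplified selection, every crawler without a named group, Bingbot for example, falls back to the "*" group and is excluded from the whole site. Only the googlebot and googlebot-news tokens receive any Allow rules.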

Other Records

Field Value
sitemap https://www.cguardian.com/sitemap.xml
sitemap https://www.cguardian.com/sitemap_news.xml