myhocdaicuong.com
robots.txt

Robots Exclusion Standard data for myhocdaicuong.com

Resource Scan

Scan Details

Site Domain myhocdaicuong.com
Base Domain myhocdaicuong.com
Scan Status Ok
Last Scan 2025-11-03T11:33:34+00:00
Next Scan 2025-11-10T11:33:34+00:00

Last Scan

Scanned 2025-11-03T11:33:34+00:00
URL https://myhocdaicuong.com/robots.txt
Domain IPs 104.21.50.222, 172.67.167.162, 2606:4700:3033::6815:32de, 2606:4700:3033::ac43:a7a2
Response IP 172.67.167.162
Found Yes
Hash 4888b98f0eda24bbef5cbca0c2974614f74cae163ef0e03ecbf182ca6e193cd1
SimHash 257e2943e5dc
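
The Hash value above is 64 hexadecimal characters, which is consistent with a SHA-256 digest, although the scan does not name the algorithm. The following is a minimal sketch of how such a fingerprint could be computed to detect changes between scans, assuming SHA-256 over the raw response body. The helper name `robots_fingerprint` is hypothetical, and the SimHash value is not reproduced here because its algorithm is not stated.

```python
import hashlib


def robots_fingerprint(body: bytes) -> str:
    """Return a hex digest of the raw robots.txt body.

    Assumption: the scanner hashes the unmodified response bytes with SHA-256.
    """
    return hashlib.sha256(body).hexdigest()


# Two scans can be compared by fingerprint instead of by full content.
previous = robots_fingerprint(b"User-agent: *\nAllow: /\n")
current = robots_fingerprint(b"User-agent: *\nDisallow: /\n")
changed = previous != current
```

Any change to the file, including whitespace or comments, would produce a different digest under this assumption.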

Groups

*

Rule Path
Allow /
Disallow /403.shtml
Disallow /404.shtml
Disallow /429.shtml
Disallow /50x.shtml
Disallow /search.shtml/
Disallow /ssi/
Disallow /cgi-bin/
Disallow /tmp/
Disallow /cache/
Disallow /backup/
Disallow /backups/
Disallow /db_backup/
Disallow /drafts/

amazonbot

Rule Path
Disallow /

applebot-extended

Rule Path
Disallow /

bytespider

Rule Path
Disallow /

ccbot

Rule Path
Disallow /

claudebot

Rule Path
Disallow /

google-extended

Rule Path
Disallow /

gptbot

Rule Path
Disallow /

meta-externalagent

Rule Path
Disallow /

googlebot

Rule Path
Allow /ads.txt

adsbot-google

Rule Path
Allow /ads.txt

bingbot

Rule Path
Allow /ads.txt
Disallow /*?*utm_*
Disallow /*?*session*
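
The groups above can be evaluated with the longest-match rule described in RFC 9309: a crawler obeys only the group that names it (falling back to `*`), and within that group the longest matching pattern wins, with Allow winning ties. The sketch below is an illustration under simplifying assumptions, not the scanner's implementation. It uses exact, case-insensitive user-agent lookup rather than product-token matching, transcribes only a subset of the rules, and the names `GROUPS`, `pattern_to_regex`, and `is_allowed` are hypothetical.

```python
import re

# A subset of the rules listed above, transcribed for illustration.
GROUPS = {
    "*": [
        ("allow", "/"),
        ("disallow", "/403.shtml"),
        ("disallow", "/tmp/"),
        ("disallow", "/backup/"),
    ],
    "gptbot": [("disallow", "/")],
    "bingbot": [
        ("allow", "/ads.txt"),
        ("disallow", "/*?*utm_*"),
        ("disallow", "/*?*session*"),
    ],
}


def pattern_to_regex(path_pattern):
    """Translate a robots.txt path pattern ('*' wildcard, trailing '$' anchor) to a regex."""
    anchored = path_pattern.endswith("$")
    core = path_pattern[:-1] if anchored else path_pattern
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(user_agent, path):
    """Return True if the path may be crawled by the given user agent."""
    # Only one group applies: the one naming the agent, else the '*' group.
    rules = GROUPS.get(user_agent.lower(), GROUPS["*"])
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            # Longer pattern wins; at equal length, allow (True) beats disallow (False).
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the path is allowed.
    return True if best is None else best[1]
```

Under this reading, `is_allowed("bingbot", "/tmp/x")` returns `True`: the bingbot group does not inherit the `*` group's Disallow rules, so the directory blocks in the `*` group apply only to crawlers without a group of their own. Python's standard `urllib.robotparser` is not used here because it applies rules in file order, so the leading `Allow /` in the `*` group would match first.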

Other Records

Field Value
sitemap https://myhocdaicuong.com/sitemap.xml
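
For reference, the parsed groups and the sitemap record can be serialized back into robots.txt syntax. This is a rough sketch only: the original file's layout, comment placement, and whether several user agents share one group are not shown in the scan, so the output is an approximation. The function name `render_robots` is hypothetical.

```python
def render_robots(groups, sitemaps):
    """Render {agent: [(rule, path), ...]} and a list of sitemap URLs as robots.txt text."""
    lines = []
    for agent, rules in groups.items():
        lines.append(f"User-agent: {agent}")
        for kind, path in rules:
            # 'allow' -> 'Allow', 'disallow' -> 'Disallow'
            lines.append(f"{kind.capitalize()}: {path}")
        lines.append("")  # blank line between groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines) + "\n"


text = render_robots(
    {"gptbot": [("disallow", "/")]},
    ["https://myhocdaicuong.com/sitemap.xml"],
)
```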

Comments

  • Allow all bots to crawl the entire site by default
  • Disallow crawling of error pages (using .shtml extension)
  • Disallow common sensitive or utility directories
  • Rules for specific user agents
  • Block URL parameters that may cause duplicate content
  • Sitemap location