websitehcm.com
robots.txt

Robots Exclusion Standard data for websitehcm.com

Resource Scan

Scan Details

Site Domain websitehcm.com
Base Domain websitehcm.com
Scan Status Ok
Last Scan 2025-10-12T10:23:41+00:00
Next Scan 2025-11-11T10:23:41+00:00

Last Scan

Scanned 2025-10-12T10:23:41+00:00
URL https://websitehcm.com/robots.txt
Domain IPs 103.75.186.15
Response IP 103.75.186.15
Found Yes
Hash 96021a7bda030646b2e264015f0d9bbfd4007ec0ea3563de8a4bfe7f59e0dedb
SimHash a152d9210c5b

Groups

*

Rule Path
Disallow /search/
Disallow /?s=
Disallow */1000
Disallow */1000/
Disallow *//1000
Disallow *//1000/
Disallow *?amp

*

Rule Path
Allow /

Other Records

Field Value
sitemap https://websitehcm.com/sitemap_index.xml

Comments

  • Block every URL that contains /1000 or ends with /1000
  • Block every URL that contains //1000 (two slashes)
  • Block every URL that contains ?amp
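
To see how the wildcard rules listed above behave, here is a minimal Python sketch of path matching. It assumes RFC 9309 semantics: `*` matches any run of characters, the two `*` groups are combined into one, the longest matching pattern wins, and Allow wins a tie. The names `pattern_to_regex` and `is_allowed` are hypothetical helpers written for this illustration; they are not part of the scanner or of any library, and a real crawler may interpret the rules differently.

```python
import re

# Rules copied from the two "*" groups in the scan report, combined into one set.
DISALLOW = ["/search/", "/?s=", "*/1000", "*/1000/", "*//1000", "*//1000/", "*?amp"]
ALLOW = ["/"]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (hypothetical helper).

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str) -> bool:
    """Return True if the path may be crawled under the combined rules.

    Assumption: the longest matching pattern wins, and Allow wins a tie.
    A path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for rules, verdict in ((ALLOW, True), (DISALLOW, False)):
        for pattern in rules:
            # re.match anchors at the start of the path, as robots.txt matching does.
            if pattern_to_regex(pattern).match(path):
                longer = len(pattern) > best_len
                tie_for_allow = len(pattern) == best_len and verdict
                if longer or tie_for_allow:
                    best_len = len(pattern)
                    allowed = verdict
    return allowed


if __name__ == "__main__":
    for sample in ["/", "/blog/post", "/search/foo", "/?s=abc",
                   "/page/1000", "/a//1000/", "/post?amp"]:
        print(sample, "->", "allowed" if is_allowed(sample) else "blocked")
```

Because every Disallow pattern here is longer than the single `Allow /` rule, any path that matches a Disallow line is blocked and everything else stays crawlable. Note that Python's standard `urllib.robotparser` does not interpret `*` wildcards in paths, which is why this sketch does its own matching.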