cerulli.com
robots.txt

Robots Exclusion Standard data for cerulli.com

Resource Scan

Scan Details

Site Domain cerulli.com
Base Domain cerulli.com
Scan Status Ok
Last Scan 2026-02-27T15:16:33+00:00
Next Scan 2026-03-06T15:16:33+00:00

Last Scan

Scanned 2026-02-27T15:16:33+00:00
URL https://cerulli.com/robots.txt
Redirect https://www.cerulli.com/robots.txt
Redirect Domain www.cerulli.com
Redirect Base cerulli.com
Domain IPs 104.20.16.157, 172.66.166.19, 2606:4700:10::6814:109d, 2606:4700:10::ac42:a613
Redirect IPs 104.20.16.157, 172.66.166.19, 2606:4700:10::6814:109d, 2606:4700:10::ac42:a613
Response IP 172.66.166.19
Found Yes
Hash 3a611b4046029abdd62ee955aa288fe34a406b060cb42e71faae8eb7bf5687c5
SimHash a3201b7627b3

Groups

*

Rule Path
Disallow /cpresources/
Disallow /vendor/
Disallow /.env
Disallow /cache/
Disallow /news/

Other Records

Field Value
sitemap https://www.cerulli.com/sitemaps-1-sitemap.xml
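To see how a crawler would interpret the group and sitemap record listed above, the following is a minimal sketch using Python's standard `urllib.robotparser`. The `ROBOTS_LINES` list is a reconstruction from the scanned records, not the live file: the exact ordering, spacing, and comments are assumptions, so it will not reproduce the hash shown in the scan. `BASE_URL`, `is_allowed`, and the sample paths are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the scan records above. The live file's exact
# layout and comments are assumptions and are not reproduced here.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /cpresources/",
    "Disallow: /vendor/",
    "Disallow: /.env",
    "Disallow: /cache/",
    "Disallow: /news/",
    "Sitemap: https://www.cerulli.com/sitemaps-1-sitemap.xml",
]

# Host taken from the redirect target reported by the scan.
BASE_URL = "https://www.cerulli.com"

# Parse the reconstructed lines locally; no network request is made.
parser = RobotFileParser()
parser.parse(ROBOTS_LINES)


def is_allowed(path, user_agent="*"):
    """Return True if the given user agent may fetch BASE_URL + path."""
    return parser.can_fetch(user_agent, BASE_URL + path)


if __name__ == "__main__":
    # Hypothetical sample paths, chosen only to exercise each rule.
    for path in ("/about", "/news/some-article", "/.env", "/vendor/lib.js"):
        print(path, "allowed" if is_allowed(path) else "disallowed")
    print("sitemaps:", parser.site_maps())
```

The standard-library parser applies simple prefix matching, so a rule such as `/news/` blocks everything beneath that directory but not a path that merely begins with `/news` without the trailing slash. Other crawlers may implement matching differently.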

Comments

  • robots.txt for https://www.cerulli.com/
  • live - don't allow web crawlers to index cpresources/ or vendor/