Hi
While troubleshooting a performance issue with some VMs on the cluster, we noticed that the cluster-wide latency, as shown in the Prism graph on the front page, was around 23ms on average. This is significantly higher than our usual 10-12ms.
Upon further investigation by support, the issue was traced to two VMs with high I/O activity, writing large blocks of data (over 1MB in size).
To troubleshoot, we added the following graphs for each VM:
- Storage controller bandwidth (MB/s)
- Storage controller latency (ms)
I have a few questions:
- Why is the term “bandwidth” used instead of “throughput”? My understanding is that bandwidth refers to the capacity (size of the pipe), while throughput is the amount of data passing through it per second.
- If the average cluster latency is high, wouldn’t this impact other workloads?
- Does the term “storage controller” refer to the CVM?
Thanks in advance
