While catching up on post-Explore email, I came across a question from a customer who has a large number of vSAN deployments spanning their ROBO environment. In one of their environments, a physical congestion issue caused problems for their vSAN stretched cluster, and they were looking for a way to monitor vSAN congestion health, which is available as part of vSAN Health.
Since this information is available as part of vSAN Health, we can certainly use the vSAN Health API to retrieve it programmatically, but for administrators a quicker option is the PowerCLI Test-VsanClusterHealth cmdlet.
Here is a quick PowerCLI snippet that will retrieve the disk health metrics for a given vSAN Cluster from vSAN Health, which is what provides the congestion information:
$vsanClusterName = "vcf-m01-cl01"

# Run only the physical disk health test for the cluster
$healthTestResults = Test-VsanClusterHealth -Cluster (Get-Cluster $vsanClusterName) -TestResultFilter PhysicalDiskHealth

# Print each host, followed by the congestion value of each of its disks
foreach ($result in $healthTestResults.DiskHealthResult) {
    Write-Host "`n$($result.host)"
    foreach ($diskHealth in $result.DiskHealth) {
        Write-Host "$($diskHealth.Disk) $($diskHealth.CongestionValue)"
    }
}
Here is an example of the output, which lists the congestion value for each device within an ESXi host, as shown in the screenshot below:
While you can periodically monitor the congestion health, a more proactive way is to leverage vCenter Alarms. There is an out-of-the-box congestion alarm called vSAN physical disk alarm 'Congestion', which allows you to specify custom thresholds so you can get notified when the warning threshold is crossed before an error is observed.
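If you still want a periodic check from a script, here is a rough sketch that builds on the snippet above and only reports disks at or above a threshold. It uses the same cmdlet and properties shown earlier. The two threshold values and the output column names (VMHost, Severity, etc.) are my own placeholders for illustration, not values taken from vSAN Health, so adjust them for your environment.

```powershell
# Hypothetical thresholds for illustration only; these are assumptions,
# not the values vSAN Health or the vCenter alarm uses.
$warningThreshold = 30
$errorThreshold   = 60

$vsanClusterName = "vcf-m01-cl01"
$healthTestResults = Test-VsanClusterHealth -Cluster (Get-Cluster $vsanClusterName) -TestResultFilter PhysicalDiskHealth

# Collect one object per disk whose congestion is at or above the warning threshold
$findings = foreach ($result in $healthTestResults.DiskHealthResult) {
    foreach ($diskHealth in $result.DiskHealth) {
        $value = $diskHealth.CongestionValue
        if ($value -ge $warningThreshold) {
            $severity = if ($value -ge $errorThreshold) { "Error" } else { "Warning" }
            [pscustomobject]@{
                VMHost     = $result.host
                Disk       = $diskHealth.Disk
                Congestion = $value
                Severity   = $severity
            }
        }
    }
}

if ($findings) {
    # Show the most congested disks first
    $findings | Sort-Object -Property Congestion -Descending | Format-Table -AutoSize
} else {
    Write-Host "No disks at or above the warning threshold ($warningThreshold) in $vsanClusterName"
}
```

Because the results are emitted as objects rather than printed with Write-Host, you could pipe $findings to Export-Csv or run the script from a scheduled task if you want to keep a history.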