There have been quite a few questions lately about vCenter Server Alarms for VSAN, and one in particular that I have noticed is about individual disk failures. Outside of the generic default datastore alarms, there seem to be only two VSAN-specific alarms:
I figured there must be other useful alarms that we could create, especially after showing how you can create a vCenter Server Alarm to monitor the VSAN component count threshold based on a particular VSAN VOB. I took a look around and found the following VSAN-specific VOBs, which could be useful for creating additional vCenter Alarms.
VOB ID | VOB Description |
---|---|
esx.audit.vsan.clustering.enabled | VSAN clustering services have been enabled. |
esx.clear.vob.vsan.pdl.online | VSAN device has come online. |
esx.clear.vsan.clustering.enabled | VSAN clustering services have now been enabled. |
esx.clear.vsan.vsan.network.available | VSAN now has at least one active network configuration. |
esx.clear.vsan.vsan.vmknic.ready | A previously reported vmknic now has a valid IP. |
esx.problem.vob.vsan.lsom.componentthreshold | VSAN Node: Near node component count limit. |
esx.problem.vob.vsan.lsom.diskerror | VSAN device is under permanent error. |
esx.problem.vob.vsan.lsom.diskgrouplimit | Failed to create a new disk group. |
esx.problem.vob.vsan.lsom.disklimit | Failed to add disk to disk group. |
esx.problem.vob.vsan.pdl.offline | VSAN device has gone offline. |
esx.problem.vsan.clustering.disabled | VSAN clustering services have been disabled. |
esx.problem.vsan.lsom.congestionthreshold | VSAN device Memory/SSD congestion has changed. |
esx.problem.vsan.net.not.ready | A vmknic added to VSAN network configuration doesn't have valid IP. Network is not ready. |
esx.problem.vsan.net.redundancy.lost | VSAN doesn't have any redundancy in its network configuration. |
esx.problem.vsan.net.redundancy.reduced | VSAN is operating on reduced network redundancy. |
esx.problem.vsan.no.network.connectivity | VSAN doesn't have any networking configuration for use. |
esx.problem.vsan.vmknic.not.ready | A vmknic added to VSAN network configuration doesn't have valid IP. It will not be in use. |
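
To get a quick overview of the list, the VOB IDs can be grouped by the second dotted segment of their name. This is a minimal sketch, assuming that segment (`audit`, `clear`, `problem`) reflects the kind of event; that is an inference from the names in the table, not a documented rule. The function name `group_by_kind` is made up for this example.

```python
from collections import defaultdict

# VOB IDs copied from the table above.
VSAN_VOBS = [
    "esx.audit.vsan.clustering.enabled",
    "esx.clear.vob.vsan.pdl.online",
    "esx.clear.vsan.clustering.enabled",
    "esx.clear.vsan.vsan.network.available",
    "esx.clear.vsan.vsan.vmknic.ready",
    "esx.problem.vob.vsan.lsom.componentthreshold",
    "esx.problem.vob.vsan.lsom.diskerror",
    "esx.problem.vob.vsan.lsom.diskgrouplimit",
    "esx.problem.vob.vsan.lsom.disklimit",
    "esx.problem.vob.vsan.pdl.offline",
    "esx.problem.vsan.clustering.disabled",
    "esx.problem.vsan.lsom.congestionthreshold",
    "esx.problem.vsan.net.not.ready",
    "esx.problem.vsan.net.redundancy.lost",
    "esx.problem.vsan.net.redundancy.reduced",
    "esx.problem.vsan.no.network.connectivity",
    "esx.problem.vsan.vmknic.not.ready",
]


def group_by_kind(vob_ids):
    """Group VOB IDs by their second dotted segment (audit, clear, problem)."""
    groups = defaultdict(list)
    for vob_id in vob_ids:
        kind = vob_id.split(".")[1]  # e.g. "problem" in "esx.problem.vsan..."
        groups[kind].append(vob_id)
    return dict(groups)


groups = group_by_kind(VSAN_VOBS)
for kind in sorted(groups):
    print(f"{kind}: {len(groups[kind])} VOBs")
```

The `problem` group is the natural place to look for alarm triggers, while the `clear` group suggests which events might be used to reset an alarm.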
Looking at the list above, the following two VOBs seem like they would be useful for alerting on a disk failure:
- esx.problem.vob.vsan.lsom.diskerror
- esx.problem.vob.vsan.pdl.offline
Disclaimer: There are no guarantees that a disk error or failure will automatically trigger these VOBs, given the unpredictable ways in which a disk may fail, especially if the failure is intermittent.
Even though we cannot simulate a disk error on a physical disk, we can still do some magic using a Nested VSAN environment. The worst-case scenario that you could run into is that one of the disks goes completely offline. We can simulate similar behavior in a Nested ESXi environment by removing one of the virtual disks from the Virtual Machine (not deleting it).
To demonstrate this scenario, here are the steps to create a vCenter Alarm for the two VOBs above:
Step 1 - Create a new vCenter Alarm and give it a name. Select “Hosts” under Monitor and “Specific event occurring …” under Monitor for:
Step 2 - Add the two VOBs listed above as the event triggers:
Step 3 - Remove one of the Virtual Disks (SSD/MD) from the Nested ESXi VM.
Step 4 - There are two ways in which you can trigger the alarm. You can either create a new Virtual Machine, which will try to write to the Nested ESXi VM from which you removed the Virtual Disk, or you can rescan the storage adapter on the Nested ESXi VM. In my environment, I happened to have a VM running on an NFS datastore, and I performed a Storage vMotion of the VM onto my VSAN Datastore using the default FTT=1 policy on a three-node VSAN Cluster. This immediately triggered the alarm, as seen in the screenshots below:
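If you prefer to keep a record of the alarm definition as data, the choices made in Steps 1 and 2 can be sketched as a plain Python dictionary. This does not talk to vCenter Server or use any SDK. The key names, the alarm name "VSAN Disk Failure" and the "red" status are all assumptions made for illustration, since the steps above do not specify a name or severity.

```python
import json

# The two disk-related VOBs selected earlier in this post.
DISK_FAILURE_VOBS = [
    "esx.problem.vob.vsan.lsom.diskerror",
    "esx.problem.vob.vsan.pdl.offline",
]


def build_alarm_definition(name, vob_ids, status="red"):
    """Describe a host event alarm as a plain dict.

    The keys are illustrative labels chosen for this sketch, not fields of
    a real API. The default status is an assumption.
    """
    triggers = [
        {"eventTypeId": vob_id, "objectType": "HostSystem", "status": status}
        for vob_id in vob_ids
    ]
    return {
        "name": name,
        "monitor": "Hosts",                        # Step 1: Monitor
        "monitorFor": "Specific event occurring",  # Step 1: Monitor for
        "enabled": True,
        "triggers": triggers,                      # Step 2: one per VOB
    }


# "VSAN Disk Failure" is a placeholder name.
alarm = build_alarm_definition("VSAN Disk Failure", DISK_FAILURE_VOBS)
print(json.dumps(alarm, indent=2))
```

Each VOB gets its own trigger entry, which mirrors adding them one by one in the Event trigger list; either event on a host would then be enough to raise the alarm.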
Jarek says
Great stuff!
Do you know if it is possible to define a "New Alarm" with "VirtualMachine.Config.AddNewDisk" only when using VM-Reconfigured?
One that will only alarm when a new disk is added/created?
Regards,
Jarek
William Lam says
Correction, for changes to a VM, you'll just get the generic "VmReconfiguredEvent" event. However, within the event there is a configSpec that would give you exactly what changed. It's not pretty and you would need to do some parsing, but it is possible. In fact, here's a nice solution from fellow Automation community member Luc Dekens on a PowerCLI script that does this http://www.lucd.info/2009/12/18/events-part-3-auditing-vm-device-changes/
Jarek says
Hi,
Thx for swift reply 🙂
I will check the link and try to figure out how to get the configSpec within the alarm.
Regards,
Jarek