Handy VSAN VOBs for creating vCenter Alarms

There have been quite a few questions lately around vCenter Server Alarms for VSAN, and one in particular that I have noticed is around individual disk failures. Outside of the generic default datastore alarms, there seem to be only two VSAN-specific alarms.

I figured there must be other useful alarms we could create, especially after showing how you can create a vCenter Server Alarm to monitor the VSAN component count threshold based on a particular VSAN VOB. I took a look around and found the following VSAN-specific VOBs, which could be useful for creating additional vCenter Alarms.

VOB ID	VOB Description
esx.audit.vsan.clustering.enabled	VSAN clustering services have been enabled.
esx.clear.vob.vsan.pdl.online	VSAN device has come online.
esx.clear.vsan.clustering.enabled	VSAN clustering services have now been enabled.
esx.clear.vsan.vsan.network.available	VSAN now has at least one active network configuration.
esx.clear.vsan.vsan.vmknic.ready	A previously reported vmknic now has a valid IP.
esx.problem.vob.vsan.lsom.componentthreshold	VSAN Node: Near node component count limit.
esx.problem.vob.vsan.lsom.diskerror	VSAN device is under permanent error.
esx.problem.vob.vsan.lsom.diskgrouplimit	Failed to create a new disk group.
esx.problem.vob.vsan.lsom.disklimit	Failed to add disk to disk group.
esx.problem.vob.vsan.pdl.offline	VSAN device has gone offline.
esx.problem.vsan.clustering.disabled	VSAN clustering services have been disabled.
esx.problem.vsan.lsom.congestionthreshold	VSAN device Memory/SSD congestion has changed.
esx.problem.vsan.net.not.ready	A vmknic added to VSAN network configuration doesn't have valid IP. Network is not ready.
esx.problem.vsan.net.redundancy.lost	VSAN doesn't haven any redundancy in its network configuration.
esx.problem.vsan.net.redundancy.reduced	VSAN is operating on reduced network redundancy.
esx.problem.vsan.no.network.connectivity	VSAN doesn't have any networking configuration for use.
esx.problem.vsan.vmknic.not.ready	A vmknic added to VSAN network configuration doesn't have valid IP. It will not be in use.
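The VOB IDs in the table appear to follow a naming convention in which the second dotted field says whether the event is an audit record (esx.audit), a problem (esx.problem) or the clearing of an earlier problem (esx.clear). The minimal Python sketch below, assuming that convention holds for every ID, buckets the IDs so the esx.problem entries, the natural candidates for alarm triggers, are easy to pick out. VSAN_VOB_IDS and group_by_category are illustrative names, not part of any VMware API.

```python
from collections import defaultdict

# VSAN VOB IDs copied verbatim from the table above.
VSAN_VOB_IDS = [
    "esx.audit.vsan.clustering.enabled",
    "esx.clear.vob.vsan.pdl.online",
    "esx.clear.vsan.clustering.enabled",
    "esx.clear.vsan.vsan.network.available",
    "esx.clear.vsan.vsan.vmknic.ready",
    "esx.problem.vob.vsan.lsom.componentthreshold",
    "esx.problem.vob.vsan.lsom.diskerror",
    "esx.problem.vob.vsan.lsom.diskgrouplimit",
    "esx.problem.vob.vsan.lsom.disklimit",
    "esx.problem.vob.vsan.pdl.offline",
    "esx.problem.vsan.clustering.disabled",
    "esx.problem.vsan.lsom.congestionthreshold",
    "esx.problem.vsan.net.not.ready",
    "esx.problem.vsan.net.redundancy.lost",
    "esx.problem.vsan.net.redundancy.reduced",
    "esx.problem.vsan.no.network.connectivity",
    "esx.problem.vsan.vmknic.not.ready",
]


def group_by_category(vob_ids):
    """Bucket VOB IDs by their second dotted field: audit, clear or problem."""
    groups = defaultdict(list)
    for vob_id in vob_ids:
        # "esx.problem.vsan.net.not.ready".split(".")[1] is "problem"
        groups[vob_id.split(".")[1]].append(vob_id)
    return dict(groups)


if __name__ == "__main__":
    for category, ids in sorted(group_by_category(VSAN_VOB_IDS).items()):
        print(f"{category}: {len(ids)}")
```

Running it prints one count per category: a single audit VOB, four clear VOBs and twelve problem VOBs.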

Looking at the list above, the following two VOBs seem like they would be useful for alerting on a disk failure:

esx.problem.vob.vsan.lsom.diskerror
esx.problem.vob.vsan.pdl.offline
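Conceptually, an alarm built on these VOBs is a filter: it fires when an event's type ID matches one of the two IDs. The Python sketch below illustrates only that matching logic. It assumes each event is a plain dict carrying an "eventTypeId" key, named after the field that vSphere's EventEx events carry; the dicts themselves are a hypothetical simplification of vCenter's event objects, and the host names in the sample are made up.

```python
# The two disk-related VSAN VOB IDs listed above.
DISK_FAILURE_VOBS = frozenset({
    "esx.problem.vob.vsan.lsom.diskerror",
    "esx.problem.vob.vsan.pdl.offline",
})


def disk_failure_events(events):
    """Return the events whose eventTypeId is one of the disk VOBs.

    Each event is assumed to be a dict with an "eventTypeId" key; this is a
    hypothetical stand-in for a real vCenter event object.
    """
    return [e for e in events if e.get("eventTypeId") in DISK_FAILURE_VOBS]


if __name__ == "__main__":
    sample = [
        {"host": "esxi-1", "eventTypeId": "esx.problem.vob.vsan.pdl.offline"},
        {"host": "esxi-2", "eventTypeId": "esx.clear.vob.vsan.pdl.online"},
        {"host": "esxi-3"},  # an event without a VOB ID is ignored
    ]
    for event in disk_failure_events(sample):
        print(event["host"], event["eventTypeId"])
```

Only the esxi-1 event matches; the clear VOB from esxi-2 signals a device coming back online, so it is deliberately not part of the disk-failure set.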

Disclaimer: There is no guarantee that a disk error or failure will automatically trigger these VOBs, because there is no telling exactly how a disk may fail, especially if the failure is intermittent.

Even though we cannot simulate a disk error on a physical disk, we can still do some magic using a Nested VSAN environment. The worst-case scenario you could run into is that one of the disks just goes completely offline. We can simulate similar behavior in a Nested ESXi environment by removing one of the virtual disks from the Virtual Machine (not deleting it).

To demonstrate this scenario, here are the steps to create a vCenter Alarm for the two VOBs above:

Step 1 - Create a new vCenter Alarm and give it a name. Select “Hosts” under Monitor and “Specific event occurring …” under Monitor for.

Step 2 - Add the two VOBs above to the event triggers.
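For anyone who would rather script Steps 1 and 2, the Python sketch below outlines the alarm definition as plain dicts. The field names mirror the vSphere API objects AlarmSpec, OrAlarmExpression and EventAlarmExpression, but the dicts are only an illustration. Two assumptions are labeled in the code: that these VOBs surface as EventEx events identified by eventTypeId, and that the triggers are combined with OR so either VOB fires the alarm. The alarm name "VSAN Disk Failure" is hypothetical. With pyVmomi you would build the equivalent vim.alarm objects and pass them to the AlarmManager's CreateAlarm method, which is not shown here.

```python
import json

DISK_FAILURE_VOBS = [
    "esx.problem.vob.vsan.lsom.diskerror",
    "esx.problem.vob.vsan.pdl.offline",
]


def build_alarm_spec(name, vob_ids):
    """Build a plain-dict outline of a host alarm triggered by any VOB in vob_ids.

    Field names mirror AlarmSpec, OrAlarmExpression and EventAlarmExpression
    from the vSphere API; the dict itself is only an illustration.
    """
    triggers = [
        {
            "eventType": "EventEx",      # assumption: VOBs surface as EventEx events
            "eventTypeId": vob_id,       # the VOB ID entered as the event trigger
            "objectType": "HostSystem",  # matches "Hosts" under Monitor in Step 1
            "status": "red",             # alarm state to set when the event occurs
        }
        for vob_id in vob_ids
    ]
    return {
        "name": name,
        "enabled": True,
        # Assumption: an OR expression, so any single matching trigger fires.
        "expression": {"type": "OrAlarmExpression", "expression": triggers},
    }


if __name__ == "__main__":
    spec = build_alarm_spec("VSAN Disk Failure", DISK_FAILURE_VOBS)
    print(json.dumps(spec, indent=2))
```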

Step 3 - Remove (but do not delete) one of the Virtual Disks (SSD/MD) from the Nested ESXi VM.
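If you script Step 3 instead of using the vSphere client, the key detail is the difference between detaching a disk and deleting it. In the vSphere API a VM reconfiguration carries a list of device changes (VirtualDeviceConfigSpec entries): operation "remove" with no fileOperation detaches the disk and leaves the VMDK on the datastore, whereas adding fileOperation "destroy" would also delete the file. The Python sketch below only builds that outline as plain dicts; the helper names and the device key 2001 are hypothetical, and a real call would pass full pyVmomi device objects to the VM's ReconfigVM_Task method, which is not shown.

```python
def remove_disk_device_change(disk_key, delete_backing_file=False):
    """Outline one deviceChange entry that detaches a virtual disk.

    Mirrors vSphere's VirtualDeviceConfigSpec in simplified form: operation
    "remove" detaches the device; fileOperation "destroy" would also delete
    the VMDK, which Step 3 deliberately avoids.
    """
    change = {"operation": "remove", "device": {"key": disk_key}}
    if delete_backing_file:
        change["fileOperation"] = "destroy"
    return change


def reconfig_spec(device_changes):
    """Wrap device changes the way a VirtualMachineConfigSpec would."""
    return {"deviceChange": list(device_changes)}


if __name__ == "__main__":
    # 2001 is a hypothetical device key for one of the Nested ESXi VM's disks.
    spec = reconfig_spec([remove_disk_device_change(2001)])
    print(spec)
```

Leaving the VMDK in place means you can re-attach the same disk afterwards to bring the Nested ESXi host's disk group back.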

Step 4 - There are two ways to trigger the alarm: either create a new Virtual Machine whose data will be written to the Nested ESXi VM from which you removed the Virtual Disk, or rescan the storage adapter on that Nested ESXi VM. In my environment, I happened to have a VM running on an NFS datastore, so I performed a Storage vMotion of that VM onto my VSAN Datastore using the default FTT=1 policy on a three-node VSAN Cluster. This immediately triggered the alarm, as seen in the screenshots below.
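Besides watching the alarm, it can help during testing to confirm that the Nested ESXi host actually raised the VOBs. ESXi records VMkernel observations in /var/log/vobd.log; the Python sketch below assumes that path and assumes nothing about the line format beyond the VOB ID appearing verbatim in the line. It lists every line that mentions either disk VOB. The function names are illustrative, and you can run it wherever you have a copy of the log.

```python
DISK_FAILURE_VOBS = (
    "esx.problem.vob.vsan.lsom.diskerror",
    "esx.problem.vob.vsan.pdl.offline",
)


def find_disk_vobs(lines):
    """Yield (line_number, vob_id) for every line that mentions a disk VOB ID."""
    for number, line in enumerate(lines, start=1):
        for vob_id in DISK_FAILURE_VOBS:
            if vob_id in line:
                yield number, vob_id


def scan_log(path="/var/log/vobd.log"):
    """Scan a vobd log file (assumed default path) and return all matches."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return list(find_disk_vobs(log))
```

Calling scan_log() after the Storage vMotion (or the adapter rescan) should return at least one match if either VOB was raised; an empty list does not rule out a disk problem, for the reasons given in the disclaimer above.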

Comments

Jarek says

04/14/2015 at 7:40 am

Great stuff!

Do you know if it is possible to define a "New Alarm" with "VirtualMachine.Config.AddNewDisk" only when using VM-Reconfigured?
That would only alarm when a new disk was added/created?

Regards,
Jarek

- William Lam says
  
  04/14/2015 at 3:55 pm
  
  Correction: for changes to a VM, you'll just get the generic "VmReconfiguredEvent" event. However, the event contains a configSpec that tells you exactly what changed. It's not pretty and you would need to do some parsing, but it is possible. In fact, here's a nice solution from fellow Automation community member Luc Dekens, a PowerCLI script that does exactly this: http://www.lucd.info/2009/12/18/events-part-3-auditing-vm-device-changes/
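The parsing described above boils down to walking the deviceChange list in the event's configSpec and keeping the entries that add a virtual disk. The Python sketch below shows that filter on a simplified dict mirror of VirtualMachineConfigSpec; the "deviceType" key and the sample values are hypothetical (with real API objects you would check the device's class instead), and Luc's PowerCLI script linked above remains the worked solution.

```python
def added_disks(config_spec):
    """Return the devices in a VmReconfiguredEvent-style configSpec that add a disk.

    config_spec is a simplified, hypothetical dict mirror of the vSphere
    VirtualMachineConfigSpec: a "deviceChange" list whose entries carry an
    "operation" ("add", "remove" or "edit") and a "device" dict with a
    "deviceType" name.
    """
    return [
        change["device"]
        for change in config_spec.get("deviceChange", [])
        if change.get("operation") == "add"
        and change.get("device", {}).get("deviceType") == "VirtualDisk"
    ]


if __name__ == "__main__":
    spec = {
        "deviceChange": [
            {"operation": "add", "fileOperation": "create",
             "device": {"deviceType": "VirtualDisk", "capacityInKB": 1048576}},
            {"operation": "edit", "device": {"deviceType": "VirtualEthernetCard"}},
        ]
    }
    for disk in added_disks(spec):
        print("new disk:", disk["capacityInKB"], "KB")
```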
  
  - Jarek says
    
    04/16/2015 at 7:12 am
    
    Hi,
    
    Thx for swift reply 🙂
    I will check the link and try to figure out how to get the configSpec within the alarm
    
    Regards,
    Jarek
