Proactive HA is a very cool new feature that was introduced in vSphere 6.5, which enables our hardware vendors to communicate their hardware specific health information directly into vSphere and specifically with vSphere DRS. This hardware health information can then be leveraged by vSphere DRS to take proactive actions to guard against potential hardware failures. Brian Graf, Product Manager for Proactive HA, DRS and overall vSphere Availability has a nice blog post here where he goes into more details on how Proactive HA works.
As Brian mentioned, a few of our select hardware vendors are already in the process of developing and certifying Proactive HA integrations for vSphere, so stay tuned for those announcements in the future by both VMware and our partners. In the meantime, there was an interesting comment from one of our field folks asking whether it would be possible to "simulate" the new Quarantine Mode operation for an ESXi host to be better understand how this feature might work?
Quarantine Mode is new mode for ESXi, which can only be triggered by Proactive HA. It functions similar to the Maintenance Mode operation, but instead of migrating all VMs off, it will allow existing VMs to continue to run but prevent additional new VMs to be placed on the host.
Proactive HA does provide a set of public vSphere APIs under the healthUpdateManager which is primarily targeted at our hardware vendors to consume. However, these APIs could also be used by our customers to get visibility into the current Proactive HA configuration as well as the health of the ESXi hosts from the Proactive HA provider standpoint. Going back to our initial question, it is possible to "register" a fake Proactive HA provider and manually generate health updates to simulate what a real Proactive HA solution could look like.
Disclaimer: This is for educational and lab purposes only. Creating a fake or simulated Proactive HA provider is not officially supported by VMware, please use at your own risk. The creation of Proactive HA providers as well as publishing health updates is for our hardware vendors to consume which in turn will provide native integrations that include customer visible interfaces within the vSphere Web Client.
Now that we have the boring disclaimer out of the way, I have created a Proactive HA PowerCLI module called ProactiveHA.psm1 which exercises some of the new vSphere APIs that are available. Below are a list of functions that I have created and the ones highlighted in orange are those required to create a simulated Proactive HA event which are not intended for general customer consumption outside of education and lab purposes. The remainder functions can be used to inspect an existing Proactive HA configuration and are fully supported for direct customer use if required.
- Get-PHAConfig
- Get-PHAHealth
- Get-PHAProvider
- New-PHAProvider
- New-PHASimulation
- Set-PHAConfig
Step 1 - We need to first create a "fake/simulated" Proactive HA provider by running the following command:
New-PHAProvider -ProviderName "virtuallyGhetto" -ComponentType Power -ComponentDescription "Simulated ProactiveHA Provider" -ComponentId "Power"
Note: The supported component types are: Fan, Memory, Network, Power or Storage which are case sensitive values.
Step 2 - We can now list all Proactive HA providers including the one we had just created by running the following command:
Get-PHAProvider
Make a note of the ProviderID which we will need when enabling Proactive HA on a vSphere Cluster as well as performing health updates.
Step 3 - Next, we need to enable Proactive HA on a vSphere Cluster. This operation does a few things, it first enables all ESXi hosts to be monitored by the given Proactive HA provider and then configures the types of operation to take (Maintenance Mode and/or Quarantine Mode) for when a moderate or severe condition is observed from a given health status update.
In the above example, I have enabled our simulated Proactive HA provider and selected Quarantine Mode for both moderate as well as severe conditions which translates back to "yellow" and "red" health state. More details on this shortly.
Step 4 - To check whether Proactive HA is configured on a vSphere Cluster or just view its current configuration, you can do so by running the following command and specifying the vSphere Cluster:
Get-PHAConfig -Cluster VSAN-Cluster
At this point, we have now successfully enabled Proactive HA on our vSphere Cluster. We can login to the vSphere Web Client and we should see that Proactive HA is enabled and using "Automated" mode per our example above.
One thing you might notice is that all ESXi hosts will have a red icon stating health status is unknown as shown in the screenshot above. This is expected as the Proactive HA provider must initialize the initial health status of all ESXi hosts during the initial configuration.
We can also get this exact same view which includes some additional information by running the following command and specifying the vSphere Cluster:
Get-PHAHealth -Cluster VSAN-Cluster
For the default uninitialized health status, the "Status" value will be gray. Once health updates are returned, you should see either Green, Yellow or Red. In addition to the status, you can see which component is providing the health information as well as a nice remediation message where hardware vendors can add specific details on what needs to be done such as replacing a power supply for example.
OK, so lets now finally simulate a Proactive HA health update to one of our ESXi hosts. To do so, you will use the New-PHASimulation function which will require the name of an ESXi host, the component that was registered for the given Proactive HA Provider, the health status (green, yellow or red), a remediation message and the Proactive HA Provider ID that was obtained in Step #2.
In the example below, we are just going to initialize the health state of the ESXi hosts to green (which also requires an empty remediation string):
New-PHASimulation -EsxiHost vesxi65-4.primp-industries.com -Component Power -HealthStatus green -Remediation "" -ProviderId "52 85 22 c2 f2 6a e7 b9-fc ff 63 9e 10 81 00 79"
Once the health update has been received by vCenter Server which could take a second or two, you can then refresh the vSphere Web Client and our ESXi host should no longer have the initial red icon. You can also use the Get-PHAHealth function to view the exact same information.
Here is a quick PowerCLI snippet that you can use to quickly update all ESXi hosts to a green status:
foreach ($EsxiHost in (Get-Cluster -Name VSAN-Cluster | Get-VMHost)) {
New-PHASimulation -EsxiHost $EsxiHost.name -Component Power -HealthStatus green -Remediation "" -ProviderId "52 85 22 c2 f2 6a e7 b9-fc ff 63 9e 10 81 00 79"
}
Lets now simulate a more interesting health update such as a red status which should then trigger a Quarantine Mode operation. To do so, run the following command below and substituting the values based on your environment:
New-PHASimulation -EsxiHost vesxi65-4.primp-industries.com -Component Power -HealthStatus red -Remediation "Please replace my virtual PSU" -ProviderId "52 85 22 c2 f2 6a e7 b9-fc ff 63 9e 10 81 00 79"
If we now re-run our Get-PHAHealth function, we should see the health update that we had just published including the remediation message. If we now take a look at the vSphere Web Client, we should see the ESXi host go into Quarantine Mode shortly which is based on our Proactive HA configuration. Pretty slick, huh?
Once you are done with your Proactive HA simulation, you can reset all ESXi hosts to health green status and then disable it on the vSphere Cluster. You will see the Set-PHAConfig and specify the -Disabled option along with the Proactive HA Provider ID that was configured originally. (If you forgot the Proactive HA Provider, be sure to use Get-PHAConfig and Get-PHAProvider to retrieve it)
Run the following command to disable Proactive HA from your vSphere Cluster:
Set-PHAConfig -Cluster VSAN-Cluster -Disabled -ProviderID "52 85 22 c2 f2 6a e7 b9-fc ff 63 9e 10 81 00 79"
Lastly, you can remove the fake/simulated Proactive HA Provider by using the Remove-PHAProvider by running the following:
Remove-PHAProvider -ProviderID "52 85 22 c2 f2 6a e7 b9-fc ff 63 9e 10 81 00 79
Creating Proactive HA vSphere Alarm
Lastly, I suspected the individual who wanted to simulate a Proactive HA event was probably also interested in figuring out how to setup alerts for when a host goes into Quarantine Mode. I was actually surprised to see there were no default Proactive HA alarms created within vCenter Server. Luckily, you can easily create them but it was not as straight forward as I had expected since I needed to go digging into the events.vmsg files to figure out the specific event keys.
Below are the three Proactive HA Event keys that corresponds to the red, yellow and green status which an ESXi host can transition to from a Proactive HA provider:
- com.vmware.vc.infraUpdateHa.RedHealthEvent
- com.vmware.vc.infraUpdateHa.YellowHealthEvent
- com.vmware.vc.infraUpdateHa.GreenHealthEvent
To create a Proactive HA alarm for "red" status, you start off by creating a new vCenter Alarm and specifying the monitor type as Hosts:
Next, you just need to add the "com.vmware.vc.infraUpdateHa.RedHealthEvent" keyword by simply copying and pasting it into the trigger box and then click finished.
If we go back and simulate another Proactive HA update using a red health status, we should now see the new vCenter Alarm trigger as shown in the screenshot below.
Have any of the big vendors released their provider information as yet?
I think 3 vendors released Proactive HA plugin so far - Dell, HPE and Cisco, but more are coming.
HPE, Cisco, and Dell are currently working on VMware Certification for their plugins
WoW Got a lots of Things to explore and Thanks for sharing.Vmware Jobs in Hyderabad
I'm looking the esxcli command for checking the InQuarantineMode status but no luck so far. do you have any idea?
Hi William,
I ran into some issues where the cluster filter can get multiple clusters. -Filter uses a partial match and can't be used for an exact match returning only one item/cluster.
My solution was to change ' -Filter @{"Name" = $Cluster}'
with ' | Where-Object { $Cluster.name -eq $_.Name }'
Many thanks for all your great work and blogs!
Regards,
Frank.