When deploying a new VMware Cloud Foundation (VCF) 9.1 Fleet, users specify either a Simple or High Availability (HA) deployment model along with the desired deployment size: Small, Medium or Large. Unlike components such as NSX Manager, VCF Operations and VCF Automation, where deployment size and availability are configured independently, VCF Management Services (VCFMS) determines its availability model based on the selected deployment size.
- Simple
  - Small - 1 x Control Plane Node (4 vCPU & 10GB mem) + 3 x Worker Node (12 vCPU & 24GB mem)
- High Availability
  - Medium - 3 x Control Plane Node (4 vCPU & 10GB mem) + 3 x Worker Node (24 vCPU & 48GB mem)
  - Large - 3 x Control Plane Node (8 vCPU & 12GB mem) + 3 x Worker Node (24 vCPU & 48GB mem)
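To put the sizes in perspective, here is a rough sketch that totals the vCPU and memory for each option using the node specs listed above. The "Small + HA" entry is my own assumption for illustration: it keeps the Small worker nodes and assumes the extra Control Plane nodes use the same 4 vCPU / 10GB size as the existing one. The names $VcfmsSizes and Get-VcfmsFootprint are hypothetical and not part of any VCF tooling.

```powershell
# Node specs copied from the list above: node count, vCPU and memory (GB) per node.
$VcfmsSizes = [ordered]@{
    'Small'      = @{ ControlPlane = @{ Nodes = 1; Cpu = 4; MemGB = 10 }; Worker = @{ Nodes = 3; Cpu = 12; MemGB = 24 } }
    # Assumption: Small worker nodes plus three Small-sized Control Plane nodes.
    'Small + HA' = @{ ControlPlane = @{ Nodes = 3; Cpu = 4; MemGB = 10 }; Worker = @{ Nodes = 3; Cpu = 12; MemGB = 24 } }
    'Medium'     = @{ ControlPlane = @{ Nodes = 3; Cpu = 4; MemGB = 10 }; Worker = @{ Nodes = 3; Cpu = 24; MemGB = 48 } }
    'Large'      = @{ ControlPlane = @{ Nodes = 3; Cpu = 8; MemGB = 12 }; Worker = @{ Nodes = 3; Cpu = 24; MemGB = 48 } }
}

# Hypothetical helper: sums vCPU and memory across Control Plane and Worker nodes.
function Get-VcfmsFootprint {
    param([Parameter(Mandatory)] [hashtable] $Spec)

    $cpu = 0
    $mem = 0
    foreach ($role in 'ControlPlane', 'Worker') {
        $cpu += $Spec[$role].Nodes * $Spec[$role].Cpu
        $mem += $Spec[$role].Nodes * $Spec[$role].MemGB
    }
    [pscustomobject]@{ TotalCpu = $cpu; TotalMemGB = $mem }
}

# Print one summary line per deployment option.
foreach ($name in $VcfmsSizes.Keys) {
    $footprint = Get-VcfmsFootprint -Spec $VcfmsSizes[$name]
    '{0,-11} {1,3} vCPU  {2,3} GB' -f $name, $footprint.TotalCpu, $footprint.TotalMemGB
}
```

Under those assumptions, a Small deployment with three Control Plane nodes totals 48 vCPU and 102GB of memory, compared to 40 vCPU and 82GB for plain Small and 84 vCPU and 174GB for Medium.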

While the Simple VCF deployment model is ideal for lab and proof-of-concept environments, it is also a good fit for smaller production environments. However, the only way to provide HA for VCFMS using the VCF Operations Fleet LCM UI is by scaling up to either a Medium or Large deployment, which comes with additional worker nodes and increased resource consumption.
If only there were a way to enable high availability for a Small VCFMS deployment... 🤔
It turns out that while the VCF Operations Fleet LCM UI only supports scaling up VCFMS deployments, the underlying VCFMS platform is far more capable than what is currently exposed through the UI. In fact, I recently discovered that by using the VCFMS API, we can easily enable HA for an existing Small deployment!
To demonstrate using the VCFMS API to reconfigure HA, I have created a PowerShell script called configure_small_high_availability_for_vcf_management_services.ps1, which requires VCFMS credentials.
By default, the script runs in validation mode and returns the Component ID for the VCFMS deployment, which is required before proceeding further. It also outputs the VCFMS FQDN (so you can confirm you have selected the right one), the current deployment size and the HA status, which will be false for a Small deployment, as you can see from the screenshot of the script running in validation mode.

Once you have the VCFMS Component ID, update the $VCFManagementServicesComponentID variable and change $ValidateOnly to $false. You can now re-run the script to perform the VCFMS reconfiguration.
Note: One thing I noticed after enabling HA for VCFMS is that the VCF Operations Fleet LCM UI does not immediately update to reflect the additional VCFMS Control Plane nodes. While the inventory eventually refreshes on its own, the only way I have found to force an immediate update is by restarting the Fleet LCM service. To simplify this step, the script includes a $RestartFleetLcm variable that can be set to $true before enabling HA.
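As a quick recap, the three settings mentioned above might look something like this just before the real run. Only the variable names come from the script; the Component ID shown is a placeholder and the comments are my own summary, so treat this as a sketch rather than the script's actual contents.

```powershell
# Placeholder: paste the Component ID reported by the validation run.
$VCFManagementServicesComponentID = '<component-id-from-validation-run>'

# $true keeps the script in validation mode; $false performs the reconfiguration.
$ValidateOnly = $false

# Optional: restart the Fleet LCM service so the UI reflects the new nodes sooner.
$RestartFleetLcm = $true
```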

If you open the vSphere UI, you will shortly see some cloning operations as the two additional VCFMS Control Plane nodes are deployed, converting the original Small deployment into an HA configuration without increasing the current worker node count.

The script automatically monitors the deployment task, as shown in the screenshot above, and provides status updates until completion. The HA deployment can take up to 25 minutes, and restarting the Fleet LCM service can take another 5-7 minutes.
If you log in to VCF Operations, navigate to Build->Lifecycle->VCF Management and select VCF Services Runtime, you should now see the new VCFMS Control Plane Nodes along with the existing Worker Nodes!
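For anyone curious what this kind of task monitoring generally looks like, below is a minimal polling sketch. It is not taken from the script: Wait-ForTask, the status strings (RUNNING, SUCCEEDED, FAILED) and the 30-minute default timeout are all assumptions for illustration, and the status check is passed in as a script block so that no real VCFMS API call is implied. The example at the bottom uses a stub in place of a real status lookup.

```powershell
# Hypothetical helper: polls a status check until it reports a final state or times out.
function Wait-ForTask {
    param(
        [Parameter(Mandatory)] [scriptblock] $GetStatus,  # returns the current status as a string
        [int] $TimeoutSeconds = 1800,                     # assumption: some headroom over ~25 minutes
        [int] $PollSeconds = 30
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ($true) {
        $status = & $GetStatus
        Write-Host ('[{0:HH:mm:ss}] Task status: {1}' -f (Get-Date), $status)

        # Assumed final states; a real API may use different names.
        if ($status -in 'SUCCEEDED', 'FAILED') { return $status }
        if ((Get-Date) -ge $deadline) {
            throw "Task still '$status' after $TimeoutSeconds seconds"
        }
        Start-Sleep -Seconds $PollSeconds
    }
}

# Stub standing in for a real status call: reports RUNNING twice, then SUCCEEDED.
$script:calls = 0
$result = Wait-ForTask -PollSeconds 1 -GetStatus {
    $script:calls++
    if ($script:calls -lt 3) { 'RUNNING' } else { 'SUCCEEDED' }
}
```

Passing the status check in as a script block keeps the loop independent of how the status is actually retrieved, which also makes it easy to try out with a stub like the one above.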

Note: While not recommended, I have found that the same method can be used to scale back down to a single VCFMS Control Plane node. I used this primarily for testing and validation purposes.
For users looking to add High Availability to their VCFMS deployment without increasing the overall deployment size, this provides a practical near-term solution. The good news is that, after speaking with Product Management, support for High Availability with the VCF Simple Deployment model is already planned for a future VCF 9.1.x release.