One of the exciting new features in the latest VMware Cloud Foundation (VCF) 5.1 release is support for vSphere 8.0 Update 2 and the new vSAN Express Storage Architecture (ESA), which can be enabled for both the VCF Management and Workload Domains.
As many of you already know, one of the easiest ways to explore and play with new VCF releases is by leveraging Nested ESXi, which dramatically reduces the amount of time needed to set up the infrastructure before you can start deploying VCF. This is how I initially played with VCF 5.0, and I had assumed the same would also work for the latest VCF 5.1 release.
Shortly after kicking off the VCF Bringup process, I noticed it failed immediately with an error about validating the virtual disks on my Nested ESXi VM against the vSAN HCL!? 😧
I thought this was really strange, especially since in a non-VCF deployment, enabling vSAN ESA using vCenter Server only gives you a warning about your hardware not being on the vSAN HCL and does not stop you from continuing with the deployment. For testing and homelab purposes, this is completely acceptable, and the fact that vCenter Server allows this operation but VCF blocks it was an interesting UX decision.
If hardware validation against the vSAN HCL is required for VCF 5.1 when enabling vSAN ESA, then this would severely impact who can play with the latest VCF release, at least if you wanted to try out vSAN ESA.
UPDATE (05/28/24) - If you are using Nested ESXi and wish to enable vSAN ESA for a VCF Workload Domain, please take a look at this blog post HERE for more details.
While going through the VCF 5.1 documentation, I noticed that you could supply your own vSAN HCL JSON file, primarily for air-gapped environments, since the vSAN HCL is still required for VCF Bringup validation, and this gave me an idea 🤔
After several days of banging my head with my buddy Paudie O'Riordan and many, many snapshot reverts, we finally figured out how the vSAN HCL validation works with VCF when using vSAN ESA. With that information, we found a workaround that enables the use of Nested ESXi with VCF 5.1 and vSAN ESA through a custom, user-provided vSAN HCL JSON file.
Note: If you are using Nested ESXi with VCF 5.1 and vSAN OSA, then there are no changes required as vSAN HCL validation is not used with vSAN OSA.
Step 1 - You will need to ensure that your Nested ESXi VM only has an NVMe storage controller configured. If you are using my Nested ESXi Virtual Appliance, you will need to add an NVMe controller, update each disk to use the NVMe controller, and then remove the SCSI controller before continuing. You may also need to remove the CD-ROM device if you have that configured in your Nested ESXi VM.
If you run into any issues, make sure you do NOT have any other storage adapters except ones using the nvme_pcie driver when running the following command: esxcli storage core adapter list
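If you want to check the adapter list programmatically, here is a minimal Python sketch. It assumes the command output is a whitespace-aligned table with the adapter name in the first column and the driver in the second. The SAMPLE text below is a hypothetical illustration, not captured output, and the function name is my own.

```python
def non_nvme_adapters(adapter_list_output):
    """Return the names of adapters whose driver is not nvme_pcie."""
    offenders = []
    for line in adapter_list_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(fields) < 2 or fields[0] == "HBA" or set(fields[0]) == {"-"}:
            continue
        name, driver = fields[0], fields[1]
        if driver != "nvme_pcie":
            offenders.append(name)
    return offenders


# Hypothetical sample in the assumed column layout (not real captured output).
SAMPLE = """\
HBA Name  Driver     Link State  UID           Capabilities  Description
--------  ---------  ----------  ------------  ------------  -----------
vmhba0    nvme_pcie  link-n/a    pscsi.vmhba0                NVMe controller
vmhba1    pvscsi     link-n/a    pscsi.vmhba1                SCSI controller
"""

if __name__ == "__main__":
    print(non_nvme_adapters(SAMPLE))
```

An empty result means only nvme_pcie adapters were found. Anything returned is an adapter you would likely need to remove from the Nested ESXi VM.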
Step 2 - Download my custom nested-esxi-vsan-esa-hcl.json and transfer that to the Cloud Builder appliance under /opt/vmware/bringup/tmp/nested-esxi-vsan-esa-hcl.json
Note: For the custom vSAN HCL file to be used, the jsonUpdatedTime value MUST match the latest value found in the online version of the vSAN HCL (all.json) from VMware. As of this blog post, the value should be "November 14, 2023, 11:44 PM PST". If Cloud Builder has internet connectivity, it will query the online vSAN HCL, so this value may need to be updated if a newer vSAN HCL has been published. If it does not have internet connectivity, Cloud Builder will look at the timestamp epoch value and will only use the custom vSAN HCL file if that timestamp is no more than 90 days older than the current time on the Cloud Builder appliance. If the timestamp is more than 90 days old, simply replace it with a newer value, which you can generate on any Linux system by running: date +%s
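To make the 90-day rule concrete, here is a small Python sketch of the freshness check and of refreshing the epoch value. It assumes the epoch is stored under a top-level key named "timestamp" in the HCL JSON. Both that key name and the function names are assumptions for illustration, and this is not how Cloud Builder itself is implemented.

```python
import time

MAX_AGE_SECONDS = 90 * 24 * 60 * 60  # the 90-day window described above


def hcl_timestamp_is_fresh(timestamp, now=None):
    """True if the epoch timestamp is no more than 90 days older than now."""
    if now is None:
        now = int(time.time())  # same kind of value as `date +%s`
    return now - timestamp <= MAX_AGE_SECONDS


def refresh_timestamp(hcl, now=None):
    """Return a copy of the HCL dict with the assumed "timestamp" key updated."""
    if now is None:
        now = int(time.time())
    updated = dict(hcl)
    updated["timestamp"] = now
    return updated


if __name__ == "__main__":
    # Hypothetical HCL content whose timestamp is 120 days old.
    hcl = {"timestamp": int(time.time()) - 120 * 86400}
    print(hcl_timestamp_is_fresh(hcl["timestamp"]))
    print(hcl_timestamp_is_fresh(refresh_timestamp(hcl)["timestamp"]))
```

In practice you would load the custom file with json.load, pass it through refresh_timestamp, and write it back with json.dump before copying it to Cloud Builder.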
Step 3 - Switch to root user on the Cloud Builder appliance by running the following command:
su -
Step 4 - Update the file ownership and permissions of the custom vSAN HCL file so that the VCF Bringup service can access the file by running the following commands:
chmod 644 /opt/vmware/bringup/tmp/nested-esxi-vsan-esa-hcl.json
chown vcf_bringup:vcf /opt/vmware/bringup/tmp/nested-esxi-vsan-esa-hcl.json
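If you want to double-check the result of the two commands above, the following Python sketch reports a file's mode, owner, and group. The function name is hypothetical, and it assumes Python 3 is available wherever you run it. On the Cloud Builder appliance you would expect to see mode 644 with owner vcf_bringup and group vcf.

```python
import grp
import os
import pwd
import stat


def describe_access(path):
    """Return (octal mode string, owner name, group name) for a file."""
    info = os.stat(path)
    mode = format(stat.S_IMODE(info.st_mode), "o")
    # Fall back to the numeric IDs if they have no name on this system.
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        owner = str(info.st_uid)
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = str(info.st_gid)
    return mode, owner, group


if __name__ == "__main__":
    print(describe_access("/opt/vmware/bringup/tmp/nested-esxi-vsan-esa-hcl.json"))
```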
Step 5 - Update your VCF Deployment Workbook with the full path to the custom vSAN HCL file. In my setup, I am using the VCF Deployment JSON file and you will need to add the hclFile parameter with the full path to the custom vSAN HCL file as shown in the snippet below:
"vsanSpec": {
  "vsanName": "vsan-1",
  "licenseFile": "XXXX",
  "vsanDedup": false,
  "datastoreName": "sfo-m01-cl01-ds-vsan01",
  "esaConfig": {
    "enabled": "True"
  },
  "hclFile": "/opt/vmware/bringup/tmp/nested-esxi-vsan-esa-hcl.json"
}
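If you would rather script this change than edit the JSON by hand, here is a minimal Python sketch. It assumes vsanSpec sits at the top level of your VCF Deployment JSON, so adjust the lookup if your file is structured differently. The function name is hypothetical.

```python
import copy

HCL_PATH = "/opt/vmware/bringup/tmp/nested-esxi-vsan-esa-hcl.json"


def add_hcl_file(spec, hcl_path=HCL_PATH):
    """Return a copy of the deployment spec with vsanSpec.hclFile set.

    Raises KeyError if the spec has no top-level "vsanSpec" section.
    """
    updated = copy.deepcopy(spec)  # leave the caller's dict untouched
    updated["vsanSpec"]["hclFile"] = hcl_path
    return updated


if __name__ == "__main__":
    import json

    spec = {"vsanSpec": {"vsanName": "vsan-1", "esaConfig": {"enabled": "True"}}}
    print(json.dumps(add_hcl_file(spec), indent=2))
```

Returning a copy keeps the original spec available for comparison, which is handy after many snapshot reverts.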
Step 6 - Finally, begin the VCF Bringup process, which I have initiated using the Cloud Builder API with my VCF Deployment JSON file.
If everything was configured correctly, the VCF Bringup process should now be able to successfully validate the Nested ESXi virtual hardware using the custom vSAN HCL file, which you can see either in the Cloud Builder UI or by watching the Cloud Builder log file (/var/log/vmware/vcf/bringup/vcf-bringup-debug.log):
2023-11-17T02:05:48.260+0000 [bringup,733b207aa563f28a,b140] INFO [c.v.e.s.c.c.v.vsphere.VsphereClient,pool-3-thread-4] Successfully logged in to https://vcf-m01-esx02.primp-industries.local:443/sdk
2023-11-17T02:05:49.954+0000 [bringup,733b207aa563f28a,131f] INFO [c.v.e.s.b.v.a.i.DownloadVsanHclJson,pool-3-thread-2] Retrieving the VSAN HCL file validity from isHCLFileValid true
2023-11-17T02:05:49.954+0000 [bringup,733b207aa563f28a,131f] INFO [c.v.e.s.b.v.a.i.DownloadVsanHclJson,pool-3-thread-2] Copying vSAN HCL file from User provided local path to cloud builder tmp path
2023-11-17T02:05:49.955+0000 [bringup,733b207aa563f28a,131f] INFO [c.v.e.s.b.v.a.i.DownloadVsanHclJson,pool-3-thread-2] Setting the proxy fields {"hclFilePath":"/opt/vmware/bringup/tmp/nested-esxi-vsan-esa-hcl.json"}
2023-11-17T02:05:56.123+0000 [bringup,733b207aa563f28a,87cb] INFO [c.v.e.s.c.c.v.vsphere.VsphereClient,pool-3-thread-15] Successfully logged in to https://vcf-m01-esx02.primp-industries.local:443/sdk
2023-11-17T02:05:56.124+0000 [bringup,733b207aa563f28a,87cb] INFO [c.v.e.s.c.c.vmware.vsan.VsanClient,pool-3-thread-15] Successfully login to https://vcf-m01-esx02.primp-industries.local:443/vsan
2023-11-17T02:05:56.164+0000 [bringup,733b207aa563f28a,87cb] INFO [c.v.e.s.c.c.v.vsan.VsanManagerBase,pool-3-thread-15] Checking if vSAN is enabled on the vSAN system of vcf-m01-esx02.primp-industries.local.
2023-11-17T02:06:18.311+0000 [bringup,733b207aa563f28a,87cb] INFO [c.v.e.s.c.c.v.vsphere.VcManagerBase,pool-3-thread-15] Task: (MOR:haTask--vim.host.VsanSystem.update-3521218305) (Name:update) Entity: (MOR:ha-host) (Name:ha-host) is complete
2023-11-17T02:06:18.312+0000 [bringup,733b207aa563f28a,87cb] INFO [c.v.e.s.c.c.v.vsan.VsanManagerBase,pool-3-thread-15] Uploading VSAN HCL. Attempt number 1
2023-11-17T02:06:21.011+0000 [bringup,733b207aa563f28a,87cb] INFO [c.v.e.s.c.c.v.vsan.VsanManagerBase,pool-3-thread-15] VSAN HCL was successfully uploaded.
2023-11-17T02:06:38.090+0000 [bringup,733b207aa563f28a,87cb] INFO [c.v.e.s.c.c.v.vsphere.VcManagerBase,pool-3-thread-15] Task: (MOR:haTask--vim.host.VsanSystem.update-3521218352) (Name:update) Entity: (MOR:ha-host) (Name:ha-host) is complete
2023-11-17T02:06:38.579+0000 [bringup,733b207aa563f28a,87cb] INFO [c.v.e.s.c.c.v.vsan.VsanManagerBase,pool-3-thread-15] Successfully verified the HCL compatibility.
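Rather than eyeballing the debug log, you could filter it for the HCL-related messages. The Python sketch below matches on the message strings from the log excerpt above. The function names are my own, and the message text may differ in other VCF releases.

```python
HCL_MARKERS = (
    "Uploading VSAN HCL",
    "VSAN HCL was successfully uploaded",
    "Successfully verified the HCL compatibility",
)


def hcl_log_events(lines):
    """Return (timestamp, message) pairs for lines containing an HCL marker."""
    events = []
    for line in lines:
        if any(marker in line for marker in HCL_MARKERS):
            timestamp = line.split(" ", 1)[0]
            # The message follows the closing bracket of the [class,thread] field.
            message = line.rsplit("] ", 1)[-1].strip()
            events.append((timestamp, message))
    return events


def hcl_verified(lines):
    """True if any line reports that HCL compatibility was verified."""
    return any(HCL_MARKERS[2] in message for _, message in hcl_log_events(lines))


if __name__ == "__main__":
    with open("/var/log/vmware/vcf/bringup/vcf-bringup-debug.log") as log:
        for timestamp, message in hcl_log_events(log):
            print(timestamp, message)
```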
This definitely was not a trivial thing to figure out, so hopefully this helps anyone that might be interested in playing with VCF 5.1 and vSAN ESA using Nested ESXi!
The other cool thing about this custom Nested ESXi vSAN HCL file is that you can also apply it to a non-VCF deployment. Log in to the vSphere UI of your vCenter Server, navigate to the vCenter Server inventory object, and under Configure->vSAN->Update->HCL Database, you can replace the vSAN HCL with our custom vSAN HCL file.
If you attempt to enable vSAN ESA on a vSphere Cluster that has Nested ESXi VMs attached, you would see the following warning, and you can certainly ignore it and proceed with the vSAN ESA enablement.
After replacing the default vSAN HCL with the custom vSAN HCL file, when you enable vSAN ESA, you will no longer see the warning message and you will be taken straight into the vSAN ESA enablement wizard!
Carsten Philipp says
Hi William,
thank you so much for sharing your experience. We had issues setting up a VCF with vSAN ESA as the hardware was not 100% fully vSAN RN certified. So the only solution to continue the VCF bringup was to inject the customized HCL JSON from your PS1 Script.
but now we fail extending the Workload Domain Cluster as the vSAN ESA HCL check fails again.
I think the SDDC Manager is now using the official/public vSAN HCL somehow for further tasks, even though the customized HCL JSON is located on the SDDC Manager under /nfs/vmware/vcf/nfs-mount/vsan-hcl/custom.json
KevinS says
Following these steps, my deployments to a nested lab are still failing the ESXi host vSAN HCL compatibility validation when deploying 5.1.1.
Here is the only error I can find in the debug log:
2024-07-29T19:24:53.172+0000 [bringup,66a7ec2c0745588b46473b417fd32d19,2fcc] DEBUG [c.v.e.s.c.v.util.ResponseUtil,pool-3-thread-18] Build validation response: {"errorCode":"VSAN_HCL_STATUS_NOT_VERIFIED","arguments":["nest-04.vlab.com"],"context":{"severity":"ERROR","validation.taskId":"7f000001-90ff-1721-8190-fff1acbb007c"},"message":"Failed to verify HCL status on ESXi Host nest-04.vlab.com","remediationMessage":"Ensure that the ESXi Host nest-04.vlab.com is reachable and the provided credentials are correct"}
Why would it be able to reach the host for everything except this step?