Quick Tip - VMware Cloud Foundation (VCF) Bringup fails without persistent ESX-OSData

You will never run into this problem if you follow the current recommended practice of installing the ESX-OSData volume on a persistent storage device, which can be either dedicated or co-located with your ESXi installation.

For those deploying VMware Cloud Foundation (VCF) in a lab environment, you might want to reserve your limited number of storage devices for vSAN and install ESXi on a USB device instead. That is perfectly fine, but if you do not select a persistent storage device for the ESX-OSData volume, it will default to the ESXi ramdisk.

I recently observed that if you have such a configuration, the VCF Cloud Builder Bringup process will fail after attempting (three times) to re-deploy the vCenter Server Appliance (VCSA).

As you can see from the screenshot above, the VCF Cloud Builder UI does not provide any details and asks users to look at the vCenter Server installation logs.

If you go into one of the VCSA installation directories and open the installer log (e.g. /var/log/vmware/vcf/bringup/ci-installer-08-03-25-15-48-755/workflow_1741448914120/vcsa-cli-installer.log), you will see the following at the very bottom:

2025-03-08 16:04:57,760 - vCSACliInstallLogger - ERROR - Exception message: Host seeding failed:(vmodl.MethodFault) {
   dynamicType = <unset>,
   dynamicProperty = (vmodl.DynamicProperty) [],
   msg = 'MethodFault.summary',
   faultCause = <unset>,
   faultMessage = (vmodl.LocalizableMessage) [
      (vmodl.LocalizableMessage) {
         dynamicType = <unset>,
         dynamicProperty = (vmodl.DynamicProperty) [],
         key = 'com.vmware.vcint.error_from_vlcm',
         arg = (vmodl.KeyAnyValue) [
            (vmodl.KeyAnyValue) {
               dynamicType = <unset>,
               dynamicProperty = (vmodl.DynamicProperty) [],
               key = 'vlcm_error',
               value = 'Error:\n   com.vmware.vapi.std.errors.error\nMessages:\n   com.vmware.vcIntegrity.lifecycle.ExtractDepotTask.HostExtractDepotFailed<Extraction of image from host esx06.williamlam.local failed.>\n   com.vmware.vcIntegrity.lifecycle.EsxImage.DataStorageNotFound<No OSData storage partition is available to extract image. Configure persistent storage for the host and retry.>\n'
            }
         ],
         message = "An internal error occurred: 'Error:\n   com.vmware.vapi.std.errors.error\nMessages:\n   com.vmware.vcIntegrity.lifecycle.ExtractDepotTask.HostExtractDepotFailed<Extraction of image from host esx06.williamlam.local failed.>\n   com.vmware.vcIntegrity.lifecycle.EsxImage.DataStorageNotFound<No OSData storage partition is available to extract image. Configure persistent storage for the host and retry.>\n'"
      }
   ]
}

The error is pretty clear: the vSphere Lifecycle Manager (vLCM) service is unable to extract the ESXi Image Profile because the ESXi host does not have a persistent ESX-OSData volume, which is used to save the image definition.
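The nested fault text is hard to read in its raw form. As a minimal sketch, the Python below pulls each fault name and its message out of the `vlcm_error` value shown in the log. The function name `extract_vlcm_faults` and the regular expression are my own and not part of any VMware tooling, and the code assumes every fault keeps the `dotted.fault.Name<message>` shape seen in this particular log.

```python
import re

# Matches entries such as
#   com.vmware.vcIntegrity.lifecycle.EsxImage.DataStorageNotFound<No OSData ...>
# Assumption: each fault is a dotted identifier immediately followed by <message>.
FAULT_PATTERN = re.compile(r"(com\.vmware(?:\.\w+)+)<([^<>]*)>")


def extract_vlcm_faults(text):
    """Return a list of (fault_id, message) tuples found in the log text."""
    # The installer log stores newlines as the two characters '\' and 'n',
    # so turn them into real newlines before matching.
    normalized = text.replace("\\n", "\n")
    return FAULT_PATTERN.findall(normalized)


# Sample copied from the vlcm_error value in the installer log above.
sample = (
    "Error:\\n   com.vmware.vapi.std.errors.error\\nMessages:\\n   "
    "com.vmware.vcIntegrity.lifecycle.ExtractDepotTask.HostExtractDepotFailed"
    "<Extraction of image from host esx06.williamlam.local failed.>\\n   "
    "com.vmware.vcIntegrity.lifecycle.EsxImage.DataStorageNotFound"
    "<No OSData storage partition is available to extract image. "
    "Configure persistent storage for the host and retry.>\\n"
)

for fault_id, message in extract_vlcm_faults(sample):
    # Print only the last component of the fault identifier for readability.
    print(f"{fault_id.rsplit('.', 1)[-1]}: {message}")
```

If `DataStorageNotFound` shows up among the extracted faults, the host used for image extraction is missing a persistent ESX-OSData volume.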

Note: You can also quickly search for "No OSData" in the VCF Cloud Builder bringup debug log (/var/log/vmware/vcf/bringup/vcf-bringup-debug.log) to see whether you are affected by this issue.
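That search can also be scripted. The sketch below is a hypothetical helper (`find_osdata_errors` is a name I made up) that reads a text file line by line and reports every line containing "No OSData". It assumes a Python 3 interpreter is available wherever the log lives, or that you have copied the log to another machine.

```python
from pathlib import Path

# Path taken from the note above; adjust it if you copied the log elsewhere.
DEBUG_LOG = Path("/var/log/vmware/vcf/bringup/vcf-bringup-debug.log")
MARKER = "No OSData"


def find_osdata_errors(log_path, marker=MARKER):
    """Return (line_number, line) pairs for every line containing the marker."""
    hits = []
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with open(log_path, "r", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if marker in line:
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    if DEBUG_LOG.exists():
        matches = find_osdata_errors(DEBUG_LOG)
        for number, line in matches:
            print(f"{number}: {line}")
        if not matches:
            print("No matching lines found.")
    else:
        print(f"{DEBUG_LOG} not found; point DEBUG_LOG at your copy of the log.")
```

An empty result only means the marker string was not logged; it does not prove the host has a persistent ESX-OSData volume.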

This problem is not unique to VCF. You will hit the same issue when using vCenter Server directly to create a new vSphere Cluster and enabling vLCM by defining the image from an existing ESXi host. In earlier vSphere releases, a persistent ESX-OSData volume was not required, but it seems this behavior has changed in recent releases. This is just another reason to follow the latest recommendations and ensure ESX-OSData runs on a reliable and persistent storage device.

Comments

David Vincent says

03/17/2025 at 12:20 pm

having a terrible time trying to bringup VCF in a lab environment and getting stuck on the following task: "Enable/Disable SSH on NSX Manager Nodes". if i check the bringup.log i can see the following line: "FAILED_TO_ENABLE_DISABLE_SSH_ON_EDGE_NODES Failed to validate ssh status on edge node(s)". i'm using the latest version of the vcf appliance as of this date. any way to bypass this step or determine why it can't validate the ssh status on the manager appliance if i can easily ssh into the nsx manager? if anyone knows the best place to ask this question, that would be of great help as i don't see calling broadcom support for a test lab environment...

- William Lam says
  
  03/17/2025 at 12:53 pm
  
  Have you looked at the bring-debug.log to see if there are more details? If I had to guess, it's probably connectivity between the Cloud Builder Appliance and the deployed NSX Manager ... the majority of the issues that I've come across are infrastructure related (e.g. DNS/NTP) or due to constrained resources, since you mentioned you're deploying in a lab. I know early on you could use the "small" size to reduce resources, but with recent VCF 5.x releases, several have shared that you need to go up to "medium", so that's what I've been using, and the size could potentially impact whether all services are up and running
  
  You may also want to post on the VCF Broadcom Communities under VCF https://community.broadcom.com/vmware-cloud-foundation/communities/community-home/digestviewer?communitykey=7a6de81a-f179-4310-9343-3e07cd105273&Page=&ContributionType=1&Nested=1&ResultsPerPage=25&SortBy=1 in case others may have come across this issue
  
Mandarinas says

04/08/2025 at 7:58 am

There is another issue with vLCM: com.vmware.vcIntegrity.lifecycle.EsxImage.ReservedVibExtractError
Failed to extract image from the host: no stored copy available for inactive VIB VMW_bootbank_pensandoatlas_1.46.0.E.41.1.326-2vmw.803.0.0.0.23797590

This specific VIB is mentioned in 8.0U3b release notes

- Mandarinas says
  
  04/11/2025 at 1:53 am
  
  To be honest I was setting up VCF 5.2.1 Management domain with vSAN ESA and vLCM (which is requirement for vSAN ESA).
  
  Tricky part was that during bring-up, when the Management vCenter is deployed, vLCM does not manage to sync all VIBs/patches/Image profiles (it starts at ~70% of Firstboot scripts execution on Cloud Builder) in time before Cloud Builder initiates vLCM extraction of the Image profile from the first host (good luck with token based URLs 🙂 ). If the extraction step fails, CB deletes vCenter and tries everything from vCenter deployment again.
  
  Workaround was to give more time for vLCM to sync everything/or import image offline bundle - didn't find a way to do that in Cloud Builder, so simply as soon as vLCM plugin was deployed into vCenter and it started to sync - I suspended Cloud Builder VM. This gives time to update URLs with token, import offline bundles and sync everything.
  
  - vshaneShane says
    
    04/29/2025 at 5:41 pm
    
    I tried this, but after starting the CB VM again, I am seeing this error on the logs, and vCenter Deployment is stuck
    "Cannot find message for error code VSAN_MIN_BOOT_DISKS.error"
