One of the pet projects I have been looking into is a way to easily deploy the required infrastructure, using Nested ESXi of course, to quickly stand up a "basic" VMware Cloud Foundation (VCF) environment. There are a couple of solutions in the community today that can take a user from having no infrastructure to setting up all the components required to stand up a complete, functional VCF environment, similar to a physical VCF deployment. However, the prerequisites for using those tools were more than what I was looking for, and they can also feel overwhelming for a new user. I certainly fell into that category while looking at some of the existing tools.
Ultimately, my use case was slightly different, and I did not need all the bells and whistles such as configuring Application Virtual Networks (AVN), which meant I could dramatically simplify the deployment. For example, instead of deploying the ESXi hosts from scratch, I could simply take advantage of my Nested ESXi Virtual Appliance and use that as a starting point. For those familiar with my various PowerCLI automated lab deployment scripts, I have created a similar experience for VCF that deploys a set of Nested ESXi Appliances along with the VMware Cloud Builder appliance, which is then used to deploy VCF on top of the Nested ESXi VMs. To keep the user experience as painless and simple as possible, the script also uses the user-supplied configuration to automagically generate the VCF configuration JSON file, which can be uploaded directly to the Cloud Builder appliance to begin the VCF deployment once the automation script has deployed the initial infrastructure.
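To illustrate the general idea of turning a handful of user-supplied variables into a bring-up JSON file, here is a minimal Python sketch. It is not the actual script (which is PowerCLI), and every key, host name, password and address below is a hypothetical placeholder, not the real Cloud Builder schema; the point is only the pattern of mapping flat settings onto a nested structure and serializing it.

```python
import json

# Hypothetical user-supplied settings. Variable names, host names and
# addresses are illustrative assumptions, not values from the real script.
user_config = {
    "sddc_id": "vcf-m01",
    "dns_domain": "vcf.lab",
    "esxi_password": "CHANGE_ME",
    "esxi_hosts": {
        "vcf-m01-esx01": "192.168.30.11",
        "vcf-m01-esx02": "192.168.30.12",
        "vcf-m01-esx03": "192.168.30.13",
        "vcf-m01-esx04": "192.168.30.14",
    },
}


def build_bringup_spec(cfg: dict) -> dict:
    """Map flat user settings onto a nested, JSON-serializable spec.

    NOTE: the output keys are placeholders chosen for this sketch; they are
    NOT the actual Cloud Builder bring-up schema.
    """
    host_specs = [
        {
            "hostname": name,
            "fqdn": f"{name}.{cfg['dns_domain']}",
            "ipAddress": ip,
            "password": cfg["esxi_password"],
        }
        # Sort by host name so the generated file is deterministic.
        for name, ip in sorted(cfg["esxi_hosts"].items())
    ]
    return {
        "sddcId": cfg["sddc_id"],
        "dnsDomain": cfg["dns_domain"],
        "hostSpecs": host_specs,
    }


if __name__ == "__main__":
    # Print the spec as indented JSON; redirect stdout to save it to a file.
    print(json.dumps(build_bringup_spec(user_config), indent=2))
```

Running it as `python gen_vcf_spec.py > vcf-config.json` (the script name is hypothetical) would write the illustrative spec to disk. Keeping all user input in one place and deriving everything else, such as the FQDNs, is what lets a single set of variables drive both the infrastructure deployment and the JSON generation.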
Note: Although AVN and the respective NSX-T configuration are not in scope for the automation script, it is definitely possible to use a solution like VyOS or pfSense, along with additional automation, to build the extra infrastructure needed to deploy a complete VCF environment. I will leave this as a fun and interesting learning exercise for the reader.
You can find the complete details on the VCF automation script and how to use it in the following GitHub repo: https://github.com/lamw/vcf-automated-lab-deployment
Here is an example of running the script, which took about 13 minutes in my environment to deploy the components required for Cloud Builder to begin the VCF deployment.
After the script has completed, you should see a vApp that contains the following five VMs (the four Nested ESXi VMs and the Cloud Builder appliance), and you can now connect to the Cloud Builder UI.
From here, upload the vcf-config.json configuration file auto-generated by the automation script, then just sit back and watch your VCF environment get built.
The duration of this process will vary based on your underlying hardware; for my setup, it took a little over 1.5 hours to complete. Once completed, you will have vCenter Server, vSAN and NSX-T fully configured, along with SDDC Manager, which is accessible using the vSphere credentials you defined in the automation script, as shown in the screenshot below.
Hi William,
Great work, great work indeed. I will start this implementation today.
Thanks for this, and also for the great nested environments that you have built over the years.
Hi
I've followed the lab deployment, but during the Bring-Up step I got the following error while preparing the vSAN virtual disks:
"Host preparation failed with the following errors: com.vmware.evo.sddc.common.hostservices.error.HostPreparationException: Failed to delete partitions on ESXi host vcf-m01-esx01. com.vmware.evo.sddc.common.hostservices.error.HostPreparationException:
Failed to delete partitions on ESXi host vcf-m01-esx02. com.vmware.evo.sddc.common.hostservices.error.HostPreparationException:
Failed to delete partitions on ESXi host vcf-m01-esx03.. com.vmware.evo.sddc.common.hostservices.error.HostPreparationException:
Failed to delete partitions on ESXi host vcf-m01-esx04."
Any idea what the cause is?
I ran into this same issue and was able to get past it by simply rebooting the four ESXi VMs and then retrying the deployment from Cloud Builder. Moving forward, I think it's a good idea to just reboot any Nested ESXi VMs after they've been deployed.
I am getting this error: "NSX-T Manager operation status is false on 172.23.XX.XXX Failed to deploy NSX-T Manager vcf-m01-nsx01a on 172.23.XX.XXX". Does anyone have an idea?
Good day, William... great post, and a HUGE fan of your site for years. I was able to get the script to run without any errors, but when I tried to start the VCF deployment, I noticed I was not able to reach any of the hosts. All four hosts are connected to the same switch as the VCF CB, but I'm not able to ping or connect to them. I've rebooted the hosts, but I am still not able to access them and am thus stuck... any ideas?
Hi William,
Will the script also work for VCF 4.3?
There was a recent internal report about changes in the latest VCF 4.3 release, which expects the use of ECDSA keys for SSH; this would require a reboot of the Nested ESXi Appliance prior to enablement. I'm working with someone to verify the fix and may publish a new version of the 7.0u2a OVA.
Hello William,
As I was running through your script, I ran into an issue with Cloud Builder. When I try to deploy the SDDC, it errors out in the "Deploy and Configure vCenter Server" section. It fails at the "Download SSH Keys using Guest Program for vCenter" step. I'm not sure what's going on here. I have verified that the vCenter appliance was successfully deployed and is accessible. I tried rebooting the appliance and the ESXi host, with no luck.
Any idea where I'm going wrong?
Thanks
I have the same issue... did you solve it?
You may have already figured this out, but I found that, for some reason, it was looking for the VM by its short name, whereas the VM had been deployed with its FQDN as the name. Simply renaming the VM to the short name allowed it to proceed past that error message. I don't know all of the implications, but it got us past that issue. It was also suggested to retry with a modified input spec, but that did not look necessary.
For your reference, here is that info: https://www.vstellar.com/2020/06/04/retry-failed-bringup-with-modified-input-spec-in-vcf/
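The workaround above boils down to renaming the affected VM (in this thread, apparently the vCenter Server VM) from its FQDN to its short host name. Here is a minimal Python sketch of just that name mapping, assuming the VM was named with its FQDN; the inventory names are hypothetical, and the actual rename would still be done in the vSphere Client or with your automation tool of choice.

```python
def short_name(vm_name: str) -> str:
    """Return the first DNS label of an FQDN-style name.

    A name without any dots is returned unchanged.
    """
    return vm_name.split(".", 1)[0]


def planned_renames(vm_names: list[str]) -> dict[str, str]:
    """Map each FQDN-named VM to its short name, skipping ones already short."""
    return {name: short_name(name) for name in vm_names if short_name(name) != name}


if __name__ == "__main__":
    # Hypothetical inventory names, for illustration only.
    vms = ["vcf-m01-vc01.vcf.lab", "vcf-m01-cb01"]
    for old, new in planned_renames(vms).items():
        print(f"rename {old} -> {new}")
```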
Same here. Getting stuck at the exact same message. Any response?
Are there any updates on this topic? I know a few people (including myself) are stuck at this point.
Thanks,
MG
Is there an update available on how to resolve this issue?
I have two Mac Pros with 128GB RAM each. Can I spread the load across both, as I don't have 192GB RAM on a single server?
Yes. In fact, for this particular setup, it was spread across several physical ESXi hosts 🙂