One of my pet projects has been finding an easy way to deploy the required infrastructure, using Nested ESXi of course, to quickly stand up a "basic" VMware Cloud Foundation (VCF) environment. There are a couple of solutions that currently exist in the community that can take a user from having no infrastructure to setting up all the components required to stand up a completely functional VCF environment, similar to that of a physical VCF deployment. However, the prerequisites for using those tools were a bit more than what I was looking for and can also feel overwhelming for a new user. I certainly fell into that category while looking at some of the existing tools.
Ultimately, my use case was slightly different and I did not need all the bells and whistles, such as configuring Application Virtual Networks (AVN), which meant that I could dramatically simplify the deployment. For example, instead of deploying the ESXi hosts from scratch, I could simply take advantage of my Nested ESXi Virtual Appliance and use that as a starting point. For those familiar with my various PowerCLI automated lab deployment scripts, I have created a similar experience for VCF that will deploy a set of Nested ESXi Appliances along with the VMware Cloud Builder appliance, which is then used to deploy VCF on top of the Nested ESXi VMs. To make the user experience as painless and simple as possible, the script also uses the user-supplied configuration to automagically generate the VCF configuration JSON file, which can be uploaded directly to the Cloud Builder appliance to begin the VCF deployment once the initial infrastructure has been deployed by the automation script.
Note: Although AVN and the respective NSX-T configuration are not in scope for the automation script, it is definitely possible to use a solution like VyOS or pfSense, with techniques like the following, to automate the additional infrastructure needed to deploy a complete VCF environment. I will leave this as a fun and interesting learning exercise for the reader.
You can find the complete details on the VCF automation script and how to use it at the following GitHub repo: https://github.com/lamw/vcf-automated-lab-deployment
Here is an example of running the script, which took about 13 minutes in my environment to deploy the components required for Cloud Builder to begin the VCF deployment.
After the script has completed, you should see a vApp that contains the following 5 VMs and you can now connect to the Cloud Builder UI.
From here, you will upload the auto-generated vcf-config.json configuration file from the automation script and just sit back and watch your VCF environment get built.
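Before uploading, it can be worth confirming that the generated file at least parses as JSON. The following is a minimal sketch and not part of the automation script: the function name `is_valid_json` is made up for illustration, and it assumes that either `jq` or `python3` is installed on the machine where you run it. It only checks syntax, not whether Cloud Builder will accept the values in the file.

```shell
#!/bin/sh
# Hypothetical helper (not from the automation script).
# Returns 0 if the file parses as JSON, non-zero otherwise.
# Uses jq when present, otherwise falls back to Python's built-in json.tool.
is_valid_json() {
    file=$1
    if command -v jq >/dev/null 2>&1; then
        jq . "$file" >/dev/null 2>&1
    elif command -v python3 >/dev/null 2>&1; then
        python3 -m json.tool "$file" >/dev/null 2>&1
    else
        echo "neither jq nor python3 found" >&2
        return 2
    fi
}

# Example (assumes vcf-config.json is in the current directory):
#   is_valid_json vcf-config.json && echo "vcf-config.json parses as JSON"
```

A syntax check like this will not catch connectivity or certificate problems between the browser and Cloud Builder, so treat a passing result as ruling out only one possible cause of an upload failure.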
How long this takes will vary based on your underlying hardware; for my setup, it took a little over 1.5 hours to complete. Once completed, you will have vCenter Server, vSAN and NSX-T fully configured along with SDDC Manager, which is accessible using the vSphere credentials that you defined in the automation script, as shown in the screenshot below.
Luciano Patrao says
Hi William,
Great work, great work indeed. I will start this implementation today.
Thanks for this, and also for the great nested environments that you have built over the years.
sofianetech says
Hi
I've followed the lab deployment, but during the Bring-Up step I got the following error during the preparation of the vSAN virtual disks:
" Host preparation failed with the following errors: com.vmware.evo.sddc.common.hostservices.error.HostPreparationException: Failed to delete partitions on ESXi host vcf-m01-esx01. com.vmware.evo.sddc.common.hostservices.error.HostPreparationException:
Failed to delete partitions on ESXi host vcf-m01-esx02. com.vmware.evo.sddc.common.hostservices.error.HostPreparationException:
Failed to delete partitions on ESXi host vcf-m01-esx03.. com.vmware.evo.sddc.common.hostservices.error.HostPreparationException:
Failed to delete partitions on ESXi host vcf-m01-esx04.
Any idea about the cause ?
virtualex says
I ran into this same issue and was able to get past it by simply rebooting the (4) ESXi VMs and then retrying the deployment from Cloud Builder. Moving forward, I think it's a good idea to just reboot any nested ESXi VMs after they've been deployed.
Leef Torres says
I am getting this error: "NSX-T Manager operation status is false on 172.23.XX.XXX. Failed to deploy NSX-T Manager vcf-m01-nsx01a on 172.23.XX.XXX". Anyone have an idea?
Michael Greene says
Good Day William...great post, and I have been a HUGE fan of your site for years. I was able to get the script to run without any errors, but when I tried to start the VCF deployment, I noticed I was not able to reach any of the hosts. All four hosts are connected to the same switch as the VCF CB, but I'm not able to ping/connect to them. I've rebooted the hosts, but I am still not able to access them and am thus stuck....any ideas?
Volker K. says
Hi William,
will the script also work for VCF 4.3?
William Lam says
There was a recent report internally about some changes with latest VCF 4.3 release which expects use of ECDSA keys for SSH which would require a reboot of Nested ESXi Appliance prior to enablement. I’m working with someone to verify the fix and may publish a new version of 7.0u2a OVA
MG says
Hello William,
As I was running through your script I ran into an issue with the Cloud Builder. When I try to deploy the SDDC, it errors out in the "Deploy and Configure vCenter Server" section. It fails at the "Download SSH Keys using Guest Program for vCenter" step. Not sure what's going on here. I have verified that the vCenter appliance has been successfully deployed and is accessible. I tried rebooting the appliance and the ESXi host with no luck.
Any idea where I'm going wrong?
Thanks
directoryun says
i have same case...did you solve it?
luhnyclimbr says
So you may have already figured this out, but I found that for some reason it was looking for the VM by its short name, whereas the VM had been deployed with its FQDN as the name. Just renaming the VM to the short name allowed it to proceed past that error message. I don't know 100% the implications, but it got us past that issue. Retrying with a modified input spec was also suggested, but that didn't look like it was needed.
For your reference here is that info https://www.vstellar.com/2020/06/04/retry-failed-bringup-with-modified-input-spec-in-vcf/
Sachin says
Same here. Getting stuck at the exact same message. Any response?
Michael Greene says
Are there any updates on this topic? I know a few people (including myself) are stuck at this point.
Thanks,
MG
webgangster says
Is there an update available how to resolve this issue?
markchasemackenziecom says
I have two Mac Pros with 128GB RAM each. Can I spread the load across both, as I don't have 192GB RAM on a single server?
William Lam says
Yes. In fact, for this particular setup, it was spread across several physical ESXi host 🙂
haikalshiddiq says
Hi William,
Thanks for sharing. Is it possible to deploy VCF v4.4 using your script?
Regards,
Haikal
William Lam says
Try it!
haikalshiddiq says
Hi William,
Sure, I've tried v4.3.1 and v4.4.1 but got stuck when uploading the JSON file to Cloud Builder. Here is the error message:
Http failure response for https://vcf-m01-hs01.hicall.local/api/v1/system/sddc-spec-converter?design=ems: 0 Unknown Error, make sure the xlsx/json file is valid
Any solution maybe?
Lorenzo Ramirez says
Hi William,
This was a great write up. Would you happen to have something for physical environments and not nested?
William Lam says
I think they just called that VCF 😉
Take a look at VCF Automation https://blogs.vmware.com/cloud/2020/10/21/automating-vmware-cloud-foundation-consumable-services-cloud/
imthiachulu says
Hello William,
I am trying to deploy VCF 4.5, on a Single ESXi with VC.
Script stops after
Creating vApp Nested-VCF-Lab-qehgrJlP ...
[01-19-2023_09:37:53] Moving Nested ESXi VMs into Nested-VCF-Lab-qehgrJlP vApp ...
Move-VM: /home/jumpbox/Downloads/vcf-deploy.ps1:277
Line |
277 | Move-VM -VM $vm -Server $viConnection -Destination $VApp …
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| 19/01/2023 21:37:56 Move-VM vSphere single sign-on failed for connection
| '/VIServer=vsphere.local\[email protected]:443/' during a previous
| operation. The current operation requires such single sign-on and therefore failed.
| Future operations which require single sign-on on this connection will fail. The
| underlying cause was available in the error message which initially reported the
| single sign-on failure.
-------------------------------
I am using the same domain. (vsphere.local)
Abbed says
Hi William,
Thank you so much for all the good work,
In VCF 4.5 there are a few changes in the validation:
1) The vSAN boot disk needs to be changed from 12GB to 32GB
2) The SDDC Manager and NSX passwords need to be stronger, like VMw@re123!VMw@re123!
3) "Gateway IP Management not contactable": a patch is in KB 89990 (release notes)
4) The failed vSAN disk group error can be corrected with your article on FakeSCSIReservations
5) Instead of DHCP, you can use an IP Pool -> VMware Cloud Foundation API Reference Guide SDDC
I noted a few more in my environment.
Cheers
Abbed
MG says
Greetings,
For those that got this working, what version of nested esxi and cloud builder did you use? I'm trying to narrow the scope of troubleshooting to something that works.
Regards,
MG
Abbed says
Greetings,
I use "ESXi 7.0 Update 3g Virtual Appliance", but remember to resize the first disk from 12GB to 32GB, before or after executing the .ps1, or you'll find "VSAN_MIN_BOOT_DISKS.error" in the CB 4.5 debug log.
It's worth stating your issue(s), someone might have had the same.
Cheers
Abbed
William Lam says
There was a recent PR that was merged into the repo just the other day that enables support for VCF 4.5, please check the repo and download the latest version of the script which should work w/o further modifications
Abbed says
Cool thanks William for the fast merge very much appreciated
Thanks Mohamed Imthiyaz for the fast PR
Michael Greene says
Thanks for the quick response, Abbed. I'm getting stuck on "Configure NSX-T Data Center Transport Node" during the SDDC deployment. The deployment gets stuck on this step for over an hour and then the entire deployment fails. I am using ESXi 7.0 U3i and Cloud Builder 4.5. Would it be possible to reach you offline?
Regards,
MG
Abbed says
Michael, what you are experiencing is twofold.
On one hand, NSX needs a lot more than the small size to boot and prepare the hosts (20 GHz and 20GB RAM). At 3.2 GHz per core, that means 6 vCPUs. (We can't use medium because it needs 24GB and you'll end up with an insufficient resources error.)
On the other hand, each ESXi host needs enough memory to install the NSX bits (16GB RAM). It's doable with the default 38GB per ESXi host, but you have to change NSX after its deployment from small (4 vCPUs, 16GB RAM) to (6 vCPUs, 20GB RAM), and on the fly!
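The vCPU figure above is plain division, which can be sketched as follows. This is only an illustration of the arithmetic: the helper name `vcpus_for_mhz` is hypothetical, the 20 GHz target and 3.2 GHz clock are the numbers quoted in this comment, and the result is rounded down to match the 6 vCPUs mentioned (rounding up would give 7).

```shell
#!/bin/sh
# Hypothetical helper: whole vCPUs that fit within a target aggregate clock.
# Both arguments are integer MHz; the shell's integer division rounds down.
vcpus_for_mhz() {
    target_mhz=$1
    core_mhz=$2
    echo $(( target_mhz / core_mhz ))
}

# 20 GHz target at 3.2 GHz per core, expressed in MHz.
vcpus=$(vcpus_for_mhz 20000 3200)
echo "vCPUs: $vcpus"                        # prints "vCPUs: 6"
echo "aggregate MHz: $(( vcpus * 3200 ))"   # prints "aggregate MHz: 19200"
```

With a slower physical core clock the same 20 GHz target needs more vCPUs, so it is worth redoing the division with the clock speed of your own hosts.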
What I do is:
After the script deployment, modify/add the Cloud Builder timeouts:
vim /opt/vmware/bringup/webapps/bringup-app/conf/application.properties
ovf.deployment.timeout.period.in.minutes=180
nsxt.manager.wait.minutes=180
systemctl restart vcf-bringup
systemctl status vcf-bringup
tail -f /opt/vmware/bringup/logs/vcf-bringup-debug.log
Start the bringup deployment.
After the vCenter deployment, log into it and disable automatic DRS for vCenter and NSX.
After the NSX OVA deployment is complete:
suspend the Cloud Builder VM,
shut down NSX,
change the spec,
start NSX and wait 15-25 min,
SSH into it and run "get cluster status";
if not everything is UP, run "start service manager" and "start service http", then
resume the Cloud Builder VM from suspend.
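The timeout edit in the steps above is done by hand in vim, but it could also be scripted. Here is a minimal sketch, assuming a plain `key=value` properties file: the `set_property` function is hypothetical and not something shipped with Cloud Builder, while the property names and the file path are taken from this comment and have not been verified against any particular Cloud Builder release.

```shell
#!/bin/sh
# Hypothetical helper: set key=value in a properties file.
# Replaces the line if the key already exists, appends it otherwise.
set_property() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    if awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp"; then
        # Write back in place so ownership and permissions are preserved.
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}

# Example on the Cloud Builder appliance (path and keys as quoted above):
#   conf=/opt/vmware/bringup/webapps/bringup-app/conf/application.properties
#   set_property "$conf" ovf.deployment.timeout.period.in.minutes 180
#   set_property "$conf" nsxt.manager.wait.minutes 180
#   systemctl restart vcf-bringup
```

Matching on the exact `key=` prefix keeps the function from touching other properties that merely contain the same text, and running it twice leaves a single line per key.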
I know it's not ideal,
it would be great if we could override CPU and Memory of NSX in the script...
Huh how offline ? I live in France. I've described this on my page at strivevirtually dot net
Cheers
Michael Greene says
Abbed,
Thanks for the detailed breakdown. I will give this a shot in my lab tomorrow and let everyone know the results.
regards,
MG
Luciano Patrao says
Stopping the NSX while deploying, did the trick for me.
Abbed Sedkaoui says
AFAIK CB 4.5 detects that NSX is down, deletes it and redeploys the OVA despite the timeout set. That happened to me many times before I chose the suspend route, though now I'm looking for an alternative. I'll keep everyone posted if it succeeds.
Abbed Sedkaoui says
My bad Luciano you were right !
issued a
shutdown -r now
and corfu-server got out of its deadlock
without CloudBuilder intervening
All good 😀
Mohamed Imthiyaz says
Hello Abbed,
It's going to be a long script if we have to wait for the VC and NSX to deploy. It could be another script that waits for the VC, SDDC Manager and NSX to deploy and then edits the CPU and memory for NSX.
Also, instead of suspending the Cloud Builder VM, you can wait until NSX is deployed and the VM is powered on, and when it completes the task "Add vCenter Server to NSX-T Data Center Management Cluster" you can power off the VM and edit the CPU and memory.
I just tested the latest script. NSX small deployment worked for me.
https://imthiyaz.cloud/automated-vcf-deployment-script-with-nested-esxi
Abbed Sedkaoui says
Hi Mohamed,
It's not really what we want, to react to the NSX deployment, especially when we can be proactive by giving it exactly the resources it needs.
I'm redoing the lab right now with medium and tweaked resource pools. From what I monitored, it needs 20,000 MHz and 20480 MB, so I'll try with that.
@William, Michael: I experienced "ESXI_BUILD.error" while trying "ESXi 7.0 Update 3i Virtual Appliance", so be aware of it. I reverted back to version 3g, which I had success with.
http://strivevirtually.net
Vmwhere says
Hey guys, I am facing an issue while deploying VCF 4.5. The deployed vCenter cannot seem to connect to the network. I get the error "network configuration failed", and running vami_config_network unfortunately did not help 🙁
Abbed Sedkaoui says
Hey again, you might as well grep "ERROR"
Abbed Sedkaoui says
Hey, can you check the network configuration input and paste it here using `cat /opt/vmware/bringup/logs/vcf-bringup-debug.2023-02-20.0.log | grep "task Network Configuration Validation input" | tail -n 1`
Change the log file name to match yours.
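Both log searches suggested in these comments can be wrapped in two small functions. This is a sketch only: the names `last_match` and `count_errors` are made up, and the log path in the usage comment is the one mentioned earlier in this thread, which may differ on your appliance.

```shell
#!/bin/sh
# Hypothetical helper: print the last line of a log that contains a fixed
# string. -F treats the search text literally rather than as a regex.
last_match() {
    grep -F -- "$2" "$1" | tail -n 1
}

# Hypothetical helper: count the lines that contain the literal text ERROR.
count_errors() {
    grep -c -F -- "ERROR" "$1"
}

# Example on the Cloud Builder appliance (adjust the log name to yours):
#   log=/opt/vmware/bringup/logs/vcf-bringup-debug.log
#   last_match "$log" "task Network Configuration Validation input"
#   count_errors "$log"
```

Passing the file straight to `grep` does the same job as piping it through `cat`, and the last matching line is usually the one from the most recent bringup attempt.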