The idea of "Instant Cloning" a Nested ESXi VM (running ESXi in a VM) is not a new concept. In fact, I shared a solution back in 2015 using the private VMFork APIs. What has changed is the ease of consumption, primarily due to the re-architecture of Instant Clone in vSphere 6.7 (more details here and here), which resulted in a public and simplified API. Some of you might ask, why not simply clone a Nested ESXi VM or create a Linked Clone? What benefit would I get by using Instant Clone?
The answer is not only speed, but also the fact that the instantiated VM is fully operational and ready to start executing, whereas a traditional full clone or linked clone requires a full OS boot and can take up to several minutes to deploy and configure. This may not sound like much for a small number of Nested ESXi VMs, but Instant Clone really shines as you increase the number of instances, since each clone remains just as fast to create and is instantly available. As you can imagine, this opens up some interesting use cases, whether for a personal home lab or for educational purposes like VMware HOL. In addition, we also have customers who deploy Nested ESXi not only at high scale but also with a high churn rate for development purposes (think CI/CD-type workloads), and they can also benefit from Instant Clone.
So how fast are we talking about? Let's say you wanted to test out the latest version of VSAN in vSphere 6.7. You would normally deploy three Nested ESXi VMs, power them up and wait for them to be ready on the network. With Instant Clone, you can deploy three fully functional Nested ESXi VMs in just 30 seconds! As the VMs are instantly available for consumption, you can start the VSAN enablement workflow immediately, and parts of that can even be baked into the Instant Clone workflow. With the ease of provisioning Nested ESXi VMs, you can simply maintain a catalog of ESXi templates which are in "frozen" states and then leverage Instant Clone to deploy just-in-time Nested ESXi environments and discard them once you are done. Pretty slick if you ask me, and something I plan on using going forward!
Disclaimer: Nested ESXi is still not officially supported by VMware. Please use at your own risk.
Step 1 - Create the base Nested ESXi template (I have tested both 6.5u2 and 6.7) which we will use to Instant Clone from. You can either install ESXi in a VM by hand or use one of my existing Nested ESXi Virtual Appliances (here and here). Make sure the VM is configured with only a single VMDK, which should contain just the ESXi installation. If you are using my appliance, make sure to delete the second and third VMDKs. The reason is that when we Instant Clone, the disk UUIDs are duplicated and you will have conflicts when trying to create a VMFS volume or set up VSAN. We handle this as part of the Instant Clone instantiation, where we hot-add additional VMDKs based on your use case, which ensures each VM has a unique UUID for each disk.
Step 2 - Download the Nested ESXi customize.sh script from my Instant Clone community repo, upload it to the base Nested ESXi template and ensure it has the execute permission (chmod +x customize.sh) before running it. This script is responsible for prepping the VM prior to initiating the "freeze" operation and for cleaning out any unique identities, like the host UUID and VMkernel interfaces, which is needed to ensure we do not have duplication in our Instant Clones. At this point, you can run the script as shown in the screenshot below; it will perform a series of operations and then freeze the VM.
Step 3 - Next, download the PowerCLI driver script InstantClone-ESXi.ps1, which will be used to deploy new Instant Clones from our template. It expects that you have access to my Instant Clone PowerCLI module; if not, please download that from here. The script has a number of variables. They should be pretty self-explanatory, but I will quickly go over them below:
- $SourceVM - This is the name of your base Nested ESXi template, replace it with whatever name you used
- $numOfVMs - This is the number of Instant Clones you wish to deploy, I recommend setting this to 1 to make sure it works before creating more
- $ipNetwork - This defines the first three octets of your network (e.g. 192.168.30) if you are using static assignment, otherwise you can ignore it
- $ipStartingCount - This defines the initial starting address (e.g. 50), which is incremented by one for each VM up to $numOfVMs; this is only applicable for static assignment, otherwise you can ignore it
- $netmask - This defines the netmask for your network if you are using static assignment, otherwise you can ignore it
- $dns - This defines the DNS server to use if you are using static assignment, otherwise you can ignore it
- $gw - This defines the network gateway if you are using static assignment, otherwise you can ignore it
- $networktype - This can be either static or dhcp, where static requires the above properties to be set. If you specify dhcp, then the VM network on which you have placed your base template must support DHCP or you will not receive IP addresses when you deploy your Instant Clones
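To make the static addressing variables a bit more concrete, here is a minimal sketch of how a three-octet network prefix, a starting host number and a VM count could expand into one address per clone. It is written in POSIX shell purely for illustration and is not the PowerCLI code from InstantClone-ESXi.ps1; the function name ip_plan and the range check on the last octet are my own assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not part of InstantClone-ESXi.ps1): print one IP address
# per clone, mirroring the roles of $ipNetwork, $ipStartingCount and $numOfVMs.
ip_plan() {
    network="$1"   # first three octets, e.g. 192.168.30
    start="$2"     # first host octet, e.g. 50
    count="$3"     # number of clones to address
    i=0
    while [ "$i" -lt "$count" ]; do
        octet=$((start + i))
        # Assumption: stop rather than wrap once the host octet leaves 1-254.
        if [ "$octet" -gt 254 ]; then
            echo "error: host octet $octet is out of range" >&2
            return 1
        fi
        echo "${network}.${octet}"
        i=$((i + 1))
    done
}

# Three clones starting at .50 in the 192.168.30 network.
ip_plan 192.168.30 50 3
```

With the example values from the list above, this prints 192.168.30.50, 192.168.30.51 and 192.168.30.52, one per line.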
In addition to generating the appropriate guestinfo properties which will be fed to each Instant Clone for customization, we also need to generate a random UUID which will be used to configure each Instant Clone and ensure that they have unique identities, which is especially important if you plan to enable VSAN. I will not bore you with the details (you can refer to the code for the specifics), but if this is not performed, you will definitely run into a number of issues. This actually took me a bit of trial and error to figure out, so I have saved you a lot of the pain 🙂
Note: The PowerCLI script is just an example of what can be done. You can easily modify it to perform other tasks as part of the deployment workflow.
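The driver script takes care of the UUID in PowerCLI, so refer to it for the real logic. Purely as an illustration of what "one random UUID per clone" means, here is a minimal sketch in POSIX shell. It assumes /dev/urandom, od, tr and cut are available, and random_uuid is a hypothetical helper name, not something taken from the script.

```shell
#!/bin/sh
# Hypothetical helper (not taken from InstantClone-ESXi.ps1): print a random,
# version-4-shaped UUID built from /dev/urandom.
random_uuid() {
    # Read 16 random bytes and render them as 32 lowercase hex characters.
    hex=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    # Use 30 of those characters; the literal "4" and "a" in the format string
    # are the fixed version and variant nibbles.
    printf '%s-%s-4%s-a%s-%s\n' \
        "$(printf '%s' "$hex" | cut -c1-8)" \
        "$(printf '%s' "$hex" | cut -c9-12)" \
        "$(printf '%s' "$hex" | cut -c13-15)" \
        "$(printf '%s' "$hex" | cut -c16-18)" \
        "$(printf '%s' "$hex" | cut -c19-30)"
}

# One fresh UUID per clone; three clones in this example.
for n in 1 2 3; do
    echo "clone $n: $(random_uuid)"
done
```

The point of the sketch is only that each clone gets its own freshly generated value rather than inheriting the one baked into the frozen template.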
The script currently assumes you will use these Instant Clones for VSAN, so it also hot-adds two VMDKs (4 and 8 GB respectively). If you do not want this to happen or if you want to change the size, go ahead and update the PowerCLI script. Once you have saved all your changes, we are now ready to run the script. If everything was configured correctly, your new Instant Clone Nested ESXi VMs should be up and running immediately after the script has completed. In the example below, I created three Nested ESXi Instant Clones.
If you open the VM Console for one of the Instant Clones, you can get more details on what has occurred from within the ESXi VM, such as updating the UUID and recreating the management VMkernel interface, among other things. If you run into customization issues, they will show up here and are also stored in the logs under /ic-customization
Here is a screenshot of the three Nested ESXi Instant Clones which I have created from my base template and then added to my vSphere Inventory and successfully configured VSAN.
Just for fun, I also deployed 64 Nested ESXi Instant Clones (each was configured with 6GB memory) which took about 11 minutes 🙂
I was also monitoring esxtop to see how much memory I was saving. The top image is the physical ESXi host prior to deploying the 64 Instant Clones (4GB) and the bottom is after deploying the 64 VMs. We can see we are saving a whopping 339GB of shared memory, which is pretty insane given this was deployed to a single SuperMicro E200-8D with just 128GB of physical memory!
Gary says
William, could we do this in conjunction with other elements such as NSX-T / NSX-V? What would we need to add to the script to ensure that the VIBs are properly installed and configured? It would be really nice to have this as a much more lightweight way to spin up specific lab topologies in a constrained environment.
William Lam says
As mentioned in Slack, sure 🙂 This has nothing to do with Instant Clone or the script, though; it is part of your "preparation" step, which happens prior to performing the Instant Clone. You can install the VIBs on your ESXi VM and then, after Instant Clone, all the clones will already have the VIBs (though you still need to enable it on the NSX-T/V Manager side, which will then create the VTEPs)
Abe says
Hi William, is it possible to leverage instant clone to deploy a guest into a nested ESXi instance? If so, are there any guides or docs you can suggest to follow? Thanks!
William Lam says
If the Nested ESXi host is being managed by the VC in which you issue the Instant Clone, then yes. It's no different than if this were a physical ESXi host. The process is exactly the same 🙂
Brad Lay says
Firstly thank you for this. As usual you provide the community with such great resources.
I've spent hours trying to get this working but continue to hit the same issue which I cannot figure out.
The only variable I have that's different now is I'm on 6.7u1 so I'm hoping it's not related to that.
I did try using your 6.7 appliance (but still 6.7u1 VC/physical ESXi) with both VSS and VDS (using the new maclearning feature!).
No matter what I do, after doing the instant clone the clone network doesn't come up - from the console it shows "sendto() failed (Host is down)" when attempting to ping the default gateway. Nothing obvious in the logs except I did find you have a bug in your customize.sh script - line 76 if [ ${TYPE} == "static" ]; then should read if [ ${NETWORK_TYPE} == "static" ]; then
It would be great to know if this works for 6.7u1 or if I've misconfigured something on my end.
Thanks,
Brad
Brad Lay says
Retried this using a standard switch backed by a physical port on 6.7u1 and it worked - so it must be something to do with the dvs pg.
Brad Lay says
When I use your Set-MacLearning as follows:
Set-MacLearn -DVPortgroupName @("Nested-01-DVPG") -EnableMacLearn $true -EnablePromiscuous $true -EnableForgedTransmit $true -EnableMacChange $true
It doesn't allow the clone to get network - as above.
If I simply use the legacy options for Promiscuous/ForgedTransmit and MacChange it works, so it seems related to the new method of setting the options.
Also once it's been set even if I set everything back to false and only use the legacy options it still doesn't work and the dvpg needs to be recreated again.
Brad Lay says
Do you know if there is a way to hot-add disks as SSD? When I hot-add the disks on a vsanDatastore they get added as HDD, and the only way to get them marked as SSD is to set New-AdvancedSetting -Entity $nestedESXiVM -Name 'scsi0:1.virtualSSD' -Value $true and power the VM off and on (making the instant clone less instant...)
If I use a normal VMFS datastore this doesn't happen.
Andres B says
I don't get why it isn't working. I used these scripts in my homelab and they work just fine, but I'm building a new lab at work and they won't work there. The clones don't refresh the MAC address of the NICs; it requires a reboot to work...
V says
This is a pretty neat feature; however, all the clones show up with the alarm "Virtual machine Consolidation Needed status". Since the snapshot is by design used as a base disk by the clone, I would expect this alarm behaviour to be corrected in the future.
lamw says
Thanks for the feedback, I've shared this with the PM for awareness
Mamata Desai says
William, thank you, this is super helpful! Creating a test scale ESXi cluster 🙂 and I followed your scripts.
Ran into one problem tho: the VMFS datastore that contains the ESXi install retains its VMFS UUID. Adding the resulting clones to vCenter fails for me with:
Datastore 'local-esx1' conflicts with an existing datastore in the datacenter that has the same URL (ds:///vmfs/volumes//), but is backed by different physical storage.
Reading online, I see the VMFS UUID is the same, probably, and it needs resignature, but being online with the ESXi booted on, we can't do an unmount for resignature.
Is there a quick easy solution that I may have overlooked?
Andy says
Had the same issue. Ended up removing the offending datastore1 from each host and all was good.
$vmhost = 'esxiHostname'
Connect-viserver $vmhost
Get-vmhost -server $vmhost | Get-Datastore local-esx1 | Remove-Datastore
Elad K says
Hi,
Thank you for a great article.
One problem that I'm facing: when I clone an ESXi host that has VMs installed inside it, I get duplicate MAC addresses on the VMs that are connected to the ESXi vswitch0 (external network interface).
How can I generate a new MAC address for the specific VMs that are connected to vswitch0?
Regards,
Elad
Sébastien TINETTI says
Hello,
That is normal behavior. I made some modifications to the script to handle what you are asking for:
- Reboot the Nested ESXi hosts after deployment so they are reachable on the network by the end of the script, with a progress bar while waiting
- Connect automatically to each Nested ESXi host, list its VMs, and set each MAC address to a random value within the range authorised by VMware
If William wants the updated script, I can send it to him as an update to this great article.
Have a nice day
Seb
V says
Interesting modifications. However, I presume once you reboot the nested ESX you’ll lose the advantage of the shared memory with the master VM, so the ESX becomes “fully inflated” in RAM.
Seb says
Hello,
I'm not sure about it, but you are probably right. In my case RAM isn't a problem: usage is dynamic and you can still use TPS if you don't have enough free RAM.
However, in my case I could not get it working without rebooting all the Nested ESXi hosts, and I haven't found the reason. I'm using 6.7u2 for both physical and nested.
If you want the updated script give me an address and I will send it to you.
Alonzo Mercer says
Thanks William! Any way to keep the vlan id intact on the vmk0 on the produced vms?
k0rin says
This is such a useful feature! I've been using it to clone environments for my computer club to practice for cyber competitions. The biggest difficulty I've run into is that I can't connect the cloned environments to vCenter due to the conflicting UUIDs.
I saw an earlier post where "Andy" said they were able to delete the conflicting datastore, but my nested environments only have a single datastore.
I also tried to resignature the datastore, but the host returns "No unresolved VMFS snapshots"
Has anyone managed to connect multiple nested clones to a vCenter server?