Instant Clone, or VMFork as it is referred to internally, has been around for a number of years now. It was initially available as part of vSphere 6.0, with the primary consumer being Horizon View and its just-in-time desktop solution. Although Instant Clone was part of the core vSphere platform, public APIs were not available for external consumption. Many customers were interested in the technology to enable other non-VDI use cases such as Dev/Test, Continuous Integration/Continuous Delivery (CI/CD) and even Container workloads. Part of the reason for not exposing the API was the original Instant Clone architecture, which had certain limitations and constraints.
In addition, VMware was also interested in getting feedback from customers on how they would like to consume Instant Clone from an Automation standpoint. This was important because the existing workflows were also somewhat complex. This started out with the release of a PowerCLI Instant Clone Extension Fling that provided an abstraction on top of the private APIs. Based on that and other feedback, VMware followed up by releasing an Instant Clone Fling for pyvmomi (the vSphere SDK for Python), which gave customers more programmatic access to the private APIs. Both Flings were a huge success and we even had customers using the pyvmomi Instant Clone modules in Production to deploy several hundred Instant Clone VMs per day for their CI/CD workloads.
Taking the learnings from both Horizon View and the feedback from customers using the Flings, the Instant Clone Product/Engineering team has been hard at work behind the scenes on simplifying the Instant Clone architecture and removing limitations and constraints that had existed in earlier versions. As you can imagine, this was a non-trivial amount of work that would need to be released in phases, especially as VM lifecycle management touches almost every part of the vSphere stack. The team really focused on ease of consumption, especially from an Automation standpoint, which is how most customers prefer to consume Instant Clone.
With the release of vSphere 6.7, the re-architected Instant Clone feature is now available to customers and can initially be consumed using a public vSphere API. This first re-release of Instant Clone provides the core foundation for future enhancements, which will not only expand its own functionality but also enable brand new capabilities within the vSphere platform. All I can say is that this is just the beginning!
Architecture:
The biggest difference from prior versions of Instant Clone is that there is no longer a tight coupling between the SourceVM (Parent) and the DestinationVM (Child), which had imposed a number of restrictions preventing Instant Clones from taking full advantage of core vSphere capabilities like vMotion, Cross vMotion, Storage vMotion, DRS, HA, etc. In this new version of Instant Clone, also known as a "Parentless" Instant Clone, the instantiated VM no longer depends on the SourceVM.
Once instantiated, the Instant Clone is an independent VM that starts executing from the exact running state of the SourceVM, which enables rapid provisioning of VMs that are immediately available for consumption, unlike traditional full clones. This instant provisioning is made possible by sharing both the memory and disk state of the SourceVM. From a memory standpoint, all Instant Clones share the same physical memory pages as their SourceVM. This is true even if Transparent Page Sharing (TPS) is disabled, which is actually really cool (more details will be added to KB 2080735 for those interested). In other words, TPS is just one of the techniques that Instant Clone can take advantage of, but does not solely rely on, to deliver maximum memory efficiency, which ultimately enables greater consolidation ratios.
From a storage standpoint, delta disks are leveraged for disk savings and behave similarly to a Linked Clone. However, unlike a Linked Clone, which uses snapshot-based delta disks, Instant Clone uses a different technique and is not limited to the traditional disk chain length of 30 but rather 255, which is the vSphere platform limit. Lastly, you can now create an Instant Clone from either a running or a frozen SourceVM. In the past, you had to "freeze" the SourceVM, which meant it was no longer accessible as part of the Instant Clone creation workflow.
Instant Clone Workflows:
In the first scenario, when a new Instant Clone is created, the SourceVM is briefly stunned and a new delta disk is created for each of its virtual disks, referencing the original SourceVM disks. A memory checkpoint is then captured from the SourceVM and transferred to the DestinationVM. New delta disks are also created for the DestinationVM, which also reference the original SourceVM disks. Once the Instant Clone VM is up and running, the SourceVM is un-stunned and is available for execution again. This all happens in ~1 second, which is just mind blowing if you ask me! If you create additional Instant Clones, the process is similar: additional delta disks are created on the SourceVM (since it is powered on and running) and the DestinationVMs will then reference the new delta disks, as shown in the diagram below.
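To make this more concrete, below is a rough PowerCLI sketch of the first workflow using the InstantClone_Task method introduced in the vSphere 6.7 API; the VM and clone names are placeholders and error handling is omitted.

$sourceVM = Get-VM -Name "PhotonOS-Source" | Get-View

# Minimal Instant Clone spec: a name plus a (mostly empty) relocate spec
$spec = New-Object VMware.Vim.VirtualMachineInstantCloneSpec
$spec.Name = "PhotonOS-InstantClone-01"
$spec.Location = New-Object VMware.Vim.VirtualMachineRelocateSpec

# Kick off the Instant Clone; the running SourceVM is only stunned briefly
$task = $sourceVM.InstantClone_Task($spec)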
In the second scenario, an Instant Clone is created from a "frozen" SourceVM. This is the same behavior as the old Instant Clone workflow, where a freeze operation is initiated from within the GuestOS using the VMware Tools vmware-rpctool utility and specifying the "instantclone.freeze" command. Since the SourceVM is no longer running, its original disks are already read-only and new delta disks are created only for the DestinationVM. When the SourceVM is frozen, the VM will still be in a powered on state but will no longer be executing instructions. This state is reflected within the vSphere API for the SourceVM, so customers can easily identify when a VM is frozen. The SourceVM will remain in this frozen state until it is either powered off or reset, which will bring the VM back to its original (not frozen) state. Customers can also re-run the freeze operation a number of times and create new Instant Clones from different controlled points in time, which can be extremely useful for development or debugging purposes. In the end, this workflow yields the exact same behaviors and sharing benefits as the first scenario, without incurring additional latency from a deployment standpoint or additional delta disks being created on the SourceVM for each subsequent Instant Clone.
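The freeze itself is initiated from inside the GuestOS as described above, but the resulting state can be checked from the outside. Below is a minimal sketch, assuming the InstantCloneFrozen runtime property exposed by the 6.7 API and a placeholder VM name.

# Inside the GuestOS, the SourceVM is frozen with:
#   vmware-rpctool "instantclone.freeze"
$sourceVM = Get-VM -Name "PhotonOS-Source"

if ($sourceVM.ExtensionData.Runtime.InstantCloneFrozen) {
    Write-Host "SourceVM is frozen; new clones will not add delta disks to it"
} else {
    Write-Host "SourceVM is running; each new clone adds a delta disk to it"
}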
Customers with a keen eye, or those with past experience working with either Lab Manager (the good ol' days) or vCloud Director, may immediately pick up on the fact that the second scenario is the more efficient method when needing to deploy a large number of Instant Clones. The potential issue with the first workflow is that, because a new delta disk is created for each new Instant Clone, the SourceVM's disk chain can get very deep, very quickly, incurring additional latency due to deep traversals of the disk chain (maximum of 255). This also limits the number of Instant Clones you can create from a single SourceVM, since there are disk chain depth limits. For a small number of VMs, the first workflow is perfectly fine, but if you intend to deploy many more VMs, the second workflow is the recommended option for the reasons listed above.
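If you are curious how deep a SourceVM's chain has grown, one way to sanity-check it is via the VM's extended layout information; this is a hedged sketch with a placeholder VM name, looking only at the first virtual disk.

$sourceVM = Get-VM -Name "PhotonOS-Source"

# Each entry in Chain is one link (base or delta disk) in the disk chain
$chainDepth = ($sourceVM.ExtensionData.LayoutEx.Disk | Select-Object -First 1).Chain.Count
Write-Host "Disk chain depth for the first virtual disk: $chainDepth"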
Guest Customization:
In this initial release of Parentless Instant Clone, integration with vSphere's existing GuestOS Customization (GOSC) engine is not yet available. However, guest customization is still possible and can be fully automated until GOSC is natively integrated. When a new Instant Clone is created, it will receive a new MAC Address, but because it inherits the SourceVM's OS state and configuration, the IP Address and MAC Address within the GuestOS are still the same, which can cause a network conflict.
To prevent a conflict, the following workflow is recommended when the SourceVM is powered on (a rough PowerCLI sketch follows the list):
- Create a new Instant Clone and, within the create spec, disconnect the network interface by setting the new VirtualEthernetCard->connectable->migrateConnect property to "disconnect"
- Within the Instant Clone, refresh the network interface so that the GuestOS picks up the newly assigned MAC Address. This can be automated using the Guest Operations API, and the process to refresh the network interface depends on the GuestOS type. See the examples below on how to refresh the MAC Address
- Reconfigure the Instant Clone and update VirtualEthernetCard->connectable->connected to true so that the network interface is re-connected
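Here is a rough PowerCLI sketch of the three steps above, assuming a single network adapter on the SourceVM; VM names are placeholders and the wait/guest-refresh logic is only indicated in comments.

$sourceVM = Get-VM -Name "PhotonOS-Source"
$sourceView = $sourceVM.ExtensionData

# Step 1: Instant Clone spec with the NIC set to disconnect during instantiation
$spec = New-Object VMware.Vim.VirtualMachineInstantCloneSpec
$spec.Name = "PhotonOS-InstantClone-01"
$spec.Location = New-Object VMware.Vim.VirtualMachineRelocateSpec

$nic = $sourceView.Config.Hardware.Device | Where-Object { $_ -is [VMware.Vim.VirtualEthernetCard] } | Select-Object -First 1
$nic.Connectable.MigrateConnect = "disconnect"

$deviceChange = New-Object VMware.Vim.VirtualDeviceConfigSpec
$deviceChange.Operation = "edit"
$deviceChange.Device = $nic
$spec.Location.DeviceChange = @($deviceChange)

$sourceView.InstantClone_Task($spec)

# Step 2: (after the clone task completes) refresh the MAC Address inside the
# Instant Clone via the Guest Operations API / Invoke-VMScript, using the
# OS-specific examples shown later in this article

# Step 3: re-connect the network adapter on the Instant Clone
$cloneVM = Get-VM -Name "PhotonOS-InstantClone-01"
Get-NetworkAdapter -VM $cloneVM | Set-NetworkAdapter -Connected:$true -Confirm:$false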
Application/Custom Customization:
Although GOSC is not available today, the short-term method for customizing the network within the GuestOS can also be thought of as a generic interface that enables customers to pass external custom metadata directly into the GuestOS for customization. This is done through the use of guestinfo properties, which can be included as part of the Instant Clone creation spec and then consumed by a customer-created script running within the GuestOS.
In my opinion, this is a really powerful capability that can unlock a number of interesting use cases. You could imagine a CI/CD workflow where an external build system produces a new artifact (library or executable) for each run, and information such as a build number or version can then be passed through this interface and used to dynamically influence either the application or system configuration. In addition, since each Instant Clone is an independent VM from the SourceVM, another use of this interface is simply storing the name of the SourceVM so that it can be referenced later. Since these guestinfo properties are part of the VM object itself, administrators can also retrieve this information from outside of the GuestOS using existing tooling such as the vSphere API or the vSphere Automation SDKs. The scenarios are truly endless and I am really excited to see how customers will leverage this additional capability.
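As a hedged illustration, the sketch below passes two arbitrary guestinfo keys (the key names are just examples, not a defined schema) through the Instant Clone spec, then shows how they can be read from inside and outside the GuestOS.

$sourceVM = Get-VM -Name "PhotonOS-Source"

$spec = New-Object VMware.Vim.VirtualMachineInstantCloneSpec
$spec.Name = "CI-Runner-42"
$spec.Location = New-Object VMware.Vim.VirtualMachineRelocateSpec

# Custom metadata passed into the GuestOS as guestinfo properties
$buildNumber = New-Object VMware.Vim.OptionValue
$buildNumber.Key = "guestinfo.ic.build_number"
$buildNumber.Value = "42"

$sourceName = New-Object VMware.Vim.OptionValue
$sourceName.Key = "guestinfo.ic.sourcevm"
$sourceName.Value = $sourceVM.Name
$spec.Config = @($buildNumber, $sourceName)

$sourceVM.ExtensionData.InstantClone_Task($spec)

# Inside the GuestOS, a custom script could read a value with VMware Tools, e.g.:
#   vmware-rpctool "info-get guestinfo.ic.build_number"

# From the outside, the same properties can be read back with PowerCLI:
Get-AdvancedSetting -Entity (Get-VM -Name "CI-Runner-42") -Name "guestinfo.ic.*"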
Refreshing GuestOS MAC Address:
Linux: For each network device, unbind and bind the driver.
#!/bin/bash
# Unbind and re-bind the PCI driver for each network device so the GuestOS
# picks up the newly assigned MAC Address
for NETDEV in /sys/class/net/e*
do
    DEVICE_LABEL=$(basename $(readlink -f $NETDEV/device))
    DEVICE_DRIVER=$(basename $(readlink -f $NETDEV/device/driver))
    echo $DEVICE_LABEL > /sys/bus/pci/drivers/$DEVICE_DRIVER/unbind
    echo $DEVICE_LABEL > /sys/bus/pci/drivers/$DEVICE_DRIVER/bind
done
If DHCP is being used, you will probably need to force a DHCP refresh after updating the MAC Address.
Windows: Disable and re-enable network adapter
In PowerShell, this can be done with the Restart-NetAdapter cmdlet, for example:
Restart-NetAdapter -Name "Ethernet 2"
Note: There are probably other ways to accomplish the same behavior; these are just two examples.
In Part 2 of this article, I will demonstrate how to use the new Instant Clone API using PowerCLI (other vSphere Automation SDKs can also be used) along with a sample GuestOS script, to give you an idea of some of the things you can do with this new API. For those planning to follow along, in addition to having a vSphere 6.7 environment, you will also need to update to the latest PowerCLI 10.1.0 release, which supports vSphere 6.7.
k says
Very cool that vmware's officially supporting some of the previously hidden functionality in their SOAP API and in a way that's a fair bit cleaner than the {Create,Enable,Disable,Retrieve}Fork*_Tasks. It's kind of interesting that GOSC isn't available, I've been meaning to look at that for a while for regular clones (been doing host renames directly for my orchestration).
Now if only they'd add tag support to SOAP (or make it visible...) then I'd be really happy 😛
tdu says
Are there any limitations/caveats to be aware of in terms of Guest Customization compared to the legacy Linked Clones?
Doug Baer says
Great article William!
Wee Kiong Tan says
Does that mean that a running source VM will incur double the disk space? It seems that two delta files are created per generated VM.
Ian P says
How will this work if the source VM is domain-joined?
Would the actual scenario be: clone a non-domain-joined source VM -> rename/re-network -> reboot (because of the rename) -> join domain -> reboot -> use machine?
The savings are in the faster clone time?
David Lehrner says
Any integration with vRA?