At VMworld this year, I received several questions from customers asking whether it is possible to move an ESXi host configured using LACP/LAG from one vCenter Server to another, similar to the workflows outlined here or here. Not having spent much time with LACP/LAG, I reached out to a colleague who I knew would have the answer, Anthony Burke, whom you may know as one of the co-creators of the popular automation tool PowerNSX.
Anthony not only verified that there is indeed a workflow for this scenario, but he was also kind enough to test and verify it in his lab. Below is the procedure he shared with me; I merely "prettified" the graphics he initially drafted up 🙂
At a high level, the workflow is similar to the ones shared earlier. The main difference is that for an LACP/LAG-based configuration, you must first convert from VDS to VSS before disconnecting the ESXi host from one vCenter Server and moving it to the other. You cannot simply disconnect and "swing" the ESXi host as you could with a non-LACP/LAG configuration, or you will run into issues. Once you have re-added the ESXi host to the new vCenter Server, you simply reverse the procedure, going from VSS back to VDS and re-creating your LACP/LAG configuration.
Step 1 - Here is an example starting point: an ESXi host with two pNICs (vmnic0 and vmnic1) connected in an LACP bundle, which is in turn associated with a physical switch.
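For anyone who would rather script the workflow than click through the vSphere Client, the snippets in the following steps use pyVmomi. Here is a minimal setup sketch that connects to vCenter and looks up the host; the vCenter endpoint, hostname, and credentials are placeholders, and the resulting `host` object is what the later snippets assume:

```python
import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim

# Placeholder vCenter endpoint and credentials -- substitute your own
ctx = ssl._create_unverified_context()  # lab only; validate certs in production
si = SmartConnect(host='vcsa01.example.com',
                  user='administrator@vsphere.local',
                  pwd='VMware1!',
                  sslContext=ctx)

# Locate the ESXi host we want to migrate (placeholder FQDN)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
host = next(h for h in view.view if h.name == 'esxi01.example.com')
view.Destroy()
```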
Step 2 - Move vmnic1 out of the LACP/LAG configuration and then create a new VSS to associate it with. To allow existing connections to drain gracefully, first place the pNIC into standby rather than removing it outright, which would terminate all existing flows.
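A rough pyVmomi sketch of this step, assuming vmnic1 has already been released from the LAG on both the VDS and the physical-switch side, and using a placeholder vSwitch name of vSwitchTmp:

```python
from pyVmomi import vim

ns = host.configManager.networkSystem

# Create a temporary Standard vSwitch backed by vmnic1, which was
# previously pulled out of the LACP/LAG bundle
vss_spec = vim.host.VirtualSwitch.Specification(
    numPorts=128,
    bridge=vim.host.VirtualSwitch.BondBridge(nicDevice=['vmnic1']))
ns.AddVirtualSwitch(vswitchName='vSwitchTmp', spec=vss_spec)
```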
Step 3 - Now that we have a pNIC on both the VDS and the VSS, we can migrate all VMkernel and VM networking interfaces from the VDS over to the VSS.
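As a sketch, moving a VMkernel interface such as vmk0 over to the temporary VSS might look like the following; the portgroup name and VLAN ID are placeholders. VM networking can be moved in the same spirit by editing each VM's network adapter backing (for example via ReconfigVM_Task):

```python
from pyVmomi import vim

ns = host.configManager.networkSystem

# Create a landing portgroup on the temporary VSS (placeholder name/VLAN)
pg_spec = vim.host.PortGroup.Specification(
    name='MgmtTmp',
    vswitchName='vSwitchTmp',
    vlanId=0,
    policy=vim.host.NetworkPolicy())
ns.AddPortGroup(portgrp=pg_spec)

# Re-point the management VMkernel interface from its VDS port
# to the new VSS portgroup
ns.UpdateVirtualNic(
    device='vmk0',
    nic=vim.host.VirtualNic.Specification(portgroup='MgmtTmp'))
```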
Step 4 - Once you have completed all VMkernel and VM networking migrations, you can remove vmnic0 from the VDS and associate it with the VSS.
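Once vmnic0 has been released from the VDS (via the vSphere Client or HostNetworkSystem.UpdateNetworkConfig), attaching it to the temporary VSS might look like this sketch:

```python
ns = host.configManager.networkSystem

# Find the temporary VSS and add vmnic0 as a second uplink
for vsw in ns.networkInfo.vswitch:
    if vsw.name == 'vSwitchTmp':
        spec = vsw.spec
        spec.bridge.nicDevice = ['vmnic0', 'vmnic1']
        spec.policy.nicTeaming.nicOrder.activeNic = ['vmnic0', 'vmnic1']
        ns.UpdateVirtualSwitch(vswitchName='vSwitchTmp', spec=spec)
```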
Step 5 - At this point, you can safely disconnect the ESXi host from the current vCenter Server and add it to the new vCenter Server.
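In pyVmomi terms, the disconnect/re-add could be sketched as below; note that this spans two separate vCenter sessions, and the hostname and credentials are placeholders:

```python
from pyVmomi import vim

# Against the SOURCE vCenter session: disconnect the host and remove it
# from inventory (wait for each task to finish before continuing; for a
# standalone host, destroy its parent ComputeResource instead)
host.DisconnectHost_Task()
host.Destroy_Task()

# Against the DESTINATION vCenter session: 'cluster' is the target
# vim.ClusterComputeResource looked up in that inventory
connect_spec = vim.host.ConnectSpec(
    hostName='esxi01.example.com',  # placeholder FQDN
    userName='root',
    password='VMware1!',            # placeholder credentials
    force=True)                     # sslThumbprint may also be required
cluster.AddHost_Task(spec=connect_spec, asConnected=True)
```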
Step 6 - Finally, we simply perform Steps 2-4 in reverse, going from VSS back to VDS and re-creating our LACP bundle on the new vCenter Server.
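Re-creating the LAG on the new vCenter's VDS can also be scripted; here is a sketch, assuming `dvs` is a vim.dvs.VmwareDistributedVirtualSwitch in Enhanced LACP mode, with the LAG name and uplink count as placeholders:

```python
from pyVmomi import vim

# Define a new LAG to match the physical switch configuration
lag = vim.dvs.VmwareDistributedVirtualSwitch.LacpGroupConfig(
    name='lag1',        # placeholder LAG name
    mode='active',      # or 'passive', to match the physical side
    uplinkNum=2)

spec = vim.dvs.VmwareDistributedVirtualSwitch.LacpGroupSpec(
    operation='add',
    lacpGroupConfig=lag)
dvs.UpdateDVSLacpGroupConfig_Task(lacpGroupSpec=[spec])
# Afterwards, assign vmnic0/vmnic1 to the LAG's uplink ports and make the
# LAG active in the teaming policy of the relevant distributed portgroups
```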
Scott Elliott says
We ran into this a while back, and the way we handled it was to export the VDS to a file and restore it to the new vCenter. Next, we moved one host over to the new vCenter (without VMs) and configured it on the VDS. Once that was complete, we took a host from the old vCenter, added it to the new vCenter, and used vMotion to move its VMs to the first host we added. Then we configured the new host on the VDS and repeated for the remainder of the hosts. Worked like a champ, but it isn't a supported method. I can say we moved over 500 VMs this way, and although tedious, it worked.
William Lam says
Thanks for sharing your story Scott! Glad to hear you got it working
Michael says
Thanks for sharing. Actually, Step 4 is unnecessary, since you will remove the vmnic from the VSS in step 7 anyway.
Chandan says
But if the pNIC isn't removed from the VDS as mentioned in Step 4, then the pNIC might still be associated with a VDS portgroup in the source vCenter and may not be able to be added to the target vCenter's VDS?
Michael Rottlander says
In Step 2, before adding the pNIC to the VSS, you'll have to reconfigure the physical switch to remove the port from LACP. Otherwise the traffic will be blocked.
Before re-adding the NIC to the VDS in Step 6, you'll have to reconfigure the switch again.
For the move, you'll have to work closely with the networking team, as timing is critical.
Thomas Staeck says
Hello William,
we also had to move several clusters from one vCenter to another vCenter. In principle the procedure works without a problem, but there are several issues to consider.
- Templates are "special". In our scenario the cluster used a datastore cluster, and during tests we found that we lost the templates on the datastore cluster from the inventory when we moved the ESXi hosts to the new vCenter. I am not sure whether this also occurs on normal datastores. We had to convert them to VMs before the migration and back to templates once the hosts were registered with the new vCenter.
- Templates are really "special". In our scenario the templates were connected to portgroups on the distributed switch. You must also migrate the templates to the "migration switch", otherwise you cannot remove the ESXi host from the distributed switch. Interestingly, a template remains connected to the port it was connected to as a VM before it got converted to a template. This means that even if the template is no longer on the ESXi host, it still blocks removing the ESXi host from the distributed switch.
- If you have per-VM overrides for the compute and datastore clusters, they get lost during the migration. RVTools is a great tool for documenting them and provides the data for mass changes afterwards.
- Another point to consider is the network interruption when moving from VDS to VSS and back. During our PoC we saw network timeouts of 0-6 seconds. RDP sessions reconnected successfully, and luckily it was also not a problem for the applications inside the RDP sessions.
All in all, vSphere proved again that in principle (mostly) every (mis)configuration can be changed without an outage.
Best Regards
Thomas
Jignesh says
Hi William,
Thanks for the info. I would like to know: in Step 2, before moving the pNIC to the VSS, do I need to reconfigure the physical switch to remove that port from LACP? And then do the reverse when re-creating the LACP on the new vCenter?
Regards
Jignesh
Sachin says
Hi Jignesh,
Have you gotten an answer? I am also in the same situation.
Regards,
Sachin
Jignesh says
Hi Sachin
No, I haven't gotten any response.
Regards
Jignesh
Chandan says
I think we may have to break the LAG before we move the pNIC out of the LAG bundle. I am in the same situation as well.
manu says
Hello,
We have tested live migrating hosts and clusters with LAG/LACP between vCenters. We did a 6.0 to 6.7 U2 vCenter migration with LAG and only saw a couple of lost pings at most. You need to export the DVS and import the same DVS into the new vCenter, change the LACP config to LACP fallback on the core switch side, and change the LACP timeout. That way, even if the switch doesn't receive LACP PDUs, it will still have connectivity.
Once you connect the hosts to the new cluster, move all hosts to the new DVS with the same config.
Thanks
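For reference, the final "move the hosts to the new DVS" step that manu describes might be sketched in pyVmomi as follows, where `dvs` is the imported switch and `host` a freshly added host (both placeholders):

```python
from pyVmomi import vim

# Add the host to the imported VDS, initially backing it with one pNIC
pnic = vim.dvs.HostMember.PnicSpec(pnicDevice='vmnic0')
member = vim.dvs.HostMember.ConfigSpec(
    operation='add',
    host=host,
    backing=vim.dvs.HostMember.PnicBacking(pnicSpec=[pnic]))

cfg = vim.DVSConfigSpec(
    configVersion=dvs.config.configVersion,
    host=[member])
dvs.ReconfigureDvs_Task(spec=cfg)
```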