In today's data centers, it is not uncommon to find servers with only 2 x 10GbE network interfaces; this is especially true with the rise of Hyper-Converged Infrastructure over the last several years. For customers looking to deploy NSX-T with ESXi, there is an important physical network constraint to be aware of, which is only briefly mentioned in the NSX-T documentation here.
For example, your hypervisor host has two physical links that are up: vmnic0 and vmnic1. Suppose vmnic0 is used for management and storage networks, while vmnic1 is unused. This would mean that vmnic1 can be used as an NSX-T uplink, but vmnic0 cannot. To do link teaming, you must have two unused physical links available, such as vmnic1 and vmnic2.
As shown in the diagram below, an ESXi host with only two physical NICs cannot provide complete network redundancy, because each pNIC can only be associated with a single switch (VSS/VDS or the new N-VDS) and pNICs cannot be shared across switches.
For customers, this means you need to allocate a minimum of 4 pNICs to provide redundancy for both overlay traffic and non-overlay VMkernel traffic such as Management, vMotion, VSAN, etc. This is much easier said than done, as not all hardware platforms can easily be expanded, and even if they can, there is still a significant cost in expanding the physical network footprint (switch ports, cabling, etc.).
UPDATE (06/12/18) - As of NSX-T 2.2, which was recently released, there is now a UI in NSX-T Manager for managing the migration of VMkernel interfaces to the N-VDS. For automation purposes, you may still find this article useful, but you now have the option of using the UI.
I recently learned about an NSX-T deployment where the customer had ESXi hosts with only 2 pNICs, yet they had full network redundancy for both traffic types, all running on NSX-T's N-VDS. It turns out that as of NSX-T 2.0, you can migrate your existing VMkernel interfaces from an existing VSS/VDS over to the N-VDS. This workflow should sound familiar, as it is similar to the process of migrating from a traditional VSS to a VDS. This is actually called out in the NSX-T Reference Design (Appendix 1), which looks to have been put together by my good NSBU buddy, Nimesh Desai. One thing to be aware of, which the whitepaper calls out, is that migration today is ONLY available using the NSX-T REST API. I know the NSBU folks are working on a way to make this easier to consume, perhaps a nice UI in the future ...
Even with just the API, the details in the whitepaper were still a bit light. Since I already had an NSX-T environment deployed (courtesy of my Automated NSX-T Lab Deployment script), which is configured with 2 pNICs by default, I figured I would try out the workflow and hopefully expand on the "details" a bit more 😉 as I know this is something that has been coming up as customers start to design and deploy NSX-T into their Production environments.
Before attempting a migration, I should mention there are a few caveats to be aware of ...
- Depending on how you plan to deploy NSX-T into your environment, keep in mind that the NSX-T Edge can NOT reside on an NSX-T Logical Switch. This is important if you only have a single Cluster that will be used to house all of your management VMs, including the NSX-T components. You will need additional pNICs connected to either a VSS or VDS to provide networking to the NSX-T Edge. Another option: if you have a Management Cluster that is not host prepped with NSX-T, or whose hosts have additional pNICs connected to a VSS/VDS, then the NSX-T Edge can reside there. Lastly, you also have the ability to deploy a bare-metal NSX-T Edge, but this may not be ideal if you want to have consistent VM management capabilities. For this particular customer, they had a single "Consolidated" Cluster for both Management and Compute, so they ended up adding two additional standalone ESXi hosts which just ran their NSX-T Edges.
- Once you migrate a VMkernel interface onto an N-VDS, you will no longer be able to manage the interface like you would using the vSphere Web/Flex Client. This may also include any stats or troubleshooting tools that you might consume through the traditional VDS API. Some of this functionality is available in NSX-T itself, but be aware that you will not be able to change the configuration once it is outside of vCenter Server's control.
- The ESXi hosts that I migrated were also running VSAN and had a dedicated VMkernel interface for the VSAN traffic. I noticed after the first host migration that the VSAN cluster was complaining about connectivity to the rest of the VSAN hosts. This continued to persist even after migrating all ESXi hosts. To resolve this, I simply had to disable/re-enable VSAN on the Cluster and the networking warnings all went away. For greenfield deployments, this may not be an issue as VMs may not be running yet, but for existing brownfield deployments, you may want to take some extra caution to ensure VMs will not be affected as part of the migration. As far as I can tell, there is no impact nor any loss of network connectivity for the VMkernel traffic.
Here is the high level workflow:
- Host prep the ESXi hosts like you normally would (use an Uplink Profile that contains 2 pNICs but only assign one of the pNICs to the N-VDS). Also ensure that the Transport Node is connected to BOTH the Overlay and VLAN Transport Zones for proper connectivity when migrating VMkernel interfaces
- Migrate each VMkernel interface from VSS/VDS to N-VDS for each ESXi host using the NSX-T API
- Detach the last pNIC that is still connected to the VSS/VDS, using either vCenter Server or ESXi
- Attach the last pNIC to N-VDS using the NSX-T API
Below are the instructions that I used to migrate the 3 VMkernel interfaces (Management, VSAN and vMotion) for each of my ESXi hosts connected to a VDS, which I will then move to an NSX-T N-VDS with both pNICs. I will assume you have already deployed NSX-T and have either host prepped your ESXi hosts or will go through that process (I will not be walking you through that step; you can refer to the official NSX-T documentation if you want step-by-step guidance).
Step 1 - If you already have both pNICs on your ESXi hosts attached to either a VSS or VDS, be sure to detach one of the pNICs from that switch so it can then be used by the N-VDS. In our example, we will detach vmnic0.
Step 2 - Configure (host prep) your ESXi hosts as Transport Nodes and ensure you are using either the default Uplink Profile, which contains 2 pNICs by default, or a new Uplink Profile that also uses 2 pNICs. You can either host prep each ESXi host by hand, or you can add vCenter Server as a Compute Manager and then configure NSX-T to automatically host prep all ESXi hosts within a vSphere Cluster and automatically create the Transport Nodes, which is what I recommend. As you can see below, when mapping a pNIC from the ESXi host to the logical "Uplink" interface 1, we are selecting vmnic0, which is currently not in use per Step 1.
Step 3 - Next, we need to update the Transport Node configuration (which should have been automatically created for you if you selected that option) to ensure it is also connected to the VLAN Transport Zone, so that when we migrate the VMkernel interfaces, network connectivity will still function.
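If you would rather script this step than click through the UI, here is a minimal PowerShell 7 sketch of what the Transport Node update could look like against the NSX-T REST API. The manager hostname and Transport Node UUID are my lab values, the VLAN Transport Zone UUID is a placeholder, and the exact body structure (e.g. the transport_zone_endpoints list) should be verified against a GET of your own Transport Node.

```powershell
# Sketch only: add the VLAN Transport Zone to an existing Transport Node via the NSX-T REST API.
# Requires PowerShell 6+/7 for -Authentication Basic and -SkipCertificateCheck.
$nsxManager = "nsxt-mgr.primp-industries.com"
$cred      = Get-Credential -Message "NSX-T Manager admin credentials"
$tnId      = "dd923989-f17d-4b97-b115-87b37b788305"   # esxi-01 Transport Node UUID (from my lab)
$vlanTzId  = "<UUID-of-your-VLAN-Transport-Zone>"     # placeholder - look this up in your environment

# Retrieve the current Transport Node configuration, which includes the _revision field
$tn = Invoke-RestMethod -Uri "https://$nsxManager/api/v1/transport-nodes/$tnId" -Method Get `
    -Authentication Basic -Credential $cred -SkipCertificateCheck

# Append the VLAN Transport Zone to the existing transport_zone_endpoints list
# (assumes the host is already attached to at least one Transport Zone, e.g. the Overlay TZ)
$tn.transport_zone_endpoints += @{ transport_zone_id = $vlanTzId }

# PUT the full, updated body back to the same URL
Invoke-RestMethod -Uri "https://$nsxManager/api/v1/transport-nodes/$tnId" -Method Put `
    -Authentication Basic -Credential $cred -SkipCertificateCheck `
    -ContentType "application/json" -Body ($tn | ConvertTo-Json -Depth 10)
```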
At this point, we now have vmnic0 connected to our N-VDS and vmnic1 still connected to our VDS. We can confirm this by selecting a specific ESXi host and looking under Physical Adapters as shown in the screenshot below.
Step 4 - We now need to create our Logical Switches, similar to the Distributed Portgroups on the VDS, that map to the VLANs used for our various VMkernel interfaces. In my example below, I have the following three (if you prefer to create them via the API, see the sketch after the list):
- Management-LS: VLAN 333
- VSAN-LS: VLAN 3252
- vMotion-LS: VLAN 3253
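For those who prefer automation, here is a rough PowerShell sketch of creating the same three VLAN-backed Logical Switches with the NSX-T logical switch API (POST /api/v1/logical-switches). The VLAN Transport Zone UUID is a placeholder, and the field names should be double-checked against the API guide for your NSX-T version.

```powershell
# Sketch only: create VLAN-backed Logical Switches for the VMkernel networks (PowerShell 6+/7)
$nsxManager = "nsxt-mgr.primp-industries.com"
$cred       = Get-Credential -Message "NSX-T Manager admin credentials"
$vlanTzId   = "<UUID-of-your-VLAN-Transport-Zone>"   # placeholder

# Logical Switch name to VLAN ID mapping from this example
$switches = @(
    @{ display_name = "Management-LS"; vlan = 333  },
    @{ display_name = "VSAN-LS";       vlan = 3252 },
    @{ display_name = "vMotion-LS";    vlan = 3253 }
)

foreach ($ls in $switches) {
    $body = @{
        display_name      = $ls.display_name
        transport_zone_id = $vlanTzId
        vlan              = $ls.vlan
        admin_state       = "UP"
    } | ConvertTo-Json

    Invoke-RestMethod -Uri "https://$nsxManager/api/v1/logical-switches" -Method Post `
        -Authentication Basic -Credential $cred -SkipCertificateCheck `
        -ContentType "application/json" -Body $body
}
```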
Step 5 - Lastly, we need to document the unique identifiers (UUIDs) for each of our ESXi Transport Nodes as well as the Logical Switches that we just created, which will be needed when using the NSX-T API to migrate the VMkernel interfaces. You can easily obtain these using the NSX-T UI (or via the API, as shown in the sketch after the list):
- For Transport Node IDs: Navigate to Fabric->Nodes->Transport Nodes and click on the ID field for each ESXi Transport Node and record the UUID value
- For Logical Switch IDs: Navigate to Switching->Switches and click on the ID field for each Logical Switch and record the UUID value
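You can also pull these IDs straight from the API rather than clicking through the UI. Here is a small PowerShell sketch (same placeholder credentials and lab hostname as before) that lists the Transport Nodes and Logical Switches along with their UUIDs.

```powershell
# Sketch only: list Transport Node and Logical Switch UUIDs via the NSX-T REST API (PowerShell 6+/7)
$nsxManager = "nsxt-mgr.primp-industries.com"
$cred = Get-Credential -Message "NSX-T Manager admin credentials"
$common = @{ Authentication = "Basic"; Credential = $cred; SkipCertificateCheck = $true }

# ESXi Transport Nodes and their UUIDs
(Invoke-RestMethod -Uri "https://$nsxManager/api/v1/transport-nodes" -Method Get @common).results |
    Select-Object display_name, id

# Logical Switches (including the VLAN-backed ones created in the previous step) and their UUIDs
(Invoke-RestMethod -Uri "https://$nsxManager/api/v1/logical-switches" -Method Get @common).results |
    Select-Object display_name, id
```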
For my lab environment, the IDs are as follows:
ESXi Transport Nodes
ESXi Transport Node | UUID |
---|---|
esxi-01 | dd923989-f17d-4b97-b115-87b37b788305 |
esxi-02 | ad1e005d-acfd-4db1-954f-a3d591717a56 |
esxi-03 | a5bf8710-0b1b-46ad-95bb-f4d1eaec979e |
Logical Switches
Logical Switch | UUID |
---|---|
Management-LS | 3f17da92-63d4-4854-9e2b-49c345a62481 |
VSAN-LS | 64ade5e9-528d-44c7-bce7-6810afdf08c6 |
vMotion-LS | 28593dde-841b-4f40-b84c-e2bba9bd446b |
At this point, we no longer require the NSX-T UI. We are now ready to use the NSX-T REST API to perform the VMkernel migrations.
Step 6 - To interact with the NSX-T API, we will be using a simple REST client called Postman, which you can download here.
Step 7 - Install and open up Postman. The first thing we will do is specify the operation to be a "GET", since we need to retrieve some information before we can migrate our first VMkernel interface. In my example, this is esxi-01, so the URL will be the following: https://nsxt-mgr.primp-industries.com/api/v1/transport-nodes/[ID-OF-FIRST-ESXi-HOST]
Next, we need to authorize ourselves to the NSX-T API, so we will select the "Basic Auth" type and use the admin username and password for the NSX-T Manager.
Step 8 - After that, we need to specify the Content-Type as application/json and then we can hit the "Send" button, which will perform a GET (read-only) operation to retrieve information about our first ESXi Transport Node as shown in the screenshot below. Go ahead and copy the body response (shown highlighted in blue below), which will be used in the next step.
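If you would rather do this from a script than from Postman, the same GET looks something like the following in PowerShell. The Transport Node UUID is esxi-01 from the table above; the credentials and the PowerShell 7 requirement are my own assumptions.

```powershell
# Sketch only: retrieve the current Transport Node configuration; this body is reused for the PUT
$nsxManager = "nsxt-mgr.primp-industries.com"
$cred = Get-Credential -Message "NSX-T Manager admin credentials"
$tnId = "dd923989-f17d-4b97-b115-87b37b788305"   # esxi-01 Transport Node UUID

$tnSpec = Invoke-RestMethod -Uri "https://$nsxManager/api/v1/transport-nodes/$tnId" -Method Get `
    -Authentication Basic -Credential $cred -SkipCertificateCheck

# Inspect the body (note the _revision field) - this is what we will send back with the PUT
$tnSpec | ConvertTo-Json -Depth 10
```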
Step 9 - Now change the operation from "GET" to "PUT", which will allow us to modify our first ESXi Transport Node. After that, click on the "Body" option, select "raw" and paste in the contents from the previous step. Do not hit Send yet.
Step 10 - Finally, we need to append the migration options to the URL, which specify the ID of the VMkernel interface we wish to migrate (e.g. vmk0, vmk1, etc.) and the UUID of the Logical Switch that we will move the VMkernel interface to (obtained in Step 5).
In my example below, the URL will look like the following: https://nsxt-mgr.primp-industries.com/api/v1/transport-nodes/dd923989-f17d-4b97-b115-87b37b788305?if_id=vmk0&esx_mgmt_if_migration_dest=3f17da92-63d4-4854-9e2b-49c345a62481
The highlighted portion in orange above is what you will append to your specific URL. The if_id and esx_mgmt_if_migration_dest options are documented in the NSX-T API here if you want further details. This API works bi-directionally and can also be used to migrate back from an N-VDS to a VSS (VDS is currently not supported).
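The equivalent PUT from PowerShell might look like the sketch below, which simply re-fetches the Transport Node body and sends it back with the two migration query parameters appended. The UUIDs are my lab values; everything else is a placeholder to adapt.

```powershell
# Sketch only: migrate vmk0 to the Management-LS Logical Switch via the migration query parameters
$nsxManager = "nsxt-mgr.primp-industries.com"
$cred = Get-Credential -Message "NSX-T Manager admin credentials"
$tnId = "dd923989-f17d-4b97-b115-87b37b788305"   # esxi-01 Transport Node UUID
$lsId = "3f17da92-63d4-4854-9e2b-49c345a62481"   # Management-LS Logical Switch UUID

# Fetch the current spec so the body (and its _revision) is up to date
$tnSpec = Invoke-RestMethod -Uri "https://$nsxManager/api/v1/transport-nodes/$tnId" -Method Get `
    -Authentication Basic -Credential $cred -SkipCertificateCheck

# PUT it back with if_id and esx_mgmt_if_migration_dest appended to perform the migration
$uri = "https://$nsxManager/api/v1/transport-nodes/$tnId" + "?if_id=vmk0&esx_mgmt_if_migration_dest=$lsId"
Invoke-RestMethod -Uri $uri -Method Put -Authentication Basic -Credential $cred -SkipCertificateCheck `
    -ContentType "application/json" -Body ($tnSpec | ConvertTo-Json -Depth 10)
```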
Note: To ensure migrations are successful and that you can verify connectivity before proceeding to the Management VMkernel interface, it is recommended that you first test with a non-critical VMkernel interface, such as the vMotion network. Once migrated, you can perform basic checks like being able to vmkping the interface to and from that ESXi host. Once you have validated that the process works, you can migrate the remaining interfaces, including vmk0.
Once you are ready to perform the migration, go ahead and click on the "Send" button to perform the operation.
Step 11 - To confirm the migration was successful, you should still have connectivity to your ESXi host (assuming you started with vmk0, which is generally the default Management VMkernel), and you can also head over to your vSphere Web/Flex Client and see that our vmk0 interface is now attached to the N-VDS.
Step 12 - Once you are ready to migrate the remaining VMkernel interfaces, the process is similar except that you need to update the "_revision" number and increment it by one before you click on the Send button. In our first migration, we performed a "GET", so the current revision number was 1 and we could submit the migration without any revision number updates. If you wish to migrate another VMkernel, you need to change the revision to 2, and for the third VMkernel the revision would be 3. If you are unsure, you can always simply perform a fresh "GET" on the current Transport Node and then send the PUT request without touching the revision ID, which is exactly what the sketch after the reference URLs below does.
For reference, here are the migration URLs for my vmk1 and vmk2 respectively:
- vmk1 URL: https://nsxt-mgr.primp-industries.com/api/v1/transport-nodes/dd923989-f17d-4b97-b115-87b37b788305?if_id=vmk1&esx_mgmt_if_migration_dest=64ade5e9-528d-44c7-bce7-6810afdf08c6
- vmk2 URL: https://nsxt-mgr.primp-industries.com/api/v1/transport-nodes/dd923989-f17d-4b97-b115-87b37b788305?if_id=vmk2&esx_mgmt_if_migration_dest=28593dde-841b-4f40-b84c-e2bba9bd446b
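Putting the revision tip and the two URLs above together, a small loop like the one below does a fresh GET before each PUT, so the _revision field in the body is always current and never needs to be edited by hand. This is a sketch with my lab UUIDs, not a polished script.

```powershell
# Sketch only: migrate the remaining VMkernel interfaces, re-reading the Transport Node before each PUT
$nsxManager = "nsxt-mgr.primp-industries.com"
$cred = Get-Credential -Message "NSX-T Manager admin credentials"
$tnId = "dd923989-f17d-4b97-b115-87b37b788305"   # esxi-01 Transport Node UUID

# Remaining VMkernel interface -> destination Logical Switch UUID
$migrations = [ordered]@{
    "vmk1" = "64ade5e9-528d-44c7-bce7-6810afdf08c6"   # VSAN-LS
    "vmk2" = "28593dde-841b-4f40-b84c-e2bba9bd446b"   # vMotion-LS
}

foreach ($vmk in $migrations.Keys) {
    # Fresh GET so the _revision in the body matches the server
    $tnSpec = Invoke-RestMethod -Uri "https://$nsxManager/api/v1/transport-nodes/$tnId" -Method Get `
        -Authentication Basic -Credential $cred -SkipCertificateCheck

    $uri = "https://$nsxManager/api/v1/transport-nodes/$tnId" + "?if_id=$vmk&esx_mgmt_if_migration_dest=" + $migrations[$vmk]
    Invoke-RestMethod -Uri $uri -Method Put -Authentication Basic -Credential $cred -SkipCertificateCheck `
        -ContentType "application/json" -Body ($tnSpec | ConvertTo-Json -Depth 10)
}
```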
If everything was successful, we can refresh our vSphere Web/Flex Client and see that all VMkernel interfaces have been migrated off of our VDS and onto the N-VDS as shown in the screenshot below.
Step 13 - The last step before moving on to the next ESXi host is to attach the remaining pNIC from the VDS onto our N-VDS. You will need to detach the pNIC using the vSphere Web/Flex Client before attempting the next operation or the pNIC will not move.
We will use the exact same API and adjust the body to now attach the pNIC. We can delete the ?if_id=..... which we had appended earlier to the URL; you simply need to reference the specific ESXi Transport Node UUID as shown in the screenshot below. Next, we need to scroll down to the "pnics" section of the JSON body and append the following:
, { "device_name": "vmnic1", "uplink_name": "uplink-2" }
Note the comma and also the specific vmnic and uplink names. If you followed the steps above, then the remaining pNIC to attach to the N-VDS is vmnic1, which will map to uplink-2 in our Uplink Profile. If you used other naming conventions, you just need to make sure you update the JSON with the respective names, and then you can click Send to submit the request.
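In script form, appending the second pNIC might look like the rough sketch below. Note that the exact location of the pnics list in the Transport Node body can differ between NSX-T releases (for example host_switches versus host_switch_spec.host_switches), so compare it against your own GET output before running anything like this.

```powershell
# Sketch only: attach the remaining pNIC (vmnic1) to the N-VDS as uplink-2
$nsxManager = "nsxt-mgr.primp-industries.com"
$cred = Get-Credential -Message "NSX-T Manager admin credentials"
$tnId = "dd923989-f17d-4b97-b115-87b37b788305"   # esxi-01 Transport Node UUID

$tnSpec = Invoke-RestMethod -Uri "https://$nsxManager/api/v1/transport-nodes/$tnId" -Method Get `
    -Authentication Basic -Credential $cred -SkipCertificateCheck

# Append vmnic1 / uplink-2 to the existing pnics list on the first (and only) N-VDS
# (path assumes the host_switches layout; adjust if your GET output nests it differently)
$tnSpec.host_switches[0].pnics += @{ device_name = "vmnic1"; uplink_name = "uplink-2" }

Invoke-RestMethod -Uri "https://$nsxManager/api/v1/transport-nodes/$tnId" -Method Put `
    -Authentication Basic -Credential $cred -SkipCertificateCheck `
    -ContentType "application/json" -Body ($tnSpec | ConvertTo-Json -Depth 10)
```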
Step 14 - You can confirm the operation was successful by looking at the Physical Adapters for the ESXi host, where you will now see both pNICs attached to our N-VDS.
Although the workflow can be a little daunting at first, once you understand the process, it is fairly straightforward. I definitely recommend you try this out in a development/test lab before attempting it in Production. In fact, a great way to get familiar with this is to try out my Automated NSX-T Lab Deployment script, which, funny enough, is what I used to run through this workflow myself. Since the VMkernel migration simply uses the public NSX-T API, you can easily automate this workflow using any language that supports REST, which also includes the NSX-T PowerCLI cmdlets. As you can imagine, you can use a bit of PowerCLI and the NSX-T cmdlets to create a simple function that takes in a single ESXi host and migrates all the VMkernel interfaces to an N-VDS without even breaking a sweat 🙂 I will leave a polished version as an exercise for the reader, but a rough sketch of the idea follows below.
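To give you a head start on that exercise, here is a rough, untested sketch of what such a per-host function could look like using plain Invoke-RestMethod calls (PowerShell 7 assumed). The function name, parameters and the vmk-to-Logical-Switch mapping are all my own invention, so treat it purely as a starting point.

```powershell
# Sketch only: migrate all VMkernel interfaces of one ESXi Transport Node to the N-VDS,
# then attach the remaining pNIC. Adapt the mappings and add error handling before real use.
function Move-VmkToNVDS {
    param(
        [Parameter(Mandatory)][string]$NsxManager,
        [Parameter(Mandatory)][pscredential]$Credential,
        [Parameter(Mandatory)][string]$TransportNodeId,
        # Hashtable mapping VMkernel interface name -> destination Logical Switch UUID
        [Parameter(Mandatory)][hashtable]$VmkToLogicalSwitch,
        [string]$SecondPnic = "vmnic1",
        [string]$SecondUplink = "uplink-2"
    )

    $base = "https://$NsxManager/api/v1/transport-nodes/$TransportNodeId"
    $auth = @{ Authentication = "Basic"; Credential = $Credential; SkipCertificateCheck = $true }

    foreach ($vmk in $VmkToLogicalSwitch.Keys) {
        # Fresh GET before every PUT so the _revision field is always current
        $spec = Invoke-RestMethod -Uri $base -Method Get @auth
        $uri  = $base + "?if_id=$vmk&esx_mgmt_if_migration_dest=" + $VmkToLogicalSwitch[$vmk]
        Invoke-RestMethod -Uri $uri -Method Put @auth -ContentType "application/json" `
            -Body ($spec | ConvertTo-Json -Depth 10)
        Write-Host "Migrated $vmk on Transport Node $TransportNodeId"
    }

    # Finally, attach the remaining pNIC to the N-VDS (detach it from the VSS/VDS first!)
    $spec = Invoke-RestMethod -Uri $base -Method Get @auth
    $spec.host_switches[0].pnics += @{ device_name = $SecondPnic; uplink_name = $SecondUplink }
    Invoke-RestMethod -Uri $base -Method Put @auth -ContentType "application/json" `
        -Body ($spec | ConvertTo-Json -Depth 10)
    Write-Host "Attached $SecondPnic as $SecondUplink on Transport Node $TransportNodeId"
}
```

You would then call it once per host, passing the Transport Node UUID and a hashtable such as @{ vmk0 = "<Management-LS UUID>"; vmk1 = "<VSAN-LS UUID>"; vmk2 = "<vMotion-LS UUID>" }.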
Dumlu Timuralp says
I think there is a typo in the following sentence: it should say vmnic1 is still connected to the vDS.
“At this point, we now have vmnic0 connected to our N-VDS and vminc0 still connected to our VDS.”
William Lam says
Thanks, just fixed
Antony Stefanov says
Hey William,
NSX-T 2.2 is already released and there is a UI wizard to migrate VMkernels and physical adapters to the N-VDS.
William Lam says
Yup, I'm aware 🙂 I've been meaning to update the article but guess this is a good reminder
JoeG says
This statement seems confusing to me: The NSX-T Edge can NOT reside on an NSX-T Logical Switch. The installation docs for 2.2 seem to imply it can.
Stu Charlton says
The NSX-T Edge can reside on a VLAN-backed NSX-T logical switch, without dedicated Edge hardware, and the docs reflect this now.
But the configuration is very tricky from what I've seen and tested:
a) you need to ensure that the TEP for the Edge overlay connectivity is on a separate VLAN and Subnet from the Host TEP (but still routable!)
b) you need another VLAN transport zone that the hosts all belong to, for the VLAN-backed logical switch,
c) the Edge VM NICs attach to these VLAN logical switches.
d) you need to ensure you VLAN tag the TEP at the logical switch level rather than the uplink profile (i.e. don't use a trunk... I found that GENEVE encapsulation will not be properly enabled from the ESXi node that hosts the Edge VM, to the Edge VM node itself ),
e) you need to ensure you use a trunk logical switch for the uplinks so that your VLAN-backed logical switch uplinks can do their tagging.
I tried drawing this here: https://s3.amazonaws.com/scharlton-piv/edge+and+host+overlay+vlan+mapping.pdf
I don't claim this is the best, only, or preferred path, just relaying what we had to do for a particular scenario where it seems to be under-documented.
Simon says
"Host Prep ESXi hosts like you normally would (use an Uplink Profile that contains 2 pNIC but only assign one of the pNICs to N-VDS). Also ensure, that the Uplink is connected to BOTH the Overlay and VLAN Transport Zone for proper connectivity when migrating VMkernel interfaces"
William, could you elaborate on it a little bit more? Based on my knowledge:
If we have 1 unused vmnic -> we can assign it only to one Transport Zone. If it will be the Overlay, then we cannot use GUI in 2.2 to migrate the VMkernels.
Ronald says
William,
Excellent write-up. I followed the steps but am not able to migrate the NSX Manager to the N-VDS. The NSX Manager loses network connection. All other VMs migrate without problems. Can you point me in the right direction?
Regards,
Ronald