Advancements in ESXi Nested Virtualization have given us the ability to run ESXi inside of a VM (Nested ESXi) and to do just about anything you would do with a physical ESXi host for development, testing, and learning purposes. In fact, I have shared many tips and tricks for using Nested ESXi and Nested Virtualization over the years on my blog, which is worth bookmarking in case you are trying to do something and run into an issue that, more than likely, I have already come across.
Today, there is very little you cannot do using Nested ESXi, and the exceptions are typically limited to physical devices that cannot be virtualized and/or emulated in software.
I bring this up because I recently had a chat with Frank Denneman on an unrelated topic, and he asked whether it was possible to double passthrough a GPU: pass it from a physical ESXi host into a Nested ESXi VM, and then pass it through again to a VM running on that Nested ESXi system. This was not the first time I had heard of such a request, but it does not come up often; this was only the second time. For context, his use case was for testing purposes, and I can certainly see some interesting scenarios where you want to run vSphere in a Nested environment and still access all of the vSphere capabilities, including leveraging a physical GPU within that environment, whether that is for AI/ML or other graphics processing requirements.
My response to Frank was that this would not work for a few reasons, one of which is that Virtual Hardware-Assisted Virtualization (VHV) is not supported with DirectPath I/O. Furthermore, if the GPU is passed through to a VM, even one running ESXi, that VM would be in control of the GPU, so how could one pass it through again?
My curiosity got the better of me, and given this was only the second time I had ever been asked about this, I figured it might be worth exploring. Before going down any more 🐇🕳️, however, I wanted to get a quick sanity check from one of our graphics engineers on whether this ask was even remotely feasible.
To my complete shock, I learned that it is possible and that it has become more stable starting with vSphere 7.0 Update 3c and later! One key capability required from the vSphere platform is support for a Virtual Input/Output Memory Management Unit (vIOMMU), which was added in one of the vSphere 7.x releases.
Armed with this information, I wanted to see if this would work with recent Intel NUC platforms such as the Intel NUC 12 Enthusiast, which includes both an iGPU and a dGPU, as well as the Intel NUC 12 Extreme, which includes an iGPU and in which I have also installed an Intel Arc A750 (dGPU) that can likewise be used with ESXi.
My testing initially started out with the latest ESXi 8.0 Update 1 release, but I quickly ran into the following error when attempting to power on the Nested ESXi VM:
Module PCIPassthruLate power on failed. Failed to start the virtual machine. Failed to initialize Passthru world information.
So I decided to try my luck with the latest ESXi 7.0 Update 3 release, which is 7.0 Update 3g as of writing this blog post.
It worked immediately as you can see from the screenshot above! 🥳
I was baffled, and I shared my findings with Engineering. After a bit of debugging, it turns out I had found a bug that prevented this from working with ESXi 8.x, including the latest 8.0 Update 1 release. While a fix has already been submitted by Engineering, which I have confirmed by testing an internal build of ESXi, it will not be publicly available until a future ESXi 8.x patch or update. In the meantime, this works for anyone running ESXi 7.x, and the detailed instructions are listed below.
Note: I was only successful in using the Intel Arc A770M dGPU, which is included in the Intel NUC 12 Enthusiast. Using the external Intel Arc A750 dGPU in the Intel NUC 12 Extreme, I was able to pass through the GPU, but the inner VM was not responding. Further testing may be needed once the ESXi 8.x fix is available.
With the dGPU functional, I also attempted to use the iGPU found in both the Intel NUC 12 Enthusiast and Extreme; however, I ran into the following error when attempting to power on the Nested ESXi VM using these iGPU devices:
Can't use virtual IOMMU with devices with RMRRs sbdf=0000:00:02.0
Engineering confirmed that GPUs with Reserved Memory Region Reporting (RMRR) structures, which seem to be limited to iGPUs, will not work when passed through to a Nested ESXi VM.
Disclaimer: While I was successful in passing through these specific Intel dGPUs, the instructions below may or may not be applicable to other dGPUs, or additional configuration changes may be required to have similar success.
Step 1 - Install ESXi 7.0 Update 3g on your physical ESXi system. It is possible that earlier versions of ESXi 7.0 Update 3 may also work, but this is the version I successfully tested with.
Step 2 - Deploy my Nested ESXi 8.0 Update 1 Virtual Appliance OVA, which streamlines the creation of the Nested ESXi VM. You are more than welcome to create your own, but this will be the fastest way to get set up, not to mention that ESXi will already be installed in the Nested ESXi VM.
Step 3 - Enable passthrough for the dGPU found within the Intel NUC 12 Enthusiast, which should have the following VendorId:DeviceId (8086:5690).
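If you prefer to do this from the ESXi Shell rather than the UI, here is a rough sketch using esxcli (the esxcli hardware pci pcipassthru namespace was introduced with ESXi 7.0; the PCI address 0000:03:00.0 is only a placeholder, so substitute the actual address of your dGPU):

# Locate the dGPU by its Device ID (0x5690 for the Arc A770M); field formatting can vary slightly between releases
esxcli hardware pci list | grep -B 10 -i 5690

# Enable passthrough for that device and apply immediately
esxcli hardware pci pcipassthru set -d 0000:03:00.0 -e true -a

# Confirm the device is now marked as passthrough enabled
esxcli hardware pci pcipassthru list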
Step 4 - Edit the Nested ESXi VM and under the CPU settings:
- Uncheck Expose hardware assisted virtualization to the guest OS setting
- Enable the Virtual IOMMU setting
Then add a new PCI passthrough device and select the dGPU (you can use either DirectPath I/O or Dynamic DirectPath I/O; it makes no difference).
Note: PCI passthrough is not supported together with Virtual Hardware-Assisted Virtualization (VHV), so VHV must be disabled before we can add the dGPU. VHV is, however, required to power on the Nested ESXi VM, and the workaround is to enable it via a VM Advanced Setting in the next step.
Next, click on the Advanced Parameters tab and add the following four VM Advanced Settings:
- pciPassthru.use64bitMMIO = TRUE
- pciPassthru.64bitMMIOSizeGB = 16
- vhv.allowPassthru = TRUE
- vhv.enable = TRUE
Note: For AMD systems, you may also need to add amd.iommu.supportsPcip = "TRUE" or you may see the following message: The virtual machine cannot be powered on because IOMMU virtualization is not compatible with PCI passthru on AMD platforms.
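If you would rather script these settings than add them through the UI, here is a minimal sketch that appends them directly to the Nested ESXi VM's VMX file from the ESXi Shell (the datastore path and VM name are placeholders, and the VM should be powered off while you edit the file):

# Placeholder path - substitute your own datastore and Nested ESXi VM name
VMX=/vmfs/volumes/datastore1/Nested-ESXi/Nested-ESXi.vmx

# Append the required advanced settings (add the amd.iommu line only on AMD hosts)
cat >> "$VMX" << 'EOF'
pciPassthru.use64bitMMIO = "TRUE"
pciPassthru.64bitMMIOSizeGB = "16"
vhv.allowPassthru = "TRUE"
vhv.enable = "TRUE"
EOF

# If the VM is already registered, reload it (e.g. vim-cmd vmsvc/reload <vmid>) so hostd re-reads the edited VMX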
Step 5 - Power on the Nested ESXi VM and login to either the ESXi Embedded Host Client or add the ESXi VM to your existing vCenter Server inventory, which is what I have done in my setup.
Step 6 - Enable passthrough for the dGPU that has been passed through from your physical ESXi host to the Nested ESXi VM, which should have the same VendorId:DeviceId (8086:5690).
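As a quick sanity check before enabling passthrough on the nested host, you can confirm from the Nested ESXi VM's own ESXi Shell that the passed-through dGPU is actually visible (the exact lspci output format can vary between ESXi releases); once it shows up, enable passthrough on it the same way as in Step 3:

# From the Nested ESXi shell - the dGPU should appear as a display controller
lspci | grep -i -E 'display|vga'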
Step 7 - Create an Ubuntu (64-bit) VM with the desired compute and storage settings. I stuck with the defaults except for the memory, which I changed to 8GB (make sure to reserve all memory, as this is required for PCI passthrough).
Additionally, click on the Advanced Parameters tab and add the following two VM Advanced Settings:
- pciPassthru.use64bitMMIO = TRUE
- pciPassthru.64bitMMIOSizeGB = 16
Note: These additional VM Advanced Settings were needed to successfully power on the inner guest VM running on the Nested ESXi VM.
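For reference, here is roughly what the relevant lines in the inner Ubuntu VM's VMX end up looking like. This is an illustrative excerpt only, and to my understanding the "Reserve all guest memory (All locked)" checkbox maps to the sched.mem.min/sched.mem.pin entries shown below:

memSize = "8192"
sched.mem.min = "8192"
sched.mem.pin = "TRUE"
pciPassthru.use64bitMMIO = "TRUE"
pciPassthru.64bitMMIOSizeGB = "16"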
Step 8 - Download and install either Ubuntu 20.04 (Focal) or 22.04 (Jammy) and once the OS has been installed, use the following Intel documentation to install the required Intel Graphics Drivers:
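Once the drivers from the Intel documentation are installed, a few quick sanity checks from inside the Ubuntu guest help confirm everything is wired up (a minimal sketch; the vainfo and clinfo packages are optional verification helpers and not part of Intel's driver stack):

# Confirm the dGPU is visible to the guest and see which kernel driver has claimed it (typically i915)
lspci -nnk | grep -i -A 3 'vga\|display'

# Optional: quick media/compute checks after the driver install
sudo apt install -y vainfo clinfo
vainfo | head -20
clinfo | grep -i 'device name'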
With everything running correctly, you now have a GPU that has been passed through from your physical ESXi host to an ESXi VM (Nested ESXi), which in turn passes it through to an inner VM, as shown in the screenshot below! 😅
Thanks for the guide.
I just tried this on ESXi 8.0 update 1a and it appears the vhv.enable = TRUE setting is not sticking in the GUI.
So I added it to the vmx directly, and after powering on the VM, the setting then disappears from the VMX and is also not visible in the 'advanced' GUI.
Any ideas?
Did you reload the VM after making the change? As long as the setting is in the VMX, you don't need to worry about the UI, but if you attempt to make further changes in the UI, it'll ask that you remove that parameter. My suggestion is to make whatever changes you need and then, as a last step, add vhv.enable = TRUE to the VMX file, make sure you reload the VM, and then power on; it should work as I've explained.
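For reference, a rough sketch of the reload (and, if needed, an unregister/re-register) from the ESXi Shell; substitute the real Vmid and paths for the placeholders:

# Reload the VM so hostd re-reads the edited VMX
vim-cmd vmsvc/getallvms        # note the Vmid of the Nested ESXi VM
vim-cmd vmsvc/reload <vmid>

# Or, if a reload alone doesn't do it, unregister and re-register the VM
vim-cmd vmsvc/unregister <vmid>
vim-cmd solo/registervm /vmfs/volumes/<datastore>/<vm-folder>/<vm>.vmx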
I had to unregister the VM and re-register it for the changes to take effect. Thanks for the advice.
This enabled it for me as well on 8.0 Update 1a. Thanks for sharing.
Now I am getting the following error -
The virtual machine cannot be powered on because IOMMU virtualization is not compatible with PCI passthru on AMD platforms. To power on the VM either disable the virtual IOMMU or remove the PCI passthru device(s).
Seems that nested passthrough is not possible on AMD platforms?
Can you try adding:
amd.iommu.supportsPcip = "TRUE"
Thanks, that exposes the IOMMU grouping correctly. I am passing through an NVIDIA GT 730 to UNRAID but seem to have hit a massive roadblock here. Each time I power on the nested VM with the GPU passed through, UNRAID hangs and I can't even power off the VM in the ESXi console or with esxcli. Maybe this use case is too niche.
I'm not familiar with UNRAID, but a quick search online shows it's a media server? If so, what benefit are you looking for by running this inside of a Nested VM? Why not just run it as a traditional VM with normal GPU passthrough?
I added an NVIDIA Tesla P4 GPU to vCenter 7.0 and configured GPU sharing for multiple VMs, but it is not working. Is there any way to do this?