WilliamLam.com

NVIDIA GPU with Dynamic DirectPath IO (Passthrough) to Tanzu Kubernetes Grid (TKG) Cluster using vSphere with Tanzu

10.17.2023 by William Lam // Leave a Comment

When provisioning a Tanzu Kubernetes Grid Cluster (TKC) using vSphere with Tanzu, you can easily request an NVIDIA GPU resource as part of the deployment, which can either be provided by NVIDIA vGPU or using PCIe passthrough with Dynamic DirectPath IO.

vGPU is great for those with a capable NVIDIA GPU, especially if the GPU will not be utilized 100% and you can share its resources amongst several VMs. However, if you do not have a GPU that supports vGPU, you can still provide your TKC workloads with a GPU resource using passthrough.


While playing with the Lenovo P3 Ultra, I unfortunately came to learn that an NVIDIA RTX A5500 Laptop GPU was NOT the same as an NVIDIA RTX A5500 🙁

Not ideal, but I guess NVIDIA did not want to add this additional device to their test matrix, and hence their ESXi graphics drivers would not detect the GPU as vGPU capable. I knew that I could still use the NVIDIA GPU via passthrough; I just needed to get the NVIDIA drivers installed onto the TKC worker nodes.

That was much easier said than done. All the documentation that I could find on both the VMware and NVIDIA websites had detailed instructions for vGPU configuration, but there was little to no documentation on how to use an NVIDIA GPU in passthrough mode with vSphere with Tanzu. I came across a number of different NVIDIA solutions when it comes to k8s, but it was not very clear which would be interoperable with vSphere with Tanzu, and I eventually figured it out with some help pointing me in the right direction.

It was actually super easy, once you knew the exact steps! 😅

Pre-Req:

  • A TKC that has already been provisioned with an NVIDIA GPU using Dynamic DirectPath IO

The NVIDIA GPU Operator is the easiest way to get the NVIDIA driver deployed for any k8s-based deployment where you will consume an NVIDIA GPU, including a TKC using vSphere with Tanzu. I initially tried the NVIDIA device plugin for Kubernetes, but that would require changes to the Ubuntu TKr images, which I was really hoping was not needed. Below are the three easy steps to get the required NVIDIA drivers running on the TKC worker node!

Step 1 - You will need Helm installed on your local system. After authenticating to your TKC cluster, run the following command to deploy the NVIDIA GPU Operator. In the example below, I am deploying to a k8s namespace called gpu-demo, which I had pre-created earlier.
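The gpu-operator chart is published in NVIDIA's NGC Helm repository, so if you have not added that repository to your local Helm setup yet, a minimal sketch would look like this (repository URL per NVIDIA's public GPU Operator documentation):

```shell
# Add NVIDIA's NGC Helm repository (skip if already added) and refresh the local chart index
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
```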

helm install --wait --generate-name --set operator.defaultRuntime=containerd --namespace gpu-demo nvidia/gpu-operator


Step 2 - Ensure that all pods from the NVIDIA GPU Operator are up and running by running the following command:

kubectl -n gpu-demo get pods
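Rather than re-running the command manually until everything is up, kubectl can block until the operator pods report Ready; a sketch, assuming the same gpu-demo namespace (the timeout value is arbitrary):

```shell
# Wait up to 5 minutes for all pods in the gpu-demo namespace to become Ready
kubectl -n gpu-demo wait --for=condition=Ready pods --all --timeout=300s
```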


Step 3 - Finally, we can confirm that the NVIDIA GPU resource is consumable within the TKC worker node by deploying a simple demo app that performs some CUDA operations.

Create a YAML file (gpu-demo.yaml) that contains the following:

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
  namespace: gpu-demo
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
      resources:
        limits:
          nvidia.com/gpu: 1

Then deploy the demo application by running the following:

kubectl -n gpu-demo apply -f gpu-demo.yaml

If everything was set up correctly, we should see successful CUDA output in the cuda-vectoradd container logs by running the following command:

kubectl -n gpu-demo logs cuda-vectoradd
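The vectoradd CUDA sample prints "Test PASSED" when the kernel executes successfully on the GPU, so a quick scripted check could look like this (assuming the pod has already run to completion):

```shell
# Exits 0 only if the CUDA sample reported success in its logs
kubectl -n gpu-demo logs cuda-vectoradd | grep "Test PASSED"
```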


Categories // Kubernetes, VMware Tanzu, vSphere 7.0, vSphere 8.0 Tags // GPU, NVIDIA, Passthrough, vSphere Kubernetes Service
