Starting with Intel Meteor Lake (14th Generation) CPUs, now part of the new Intel Core Ultra Processor (Series 1) brand, an integrated Neural Processing Unit (NPU) is built right into the SoC (system-on-chip) and is optimized for low-power AI inferencing.
I found this article from Chips and Cheese about the new Intel Meteor Lake NPU to be super insightful; I definitely recommend a read if you are new to NPUs.
While you can already consume the Intel integrated graphics (iGPU) in platforms like the Intel NUC with ESXi for workload inferencing, I was curious whether this new Intel NPU could actually be used by ESXi? 🤔
I recently got access to an ASUS NUC 14 Pro (which I will do a detailed review on later) that includes the new Intel NPU. After successfully installing the latest release of VMware ESXi 8.0 Update 3, I saw that the Intel NPU accelerator is just a PCIe device, which we can enable for passthrough and hopefully consume within a VM.
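If you want to double-check the device from the ESXi Shell before editing the VM, the following should work (the "meteor" filter string is an assumption based on how the device description showed up in my environment and may differ on other builds):

lspci | grep -i "meteor"
esxcli hardware pci list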
For testing, I am using Ubuntu 22.04 and the Intel NPU Acceleration library to verify that I am able to access the NPU.
Step 1 - Create an Ubuntu 22.04 VM and configure a memory reservation (required for PCIe passthrough). You can then add the NPU device, which will show up as Meteor Lake NPU.
Note: You will need to disable Secure Boot (enabled by default), since we need to install a newer Linux kernel which is still in development. Edit the VM and navigate to VM Options->Boot Options to disable it.
Once Ubuntu is up and running, you will need to install the required Intel NPU Driver to access the NPU device. However, the NPU will fail to initialize, which you can see by running:
dmesg | grep vpu
After filing a GitHub issue with the Intel NPU Driver project, it was suggested that I might be able to get the device initialized by using a new kernel option that is only available in 6.11 and later.
Step 2 - Using this reference, we can install the Linux 6.11 kernel by running the following commands:
cd /tmp
wget -c https://kernel.ubuntu.com/mainline/v6.11-rc6/amd64/linux-headers-6.11.0-061100rc6_6.11.0-061100rc6.202409010834_all.deb
wget -c https://kernel.ubuntu.com/mainline/v6.11-rc6/amd64/linux-headers-6.11.0-061100rc6-generic_6.11.0-061100rc6.202409010834_amd64.deb
wget -c https://kernel.ubuntu.com/mainline/v6.11-rc6/amd64/linux-image-unsigned-6.11.0-061100rc6-generic_6.11.0-061100rc6.202409010834_amd64.deb
wget -c https://kernel.ubuntu.com/mainline/v6.11-rc6/amd64/linux-modules-6.11.0-061100rc6-generic_6.11.0-061100rc6.202409010834_amd64.deb
dpkg -i *.deb
reboot
Once your Ubuntu system has rebooted, you can confirm that it is now running the 6.11 kernel with the uname -r command.
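For reference, based on the package names above, the version string should look something like the following (the exact suffix may differ if you grab a different build):

uname -r
6.11.0-061100rc6-generic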
Step 3 - We can now install the Intel NPU Driver for Linux (as of publishing this blog post, the latest version is 1.8.0) by running the following commands:
cd /tmp
wget https://github.com/intel/linux-npu-driver/releases/download/v1.8.0/intel-driver-compiler-npu_1.8.0.20240916-10885588273_ubuntu24.04_amd64.deb
wget https://github.com/intel/linux-npu-driver/releases/download/v1.8.0/intel-fw-npu_1.8.0.20240916-10885588273_ubuntu24.04_amd64.deb
wget https://github.com/intel/linux-npu-driver/releases/download/v1.8.0/intel-level-zero-npu_1.8.0.20240916-10885588273_ubuntu24.04_amd64.deb
wget https://github.com/oneapi-src/level-zero/releases/download/v1.17.6/level-zero_1.17.6+u22.04_amd64.deb
apt --fix-broken install -y
apt install build-essential libtbb12 cmake -y
dpkg -i *.deb
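As a quick sanity check after the packages install, the intel_vpu module exposes the NPU through the kernel's accel subsystem, so you can confirm the module is loaded and look for its device node (if initialization is still failing at this point, the node may not appear until the force_snoop change below is applied):

lsmod | grep intel_vpu
ls -l /dev/accel/accel0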
We also need to create the following file, which will enable the required kernel module option (force_snoop=1) to initialize the NPU by default, by running the following command:
cat > /etc/modprobe.d/intel_vpu.conf << EOF
options intel_vpu force_snoop=1
EOF
Reboot the system and the NPU should now be initialized successfully, as shown in the screenshot below.
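To verify, you can re-run the dmesg check from earlier and also confirm the module option took effect; the standard sysfs location for module parameters should reflect the value from our modprobe.d file:

dmesg | grep vpu
cat /sys/module/intel_vpu/parameters/force_snoop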
Lastly, if you want to confirm the NPU is fully functional, there are a number of samples in the Intel NPU Acceleration library, including several Small Language Model (SLM) examples like TinyLlama, Phi-2, Phi-3, T5, etc.
The following can be used to set up your Python environment using conda:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
eval "$($HOME/miniconda3/bin/conda shell.bash hook)"
conda config --set auto_activate_base true
conda init
conda create -n npu python=3.10 -y
conda activate npu
conda install -c conda-forge libstdcxx-ng=12 -y
pip install accelerate intel-npu-acceleration-library==1.3.0 transformers==4.39.3
git clone https://github.com/intel/intel-npu-acceleration-library.git
cd intel-npu-acceleration-library
git checkout v1.3.0
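With the environment in place, you can run one of the SLM samples directly from the repository you just cloned (the examples path below is based on the repo layout at the v1.3.0 tag, so adjust if it has moved):

cd examples
python tiny_llama_chat.py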
I tried the tiny_llama_chat.py sample, but I guess the training data for this model may have been focused on images or artists 🧑‍🎨 ...
Whether you are using the new Intel NPU Acceleration library or the OpenVINO framework, you now have access to another accelerator when using ESXi, which can benefit Edge deployments, especially for workloads that require AI inferencing at lower power utilization.
UPDATE (09/17/24) - The following Python sample can be used to verify that the NPU device is visible from runtime frameworks such as OpenVINO.
from openvino.runtime import Core

def list_available_devices():
    # Initialize the OpenVINO runtime core
    core = Core()

    # Get the list of available devices
    devices = core.available_devices

    if not devices:
        print("No devices found.")
    else:
        print("Available devices:")
        for device in devices:
            print(f"- {device}")

        # Optional: Print additional device information
        for device in devices:
            device_info = core.get_property(device, "FULL_DEVICE_NAME")
            print(f"\nDevice: {device}\nFull Device Name: {device_info}")

if __name__ == "__main__":
    list_available_devices()
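Assuming you save the sample above as list_devices.py (a filename I picked for illustration), you can run it inside the same conda environment; if passthrough and the driver are working, NPU should show up in the device list alongside CPU:

pip install openvino
python list_devices.py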