Starting with Intel Meteor Lake (14th Generation) CPUs, now part of the new Intel Core Ultra Processor (Series 1) brand, an integrated Neural Processing Unit (NPU) is built right into the SoC (system-on-chip) and is optimized for low-power AI inferencing.
I found this article from Chips and Cheese about the new Intel Meteor Lake NPU to be super insightful; I definitely recommend a read if you are new to NPUs.
While you can already consume the Intel integrated graphics (iGPU) in platforms like the Intel NUC with ESXi for workload inferencing, I was curious whether this new Intel NPU could actually be used by ESXi? 🤔
I recently got access to an ASUS NUC 14 Pro (which I will do a detailed review on later) that includes the new Intel NPU. After successfully installing the latest release of VMware ESXi 8.0 Update 3, I saw that the Intel NPU accelerator is just a PCIe device, which we can enable for passthrough and hopefully consume within a VM.
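If you want to double-check the device from the ESXi Shell before editing the VM, the following should work (the "meteor" filter string is an assumption based on how the device description showed up in my environment and may differ on other builds):

lspci | grep -i "meteor"
esxcli hardware pci list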
For testing, I am using Ubuntu 22.04 and the Intel NPU Acceleration library to verify that I am able to access the NPU.
Step 1 - Create an Ubuntu 22.04 VM and configure a memory reservation (required for PCIe passthrough). You can then add the NPU device, which will show up as Meteor Lake NPU.
Note: You will need to disable Secure Boot (enabled by default), since we need to install a newer Linux kernel which is still in development. Edit the VM and navigate to VM Options->Boot Options to disable it.
Once Ubuntu is up and running, you will need to install the required Intel NPU Driver to access the NPU device. However, the NPU will fail to initialize, which you can see by running:
dmesg | grep vpu
After filing a GitHub issue with the Intel NPU Driver project, it was suggested that I might be able to get the device initialized by using a new kernel option that is only available in 6.11 and later.
Step 2 - Using this reference, we can install the Linux 6.11 kernel by running the following commands:
cd /tmp
wget -c https://kernel.ubuntu.com/mainline/v6.11-rc6/amd64/linux-headers-6.11.0-061100rc6_6.11.0-061100rc6.202409010834_all.deb
wget -c https://kernel.ubuntu.com/mainline/v6.11-rc6/amd64/linux-headers-6.11.0-061100rc6-generic_6.11.0-061100rc6.202409010834_amd64.deb
wget -c https://kernel.ubuntu.com/mainline/v6.11-rc6/amd64/linux-image-unsigned-6.11.0-061100rc6-generic_6.11.0-061100rc6.202409010834_amd64.deb
wget -c https://kernel.ubuntu.com/mainline/v6.11-rc6/amd64/linux-modules-6.11.0-061100rc6-generic_6.11.0-061100rc6.202409010834_amd64.deb
dpkg -i *.deb
reboot
Once your Ubuntu system has rebooted, you can confirm that it is now running the 6.11 kernel with the uname -r command.
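For reference, based on the package names above, the version string should look something like the following (the exact suffix may differ if you grab a different build):

uname -r
6.11.0-061100rc6-generic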
Step 3 - We can now install the Intel NPU Driver for Linux (as of publishing this blog post, the latest version is 1.8.0) by running the following commands:
cd /tmp
wget https://github.com/intel/linux-npu-driver/releases/download/v1.8.0/intel-driver-compiler-npu_1.8.0.20240916-10885588273_ubuntu24.04_amd64.deb
wget https://github.com/intel/linux-npu-driver/releases/download/v1.8.0/intel-fw-npu_1.8.0.20240916-10885588273_ubuntu24.04_amd64.deb
wget https://github.com/intel/linux-npu-driver/releases/download/v1.8.0/intel-level-zero-npu_1.8.0.20240916-10885588273_ubuntu24.04_amd64.deb
wget https://github.com/oneapi-src/level-zero/releases/download/v1.17.6/level-zero_1.17.6+u22.04_amd64.deb
apt --fix-broken install -y
apt install build-essential libtbb12 cmake -y
dpkg -i *.deb
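As a quick sanity check after the packages install, the intel_vpu module exposes the NPU through the kernel's accel subsystem, so you can confirm the module is loaded and look for its device node (if initialization is still failing at this point, the node may not appear until the force_snoop change below is applied):

lsmod | grep intel_vpu
ls -l /dev/accel/accel0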
We also need to create the following file, which will enable the required kernel module option (force_snoop=1) to initialize the NPU by default, by running the following command:
cat > /etc/modprobe.d/intel_vpu.conf << EOF
options intel_vpu force_snoop=1
EOF
Reboot the system and the NPU should now be initialized successfully, as shown in the screenshot below.
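To verify, you can re-run the dmesg check from earlier and also confirm the module option took effect; the standard sysfs location for module parameters should reflect the value from our modprobe.d file:

dmesg | grep vpu
cat /sys/module/intel_vpu/parameters/force_snoop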
Lastly, if you want to confirm the NPU is fully functional, there are a number of samples in the Intel NPU Acceleration library, including several Small Language Model (SLM) examples like TinyLlama, Phi-2, Phi-3, T5, etc.
The following can be used to set up your Python environment using conda:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
eval "$($HOME/miniconda3/bin/conda shell.bash hook)"
conda config --set auto_activate_base true
conda init
conda create -n npu python=3.10 -y
conda activate npu
conda install -c conda-forge libstdcxx-ng=12 -y
pip install accelerate intel-npu-acceleration-library==1.3.0 transformers==4.39.3
git clone https://github.com/intel/intel-npu-acceleration-library.git
cd intel-npu-acceleration-library
git checkout v1.3.0
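With the environment in place, you can run one of the SLM samples directly from the repository you just cloned (the examples path below is based on the repo layout at the v1.3.0 tag, so adjust if it has moved):

cd examples
python tiny_llama_chat.py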
I tried the tiny_llama_chat.py sample, but I guess the training data for this model may have been focused on images or artists 🧑‍🎨 ...
Whether you are using the new Intel NPU Acceleration library or the OpenVINO framework, you now have access to another accelerator when using ESXi, which can benefit Edge deployments, especially for workloads that require AI inferencing at lower power utilization.
UPDATE (09/17/24) - The following Python sample can be used to verify that the NPU device is visible from runtime frameworks such as OpenVINO.
from openvino.runtime import Core

def list_available_devices():
    # Initialize the OpenVINO runtime core
    core = Core()

    # Get the list of available devices
    devices = core.available_devices

    if not devices:
        print("No devices found.")
    else:
        print("Available devices:")
        for device in devices:
            print(f"- {device}")

        # Optional: Print additional device information
        for device in devices:
            device_info = core.get_property(device, "FULL_DEVICE_NAME")
            print(f"\nDevice: {device}\nFull Device Name: {device_info}")

if __name__ == "__main__":
    list_available_devices()
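Assuming you save the sample above as list_devices.py (a filename I picked for illustration), you can run it inside the same conda environment; if passthrough and the driver are working, NPU should show up in the device list alongside CPU:

pip install openvino
python list_devices.py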