Several weeks back, I came across a really strange post on the VMTN communities asking how to change the Device ID (DID) and Vendor ID (VID) for a USB device that has been passed through to a VM from ESXi. The device in question is the Google Coral USB Edge TPU (Tensor Processing Unit) Accelerator, a relatively inexpensive device that can help accelerate machine learning (ML) inferencing. With all the buzz these days around Generative AI and ChatGPT, I can only imagine its popularity has grown even further, but I did not realize how popular this device has been in the community, especially among those wanting to use it with ESXi.
The initial observation reported by this user, and by many others in the Coral community, was that ESXi was showing the incorrect VID/DID for the Coral USB device and that, because of this, it was not working correctly when passed through to a VM. They were looking for a way to change the VID/DID value from 1a6e:089a (Global Unichip Corp.) to 18d1:9302 (Google Inc.).
Interestingly enough, a couple of weeks ago, my buddy Alan Renouf had also shared that he recently purchased the Coral USB device, so I figured I would check with him first to see if he was observing the same behavior that was being reported, which he was. I had been going through the GitHub reports to try to better understand the issue and some of the previous workarounds that users had applied, including disabling the vmkusb module, which I definitely do not recommend, especially on more recent releases of ESXi, where doing so simply disables all USB functionality on your ESXi host.
I still could not wrap my head around the issue as the reports did not make any sense in terms of the DID/VID not being claimed correctly or that it needed to change to properly function. This also did not make sense when speaking with our USB expert (Songtao who also developed our USB Network Native Driver for ESXi), so I decided to bite the bullet and purchase the Coral USB device, which apparently is difficult to obtain unless you overpay on Amazon, which I did.
After some exhaustive testing and debugging with Songtao, I think I finally understood what was actually going on, and many of the assumptions that had been floating around were simply incorrect or were missing information. Putting ESXi aside for a second, the Coral USB device is actually a USB composite device, and this will make more sense later. If you plug the Coral USB device into any system, it is expected to have the VID/DID value of 1a6e:089a (Global Unichip Corp.), and this is the correct and expected behavior.
Before you can use the Coral USB device, firmware is actually flashed onto the device, which happens indirectly when running one of the Coral examples, and this is what changes the VID/DID value to 18d1:9302 (Google Inc.). In fact, using another Coral project from Google called webcoral, you can manually perform the firmware update, as shared in this blog post.
So while the Coral USB device must be flashed with the correct firmware to function correctly, ESXi was correctly seeing the initial VID/DID, and this is also true for any other operating system when you first plug in the Coral USB device. With this information, we ran some additional experiments where the following error was observed from the VM when it initially attempts to communicate with the Coral USB device:
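To make the two identities concrete, here is a minimal sketch that classifies `lsusb` text using only the two ID pairs quoted above. The function name `coral_state` and its return labels are hypothetical, chosen just for illustration, and it assumes the usual `ID vvvv:pppp` layout of `lsusb` lines.

```python
import re

# VID:PID pairs quoted in this post.
UNINITIALIZED = "1a6e:089a"  # Global Unichip Corp. (before the firmware is loaded)
INITIALIZED = "18d1:9302"    # Google Inc. (after the firmware is loaded)

# Matches the "ID vvvv:pppp" field of an lsusb line (assumed layout).
_ID_RE = re.compile(r"\bID\s+([0-9a-fA-F]{4}:[0-9a-fA-F]{4})\b")


def coral_state(lsusb_output: str) -> str:
    """Return 'initialized', 'uninitialized', or 'absent' (hypothetical labels)."""
    ids = {m.group(1).lower() for m in _ID_RE.finditer(lsusb_output)}
    if INITIALIZED in ids:
        return "initialized"
    if UNINITIALIZED in ids:
        return "uninitialized"
    return "absent"
```

Feeding it the output of `lsusb` from either the ESXi Shell or the guest should tell you which side of the firmware load the device is currently on.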
Failed to load delegate from libedgetpu.so.1
While on the surface it may look like a failed attempt, what actually happened was that the VM successfully flashed the firmware onto the Coral USB device, but ESXi was not aware of this change and was not expecting the device to change. This was further validated by additional testing using VMware Fusion, to first understand the expected behavior of the device before proceeding to find a solution for ESXi. Once we understood what was needed, I was able to debug further with Songtao, and we came up with a pretty simple solution that makes ESXi aware of the updated Coral USB device, after which the VM was able to use the passthrough device without any issues.
As with any technical issue, it is extremely important to actually understand what is happening, especially if you are looking to find or ask for a solution. Initial observations can also be misleading and add confusion when reporting issues.
I have personally not worked with any USB device that has ever behaved this way, so I cannot say if this is common or not, but I do think the device could have been simpler in its design. Perhaps this was a design consideration to ensure the device is always running the latest firmware, but it is definitely one of the stranger types of USB devices that I have ever come across, and Songtao agreed.
Note: Google also has a Coral PCIe Edge TPU Accelerator, for which many folks have also reported issues with ESXi. It turns out this device does NOT actually conform to the PCIe standard: it violates the PCIe specification, as shared by one of our Principal Engineers at VMware, and therefore cannot be used for passthrough with ESXi. If anyone from the Google Coral team is reading this, there is a recommendation in the link above on how to remediate this problem if you are interested in enabling this for your users requesting support for ESXi.
Below are the step-by-step instructions for getting the Coral USB device to function in passthrough mode with a VM using recent ESXi 7.x and 8.x releases.
Step 1 - I will assume you have already set up a VM to run the Coral software. If not, install a supported operating system for use with Coral. For my setup, I am using Ubuntu 20.04. Make sure you have a USB 3.1 controller configured when adding the Coral USB device. If the VM is powered on, go ahead and shut it down, as we need to add one additional configuration change to the VM.
Step 2 - Edit the VM Advanced Setting and add the following setting:
usb.quirks.device0 = 0x18d1:0x9302 skip-reset, skip-refresh, skip-setconfig
Note: The VM must be powered off before you add the setting above for it to take effect.
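If you want to double-check that the setting was saved, the following is a rough sketch that scans the text of a VM configuration (.vmx) file for it. It assumes advanced settings are stored there as `key = "value"` lines and that the `device0` index from the step above was used; the function name `has_coral_quirk` is hypothetical.

```python
# Values taken from the advanced setting shown in Step 2.
EXPECTED_KEY = "usb.quirks.device0"
EXPECTED_ID = "0x18d1:0x9302"
EXPECTED_FLAGS = {"skip-reset", "skip-refresh", "skip-setconfig"}


def has_coral_quirk(vmx_text: str) -> bool:
    """Return True if the quirks entry from Step 2 is present with all three flags."""
    for line in vmx_text.splitlines():
        key, sep, value = line.partition("=")
        if not sep or key.strip() != EXPECTED_KEY:
            continue
        # Accept the value with or without surrounding double quotes.
        value = value.strip().strip('"')
        device_id, _, flags = value.partition(" ")
        found = {flag.strip() for flag in flags.split(",") if flag.strip()}
        return device_id.lower() == EXPECTED_ID and EXPECTED_FLAGS <= found
    return False
```

You would pass it the contents of the VM's .vmx file, read while the VM is powered off.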
Step 3 - Power on the VM and then run through the initial Coral setup instructions which will initialize the Coral USB device and update it with the required firmware. It is expected that you will see the Failed to load delegate from libedgetpu.so.1 error message, but the underlying Coral USB device has already been successfully flashed.
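Since the first run is expected to fail in one specific way, a small sketch like the one below can help tell that expected failure apart from a setup problem. It only looks for strings in the captured output of the example script; the function name and the returned labels are hypothetical, and the "missing module" case is based on the `ModuleNotFoundError` a reader reported in the comments.

```python
# The error message that Step 3 says to expect on the first run.
EXPECTED_ERROR = "Failed to load delegate from libedgetpu.so.1"


def classify_first_run(output: str) -> str:
    """Classify the captured output of the first Coral example run (hypothetical labels)."""
    if "ModuleNotFoundError" in output and "pycoral" in output:
        # The Coral Python module is missing, so the example never reached the device.
        return "setup-incomplete"
    if EXPECTED_ERROR in output:
        # Expected on the first run: continue with Step 4.
        return "expected-error"
    if "Traceback" in output:
        return "unknown-error"
    return "ok"
```

A result of `expected-error` means you can carry on with the next step, while `setup-incomplete` means the Coral setup instructions inside the guest need another look first.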
Step 4 - Log in to the ESXi Shell and confirm that the Coral USB device is still showing the default value of Global Unichip Corp. by running the lsusb command, as shown in the screenshot below.
Step 5 - Next, we need to make ESXi aware of the updated Coral USB device and there are two options in achieving this:
- Reboot ESXi - As long as you do NOT unplug the Coral USB device from the physical ESXi host, since it has already been successfully initialized, this will be the quickest method. Ensure that your system BIOS has the USB S4/S5 Power setting enabled for this to work properly, or the Coral USB device will lose power and revert to the default uninitialized state.
- Reload USB Module - If you prefer not to reboot, you can make ESXi aware of the updated Coral USB device by reloading the USB module.
To reload the USB module, log in to the ESXi Shell and run the following commands:
/etc/init.d/usbarbitrator stop
vmkload_mod -u vmkusb;vmkload_mod vmkusb
kill -SIGHUP $(ps -C | grep vmkdevmgr | awk '{print $1}')
/etc/init.d/usbarbitrator start
Note: If you are using or have the USB Network Native Driver for ESXi installed, then use the following commands instead to reload the USB module:
/etc/init.d/usbarbitrator stop
vmkload_mod -u vmkusb_nic_fling;vmkload_mod vmkusb_nic_fling
kill -SIGHUP $(ps -C | grep vmkdevmgr | awk '{print $1}')
/etc/init.d/usbarbitrator start
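The two command sequences above differ only in the module name. As a minimal sketch of that, the hypothetical helper below builds the same four commands as strings for either module; it does not execute anything, it just makes the shared structure explicit.

```python
from typing import List

# The two module names used in the commands above.
KNOWN_MODULES = ("vmkusb", "vmkusb_nic_fling")


def reload_usb_commands(module: str = "vmkusb") -> List[str]:
    """Return the ESXi Shell commands from this post for the given USB module."""
    if module not in KNOWN_MODULES:
        raise ValueError("unexpected module name: " + module)
    return [
        "/etc/init.d/usbarbitrator stop",
        # Unload and immediately reload the module.
        f"vmkload_mod -u {module};vmkload_mod {module}",
        # Signal vmkdevmgr, as in the commands above.
        "kill -SIGHUP $(ps -C | grep vmkdevmgr | awk '{print $1}')",
        "/etc/init.d/usbarbitrator start",
    ]
```

For example, `print("\n".join(reload_usb_commands("vmkusb_nic_fling")))` prints the variant for hosts with the USB Network Native Driver installed.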
Step 6 - We can now confirm that the updated Coral USB device is showing the expected value of Google Inc. by re-running the lsusb command. It may take a second or two after the previous step, but you should now see the updated VID/DID for the Coral USB device, as shown in the screenshot below.
Step 7 - Finally, we can confirm that our VM can also see the updated Coral USB device by running the lsusb command within the OS. If we now re-run the Coral setup example, we can now see that the operation has successfully completed and can properly communicate with the Coral USB device! 😎
Once the Coral USB device has been successfully initialized, its state persists across both VM and ESXi host reboots. The Coral USB device will only return to its default state when it is physically unplugged from the ESXi host (or otherwise loses power), in which case you just need to re-run Steps 3 and 5.
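The recovery logic described here, together with the guest-refresh behavior noted in Steps 6 and 7, can be summarized as a small decision helper. This is only a sketch: the function name and the returned labels are hypothetical, and it simply looks for the two ID pairs from this post in `lsusb` output captured on the ESXi host and inside the guest.

```python
def next_action(host_lsusb: str, guest_lsusb: str) -> str:
    """Suggest a next step from host and guest lsusb output (hypothetical labels)."""
    host = host_lsusb.lower()
    guest = guest_lsusb.lower()
    host_initialized = "18d1:9302" in host    # Google Inc.
    host_default = "1a6e:089a" in host        # Global Unichip Corp.
    guest_initialized = "18d1:9302" in guest

    if host_initialized and guest_initialized:
        return "ready"
    if host_initialized:
        # ESXi sees the updated device but the guest does not yet.
        return "refresh-guest"
    if host_default:
        # Device is back in (or still in) its default state.
        return "rerun-steps-3-and-5"
    # Device is not listed on the host at all.
    return "check-connection"
```

Here `refresh-guest` corresponds to waiting a moment for the passthrough device to refresh in the VM, and `check-connection` to confirming the device is plugged in and still receiving power.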
With the Coral USB device now fully functional on ESXi, I am definitely interested in hearing how our users will be leveraging this device whether that is with the popular Frigate NVR application or for other ML inferencing solutions.
UPDATE (05/11/23) - The Coral USB device also functions with ESXi-Arm. I was able to confirm this by remotely mounting the Coral USB device from my local macOS system using VMware Fusion to a remote ESXi-Arm host, which was not a Raspberry Pi system.
I have both ARM & Intel ESXi at home. Running a google coral directly on ARM based hardware for Frigate NVR. Was hoping to port things over to the Radxa 5b NPU & drop the google coral.
While I am at it, any word on ESXi running on the Radxa 5b in the future?
William, followed your guide. No success. I get to "step 5" and after "reboot" the lsusb command does not even give the usb device listed at all... It is also removed from the VM settings (passthrough). only way to "get it listed back" is to unplug and replug the usb device... as Bus 002 Device 005: ID 1a6e:089a Global Unichip Corp.
Only the "Reload USB Module" seems to work.
What would be the criteria to persist across reboots? Must there stay power on the coral? In my case it does not work when I reboot the host (NUC12).
Sorry, I cannot edit a reply. Also, when the Coral "renames" to Google, it is removed from the VM passthrough settings. However, when I shut down the VM and look at the settings, it is still passed through as "google product 0x9302". Then I start the VM, but there is no response on lsusb. I look at the settings... empty... then I add it back, start the VM, and then it's there.
How do you think I can really persist this?
So far so good, thanks!
Check your NUC BIOS for any energy savings, I didn't do anything special on my NUC 13 but I suspect power is still reaching the Coral device and hence its init is persisted. I don't know what you mean by no response on lsusb ... Depending on your order of operations, you may need to wait a second or so when the device is passed through. Double check from both ESXi side that its showing right id via lsusb and then confirm its indeed passthrough to guest, it may take a second to refresh (I do explicit UI refresh to make sure all state is correct) and then run through guest. Make sure you added the VM Advanced Setting or else you'll hit all sorts of issues
I have the same problem on my Nuc 12. After rebooting esxi, coral is gone in the "edit setting/USB device" options. I will check the energy savings in the bios.
Should everything power saving related be turned off?
You'll have to play with the settings and see if they help, I've only confirmed this on my NUC 13 which should mirror NUC 12/11. If I get some additional time, I'll give it a go on my NUC 12 Pro but I didn't touch any of the default settings. I will say, I have recently updated my NUC 13 to run latest BIOS and same goes for few other NUC, so something to also check off as I know it does bring new functionality such as the ability to disable E/P Cores (which wasn't there for some of the earlier models)
Many USB devices have different VIDs/PIDs when they are first plugged in and when they are in normal use. Common ones include HP printers, wireless network cards, and keys for online banking. Your experiments are important in helping us use such equipment correctly.
I was able to get the Coral working in ESXi 8.0. However, it didn't go exactly as you described for me. I went through all the steps, but it didn't work. I rebooted ESXi and tried again. This time the firmware update took about a second; the first time it took about 10 seconds. Next, I was able to see it in ESXi (lsusb), but not in the guest machine. I had to remove the adapter from the guest machine and add it again as a "Google device". After that, it started to work properly.
Thank you very much for your efforts here, William!
It's mentioned in the Coral documentation that the first inference can take longer (I assume that is also because it is flashing the firmware for the first time), so as part of my testing, I typically wait a second or so before jumping into the VM. Since I've done this a few times, it's possible the firmware flash is quicker (hard to say, as it's a black box). As I mentioned in the blog post, it may take a second for the guest to see the updated device, so again, being a little patient helps and there's no need to re-add the device; it'll auto-refresh. Explicitly removing and re-adding it also works, and it's a workflow I've confirmed, but for simplicity, I just wait for it to refresh.
I've now tried this workflow a couple of times and I'm not able to see the "Google product" in lsusb in the guest machine (Ubuntu 22.04) after initializing with the Python script and running the ESXi USB commands. ESXi can see it properly. Rebooting the guest or waiting doesn't help. If I take a look in ESXi, the Coral device has disappeared from the guest machine settings. After I shut down the guest, the Coral device can be seen in the settings as a "Google product". If I re-add the "Google product" USB device to the guest machine, it can be seen in the lsusb output of the guest machine and it works properly.
Not sure if this is Ubuntu 22.04 related or what.
Running it on: NUC12WSHi5 / ESXi-8.0b-21203435-standard (VMware, Inc.)
William, this post seems like a lot of steps to ultimately pass-through a USB port to a VM. I am not very familiar with ESXi hence my question.
I run Blue Iris and Home Assistant as virtual machines on Proxmox and am able to pass-through a USB port that each Google Coral is plugged into. This way, the virtual machine has full control over the Google Coral and can flash the firmware and still be able to access the Google Coral once the device id changes. Is there not an equivalent option in ESXi?
Kris,
Not sure what you mean by "a lot of steps" ... the actual configuration change for passthrough is a single step 🙂 The remaining steps outlined in the blog post are there to make ESXi aware of the unique behavior of the Coral USB device, where the VID/DID changes upon initial connection; this doesn't happen automatically. ESXi has been designed to run in Enterprise datacenters, and while USB accessories are prevalent in the consumer space, they're still not widely used in the datacenter, and certainly not ones where this type of behavior is normally seen, hence the extra steps. If you're able to maintain power to the USB device, a simple reboot after the initial connection will yield the same results, which is what I've outlined above.
So I tried a bunch of things on my NUC12WSHi5 and I just can't get it to work. I updated the bios, tried restarting esxi and also the reset. I added the quirk in the advanced options for the vm and also a usb 3.1 controller with the coral as a usb device.
If I restart esxi it can't find the coral anymore unless I unplug/plug it back in.
If I do the reset it doesn't change to "google" but stay at "global unichip corp"
I also tried to create a new ubuntu install and followed the "install the edge TPU runtime" guide and at the
"Run the image classifier with the bird photo (shown in figure 1):" step it errors out with:
"Traceback (most recent call last):
File "/home/frigate/pycoral/examples/classify_image.py", line 37, in <module>
from pycoral.adapters import classify
ModuleNotFoundError: No module named 'pycoral.adapters'"
I feel like I have tried everything at this point 🙂
Just tried this on my NUC12WSKi7 using ESXi 8.0 Update 1 and it works, again the NUC is a stock unit.
My workflow is to confirm that I can talk to Coral device by reloading the module and confirm the device shows up properly (e.g. Google Inc.) on ESXi and then confirming on Ubuntu VM. Once I'm able to connect, then I'm able to reboot ESXi and Coral device retains the initialized state
Given that you were NOT able to actually run the example tells me you were NOT able to successfully flash the firmware on the device and hence you did NOT actually follow the directions to get the same outcome as described on the blog.
You should probably double check that you followed all instructions on Coral documentation (hint: See 2a and ensure you've installed the Coral python module). I'm also using Ubuntu 20.04, as there's a python dependency where newer versions may not work
Finally, I've had more than a few folks comment that the steps above work as described, so if there's issues its either user error and/or kit that you're using.
Yes, there's something going on with the python dependencies. Not sure how to deal with those. I won't bother you further as it is my fault.
I finally got it to work.
Awesome job William! I enjoyed reading this.
I have the m.2 variant and hope that the Google Coral team can correct the pci behaviour or find a way around the problem.
Again great work, looking forward to more articles from you.
Thank you very much! Now it's working also on my ESX 😀 the CPU usage dropped by 60%
# Host
Intel NUC10i7FNH
ESXi 7.0 U3
USB Coral
# VM
Debian 10
My pleasure and thanks for sharing your success story and setup with everyone else! 💪
Hi William, do you or your colleague have further succes in running Frigate further? I can't get that going...
https://github.com/blakeblackshear/frigate/issues/6472#issuecomment-1549641090
See my response in that thread ... it seems like you or others may not have properly set up all the requirements for Frigate, and this has nothing to do with the Coral device ...
Thanks for this, Champ!
Struggled to get it to work at first (Ubuntu 22.04 on my Frigate VM) but created another VM with Ubuntu 20.04 and an older version of python3. Used that VM to flash the Coral and then moved the USB passthrough back to the 22.04 machine. That solved it for me at least.
William, thank you for this post, very, very helpful!
In my case, with "Reboot ESXi" in Step 5 the device is not visible, but "Reload USB Module" works fine.
When the host is powered off, the Coral TPU loses its config, so I enabled permanent USB power to solve this problem.
I share my configuration and process to make it work
Hardware : INTEL NUC RNUC12WSHI5
Bios Version : WSADL357.0087.2023.0306.1931
Bios config :
- Startup on power detect (tab Power -> secondary power settings -> enable "USB S4/S5 Power")
- E-core disabled, P-core enabled
ESXi : 8.0U1-21495797-standard
Coral USB Edge TPU (connected on USB 3 port of NUC)
Debian11 with docker container with Frigate
## Step 1 -> On VM configure Coral TPU
mkdir coral && cd coral
apt install build-essential autoconf
git clone https://github.com/google-coral/pycoral.git
cd pycoral
bash examples/install_requirements.sh classify_image.py
## Step 2 -> ON VM Init Edge TPU (after each power loss on Coral TPU)
cd coral/pycoral
python3 examples/classify_image.py --model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels test_data/inat_bird_labels.txt --input test_data/parrot.jpg
shutdown vm
## Step 3 -> ON ESX shell to detect Google inc. (reboot ESXi doesn't work, so I Reload USB Module)
/etc/init.d/usbarbitrator stop
vmkload_mod -u vmkusb;vmkload_mod vmkusb
kill -SIGHUP $(ps -C | grep vmkdevmgr | awk '{print $1}')
/etc/init.d/usbarbitrator start
## Step 4 -> reconfigure VM
Remove USB coral passthrough from VM
Save
wait 5 secs
Add USB coral passthrough to VM
wait 5 secs
start VM
Now lsusb on VM display "Google inc."
BIOS CONFIGURATION TESTS
1 - With bios option "USB S4/S5 Power" disabled
You can't reboot without restart config at step 2
You can't shutdown without restart config at step 2
2 - With bios option "USB S4/S5 Power" disabled
You can't reboot
You can't shutdown (without remove power)