Several weeks back, I came across a really strange post on the VMTN communities asking how to change the Device ID (DID) and Vendor ID (VID) for a USB device that has been passed through to a VM from ESXi. The device in question is the Google Coral USB Edge TPU (Tensor Processing Unit) Accelerator, a relatively inexpensive device that can help accelerate machine learning (ML) inferencing. With all the buzz these days around Generative AI and ChatGPT, I can only imagine its popularity has grown even further, but I did not realize how popular this device has been in the community, especially for those wanting to use it with ESXi.
The initial observation reported by this user, and by many others in the Coral community, was that ESXi was showing the incorrect VID/DID for the Coral USB device and that, because of this, it was not working correctly when passed through to a VM. They were looking for a way to change the DID/VID value from 1a6e:089a (Global Unichip Corp.) to 18d1:9302 (Google Inc.).
Interestingly enough, a couple of weeks ago, my buddy Alan Renouf had also shared that he recently purchased the Coral USB device, so I figured I would check with him first to see if he was observing the same behavior that was being reported, which he was. I had been going through the GitHub reports to try to better understand the issue and some of the previous workarounds that users had tried, including disabling the vmkusb module, which I definitely do not recommend, especially for more recent releases of ESXi, where that will simply disable all USB functionality on your ESXi host.
I still could not wrap my head around the issue as the reports did not make any sense in terms of the DID/VID not being claimed correctly or that it needed to change to properly function. This also did not make sense when speaking with our USB expert (Songtao who also developed our USB Network Native Driver for ESXi), so I decided to bite the bullet and purchase the Coral USB device, which apparently is difficult to obtain unless you overpay on Amazon, which I did.
After some exhaustive testing and debugging with Songtao, I think I finally understood what was actually going on: many of the assumptions that had been floating around were simply incorrect or had missing information. Putting ESXi aside for a second, the Coral USB device is actually a USB composite device, and this will make more sense later. If you plug the Coral USB device into any system, it is expected to have the DID/VID value of 1a6e:089a Global Unichip Corp. and this is the correct and expected behavior.
Before you can use the Coral USB device, firmware is actually flashed onto the device, which is indirectly performed when running one of the Coral examples, and this is actually what changes the DID/VID value to 18d1:9302 Google Inc. In fact, using another Coral project from Google called webcoral, you can manually perform the firmware update as shared in this blog post.
So while the Coral USB device must be flashed with the correct firmware to function correctly, ESXi was correctly seeing the initial DID/VID, and this is also true for any other operating system when you first plug in the Coral USB device. With this information, we ran some additional experiments where the following error was observed from the VM when it initially attempts to communicate with the USB Coral device:
Failed to load delegate from libedgetpu.so.1
While on the surface it may look like a failed attempt, what actually happened was that the VM was able to successfully flash the firmware onto the USB Coral device, but ESXi was not aware of this change and was not expecting that the device would change. This was further validated by additional testing using VMware Fusion, to first understand the expected behavior of the device before proceeding to find a solution for ESXi. Once we understood what was needed, I was able to debug further with Songtao and we came up with a pretty simple solution that would make ESXi aware of the updated Coral USB device, and then the VM was able to use the passthrough device without any issues.
As with any technical issue, it is extremely important to actually understand what is happening, especially if you are looking to find or ask for a solution. Initial observations can also be misleading and add confusion when reporting issues.
I have personally not worked with any USB device that has ever behaved this way, so I cannot say if this is common or not, but I do think the device could have been simplified in its design. Perhaps this was a design consideration to ensure the device was always running the latest firmware, but it definitely is one of the stranger types of USB devices that I have ever come across, and Songtao agreed.
Note: Google also has a Coral PCIe Edge TPU Accelerator that many folks have also reported issues with on ESXi, but it turns out this device does NOT actually conform to the PCIe standard. It violates the PCIe specification, as shared by one of our Principal Engineers at VMware, and therefore cannot be used for passthrough with ESXi. If anyone from the Google Coral team is reading this, there is a recommendation in the link above on how to remediate this problem if you are interested in enabling this for your users requesting support for ESXi.
Below are the step-by-step instructions for getting the Coral USB device to function in passthrough mode with a VM using recent ESXi 7.x and 8.x releases.
Step 1 - I will assume you have already set up a VM to run the Coral software. If not, install a supported operating system for use with Coral. For my setup, I am using Ubuntu 20.04. Make sure you have a USB 3.1 controller configured when adding the Coral USB device. If the VM is powered on, go ahead and shut it down as we need to add one additional configuration change to the VM.
Step 2 - Edit the VM Advanced Setting and add the following setting:
usb.quirks.device0 = 0x18d1:0x9302 skip-reset, skip-refresh, skip-setconfig
Note: The VM must be powered off before you add the setting above for it to take effect.
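For those who prefer the command line over the UI, here is a minimal sketch of adding the same setting directly to a VM's .vmx file. It assumes the VM is powered off and that the .vmx file stores advanced settings as key = "value" lines; the function name add_coral_quirk and the path you pass in are purely illustrative, and you may still need to reload the VM's configuration afterwards for ESXi to pick up the change.

```shell
# Hypothetical helper (not part of ESXi): append the Coral USB quirk to a
# powered-off VM's .vmx file unless a usb.quirks.device0 entry already exists.
# Assumption: advanced settings are stored in the .vmx as key = "value".
add_coral_quirk() {
    vmx_file="$1"
    quirk='usb.quirks.device0 = "0x18d1:0x9302 skip-reset, skip-refresh, skip-setconfig"'

    # Leave the file untouched if the key is already present.
    if grep -q '^usb\.quirks\.device0' "$vmx_file"; then
        return 0
    fi

    printf '%s\n' "$quirk" >> "$vmx_file"
}
```

Calling add_coral_quirk with the path to your VM's .vmx file is safe to repeat, since the setting is only written once.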
Step 3 - Power on the VM and then run through the initial Coral setup instructions which will initialize the Coral USB device and update it with the required firmware. It is expected that you will see the Failed to load delegate from libedgetpu.so.1 error message, but the underlying Coral USB device has already been successfully flashed.
Step 4 - Log in to the ESXi Shell to confirm that the Coral USB device is still showing the default value of Global Unichip Corp. by running the lsusb command as shown in the screenshot below.
Step 5 - Next, we need to make ESXi aware of the updated Coral USB device and there are two options in achieving this:
- Reboot ESXi - As long as you do NOT unplug the Coral USB device from the physical ESXi host after it has been successfully initialized, this will be the quickest method. Ensure that your system BIOS has the USB S4/S5 Power setting enabled for this to properly function, or the Coral USB device will lose power and revert to the default uninitialized state.
- Reload USB Module - If you prefer not to reboot, we can make ESXi aware of the updated Coral USB device by reloading the USB module
To reload the USB module, log in to the ESXi Shell and run the following commands:
/etc/init.d/usbarbitrator stop
vmkload_mod -u vmkusb;vmkload_mod vmkusb
kill -SIGHUP $(ps -C | grep vmkdevmgr | awk '{print $1}')
/etc/init.d/usbarbitrator start
Note: If you are using or have the USB Network Native Driver for ESXi installed, then use the following commands instead to reload the USB module:
/etc/init.d/usbarbitrator stop
vmkload_mod -u vmkusb_nic_fling;vmkload_mod vmkusb_nic_fling
kill -SIGHUP $(ps -C | grep vmkdevmgr | awk '{print $1}')
/etc/init.d/usbarbitrator start
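The two variants above differ only in the module name, so they can be folded into one small wrapper. The following is a rough sketch, assuming that the vmkload_mod -l listing includes vmkusb_nic_fling when the USB Network Native Driver is loaded; the function names pick_usb_module and reload_usb_module are my own and not part of ESXi.

```shell
# Hypothetical helper: decide which USB module to reload, given the text
# printed by `vmkload_mod -l` (passed in as the first argument).
pick_usb_module() {
    if printf '%s\n' "$1" | grep -qw 'vmkusb_nic_fling'; then
        echo vmkusb_nic_fling
    else
        echo vmkusb
    fi
}

# Run the same reload sequence shown above with the chosen module.
# Only meaningful inside the ESXi Shell.
reload_usb_module() {
    mod=$(pick_usb_module "$(vmkload_mod -l)")
    /etc/init.d/usbarbitrator stop
    vmkload_mod -u "$mod"
    vmkload_mod "$mod"
    kill -SIGHUP $(ps -C | grep vmkdevmgr | awk '{print $1}')
    /etc/init.d/usbarbitrator start
}
```

Keeping the module selection in its own function means it can be checked against sample output before anything is actually unloaded on the host.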
Step 6 - We can now confirm the updated Coral USB device is now showing the expected value of Google Inc. by re-running the lsusb command. It may take a second or two from the previous step, but you should now see the updated DID/VID for the Coral USB device as shown in the screenshot below.
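Since the check in Steps 4 and 6 comes down to which ID shows up in the lsusb output, it can be scripted. Below is a minimal sketch that relies only on the two IDs discussed in this post (1a6e:089a before the firmware is flashed, 18d1:9302 after); the function names coral_state and wait_for_coral and the default of 10 attempts are illustrative choices, not anything provided by ESXi or Coral.

```shell
# Hypothetical helper: classify the Coral's state from lsusb output text.
coral_state() {
    case "$1" in
        *18d1:9302*) echo initialized ;;    # Google Inc. (firmware flashed)
        *1a6e:089a*) echo uninitialized ;;  # Global Unichip Corp. (default)
        *)           echo absent ;;         # neither ID is listed
    esac
}

# Poll lsusb once per second until the device reports the Google Inc. ID,
# giving up after the requested number of attempts (default: 10).
wait_for_coral() {
    tries="${1:-10}"
    while [ "$tries" -gt 0 ]; do
        if [ "$(coral_state "$(lsusb)")" = "initialized" ]; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

Running wait_for_coral after reloading the USB module covers the second or two it can take for the updated ID to appear.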
Step 7 - Finally, we can confirm that our VM can also see the updated Coral USB device by running the lsusb command within the OS. If we now re-run the Coral setup example, we can now see that the operation has successfully completed and can properly communicate with the Coral USB device! 😎
Once the Coral USB device has been successfully initialized, its state persists across both VM and ESXi host reboots. The Coral USB device will only return to its default state when it is physically unplugged from the ESXi host, in which case you just need to re-run Steps 3 and 5.
With the Coral USB device now fully functional on ESXi, I am definitely interested in hearing how our users will be leveraging this device whether that is with the popular Frigate NVR application or for other ML inferencing solutions.
UPDATE (05/11/23) - The Coral USB device also functions with ESXi-Arm. I was able to confirm this by remotely mounting the Coral USB device from my local macOS system using VMware Fusion with a remote ESXi-Arm host, which was not a Raspberry Pi system.
Richard Hughes says
I have both ARM & Intel ESXi at home. Running a google coral directly on ARM based hardware for Frigate NVR. Was hoping to port things over to the Radxa 5b NPU & drop the google coral.
While I am at it, any word on ESXi running on the Radxa 5b in the future?
Sender says
William, followed your guide. No success. I get to "step 5" and after "reboot" the lsusb command does not even give the usb device listed at all... It is also removed from the VM settings (passthrough). only way to "get it listed back" is to unplug and replug the usb device... as Bus 002 Device 005: ID 1a6e:089a Global Unichip Corp.
Sender says
Only the "Reload USB Module" seems to work.
What would be the criteria to persist across reboots? Must there stay power on the coral? In my case it does not work when I reboot the host (NUC12).
Sender says
Sorry, I cannot edit a reply. Also when the coral "renames" to google it is removed from the VM passthrough settings. However when I shutdown the VM, look at the settings it is still passed through as "google product 0x9302", then I start the VM but no response on lsusb. Look at settings... empty... then add it back, start VM and then it's there.
How do you think I can really persist this?
So far so good, thanks!
William Lam says
Check your NUC BIOS for any energy savings, I didn't do anything special on my NUC 13 but I suspect power is still reaching the Coral device and hence its init is persisted. I don't know what you mean by no response on lsusb ... Depending on your order of operations, you may need to wait a second or so when the device is passed through. Double check from both ESXi side that its showing right id via lsusb and then confirm its indeed passthrough to guest, it may take a second to refresh (I do explicit UI refresh to make sure all state is correct) and then run through guest. Make sure you added the VM Advanced Setting or else you'll hit all sorts of issues
sp1910 says
I have the same problem on my Nuc 12. After rebooting esxi, coral is gone in the "edit setting/USB device" options. I will check the energy savings in the bios.
Should everything power saving related be turned off?
William Lam says
You'll have to play with the settings and see if they help, I've only confirmed this on my NUC 13 which should mirror NUC 12/11. If I get some additional time, I'll give it a go on my NUC 12 Pro but I didn't touch any of the default settings. I will say, I have recently updated my NUC 13 to run latest BIOS and same goes for few other NUC, so something to also check off as I know it does bring new functionality such as the ability to disable E/P Cores (which wasn't there for some of the earlier models)
Chi Zheng says
Many USB devices have different uids/pids when they are plugged in and when they are in normal use. Common ones include hp printers, wireless network cards, and keys for online banking. It is important that your experiments help us to use the above equipment correctly.
Daniel says
I was able to get the Coral working in ESXi 8.0. However, it didn't go exactly as you mentioned for me. I went through all the steps, but it didn't work. I rebooted ESXi and tried again. This time the firmware update took like a second. The first time it was like 10 seconds. Next, I was able to see it in ESXi (lsusb), but not in the guest machine. I had to remove the adapter from the guest machine and add it again as a "Google device". After that, it started to work properly.
Thank you very much for your efforts here, William!
William Lam says
Its mentioned in Coral documentation that 1st inference can take longer (I assume also as its first time flashing the firmware), so as part of my testing, I typically wait a second or so before jumping into VM. Since I've done this a few times, its possible the firmware flash is quicker (hard to say as its black box). As I mentioned in the blog post, it may take a second for Guest to see the updated device, so again being a little patient helps and there's no need to re-add the device, it'll auto-refresh but explicitly removing/adding also works and its a workflow I've confirmed but for simplicity, I just wait for it to refresh
Daniel says
I've tried now a couple of times this workflow and I'm not able to see the "Google product" in lsusb in the guest machine (Ubuntu 22.04) after initializing with the Python script and running ESXI usb commands. ESXI can see it properly. Rebooting the guest or waiting doesn't help. If I take a look into the ESXI, the Coral device has disappeared from the guest machine settings. After I shutdown the guest, the Coral device can be seeing in the settings as a "Google product". If I re-add the "Google product" usb device to the guest machine, it can be seen in lsusb output of the guest machine and it works properly.
Not sure if this is Ubuntu 22.04 related or what.
Running it on: NUC12WSHi5 / ESXi-8.0b-21203435-standard (VMware, Inc.)
Jos says
I also do have a Intel NUC SWNUC12WSKi5000 with a Google Coral USB.
- On the NUC I run ESXi 8 U2
- On ESXi I run a Debian 12 VM server
- On Debian I run with docker
- On Docker I run Frigate
But for some reason I can't get it stable. Frigate found the Google Coral, everything was working, the day after the Google Coral was still visible inside the VM, but Frigate couldn't use it anymore. After restarting the VM the Google Device was lost. After rebooting ESXi the device was back to the old state.
I'm doubting what I can do to make a stable environment, because I would like to be able to trust Frigate to monitor my cameras.
I looked at:
- Buying a production version for M2, but I already used the SSD M2 slot for my harddisk inside the intel Nuc.
- Buying a powered external Thunderbolt dock, which I hopefully can connect to the VM without interference of ESXi. But not sure if this works and still I have to add a UPS to make sure the power will never be gone from the USB device
- Reinstalling the Intel NUC with Proxmox or just Debian with Docker. But not sure whether the Google USB Coral would then be free of the instability issues.
So what would you guys recommend?
Kris Crawford says
William, this post seems like a lot of steps to ultimately pass-through a USB port to a VM. I am not very familiar with ESXi hence my question.
I run Blue Iris and Home Assistant as virtual machines on Proxmox and am able to pass-through a USB port that each Google Coral is plugged into. This way, the virtual machine has full control over the Google Coral and can flash the firmware and still be able to access the Google Coral once the device id changes. Is there not an equivalent option in ESXi?
lamw says
Kris,
Not sure what you mean by "a lot of steps" ... the actual configuration change to passthrough is single step 🙂 The remainder steps outlined in the blog post is to make ESXi aware of this unique behavior of the Coral USB device where the DID/VID will change upon initial connection, this doesn't happen automatically. ESXi has been designed to run in Enterprise datacenters and while USB accessories are prevalent in the consumer space, its still not widely used in Datacenter and certainly ones where this type of behavior is seen normally and hence the extra steps. If you're able to maintain power to USB device, a simple reboot after the initial connection will yield same results which is what I've outlined above.
sp1910 says
So I tried a bunch of things on my NUC12WSHi5 and I just can't get it to work. I updated the bios, tried restarting esxi and also the reset. I added the quirk in the advanced options for the vm and also a usb 3.1 controller with the coral as a usb device.
If I restart esxi it can't find the coral anymore unless I unplug/plug it back in.
If I do the reset it doesn't change to "google" but stay at "global unichip corp"
I also tried to create a new ubuntu install and followed the "install the edge TPU runtime" guide and at the
"Run the image classifier with the bird photo (shown in figure 1):" step it errors out with:
"Traceback (most recent call last):
File "/home/frigate/pycoral/examples/classify_image.py", line 37, in
from pycoral.adapters import classify
ModuleNotFoundError: No module named 'pycoral.adapters'"
I feel like I have tried everything at this point 🙂
William Lam says
Just tried this on my NUC12WSKi7 using ESXi 8.0 Update 1 and it works, again the NUC is a stock unit.
My workflow is to confirm that I can talk to Coral device by reloading the module and confirm the device shows up properly (e.g. Google Inc.) on ESXi and then confirming on Ubuntu VM. Once I'm able to connect, then I'm able to reboot ESXi and Coral device retains the initialized state
Given that you were NOT able to actually run the example tells me you were NOT able to successfully flash the firmware on the device and hence you did NOT actually follow the directions to get the same outcome as described on the blog.
You should probably double check that you followed all instructions on Coral documentation (hint: See 2a and ensure you've installed the Coral python module). I'm also using Ubuntu 20.04, as there's a python dependency where newer versions may not work
Finally, I've had more than a few folks comment that the steps above work as described, so if there's issues its either user error and/or kit that you're using.
Sp1910 says
Yes, there's something going on with the python dependencies. Not sure how to deal with those. I won't bother you further as it is my fault.
sp1910 says
I finally got it to work.
Erik says
Awesome job William! I enjoyed reading this.
I have the m.2 variant and hope that the Google Coral team can correct the pci behaviour or find a way around the problem.
Again great work, looking forward to more articles from you.
mr-manuel says
Thank you very much! Now it's working also on my ESX 😀 the CPU usage dropped by 60%
# Host
Intel NUC10i7FNH
ESXi 7.0 U3
USB Coral
# VM
Debian 10
William Lam says
My pleasure and thanks for sharing your success story and setup with everyone else! 💪
Sender says
Hi William, do you or your colleague have further succes in running Frigate further? I can't get that going...
https://github.com/blakeblackshear/frigate/issues/6472#issuecomment-1549641090
William Lam says
See my response in that thread ... seems like you or others may not have properly setup all the requirements for Frigate and this has nothing to do with Coral device ...
Mike says
Thanks for this, Champ!
Struggled to get it to work at first (Ubuntu 22.04 on my Frigate VM) but created another VM with ubuntu 20.04 with an older version of python3. Used that VM to flash the coral and then moved the usb passthrough back to the 22.04 machine. That solved it for me atleast.
Vince says
William, thank you for this post, very very helpfull !
In my case when "Reboot ESXi" in step 5 the device is not visible, but "Reload USB Module" work fine.
When host is power off, the Coral TPU lost config, so I enable power USB permanently to solve this problem.
I share my configuration and process to make it work
Hardware : INTEL NUC RNUC12WSHI5
Bios Version : WSADL357.0087.2023.0306.1931
Bios config :
- Startup on power detect (tab Power -> secondary power settings -> enable "USB S4/S5 Power")
- E-core disabled, P-core enabled
ESXi : 8.0U1-21495797-standard
Coral USB Edge TPU (connected on USB 3 port of NUC)
Debian11 with docker container with Frigate
## Step 1 -> On VM configure Coral TPU
mkdir coral && cd coral
apt install build-essential autoconf
git clone https://github.com/google-coral/pycoral.git
cd pycoral
bash examples/install_requirements.sh classify_image.py
## Step 2 -> ON VM Init Edge TPU (after each power loss on Coral TPU)
cd coral/pycoral
python3 examples/classify_image.py --model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels test_data/inat_bird_labels.txt --input test_data/parrot.jpg
shutdown vm
## Step 3 -> ON ESX shell to detect Google inc. (reboot ESXi doesn't work, so I Reload USB Module)
/etc/init.d/usbarbitrator stop
vmkload_mod -u vmkusb;vmkload_mod vmkusb
kill -SIGHUP $(ps -C | grep vmkdevmgr | awk '{print $1}')
/etc/init.d/usbarbitrator start
## Step 4 -> reconfigure VM
Remove USB coral passthrough from VM
Save
wait 5 secs
Add USB coral passthrough to VM
wait 5 secs
start VM
Now lsusb on VM display "Google inc."
BIOS CONFIGURATION TESTS
1 - With bios option "USB S4/S5 Power" disabled
You can't reboot without restart config at step 2
You can't shutdown without restart config at step 2
2 - With bios option "USB S4/S5 Power" disabled
You can't reboot
You can't shutdown (without remove power)
Stephan says
Thank you William, a great writeup.
I managed to get it working on my NUC11PAHi7 with ESXi 8.0 U1a.
It was a heck of a job to get the firmware on it. I used Debian 12 and first tried webcoral. While using webcoral the 'make reset' failed on building dfu-util with libusb.h. I figured it used dfu-util to flash the firmware on the device.
So I extracted the apex_latest_single_ep.bin (firmware file) and installed dfu-util standalone.
With this command I flashed it successfully:
dfu-util -D apex_latest_single_ep.bin -d 1a6e:089a -R || true
Now I'm on to the Frigate NVR section!
AndreasR says
Thanks William, worked like a charm 🙂
/AndreasR
Michal Charvat says
Hi William,
Thanks for the tutorial; however, in my case it only works for a short time. When I had just one camera it worked well, but with 4 it goes down in a few minutes. I have upgraded ESXi to 703.0.0.11.3.0-5 because the described steps (and what I have found in other discussions) didn't work on 6.7. On 7.0 I am able to see the device and load it in a docker container with Frigate, but as I mentioned above it fails in a few minutes with this error:
[ 1328.081070] xhci_hcd 0000:0b:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
[ 1328.084066] xhci_hcd 0000:0b:00.0: Looking for event-dma 00000000aae85df0 trb-start 00000000aac1fe70 trb-end 0000000000000000 seg-start 00000000aac1f000 seg-end 00000000aac1fff0
[ 1328.084067] xhci_hcd 0000:0b:00.0: Looking for event-dma 00000000aae85df0 trb-start 00000000aae85000 trb-end 00000000aae85d80 seg-start 00000000aae85000 seg-end 00000000aae85ff0
[ 1335.218009] usb 2-1: reset SuperSpeed Gen 1 USB device number 7 using xhci_hcd
[ 1335.241446] usb 2-1: LPM exit latency is zeroed, disabling LPM.
I currently have a bad feeling that USB isn't stable enough to run in an HP Microserver G8 or G10+. If you had more luck on a NUC, that's great, but I do not want to add another toy when I can create a new VM with Frigate on an existing machine.
AndreasR says
Seems like i get the same problem.
After starting up Frigate, the Coral is discovered and everything works like it should.
However, after 5 minutes - 5 hours, the Coral hangs and stops responding.
Then i need to unplug/plug in the USB device again to get it working.
Have tried different USB ports, another USB-C cable.
Started with ESXi 7.0.3, then upgraded to ESXi 8.0.1, but experiencing the same problem.
Running on an HPE proliant server.
AndreasR says
I solved the issue by purchasing a PCI-e USB 3.1 controller and running PCI passthrough to the VM. However, one thing I noticed was that even with PCI passthrough, the Coral was restarting several times an hour. I tried replacing the included USB-C to USB-A cable with another one, but encountered the same behavior. However, after replacing it with a USB-C to USB-C cable, it started running stably.
Romain says
Hello AndreasR,
Can you confirm that it still works correctly today?
I have exactly the same problem as you on an HPe GEN8.
If so, would you have the card model to be sure? I already have a usb 3.0 card (USB-A not USB-C).
Thanks a lot
Best regards,
Tim says
Same problem for me as well, running esxi on a nuc11. I just flashed the bios and confirmed power settings as well as moved the device to a powered hub, but it still has come back. Using Frigate and having 4 cameras it takes a few hours before it crashes. Longer if I put less cameras through it. When it crashes I have to reset the coral device firmware again to get it back a simple reboot doesn't seem to fix it. Guest os is debian 12 running hass supervised, but was seeing similar on ubuntu via docker.
William Lam says
For folks able to consistently reproduce the issue, can you please generate vm-support immediately after when this happens? Please provide a link for support bundle and I can see if Songtao can take a look and see if anything stands out, but this might be constraint of device working …
Frank says
Hey William, thanks for the blog post. I followed it 1:1 on ESXi 6.7 -U3, on servergrade hardware.
I came to a similar end, the stick suddenly disappeared from the VM. To fix it I need to restart the host (restarting only the VM was not working) and run your mini tutorial again. I set up a Raspberry Pi (with Frigate and the Google Coral) to see if the behavior of the Coral is different. The Coral device is not getting that hot on bare metal (even though same .std settings). Further, on ESXi it is blinking all the time until it stops functioning and disappears from the VM. On my Pi it is only blinking when object detection is performed. So I assume the USB version gets loads of traffic or potentially wrong instructions when used on ESXi with Ubuntu 20.04, Python 3.8 and the latest Coral resources. However, I am willing to follow your proposal with the vm-support part. I just don't know how. Can you give me instructions?
William Lam says
Run “vm-support” on ESXi host via SSH to generate support bundle and the provide a download link to bundle . Ideally this is done AFTER Coral device stops working, so we have data when it was and after it stops working
Peter says
Any idea if this will work in 6.7?
William Lam says
Don’t know, try it … especially with 6.x being EOL’ed
Peter says
It works temporarily then get errors like
xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD
ep_index 2 comp_code 13
I then lose connectivity and have to restart esxi, repeat as the flashing. Does not retain Google device ID. May need to try USB to PCIe as mentioned by others. This is on HP elitedesk 800 G1. Thank you for making this post.
Peter says
Just wanted to add that I have had it working for a little while with usb 2. I am using it with frigate and the inference time is about the same as it was on the cpu, but it is still taking load off the CPU so this will work. Fingers crossed Google fixes the firmware in the future.
Martin Rask Thomsen says
Same as many others when using esxi (7 & 8)
ERROR Transfer event TRB DMA ptr not part of current TD
ep_index 2 comp_code 1
Using an intel nuc 13. No way to add a PCI-e usb hub 🙁
William Lam says
For folks able to consistently reproduce the issue, can you please generate vm-support immediately after when this happens? Please provide a link for support bundle and I can see if Songtao can take a look and see if anything stands out, but this might be constraint of device working …
rufftruffle says
@William Lam: I followed your instructions and everything worked as it should. But the tpu usb disappears from the esxi hardware after running a while (an hour tops) I have to go unplug and plug it back in and repeat the process. Any ideas what could this be?
I am running VMware ESXi 7.0.3 build-21930508 on a Dell Poweredge T630 server.
Should I buy a PCI-E 3.1 USB card and pass that directly to the vm? will that help with this?
Thanks
William Lam says
I’ve not seen this behavior for 8.x … I don’t have 7.x running, so can’t comment
Francesco says
Hi, I have the same issue (comment below) 🙁
Francesco says
Hi, really thanks for your guide! I followed all the steps and successfully run Frigate with the Coral on my Ubuntu 22 on esxi 7. Unfortunately I have a strange behavior: it works for an hour more or less (really well), and without doing nothing (tested 3 times) all the works is "deleted", the name of the usb on esxi is not Google anymore but the original "ID 1a6e:089a Global Unichip Corp." and obviously the vm don't read it. (to retry all the steps it I need to unplug and plug the coral because also the esxi don't see it from the gui but only frm the cli and then attach it to the vm)
Do you know how to fix it or if can be a problem of the coral so I need to sent it back and try a new one? Thanks
Francesco says
OMG I read only now the comment above me! It's the same!
William Lam says
For folks able to consistently reproduce the issue, can you please generate vm-support immediately after when this happens? Please provide a link for support bundle and I can see if Songtao can take a look and see if anything stands out, but this might be constraint of device working …
ESXi User 92375 says
same issue many others describe (not Williams fault or issue- but just a FYI) -
im on a supermicro x11 motherboard based server - esxi 6.7 (latest build)
thanks to William's article I can get everything working and pass the coral through to frigate, it works great for anywhere from 10 minutes to an hour or 2 - then stops working (it disappears from lsusb at guest level and at esxi host level, or sometimes will just revert back to "Global Unichip Corp") -> then the only fix is to physically unplug and replug the coral TPU and then do the ridiculous process over again.
ridiculous in the sense that Google would not have non volatile firmware on these devices. It's almost as if they crippled the device on purpose, so that it could only be used for short testing.
I'm wondering if it's a power draw issue perhaps? (not likely though as the device specs show its pretty low power draw, esp. when not using the -std and not -max firmware)
Im now going to try the pcie -> miniPcie adapter route (with a m-pcie coral tpu).
another GREAT article and info by william, HE IS THE MASTER OF ESXi / vSphere!! (thank you!)
Francesco says
I'm Francesco of the comment before you. Today I updated my esxi to version 8u2 and the issue is the same (I tried with a new VM to be sure).
So I'm almost sure is not an esxi version issue.
I suspect the same: it could be a power draw issue. I agree with you that if it is that, it's crap; how is it possible to think up a thing like that? Even if you fix the USB to maintain max power, the server will reboot sooner or later and it's not possible to repeat all those steps every time.
William Lam says
For folks able to consistently reproduce the issue, can you please generate vm-support immediately after when this happens? Please provide a link for support bundle and I can see if Songtao can take a look and see if anything stands out, but this might be constraint of device working …
FrigateESXi says
I'm having similar issues to everyone else here, and have extensively debugged and tried to resolve this. The *ONLY* stable solution I have found is to use a USB-C to USB-A cable that does not support speeds above 480 Mbit/s (i.e. USB 2.0), thus artificially limiting the speed of the Coral. Using that, I've had Frigate stable for weeks (container uptime), with an NVIDIA P1000 GPU passed through plus 15x 1080p cameras at ~5-8 FPS each. (Note: FPS / IO on the USB Coral is only consumed when motion / activity passes a certain threshold on each camera, so it's not exactly a 15x 5 FPS load on the Coral.)
I'm only using server-grade Supermicro hardware in these tests (i.e. an X9 dual-CPU board or an X11 dual-CPU based server).
examples:
NOT stable:
lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 10000M
|__ Port 3: Dev 8, If 0, Class=Vendor Specific Class, Driver=usbfs, 5000M
|__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/4p, 10000M
|__ Port 3: Dev 5, If 0, Class=Vendor Specific Class, Driver=, 5000M
IS STABLE (although limited, but VERY usable, coral -> Frigate performance):
# lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 480M
|__ Port 2: Dev 3, If 0, Class=Vendor Specific Class, Driver=usbfs, 480M
I have 3x USB Corals + 1x PCIe Coral (not for use at the same time, just for testing different setups).
(You must also be using a PCIe USB 3.x card passed through to the VM, as in the entire PCIe card via PCI passthrough.) With this setup my inference speed is around 30 ms (vs. 10-12 ms with the unstable 5 Gbit USB link), and the Coral is able to process about 30 FPS max (as opposed to the nearly 70 to 100 FPS I get when it's linked at USB 5 Gbit, which however is only stable for anywhere from 5 minutes to an hour).
I have tried modifying the Frigate Docker image to use the Coral -std firmware (i.e. standard, not max firmware). The issue persists, with the only difference being that the power draw is reduced and inference speed decreased (measured with a multimeter on USB power or a current meter).
(In my experience, although a lot of these issues *seem* USB power related, and yes the Coral can be power hungry at up to 1 A, the issues are in fact not USB power related, assuming you are using a PCIe USB card capable of > 1 A @ 5 V per port, which all of my test cards are.)
I have tried 3x different, new USB 3.x PCIe cards, as well as a few high-quality powered USB 3 hubs. The only stable result I have found is what I describe above (and that is without a USB hub).
The way to confirm the USB -> coral linked speed is:
lsusb -t
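For anyone who wants to check the negotiated link speed from a script instead of reading the `lsusb -t` tree by eye, here is a minimal Python sketch. It assumes a Linux guest that exposes the standard sysfs attributes `idVendor`, `idProduct` and `speed` under `/sys/bus/usb/devices`. The function name `coral_link_speeds` is made up for illustration, and the two ID pairs are the ones quoted in the article (1a6e:089a and 18d1:9302).

```python
from pathlib import Path

# USB IDs quoted in the article for the Coral (Global Unichip Corp. / Google Inc.).
CORAL_IDS = {("1a6e", "089a"), ("18d1", "9302")}


def coral_link_speeds(sysfs_root="/sys/bus/usb/devices"):
    """Return (device_name, "vid:pid", speed_in_mbit) for each Coral found.

    Hypothetical helper: it only reads sysfs attributes and changes nothing.
    """
    results = []
    for dev in sorted(Path(sysfs_root).glob("*")):
        try:
            vid = (dev / "idVendor").read_text().strip().lower()
            pid = (dev / "idProduct").read_text().strip().lower()
            speed = (dev / "speed").read_text().strip()
        except OSError:
            # Interface entries and similar nodes lack these attributes.
            continue
        if (vid, pid) in CORAL_IDS:
            results.append((dev.name, f"{vid}:{pid}", float(speed)))
    return results


if __name__ == "__main__":
    for name, ids, mbit in coral_link_speeds():
        print(f"{name}  {ids}  {mbit:g} Mbit/s")
```

A reported speed of 480 corresponds to the USB 2.0 link described above as stable, while 5000 corresponds to the 5 Gbit link that showed the dropouts.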
The PCIe version of the Coral is stable, but *not* with any version of ESXi (I have only had the PCIe Coral working / stable on Frigate when using bare-metal Ubuntu + Docker).
Usual issues when using 5 Gbit USB -> Coral:
xhci_hcd 0000:13:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
xhci_hcd 0000:13:00.0: Looking for event-dma 000000000144a060 trb-start 000000000144bfe0 trb-end 000000000144bfe0 seg-start 000000000144b000 seg-end 000000000144bff0
or;
usb 2-1.3: reset SuperSpeed USB device number 4 using xhci_hcd
usb 2-1.3: LPM exit latency is zeroed, disabling LPM.
or Frigate will keep restarting with TPU-related errors logged.
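To see how often these symptoms occur over time, the kernel log can be scanned for the messages quoted above. The following is a rough Python sketch, not a diagnostic tool: the regular expressions are derived only from the log lines in this comment, and the function name `count_usb_symptoms` and the category labels are invented for illustration.

```python
import re

# Patterns derived from the kernel messages quoted above; the labels are arbitrary.
PATTERNS = {
    "trb_dma_error": re.compile(
        r"xhci_hcd \S+ ERROR Transfer event TRB DMA ptr not part of current TD"
    ),
    "superspeed_reset": re.compile(
        r"usb \S+ reset SuperSpeed USB device number \d+ using xhci_hcd"
    ),
    "lpm_disabled": re.compile(
        r"usb \S+ LPM exit latency is zeroed, disabling LPM"
    ),
}


def count_usb_symptoms(lines):
    """Count how many log lines match each known symptom pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

One way to use it would be to save the guest's `dmesg` output to a file and pass the open file object (an iterable of lines) to `count_usb_symptoms`.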
I really think the heart of these issues is the Coral device / Google. I have pretty extensive experience with ESXi and can't recall any other major PCIe passthrough issues I've dealt with (many of my PCIe passthrough VMs have been running stable for several years, moving much more PCIe bandwidth / IO than the Coral requires here).
I would love to hear if anyone else has had success with Frigate + Coral + ESXi (or even Proxmox).
My next step is to start troubleshooting on bare metal. (All of the above testing has been with ESXi 6.7 u2 or ESXi 8u2, and Ubuntu 20 LTS or Ubuntu 22 LTS with Docker running Frigate.)
Kev says
I’ve experienced the exact same thing, and I’ve passed through USB cards, etc. The only solution is to use USB 2.0 speeds to connect to the Coral. I’m using Debian with a Docker container. I’m wondering if Debian is the culprit for these USB issues.
Jos says
What should the "usb.quirks.device0" setting be when using multiple USB devices? Because I have also connected a Zigbee and ZWave stick to the same VM.
I did get it to work, but Frigate suddenly loses the Google Coral USB after some time. No idea why: the Docker host still has the Google device, but when trying the example Python program, it's not working there anymore either.
What could this be?
Jos says
I did get it working, but I can't find a way to keep it stable. I have an Intel NUC 12 with a Google Coral USB. I'm running Debian 12 inside ESXi 8.0 Update 2.
When connecting the USB I get this inside ESXi:
# Bus 001 Device 004: ID 1a6e:089a Global Unichip Corp.
When connecting the USB from ESXi to the VM, it's connected as device 3:
# Bus 001 Device 004: ID 1a6e:089a Global Unichip Corp.
When starting Frigate it doesn't recognize it, restarting Frigate doesn't help.
But when I then reboot ESXi the lsusb shows:
# Bus 002 Device 006: ID 18d1:9302 Google Inc.
Then the USB isn't connected anymore to the VM, so I have to connect it to the VM again. Then inside the VM I also get
# Bus 002 Device 006: ID 18d1:9302 Google Inc.
Now I sometimes get it working by restarting Frigate, sometimes after restarting the VM, and sometimes the entire USB device is lost again after restarting both Frigate and the VM.
So I'm not sure how to get this solution stable. Do I need to remove ESXi and install Docker directly on bare metal? Or am I forgetting something?
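The two IDs in the output above can be told apart in a script, which may help when logging how often the device flips between them. Below is a small Python sketch that classifies plain `lsusb` lines. It assumes, as the article describes, that 1a6e:089a is what the Coral reports before it has been initialized and 18d1:9302 is what it reports afterwards. The function name `coral_states` and the state labels are invented for illustration.

```python
import re

# Assumed meaning of the two IDs, based on the article: the Coral shows up as
# Global Unichip Corp. before initialization and as Google Inc. afterwards.
CORAL_STATES = {
    "1a6e:089a": "not initialized (Global Unichip Corp.)",
    "18d1:9302": "initialized (Google Inc.)",
}

LSUSB_LINE = re.compile(
    r"Bus (\d+) Device (\d+): ID ([0-9a-fA-F]{4}:[0-9a-fA-F]{4})"
)


def coral_states(lsusb_output):
    """Return (bus, device, state) for every Coral line in `lsusb` output."""
    found = []
    for line in lsusb_output.splitlines():
        match = LSUSB_LINE.search(line)
        if match and match.group(3).lower() in CORAL_STATES:
            state = CORAL_STATES[match.group(3).lower()]
            found.append((match.group(1), match.group(2), state))
    return found
```

Feeding it the text captured from `lsusb` in the guest at regular intervals would show when the device drops back to the Global Unichip ID or disappears entirely (an empty list).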
megapearl says
I'm having the same stability problems as all the comments above when it is connected to the USB3 port of the mainboard, but when connected to any of the USB2 ports it is running stable.
I'm running a Supermicro X10DRH-LN4 Mainboard with VMware ESXi 6.7.0-20497097
Kev says
Seems like either the coral device has stability issues with usb 3 or the vm does. Anyone notice Debian and Ubuntu have more problems?
Jos says
I removed the ESXi layer; I was already doing everything with Docker. I just received notification of this article saying that ESXi is going to stop, if I read it correctly: https://www.servethehome.com/broadcom-vmware-ends-free-vmware-vsphere-hypervisor-closing-an-era/
Marco says
Hi @William Lam:
Thank you very much for the time to write this Blogpost! Really helpful.
I'm also running ESXi on a NUC 13. The steps mentioned work very well and I can use the Google Coral USB device within my Home Assistant VM for Frigate.
BUT...
1.) Sometimes it seems that the device resets itself to the "default firmware". (I don't know why; maybe a bug in the USB device itself.) When this happens, I need to reinitialize it. Does anyone else have the same issue? My device runs 24/7 processing camera streams. Maybe it's not made for that kind of "productive usage"?
2.) If I reload the complete ESXi USB modules using "/etc/init.d/usbarbitrator stop && vmkload_mod -u vmkusb;vmkload_mod vmkusb && kill -SIGHUP $(ps -C | grep vmkdevmgr | awk '{print $1}') && /etc/init.d/usbarbitrator start"
ALL attached USB devices in ESXi will get "reconnected", right? This happens with a delay of a few seconds, but it leads to other issues in my case: I also have a ZigBee "SkyConnect" stick attached to the Home Assistant VM. During those few seconds, Home Assistant needs to reinitialize the whole ZigBee network, which takes somewhat longer, and none of the ZigBee devices work during that time frame.
So is there another way of "reloading" a specific USB device (the Coral in this case)? Or could I even reload only the necessary USB port instead? Maybe if I use one of the front USB ports of the NUC 13 instead of the backplane ports?
THANK YOU VERY MUCH!
best regards from Germany.