I have to say, this is one of the more interesting challenges that I have come across in quite some time. A user was looking for assistance after they accidentally disabled the vmkusb module, which is the USB driver for ESXi and allows it to communicate with USB devices that are connected to the system.
The vmkusb module also plays a very critical role if you have ESXi installed on a USB device, as the driver is required for proper functionality such as being able to save the ESXi state and configurations to the USB device. So what happens when you disable the vmkusb module and you reboot the ESXi host, which is also installed on a USB device?
Well, everything continues to work, including VMs, since ESXi by design runs in memory after the initial boot from the USB device. However, any configuration changes made after that are lost after a system reboot, including the attempt to re-enable the vmkusb module, since ESXi is unable to save any of the settings to the USB device. Fortunately, I was able to help the user out as I had a few ideas on how we could fully recover from this type of scenario, hence this blog post.
Hopefully a lesson can be learned here: do not make changes or disable things that you are not familiar with 🙂
The high-level recovery summary is to restore connectivity to the USB device so that we can repair ESXi without having to pull out the USB key, which is especially helpful if your system is in a remote location. We need access to the USB device because we are going to replace the old state.tgz file within the active ESXi bootbank with an updated version that has the vmkusb module re-enabled. This procedure has been validated by both the user and myself, as I was able to reproduce this locally on my Intel NUC using both ESXi 7.x and 8.x.
Step 1 - Do not panic. Boot ESXi normally; it will start up and run from ramdisk, as designed. If you have VMs that start automatically, you can leave them running or shut them down, as the procedure below has no impact on running workloads. You will need to access the ESXi host via SSH since any local USB keyboards connected to the host will not function without the vmkusb module enabled.
Step 2 - Re-enable the vmkusb module and then restart the ESXi device manager process, which causes ESXi to re-attempt claiming the USB devices:
esxcli system module set -m vmkusb -e true
kill -SIGHUP $(ps -C | grep vmkdevmgr | awk '{print $1}')
At this point, you should be able to list any USB-attached devices, including the device that has ESXi installed, by running the lsusb command as shown in the screenshot below. We should also be able to see the two ESXi bootbanks under /vmfs/volumes.
Step 3 - Since we have re-enabled the vmkusb module, you can run esxcli system module list | grep vmkusb to confirm. We now just need to generate a new ESXi state.tgz file and we can do so by running the following command:
/sbin/auto-backup
As you can see from the screenshot, the new state.tgz file is stored in the /bootbank directory.
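If you want the confirmation from Step 3 to be a scripted check rather than something you read by eye, here is a minimal sketch. The helper name `module_enabled` is hypothetical, and it assumes the output of `esxcli system module list` has the module name in the first column and the "Is Enabled" value in the third column, so verify that against your own host before relying on it.

```shell
# Hypothetical helper: read `esxcli system module list` output on stdin and
# succeed only if the named module's third column ("Is Enabled") is "true".
# Assumption: columns are Name, Is Loaded, Is Enabled, separated by whitespace.
module_enabled() {
    awk -v name="$1" '
        $1 == name { found = ($3 == "true") }
        END { exit (found ? 0 : 1) }
    '
}

# Example on the ESXi host (not executed here):
#   esxcli system module list | module_enabled vmkusb && echo "vmkusb is enabled"
```

The function only inspects text, so it makes no changes to the host and returns a non-zero status if the module is missing from the list or not enabled.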
Step 4 - We now need to replace the old state.tgz with the newly created one that contains the re-enablement of the vmkusb module. Before we can do this, we need to identify the active bootbank, since ESXi uses a dual-bootbank architecture, which allows users to easily revert to a previous patch or ESXi version after an upgrade.
To do so, simply cat out both /vmfs/volumes/BOOTBANK1/boot.cfg and /vmfs/volumes/BOOTBANK2/boot.cfg and see which one has the higher value for the updated property, which indicates the current active bootbank. In my example, bootbank 1 was the active one, but it may be different for your deployment, so please confirm before proceeding.
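The comparison of the two updated values can also be scripted. The following is a rough sketch rather than an official tool: the function names `get_updated` and `active_bootbank` are made up for illustration, and it assumes each boot.cfg contains a line of the form `updated=<number>`.

```shell
# Hypothetical helper: print the "updated" counter from a boot.cfg file,
# or 0 if the file or the property is missing.
get_updated() {
    value=$(sed -n 's/^updated=\([0-9][0-9]*\).*$/\1/p' "$1" 2>/dev/null | head -n 1)
    echo "${value:-0}"
}

# Hypothetical helper: print whichever of two bootbank directories has the
# higher "updated" value. On a tie, print a warning and return non-zero so
# that nothing gets overwritten based on a guess.
active_bootbank() {
    a=$(get_updated "$1/boot.cfg")
    b=$(get_updated "$2/boot.cfg")
    if [ "$a" -gt "$b" ]; then
        echo "$1"
    elif [ "$b" -gt "$a" ]; then
        echo "$2"
    else
        echo "updated values are equal ($a), check manually" >&2
        return 1
    fi
}

# Example on the ESXi host (not executed here):
#   active_bootbank /vmfs/volumes/BOOTBANK1 /vmfs/volumes/BOOTBANK2
```

I would still cat both files and look at the values myself before copying anything, since the script only automates the same comparison.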
Now, copy the new state.tgz into the desired bootbank and confirm the overwrite by using the following command:
cp /bootbank/state.tgz /vmfs/volumes/BOOTBANKX/
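Since the whole recovery hinges on this one file landing in the right place, it can be worth confirming that the copy is byte-for-byte identical to the source. Below is a small sketch of a copy-and-verify wrapper. The function name `copy_state` is hypothetical, and it assumes the `cmp` utility is available in your shell, which I have not confirmed for every ESXi release. If it is not, comparing checksums of the two files by eye achieves the same thing.

```shell
# Hypothetical helper: copy state.tgz from a source directory into a
# destination directory, then confirm the two files have identical contents.
# Returns non-zero if the source is missing, the copy fails, or the bytes differ.
copy_state() {
    src="$1/state.tgz"
    dst="$2/state.tgz"
    if [ ! -f "$src" ]; then
        echo "missing $src" >&2
        return 1
    fi
    cp "$src" "$dst" || return 1
    # cmp -s is silent and exits 0 only when the files are identical
    cmp -s "$src" "$dst"
}

# Example on the ESXi host, with BOOTBANKX replaced by your active bootbank
# (not executed here):
#   copy_state /bootbank /vmfs/volumes/BOOTBANKX && echo "state.tgz replaced"
```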
Step 5 - Lastly, you just need to reboot the ESXi host and all USB functionality will be completely restored!