I am super excited to finally share what I think is a really cool ESXi-Arm solution, which has been an evolution of this and this. This solution also incorporates a number of automation techniques I have shared over the years when it comes to ESXi scripted installation, aka Kickstart, so it was really neat to see all those things get pulled into a single solution. Lastly, I also want to give huge thanks to Cyprien Laplace, who threw the initial challenge my way after I had shared how to perform an ESXi-Arm scripted installation without using an SD Card.
ESXi-x86 can be deployed using either a stateful or stateless installation. In the latter case, ESXi is booted over the network using the vSphere Auto Deploy feature in vCenter Server, which does not require any local media for ESXi. Once the host attaches itself to vCenter Server, Auto Deploy leverages vSphere Host Profiles and its rules engine to determine which configurations or profiles should be applied to ensure the ESXi hosts are configured per their desired state. Here is a quick video overview of how Auto Deploy and Host Profiles work.
Fundamentally, vSphere Auto Deploy and Host Profiles can also work with ESXi-Arm but today, vCenter Server would require some code modification for this to actually work.
OK, so am I teasing you with something that does not exist? Nope, but I just wanted to help set the context 🙂
The solution that I have created boots ESXi-Arm over the network in a "stateless" manner, so there is no need for an SD Card or USB device plugged into the Raspberry Pi (rPI). In addition to the ESXi-Arm files, it also includes a custom payload which runs to retrieve additional configurations, which can automatically join the host to a desired vCenter Server as well as apply further customizations to an ESXi-Arm host. As you can see, this solution behaves similarly to vSphere Auto Deploy and Host Profiles but does not use either of these vSphere features and works with the ESXi-Arm Fling right now.
Technically speaking, these techniques can also be applied to ESXi-x86 but I will leave that to the reader for further exploration.
Here is a quick video demonstrating my ESXi-Arm Stateless solution booting one of my Raspberry Pi 4 systems:
Below are the instructions on how to set this up and although they are a bit lengthy, it is well worth the effort!
Step 1 - Download and install the Raspberry Pi Imager Tool for your desktop OS. This is needed as we need to install Raspberry Pi (rPI) OS onto our SD Card so that we can change the default rPI boot order to 0xf241 which attempts booting using the following order (right to left): SD Card, USB and then Network Boot. If no bootable devices are found, the sequence is repeated all over again. This order would allow us to perform our stateless boot and for those that prefer to have a stateful installation on USB, you can install ESXi-Arm via Kickstart as outlined in my previous article.
Step 2 - Power up the rPI with the SD Card plugged in, which has the rPI OS image. Once rPI OS boots up, open up a terminal and run the following commands to apply the latest EEPROM and update the boot order, which can only be changed from the command line.
PI_EEPROM_VERSION=pieeprom-2020-09-03
wget https://github.com/raspberrypi/rpi-eeprom/raw/master/firmware/beta/${PI_EEPROM_VERSION}.bin
sudo rpi-eeprom-config ${PI_EEPROM_VERSION}.bin > bootconf.txt
sed -i 's/BOOT_ORDER=.*/BOOT_ORDER=0xf241/g' bootconf.txt
sudo rpi-eeprom-config --out ${PI_EEPROM_VERSION}-netboot.bin --config bootconf.txt ${PI_EEPROM_VERSION}.bin
sudo rpi-eeprom-update -d -f ./${PI_EEPROM_VERSION}-netboot.bin
Reboot for the changes to go into effect. At this point, you can now shut down the rPI and remove the SD Card from the system.
Step 3 - Follow Steps 1-7 from this blog post in setting up dnsmasq for our PXE infrastructure. I really like dnsmasq as it integrates with your existing DHCP environment and is fairly easy to set up. From here on out, I will refer to this system as our PXE Server and in my example, the IP Address of this system is 192.168.30.176.
To perform a stateless boot of ESXi-Arm, you just need to remove all the default boot options in the kernelopt line in the ESXi-Arm efi/boot/boot.cfg configuration file. To make our solution a bit more dynamic, we are going to leverage a few custom kernel boot options which we will define and which will get passed into our custom script. The three options are:
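To make the right-to-left reading of BOOT_ORDER concrete, here is a minimal sketch that decodes a value one hex digit at a time. The function name decode_boot_order is made up for illustration, and it only knows the four digits used in this article (1, 2, 4 and f); anything else is printed as "other".

```shell
#!/bin/sh
# Hypothetical helper: decode a BOOT_ORDER value, reading digits right to left.
decode_boot_order() {
    value=${1#0x}                      # drop the 0x prefix
    out=""
    while [ -n "$value" ]; do
        digit=${value#"${value%?}"}    # last remaining hex digit
        value=${value%?}               # remove it from the value
        case "$digit" in
            1)   mode="SD Card" ;;
            2)   mode="Network" ;;
            4)   mode="USB" ;;
            f|F) mode="Restart" ;;
            *)   mode="other($digit)" ;;
        esac
        out="${out:+$out -> }$mode"
    done
    printf '%s\n' "$out"
}

decode_boot_order 0xf241
```

Running it against 0xf241 should list SD Card first and Network last, followed by the restart of the sequence.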
- configServer - IP Address of your PXE Server which also runs the web server hosting the configuration files for customization
- joinVC - Specifies whether to automatically join ESXi-Arm host to vCenter Server
- runExtraConfig - Specifies whether to apply additional post-deployment configurations
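How the payload reads these options on the host is not shown here, but the parsing idea can be sketched on its own. The example below assumes the options arrive as one space-separated string of key=value pairs, exactly as they appear on the kernelopt line; get_boot_option is a hypothetical helper name, not part of extra.tgz.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key=value option from a
# kernelopt-style string; returns non-zero if the key is not present.
get_boot_option() {
    key=$1
    shift
    for token in $*; do                # intentional word splitting on spaces
        case "$token" in
            "$key"=*)
                printf '%s\n' "${token#*=}"
                return 0
                ;;
        esac
    done
    return 1
}

KERNELOPT="configServer=192.168.30.176 joinVC=true runExtraConfig=true"
CONFIG_SERVER=$(get_boot_option configServer "$KERNELOPT")
JOIN_VC=$(get_boot_option joinVC "$KERNELOPT")
RUN_EXTRA=$(get_boot_option runExtraConfig "$KERNELOPT")
echo "configServer=$CONFIG_SERVER joinVC=$JOIN_VC runExtraConfig=$RUN_EXTRA"
```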
Note: When ESXi-Arm boots up in stateless mode, the default root password is empty. This is also reflected when ESXi-Arm host is added to vCenter Server. It should be possible to change the password as part of the post-deployment configuration but the default behavior is to have an empty password.
Step 4 - Replace the kernelopt line in /srv/tftpboot/esxi-arm/efi/boot/boot.cfg with the example below, where the IP Address is that of the PXE Server hosting our configuration files. We also need to append our custom payload, extra.tgz, to the end of the modules line, as this is what actually does all the magic.
bootstate=0
title=Loading ESXi installer
timeout=5
prefix=esxi-arm
kernel=b.b00
kernelopt=configServer=192.168.30.176 joinVC=true runExtraConfig=true
modules=jumpstrt.gz --- useropts.gz --- features.gz --- k.b00 --- procfs.b00 --- vmx.v00 --- vim.v00 --- tpm.v00 --- sb.v00 --- s.v00 --- ena.v00 --- bnxtnet.v00 --- bnxtroce.v00 --- brcmfcoe.v00 --- brcmnvme.v00 --- elxiscsi.v00 --- elxnet.v00 --- i40en.v00 --- i40iwn.v00 --- iavmd.v00 --- igbn.v00 --- iser.v00 --- ixgben.v00 --- lpfc.v00 --- lpnic.v00 --- lsi_mr3.v00 --- lsi_msgp.v00 --- lsi_msgp.v01 --- lsi_msgp.v02 --- mtip32xx.v00 --- ne1000.v00 --- nenic.v00 --- nfnic.v00 --- nhpsa.v00 --- nmlx4_co.v00 --- nmlx4_en.v00 --- nmlx4_rd.v00 --- nmlx5_co.v00 --- nmlx5_rd.v00 --- ntg3.v00 --- nvme_pci.v00 --- nvmerdma.v00 --- nvmxnet3.v00 --- nvmxnet3.v01 --- pvscsi.v00 --- qcnic.v00 --- qedentv.v00 --- qedrntv.v00 --- qfle3.v00 --- qfle3f.v00 --- qfle3i.v00 --- qflge.v00 --- rste.v00 --- sfvmk.v00 --- smartpqi.v00 --- vmkata.v00 --- vmkfcoe.v00 --- vmkusb.v00 --- vmw_ahci.v00 --- elx_esx_.v00 --- btldr.v00 --- esx_dvfi.v00 --- esx_ui.v00 --- esxupdt.v00 --- tpmesxup.v00 --- weaselin.v00 --- loadesx.v00 --- lsuv2_hp.v00 --- lsuv2_in.v00 --- lsuv2_ls.v00 --- lsuv2_nv.v00 --- lsuv2_oe.v00 --- lsuv2_oe.v01 --- lsuv2_oe.v02 --- lsuv2_sm.v00 --- native_m.v00 --- qlnative.v00 --- vmware_e.v00 --- vsan.v00 --- vsanheal.v00 --- vsanmgmt.v00 --- tools.t00 --- imgdb.tgz --- imgpayld.tgz --- extra.tgz
build=7.0.0-1.0.40886095
updated=0
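If you would rather script this edit than make it by hand, the following is a rough sketch of one way to do it. The function name patch_boot_cfg is hypothetical, and the sample file it runs against is a shortened stand-in with illustrative values, not a real boot.cfg. The modules edit is written so that running it twice does not append extra.tgz a second time.

```shell
#!/bin/sh
# Hypothetical helper: rewrite kernelopt and append extra.tgz to modules.
# $1 = path to boot.cfg, $2 = IP Address of the PXE Server
patch_boot_cfg() {
    cfg=$1
    server=$2
    tmp="$cfg.tmp"
    sed -e "s|^kernelopt=.*|kernelopt=configServer=$server joinVC=true runExtraConfig=true|" \
        -e '/^modules=.* --- extra\.tgz$/!s|^\(modules=.*\)$|\1 --- extra.tgz|' \
        "$cfg" > "$tmp" && mv "$tmp" "$cfg"
}

# Shortened stand-in for a real boot.cfg (values are illustrative only)
cfg=$(mktemp)
printf '%s\n' 'kernel=b.b00' 'kernelopt=runweasel' \
    'modules=k.b00 --- imgpayld.tgz' > "$cfg"
patch_boot_cfg "$cfg" 192.168.30.176
cat "$cfg"
rm -f "$cfg"
```

On the PXE Server you would point it at /srv/tftpboot/esxi-arm/efi/boot/boot.cfg instead of the temporary file, ideally after taking a backup copy.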
Step 5 - Download (or create) the extra.tgz to your PXE Server and copy it to the /srv/tftpboot/esxi-arm directory.
Step 6 - Create /var/www/html/esxi-arm-config.json file which contains the vCenter Server configuration for the ESXi-Arm host to automatically join along with the matching NTP server as this is required. For security purposes, you should consider creating a non-administrator account which only has permissions to add ESXi-Arm hosts to a specific vSphere Cluster. If you do not want your ESXi-Arm host to automatically be joined to vCenter Server, simply set the joinVC boot option to false.
{
  "vcenter_server": "192.168.30.200",
  "vcenter_user": "*protected email*",
  "vcenter_pass": "VMware1!",
  "vcenter_datacenter": "Arm-Datacenter",
  "vcenter_cluster": "Arm-Cluster",
  "ntp_server": "pool.ntp.org"
}
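A missing key in this file is an easy mistake to make, so a quick sanity check before booting can save a power cycle. The sketch below only checks that each of the six key names shown above appears in the file; it is not a JSON parser and does not validate the values. The function name check_config_keys and the sample user and password are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: verify that all expected keys appear in the config file.
# Returns 0 when every key is present, 1 otherwise.
check_config_keys() {
    file=$1
    missing=0
    for key in vcenter_server vcenter_user vcenter_pass \
               vcenter_datacenter vcenter_cluster ntp_server; do
        if ! grep -q "\"$key\"[[:space:]]*:" "$file"; then
            echo "missing key: $key" >&2
            missing=1
        fi
    done
    return $missing
}

# Sample file with placeholder credentials
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
  "vcenter_server": "192.168.30.200",
  "vcenter_user": "placeholder-user",
  "vcenter_pass": "placeholder-pass",
  "vcenter_datacenter": "Arm-Datacenter",
  "vcenter_cluster": "Arm-Cluster",
  "ntp_server": "pool.ntp.org"
}
EOF
check_config_keys "$cfg" && echo "config looks complete"
rm -f "$cfg"
```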
Step 7 - Create the /var/www/html/esxi-arm-extra-config.sh file and set it to be executable. This is basically a shell script that contains ESXi-Arm shell commands that would be executed for additional host configurations. If you do not have additional configurations, you can disable this by simply setting the runExtraConfig boot option to false.
Below is a very basic example which simply suppresses the warnings found on the vSphere UI. For shared storage such as configuring NFS/iSCSI, it is recommended that you place those settings here so that all ESXi-Arm hosts will have the same configurations.
#!/bin/sh
# Suppress UI Warnings
esxcli system settings advanced set -o /UserVars/SuppressShellWarning -i 1
esxcli system settings advanced set -o /UserVars/SuppressCoredumpWarning -i 1
Step 8 - Download the latest official Raspberry Pi Firmware and extract the contents to your local desktop; you should have a folder called firmware-master. This corresponds to the microcode necessary to initialize the Raspberry Pi. Then download the latest community Raspberry Pi 4 UEFI firmware and extract the contents to your desktop; you should have a folder called RPi4_UEFI_Firmware_v1.20. This is the firmware necessary to boot ESXi-Arm.
Step 9 - Delete all files starting with kernel*.img within firmware-master/boot directory and then copy the entire content of the boot directory into a new folder called uefi
rm ~/Desktop/firmware-master/boot/kernel*.img
mkdir -p uefi
cp -rf ~/Desktop/firmware-master/boot/* uefi
Step 10 - Copy all files within the RPi4_UEFI_Firmware_v1.20 directory into the same uefi directory
cp -rf ~/Desktop/RPi4_UEFI_Firmware_v1.20/* uefi
Note: For 4GB Pi 4 only, edit the config.txt file in the uefi directory and append gpu_mem=16.
Step 11 - Zip up the contents of the uefi folder, not the folder itself, and name the archive uefi.zip. On a Mac, this can be done by changing into the folder and running the following command:
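If you are scripting the firmware preparation, the gpu_mem change can be made repeatable. This is a small sketch, with set_gpu_mem as a made-up function name: it replaces an existing gpu_mem line if there is one and appends the setting otherwise. The sample file contents are illustrative and not taken from the real UEFI config.txt.

```shell
#!/bin/sh
# Hypothetical helper: make sure config.txt contains exactly gpu_mem=16.
set_gpu_mem() {
    cfg=$1
    if grep -q '^gpu_mem=' "$cfg"; then
        # A gpu_mem line already exists, so overwrite its value
        sed 's/^gpu_mem=.*/gpu_mem=16/' "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
    else
        # Add a newline first if the file does not end with one
        [ -n "$(tail -c 1 "$cfg")" ] && echo >> "$cfg"
        echo 'gpu_mem=16' >> "$cfg"
    fi
}

# Illustrative sample file (note: no trailing newline)
cfg=$(mktemp)
printf 'arm_64bit=1' > "$cfg"
set_gpu_mem "$cfg"
cat "$cfg"
rm -f "$cfg"
```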
zip -r ../uefi.zip *
Step 12 - SCP the uefi.zip file to our PXE Server and place it under /srv/tftpboot
Step 13 - Run the following command to create our UEFI directory and unzip the contents of the uefi.zip file
mkdir -p /srv/tftpboot/rpi-uefi-1.20/
cd /srv/tftpboot/rpi-uefi-1.20/
unzip ../uefi.zip
Step 14 - We need to obtain the serial number of our rPI as it expects the UEFI files to be placed in a directory with that ID. You can easily do this by just powering on the rPI and the serial will be displayed under the board: line as shown in the screenshot. In my example below, it is 49a6ff15.
Log in to the PXE Server and create a symlink from our rPI serial to our UEFI files, which are stored in /srv/tftpboot/rpi-uefi-1.20/, by running the following command:
ln -s /srv/tftpboot/rpi-uefi-1.20/ /srv/tftpboot/49a6ff15
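For those with more than one rPI, the same symlink can be created in a loop. This is only a sketch: link_serials is a hypothetical helper, the second serial is made up, and the demo runs against a temporary directory rather than the real /srv/tftpboot. It uses a relative link target so the links keep working if the TFTP root is moved.

```shell
#!/bin/sh
# Hypothetical helper: create one symlink per rPI serial inside the TFTP root.
# $1 = TFTP root, $2 = UEFI directory name (relative to the root), rest = serials
link_serials() {
    tftp_root=$1
    target=$2
    shift 2
    for serial in "$@"; do
        # -n replaces an existing link instead of descending into it
        ln -sfn "$target" "$tftp_root/$serial"
    done
}

# Demo in a scratch directory standing in for /srv/tftpboot
root=$(mktemp -d)
mkdir "$root/rpi-uefi-1.20"
link_serials "$root" rpi-uefi-1.20 49a6ff15 0a1b2c3d
ls -l "$root"
rm -rf "$root"
```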
Step 15 - Finally, enable and start both dnsmasq and apache2 services by running the following commands:
systemctl enable dnsmasq
systemctl start dnsmasq
systemctl enable apache2
systemctl start apache2
You are now ready to power up your rPI and see the stateless magic happen! Not only is this an easy way to deploy ESXi-Arm, especially with the 180-day evaluation period, it is also a super simple way to try out newer versions of the ESXi-Arm Fling without much hassle, especially for those that have more than one device.
Troubleshooting
The default extra.tgz payload has been configured to log directly to ESXi Console during boot up but also into /var/log/syslog on the ESXi-Arm host. You can simply grep for the keyword STATELESS-DEBUG to see what is happening.
Below is a log snippet for an initial deployment:
[STATELESS-DEBUG] Enabling and Starting SSH
[STATELESS-DEBUG] Enabling and Starting ESXi-Arm Shell
[STATELESS-DEBUG] Enabling httpClient on ESXi-Arm Firewall
[STATELESS-DEBUG] Processing ESXi-Arm Boot Options
[STATELESS-DEBUG] Downloading ESXi-Arm Configuration File
[STATELESS-DEBUG] Configuring NTP
[STATELESS-DEBUG] Downloading ESXi-Arm Extra Configuration Script
[STATELESS-DEBUG] Running esxi-arm-extra-config.sh
[STATELESS-DEBUG] Running join-vcenter.py
[STATELESS-DEBUG] jsonConfigData={vcenter_pass: VMware1!, ntp_server: pool.ntp.org, vcenter_server: 192.168.30.200, vcenter_datacenter: Arm-Datacenter, vcenter_cluster: Arm-Cluster, vcenter_user: *protected email*}
[STATELESS-DEBUG] Creating AddHost Spec
[STATELESS-DEBUG] hostAddSpec={vmFolder: null, port: 443, userName: root, sslThumbprint: 25:B3:EC:4C:D1:68:E3:4B:29:2F:AC:CF:BB:E0:2A:F2:7D:F1:2F:23, vimAccountName: null, lockdownMode: null, dynamicType: null, dynamicProperty: [], hostName: 192.168.30.91, managementIp: null, hostGateway: null, force: true, password: , vimAccountPassword: null}
[STATELESS-DEBUG] Joining vCenter Server
Upon a reboot or power cycle, one thing I needed to consider was that the previous ESXi-Arm host which was added to vCenter Server is now in a disconnected state, which would cause re-connecting to fail since the ESXi-Arm host IP/Hostname has been seen before. This is automatically handled by checking whether the ESXi-Arm IP exists in vCenter and, if so, removing that entry prior to re-adding. You will know that the ESXi-Arm host has gone through a reboot by the additional log entry of "Removing previous ESXi-Arm instance X" where X is the IP Address.
Below is a log snippet of a reboot or power cycle:
[STATELESS-DEBUG] Enabling and Starting SSH
[STATELESS-DEBUG] Enabling and Starting ESXi-Arm Shell
[STATELESS-DEBUG] Enabling httpClient on ESXi-Arm Firewall
[STATELESS-DEBUG] Processing ESXi-Arm Boot Options
[STATELESS-DEBUG] Downloading ESXi-Arm Configuration File
[STATELESS-DEBUG] Configuring NTP
[STATELESS-DEBUG] Downloading ESXi-Arm Extra Configuration Script
[STATELESS-DEBUG] Running esxi-arm-extra-config.sh
[STATELESS-DEBUG] Running join-vcenter.py
[STATELESS-DEBUG] jsonConfigData={vcenter_datacenter: Arm-Datacenter, vcenter_cluster: Arm-Cluster, vcenter_pass: VMware1!, vcenter_user: *protected email*, ntp_server: pool.ntp.org, vcenter_server: 192.168.30.200}
[STATELESS-DEBUG] Removing previous ESXi-Arm instance 192.168.30.91
[STATELESS-DEBUG] hostAddSpec={lockdownMode: null, userName: root, vimAccountName: null, hostName: 192.168.30.91, port: 443, hostGateway: null, vmFolder: null, force: true, dynamicType: null, sslThumbprint: E5:14:E0:A4:9F:AE:D8:4F:57:DF:01:5D:BD:B2:C0:A6:4F:5E:FC:A9, managementIp: null, password: , vimAccountPassword: null, dynamicProperty: []}
[STATELESS-DEBUG] Joining vCenter Server
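The difference between the two snippets above can be checked with a couple of fixed-string greps. The sketch below is one way to classify a saved copy of the log; summarize_boot and its three output labels are made up for this example, and the sample log is a trimmed copy of the entries shown above.

```shell
#!/bin/sh
# Hypothetical helper: classify a log by which STATELESS-DEBUG entries it has.
summarize_boot() {
    log=$1
    if grep -qF '[STATELESS-DEBUG] Removing previous ESXi-Arm instance' "$log"; then
        echo "rejoin"        # host was already known to vCenter Server
    elif grep -qF '[STATELESS-DEBUG] Joining vCenter Server' "$log"; then
        echo "first-join"    # host joined without removing an old entry
    else
        echo "no-join"       # no join attempt was logged
    fi
}

# Trimmed sample taken from the reboot snippet above
log=$(mktemp)
cat > "$log" <<'EOF'
[STATELESS-DEBUG] Running join-vcenter.py
[STATELESS-DEBUG] Removing previous ESXi-Arm instance 192.168.30.91
[STATELESS-DEBUG] Joining vCenter Server
EOF
summarize_boot "$log"
rm -f "$log"
```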
brimur says
Pretty cool. What kind of VMs can you run on it, ARM only I assume?
William Lam says
Please see the ESXi-Arm Documentation, there are over 12+ Arm GuestOS and yes, Arm only, no x86 🙂
Dennis Faucher says
Very cool. Thank you for this.
adilinden says
Will the individual rPi4 persist within vCenter? Can workloads be placed on it that will restart on reboot, can a bunch of rPi added to a cluster to dynamically scale for workloads?
William Lam says
Yes, until next reboot and hence you need to add any customizations into script or have post-deploy script or even use Host Profile if you want to use that
Darren Williams says
Amazing guide but certainly, at least for me, this is not working as expected.
The problem seems to be with the automatic selection of the UEFI boot device.
It appears from testing that using the symbolic link to the uefi firmware for multiple pis does not work.
It seems to only work for the device you took the firmware from, ie there is something in that folder that makes it specific to the device.
If I put an SD card with uefi firmware in the pi then select pxe boot and save it, then copy the firmware to an individual folder named with the serial of the board, everything works as expected.
I'm sure people of greater skill will be able to resolve this and we will eventually have something that works.
On the other hand, I could be doing something completely wrong.
Thanks for your excellent work 🙂
William Lam says
Darren,
Unlike the earlier solutions, the UEFI is never actually copied onto the device if you follow the instructions outlined in this blog post, so not sure what you mean by specific to this device. The only key requirement is to ensure that you've updated the EEPROM and changed the default boot order in Step 2. I'd say double check the instructions to make sure you didn't miss anything.
Darren Williams says
If I just plug in a new Pi having set it to do the boot from network, when I have a symlinked efi boot folder the pi boots to the uefi screen and then loops. I have to hit the escape in the uefi screen and then select pxe boot in order for it to work. If I reboot it again, same thing happens.
Darren Williams says
One more tiny one. The extra.tgz provided in the link is not the one that contains the code to delete a previously registered host. I fixed on my system by cloning the repo and retarring it.
William Lam says
Thanks for the catch Darren! Just fixed and pushed the updated extra.tgz to Github
Craig McPhee says
Thank you so much! This has saved so much time and effort!
We have implemented your network boot across a cluster of 4x8GB Pi's; the only issue was the symlink, which does not appear to work across multiple hardware (I assume this is a secure boot / security feature within the Rasp Pi UEFI process, I will post solution if I can fix it). The UEFI firmware build must be customised per host, with its own directory (named to host serial number) within the TFTP server. The rest works seamlessly, including host registration into vCenter.
Many Thanks!
Darren Williams says
@Craig McPhee, @William Lam
I think Craig is confirming the issue I mentioned above regarding symlinks.
All is now working perfectly including auto attaching iSCSI storage and creating VLANs
Thanks again.
Mike says
Got it to boot to pxe server but I get the purple screen saying I only have 3GB of memory, I must be missing a step but I can't find it...Any pointers?
Mike Q says
FYI, my PI is an 8GB model
tutugreen says
To UEFI Settings (Hit ESC when the boot logo appeared.)
Device Manager>Raspberry Pi Configuration>Advanced Configuration>Limit RAM to 3 GB->Disabled
(Double-check this after reboot. Someone said their settings were not saved properly on the first try.)
Toby says
Hi,
Just to confirm Darren's/Craig's comments.
It seems the issue here may be that in order to use the full 8Gb the virgin firmware must be booted on the pi and then set to disable the 3Gb limit.
Once this has been done the firmware can be copied away and used for tftpboot, however this does somehow then render this firmware exclusive to that particular pi, attempts to use the symlink trick to boot further pi's will fail and the process must be repeated for each device; requiring numerous firmware copied on the boot server.
Mike Q says
Thank you!
Toby says
Further to my previous comment, it can be seen using xxd that the MAC address becomes hard coded in the firmware RPI_EFI.fd data, which would potentially explain this, ideally what we need here is a way to disable the 3Gb limit without needing to boot the firmware on a specific Pi, maybe the owners of the project could be persuaded to release a branch that has limit removed by default?
Mike Q says
That would be great...or add the option in the config file.
Richard says
Love, love, love your blog! So far, your stuff just works. Other solutions I've tried only seem to work for *some* people (but never me - haha).
Anyway - here's my question:
Being 'Stateless' - how much should I worry about software upgrades breaking things down the road?
Thanks mucho!
simonsparks2060de5d00 says
Has this project / fling for ESXi on Raspberry Pi died ?
William Lam says
No. See vmwa.re/flings
Dominic Spatz says
Hey, thanks for this amazing content library among flings / ESXi on Arm.
I got a question regarding PXE Boot. I just stumbled across tftpd-hpa and wanted to ask if there is a way to utilize this. I mean, I got a DHCP running and dnsmasq etc. is handled over there.
I tried to implement it, but I got some questions.:
I did make a symlink from my serial, e.g. 48c3fd3e -> rpi-uefi-1.35-4gb/
(I want to use 4 and 8GB side by side, but that shouldn't be an issue)
What is the filename for booting it up? Typically kickstart is using like a ks.cfg, but maybe i have not seen it.
Thanks for your support
Dominic Spatz says
Seems that i got it. I used some different tools and two different Pis.
As I am using an UDM Pro / Cloud Gateway manufactured by UniFi/Ubiquiti, this was a bit different.
So I created a VLAN for those Pis and set up a new Ubuntu VM on which i installed those packages "tftpd-hpa apache2 zip unzip net-tools"
"tcpdump -i ensXYZ port 69" is my go-to troubleshooting tool! (besides "netstat -an")
I added the IP of my "Kickstart-VM" as the option "TFTP" under DHCP Options, e.g. 172.16.1.100
But: PXE has to be enabled as well,
So I ticked "Network Boot" and added the following
(Host/Filename)
172.16.1.100 esxi-arm/efi/boot/bootaa64.efi
If you got some improvements, like how can i get a menu in there etc. It would be amazing!
So i threw my files in there, for both versions of the pi4:
this is the content of /srv/tftp:
(I had to link Serial Number and MAC address, but that's fine)
01-d8-3a-dd-31-de-ee -> ./esxi-arm/efi/boot/
01-dc-a6-32-16-3e-49 -> ./esxi-arm/efi/boot/
48c3fd3e -> rpi-uefi-1.35-4gb/
852e586a -> rpi-uefi-1.35-8gb/
So there are three folders in place: esxi-arm, rpi-uefi-1.35-4gb, rpi-uefi-1.35-8gb
Everything else are symlinks.
Thanks to this forum and https://dev.to/weeee/raspberry-pi-cluster-part-1-the-boot-2fe5 (which gave me the advice with the UDM DHCP tftp server/Unifi stuff)