OK, the wait is finally over! In this final article, we will walk through how to get access to this project and how to deploy it in your own environment. For those who just want to see the code, you can find it in the Github project below:
Github Project: https://github.com/lamw/usb-to-sddc
Below are the details outlining the environment and software requirements as well as the instructions to consume this in your own home lab environment. The content below is a subset of what is published on the Github project, but this should get you going. For more details, please refer to the Github project and if you have any issues/questions, feel free to file a Github issue.
Environment Requirements:
- USB key that is at least 6GB in capacity
- Access to a macOS or Linux system, as the script that creates the USB key is only supported on these two platforms
- No USB keys other than the primary installer USB key may be plugged into the hardware system
- Hardware system must have at least 2 disk drives, which can be either 1xHDD and 1xSSD for running Hybrid vSAN OR 2xSSD for running All-Flash vSAN
- The Intel NUC 6th Gen and the Supermicro E200-8D and E300-8D have all been tested with this solution. It should work with other hardware systems that meet the minimum requirements, but YMMV
Software Requirements:
- ESXi 6.5a - VMware-VMvisor-Installer-201701001-4887370.x86_64.iso
- VCSA 6.5b - VMware-VCSA-all-6.5.0-5178943.iso
- DeployVM.zip
- UNetbootin (Required for Mac OS X users)
Note: Other ESXi / VCSA 6.5.x versions can also be substituted; this includes the latest ESXi 6.5d (vSAN 6.6) release, which I have also verified myself.
UPDATE (04/17/18) - No changes are required to get vSphere 6.7 to work. The only minor thing to be aware of is that the vSphere Web Client customization has changed in 6.7, so you need to set VCSA_WEBCLIENT_THEME_NAME="" (an empty string). Otherwise the UI will not load unless you delete the customization directory that was automatically pulled down to the VCSA.
Usage:
Step 1 - Clone the Github repository to your local system by running the following command:
git clone https://github.com/lamw/usb-to-sddc.git
If you do not have git installed on your computer or if you prefer to just download the scripts manually, you can do so by downloading the following file below: https://github.com/lamw/usb-to-sddc/archive/master.zip
Step 2 - Change into the usb-to-sddc directory or extract the contents if you downloaded the zip file from Github instead.
Step 3 - Download all the files listed in the Software Requirements above to your local desktop.
Step 4 - Open KS.CFG in your favorite text editor, such as vi or Visual Studio Code, and search for the tag # ---> START EDIT HERE <--- #, which should be located on Line 10.
There are 25 variables as shown below which can be adjusted to customize your deployment:
VSAN_DISK_TYPE="AF"
PHOTON_IP="192.168.1.10"
PHOTON_CIDR="24"
PHOTON_GATEWAY="192.168.1.1"
PHOTON_DNS="192.168.1.1"
ESXI_IP="192.168.1.100"
ESXI_PASSWORD="VMware1!"
ESXI_NETMASK="255.255.255.0"
ESXI_GATEWAY="192.168.1.1"
ESXI_HOSTNAME="nuc.primp-industries.com"
ESXI_DNS="192.168.1.1"
VCSA_IP="192.168.1.200"
VCSA_HOSTNAME="192.168.1.200"
VCSA_PREFIX="24"
VCSA_GATEWAY="192.168.1.1"
VCSA_DNS="192.168.1.1"
VCSA_SSO_DOMAIN_NAME="vsphere.local"
VCSA_SSO_SITE_NAME="virtuallyGhetto"
VCSA_ROOT_PASSWORD="VMware1!"
VCSA_SSO_PASSWORD="VMware1!"
VCSA_SSH_ENABLED="true"
VCSA_CEIP_ENABLED="true"
VCSA_DATACENTER_NAME="VSAN-Datacenter"
VCSA_CLUSTER_NAME="VSAN-Cluster"
VCSA_WEBCLIENT_THEME_NAME="CormacHogan"
The variables should be pretty self-explanatory, but here are a few that need some additional explanation:
- VSAN_DISK_TYPE - Defines whether you have a Hybrid or All-Flash vSAN setup based on your physical disks. The valid values are HYBRID or AF.
- PHOTON_IP - This is the IP Address of the DeployVM. If you are deploying in an isolated network (e.g. using a cross-over cable between your laptop and server), make sure the DeployVM and the ESXi host are on the same network.
- PHOTON_CIDR - This is the CIDR network for DeployVM (e.g. 24 = /24 = 255.255.255.0)
- PHOTON_GATEWAY and PHOTON_DNS - These are the Gateway and DNS Server for DeployVM.
- VCSA_IP and VCSA_HOSTNAME - If you do not have working DNS in your environment, with both forward and reverse lookups functional, make sure both of these variables are set to the exact same IP Address, or your VCSA deployment will fail as it will try to resolve the hostname (FQDN) using the DNS server you provided.
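- VCSA_WEBCLIENT_THEME_NAME - Defines the theme that will automatically be applied if you wish to customize the vSphere Web Client as described at https://github.com/lamw/customize-vsphere-web-client-6.5. You can find the complete list of theme names at https://github.com/lamw/customize-vsphere-web-client-6.5/tree/master/themes.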
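A few of these rules can be checked before you build the USB key. The following is a minimal Python sketch, not part of the usb-to-sddc project: it converts a prefix such as PHOTON_CIDR into a netmask, validates VSAN_DISK_TYPE, checks that the DeployVM and the ESXi host land on the same network, and applies the 6.7 theme note from the update above. The `settings` dict and the `cidr_to_netmask` and `check_settings` helpers are hypothetical, and the values are simply the defaults listed above.

```python
import ipaddress

# Hypothetical subset of the KS.CFG variables, using the defaults shown above.
settings = {
    "VSAN_DISK_TYPE": "AF",
    "PHOTON_IP": "192.168.1.10",
    "PHOTON_CIDR": "24",
    "ESXI_IP": "192.168.1.100",
    "ESXI_NETMASK": "255.255.255.0",
    "VCSA_WEBCLIENT_THEME_NAME": "CormacHogan",
}


def cidr_to_netmask(prefix: str) -> str:
    """Turn a prefix length such as "24" into dotted-quad form ("255.255.255.0")."""
    return str(ipaddress.IPv4Network(f"0.0.0.0/{prefix}").netmask)


def check_settings(s: dict, vsphere_version: str) -> list:
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    if s["VSAN_DISK_TYPE"] not in ("HYBRID", "AF"):
        problems.append("VSAN_DISK_TYPE must be HYBRID or AF")

    # DeployVM (prefix notation) and ESXi (netmask notation) should share one network.
    photon_net = ipaddress.IPv4Interface(f"{s['PHOTON_IP']}/{s['PHOTON_CIDR']}").network
    esxi_net = ipaddress.IPv4Interface(f"{s['ESXI_IP']}/{s['ESXI_NETMASK']}").network
    if photon_net != esxi_net:
        problems.append(f"DeployVM network {photon_net} differs from ESXi network {esxi_net}")

    # Per the 6.7 update note, the theme name must be an empty string on 6.7.
    if vsphere_version.startswith("6.7") and s["VCSA_WEBCLIENT_THEME_NAME"]:
        problems.append('set VCSA_WEBCLIENT_THEME_NAME="" for vSphere 6.7')
    return problems


print(cidr_to_netmask(settings["PHOTON_CIDR"]))  # 255.255.255.0
print(check_settings(settings, "6.7"))
```

With the defaults, the only complaint for a 6.7 deployment is the theme name; for 6.5 the list comes back empty.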
Step 5 - Next, edit either create_sddc_deployment_on_usb_for_osx.sh or create_sddc_deployment_on_usb_for_linux.sh, depending on the platform you will use to create the USB installer. Edit the following variables at the top of the script so they point to the files you downloaded in Step 3.
- UNETBOOTIN_APP_PATH - Complete path to the UNetbootin application directory, only applicable for Mac OS X users
- ESXI_ISO_PATH - Complete path to the ESXi ISO
- VCSA_ISO_PATH - Complete path to the VCSA ISO
- ESXI_KICKSTART_PATH - Complete path to the KS.CFG
- DEPLOYVM_ZIP_PATH - Complete path to the DeployVM zip
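Because the script reads these paths directly, a typo only shows up partway through a run. A minimal Python pre-flight check, assuming you copy your own values into the hypothetical `script_paths` dict (the example paths below are placeholders, not locations the project expects):

```python
import os

# Hypothetical placeholder paths; replace them with the files you downloaded in Step 3.
script_paths = {
    "UNETBOOTIN_APP_PATH": "/Applications/unetbootin.app",  # macOS only
    "ESXI_ISO_PATH": "/Users/me/Downloads/VMware-VMvisor-Installer-201701001-4887370.x86_64.iso",
    "VCSA_ISO_PATH": "/Users/me/Downloads/VMware-VCSA-all-6.5.0-5178943.iso",
    "ESXI_KICKSTART_PATH": "/Users/me/usb-to-sddc/KS.CFG",
    "DEPLOYVM_ZIP_PATH": "/Users/me/Downloads/DeployVM.zip",
}


def missing_paths(paths: dict) -> list:
    """Return the names of variables whose file or directory does not exist."""
    return sorted(name for name, path in paths.items() if not os.path.exists(path))


for name in missing_paths(script_paths):
    print(f"{name} points to a path that does not exist: {script_paths[name]}")
```

os.path.exists accepts directories as well as files, so the same check works for the UNetbootin application bundle.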
Step 6 - Next, use create_sddc_deployment_on_usb_for_X.sh to create the USB installer. Plug a USB key into your system. Please be aware that all contents on the USB key will be wiped once you confirm the USB key to be used for the automated installer.
First, identify the USB device that was plugged into your system to make sure you select the right one.
On macOS - You can run diskutil list and identify the device which should look like /dev/diskX.
On Linux - You can run parted -l and identify the device which should look like /dev/sdX.
Next, run the script using sudo and pass in the device you identified with the previous command.
Here is an example of running it on a macOS system:
sudo ./create_sddc_deployment_on_usb_for_osx.sh /dev/disk4
Here is an example of running it on a Linux system:
sudo ./create_sddc_deployment_on_usb_for_linux.sh /dev/sdb
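Passing a partition (for example /dev/disk4s1 or /dev/sdb1) instead of the whole device is an easy mistake. As an illustration only, here is a small Python check that accepts just the whole-disk forms used in the examples above (/dev/diskX on macOS, /dev/sdX on Linux). The `looks_like_whole_disk` helper is hypothetical; it does not call diskutil or parted and cannot tell you whether the disk is actually your USB key.

```python
import re

# Whole-disk device names in the form shown above; partitions such as
# /dev/disk4s1 (macOS) or /dev/sdb1 (Linux) deliberately do not match.
WHOLE_DISK = {
    "macos": re.compile(r"/dev/disk\d+"),
    "linux": re.compile(r"/dev/sd[a-z]+"),
}


def looks_like_whole_disk(device: str, platform: str) -> bool:
    """True if device matches the whole-disk naming pattern for the platform."""
    return WHOLE_DISK[platform].fullmatch(device) is not None


print(looks_like_whole_disk("/dev/disk4", "macos"))    # True
print(looks_like_whole_disk("/dev/disk4s1", "macos"))  # False
print(looks_like_whole_disk("/dev/sdb", "linux"))      # True
```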
The script will automatically clear existing partitions and create the expected partition scheme. It will copy all the software packages you downloaded in Step 3, and once it has completed, it will also unmount the USB device.
Step 7 - The final step is to plug the USB key into your system and simply power it on. If you want to verify that things are working, you can connect an external monitor and watch the installation, but I will warn you, it is pretty boring 🙂 If things are going well, the ESXi installer will stay on "reading installation file" for quite a while. This is where the majority of the time is spent: during the %pre section, it forms the vSAN datastore and copies all the files from the PAYLOAD partition over to vSAN.
You can verify that ESXi has been successfully installed by checking that the main boot screen shows an IP Address. You can then open a browser to the ESXi Embedded Host Client (e.g. https://[IP]/ui) and log in. Depending on when you do this, you may only see the DeployVM and/or the VCSA being deployed. To follow the remaining progress of the deployment, log in to the DeployVM using the IP Address you assigned it; the credentials are root/VMware1! by default.
Once logged into the DeployVM, you can tail /root/script.log which will give you the progress of the VCSA deployment and configuration.
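Running tail -f /root/script.log is the simplest way to follow the log live. If you would rather poll it from your own script, here is a minimal Python sketch (the `read_new_lines` and `follow` helpers are hypothetical, and it assumes Python is available wherever you run it) that returns only complete lines appended since the last read:

```python
import time


def read_new_lines(path: str, offset: int) -> tuple:
    """Return (complete lines written after byte offset, new offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Stop at the last newline so a half-written line is re-read on the next poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def follow(path: str, interval: float = 5.0):
    """Yield new log lines forever, polling every `interval` seconds."""
    offset = 0
    while True:
        lines, offset = read_new_lines(path, offset)
        yield from lines
        time.sleep(interval)
```

Usage: `for line in follow("/root/script.log"): print(line)`. The sketch does not handle the log being truncated or rotated.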
Expected Timings
Here is what you can expect from a timing standpoint, from creating the USB Installer to plugging it into your system and provisioning the SDDC. In my testing in my personal lab, creating the USB Installer took 11 minutes and the USB to SDDC deployment took 45 minutes, measured from the time I plugged it into the NUC and powered it on to the point where I could log in to the vSphere Web Client of the vCenter Server. Obviously, YMMV depending on your hardware configuration.
| Process | Estimated Time |
|---|---|
| Create USB Installer key | 10-15 minutes |
| USB to SDDC deployment | 45-60 minutes |
mitesh says
Nice ! Thanks for getting this to the community 🙂
Ted Striker says
"Access to either macOS or Linux system as the script that creates the USB key is only supported on these two platforms"
Surely you can't be serious
William Lam says
Ted,
I would love to have multi-platform support and provide choice; however, there's a limitation with Windows, if you're not familiar with it, that prevents you from creating multiple partitions on a USB device (which is needed for this solution). There are workarounds, but they require 3rd-party tools, and most don't provide automation interfaces, so I wasn't left with much choice. You can still use the solution on Windows, but instead of having a script that automates the creation of the USB key, you will have to do that portion manually.
Benedikt Frenzel says
Hi William,
first of all thank you for the great post. Please keep up the good work.
I just saw that the links around the "VCSA_WEBCLIENT_THEME_NAME" description are not working in the post. On Github they are fine. 😉
Greetings from Cork,
Benedikt
Benedikt Frenzel says
OK, I have to correct myself: the links on Github are also broken.
Joe Clifford says
The 2 links are
https://github.com/lamw/customize-vsphere-web-client-6.5
and
https://github.com/lamw/customize-vsphere-web-client-6.5/tree/master/themes
Maher B says
Hi William,
First of all thank you very much.
I had to comment lines:
#ls ${VCSA_ISO_DIRECTORY}/VCSA-part-* > /dev/null 2>&1
#if [ $? -eq 1 ]; then
...
#else
# echo "VCSA ISO has already been splitted, skipping step ..."
#fi
from create_sddc_deployment_on_usb_for_osx.sh
because of the error:
cp: cannot stat '/VCSA-part-*': No such file or directory
Also, the Linux USB creation was OK (there were some errors during compilation, though; I think I need to force gcc-4.8), but the install failed in the %pre section at line 209. I could not investigate further for lack of time (sorry about that).
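The path in that cp error is a useful clue: ${VCSA_ISO_DIRECTORY}/VCSA-part-* only becomes /VCSA-part-* when VCSA_ISO_DIRECTORY expands to an empty string. How the script sets that variable is not shown in this thread, so treat the following as a sketch of the kind of guard that fails fast instead, written in Python with a hypothetical `vcsa_parts` helper:

```python
import glob
import os


def vcsa_parts(vcsa_iso_directory: str) -> list:
    """List VCSA-part-* files in a directory, refusing an empty directory name."""
    if not vcsa_iso_directory:
        # An empty value turns ${VCSA_ISO_DIRECTORY}/VCSA-part-* into /VCSA-part-*.
        raise ValueError("VCSA ISO directory is empty; check how it is derived")
    if not os.path.isdir(vcsa_iso_directory):
        raise ValueError(f"not a directory: {vcsa_iso_directory}")
    return sorted(glob.glob(os.path.join(vcsa_iso_directory, "VCSA-part-*")))
```

An empty return value means the ISO has not been split yet; an exception means the directory itself is wrong.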
tiger says
Don't you need a DNS server to deploy and run the VCSA? Or is it unnecessary if VCSA_HOSTNAME is set to an IP address?
William Lam says
Ideally, yes, but in a lab environment DNS may or may not be available. If it's not available, make sure VCSA_IP and VCSA_HOSTNAME are both set to the IP Address, or else you will get a failed VCSA deployment.
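To check which case applies before deploying, here is a minimal Python sketch (the `vcsa_name_ok` helper is hypothetical): an IP-literal hostname must equal VCSA_IP, and an FQDN must resolve forward to VCSA_IP and back again. It uses the resolver of the machine you run it on, which is not necessarily the VCSA_DNS server the deployment will use.

```python
import ipaddress
import socket


def vcsa_name_ok(hostname: str, ip: str) -> bool:
    """Apply the rule from this thread to a VCSA_HOSTNAME / VCSA_IP pair."""
    try:
        # No DNS: the hostname must be the IP address itself.
        return ipaddress.ip_address(hostname) == ipaddress.ip_address(ip)
    except ValueError:
        pass  # Not an IP literal, so treat it as an FQDN that needs working DNS.
    try:
        forward = socket.gethostbyname(hostname)        # name -> address
        reverse_name, _, _ = socket.gethostbyaddr(ip)   # address -> name
    except OSError:
        return False
    return forward == ip and reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()


print(vcsa_name_ok("192.168.1.200", "192.168.1.200"))  # True
```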
tiger says
Thanks for the reply. I'll try modifying the DeployVM to also act as a DNS server, since I may want to change the VCSA IP after deployment.
Patric Stuerzebecher says
Hi William,
I got your script running on an Intel NUC7i3 using a prepped ISO. I used PowerCLI:
# Add the online VMware image depot
Add-EsxSoftwareDepot https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml
# Clone the standard 6.5 image profile so it can be modified
New-EsxImageProfile -CloneProfile "ESXi-6.5.0-20170104001-standard" -Name "ESXi-6.5.0-20170104001-standard-7nuc" -Vendor vGhetto
# Replace the inbox e1000e/ne1000 drivers with the net-e1000e package version shown
Remove-EsxSoftwarePackage -ImageProfile "ESXi-6.5.0-20170104001-standard-7nuc" -SoftwarePackage "net-e1000e"
Remove-EsxSoftwarePackage -ImageProfile "ESXi-6.5.0-20170104001-standard-7nuc" -SoftwarePackage "ne1000"
Add-EsxSoftwarePackage -ImageProfile "ESXi-6.5.0-20170104001-standard-7nuc" -SoftwarePackage "net-e1000e 3.2.2.1-2vmw.600.3.57.5050593"
# Export the customized profile as an installable ISO
Export-EsxImageProfile -ImageProfile "ESXi-6.5.0-20170104001-standard-7nuc" -ExportToIso -FilePath C:\Temp\esxi650nuc7.iso
My setup is a NUC7i3 with 32 GB RAM, a 1050 GB SSD and a 128 GB NVMe drive. The kickstart script seems to have an issue when the SSD and NVMe sizes start with the same digit but the SSD is an order of magnitude larger: after installation I only got 128 GB on my vsanDatastore, because it used the SSD as cache and the NVMe as capacity. After the third run I simply hard-coded the correct device names for my configuration. Is there any way to see the logs after installation? I haven't found anything. My vdq -q output:
[root@esxi01:~] vdq -q
[
{
"Name" : "naa.2020030102060804",
"VSANUUID" : "",
"State" : "Ineligible for use by VSAN",
"Reason" : "Has partitions",
"IsSSD" : "0",
"IsCapacityFlash": "0",
"IsPDL" : "0",
},
{
"Name" : "t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03",
"VSANUUID" : "52aee84a-85f1-55ef-42c7-fa97824005c1",
"State" : "In-use for VSAN",
"Reason" : "None",
"IsSSD" : "1",
"IsCapacityFlash": "1",
"IsPDL" : "0",
},
{
"Name" : "t10.NVMe____Force_MP500_____________________________170379320001225301C900000001",
"VSANUUID" : "52210736-bd44-17aa-0d8b-8f783d31c162",
"State" : "In-use for VSAN",
"Reason" : "None",
"IsSSD" : "1",
"IsCapacityFlash": "0",
"IsPDL" : "0",
},
]
Running your code directly on my ESXi host gives me this for the NVMe:
localcli storage core device capacity list -d t10.NVMe____Force_MP500_____________________________170379320001225301C900000001 | tail -1 | awk '{print $5}'
114473
And for the capacity SSD:
localcli storage core device capacity list -d t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03 | tail -1 | awk '{print $5}'
1001562
And after I found out that using a "$" in the passwords was not my best idea (they get cropped when transferred to the DeployVM), my NUC7i3 is now running ESXi 6.5 and VCSA 6.5 on vSAN ;o)
HTH
Patric
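Why the "$" disappears is not established in this thread. One common cause is a value being placed inside double quotes in a shell command, where $name is expanded as a variable. The following Python sketch illustrates that general pitfall with /bin/sh and shows shlex.quote preserving the value; it is not a description of how the DeployVM actually receives its passwords, and `through_shell` is a hypothetical helper.

```python
import shlex
import subprocess


def through_shell(value: str, safe: bool) -> str:
    """Echo value through /bin/sh, either naively double-quoted or safely quoted."""
    # Double quotes still let the shell expand $name; shlex.quote uses single quotes.
    arg = shlex.quote(value) if safe else f'"{value}"'
    result = subprocess.run(
        ["/bin/sh", "-c", f"printf '%s' {arg}"],
        capture_output=True,
        text=True,
        check=True,
        env={},  # empty environment, so $ecret1 is guaranteed to be unset
    )
    return result.stdout


password = "VMw@re$ecret1!"
print(through_shell(password, safe=False))  # VMw@re!
print(through_shell(password, safe=True))   # VMw@re$ecret1!
```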
William Lam says
Hi Patric,
Thanks for giving this a try. Strange to see the larger SSD get used as cache... I see in your vdq output that the correct device is tagged as capacity; was this something you did, or the script? The logic for the capacity comparison can be seen in https://github.com/lamw/usb-to-sddc/blob/master/KS.CFG#L110-L124
In terms of logs, yes, you can look at /var/log/esxi_install.log, which will contain all my syslog entries. You can also add more entries for debugging purposes.
Patric Stuerzebecher says
Hi William,
My vdq -q output is my actual output after I altered lines 136 and 142 of the script with the actual names of my SSDs, so there is no mix-up ;o) I also checked /var/log/esxi_install.log, but it only contains a complete copy of the script, not the actual logged lines.
I will reinstall using my ESXi 6.5 image (I also want to add the StarTech USB driver to the installation so I can use the USB Ethernet NIC from the beginning). I will then post my vdq -q output and the relevant parts of esxi_install.log.
BTW: thanks for your very good work, it's just great 🙂
Patric Stuerzebecher says
Hi again,
As the log is a bit long, I pasted it here:
https://pastebin.com/yGpwq8yt
This is logged a couple of times, always with the same input. If I grep the log for some of your specific wording, like "largest" (seen in line 126), I only get this as a result, which doesn't tell me much. None of the variables are filled in either, so I can't see the number mix-up, or even which disk is the largest.
Here is my vdq -q output without changing the KS.CFG:
[root@esxi01:~] vdq -q
[
{
"Name" : "naa.2020030102060804",
"VSANUUID" : "",
"State" : "Ineligible for use by VSAN",
"Reason" : "Has partitions",
"IsSSD" : "0",
"IsCapacityFlash": "0",
"IsPDL" : "0",
},
{
"Name" : "t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03",
"VSANUUID" : "5281956e-126d-93b6-b341-f33a48cec4c4",
"State" : "In-use for VSAN",
"Reason" : "None",
"IsSSD" : "1",
"IsCapacityFlash": "0",
"IsPDL" : "0",
},
{
"Name" : "t10.NVMe____Force_MP500_____________________________170379320001225301C900000001",
"VSANUUID" : "523b5481-a0ac-f97e-4d8c-3cd2f5477b52",
"State" : "In-use for VSAN",
"Reason" : "None",
"IsSSD" : "1",
"IsCapacityFlash": "1",
"IsPDL" : "0",
},
]
As you can see, the smaller NVMe flash device is used as capacityFlash.
Thanks
Patric
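To double-check which device ended up in the capacity tier without reading the raw listing by eye, the vdq -q output above can be parsed. It is almost JSON, but the trailing commas before } and ] make a strict JSON parser reject it. A minimal Python sketch, assuming you pass only the command output (not the shell prompt line) and that no value itself contains a comma followed by a closing brace; `parse_vdq` and `capacity_devices` are hypothetical helpers:

```python
import json
import re


def parse_vdq(output: str) -> list:
    """Parse `vdq -q` output (without the shell prompt) into a list of dicts."""
    # Drop trailing commas such as `"IsPDL" : "0",\n}` that strict JSON rejects.
    cleaned = re.sub(r",\s*([}\]])", r"\1", output)
    return json.loads(cleaned)


def capacity_devices(disks: list) -> list:
    """Names of devices tagged as capacity-tier flash."""
    return [d["Name"] for d in disks if d.get("IsCapacityFlash") == "1"]


# Trimmed copy of the listing above (only the fields used here).
SAMPLE = """[
{
"Name" : "t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03",
"IsSSD" : "1",
"IsCapacityFlash": "0",
},
{
"Name" : "t10.NVMe____Force_MP500_____________________________170379320001225301C900000001",
"IsSSD" : "1",
"IsCapacityFlash": "1",
},
]"""

print(capacity_devices(parse_vdq(SAMPLE)))
```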
Vincent Han says
Eager to see NSX and vRealize added to the mix!
Vincent Han says
As in: I can't wait to see NSX, vRealize Automation, Network Insight and Log Insight added to the mix!
William Lam says
Patric,
The log you've got doesn't actually show me what the script has done; it's merely the parsing of the script 🙂 What I'm looking for is the actual output of these commands, and the easiest way to get it is to run grep VSAN-KS /var/log/esxi_install.log, since all my messages are prefixed with "VSAN-KS". That should give us more info.
You can also enable debugging within the Python script: just change L70 (https://github.com/lamw/usb-to-sddc/blob/master/KS.CFG#L70) from False to True, and that should give us more info as well.
Patric Stuerzebecher says
Hi William,
It does not matter which keyword I use to grep through esxi_install.log; nothing is logged other than the parsed script. It is logged a couple of times (I think 3-4 times). I also read through the log file to see if I could find anything, but there is nothing of interest in it 🙁
I saw L70 a couple of days ago and tried it, as I thought I could figure out my issue on my own. But if I enable debugging by changing L70 to True, all I get is an error while reading the installation script (just after the installer boot is done) at line 259, which is the end of the Python code. As I only have basic Python knowledge, I am not able to figure out what goes wrong when debug = True.
I just retried it with a fresh git clone and everything, same experience.
William Lam says
Did you make sure to manually remove the vSAN cluster and clean up the disks so they're unclaimed before each new run?
Patric Stuerzebecher says
Absolutely, I am using
esxcli vsan storage diskgroup unmount --ssd $SSDNAME
and then delete the file systems using the Web Client. I also used a live Linux system to remove all partitioning data during my tries.
William Lam says
It's actually much easier than that; just run the following two commands, and you can reboot afterwards:
esxcli vsan cluster leave
esxcli vsan storage remove -s [INSERT-SSD-CACHE-ID]
I'm still unable to determine why you're having issues or why you're not seeing the expected logs, which I think may have to do with the system not being in a clean state. As mentioned, you can see the logic of the Python script: it just iterates through the devices seen by vdq and then compares their sizes. You can try adding "dryrun" to the top of the kickstart and manually running the Python section that does the size comparison to see if you can identify where/how it's failing.
William Lam says
OK, here's a python script (https://pastebin.com/wmk5jGU2) that you can upload to ESXi host and run which will walk through the exact same logic. It only does print statements.
Just create a normal bootable ESXi image on a USB key, and when it starts the interactive installer, enable SSH (/etc/init.d/SSH start), upload this script, then run it and see what it returns.
Here's sample output of what it should look like as it iterates through the disks:
[root@localhost:~] python /tmp/simulate.py
Found Disk: naa.6000c2995124c1351611d7f32912cbe2
Running disk capacity command: localcli storage core device capacity list -d naa.6000c2995124c1351611d7f32912cbe2 | tail -1 | awk '{print $5}'
Largest Capacity Disk so far: naa.6000c2995124c1351611d7f32912cbe2 (8192)
Found Disk: naa.6000c29ede38305c2003b1df8b8cbeb8
Running disk capacity command: localcli storage core device capacity list -d naa.6000c29ede38305c2003b1df8b8cbeb8 | tail -1 | awk '{print $5}'
Largest Capacity Disk so far: naa.6000c29ede38305c2003b1df8b8cbeb8 (4096)
Running disk capacity tagging command: localcli vsan storage tag add -d naa.6000c2995124c1351611d7f32912cbe2 -t capacityFlash
Running disk group create command: localcli vsan storage add -s naa.6000c29ede38305c2003b1df8b8cbeb8 -d naa.6000c2995124c1351611d7f32912cbe2
William Lam says
Actually, could you provide the output to the following:
localcli storage core device capacity list -d t10.NVMe____Force_MP500_____________________________170379320001225301C900000001
localcli storage core device capacity list -d t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03
Patric Stuerzebecher says
Hi,
I already posted the requested numbers in my first post; I guessed you would be interested in them 😉
Here is the output of the Python script:
[root@ilocz17080367:/tmp] python script.py
Found Disk: t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03
Running disk capacity command: localcli storage core device capacity list -d t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03 | tail -1 | awk '{print $5}'
Largest Capacity Disk so far: t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03 (1001562)
Found Disk: t10.NVMe____Force_MP500_____________________________170379320001225301C900000001
Running disk capacity command: localcli storage core device capacity list -d t10.NVMe____Force_MP500_____________________________170379320001225301C900000001 | tail -1 | awk '{print $5}'
Largest Capacity Disk so far: t10.NVMe____Force_MP500_____________________________170379320001225301C900000001 (114473)
Running disk capacity tagging command: localcli vsan storage tag add -d t10.NVMe____Force_MP500_____________________________170379320001225301C900000001 -t capacityFlash
Running disk group create command: localcli vsan storage add -s t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03 -d t10.NVMe____Force_MP500_____________________________170379320001225301C900000001
Here are the requested capacities:
| Device | Physical Blocksize | Logical Blocksize | Logical Block Count | Size | Format Type |
|---|---|---|---|---|---|
| t10.NVMe____Force_MP500_____________________________170379320001225301C900000001 | 4096 | 512 | 234441648 | 114473 MiB | 512e |
| t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03 | 512 | 512 | 2051200368 | 1001562 MiB | 512n |
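The Size values are consistent with the logical block count multiplied by the logical blocksize, truncated to whole MiB, which confirms that the Crucial SSD really is the larger device (roughly 978 GiB versus 112 GiB). A quick arithmetic check in Python, with a hypothetical `size_mib` helper:

```python
MIB = 1024 * 1024


def size_mib(logical_block_count: int, logical_blocksize: int) -> int:
    """Device size in whole MiB (truncated), from the capacity list columns."""
    return logical_block_count * logical_blocksize // MIB


print(size_mib(234441648, 512))   # 114473  (NVMe)
print(size_mib(2051200368, 512))  # 1001562 (SSD)
```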
Patric Stuerzebecher says
Hey William,
I just tinkered again (and learned a lot of Python basics along the way, which is great, too). The main problem is that diskSize is a string, and using ">" on strings is tricky, so I added some conversions. Basically, I convert the string into an integer, and back into a string for my logging so Python won't complain about printing an integer. My guess is that, as strings, 1001562 compares as smaller than 114473 character by character. I have already forked the project with my changes.
HTH
Patric
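Patric's diagnosis is easy to reproduce. Python compares strings character by character, so "1001562" and "114473" are decided at the second character ("0" is less than "1"), and the roughly 1 TB SSD looks smaller. A minimal sketch of the string comparison and of picking the largest disk with an integer key; the `sizes` dict and `largest_disk` helper are illustrative, not the code in the repo:

```python
# Sizes as printed by the awk pipeline in the thread: strings, not numbers.
sizes = {
    "t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03": "1001562",
    "t10.NVMe____Force_MP500_____________________________170379320001225301C900000001": "114473",
}

print("1001562" > "114473")            # False: lexicographic comparison
print(int("1001562") > int("114473"))  # True: numeric comparison


def largest_disk(capacities: dict) -> str:
    """Return the device with the largest capacity, comparing sizes as integers."""
    return max(capacities, key=lambda dev: int(capacities[dev]))


print(largest_disk(sizes))  # the Crucial SSD device name
```

Without the int() conversion, the same max() call would pick the NVMe device, which matches the mis-tagging seen earlier in the thread.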
William Lam says
I just took a look at your repo, and the changes make sense. I've just pushed them to my repo as well. In the future, consider sending a pull request 🙂
Jay Humphrey says
I'm trying to install on a SuperMicro E200-8D and am running into an issue with the script. I get an error stating "error:/tmp/ks.cfg:line 260: "/.pre" script returned with an error." When I press Enter to continue, another screen comes up and says "user-supplied %/.pre script failed. Error 256.", and then another window says "The system was not installed correctly." Pressing Enter after that screen stops the install process and goes to a blank screen, so I'm not sure how to troubleshoot the problem, since there are no logs for me to view at that point.
Any tips on how to resolve the issue are appreciated.
Regards,
Jay
john says
I received the same error; did you end up working this issue out?
Patrick says
Run these two commands and then re-run the installation
esxcli vsan cluster leave
esxcli vsan storage remove -s [INSERT-SSD-CACHE-ID]
Virtual Newb says
I'm receiving the same error on the SuperMicro E200-8D as well. Has anyone resolved this? I'm using brand-new flash disks, and
esxcli vsan cluster leave
esxcli vsan storage remove -s [INSERT-SSD-CACHE-ID]
Doesn't work. Any assistance is appreciated.