Right before going on PTO, I caught this really interesting tweet from my buddy Robert Guske announcing that we now support building your own custom Tanzu Kubernetes Releases (TKR), the Kubernetes software distributions that are signed and supported by VMware and typically provided through the online TKR Content Library.
Dear vSphere with Tanzu (TKGS) users - do you know that building your own TKG node image is now supported with our latest #vSphere 8 U1 update? #VMware #vExpert https://t.co/pxVbPJzmYh
— Robert Guske (@vmw_rguske) June 29, 2023
While there are already a number of existing customizations that can be applied when deploying a Tanzu Kubernetes Workload Cluster (TKC), there may still be certain VM configurations that you would like to add that are simply not possible today. Some customer requests can be as simple as changing the default size of the primary disk for a TKR, which is statically configured at 20GB today.
With this and many other use cases in mind, it is nice to see that we now finally provide customers with a supported method to build their own custom TKR that can include additional customizations required by their organization for use with vSphere with Tanzu.
I recently got a chance to play with the new vSphere Tanzu Kubernetes Grid Image Builder tool, which is also an open source project from VMware and leverages the existing Kubernetes Image Builder, which I have used before (see this blog post HERE for more details). It took me a few tries to get started, but I eventually got it working after running into a few issues and speaking with the developers.
The TKG Image Builder tool uses the popular Packer tool for building the TKR images, and you will need to have the following prerequisites set up before getting started. As of writing this blog post, the TKG Image Builder only supports building a custom TKR from one version of Kubernetes (v1.24.9). As new TKR versions are supported with vSphere with Tanzu, you will see the same versions included in the TKG Image Builder for customization, and these will be kept in sync starting from the v1.24.9 release.
Disclaimer: While using a custom TKR is officially supported by VMware, due to the nature of user customizations, support is provided on a best-effort basis depending on the type of change that has been added to the default TKR images. The best place to start is by filing a GitHub Issue in the repo, as this is the best way to engage with the developers of the project.
One issue that I immediately ran into while attempting to build from my local machine, which is a macOS system, is that the TKG Image Builder tool selected the wrong network interface for serving out the build files. It currently does not play nicely if you have additional virtual interfaces, which I did, as Docker was running along with VMware Fusion and other VPN services. With the latest fix, multiple network interfaces on the build system can now be used by creating the following Jinja entry, which specifies the HTTP port using the container IP Address.
{
    "http_port_max": "{{ packer_http_port }}",
    "http_port_min": "{{ packer_http_port }}",
    "http_ip": "{{ artifacts_container_ip }}"
}
In speaking with the developers, the short-term solution was to run the build from a dedicated build VM configured with just a single network interface. This known issue is already being worked on HERE, which should benefit users once the change has been merged.
In my setup, I decided to just deploy a basic Photon OS 5.0 VM (configured with 40GB of storage). For ease of setup, you can run the following commands, which will prepare the VM for use with the TKG Image Builder:
systemctl disable iptables; systemctl stop iptables
systemctl enable docker; systemctl start docker
tdnf -y install git make jq
Step 1 - Clone TKG Image Builder repo and change into the directory:
git clone https://github.com/vmware-tanzu/vsphere-tanzu-kubernetes-grid-image-builder.git
cd vsphere-tanzu-kubernetes-grid-image-builder
Step 2 - You will need to edit packer-variables/vsphere.j2, which contains the various settings for your vCenter Server endpoint.
Note: These .j2 files are Jinja templates that the TKG Image Builder uses to ingest user input, which is then applied to the build process. This is important for anyone familiar with Packer syntax: not all Packer syntax is exposed, and "dot" notation variables can not be leveraged due to this templating process.
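As an illustration, the vsphere.j2 file contains vCenter connection settings similar to the following sketch. The values here are placeholders for a hypothetical lab environment, and the exact set of keys may differ depending on the version of the repo you clone, so treat this only as a rough guide:

```json
{
    "vcenter_server": "vcsa.primp-industries.local",
    "username": "administrator@vsphere.local",
    "password": "VMware1!",
    "datacenter": "Primp-Datacenter",
    "datastore": "vsanDatastore",
    "cluster": "Cluster-01",
    "network": "VM Network",
    "folder": "tkr-builds",
    "insecure_connection": "true"
}
```

Fill these in with the values for your own vCenter Server endpoint before kicking off a build.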
Step 3 - If you do not have an ESXi host that can support up to 16 vCPU, you will also need to edit packer-variables/default-args.j2 and change the cpu property to something lower, such as 8. Technically, the build process does not actually require that large of a VM, and I have already filed an issue, but this bit me as I could only deploy up to a 12 vCPU VM, and the error message was a little misleading, as it mentioned it could not power on the VM because it could not find a valid host. The resource issue I filed has now been resolved, and you only need 2 vCPU and 2GB of memory!
Step 4 - The cool thing about the Jinja files is that you can easily add more, and they will automatically get picked up as part of the build. For my custom TKR, I want to change the default TKR disk from 20GB to 60GB. To do so, you would create a new .j2 file; I named mine custom.j2, which should be placed at packer-variables/custom.j2, and the content should look like the following:
{
    "disk_size": "61440"
}
Note: The TKG Image Builder uses the Packer vsphere-iso builder, and you can influence the settings by using any of the parameters supported by that builder. You can also refer to the Kubernetes Image Builder manifest files for the node image to see a list of the parameters that are used, which is where I derived the use of the disk_size property.
Step 5 (Optional) - If you wish to modify additional VM configurations provided by vSphere, you can tweak the hack/tkgs_ovf_template.xml file and simply append valid VMX entries after this LINE. Depending on your configuration change, you can also use supported parameters from the vsphere-iso builder, but the former might be easier for folks who may not be familiar with Packer.
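As an example, an appended entry in hack/tkgs_ovf_template.xml might look like the following, using VMware's OVF ExtraConfig extension. The specific VMX key shown here (disk.enableUUID) is purely illustrative; substitute whichever valid VMX entries your use case requires:

```xml
<vmw:ExtraConfig ovf:required="false" vmw:key="disk.enableUUID" vmw:value="TRUE"/>
```
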
Step 6 - Run the following command to view the available Kubernetes versions; today there is only one, but in the future there will be more:
make list-versions
Step 7 - Run the following command with the specified version of Kubernetes, which will then pull down the container that will be used to build that version of the TKR:
make run-artifacts-container KUBERNETES_VERSION=v1.24.9+vmware.1
Step 8 - Finally, run the following command to build your TKR, which will need to include the IP Address of the machine running the build container (from the previous step), along with an unused port number and the local path where the final TKR OVA should be saved:
make build-node-image OS_TARGET=photon-3 KUBERNETES_VERSION=v1.24.9+vmware.1 TKR_SUFFIX=byoi ARTIFACTS_CONTAINER_IP=192.168.30.95 IMAGE_ARTIFACTS_PATH=/root/image ARTIFACTS_CONTAINER_PORT=8081
Note: In the example above, I am building a Photon OS TKR, but you can also select the Ubuntu target based on the output from Step 6, and you can modify the TKR suffix, which will be included in the TKR metadata.
Step 9 - Once the build starts, all output will be in the build container, and you will see a suggestion for viewing the logs, which is to run the following command:
docker logs -f v1.24.9---vmware.1-photon-3-image-builder
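A quick aside on that container name: Docker container names cannot contain a + character, so the Kubernetes version string appears to be converted by replacing the + with ---. A sketch of the presumed conversion (the naming pattern here is my inference from the observed container name, not documented behavior):

```shell
# Presumed conversion of the Kubernetes version string into a valid
# Docker container name: '+' is not allowed, so it becomes '---'
KUBERNETES_VERSION="v1.24.9+vmware.1"
CONTAINER_NAME="$(echo "${KUBERNETES_VERSION}" | sed 's/+/---/')-photon-3-image-builder"
echo "${CONTAINER_NAME}"
# prints v1.24.9---vmware.1-photon-3-image-builder
```
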
For those familiar with Packer, this output should look familiar: it will start off by uploading the ISO image to your vSphere environment, then create the VM and power it on to begin the deployment. You can monitor this for progress, as well as look at the VM Console to check whether there are any connectivity issues between the machine running the build container and the vSphere endpoint.
If you do run into failures, for example if you had a typo, you need to "clean" up the build container by running the following command each time prior to restarting the build:
make clean-containers LABEL=byoi_image_builder
The TKR build time will vary based on your available resources, but a successful build should look like the following screenshot:
If you look at the local path that you provided in the build step, you will find your custom TKR OVA, as shown in the screenshot below:
To make use of our custom TKR in our vSphere with Tanzu setup, we just need to upload the new OVA to a local vSphere Content Library, which you will need to create if you have not already. Ensure you also update the TKR Content Library for the specific Supervisor Cluster from the default VMware TKR Content Library to your newly created vSphere Content Library.
Using kubectl, we can now discover our custom TKR by running the following command:
kubectl get tkr
Next, we construct a TKC manifest referencing our custom TKR like the following:
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: william-tkc-01
  namespace: primp-industries
spec:
  distribution:
    version: v1.24.9---vmware.1-byoi
  settings:
    network:
      cni:
        name: antrea
      pods:
        cidrBlocks:
        - 193.0.2.0/16
      serviceDomain: managedcluster.local
      services:
        cidrBlocks:
        - 195.51.100.0/12
  topology:
    controlPlane:
      class: best-effort-small
      count: 1
      storageClass: vsan-default-storage-policy
    workers:
      class: best-effort-small
      count: 1
      storageClass: vsan-default-storage-policy
and then we can run the following command to deploy the TKC:
kubectl apply -f byoi-tkc.yaml
We can check on the progress of our TKC deployment by running the following:
kubectl get tkc
As you can see from the screenshot below, our custom TKR has successfully deployed and is configured with a much larger disk compared to the default TKR that VMware provides!
While the example above is a very basic one, being able to have a larger disk for a given TKR has also been a very common request. I am sure there are many more interesting use cases, and with the TKG Image Builder, you can now take advantage of more advanced configurations from the vSphere platform and enable them for your modern workloads!
I will definitely be interested to see what our users do with the TKG Image Builder, and if you have any feedback or comments, feel free to share them here and I will make sure they reach the TKG Image Builder development team.
Saravanan Subbiah says
When I run "make run-artifacts-container" I get the following error:
Using default port for artifacts container 8081
Error response from daemon: No such container: v1.25.7---vmware.3-fips.1-artifacts-server
make: *** [run-artifacts-container] Error 1
When I look up supported-version.json, the artifacts_url present there seems resolvable from the node. When I run docker pull manually against the URL, it works too. Not sure how v1.25.7+vmware.3-fips.1 gets converted to the container name above.
Also, I am trying to send in a post-processor command to the image. Would that work the same way, by just creating a j2 file with appropriate packer syntax?