With the recent introduction of Tanzu Community Edition (TCE), users can now easily get first-hand experience across VMware's Tanzu portfolio, including VMware's Enterprise Kubernetes (K8s) runtime, Tanzu Kubernetes Grid (TKG), completely for free. One request that frequently comes up from our community is the ability to use TCE with the ESXi-Arm Fling.
Currently, TCE is only supported on x86 hardware platforms, which includes ESXi-x86, but there is certainly a desire to use TCE with Arm-based hardware running on top of ESXi-Arm, especially on inexpensive Raspberry Pis for learning and exploration purposes.
I recently learned about a really cool project being developed within VMware's Office of the CTO (OCTO): a new Cluster API (CAPI) provider that lets you Bring Your Own Host (BYOH) that is already running Linux. What really intrigued me about the project was not just that they could create a TCE Workload Cluster composed of physical hosts, but that they were actually running it on Arm hardware! 🤩
My immediate reaction was to see whether this would also work with plain Linux VMs. With some trial and error and help from Jixing Jia, one of the project maintainers, I was able to confirm that this does indeed work using Ubuntu VMs running on ESXi-Arm. What was even more impressive was the realization that this not only works for both physical and virtual Arm Linux systems, but that users can also create a hybrid TCE Workload Cluster consisting of BOTH x86 and Arm nodes! 🤯
I can only imagine the possibilities this could enable in the future, where applications could potentially span CPU architectures and both virtual and physical worker nodes, each exposing different capabilities (such as GPUs) that can then be delivered based on the requirements of the application. It will be interesting to see the types of use cases the BYOH Cluster API Provider will help enable, especially pertaining to Edge computing.
If you are interested in playing with the BYOH Cluster API Provider, check out the detailed instructions below on how to get started. Since this is still in Alpha development, there are a few manual steps and currently no native TCE integration. If this is something you find interesting, feel free to leave feedback or, better yet, leave comments directly on the GitHub repo asking for the feature enhancements you would like to see, such as native support for TCE 😀
Step 1 - Install Tanzu Community Edition (TCE) Management Cluster using the Managed Cluster option running on vSphere (x86)
Step 2 - Set up Arm VMs for the TCE Workload Cluster using the latest Ubuntu (21.10) Arm ISO and perform a standard OS installation into an ESXi-Arm VM. In my example, I have two VMs called ubuntu-arm-vm-1 and ubuntu-arm-vm-2, each configured with 2 vCPUs, 4GB memory & 16GB storage, running on a single Raspberry Pi 4 (8GB).
Once the Ubuntu VMs have been installed, you will need to prepare them before they can be used with the BYOH Cluster API Provider, which simply requires kubeadm, kubelet and containerd. You can run the following commands and, once the OS has rebooted, the system will be ready.
swapoff -a
rm /swap.img
sed -i "/swap.img/d" /etc/fstab

cat <<EOF | tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

cat <<EOF | tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

apt-get update; apt-get install -y containerd
mkdir -p /etc/containerd
containerd config default | tee /etc/containerd/config.toml
sed -i 's/SystemdCgroup.*/SystemdCgroup = true/g' /etc/containerd/config.toml

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
add-apt-repository 'deb https://apt.kubernetes.io/ kubernetes-xenial main' -y
apt update
apt install -y kubelet=1.22.0-00 kubeadm=1.22.0-00 kubectl=1.22.0-00
apt-mark hold containerd kubelet kubeadm kubectl

systemctl enable kubelet.service
systemctl enable containerd.service
reboot
Step 3 - Before we can install the BYOH Cluster API Provider, we first need to upgrade the Cluster API version that TCE uses from v1alpha3 to v1beta1. Download the latest clusterctl binary to the local system that has access to your TCE Management Cluster and add it to your local system path.
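For example, on a Linux x86 system, the download might look like the following (v1.0.1 was the latest Cluster API release as of writing; adjust the version and OS/architecture in the URL to match your environment):
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v1.0.1/clusterctl-linux-amd64 -o clusterctl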
chmod +x clusterctl
mv clusterctl /usr/local/bin
Run the following command to verify that we can upgrade:
clusterctl upgrade plan
Before starting the upgrade, you also need to set the following two environment variables for the vSphere credentials used by your TCE Management Cluster:
export VSPHERE_PASSWORD=VMware1!
export VSPHERE_USERNAME=*protected email*
Then run the following to start the Cluster API upgrade:
clusterctl upgrade apply --contract v1beta1
Once the upgrade has completed, we can confirm that our TCE Management Cluster is running the latest Cluster API version (v1.0.1 as of writing this blog post) by running the following command:
tanzu mc get
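If you prefer kubectl, the clusterctl provider inventory can also be listed directly; this assumes your kubectl context is pointing at the TCE Management Cluster:
kubectl get providers -A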
Step 4 - To install the BYOH Cluster API Provider, we need to first add it to our clusterctl repository by running the following command:
cat > ~/.cluster-api/clusterctl.yaml <<EOF
providers:
  - name: byoh
    url: https://github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/releases/latest/infrastructure-components.yaml
    type: InfrastructureProvider
EOF
We can then verify that our new provider is in the repository by running the following command:
clusterctl config repositories
Next, run the following command to install the BYOH Cluster API Provider into the TCE Management Cluster:
clusterctl init --infrastructure byoh
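To confirm that the provider controller has come up, you can also check its pods; assuming the default installation, these are deployed into the byoh-system namespace:
kubectl get pods -n byoh-system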
Lastly, we need to make a copy of the TCE Management Cluster kubeconfig file, which in this example is named management.conf, and copy (SCP) it to each of your Ubuntu Arm VMs.
cp ~/.kube/config management.conf
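For example, assuming a hypothetical VM IP address of 192.168.30.10 and an ubuntu user account (replace both with the values from your environment):
scp management.conf ubuntu@192.168.30.10:~/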
Step 5 - Next, we need to clone the BYOH Cluster API Provider GitHub repo and build an Arm version of the BYOH Agent:
git clone https://github.com/vmware-tanzu/cluster-api-provider-bringyourownhost.git
cd cluster-api-provider-bringyourownhost
sed -i .bak "s/amd64/arm64/g" Makefile
make host-agent-binaries
Once completed, you should see the file under bin/byoh-hostagent-linux-arm64, which you will need to copy (SCP) to each of your Ubuntu Arm VMs.
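As with the kubeconfig, this is a standard SCP; the IP address and username below are placeholders for your environment:
scp bin/byoh-hostagent-linux-arm64 ubuntu@192.168.30.10:~/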
Note: It is currently recommended to also build your own x86 binary, since the latest version of the BYOH Cluster API Provider contains the -skip-installation argument, which the pre-built x86 binary does not yet support.
Step 6 - SSH to each of your Ubuntu Arm VMs and confirm that you have both the management.conf (from Step 4) and byoh-hostagent-linux-arm64 (from Step 5) files.
To make our Ubuntu Arm VMs available for consumption by the BYOH Cluster API Provider, we need to start the BYOH Agent by running the following command on each system:
./byoh-hostagent-linux-arm64 -kubeconfig management.conf -skip-installation true > byoh-agent.log 2>&1 &
It may also be useful to tail the byoh-agent.log file to monitor progress and/or errors.
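For example:
tail -f byoh-agent.log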
Step 7 - Navigating back to the local system that has kubectl access to your TCE Management Cluster, we should now see each of our Ubuntu VMs registered with the BYOH Cluster API Provider by listing all BYOH hosts with the following command:
kubectl get byohosts
Since the BYOH Cluster API Provider is not officially supported by TCE, and therefore not integrated into the Workload Cluster provisioning commands, we will need to use clusterctl to generate the YAML specification for our BYOH Arm Workload Cluster and make a few minor edits before we can deploy it.
For ease of use, fill out the following environment variables (note that CONTROL_PLANE_ENDPOINT_IP needs to either reside on the same line as the clusterctl command or be exported as a variable):
- CONTROL_PLANE_ENDPOINT_IP - IP Address for the BYOH Arm Cluster Control Plane
- K8S_VERSION - The K8s version we wish to use, which must match the kubeadm version installed in Step 2
- CIDR_BLOCK - Pod CIDR block (non-routable), which should not overlap with any of your existing networks (the default of 192.168.0.0/16 actually caused a conflict in my environment and took some time to debug, hence it is extracted out as a variable)
- CONTROL_PLANE_COUNT - The number of BYOH systems that will be used for the control plane
- WORKER_COUNT - The number of BYOH systems that will be used for worker nodes
K8S_VERSION=v1.22.0
CIDR_BLOCK=172.30.0.0/16
CONTROL_PLANE_COUNT=1
WORKER_COUNT=1

CONTROL_PLANE_ENDPOINT_IP=192.168.30.151 clusterctl generate cluster byoh-arm-cluster \
    --infrastructure byoh \
    --kubernetes-version ${K8S_VERSION} \
    --control-plane-machine-count ${CONTROL_PLANE_COUNT} \
    --worker-machine-count ${WORKER_COUNT} > byoh-arm-cluster.yaml

sed -i .bak -e "/cgroup-driver: cgroupfs/d" byoh-arm-cluster.yaml
sed -i .bak "s#192.168.0.0/16#${CIDR_BLOCK}#g" byoh-arm-cluster.yaml
Lastly, we are now ready to deploy our BYOH Arm Workload Cluster by running the following command:
kubectl apply -f byoh-arm-cluster.yaml
To monitor progress, you can watch the byoh-agent.log file on each of the Ubuntu Arm VMs as well as run:
kubectl get machine
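For a more detailed, condition-level view of the provisioning progress, clusterctl can also describe the cluster; run this from the system that has access to the TCE Management Cluster (this assumes the cluster was created in the default namespace):
clusterctl describe cluster byoh-arm-cluster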
This can take a few minutes depending on your setup, but if everything was configured correctly, we should now see a Running status for all BYOH systems, as shown in the screenshot above. You will also notice which Ubuntu Arm VMs have been mapped into the BYOH Arm Cluster; in this case, ubuntu-arm-vm-1 happens to map to the control plane node and ubuntu-arm-vm-2 to the worker node.
Note: In the future, it will be possible to use K8s labels to associate specific systems with either control plane or worker node roles. Right now, this assignment is performed dynamically and users do not have any control over which system will be used as a control plane or worker node.
Step 8 - Before we can connect to our new BYOH Arm Workload Cluster using the tanzu CLI, we first need to deploy a CNI (Container Network Interface) plugin. This is most evident when you run the following tanzu command: you will notice both the control plane and worker nodes are still in a creating status, which is to be expected:
tanzu cluster list
SSH to the Ubuntu Arm VM running the control plane function and run the following commands to install the Antrea CNI. You can certainly install other supported K8s CNIs, but in this example, I will be using Antrea.
mkdir ~/.kube
cp /etc/kubernetes/admin.conf ~/.kube/config
kubectl apply -f https://github.com/antrea-io/antrea/releases/download/v1.4.0/antrea.yml
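Before moving on, you can confirm the Antrea pods have come up; assuming the default Antrea manifest, they are deployed into the kube-system namespace with the app=antrea label:
kubectl get pods -n kube-system -l app=antrea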
Once the CNI is up and running, we can verify that the BYOH Arm Workload Cluster is now fully functional by running tanzu cluster list again.
Step 9 - To finally use our new BYOH Arm K8s Workload Cluster, we simply use the tanzu CLI to retrieve the kubeconfig and then switch to that cluster context, as shown in the commands below:
tanzu cluster kubeconfig get byoh-arm-cluster --admin
kubectl config use-context byoh-arm-cluster-admin@byoh-arm-cluster
kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
As we can see from the above screenshot, we now have a functional K8s cluster and both the control plane and worker node show arm64 as the architecture.
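As an optional smoke test, you could also schedule a quick workload; the official nginx image is multi-arch and includes arm64 builds, so it should run on these nodes without any changes:
kubectl create deployment nginx-test --image=nginx
kubectl get pods -o wide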
In addition to a pure Arm workload cluster, the BYOH Cluster API Provider also enables an interesting scenario where a hybrid of both x86 and Arm nodes is used within the same workload cluster, as shown below.
The ability to mix and match CPU architectures, along with both physical and virtual nodes, to construct a K8s workload cluster will definitely open up some interesting capabilities and use cases.