As part of the VMware Event Broker Appliance (VEBA) project, I was recently evaluating a newer version of Kubernetes (v1.21.3) and also switching the container runtime from Docker to Containerd. I figured this should not be that difficult, especially since we already use Containerd within Tanzu Kubernetes Grid (TKG), which is our commercial Kubernetes (K8s) offering, and its base OS is VMware Photon OS. How hard could this be, right!? (famous last words) 😂
We use kubeadm to set up K8s by reading in a very basic configuration file. After following the official K8s instructions for prepping the environment to use Containerd, I was surprised when I ran into the following error:
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
- 'crictl --runtime-endpoint /run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint /run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
Unfortunately, this led me down a huge rat hole of troubleshooting and trying various configurations and suggestions from the Internet. Ultimately, none of the suggested solutions solved my problem. After exhausting all my options and spending more time than I would like to admit, I decided to ask in the Kubernetes Slack community to see if anyone might have an idea. There were no specific suggestions that helped me understand the issue further, but there was a question about how Containerd came to be on the system, and that gave me one more thing to try.
Both Photon OS 3.0 and 4.0 ship with Containerd and, after installing the desired kubeadm, kubectl and kubelet, I had wrongly assumed that the bundled version of Containerd would simply work.
It turns out the issue was the version of Containerd (v1.4.4), which was apparently not compatible with K8s v1.21.3. Sadly, I was not able to find any sort of compatibility matrix stating this or which version is required, and hence I did not suspect the version of Containerd on the system. The workaround is to download the latest version of Containerd (v1.5.4 as of this blog post), and that immediately solved the problem! 😩
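Since nothing flagged the version mismatch for me, a small pre-flight check along these lines could catch it earlier. This is only a sketch: the 1.5.0 minimum is my assumption for illustration (all I actually observed is that v1.4.4 failed and v1.5.4 worked), the function names are made up, and it relies on `sort -V` as provided by GNU coreutils.

```shell
#!/bin/sh
# Sketch: compare the installed containerd version against a minimum.
# MIN_VERSION is an assumed threshold, not a documented requirement.
MIN_VERSION="${MIN_VERSION:-1.5.0}"

# Extract the bare version from `containerd --version` output, e.g.
# "containerd github.com/containerd/containerd v1.5.4 <commit>" -> "1.5.4".
parse_containerd_version() {
    awk '{ sub(/^v/, "", $3); print $3 }'
}

# Succeed when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only run the check when containerd is actually installed.
if command -v containerd >/dev/null 2>&1; then
    installed="$(containerd --version | parse_containerd_version)"
    if version_ge "$installed" "$MIN_VERSION"; then
        echo "containerd $installed is at least $MIN_VERSION"
    else
        echo "containerd $installed is older than $MIN_VERSION" >&2
    fi
fi
```

Running this before kubeadm init would at least point at the runtime version instead of leaving you with a generic kubelet timeout.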
I have already reported this issue internally to the Photon OS team. Given there is no clear error message from K8s stating that the version of Containerd cannot be used, I can see this coming up again in the future. The Photon OS team will be looking to see if they can bump the supported version of Containerd. Lastly, I was also browsing through the K8s Image Builder tool, which is used to build the TKG OVA images, and I saw that it also downloads Containerd manually rather than relying on the system default, which gave me confidence that this was ultimately the correct solution.
For those interested in the step-by-step instructions, you can find them below:
Step 1 - Remove both Docker and the existing version of Containerd
tdnf -y remove docker containerd
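Step 2 - Download the latest Containerd release and extract it into /usr (the archive contains a bin/ directory, so the binaries end up in /usr/bin)
curl -LO https://github.com/containerd/containerd/releases/download/v1.5.4/containerd-1.5.4-linux-amd64.tar.gz
tar -zxvf containerd-1.5.4-linux-amd64.tar.gz -C /usr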
Step 3 - Create the systemd startup file for Containerd and then enable/start the service:
cat > /usr/lib/systemd/system/containerd.service <<EOF
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
Restart=always
RestartSec=5
KillMode=process
Delegate=yes
OOMScoreAdjust=-999
LimitNOFILE=1048576
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity

[Install]
WantedBy=multi-user.target
EOF

systemctl enable containerd
systemctl start containerd
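kubeadm talks to Containerd over /run/containerd/containerd.sock (the path shown in the error output earlier), so it can be handy to confirm the socket exists before moving on. Here is a rough sketch of a wait loop; the helper name `wait_for_path` and the 30 second default are my own choices, not part of any tool.

```shell
#!/bin/sh
# Sketch: wait until a path (such as the containerd socket) exists.
# Usage: wait_for_path <path> [timeout_seconds]
wait_for_path() {
    path="$1"
    timeout="${2:-30}"
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out waiting for $path" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$path is present"
}

# Example (socket path taken from the kubeadm error output):
# wait_for_path /run/containerd/containerd.sock 30
```

If the loop times out, 'systemctl status containerd' and 'journalctl -xeu containerd' are the next places I would look.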
Step 4 - Configure the required kernel module and load it:
cat > /etc/modules-load.d/containerd.conf <<EOF
br_netfilter
EOF

modprobe br_netfilter
Step 5 - Configure the required kernel settings and load them without rebooting:
cat > /etc/sysctl.d/99-kubernetes-cri.conf <<EOF
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

sysctl --system
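To double check that the three kernel settings actually took effect, you could read them back from /proc/sys. The sketch below does that; the function names and the `PROC_SYS_ROOT` override are hypothetical (the override only exists so the logic can be exercised against a scratch directory), and the simple dot-to-slash mapping assumes keys without dots inside a path component, which holds for these three.

```shell
#!/bin/sh
# Sketch: confirm sysctl keys hold the expected values by reading /proc/sys.
# PROC_SYS_ROOT is a hypothetical override used only for testing.
PROC_SYS_ROOT="${PROC_SYS_ROOT:-/proc/sys}"

# Map a dotted sysctl key to its file, e.g.
# net.ipv4.ip_forward -> $PROC_SYS_ROOT/net/ipv4/ip_forward
sysctl_path() {
    printf '%s/%s\n' "$PROC_SYS_ROOT" "$(printf '%s' "$1" | tr '.' '/')"
}

# Succeed when the key is readable and its current value equals $2.
check_sysctl() {
    file="$(sysctl_path "$1")"
    [ -r "$file" ] && [ "$(cat "$file")" = "$2" ]
}

# Report on the three settings written in Step 5; fail if any is not 1.
report_k8s_sysctls() {
    status=0
    for key in net.bridge.bridge-nf-call-iptables \
               net.ipv4.ip_forward \
               net.bridge.bridge-nf-call-ip6tables; do
        if check_sysctl "$key" 1; then
            echo "ok: $key = 1"
        else
            echo "not set: $key" >&2
            status=1
        fi
    done
    return "$status"
}

# Example: report_k8s_sysctls
```

A missing net.bridge entry usually means the br_netfilter module from Step 4 is not loaded.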
Step 6 - Override the default Containerd settings provided by Photon OS to simply use the containerd defaults:
containerd config default > /etc/containerd/config.toml
systemctl restart containerd
Step 7 - Run kubeadm init to begin the K8s setup. In a few minutes, everything should be successfully configured:
kubeadm init --ignore-preflight-errors SystemVerification --skip-token-print --config config.yaml
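The config.yaml passed to kubeadm above is not shown in this post. For readers who want a starting point, here is a minimal sketch of what such a file might contain when targeting Containerd. Every value is an illustrative assumption rather than the actual VEBA configuration; it uses the kubeadm v1beta2 config API that applies to K8s v1.21, and `write_kubeadm_config` is just a made-up helper name.

```shell
#!/bin/sh
# Sketch: write a minimal kubeadm configuration that points at containerd.
# All values are illustrative assumptions, not the file used in this post.
# Usage: write_kubeadm_config <output-path>
write_kubeadm_config() {
    cat > "$1" <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.21.3
EOF
}

# Example: write_kubeadm_config config.yaml
```

The criSocket entry is the piece that tells kubeadm which runtime endpoint to use; anything else (networking, cluster name and so on) would be added to suit your environment.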