When troubleshooting your vSphere with Tanzu environment, you may need to SSH to the Control Plane of your Tanzu Kubernetes Grid (TKG) Cluster. This was something I had to do to verify some basic network connectivity. At a high level, we need to log in to our Supervisor Cluster and retrieve the SSH secret for our TKG Cluster. Since this question recently came up, below are the instructions.
UPDATE (10/10/20) - It looks like it is also possible to retrieve the TKG Cluster credentials without needing to SSH directly to the Supervisor Control Plane VM; see Option 1 for the alternate solution.
Option 1:
Step 1 - Login to the Supervisor Control Plane using the following command:
kubectl vsphere login --server=172.17.31.129 -u *protected email* --insecure-skip-tls-verify
Step 2 - Next, we need to retrieve the SSH password secret for our TKG Cluster and perform a base64 decode to recover the plain text value. You will need two pieces of information and then substitute them into the command below:
- The name of your vSphere Namespace which was created in your vSphere with Tanzu environment; in my example it is called primp-industries
- The name of your TKG Cluster; in my example it is called william-tkc-01, and the secret name will be [tkg-cluster-name]-ssh-password as shown in the example below
kubectl -n primp-industries get secrets william-tkc-01-ssh-password -o jsonpath='{.data.ssh-passwordkey}' | base64 -d
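To see what the final pipe stage does, here is a minimal sketch with a made-up secret value (not a real password): the secret stores the password base64-encoded, and `base64 -d` recovers the plain text.

```shell
# The secret value is stored base64-encoded; decoding recovers the plain text.
# 'Vk13YXJlMSE=' is a made-up example value, not a real secret.
echo 'Vk13YXJlMSE=' | base64 -d
# prints: VMware1!
```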
Step 3 - Finally, you can now SSH to the TKG Cluster from any system which has network connectivity; this can be the Supervisor Cluster Control Plane VM or another system. The SSH username for the TKG Cluster is vmware-system-user, and the password is the value you decoded in the previous step.
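The steps above can be stitched into a small helper script. This is only a sketch: the namespace, cluster name, and node IP are placeholders for your own values, and the kubectl line is commented out because it requires an active `kubectl vsphere login` session against a live Supervisor Cluster.

```shell
#!/bin/sh
# Sketch of the Option 1 flow; all values below are placeholders.
NAMESPACE="primp-industries"   # your vSphere Namespace
CLUSTER="william-tkc-01"       # your TKG Cluster name
NODE_IP="10.244.0.34"          # hypothetical TKG node IP

# Requires an active 'kubectl vsphere login' session:
# PASSWORD=$(kubectl -n "$NAMESPACE" get secrets "${CLUSTER}-ssh-password" \
#   -o jsonpath='{.data.ssh-passwordkey}' | base64 -d)

# SSH as the built-in TKG node account, supplying the decoded
# password when prompted:
echo "ssh vmware-system-user@${NODE_IP}"
```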
Option 2:
Step 1 - SSH to the VCSA and then run the following script to retrieve the Supervisor Cluster Control Plane VM credentials:
/usr/lib/vmware-wcp/decryptK8Pwd.py
Step 2 - SSH to the IP Address using the root username and the password provided by the previous command
Step 3 - Next, we need to retrieve the SSH password secret for our TKG Cluster and perform a base64 decode to recover the plain text value. You will need two pieces of information and then substitute them into the command below:
- The name of your vSphere Namespace which was created in your vSphere with Tanzu environment; in my example it is called primp-industries
- The name of your TKG Cluster; in my example it is called william-tkc-01, and the secret name will be [tkg-cluster-name]-ssh-password as shown in the example below
kubectl -n primp-industries get secrets william-tkc-01-ssh-password -o jsonpath='{.data.ssh-passwordkey}' | base64 -d
Step 4 - Finally, you can now SSH to the TKG Cluster from any system which has network connectivity; this can be the Supervisor Cluster Control Plane VM or another system. The SSH username for the TKG Cluster is vmware-system-user, and the password is the value you decoded in the previous step.
Paul says
Hi William, thanks also for this very helpful post. Do you have any hint for gracefully shutting down the Tanzu Workload Environment? I tested it a lot and found that after a "normal" shutdown of the ESX Cluster the Workload did not work any more... Thanks a lot as always! Paul
William Lam says
It's not something I've really looked into, but fellow colleague Ryan Johnson did look into this when we first released vSphere w/Tanzu using NSX-T (since HAProxy wasn't out at the time). You could see if this still applies https://tenthirtyam.org/2020/07/20/shutdown-startup-vk8s-wld/
Paul says
Thanks a lot William, I will try it out and maybe post here the results!
Paul says
Update: It works perfectly. Thank you so much. The important hint is that vCenter must not be powered on before the ESX hosts are online. This was my fault. Now it works perfectly. Thank you again! Paul
Paul says
Unfortunately I have to say that it does not work perfectly after all... After startup the supervisor nodes become ready, and the TKG control plane nodes do as well. But the worker nodes have the problem that they seem to be ready for about a minute, then the nodes become not-ready. I tried a few times, always the same problem. The only solution I have found so far is to delete the TKG cluster and redeploy it (which is not really a solution...). Maybe you have time to test a shutdown; it would be interesting if you face the same problems. Tested with 17.7 and 18.5. Thanks a lot, Paul
Paul says
William, I have another question: in trying to automate the installation of Tanzu I have this problem. After installing Tanzu (thanks for your scripts again), I would also like to automatically deploy a TKG cluster. Since we have to use kubectl commands to do so, it would be great if kubectl vsphere login also had a --password option. I could not manage to automate this part, since the password has to be entered manually. I am sure you have an idea how to automate this as well. Thanks a lot, Paul