As much as I enjoy kubectl'ing logs in real time for troubleshooting and debugging, this usually does not scale beyond a couple of Kubernetes (K8s) Clusters, if you are lucky. Even then, you will not retain any of the historical logs, which may be required for deeper analysis or auditing purposes. This is usually solved with a centralized log management platform, and while working with Tanzu Kubernetes Grid (TKG) running on VMware Cloud on AWS, a solution like vRealize Log Insight Cloud (vRLIC) makes a lot of sense.
While browsing through the vRLIC console, I noticed that it supports a number of log sources, including K8s, which was exactly what I was looking for. However, after going through the instructions for configuring fluentd on my TKG Cluster, I found that nothing was being sent. After a bit of debugging, I realized a few steps required to set this up on a TKG Cluster were actually missing.
I eventually figured it out and will be sharing this feedback with the vRLIC folks, but in the meantime, you can follow the instructions below to forward both system and application logs to vRLIC from your TKG Cluster, or for that matter any K8s deployment that has outbound connectivity.
Step 1 - Create a new API key that will be used to send the logs to vRLIC
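Before wiring up fluentd, you can sanity-check the key by POSTing a test event directly to the same ingestion endpoint that fluentd will use later in this setup. This is just a quick sketch: the single-object JSON payload with a text field mirrors what the fluentd output plugin emits below, but the exact payload shape is my assumption rather than something from the official docs. Replace <API-KEY> with the key you just created; a 2xx response suggests the key is accepted.

curl -X POST \
  -H "Authorization: Bearer <API-KEY>" \
  -H "Content-Type: application/json" \
  -d '[{"text":"hello from my TKG cluster"}]' \
  https://data.mgmt.cloud.vmware.com/le-mans/v1/streams/ingestion-pipeline-stream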
Step 2 - Next, we need to create a new ServiceAccount called fluentd-lint-logging that will be used to access the K8s system and application logs, which will be mapped to a specific ClusterRole using a ClusterRoleBinding, as shown in the snippet below. The following command will write the required definitions to a file called rbac.yml:
cat > rbac.yml << EOF
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: fluentd
  name: fluentd-lint-logging
  namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: fluentd-clusterrole
rules:
  - apiGroups:
      - ""
    resources:
      - "namespaces"
      - "pods"
    verbs:
      - "list"
      - "get"
      - "watch"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: fluentd-clusterrole
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentd-clusterrole
subjects:
  - kind: ServiceAccount
    name: fluentd-lint-logging
    namespace: kube-system
EOF
Step 3 - Create our new ServiceAccount by running the following command:
kubectl apply -f rbac.yml
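Optionally, you can confirm the objects were created with a couple of standard kubectl queries (these are not part of the original vRLIC instructions, just a quick check):

kubectl -n kube-system get serviceaccount fluentd-lint-logging
kubectl get clusterrole,clusterrolebinding fluentd-clusterrole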
Step 4 - Run the following command to create the fluent.conf, which specifies how to collect the system and application logs from our TKG Clusters. You will need to edit the file afterwards and replace FILL-ME-IN with the API Key you retrieved in Step 1.
cat > fluent.conf << EOF
<source>
  @id in_tail_container_logs
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag raw.kubernetes.*
  read_from_head true
  <parse>
    @type multi_format
    <pattern>
      format json
      time_key time
      time_format %Y-%m-%dT%H:%M:%S.%NZ
    </pattern>
    <pattern>
      format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
      time_format %Y-%m-%dT%H:%M:%S.%N%:z
    </pattern>
  </parse>
</source>

# Detect exceptions in the log output and forward them as one log entry.
<match raw.kubernetes.**>
  @id raw.kubernetes
  @type detect_exceptions
  remove_tag_prefix raw
  message log
  stream stream
  multiline_flush_interval 5
  max_bytes 500000
  max_lines 1000
</match>

# Concatenate multi-line logs
<filter **>
  @id filter_concat
  @type concat
  key message
  multiline_end_regexp /\n$/
  separator ""
</filter>

# Enriches records with Kubernetes metadata
<filter kubernetes.**>
  @id filter_kubernetes_metadata
  @type kubernetes_metadata
  watch false
</filter>

<match **>
  @type vmware_log_intelligence
  endpoint_url https://data.mgmt.cloud.vmware.com/le-mans/v1/streams/ingestion-pipeline-stream
  verify_ssl false
  <headers>
    Content-Type application/json
    Authorization Bearer FILL-ME-IN
    structure simple
  </headers>
  <buffer>
    chunk_limit_records 300
    flush_interval 3s
    retry_max_times 3
  </buffer>
  <format>
    @type json
    tag_key text
  </format>
</match>
EOF
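Rather than editing the file by hand, you could also substitute the key in place. A minimal sketch, assuming your API key is stored in the shell variable API_KEY (on macOS, sed -i requires an empty suffix argument, e.g. sed -i ''):

# Replace the FILL-ME-IN placeholder with the API key from Step 1
sed -i "s/FILL-ME-IN/${API_KEY}/" fluent.conf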
Step 5 - To make our fluent.conf available to K8s, we need to create a ConfigMap resource by running the following command:
kubectl -n kube-system create configmap lint-fluent-config --from-file=fluent.conf
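To verify the ConfigMap contains the full fluent.conf (including your API key rather than the FILL-ME-IN placeholder), you can dump it back out:

kubectl -n kube-system get configmap lint-fluent-config -o yaml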
Step 6 - Run the following command to create our fluentd DaemonSet YAML, which references our ConfigMap from the previous step:
cat > lint-fluent.yml << EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-lint-logging
  namespace: kube-system
  labels:
    k8s-app: fluentd-lint-logging
    app: fluentd-lint-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      name: fluentd-lint-logging
  template:
    metadata:
      labels:
        name: fluentd-lint-logging
        app: fluentd-lint-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccount: fluentd-lint-logging
      serviceAccountName: fluentd-lint-logging
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      containers:
        - name: fluentd-lint
          image: docker.io/vmware/log-intelligence-fluentd
          command: ["fluentd"]
          env:
            - name: FLUENTD_ARGS
              value: --no-supervisor -q
          resources:
            limits:
              memory: 500Mi
            requests:
              cpu: 100m
              memory: 200Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlogcontainers
              mountPath: /var/log/containers
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: config-volume
              mountPath: /etc/fluent
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: config-volume
          configMap:
            name: lint-fluent-config
        - name: lint-fluent-volume
          emptyDir: {}
        - name: var-logs
          emptyDir: {}
EOF
Step 7 - Finally, we deploy fluentd into our TKG Cluster by running the following command:
kubectl apply -f lint-fluent.yml
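Before checking the vRLIC console, it is worth confirming that a fluentd pod is running on every node and that fluentd itself is not logging errors. These are standard kubectl checks rather than part of the original instructions:

kubectl -n kube-system rollout status daemonset/fluentd-lint-logging
kubectl -n kube-system get pods -l name=fluentd-lint-logging
kubectl -n kube-system logs daemonset/fluentd-lint-logging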
If everything was set up correctly, we should see logs from our TKG Cluster, as shown in the screenshot below.