Kubernetes: Defining a CronJob for Collecting Log Data

Sebastian
5 min read · Sep 7, 2020


This is the final article in my Kube Log Exporter series. The first article introduced the basic design and showed how to execute the log exporter from the machine that runs your kubectl commands. The second article explained the ServiceAccount, ClusterRole and ClusterRoleBinding resources that are needed to run the log exporter inside the cluster. In this article, I explain how to define a cron job that runs the Kube Log Exporter automatically on a regular schedule.

This article originally appeared at my blog.

Kubernetes CronJobs

A CronJob is, in its basic specification, similar to a Deployment or ReplicaSet: it defines a set of containers to run, and it defines the conditions under which they run. These conditions are what make a CronJob a unique Kubernetes resource. Let's briefly discuss the most important ones:

  • schedule - Uses the Linux crontab syntax to precisely define the minute, hour, day of month, month and day of week when the job should run.
  • completions - The number of successful pod completions that need to be achieved before the job is considered successful. Potential use cases are jobs that optimize storage in a database, or clean up files, and which may have a different set of success criteria.
  • parallelism - Controls whether the job may run multiple pods in parallel or only sequentially.
  • activeDeadlineSeconds - The maximum timespan in which the batch job needs to finish. If it reaches this limit, Kubernetes terminates the job and considers it failed.

There are many more options available, so take a look at the official Kubernetes documentation as well.
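To see how these options fit together, here is a minimal sketch of a CronJob using all of them. The name, image and values are illustrative, not taken from the log exporter:

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: example-cron
spec:
  schedule: "30 2 * * *"          # crontab syntax: run at 02:30 every day
  jobTemplate:
    spec:
      completions: 1              # one successful pod run completes the job
      parallelism: 1              # do not run multiple pods at once
      activeDeadlineSeconds: 300  # fail the job if it runs longer than 5 minutes
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: example
              image: busybox
              args: ['sh', '-c', 'echo cleanup done']
```

Note that completions, parallelism and activeDeadlineSeconds belong to the Job spec inside jobTemplate, while schedule sits at the top level of the CronJob spec.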

KubeLogExporter CronJob

The KubeLogExporter cron job that I’m using needs to fulfil the following requirements:

  • It needs to run every hour
  • It needs to be successful, i.e. no error occurs when reading or storing a log
  • It needs to use the service account that we discussed in the last article, so that it has the proper access rights to namespaces, pods and logs
  • It needs to use a persistent volume to store the log files, so that the same log files are amended regardless of the node on which the job runs

Let’s develop the CronJob resource definition bit by bit. The first part fulfils the scheduling requirements. The schedule is defined at spec.schedule to run each hour. The job needs to complete exactly once (spec.jobTemplate.spec.completions) and will restart in case of an error (spec.jobTemplate.spec.template.spec.restartPolicy: OnFailure).

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: kube-log-exporter-cron
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      completions: 1
      template:
        spec:
          restartPolicy: OnFailure
          ...

The declaration of the ServiceAccount is very simple: we add spec.jobTemplate.spec.template.spec.serviceAccountName.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: kube-log-exporter-cron
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      completions: 1
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: log-exporter-sa
          ...

Now we need to add the persistent volume declaration. The volume needs to be mountable by only a single node at a time (access mode ReadWriteOnce), and I reserve 1Gi for it.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: log-exporter-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

In the Kubernetes distribution of my choice, K3S, persistent volumes are created automatically when a PersistentVolumeClaim is defined. In other distributions you also need to set up the PersistentVolume yourself, but this is not the focus of this article.
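For a distribution without automatic provisioning, a matching PersistentVolume could look like the following hostPath-based sketch. The resource name and the node-local path are assumptions for illustration, not part of the exporter setup:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: log-exporter-pv
spec:
  capacity:
    storage: 1Gi          # must cover the 1Gi requested by the claim
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /var/lib/kube-log-exporter   # assumed directory on the node
```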

Now we use the PersistentVolumeClaim to define a volume in spec.jobTemplate.spec.template.spec.volumes, and then reference this volume as a mounted volume in the container at spec.jobTemplate.spec.template.spec.containers[].volumeMounts.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: kube-log-exporter-cron
spec:
  schedule: '0 * * * *'
  jobTemplate:
    spec:
      completions: 1
      template:
        spec:
          serviceAccountName: log-exporter-sa
          containers:
            - name: kube-log-exporter
              image: docker.admantium.com/kube-log-exporter:0.1.9.12
              volumeMounts:
                - name: log-exporter-volume
                  mountPath: /etc/kube-log-exporter/logs
          restartPolicy: Never
          volumes:
            - name: log-exporter-volume
              persistentVolumeClaim:
                claimName: log-exporter-pvc

Executing the CronJob

Now we create the cron job with kubectl create -f kube-log-exporter-cron-job.yaml. Once the job runs (for testing purposes, you can also run the job every minute with schedule: "*/1 * * * *"), we can see the job history.

> kubectl describe cronjob kube-log-exporter-cron

Name:               kube-log-exporter-cron
Namespace:          default
Labels:             <none>
Schedule:           0 * * * *
...
Last Schedule Time: Mon, 25 Aug 2020 19:00:00 +0200
Active Jobs:        <none>
Events:
  Type    Reason            Age  From                Message
  ----    ------            ---  ----                -------
  Normal  SuccessfulCreate  12m  cronjob-controller  Created job kube-log-exporter-cron-1590426000
  Normal  SawCompletedJob   12m  cronjob-controller  Saw completed job: kube-log-exporter-cron-1590426000, status: Complete
  Normal  SuccessfulDelete  12m  cronjob-controller  Deleted job kube-log-exporter-cron-1590415200
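If you do not want to wait for the next scheduled run, kubectl can also create a one-off Job directly from the CronJob definition. The job name manual-run is arbitrary:

```shell
# trigger a single run from the existing CronJob
kubectl create job --from=cronjob/kube-log-exporter-cron manual-run
# wait until the job completes, then inspect its output
kubectl wait --for=condition=complete job/manual-run --timeout=120s
kubectl logs job/manual-run
```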

And here is an example of the created logfiles.

> ls -la /etc/kube-log-exporter/logs
-rw-r--r-- 1 root root  4515 Aug 25 19:00 lighthouse-78cc7475c7-74ctt_lighthouse.log
-rw-r--r-- 1 root root 6012 Aug 25 19:00 lighthouse-78cc7475c7-gcl94_lighthouse.log
-rw-r--r-- 1 root root 6873 Aug 25 19:00 lighthouse-78cc7475c7-k2cv7_lighthouse.log
-rw-r--r-- 1 root root 7634 Aug 25 19:00 lighthouse-78cc7475c7-l7zpv_lighthouse.log
-rw-r--r-- 1 root root 4636 Aug 25 19:00 lighthouse-78cc7475c7-wh2gk_lighthouse.log
-rw-r--r-- 1 root root 25741 Aug 25 19:00 redis-6b746f4d9b-8tjds_redis.log
....
> cat /etc/kube-log-exporter/logs/redis-6b746f4d9b-8tjds_redis.log
1:C 25 Aug 2020 16:21:04.675 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 25 Aug 2020 16:21:04.675 # Redis version=6.0.1, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 25 Aug 2020 16:21:04.675 # Configuration loaded
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 6.0.1 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 7139
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'
1:M 25 Aug 2020 16:21:04.678 # Server initialized

CronJob: Complete Resource Definition

Here is the complete version again.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: kube-log-exporter-cron
spec:
  schedule: '0 * * * *'
  jobTemplate:
    spec:
      completions: 1
      template:
        spec:
          serviceAccountName: log-exporter-sa
          containers:
            - name: kube-log-exporter
              image: docker.admantium.com/kube-log-exporter:0.1.9.12
              args: ['node', 'cluster.js']
              volumeMounts:
                - name: log-exporter-volume
                  mountPath: /etc/kube-log-exporter/logs
          restartPolicy: Never
          volumes:
            - name: log-exporter-volume
              persistentVolumeClaim:
                claimName: log-exporter-pvc
          imagePullSecrets:
            - name: registry-secret

Conclusion

Kubernetes CronJobs define periodically scheduled tasks in your cluster. Typical use cases are maintenance tasks such as cleaning up files, updating indexes, or collecting data. If you want to store log data in plain files, a cron job is a straightforward solution. This article showed how to define a cron job that uses the KubeLogExporter to persist Pod log data in files on a persistent volume.
