Protect Rok System Pods¶
This guide describes the necessary steps to patch an existing Rok cluster on Kubernetes in order to protect essential Rok System Pods from termination under CPU pressure.
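As background, containers that set no resource requests leave their Pods in the BestEffort QoS class, which receives the smallest share of CPU when a node is under contention; adding CPU requests gives the Rok Pods a guaranteed share. As an optional, minimal sketch, you can inspect the QoS class of the Rok Pods before and after applying this guide. Note that the app=rok label selector below is an assumption and may need adjusting to match your deployment:

   # Optional sketch: list the QoS class of the Rok Pods.
   # The app=rok label is assumed; adjust it to your deployment.
   root@rok-tools:~# kubectl -n rok get pods -l app=rok \
   >     -o custom-columns=NAME:.metadata.name,QOS:.status.qosClass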
What You’ll Need¶
- A configured management environment.
- An existing Kubernetes cluster.
- A working Rok deployment.
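Optionally, you can sanity-check these prerequisites from your management environment before starting. This is only a suggested sketch using standard kubectl commands, not an official requirement:

   root@rok-tools:~# kubectl get nodes
   root@rok-tools:~# kubectl -n rok get rokcluster
   root@rok-tools:~# kubectl -n rok-system get sts rok-operator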
Procedure¶
1. Get the version of your Rok Operator:

   root@rok-tools:~# kubectl get -n rok-system sts rok-operator --no-headers \
   >     -o custom-columns=:.spec.template.spec.containers[0].image
   gcr.io/arrikto-deploy/rok-operator:release-1.1-l0-release-1.1

   If the image tag of your Rok Operator is release-1.1-l0-release-1.1 or newer, you may proceed to the Verify section.
2. Watch the rok-csi-controller logs and ensure that no pipelines or snapshot policies are running, that is, nothing is logged for 30 seconds:

   root@rok-tools:~# kubectl -n rok logs -l app=rok-csi-controller -c csi-controller -f --tail=100

3. Scale down the rok-operator StatefulSet:

   root@rok-tools:~# kubectl -n rok-system scale sts rok-operator --replicas=0
   statefulset.apps/rok-operator scaled

4. Ensure rok-operator has scaled down to zero:

   root@rok-tools:~# kubectl -n rok-system get sts rok-operator
   NAME           READY   AGE
   rok-operator   0/0     2h

5. Scale down the rok-csi-controller StatefulSet:

   root@rok-tools:~# kubectl -n rok scale sts rok-csi-controller --replicas=0
   statefulset.apps/rok-csi-controller scaled

6. Ensure rok-csi-controller has scaled down to zero:

   root@rok-tools:~# kubectl get -n rok sts rok-csi-controller
   NAME                 READY   AGE
   rok-csi-controller   0/0     2h

7. Watch the rok-csi-node logs and ensure that all pending operations have finished, that is, nothing is logged for 30 seconds:

   root@rok-tools:~# kubectl -n rok logs -l app=rok-csi-node -c csi-node -f --tail=100

8. Delete the rok-csi-node DaemonSet:

   root@rok-tools:~# kubectl -n rok delete ds rok-csi-node
   daemonset.apps "rok-csi-node" deleted

9. Specify the image for the new rok-operator, which will assign appropriate CPU requests to all Rok and Rok CSI resources:

   root@rok-tools:~# export ROK_OPERATOR_IMAGE=gcr.io/arrikto-deploy/rok-operator:release-1.1-l0-release-1.1

10. Patch rok-operator to pull the new image (an optional patch-file variant is sketched after this procedure):

    root@rok-tools:~# kubectl -n rok-system patch sts rok-operator \
    >     --patch "{\"spec\": {\"template\": {\"spec\": {\"containers\": [{\"name\": \"rok-operator\", \"image\": \"${ROK_OPERATOR_IMAGE}\"}]}}}}"
    statefulset.apps/rok-operator patched

11. Scale rok-operator back up to its initial size to recreate the Rok and Rok CSI resources (optional checks to confirm the rollout are sketched after this procedure):

    root@rok-tools:~# kubectl -n rok-system scale sts rok-operator --replicas=1
    statefulset.apps/rok-operator scaled
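Optionally, if the escaped JSON in the patch command of step 10 is awkward to type, you can keep the patch in a file and pass it to kubectl. This is only a sketch of an equivalent invocation; the rok-operator-image-patch.json filename is a placeholder:

   # Optional sketch: write the strategic merge patch to a file, then apply it.
   root@rok-tools:~# cat > rok-operator-image-patch.json <<EOF
   > {"spec": {"template": {"spec": {"containers": [{"name": "rok-operator", "image": "${ROK_OPERATOR_IMAGE}"}]}}}
   > EOF
   root@rok-tools:~# kubectl -n rok-system patch sts rok-operator \
   >     --patch "$(cat rok-operator-image-patch.json)"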
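Before moving on to Verify, you may also want to confirm that the operator rollout finished and that it recreated the workloads you scaled down or deleted above. A minimal sketch, assuming the resource names used in this guide:

   # Optional sketch: wait for the operator rollout, then check the recreated CSI workloads.
   root@rok-tools:~# kubectl -n rok-system rollout status sts rok-operator
   root@rok-tools:~# kubectl -n rok get ds rok-csi-node
   root@rok-tools:~# kubectl -n rok get sts rok-csi-controller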
Verify¶
1. Ensure that the Rok cluster is up and running:

   root@rok-tools:~# watch kubectl get rokcluster -n rok
   NAME   VERSION                          HEALTH   TOTAL MEMBERS   READY MEMBERS   PHASE     AGE
   rok    release-1.1-l0-release-1.1-rc5   OK       3               3               Running   2h

2. Ensure that rok, rok-csi-node, and rok-csi-controller now have CPU requests:

   root@rok-tools:~# kubectl get -n rok daemonset rok --no-headers \
   >     -o custom-columns=:.spec.template.spec.containers[*].resources
   map[requests:map[cpu:1 memory:2Gi]]

   root@rok-tools:~# kubectl get -n rok daemonset rok-csi-node --no-headers \
   >     -o custom-columns=:.spec.template.spec.containers[*].resources
   map[requests:map[cpu:450m memory:900Mi]],map[requests:map[cpu:50m memory:100Mi]]

   root@rok-tools:~# kubectl get -n rok sts rok-csi-controller --no-headers \
   >     -o custom-columns=:.spec.template.spec.containers[*].resources
   map[requests:map[cpu:100m memory:125Mi]],map[requests:map[cpu:100m memory:125Mi]],map[requests:map[cpu:100m memory:125Mi]],map[requests:map[cpu:200m memory:250Mi]]
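If the custom-columns output above is hard to read, the same information can be printed per container with a jsonpath query. This is an optional sketch, not part of the official verification:

   # Optional sketch: print each container's name and resource requests, one per line.
   root@rok-tools:~# kubectl -n rok get ds rok-csi-node \
   >     -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.resources.requests}{"\n"}{end}'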
Summary¶
You have successfully patched all Rok System Pods with CPU requests, protecting them against CPU starvation in CPU-intensive scenarios.
What’s Next¶
The next step is to protect the Rok External Services Pods and the Arrikto EKF Pods.