Protect Rok System Pods

This guide describes how to patch an existing Rok cluster on Kubernetes so that essential Rok System Pods are protected from being terminated under CPU pressure.

Procedure

  1. Get the version of your Rok Operator:

    root@rok-tools:~# kubectl get -n rok-system sts rok-operator --no-headers \
    > -o custom-columns=:.spec.template.spec.containers[0].image
    gcr.io/arrikto-deploy/rok-operator:release-1.1-l0-release-1.1
    

    If the image tag of your Rok Operator is release-1.1-l0-release-1.1 or newer, skip the remaining steps and proceed to the Verify section.

  2. Watch the rok-csi-controller logs and ensure that no pipelines or snapshot policies are running, that is, that nothing is logged for 30 seconds:

    root@rok-tools:~# kubectl -n rok logs -l app=rok-csi-controller -c csi-controller -f --tail=100
    
  3. Scale down the rok-operator StatefulSet:

    root@rok-tools:~# kubectl -n rok-system scale sts rok-operator --replicas=0
    statefulset.apps/rok-operator scaled
    
  4. Ensure rok-operator has scaled down to zero:

    root@rok-tools:~# kubectl -n rok-system get sts rok-operator
    NAME           READY   AGE
    rok-operator   0/0     2h
    
  5. Scale down the rok-csi-controller StatefulSet:

    root@rok-tools:~# kubectl -n rok scale sts rok-csi-controller --replicas=0
    statefulset.apps/rok-csi-controller scaled
    
  6. Ensure rok-csi-controller has scaled down to zero:

    root@rok-tools:~# kubectl get -n rok sts rok-csi-controller
    NAME                 READY   AGE
    rok-csi-controller   0/0     2h
    
  7. Watch the rok-csi-node logs and ensure that all pending operations have finished, that is, that nothing is logged for 30 seconds:

    root@rok-tools:~# kubectl -n rok logs -l app=rok-csi-node -c csi-node -f --tail=100
    
  8. Delete the rok-csi-node DaemonSet:

    root@rok-tools:~# kubectl -n rok delete ds rok-csi-node
    daemonset.apps "rok-csi-node" deleted
    
  9. Specify the image for the new rok-operator, which will assign appropriate CPU requests to all Rok and Rok CSI resources:

    root@rok-tools:~# export ROK_OPERATOR_IMAGE=gcr.io/arrikto-deploy/rok-operator:release-1.1-l0-release-1.1
    
  10. Patch rok-operator to pull the new image:

    root@rok-tools:~# kubectl -n rok-system patch sts rok-operator \
    > --patch "{\"spec\": {\"template\": {\"spec\": {\"containers\": [{\"name\": \"rok-operator\", \"image\": \"${ROK_OPERATOR_IMAGE}\"}]}}}}"
    statefulset.apps/rok-operator patched
    
  11. Scale rok-operator back up to its initial size so that it recreates the Rok and Rok CSI resources:

    root@rok-tools:~# kubectl -n rok-system scale sts rok-operator --replicas=1
    statefulset.apps/rok-operator scaled
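
The image check in step 1 and the hand-escaped JSON in step 10 can be wrapped in small shell helpers. The following is a minimal sketch, not part of the official procedure: image_tag and operator_patch are hypothetical helper names, and image_tag assumes an image reference that ends in a tag (no digest, and no untagged reference with a registry port). It only extracts the tag; deciding whether a different tag is "newer" is left to the reader.

```shell
# Hypothetical helpers; the names are illustrative and not part of Rok.

# Print the tag of an image reference, for example
# gcr.io/arrikto-deploy/rok-operator:release-1.1-l0-release-1.1
# Assumption: the reference ends in ":<tag>".
image_tag() {
    # Strip everything up to and including the last colon.
    printf '%s\n' "${1##*:}"
}

# Print the patch body from step 10 for the image given as $1.
# Using a single-quoted printf format avoids escaping every double quote.
operator_patch() {
    printf '{"spec": {"template": {"spec": {"containers": [{"name": "rok-operator", "image": "%s"}]}}}}\n' "$1"
}
```

With these helpers defined, step 10 could be written as follows, assuming ROK_OPERATOR_IMAGE is exported as in step 9:

    kubectl -n rok-system patch sts rok-operator --patch "$(operator_patch "${ROK_OPERATOR_IMAGE}")"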
    

Verify

  1. Ensure that the Rok cluster is up and running:

    root@rok-tools:~# watch kubectl get rokcluster -n rok
    NAME   VERSION                          HEALTH   TOTAL MEMBERS   READY MEMBERS  PHASE     AGE
    rok    release-1.1-l0-release-1.1-rc5   OK       3               3              Running   2h
    
  2. Ensure that rok, rok-csi-node and rok-csi-controller now have CPU requests:

    root@rok-tools:~# kubectl get -n rok daemonset rok --no-headers \
    > -o custom-columns=:.spec.template.spec.containers[*].resources
    map[requests:map[cpu:1 memory:2Gi]]
    
    root@rok-tools:~# kubectl get -n rok daemonset rok-csi-node --no-headers \
    > -o custom-columns=:.spec.template.spec.containers[*].resources
    map[requests:map[cpu:450m memory:900Mi]],map[requests:map[cpu:50m memory:100Mi]]
    
    root@rok-tools:~# kubectl get -n rok sts rok-csi-controller --no-headers \
    > -o custom-columns=:.spec.template.spec.containers[*].resources
    map[requests:map[cpu:100m memory:125Mi]],map[requests:map[cpu:100m memory:125Mi]],map[requests:map[cpu:100m memory:125Mi]],map[requests:map[cpu:200m memory:250Mi]]
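
Reading the output above by eye is error-prone when a Pod has several containers. As a rough sketch, the check can be scripted: all_have_cpu_requests is a hypothetical helper that reads one line of the custom-columns output from standard input and succeeds only if every comma-separated container entry contains a cpu request. It assumes the "map[requests:map[cpu:... memory:...]]" format shown in this section and has not been tested against other kubectl output formats.

```shell
# Hypothetical helper; the name is illustrative and not part of Rok.
# Exit status 0: every container entry on stdin has a cpu request.
# Exit status 1: at least one entry lacks a cpu request, or stdin is empty.
all_have_cpu_requests() {
    # One container entry per line, then count entries whose
    # "requests:map[...]" section mentions "cpu:".
    tr ',' '\n' | awk '
        NF == 0 { next }                      # ignore blank lines
        { total++ }
        /requests:map\[[^]]*cpu:/ { ok++ }
        END { exit !(total > 0 && ok == total) }
    '
}
```

For example, the rok-csi-node check could be run as:

    kubectl get -n rok daemonset rok-csi-node --no-headers \
        -o 'custom-columns=:.spec.template.spec.containers[*].resources' | all_have_cpu_requests && echo "CPU requests set"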
    

Summary

You have successfully patched all Rok System Pods with CPU requests, protecting them against CPU starvation in CPU-intensive scenarios.

What's Next

The next step is to protect the Rok External Services Pods and the Arrikto EKF Pods.