Protect Arrikto EKF Pods

This guide describes how to patch an existing Rok cluster on Kubernetes so that essential Arrikto EKF Pods are protected from termination when a node comes under memory pressure. To do so, you will assign one of the pre-defined Kubernetes Priority Classes (system-node-critical, system-cluster-critical) to all system Pods.
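
Before patching anything, it can help to confirm that both Priority Classes exist in the cluster. The following is a minimal sketch, assuming kubectl is already configured against the target cluster; check_priority_classes is a hypothetical helper name and not part of Rok or EKF.

```shell
# Sketch: list the two pre-defined Priority Classes this guide relies on.
# check_priority_classes is a hypothetical helper name (an assumption).
# kubectl exits non-zero if either class is missing.
check_priority_classes() {
    kubectl get priorityclass system-node-critical system-cluster-critical
}

# Example: check_priority_classes
```

Of the two classes, system-node-critical carries the higher priority value, which is why the procedure reserves it for DaemonSets.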

Procedure

  1. Patch all Deployments in the relevant namespaces to assign them the system-cluster-critical Priority Class:

    root@rok-tools:~# for ns in auth cert-manager ingress-nginx istio-system knative-monitoring knative-serving kubeflow ; do
    >     for name in $(kubectl get deployments -n $ns --no-headers | awk '{print $1}') ; do
    >         kubectl patch deployment -n $ns $name --patch '{"spec": {"template": {"spec": {"priorityClassName": "system-cluster-critical"}}}}'
    >     done
    > done
    deployment.apps/dex patched
    deployment.apps/cert-manager patched
    deployment.apps/cert-manager-cainjector patched
    ...
    
  2. Patch all StatefulSets in the relevant namespaces to assign them the system-cluster-critical Priority Class:

    root@rok-tools:~# for ns in istio-system knative-monitoring kubeflow rok-system ; do
    >     for name in $(kubectl get sts -n $ns --no-headers | awk '{print $1}') ; do
    >         kubectl patch statefulset -n $ns $name --patch '{"spec": {"template": {"spec": {"priorityClassName": "system-cluster-critical"}}}}'
    >     done
    > done
    statefulset.apps/authservice patched
    statefulset.apps/prometheus-system patched
    statefulset.apps/application-controller-stateful-set patched
    ...
    
  3. Patch all DaemonSets in the relevant namespaces to assign them the system-node-critical Priority Class:

    root@rok-tools:~# for ns in knative-monitoring rok-system ; do
    >     for name in $(kubectl get daemonsets -n $ns --no-headers | awk '{print $1}') ; do
    >         kubectl patch daemonset -n $ns $name --patch '{"spec": {"template": {"spec": {"priorityClassName": "system-node-critical"}}}}'
    >     done
    > done
    daemonset.apps/node-exporter patched
    daemonset.apps/rok-disk-manager patched
    daemonset.apps/rok-kmod patched
    
  4. Wait until all Pods have restarted and are in the Running state.
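
Rather than polling by hand, the wait in the last step could be scripted with kubectl rollout status. This is only a sketch under a few assumptions: wait_for_rollouts is a hypothetical helper name, the 10 minute timeout is an arbitrary choice, and the workloads are assumed to use the RollingUpdate strategy (rollout status cannot track workloads that use OnDelete).

```shell
# Sketch: block until every workload of one kind has finished rolling out.
# wait_for_rollouts is a hypothetical helper name; --timeout=10m is an assumption.
# Usage: wait_for_rollouts KIND NAMESPACE [NAMESPACE...]
wait_for_rollouts() {
    kind="$1"
    shift
    for ns in "$@"; do
        # Same name discovery as in the patch loops of the procedure.
        for name in $(kubectl get "$kind" -n "$ns" --no-headers | awk '{print $1}'); do
            # Stop at the first workload that fails to roll out in time.
            kubectl rollout status "$kind/$name" -n "$ns" --timeout=10m || return 1
        done
    done
}

# Example: wait_for_rollouts deployment auth cert-manager
# Example: wait_for_rollouts daemonset knative-monitoring rok-system
```

Pass the same namespace lists that you used in the patch steps so that every patched workload is covered.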

Verify

Ensure that all Pods in the cluster are up and running:

root@rok-tools:~# kubectl get pods -A
NAMESPACE       NAME                               READY   STATUS    RESTARTS  AGE
auth            dex-7c9b56d8f-whmjn                1/1     Running   0         2h
cert-manager    cert-manager-cainjector-c5cc9b5c6  1/1     Running   0         2h
cert-manager    cert-manager-dfcd64965-29v2g       1/1     Running   0         2h
...
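
A Running status alone does not show which Priority Class a Pod received. As a rough sketch, the helper below lists Pods in one namespace that carry neither of the two system classes; unprotected_pods is a hypothetical name, and the column labels are arbitrary.

```shell
# Sketch: print Pods in a namespace that lack a system-* Priority Class.
# unprotected_pods is a hypothetical helper name (an assumption).
# Usage: unprotected_pods NAMESPACE
unprotected_pods() {
    kubectl get pods -n "$1" --no-headers \
        -o custom-columns='NAME:.metadata.name,PRIORITY_CLASS:.spec.priorityClassName' |
        awk '$2 != "system-cluster-critical" && $2 != "system-node-critical" {print $1, $2}'
}

# Example: unprotected_pods kubeflow
```

Empty output means every Pod in that namespace has one of the two classes. Pods with no Priority Class show up as <none>. Pods that are not managed by a patched Deployment, StatefulSet, or DaemonSet may legitimately appear in this list.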

Summary

You have successfully assigned the highest pre-defined Kubernetes Priority Classes to all Arrikto EKF Pods, protecting them against eviction and termination under memory pressure.

What’s Next

Check out the rest of the maintenance operations that you can perform on your cluster.