Protect Arrikto EKF Pods

This guide describes how to patch an existing Rok cluster on Kubernetes to protect essential Arrikto EKF Pods from being terminated when a node is under memory pressure. To do so, you will assign one of the pre-defined Kubernetes Priority Classes (system-node-critical, system-cluster-critical) to all System Pods.
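Before patching anything, it can help to confirm that both pre-defined Priority Classes exist on your cluster and to see how they rank. The snippet below is a minimal sketch; `show_builtin_priority_classes` is a hypothetical helper name introduced here for illustration and is not part of EKF or kubectl.

```shell
# Sketch: print the name and numeric value of the two built-in Priority Classes.
# show_builtin_priority_classes is a hypothetical helper, defined only for this example.
show_builtin_priority_classes() {
    kubectl get priorityclass system-node-critical system-cluster-critical \
        -o custom-columns=NAME:.metadata.name,VALUE:.value
}

# Usage (requires access to the cluster):
#   show_builtin_priority_classes
```

On a standard Kubernetes cluster, system-node-critical carries the higher value of the two, which is why this guide reserves it for the DaemonSets that must keep running on every node.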

Procedure

  1. Patch all Deployments in the relevant namespaces to assign them the system-cluster-critical Priority Class:

    root@rok-tools:~# for ns in auth cert-manager ingress-nginx istio-system knative-monitoring knative-serving kubeflow ; do
    > for name in $(kubectl get deployments -n $ns --no-headers | awk '{print $1}') ; do
    > kubectl patch deployment -n $ns $name --patch '{"spec": {"template": {"spec": {"priorityClassName": "system-cluster-critical"}}}}'
    > done
    > done
    deployment.apps/dex patched
    deployment.apps/cert-manager patched
    deployment.apps/cert-manager-cainjector patched
    ...

  2. Patch all StatefulSets in the relevant namespaces to assign them the system-cluster-critical Priority Class:

    root@rok-tools:~# for ns in istio-system knative-monitoring kubeflow rok-system ; do
    > for name in $(kubectl get sts -n $ns --no-headers | awk '{print $1}') ; do
    > kubectl patch statefulset -n $ns $name --patch '{"spec": {"template": {"spec": {"priorityClassName": "system-cluster-critical"}}}}'
    > done
    > done
    statefulset.apps/authservice patched
    statefulset.apps/prometheus-system patched
    statefulset.apps/application-controller-stateful-set patched
    ...

  3. Patch all DaemonSets in the relevant namespaces to assign them the system-node-critical Priority Class:

    root@rok-tools:~# for ns in knative-monitoring rok-system ; do
    > for name in $(kubectl get daemonsets -n $ns --no-headers | awk '{print $1}') ; do
    > kubectl patch daemonset -n $ns $name --patch '{"spec": {"template": {"spec": {"priorityClassName": "system-node-critical"}}}}'
    > done
    > done
    daemonset.apps/node-exporter patched
    daemonset.apps/rok-disk-manager patched
    daemonset.apps/rok-kmod patched

  4. Wait until all Pods have restarted and are in the Running state.
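The three loops in this procedure share the same shape, and the final waiting step can be made explicit with `kubectl rollout status`. The following is a minimal sketch under stated assumptions, not the official procedure: `patch_priority` is a hypothetical helper name, the 300-second timeout is an arbitrary choice, and the workloads are assumed to use the RollingUpdate strategy (`kubectl rollout status` does not support StatefulSets or DaemonSets that use OnDelete).

```shell
# Sketch: patch every workload of one kind in a namespace, then wait for its rollout.
# patch_priority is a hypothetical helper, not part of EKF or kubectl.
patch_priority() {
    local kind="$1"     # deployment, statefulset, or daemonset
    local ns="$2"       # namespace to operate on
    local class="$3"    # system-cluster-critical or system-node-critical
    local patch="{\"spec\": {\"template\": {\"spec\": {\"priorityClassName\": \"$class\"}}}}"
    local name

    for name in $(kubectl get "$kind" -n "$ns" --no-headers | awk '{print $1}'); do
        kubectl patch "$kind" -n "$ns" "$name" --patch "$patch"
        # Block until the controller reports the updated Pods as ready.
        # The timeout value is an assumption; adjust it to your cluster.
        kubectl rollout status "$kind/$name" -n "$ns" --timeout=300s
    done
}

# Usage (requires access to the cluster), for example:
#   patch_priority deployment auth system-cluster-critical
#   patch_priority daemonset rok-system system-node-critical
```

Waiting after each patch serializes the rollouts, which is slower than patching everything at once but makes it obvious which workload failed to come back.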

Verify

Ensure that all Pods in the cluster are up and running:

root@rok-tools:~# kubectl get pods -A
NAMESPACE      NAME                                READY   STATUS    RESTARTS   AGE
auth           dex-7c9b56d8f-whmjn                 1/1     Running   0          2h
cert-manager   cert-manager-cainjector-c5cc9b5c6   1/1     Running   0          2h
cert-manager   cert-manager-dfcd64965-29v2g        1/1     Running   0          2h
...
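A Pod in the Running state does not by itself prove that the new Priority Class was applied. As an optional extra check, the sketch below lists Pods in a namespace that still have no Priority Class. `pods_missing_priority` is a hypothetical helper name, and the check relies on kubectl printing `<none>` for an unset field in `custom-columns` output.

```shell
# Sketch: print the names of Pods in a namespace that have no priorityClassName set.
# pods_missing_priority is a hypothetical helper, not part of EKF or kubectl.
pods_missing_priority() {
    local ns="$1"
    kubectl get pods -n "$ns" --no-headers \
        -o custom-columns=NAME:.metadata.name,PRIORITY:.spec.priorityClassName \
        | awk '$2 == "<none>" {print $1}'
}

# Usage (requires access to the cluster), for example:
#   pods_missing_priority kubeflow
```

Empty output for a patched namespace suggests every Pod there picked up a Priority Class. Pods that are not managed by a Deployment, StatefulSet, or DaemonSet may still appear in the list.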

Summary

You have successfully patched all Arrikto EKF Pods with the highest pre-defined Kubernetes Priority Classes, protecting them against eviction and termination under memory pressure.

What’s Next

Check out the rest of the maintenance operations that you can perform on your cluster.