Cordon Degraded Nodes

At times, one or more nodes in your cluster may not work properly; for example, your cloud provider may report them as degraded. In this scenario, you first need to isolate these nodes, that is, prevent new workloads from landing on them. This guide will walk you through cordoning a node of your cluster.

Procedure

Note

The output of the commands may slightly differ, depending on your cloud provider.

  1. List the Kubernetes nodes of your cluster along with their provider IDs, so that you can identify the problematic node more easily:

    root@rok-tools:~# kubectl get nodes \
    >     -o custom-columns=NAME:metadata.name,ID:spec.providerID
    NAME                                               ID
    ip-192-168-173-207.eu-central-1.compute.internal   aws:///eu-central-1a/i-01cd4d54e8861740c
    ip-192-168-189-255.eu-central-1.compute.internal   aws:///eu-central-1a/i-0eafe2e4a7b9e758b
    ip-192-168-191-241.eu-central-1.compute.internal   aws:///eu-central-1a/i-08943c863882cb8c2
  2. Specify the node you want to cordon:

    root@rok-tools:~# export NODE=<NODE>

    Replace <NODE> with the node name. For example:

    root@rok-tools:~# export NODE=ip-192-168-189-255.eu-central-1.compute.internal
  3. Ensure that the node is reachable, that is, that its STATUS field is Ready:

    root@rok-tools:~# kubectl get nodes ${NODE?}
    NAME                                               STATUS   ROLES    AGE   VERSION
    ip-192-168-189-255.eu-central-1.compute.internal   Ready    <none>   18m   v1.22.12-eks-ba74326
  4. Cordon the selected node:

    root@rok-tools:~# kubectl cordon ${NODE?}
    node/ip-192-168-189-255.eu-central-1.compute.internal cordoned
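
The steps above can also be scripted. The following is a minimal sketch, not part of the official procedure. It assumes bash and a kubectl that is already configured for your cluster. The function names node_for_instance and cordon_if_ready are hypothetical helpers introduced here for illustration. Instead of reading the STATUS column, the sketch queries the node's Ready condition directly, which should be "True" for a reachable node.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers wrapping the manual steps above.

# Print the name of the node whose provider ID ends with the given
# cloud instance ID (for example, i-0eafe2e4a7b9e758b).
node_for_instance() {
    local instance_id="$1"
    kubectl get nodes --no-headers \
        -o custom-columns=NAME:metadata.name,ID:spec.providerID |
        awk -v id="${instance_id}" '{
            # The instance ID is the last "/"-separated part of the provider ID.
            n = split($2, parts, "/")
            if (parts[n] == id) print $1
        }'
}

# Cordon the given node, but only if its Ready condition is "True".
cordon_if_ready() {
    local node="$1" ready
    if [ -z "${node}" ]; then
        echo "usage: cordon_if_ready <node-name>" >&2
        return 2
    fi
    ready="$(kubectl get node "${node}" \
        -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')" || return 1
    if [ "${ready}" != "True" ]; then
        echo "node ${node} is not Ready (Ready=${ready:-unknown}); not cordoning" >&2
        return 1
    fi
    kubectl cordon "${node}"
}
```

After sourcing these functions, you could run, for example, NODE="$(node_for_instance i-0eafe2e4a7b9e758b)" followed by cordon_if_ready "${NODE?}". Refusing to cordon a node that is not Ready mirrors step 3 of the procedure; adjust this check if your situation differs.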

Verify

  1. Verify that the selected node is cordoned, that is, that its STATUS field is Ready,SchedulingDisabled:

    root@rok-tools:~# kubectl get nodes ${NODE?}
    NAME                                               STATUS                     ROLES    AGE   VERSION
    ip-192-168-189-255.eu-central-1.compute.internal   Ready,SchedulingDisabled   <none>   29m   v1.22.12-eks-ba74326
  2. Verify that the selected node has at least one taint with effect NoSchedule:

    root@rok-tools:~# kubectl get nodes ${NODE?} -o json \
    >     | jq -r '.spec.taints[] | select(.effect == "NoSchedule") | .key'
    node.kubernetes.io/unschedulable
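
Both checks above can be combined into a single test that is suitable for scripts. The following is a minimal sketch under the same assumptions as the commands above (bash and a configured kubectl). The function name is_cordoned is a hypothetical helper. It uses kubectl's jsonpath output rather than jq, and it relies on the fact that cordoning sets the node's spec.unschedulable field to true, in addition to the node.kubernetes.io/unschedulable taint shown above.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper combining the two verification steps.

# Return 0 if the node is marked unschedulable and carries the
# node.kubernetes.io/unschedulable taint with effect NoSchedule,
# 1 if it does not, and 2 on usage or kubectl errors.
is_cordoned() {
    local node="$1" unschedulable taints
    if [ -z "${node}" ]; then
        echo "usage: is_cordoned <node-name>" >&2
        return 2
    fi
    # Prints "true" for a cordoned node, nothing otherwise.
    unschedulable="$(kubectl get node "${node}" \
        -o jsonpath='{.spec.unschedulable}')" || return 2
    # Prints the keys of all NoSchedule taints, separated by spaces.
    taints="$(kubectl get node "${node}" \
        -o jsonpath='{.spec.taints[?(@.effect=="NoSchedule")].key}')" || return 2
    [ "${unschedulable}" = "true" ] || return 1
    case " ${taints} " in
        *" node.kubernetes.io/unschedulable "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_cordoned "${NODE?}" && echo "cordoned" prints a message only when both conditions hold.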

Summary

You have successfully cordoned a degraded node of your cluster.

What’s Next

The next step is to drain the cordoned node.