Handle Degraded Nodes

At times, one or more nodes in your cluster may not work properly. For example, your cloud provider may report them as degraded. In this scenario, you need to remove the degraded nodes from your cluster. To remove them gracefully, you have to follow specific steps. In a nutshell, you need to:

  1. Cordon the degraded node so that you prevent new workloads from landing on it.
  2. Drain the node so that Rok snapshots all local data and migrates workloads to other nodes.
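
The two steps above can be sketched as a small Python helper that builds the corresponding `kubectl` commands. This is only an illustration, not the Procedure of this guide: the helper names (`build_removal_commands`, `remove_gracefully`) and the node name `node-1` are hypothetical, and the two `kubectl drain` flags are assumptions that may differ across kubectl versions and from what the Procedure prescribes. By default the helper only renders the commands without running them.

```python
import subprocess
from typing import List


def build_removal_commands(node: str) -> List[List[str]]:
    """Return the kubectl commands for the cordon and drain steps."""
    return [
        # Step 1: mark the node unschedulable so no new workloads land on it.
        ["kubectl", "cordon", node],
        # Step 2: evict the workloads so they move to other nodes.
        # Assumption: these flags are commonly needed for nodes that run
        # DaemonSets or Pods with emptyDir volumes; check your kubectl version.
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"],
    ]


def remove_gracefully(node: str, dry_run: bool = True) -> List[str]:
    """Render the commands and, unless dry_run is set, run them in order."""
    rendered = []
    for cmd in build_removal_commands(node):
        rendered.append(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing command, so the node is
            # never drained if the cordon step did not succeed.
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    # "node-1" is a hypothetical node name used for illustration.
    for line in remove_gracefully("node-1"):
        print(line)
```

Keeping the dry run as the default lets you review the exact commands before anything touches the cluster. Pass `dry_run=False` only once you have confirmed the node name and flags.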

After that, the node no longer poses a threat to the cluster, and the migrated workloads are functional again. At this point, the Cluster Autoscaler considers the node unneeded and, after a short period of time, removes it from the cluster.

This section will guide you through cordoning and draining degraded nodes in a graceful manner, so that you do not lose any data.

Warning

Do not remove nodes abruptly, without first cordoning and draining them, as you may lose data. Follow the Procedure presented in this guide instead.

What You’ll Need