Drain Degraded Nodes

There are times when one or more nodes in your cluster may not work properly; for example, your cloud provider may report them as degraded. In this scenario, you need to drain the node so that Rok snapshots all local data and Kubernetes can migrate workloads to other nodes.

After that, the node no longer poses any threat to the cluster and everything is functional again. At this point, the Cluster Autoscaler will consider the node unneeded and, after a short period of time, remove it from the cluster.

This section will guide you through draining a degraded node gracefully, so that you do not lose any data.

Warning

Do not remove any nodes in an unexpected manner, as you may lose data. Follow the Procedure presented in this guide instead.

Procedure

Note

The output of the commands may slightly differ, depending on your cloud provider.

  1. List the Kubernetes nodes of your cluster:

    root@rok-tools:~# kubectl get nodes
    NAME                                               STATUS                     ROLES    AGE    VERSION
    ip-192-168-173-207.eu-central-1.compute.internal   Ready                      <none>   154m   v1.21.5-eks-bc4871b
    ip-192-168-189-255.eu-central-1.compute.internal   Ready,SchedulingDisabled   <none>   18m    v1.21.5-eks-bc4871b
    ip-192-168-191-241.eu-central-1.compute.internal   Ready                      <none>   69s    v1.21.5-eks-bc4871b
  2. Specify the node you want to drain. Choose the already cordoned node from the list above, that is, the one whose STATUS field is Ready,SchedulingDisabled:

    root@rok-tools:~# export NODE=<NODE>

    Replace <NODE> with the node name. For example:

    root@rok-tools:~# export NODE=ip-192-168-189-255.eu-central-1.compute.internal
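
    Alternatively, if you prefer not to scan the list by eye, a field selector on spec.unschedulable lists only the cordoned nodes. This is an optional convenience sketch, not part of the procedure itself:

    root@rok-tools:~# kubectl get nodes --field-selector spec.unschedulable=true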
  3. Start a drain operation for the selected node:

    root@rok-tools:~# kubectl drain --ignore-daemonsets --delete-local-data ${NODE?}
    node/ip-192-168-189-255.eu-central-1.compute.internal already cordoned
    ...
    evicting pod kubeflow-user/test2-0
    ...
    pod/test2-0 evicted
    evicting pod rok/rok-csi-guard-ip-192-168-189-255.eu-central-1.compute.intextxtz
    error when evicting pod "rok-csi-guard-ip-192-168-189-255.eu-central-1.compute.intextxtz" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
    ...
    evicting pod rok/rok-csi-guard-ip-192-168-189-255.eu-central-1.compute.intextxtz
    pod rok/rok-csi-guard-ip-192-168-189-255.eu-central-1.compute.intextxtz evicted
    node/ip-192-168-189-255.eu-central-1.compute.internal evicted

    Note

    This operation may take a while, since Rok will unpin all volumes on this node, that is, it will snapshot every Rok volume on the node.

    Once done, the Rok operator will delete the Rok CSI guard Pod on this node. Since these Pods are protected by a PodDisruptionBudget, they are expected to be evicted last.
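
    While the drain is in progress, you can optionally follow the eviction from a second terminal by watching the Pods still scheduled on the node. A convenience sketch, using the NODE variable you set above:

    root@rok-tools:~# kubectl get pods --all-namespaces --field-selector spec.nodeName=${NODE?} -w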

    Warning

    Do not delete rok-csi-guard Pods manually, since this might cause data loss.

    Note

    After the drain operation completes, the Rok CSI guard Pod and the corresponding PDB for this node will still exist. We expect that the Pod will be unschedulable and show up as Pending.
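
    To confirm this, you can list the guard Pods in the rok namespace along with their node assignment. A quick check, shown here only as a convenience:

    root@rok-tools:~# kubectl get pods -n rok -o wide | grep rok-csi-guard

    The guard Pod that corresponds to the drained node should appear as Pending, with no node assigned.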

    Troubleshooting

    The drain command does not complete.

    Most likely, the unpinning of a Rok PVC is failing. See Gather Logs for Troubleshooting to debug further.

Verify

  1. Ensure that no Rok volume lives on the node that you just drained, that is, the following command produces no output:

    root@rok-tools:~# kubectl get pv -o json \
    >     | jq -r '.items[] | select(.spec.storageClassName == "rok") | .spec.claimRef.namespace, .spec.claimRef.name, .metadata.name, (try .spec.nodeAffinity.required.nodeSelectorTerms[].matchExpressions[].values[] catch "none")' \
    >     | paste - - - - \
    >     | column -t \
    >     | grep ${NODE?}

    Note

    The above command executes the following:

    1. Iterates over all PVs.
    2. Finds those of storage class rok.
    3. Shows, for each such PV:
      • PVC namespace
      • PVC name
      • PV name
      • Node mentioned in PV nodeAffinity (if any, otherwise none).
    4. Filters the entries that refer to the drained node.
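
    If the command does print a line, a Rok volume still refers to the drained node. In that case, it can help to inspect the mapping for all nodes by running the same pipeline without the final grep:

    root@rok-tools:~# kubectl get pv -o json \
    >     | jq -r '.items[] | select(.spec.storageClassName == "rok") | .spec.claimRef.namespace, .spec.claimRef.name, .metadata.name, (try .spec.nodeAffinity.required.nodeSelectorTerms[].matchExpressions[].values[] catch "none")' \
    >     | paste - - - - \
    >     | column -t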
  2. Optional

    The Cluster Autoscaler will consider this node as unneeded, and as such, after a period of time, it will remove the node from the cluster. This is evident from the events the Cluster Autoscaler will emit for this node:

    root@rok-tools:~# kubectl get events -w
    LAST SEEN   TYPE     REASON               OBJECT     MESSAGE
    ...
    22m         Normal   NodeNotSchedulable   node/...   Node ... status is now: NodeNotSchedulable
    20m         Normal   ScaleDown            node/...   node removed by cluster autoscaler
    19m         Normal   NodeNotReady         node/...   Node ... status is now: NodeNotReady
    19m         Normal   Deleting node ... because it does not exist in the cloud provider   node/...   Node ... event: DeletingNode
    19m         Normal   RemovingNode         node/...   Node ... event: Removing Node ... from Controller
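
    If the cluster emits many unrelated events, you can narrow the watch to the node in question with a field selector. A convenience sketch, using the NODE variable you set earlier:

    root@rok-tools:~# kubectl get events --field-selector involvedObject.kind=Node,involvedObject.name=${NODE?} -w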

    Note

    The period of time before the Cluster Autoscaler takes action is configurable via the scale-down-unneeded-time argument. It defaults to 2 minutes on EKF deployments.

    After that time has passed, list the nodes and verify that the Cluster Autoscaler has removed the drained node:

    root@rok-tools:~/ops/deployments# kubectl get nodes
    NAME                                               STATUS   ROLES    AGE    VERSION
    ip-192-168-173-207.eu-central-1.compute.internal   Ready    <none>   154m   v1.21.5-eks-bc4871b
    ip-192-168-191-241.eu-central-1.compute.internal   Ready    <none>   69s    v1.21.5-eks-bc4871b

Summary

You have successfully drained a degraded node from your cluster.

What’s Next

Check out the rest of the maintenance operations that you can perform on your cluster.