Scale In EKS Cluster

EKF supports automatic scaling operations on the Kubernetes cluster using a modified version of the Cluster Autoscaler that supports Rok volumes.

This guide will walk you through manually scaling in your EKS cluster by selecting and removing nodes one by one.

Procedure

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. List the Kubernetes nodes of your cluster:

    root@rok-tools:~# kubectl get nodes
    NAME                                               STATUS   ROLES    AGE   VERSION
    ip-192-168-147-191.eu-central-1.compute.internal   Ready    <none>   18d   v1.21.5-eks-bc4871b
    ip-192-168-168-207.eu-central-1.compute.internal   Ready    <none>   18d   v1.21.5-eks-bc4871b
  3. Specify the node you want to remove:

    root@rok-tools:~# export NODE=<NODE>

    Replace <NODE> with the node name. For example:

    root@rok-tools:~# export NODE=ip-192-168-168-207.eu-central-1.compute.internal

    Note

    Normally, the Cluster Autoscaler finds a scale-in candidate automatically. To find a good candidate manually (see the example commands after this list), you have to

    1. Pick an underutilized node.
    2. Ensure that you don’t try to scale in past the ASG’s minSize.
    3. Ensure that existing EBS volumes are reachable from other nodes in the cluster.
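
    For example, you could inspect node utilization, the minSize of the node's Auto Scaling group, and the availability zone of each node, since EBS volumes can only attach to nodes in the same zone. The commands below are a sketch: kubectl top requires the Kubernetes Metrics Server, and <ASG_NAME> is a placeholder for the name of your Auto Scaling group:

    root@rok-tools:~# kubectl top nodes
    root@rok-tools:~# kubectl get nodes -L topology.kubernetes.io/zone
    root@rok-tools:~# aws autoscaling describe-auto-scaling-groups \
    >    --auto-scaling-group-names <ASG_NAME> \
    >    --query 'AutoScalingGroups[].{Min:MinSize,Desired:DesiredCapacity}'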
  4. Start a drain operation for the selected node:

    root@rok-tools:~# kubectl drain --ignore-daemonsets --delete-local-data ${NODE?}
    ...
    node/ip-192-168-168-207.eu-central-1.compute.internal evicted

    Note

    This may take a while, since Rok is unpinning all volumes on this node, and as such, rok-csi-guard pods are expected to be evicted last.
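
    If you want to watch the drain progress from a second terminal, you can list the Pods still scheduled on the node with a standard field selector. This is an optional sketch:

    root@rok-tools:~# kubectl get pods --all-namespaces -o wide \
    >    --field-selector spec.nodeName=${NODE?}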

    Warning

    Do not delete rok-csi-guard pods manually, since this might cause data loss.

    Troubleshooting

    The command does not complete.

    Most likely, the unpinning of a Rok PVC is failing. Inspect the logs of the Rok CSI controller to debug further, as shown in the example below.
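
    For example, assuming the Rok CSI controller runs in the rok namespace as a StatefulSet named rok-csi-controller (adjust the namespace and workload name to match your deployment), you could inspect its recent logs with:

    root@rok-tools:~# kubectl logs -n rok sts/rok-csi-controller --all-containers --tail=100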

  5. Once the drain operation completes, remove the node.

    Fast Forward

    Skip this step if you have a Cluster Autoscaler instance running in your cluster: it will see the drained node, consider it unneeded, and, after a period of time (based on the scale-down-unneeded-time option), automatically terminate the EC2 instance and decrement the desired size of the Auto Scaling group.
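
    To check whether a Cluster Autoscaler is already running, you can look for its Deployment. The name and namespace below are common defaults and may differ in your cluster:

    root@rok-tools:~# kubectl get deployment cluster-autoscaler -n kube-system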

    1. Find the EC2 instance of the drained node:

      root@rok-tools:~# export INSTANCE=$(kubectl get nodes ${NODE?} \
      >    -o jsonpath={.spec.providerID} \
      >    | sed 's|aws:///.*/||')
    2. Terminate the instance and decrement the desired capacity of its Auto Scaling group:

      root@rok-tools:~# aws autoscaling terminate-instance-in-auto-scaling-group \
      >    --instance-id ${INSTANCE?} \
      >    --should-decrement-desired-capacity

Verify

  1. Ensure that the selected node has been removed from your Kubernetes cluster:

    root@rok-tools:~# kubectl get nodes ${NODE?}
    Error from server (NotFound): nodes "ip-192-168-168-207.eu-central-1.compute.internal" not found
  2. Ensure that the underlying instance has been deleted:

    root@rok-tools:~# aws ec2 describe-instances --instance-ids ${INSTANCE?}
    An error occurred (InvalidInstanceID.NotFound) when calling the DescribeInstances operation: The instance ID 'i-0f992f0b02d777901' does not exist

Summary

You have successfully scaled in your EKS cluster.

What’s Next

Check out the rest of the EKS maintenance operations that you can perform on your cluster.