Scale In EKS Cluster

EKF supports automatic scaling operations on the Kubernetes cluster using a modified version of the Cluster Autoscaler that supports Rok volumes.

This guide will walk you through manually scaling in your EKS cluster by selecting and removing nodes one by one.

Procedure

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. List the Kubernetes nodes of your cluster:

    root@rok-tools:~# kubectl get nodes
    NAME                                               STATUS   ROLES    AGE   VERSION
    ip-192-168-147-191.eu-central-1.compute.internal   Ready    <none>   18d   v1.21.5-eks-bc4871b
    ip-192-168-168-207.eu-central-1.compute.internal   Ready    <none>   18d   v1.21.5-eks-bc4871b
  3. Specify the node you want to remove:

    root@rok-tools:~# export NODE=<NODE>

    Replace <NODE> with the node name. For example:

    root@rok-tools:~# export NODE=ip-192-168-168-207.eu-central-1.compute.internal

    Note

    Normally, the Cluster Autoscaler finds a scale-in candidate automatically. To find a good candidate manually (see the example commands after this list), you have to

    1. Pick an underutilized node.
    2. Ensure that you don’t try to scale in past the ASG’s minSize.
    3. Ensure that existing EBS volumes are reachable from other nodes in the cluster.
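
    For example, you could inspect node utilization, the minSize of the node's Auto Scaling group, and the availability zone of each node, since EBS volumes can only attach to nodes in the same zone. The commands below are a sketch: kubectl top requires the Kubernetes Metrics Server, and <ASG_NAME> is a placeholder for the name of your Auto Scaling group:

    root@rok-tools:~# kubectl top nodes
    root@rok-tools:~# kubectl get nodes -L topology.kubernetes.io/zone
    root@rok-tools:~# aws autoscaling describe-auto-scaling-groups \
    >    --auto-scaling-group-names <ASG_NAME> \
    >    --query 'AutoScalingGroups[].{Min:MinSize,Desired:DesiredCapacity}'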
  4. Start a drain operation for the selected node:

    root@rok-tools:~# kubectl drain --ignore-daemonsets --delete-local-data ${NODE?}
    ...
    node/ip-192-168-168-207.eu-central-1.compute.internal evicted

    Note

    This may take a while, since Rok is unpinning all volumes on this node, and as such, rok-csi-guard pods are expected to be evicted last.
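
    If you want to watch the drain progress from a second terminal, you can list the Pods still scheduled on the node with a standard field selector. This is an optional sketch:

    root@rok-tools:~# kubectl get pods --all-namespaces -o wide \
    >    --field-selector spec.nodeName=${NODE?}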

    Warning

    Do not delete rok-csi-guard pods manually, since this might cause data loss.

    Troubleshooting

    The command does not complete.

    Most likely, the unpinning of a Rok PVC is failing. Inspect the logs of the Rok CSI controller to debug further, as shown in the example below.
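
    For example, assuming the Rok CSI controller runs in the rok namespace as a StatefulSet named rok-csi-controller (adjust the namespace and workload name to match your deployment), you could inspect its recent logs with:

    root@rok-tools:~# kubectl logs -n rok sts/rok-csi-controller --all-containers --tail=100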

  5. Once the drain operation completes, remove the node.

    Fast Forward

    Skip this step if you have a Cluster Autoscaler instance running in your cluster: it will see the drained node, consider it unneeded, and, after a period of time (based on the scale-down-unneeded-time option), automatically terminate the EC2 instance and decrement the desired size of the Auto Scaling group.
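
    To check whether a Cluster Autoscaler is already running, you can look for its Deployment. The name and namespace below are common defaults and may differ in your cluster:

    root@rok-tools:~# kubectl get deployment cluster-autoscaler -n kube-system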

    1. Find the EC2 instance of the drained node:

      root@rok-tools:~# export INSTANCE=$(kubectl get nodes ${NODE?} \
      >    -o jsonpath={.spec.providerID} \
      >    | sed 's|aws:///.*/||')
    2. Terminate the instance and decrement the desired capacity of its Auto Scaling group:

      root@rok-tools:~# aws autoscaling terminate-instance-in-auto-scaling-group \
      >    --instance-id ${INSTANCE?} \
      >    --should-decrement-desired-capacity

Verify

  1. Ensure that the selected node has been removed from your Kubernetes cluster:

    root@rok-tools:~# kubectl get nodes ${NODE?}
    Error from server (NotFound): nodes "ip-192-168-168-207.eu-central-1.compute.internal" not found
  2. Ensure that the underlying instance has been deleted:

    root@rok-tools:~# aws ec2 describe-instances --instance-ids ${INSTANCE?}
    An error occurred (InvalidInstanceID.NotFound) when calling the DescribeInstances operation: The instance ID 'i-0f992f0b02d777901' does not exist

Summary

You have successfully scaled in your EKS cluster.

What’s Next

Check out the rest of the EKS maintenance operations that you can perform on your cluster.