Autoscaling

This section describes all the necessary actions that you, as the administrator, should take in order to scale an EKS cluster in and out gracefully, without losing any data.

Warning

If an EC2 instance (EKS worker node) terminates in an unexpected manner, data will be lost. As such, you should avoid the following actions:

  • Decrement the desired size of the ASG.
  • Terminate an EC2 instance directly from the console.
  • Delete a whole nodegroup.

Find ASG

To find the Auto Scaling groups associated with your EKS cluster, based on an EKS-specific tag that is mandatory for both managed and self-managed node groups, run:

$ aws autoscaling describe-auto-scaling-groups | \
>     jq -r '.AutoScalingGroups[] | select(.Tags[] | .Key == "kubernetes.io/cluster/'${EKS_CLUSTER?}'" and .Value == "owned") | .AutoScalingGroupName'
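
For example, assuming a hypothetical cluster named my-cluster with a single node group, you could store the returned name directly in the ASG variable that the following sections use:

$ export EKS_CLUSTER=my-cluster    # hypothetical cluster name
$ export ASG=$(aws autoscaling describe-auto-scaling-groups | \
>     jq -r '.AutoScalingGroups[] | select(.Tags[] | .Key == "kubernetes.io/cluster/'${EKS_CLUSTER?}'" and .Value == "owned") | .AutoScalingGroupName')
$ echo $ASG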

Scale-in Protection

Scaling down the node group using the ASG can have catastrophic implications, since it does not allow Rok to properly drain the node (and migrate any volumes) before deleting the corresponding EC2 instance. This is described in more detail in the Amazon EC2 Auto Scaling instance lifecycle document, where we see that the ASG will remove the instance after about 15 minutes, even if the drain operation has not finished.

To prevent that from happening, you need to enable scale-in protection:

  • at the ASG level, i.e., for newly created instances, and
  • at the instance level, i.e., for existing instances.

Since setting the scale-in protection cannot be done via EKS, we will operate directly on the underlying ASG after creating the node group.

First find the Auto Scaling groups associated with your EKS cluster, and then repeat the following steps for each Auto Scaling group found:

$ export ASG=<asg>
  1. Check the current configuration with regard to scale-in protection at ASG level:

    $ aws autoscaling describe-auto-scaling-groups \
    >     --auto-scaling-group-names $ASG | \
    >     jq -r '.AutoScalingGroups[] | .AutoScalingGroupName, .NewInstancesProtectedFromScaleIn' | \
    >         paste - -
    

    and at instance level:

    $ aws autoscaling describe-auto-scaling-groups \
    >     --auto-scaling-group-names $ASG | \
    >     jq -r '.AutoScalingGroups[].Instances[] | .InstanceId, .ProtectedFromScaleIn' | \
    >         paste - -
    
  2. Enable scale-in protection at ASG level:

    $ aws autoscaling update-auto-scaling-group \
    >    --auto-scaling-group-name $ASG \
    >    --new-instances-protected-from-scale-in
    
  3. Enable scale-in protection at instance level:

    $ aws autoscaling describe-auto-scaling-groups \
    >    --auto-scaling-group-names $ASG | \
    >       jq -r '.AutoScalingGroups[].Instances[].InstanceId' | \
    >          xargs aws autoscaling set-instance-protection \
    >             --auto-scaling-group-name $ASG \
    >             --protected-from-scale-in \
    >             --instance-ids
    
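After completing the steps above, both checks from step 1 should now report true. As a quick combined verification, you could run something like the following, which merges the two queries from step 1:

$ aws autoscaling describe-auto-scaling-groups \
>     --auto-scaling-group-names $ASG | \
>     jq -r '.AutoScalingGroups[] | .NewInstancesProtectedFromScaleIn, (.Instances[] | .ProtectedFromScaleIn)'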

Suspend Unsafe ASG Scaling Processes

Since Rok uses local NVMe disks to store user data, terminating or replacing a node before properly draining it would result in data loss. As such, you have to suspend the scaling processes that could result in node termination, i.e., ReplaceUnhealthy, AZRebalance, and InstanceRefresh. Suspending the above processes means that:

  • Unhealthy instances, i.e., EC2 instances whose status checks have failed, will remain in service and will require manual action. See Manage Unhealthy Instances for more details.
  • There will be no rebalancing across availability zones. Still, since you create single-AZ ASGs because you make use of EBS volumes, this should not affect you.
  • To refresh all instances, you should perform a rolling update similar to the one you do in case of an Upgrade EKS Node Group, i.e., increase the ASG size, drain the old nodes, and let the Cluster Autoscaler remove them.

For more information on the available scaling processes and how to suspend or resume them, see the official docs.

To disable the aforementioned dangerous operations, given that you have already created your EKS node group, first find the Auto Scaling groups associated with your EKS cluster, and for each ASG found run the following CLI command:

$ aws autoscaling suspend-processes \
>     --auto-scaling-group-name $ASG \
>     --scaling-processes AZRebalance InstanceRefresh ReplaceUnhealthy
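
You can verify that the processes are indeed suspended by inspecting the SuspendedProcesses field of the ASG, for example:

$ aws autoscaling describe-auto-scaling-groups \
>     --auto-scaling-group-names $ASG | \
>     jq -r '.AutoScalingGroups[].SuspendedProcesses[].ProcessName'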

Manage Unhealthy Instances

Since we have suspended the ReplaceUnhealthy operation, if an instance is marked as unhealthy by the ASG, it will remain in service and will require manual action.
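
To get an overview of the instances that the ASG currently considers unhealthy, you can filter on their HealthStatus, for example:

$ aws autoscaling describe-auto-scaling-groups \
>     --auto-scaling-group-names $ASG | \
>     jq -r '.AutoScalingGroups[].Instances[] | select(.HealthStatus != "Healthy") | .InstanceId, .HealthStatus' | \
>         paste - -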

If there was a temporary failure, e.g., a system crash that made the node freeze for a while, but the node eventually got rebooted, the EC2 instance can be considered healthy again, i.e., EC2 will report it as such. To manually reset the health status of an instance, run:

$ aws autoscaling set-instance-health \
>     --health-status Healthy \
>     --instance-id i-123abc45d

Warning

In case the failure is permanent, e.g., a corrupted file system, the node must be replaced. In such cases, it helps if you have set up Snapshot policies for Backup, so that you can restore your volumes from the latest available snapshot. To terminate such an instance, run:

$ aws autoscaling terminate-instance-in-auto-scaling-group \
>     --no-should-decrement-desired-capacity \
>     --instance-id i-123abc45d

Scale-in

EKF supports automatic scaling operations on the Kubernetes cluster using a modified version of the Cluster Autoscaler that supports Rok volumes.

This guide will walk you through manually scaling in your EKS cluster by selecting and removing nodes one-by-one. If you want to forcefully scale your EKS cluster to a desired size without manually selecting the nodes to remove, please follow the Scale In Kubernetes Cluster guide instead.

In order to manually scale in the cluster, you, as the administrator, should:

  1. Select a Kubernetes node that you want to remove (see Find a scale-in candidate).

  2. Start a drain operation on the selected node:

    $ kubectl drain --ignore-daemonsets --delete-local-data NODE
    
  3. Rok will snapshot the volumes on that node and move them elsewhere, unguard that node and allow the drain operation to complete.

  4. When the drain has finished, the Cluster Autoscaler will see that the node is now empty and will consider it unneeded.

  5. After a period of time (scale-down-unneeded-time), the Cluster Autoscaler will terminate the EC2 instance and reduce the desired size of the ASG. You can monitor this as shown below.
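
To monitor steps 4 and 5 above, you can watch the node list shrink and verify that the desired size of the ASG decreases, assuming ASG is still set as in the previous sections:

$ kubectl get nodes --watch
$ aws autoscaling describe-auto-scaling-groups \
>     --auto-scaling-group-names $ASG | \
>     jq -r '.AutoScalingGroups[] | .AutoScalingGroupName, .DesiredCapacity' | \
>         paste - -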

Find a Scale-in Candidate

Normally, the Cluster Autoscaler finds a scale-in candidate automatically. To find a good candidate manually, you have to:

  1. Pick an underutilized node.
  2. Ensure that you don't try to scale in past the ASG's minSize (a quick check is shown right after this list).
  3. Ensure that existing EBS volumes are reachable from other nodes in the cluster.
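
To check the second prerequisite, you can compare the ASG's minimum and desired sizes, assuming ASG is still set to the Auto Scaling group of interest:

$ aws autoscaling describe-auto-scaling-groups \
>     --auto-scaling-group-names $ASG | \
>     jq -r '.AutoScalingGroups[] | .AutoScalingGroupName, .MinSize, .DesiredCapacity' | \
>         paste - - -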

Note

If your nodegroups span a single AZ only, you can skip any EBS related checks. Note that using a single AZ per nodegroup is considered best practice (see Cluster Autoscaler docs and this Amazon blog for more info).
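
To quickly see which availability zones each node group spans, you can group your nodes by node group and zone. The following sketch assumes EKS managed node groups, which label their nodes with eks.amazonaws.com/nodegroup:

$ kubectl get nodes -o json | \
>     jq -r '.items[] | .metadata.labels["eks.amazonaws.com/nodegroup"], .metadata.labels["failure-domain.beta.kubernetes.io/zone"]' | \
>         paste - - | sort -u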

To find a scale-in candidate that covers the above prerequisites, follow the steps below:

  1. Find nodes with low utilization, e.g., less than 0.5, by inspecting the Cluster Autoscaler logs:

    $ kubectl logs -n kube-system deploy/cluster-autoscaler -f --tail 100 | \
    >     grep "utilization 0.[0-4]"
    

    Note

    The Autoscaler does not report nodes that belong to an ASG that has already reached its minSize.

  2. Find out in which AZ your nodes are located:

    $ kubectl get nodes -o json | \
    >    jq -r '.items[] | .metadata.labels["failure-domain.beta.kubernetes.io/zone"], .metadata.name' | \
    >        paste - - | sort -k 1
    
  3. Find out in which AZ your EBS volumes are located:

    $ kubectl get pv -o json | \
    >    jq -r '.items[] | select(.spec.storageClassName == "gp2") | .metadata.labels["failure-domain.beta.kubernetes.io/zone"], .spec.claimRef.name' | \
    >       paste - - | sort -k 1
    
  4. Pick a node from the ones found in step 1 that satisfies any of the following conditions:

    • It is not the last node in an AZ.
    • It is the last node in an AZ where no EBS volumes exist.
  5. Go ahead, drain the node and let the Cluster Autoscaler eventually remove it.

Scale-out

Currently, we do not support automatic scale-out in case of insufficient Rok storage.

Important

If a Pod gets scheduled on a node with insufficient Rok storage, its PVC will remain stuck in the Pending phase. Reporting storage capacity and rescheduling Pods when storage fails to be provisioned is supported in Kubernetes 1.19 and is in alpha state (see https://kubernetes.io/docs/concepts/storage/storage-capacity/#rescheduling).

Still, if a Pod becomes unschedulable due to insufficient resources (CPU, RAM), the Cluster Autoscaler will trigger a scale-out, i.e., it will increase the desired size of the ASG, and eventually a new Kubernetes node will be added.
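
To confirm that a pending Pod has indeed triggered a scale-out, you can inspect its events; the Cluster Autoscaler records a TriggeredScaleUp event for such Pods. A minimal sketch, where the Pod name is hypothetical:

$ kubectl describe pod test-pod | grep TriggeredScaleUp    # test-pod: hypothetical Pod name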

To scale up the cluster manually, you can do so directly from EKS:

$ aws eks update-nodegroup-config \
>     --cluster-name ${EKS_CLUSTER?} \
>     --nodegroup-name general-workers \
>     --scaling-config minSize=2,maxSize=5,desiredSize=4

This will add a new node to the Kubernetes cluster and the Rok operator will scale the RokCluster members accordingly.
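
To verify the new scaling configuration and watch the new node join the cluster, you can run, for example:

$ aws eks describe-nodegroup \
>     --cluster-name ${EKS_CLUSTER?} \
>     --nodegroup-name general-workers | \
>     jq -r '.nodegroup.scalingConfig'
$ kubectl get nodes --watch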