Disable Rok on a Node Group

Rok runs as a DaemonSet, so it runs on all the nodes of a cluster, except for any nodes that have taints that the DaemonSet does not tolerate.

EKF ships its own custom Cluster Autoscaler, which expects that every node of the cluster runs Rok, except for nodes that are marked with a specific label to indicate that Rok is disabled.

In order to disable Rok on a specific node group, while allowing the Cluster Autoscaler to seamlessly scale every node group in the cluster, the node group needs appropriate configuration.

This guide will walk you through configuring an existing node group to disable Rok on it, while allowing the seamless autoscaling of the node group.

What You’ll Need

Check Your Environment

Before configuring the existing node group, ensure that its desired size is zero and that the Cluster Autoscaler deployment is scaled down to zero.

This ensures that no nodes of the node group already run Rok, and that the Cluster Autoscaler does not trigger an undesired scale-up while you configure the node group.

Attention

By disabling Rok on a node group, any Pods requesting Rok storage will not be scheduled on the nodes of the node group.

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Restore the required context:

    root@rok-tools:~/ops/deployments# source deploy/env.eks-cluster
    root@rok-tools:~/ops/deployments# export EKS_CLUSTER
  3. Check that the Cluster Autoscaler deployment is scaled down to zero:

    root@rok-tools:~/ops/deployments# kubectl get deployments \
    >     -n kube-system cluster-autoscaler -ojson \
    >     | jq -e '.spec.replicas == 0' >/dev/null && echo OK || echo FAIL
    OK

    Troubleshooting

    The output of the command is FAIL

    The Cluster Autoscaler is not scaled down to zero replicas. Scale it down by running:

    root@rok-tools:~/ops/deployments# kubectl scale deployment -n kube-system cluster-autoscaler --replicas=0
  4. Ensure that the node group for which you want to disable Rok has desired size zero. Choose one of the following options based on your node group type.

    1. Set the name of the node group:

      root@rok-tools:~/ops/deployments# export NODEGROUP=<NODEGROUP>

      Replace <NODEGROUP> with the node group name. For example:

      root@rok-tools:~/ops/deployments# export NODEGROUP=general-non-rok-workers
    2. Check that the desired size of the node group is zero:

      root@rok-tools:~/ops/deployments# [[ $(aws eks describe-nodegroup \
      >     --cluster-name ${EKS_CLUSTER?} \
      >     --nodegroup-name ${NODEGROUP?} \
      >     --query nodegroup.scalingConfig.desiredSize) == 0 ]] && echo OK || echo FAIL
      OK

      Troubleshooting

      The output of the command is FAIL

      The desired size of the node group is not zero. Scale down the node group by following the Scale In EKS Cluster guide.

      It’s important to drain the existing nodes of the node group, as the guide instructs you, to unpin any Rok volumes that may live on these nodes.

    This section is work in progress.
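
The two checks above can be combined into a single pre-flight helper. What follows is a minimal sketch, not part of rok-tools: the function name preflight_ok is hypothetical, and it assumes that kubectl and aws are already configured for the target cluster. It reads the replica count through kubectl's JSONPath output and the desired size through the AWS CLI's text output, so it does not depend on jq.

```shell
#!/usr/bin/env bash
# Sketch only: preflight_ok is a hypothetical helper, not part of rok-tools.

# preflight_ok <cluster-name> <nodegroup-name>
# Succeeds only if the Cluster Autoscaler deployment has zero replicas
# and the desired size of the managed node group is zero.
preflight_ok() {
    local cluster=${1:?cluster name required}
    local nodegroup=${2:?node group name required}
    local replicas desired

    # Replica count of the Cluster Autoscaler deployment.
    replicas=$(kubectl get deployment -n kube-system cluster-autoscaler \
        -o jsonpath='{.spec.replicas}') || return 1

    # Desired size of the managed node group, printed as plain text.
    desired=$(aws eks describe-nodegroup \
        --cluster-name "${cluster}" \
        --nodegroup-name "${nodegroup}" \
        --query nodegroup.scalingConfig.desiredSize \
        --output text) || return 1

    [[ ${replicas} == 0 && ${desired} == 0 ]]
}

# Usage, assuming EKS_CLUSTER and NODEGROUP are set as in the steps above:
#   preflight_ok "${EKS_CLUSTER?}" "${NODEGROUP?}" && echo OK || echo FAIL
```

The helper only reports the state. If it prints FAIL, follow the troubleshooting steps above to scale down the Cluster Autoscaler or the node group.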

Procedure

  1. Restore the required context from previous sections:

    root@rok-tools:~/ops/deployments# source deploy/env.eks-cluster
  2. Update the node group configuration. Choose one of the following options based on your node group type.

    1. Set the name of the node group:

      root@rok-tools:~/ops/deployments# export NODEGROUP=<NODEGROUP>

      Replace <NODEGROUP> with the node group name. For example:

      root@rok-tools:~/ops/deployments# export NODEGROUP=general-non-rok-workers
    2. Set the taints and labels to be added to the node group:

      root@rok-tools:~/ops/deployments# export LABEL_KEY=rok.arrikto.com/disabled
      root@rok-tools:~/ops/deployments# export LABEL_VALUE=true
      root@rok-tools:~/ops/deployments# export TAINT_KEY=rok.arrikto.com/disabled
      root@rok-tools:~/ops/deployments# export TAINT_EFFECT=NO_SCHEDULE

      Note

      The Cluster Autoscaler uses the rok.arrikto.com/disabled: true label to determine whether Rok is disabled on a node group. Once the managed node group is configured with the label, every new node in the group carries it. The value of the label must be true, otherwise the Cluster Autoscaler ignores it.

      The rok.arrikto.com/disabled:NoSchedule taint prevents Rok from running on the nodes of the node group. Once the node group is configured with the taint, every new node in the group carries it. The EKS API spells this effect NO_SCHEDULE, which corresponds to the Kubernetes NoSchedule effect.

    3. Update the node group configuration with the labels and the taints:

      root@rok-tools:~/ops/deployments# aws eks update-nodegroup-config \
      >     --cluster-name ${EKS_CLUSTER?} \
      >     --nodegroup-name=${NODEGROUP?} \
      >     --labels addOrUpdateLabels="{${LABEL_KEY?}=${LABEL_VALUE?}}" \
      >     --taints addOrUpdateTaints="{key=${TAINT_KEY?},effect=${TAINT_EFFECT?}}"
    4. Retrieve the underlying Auto Scaling group of the managed node group:

      root@rok-tools:~/ops/deployments# ASG=$(aws eks describe-nodegroup \
      >     --cluster-name ${EKS_CLUSTER?} \
      >     --nodegroup-name ${NODEGROUP?} \
      >     | jq -r '.nodegroup.resources.autoScalingGroups[0].name')
    5. Set the tags to be added to the Auto Scaling group:

      root@rok-tools:~/ops/deployments# export LABEL_TAG_KEY=k8s.io/cluster-autoscaler/node-template/label/rok.arrikto.com/disabled
      root@rok-tools:~/ops/deployments# export LABEL_TAG_VALUE=true
      root@rok-tools:~/ops/deployments# export TAINT_TAG_KEY=k8s.io/cluster-autoscaler/node-template/taint/rok.arrikto.com/disabled
      root@rok-tools:~/ops/deployments# export TAINT_TAG_VALUE=:NoSchedule

      Note

      The Cluster Autoscaler relies on these tags of the Auto Scaling group to learn which labels and taints a node of the node group will have. It uses the tags only the first time it scales the node group up from zero. As soon as a live node joins the cluster, the Cluster Autoscaler inspects that node to determine the actual labels and taints of the node group. EKS does not derive these tags from the taints and labels of the managed node group, so you have to add them manually.

      Moreover, even though EKS supports tags on both managed node groups and their underlying Auto Scaling groups, it does not propagate the tags of a managed node group to its underlying Auto Scaling group, so you have to add the tags to the Auto Scaling group explicitly. See https://github.com/aws/containers-roadmap/issues/608.

    6. Update the tags of the Auto Scaling group:

      root@rok-tools:~/ops/deployments# aws autoscaling create-or-update-tags \
      >     --tags ResourceId=${ASG?},ResourceType=auto-scaling-group,Key=${TAINT_TAG_KEY?},Value=${TAINT_TAG_VALUE?},PropagateAtLaunch=true \
      >     ResourceId=${ASG?},ResourceType=auto-scaling-group,Key=${LABEL_TAG_KEY?},Value=${LABEL_TAG_VALUE?},PropagateAtLaunch=true

    This section is work in progress.
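
The tag keys and values used in this procedure follow a fixed pattern: the k8s.io/cluster-autoscaler/node-template prefix, then label or taint, then the label or taint key, with the taint tag value holding the taint value and effect separated by a colon. The sketch below derives them from the label and taint settings instead of typing them by hand. All function names are hypothetical and are not part of rok-tools or the AWS CLI; the effect mapping assumes the three effect spellings of the EKS API and of Kubernetes.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical, not part of rok-tools.

CA_TAG_PREFIX="k8s.io/cluster-autoscaler/node-template"

# label_tag_key <label-key>: tag key that mirrors a node label.
label_tag_key() {
    printf '%s/label/%s\n' "${CA_TAG_PREFIX}" "${1:?label key required}"
}

# taint_tag_key <taint-key>: tag key that mirrors a node taint.
taint_tag_key() {
    printf '%s/taint/%s\n' "${CA_TAG_PREFIX}" "${1:?taint key required}"
}

# k8s_taint_effect <eks-effect>: map the EKS API spelling of a taint effect
# (for example NO_SCHEDULE) to the Kubernetes spelling (NoSchedule).
k8s_taint_effect() {
    case "${1:?EKS taint effect required}" in
        NO_SCHEDULE)        echo NoSchedule ;;
        NO_EXECUTE)         echo NoExecute ;;
        PREFER_NO_SCHEDULE) echo PreferNoSchedule ;;
        *) echo "unknown taint effect: $1" >&2; return 1 ;;
    esac
}

# taint_tag_value <taint-value> <k8s-effect>: tag value "<value>:<effect>".
# An empty taint value yields ":<effect>", as used in this guide.
taint_tag_value() {
    printf '%s:%s\n' "${1-}" "${2:?taint effect required}"
}

# Derive the same tag settings that step 5 sets by hand.
LABEL_TAG_KEY=$(label_tag_key rok.arrikto.com/disabled)
LABEL_TAG_VALUE=true
TAINT_TAG_KEY=$(taint_tag_key rok.arrikto.com/disabled)
TAINT_TAG_VALUE=$(taint_tag_value "" "$(k8s_taint_effect NO_SCHEDULE)")
```

Deriving the taint tag value from TAINT_EFFECT keeps the NO_SCHEDULE spelling used by the EKS API and the NoSchedule spelling used in the tag from drifting apart.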

Verify

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Restore the required context:

    root@rok-tools:~/ops/deployments# source deploy/env.eks-cluster
    root@rok-tools:~/ops/deployments# export EKS_CLUSTER
  3. Choose one of the following options based on your node group type.

    1. Set the name of the node group:

      root@rok-tools:~/ops/deployments# export NODEGROUP=<NODEGROUP>

      Replace <NODEGROUP> with the node group name. For example:

      root@rok-tools:~/ops/deployments# export NODEGROUP=general-non-rok-workers
    2. Set the taints and labels that must have been added to the node group:

      root@rok-tools:~/ops/deployments# export LABEL_KEY=rok.arrikto.com/disabled
      root@rok-tools:~/ops/deployments# export LABEL_VALUE=true
      root@rok-tools:~/ops/deployments# export TAINT_KEY=rok.arrikto.com/disabled
      root@rok-tools:~/ops/deployments# export TAINT_EFFECT=NO_SCHEDULE
    3. Verify that the rok.arrikto.com/disabled: true label exists:

      root@rok-tools:~/ops/deployments# aws eks describe-nodegroup \
      >     --cluster-name ${EKS_CLUSTER?} \
      >     --nodegroup-name ${NODEGROUP?} \
      >     --query nodegroup.labels \
      >     --output json \
      >     | jq "to_entries[] | select(.key == \"${LABEL_KEY?}\" and .value == \"${LABEL_VALUE?}\" )"
      {
        "key": "rok.arrikto.com/disabled",
        "value": "true"
      }
    4. Verify that the rok.arrikto.com/disabled:NoSchedule taint exists:

      root@rok-tools:~/ops/deployments# aws eks describe-nodegroup \
      >     --cluster-name ${EKS_CLUSTER?} \
      >     --nodegroup-name ${NODEGROUP?} \
      >     --query nodegroup.taints \
      >     --output json \
      >     | jq "values | .[] | select(.key == \"${TAINT_KEY?}\" and .effect == \"${TAINT_EFFECT?}\")"
      {
        "key": "rok.arrikto.com/disabled",
        "effect": "NO_SCHEDULE"
      }
    5. Retrieve the underlying Auto Scaling group of the managed node group:

      root@rok-tools:~/ops/deployments# export ASG=$(aws eks describe-nodegroup \
      >     --cluster-name ${EKS_CLUSTER?} \
      >     --nodegroup-name ${NODEGROUP?} \
      >     | jq -r '.nodegroup.resources.autoScalingGroups[0].name')
    6. Set the tags that must have been added to the Auto Scaling group:

      root@rok-tools:~/ops/deployments# export LABEL_TAG_KEY=k8s.io/cluster-autoscaler/node-template/label/rok.arrikto.com/disabled
      root@rok-tools:~/ops/deployments# export LABEL_TAG_VALUE=true
      root@rok-tools:~/ops/deployments# export TAINT_TAG_KEY=k8s.io/cluster-autoscaler/node-template/taint/rok.arrikto.com/disabled
      root@rok-tools:~/ops/deployments# export TAINT_TAG_VALUE=:NoSchedule
    7. Verify that the label tag of the underlying Auto Scaling group exists:

      root@rok-tools:~/ops/deployments# aws autoscaling describe-tags \
      >     --filters Name=auto-scaling-group,Values=${ASG?} Name=key,Values=${LABEL_TAG_KEY?} \
      >     Name=value,Values=${LABEL_TAG_VALUE?} --query=Tags
      [
          {
              "ResourceId": "eks-general-non-rok-workers-64c306b8-788c-043a-ff32-4d8e853647d6",
              "ResourceType": "auto-scaling-group",
              "Key": "k8s.io/cluster-autoscaler/node-template/label/rok.arrikto.com/disabled",
              "Value": "true",
              "PropagateAtLaunch": true
          }
      ]
    8. Verify that the taint tag of the underlying Auto Scaling group exists:

      root@rok-tools:~/ops/deployments# aws autoscaling describe-tags \
      >     --filters Name=auto-scaling-group,Values=${ASG?} Name=key,Values=${TAINT_TAG_KEY?} \
      >     Name=value,Values=${TAINT_TAG_VALUE?} --query=Tags
      [
          {
              "ResourceId": "eks-general-non-rok-workers-64c306b8-788c-043a-ff32-4d8e853647d6",
              "ResourceType": "auto-scaling-group",
              "Key": "k8s.io/cluster-autoscaler/node-template/taint/rok.arrikto.com/disabled",
              "Value": ":NoSchedule",
              "PropagateAtLaunch": true
          }
      ]

    This section is work in progress.
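
The checks above inspect only the node group and Auto Scaling group configuration. Once the node group has scaled up and live nodes have joined the cluster, you may also want to confirm that each labeled node actually carries the taint. The following is a minimal sketch under that assumption: the function name is hypothetical, it assumes kubectl points at the target cluster, and it compares only the taint key, not the effect.

```shell
#!/usr/bin/env bash
# Sketch only: this helper is hypothetical, not part of rok-tools.

# rok_disabled_nodes_without_taint [key]
# Prints the name of every node that has the label <key>=true but lacks a
# taint with the same key. Prints nothing when all labeled nodes are tainted.
rok_disabled_nodes_without_taint() {
    local key=${1:-rok.arrikto.com/disabled}
    local name taints

    # One line per node: "<node-name> <taint-key> <taint-key> ...".
    kubectl get nodes -l "${key}=true" \
        -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.taints[*].key}{"\n"}{end}' |
    while read -r name taints; do
        [[ -z ${name} ]] && continue
        case " ${taints} " in
            *" ${key} "*) ;;            # taint present, nothing to report
            *) echo "${name}" ;;        # labeled node without the taint
        esac
    done
}

# Usage:
#   rok_disabled_nodes_without_taint
```

An empty result means that every node labeled rok.arrikto.com/disabled=true also has a taint with that key.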

Summary

You have successfully disabled Rok on the node group.

What’s Next

Check out the rest of the maintenance operations that you can perform on your cluster.