Create EKS Managed Node Group

This section will guide you through creating a managed node group.

Choose one of the following options to create a managed node group:

What You’ll Need

Check Your Environment

To create a managed node group, you are going to deploy a CloudFormation stack. When you manage resources with AWS CloudFormation, you need sufficient permissions not only on AWS CloudFormation itself, but also on the underlying resources that the template defines.

To create a managed node group for your EKS cluster using AWS CloudFormation, you need permissions to perform the following actions:

  • Deploy AWS CloudFormation stacks.
  • Create EKS node groups.
  • Pass IAM roles to EKS resources.
  • Describe EC2 resources.
  • (Optional) Create launch templates, if you are going to create a managed node group without NVMe disks attached to it.

Note

If you do not have the above permissions, contact your AWS administrator to either grant sufficient permissions to your IAM user or deploy the AWS CloudFormation stack below for you.
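
If you are unsure which IAM identity your management environment is currently using, an optional way to check is with the AWS CLI, for example:

root@rok-tools:~# aws sts get-caller-identity

The output shows the account and the ARN of the IAM user or role whose permissions apply.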

Option 1: Create EKS Managed Node Group Automatically (preferred)

Choose one of the following options, based on the type of node group you want to create.

For a non-GPU node group: Create an EKS managed node group by following the on-screen instructions on the rok-deploy user interface.

If rok-deploy is not already running, start it with:

root@rok-tools:~# rok-deploy --run-from eks-nodegroup
[Screenshot: the rok-deploy EKS node group step (eks-nodegroup.png)]

Proceed to the Summary section.

For a GPU node group: rok-deploy does not currently support the automatic creation of GPU-enabled node groups. Follow Option 2: Create EKS Managed Node Group Manually to create a GPU-enabled node group manually.

Option 2: Create EKS Managed Node Group Manually

If you want to create an EKS managed node group manually, follow this section.

Procedure

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments/
  2. Restore the required context from previous sections:

    root@rok-tools:~/ops/deployments# source <(cat deploy/env.{envvars-aws,\
    > eks-cluster,aws-subnets,eks-iam-node})
    root@rok-tools:~/ops/deployments# export AWS_ACCOUNT_ID AWS_DEFAULT_REGION EKS_CLUSTER \
    > EKS_IAM_NODE_ROLE EKS_CLUSTER_VERSION AWS_SUBNETS_PUBLIC AWS_SUBNETS_PRIVATE
  3. Specify a name for your node group:

    root@rok-tools:~/ops/deployments# export EKS_NODEGROUP=<NODEGROUP>

    Replace NODEGROUP with the name of your node group. For example, you can choose one of the following options, based on the type of node group you want to create:

    root@rok-tools:~/ops/deployments# export EKS_NODEGROUP=general-workers
    root@rok-tools:~/ops/deployments# export EKS_NODEGROUP=gpu-workers
  4. Specify the instance type for your node group. Choose one of the following options, based on the type of node group you want to create:

    We recommend that you use an instance type that has instance store volumes (local NVMe storage) attached, such as m5d.4xlarge:

    root@rok-tools:~/ops/deployments# export EKS_NODEGROUP_INSTANCE_TYPE=m5d.4xlarge

    Choose an instance type with NVIDIA GPUs. We recommend that you use an instance type that has instance store volumes (local NVMe storage) attached, such as p4d.24xlarge:

    root@rok-tools:~/ops/deployments# export EKS_NODEGROUP_INSTANCE_TYPE=p4d.24xlarge
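
    Optionally, you can verify that the instance type you chose is offered in your region. This is an extra sanity check, not part of the main procedure, and it assumes your default AWS region is already configured:

    root@rok-tools:~/ops/deployments# aws ec2 describe-instance-type-offerings \
    > --filters Name=instance-type,Values=${EKS_NODEGROUP_INSTANCE_TYPE?} \
    > --query 'InstanceTypeOfferings[].InstanceType' \
    > --output text

    Empty output means that the instance type is not offered in your current region.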
  5. Specify the AMI version of the Amazon EKS optimized AMI you want to use. Choose one of the following options based on the Kubernetes version of your EKS cluster.

    root@rok-tools:~/ops/deployments# export EKS_NODEGROUP_AMI_VERSION=1.21.5-20220429
    root@rok-tools:~/ops/deployments# export EKS_NODEGROUP_AMI_VERSION=1.20.11-20220429
    root@rok-tools:~/ops/deployments# export EKS_NODEGROUP_AMI_VERSION=1.19.15-20220429
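
    If you are unsure which Kubernetes version your cluster runs, one optional way to look it up is:

    root@rok-tools:~/ops/deployments# aws eks describe-cluster \
    > --name ${EKS_CLUSTER?} \
    > --query 'cluster.version' \
    > --output text

    Pick the AMI version whose prefix matches the reported cluster version.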
  6. Specify the subnets that the EKS node group will use. Choose one of the following options, based on whether you want your node group to have public or private subnets:

    Use the first public subnet of your VPC:

    root@rok-tools:~/ops/deployments# export EKS_NODEGROUP_SUBNETS=$(echo ${AWS_SUBNETS_PUBLIC?} | cut -d' ' -f1)

    Use the first private subnet of your VPC:

    root@rok-tools:~/ops/deployments# export EKS_NODEGROUP_SUBNETS=$(echo ${AWS_SUBNETS_PRIVATE?} | cut -d' ' -f1)

    Note

    Advanced Networking: We recommend you use one of the available subnets based on whether you want nodes that are publicly or privately accessible. However, if you have specific networking requirements, you can explicitly specify any subset of your public and private subnets with:

    root@rok-tools:~/ops/deployments# export EKS_NODEGROUP_SUBNETS="<SUBNET1> <SUBNET2>"

    Ensure all the subnets that the EKS node group will use belong to the same availability zone.
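
    To double-check the availability zone of the subnets you selected, you can optionally list them now (the Verify section repeats this check after deployment):

    root@rok-tools:~/ops/deployments# aws ec2 describe-subnets \
    > --subnet-ids ${EKS_NODEGROUP_SUBNETS?} \
    > --query 'Subnets[].[SubnetId,AvailabilityZone]' \
    > --output table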

  7. Specify the root device disk size, in GiB:

    root@rok-tools:~/ops/deployments# export EKS_NODEGROUP_DISK_SIZE=200
  8. Specify whether to add extra EBS volumes to your worker nodes. Extra EBS volumes are necessary only if your instance type does not have NVMe disks attached. Run the following command to determine whether to add extra EBS volumes, based on your instance type:

    root@rok-tools:~/ops/deployments# export EKS_NODEGROUP_ADD_EXTRA_EBS_VOLUMES=$(aws ec2 \
    > describe-instance-types \
    > --instance-types ${EKS_NODEGROUP_INSTANCE_TYPE?} \
    > --query "!(InstanceTypes[].InstanceStorageSupported | [0])" \
    > --output text \
    > | tr '[:upper:]' '[:lower:]')
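
    As an optional sanity check, you can print the resulting value (it should be either true or false) and, if you wish, the total instance store size your instance type provides (the output is empty or None if the type has no NVMe instance store):

    root@rok-tools:~/ops/deployments# echo ${EKS_NODEGROUP_ADD_EXTRA_EBS_VOLUMES?}
    root@rok-tools:~/ops/deployments# aws ec2 describe-instance-types \
    > --instance-types ${EKS_NODEGROUP_INSTANCE_TYPE?} \
    > --query 'InstanceTypes[].InstanceStorageInfo.TotalSizeInGB' \
    > --output text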
  9. Optional

    Specify whether you want the node group to be used exclusively for GPU workloads. Choose one of the following options, based on the type of node group you want to create:

    For a non-GPU node group: Skip to the next step.

    For a GPU node group:

    root@rok-tools:~/ops/deployments# export EKS_NODEGROUP_GPU_WORKLOAD=true

    Warning

    Do not set this environment variable if you do not want your node group to be dedicated exclusively to GPU workloads.

    By setting the aforementioned environment variable, you will add the key=nvidia.com/gpu,effect=NO_SCHEDULE taint to the node group. This will only permit pods tolerating the specified taint to run on the GPU node group. On Kubernetes 1.19, the ExtendedResourceToleration admission controller will add the appropriate toleration on pods that request the nvidia.com/gpu resource. In order to run pods that do not request this resource and thus do not tolerate the taint, you should add a second non-GPU node group.
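
    After the GPU node group has joined the cluster, one optional way to confirm that the taint was applied is to list node taints with kubectl, for example:

    root@rok-tools:~/ops/deployments# kubectl get nodes \
    > -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints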

  10. Set the name of the CloudFormation stack you will deploy:

    root@rok-tools:~/ops/deployments# export EKS_NODEGROUP_CF_STACK=rok-${AWS_DEFAULT_REGION?}-${EKS_CLUSTER?}-managed-nodegroup

    Important

    If you already have a CloudFormation stack with the proposed name, for example because you have already deployed a node group for your cluster or you are in the process of upgrading your node groups, choose a different name.
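
    If you are unsure which CloudFormation stacks already exist, one optional way to list the active ones is the following; the next two steps also validate the specific name you chose:

    root@rok-tools:~/ops/deployments# aws cloudformation list-stacks \
    > --stack-status-filter CREATE_COMPLETE UPDATE_COMPLETE \
    > --query 'StackSummaries[].StackName' \
    > --output text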

  11. Verify that the CloudFormation stack name you specified is not longer than 128 characters:

    root@rok-tools:~/ops/deployments# [[ ${#EKS_NODEGROUP_CF_STACK} -le 128 ]] && echo OK || echo FAIL
    OK

    Troubleshooting

    The output of the command is FAIL

    Go back to step 10 and specify a shorter name.

  12. Ensure that the CloudFormation stack name you specified is not already in use:

    root@rok-tools:~/ops/deployments# aws cloudformation describe-stacks \
    > --stack-name ${EKS_NODEGROUP_CF_STACK?} &>/dev/null \
    > && echo "Stack name already in use" \
    > || echo "OK"
    OK

    Troubleshooting

    The output of the command is "Stack name already in use"

    Go back to step 10 and specify a different name.

  13. Generate the AWS CloudFormation template. Choose one of the following options, based on the type of node group you want to create:

    For a non-GPU node group:

    root@rok-tools:~/ops/deployments# rok-j2 rok/eks/eks-managed-nodegroup.yaml.j2 \
    > -o rok/eks/eks-managed-nodegroup.yaml

    Alternatively, save the eks-managed-nodegroup CloudFormation template provided below and use it locally.

    eks-managed-nodegroup.yaml.j2
    AWSTemplateFormatVersion: "2010-09-09"

    Description: Amazon EKS - Managed Node Group

    Metadata:
      Rok::StackName: {{EKS_NODEGROUP_CF_STACK}}

    Resources:

    {%- set EKS_NODEGROUP_ADD_EXTRA_EBS_VOLUMES=EKS_NODEGROUP_ADD_EXTRA_EBS_VOLUMES|str2py("bool") %}
    {%- set EKS_NODEGROUP_SUBNETS=EKS_NODEGROUP_SUBNETS|str2py("list") %}

      NodeLaunchTemplate:
        Type: AWS::EC2::LaunchTemplate
        Properties:
          LaunchTemplateData:
            BlockDeviceMappings:
              - DeviceName: /dev/xvda
                Ebs:
                  DeleteOnTermination: true
                  VolumeSize: {{EKS_NODEGROUP_DISK_SIZE}}
                  VolumeType: gp2
            {%- if EKS_NODEGROUP_ADD_EXTRA_EBS_VOLUMES %}
              - DeviceName: /dev/sdf
                Ebs:
                  DeleteOnTermination: true
                  VolumeSize: 1000
                  VolumeType: gp2
            {%- endif %}
            MetadataOptions:
              HttpTokens: required
              HttpPutResponseHopLimit: 1

      EKSNodegroup:
        Type: 'AWS::EKS::Nodegroup'
        Properties:
          AmiType: AL2_x86_64
          ClusterName: {{EKS_CLUSTER}}
          InstanceTypes:
            - {{EKS_NODEGROUP_INSTANCE_TYPE}}
          NodegroupName: {{EKS_NODEGROUP}}
          NodeRole: 'arn:aws:iam::{{AWS_ACCOUNT_ID}}:role/{{EKS_IAM_NODE_ROLE}}'
          ScalingConfig:
            MinSize: 1
            DesiredSize: 2
            MaxSize: 3
          Version: '{{EKS_CLUSTER_VERSION}}'
          ReleaseVersion: {{EKS_NODEGROUP_AMI_VERSION}}
          Labels:
            role: {{EKS_NODEGROUP}}
          Subnets:
          {%- for subnet in EKS_NODEGROUP_SUBNETS %}
            - {{subnet}}
          {%- endfor %}
          LaunchTemplate:
            Id: !Ref NodeLaunchTemplate

    Outputs:

      Nodegroup:
        Description: The Nodegroup
        Value: !Ref EKSNodegroup

      LaunchTemplate:
        Description: The generated LaunchTemplate for the Nodegroup
        Value: !Ref NodeLaunchTemplate

    For a GPU node group:

    root@rok-tools:~/ops/deployments# rok-j2 rok/eks/eks-managed-nodegroup-gpu.yaml.j2 \
    > -o rok/eks/eks-managed-nodegroup-gpu.yaml

    Alternatively, save the eks-managed-nodegroup-gpu CloudFormation template provided below and use it locally.

    eks-managed-nodegroup-gpu.yaml.j2
    AWSTemplateFormatVersion: "2010-09-09"

    Description: Amazon EKS - Managed Node Group

    Metadata:
      Rok::StackName: {{EKS_NODEGROUP_CF_STACK}}

    Resources:

    {%- set EKS_NODEGROUP_ADD_EXTRA_EBS_VOLUMES=EKS_NODEGROUP_ADD_EXTRA_EBS_VOLUMES|str2py("bool") %}
    {%- if EKS_NODEGROUP_GPU_WORKLOAD is defined %}
    {%- set EKS_NODEGROUP_GPU_WORKLOAD=EKS_NODEGROUP_GPU_WORKLOAD|str2py("bool") %}
    {%- endif %}
    {%- set EKS_NODEGROUP_SUBNETS=EKS_NODEGROUP_SUBNETS|str2py("list") %}

      NodeLaunchTemplate:
        Type: AWS::EC2::LaunchTemplate
        Properties:
          LaunchTemplateData:
            BlockDeviceMappings:
              - DeviceName: /dev/xvda
                Ebs:
                  DeleteOnTermination: true
                  VolumeSize: {{EKS_NODEGROUP_DISK_SIZE}}
                  VolumeType: gp2
            {%- if EKS_NODEGROUP_ADD_EXTRA_EBS_VOLUMES %}
              - DeviceName: /dev/sdf
                Ebs:
                  DeleteOnTermination: true
                  VolumeSize: 1000
                  VolumeType: gp2
            {%- endif %}
            MetadataOptions:
              HttpTokens: required
              HttpPutResponseHopLimit: 1

      EKSNodegroup:
        Type: 'AWS::EKS::Nodegroup'
        Properties:
          AmiType: AL2_x86_64_GPU
          ClusterName: {{EKS_CLUSTER}}
          InstanceTypes:
            - {{EKS_NODEGROUP_INSTANCE_TYPE}}
          NodegroupName: {{EKS_NODEGROUP}}
          NodeRole: 'arn:aws:iam::{{AWS_ACCOUNT_ID}}:role/{{EKS_IAM_NODE_ROLE}}'
          ScalingConfig:
            MinSize: 0
            DesiredSize: 3
            MaxSize: 10
          Version: '{{EKS_CLUSTER_VERSION}}'
          ReleaseVersion: {{EKS_NODEGROUP_AMI_VERSION}}
          Labels:
            role: {{EKS_NODEGROUP}}
          Subnets:
          {%- for subnet in EKS_NODEGROUP_SUBNETS %}
            - {{subnet}}
          {%- endfor %}
          LaunchTemplate:
            Id: !Ref NodeLaunchTemplate
          {%- if EKS_NODEGROUP_GPU_WORKLOAD is defined %}
          {%- if EKS_NODEGROUP_GPU_WORKLOAD == true %}
          Taints:
            - Key: nvidia.com/gpu
              Effect: NO_SCHEDULE
          {%- endif %}
          {%- endif %}

    Outputs:

      Nodegroup:
        Description: The Nodegroup
        Value: !Ref EKSNodegroup

      LaunchTemplate:
        Description: The generated LaunchTemplate for the Nodegroup
        Value: !Ref NodeLaunchTemplate
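
    Optionally, you can ask AWS CloudFormation to validate the generated template before deploying it. This is only an extra syntax-level check; the example below assumes the non-GPU template, so adjust the file name if you generated the GPU one:

    root@rok-tools:~/ops/deployments# aws cloudformation validate-template \
    > --template-body file://rok/eks/eks-managed-nodegroup.yaml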
  14. Save your state:

    root@rok-tools:~/ops/deployments# j2 deploy/env.eks-nodegroup.j2 \
    > -o deploy/env.eks-nodegroup
  15. Commit your changes:

    root@rok-tools:~/ops/deployments# git commit -am "Create EKS Managed Node Group"
  16. Mark your progress:

    root@rok-tools:~/ops/deployments# export DATE=$(date -u "+%Y-%m-%dT%H.%M.%SZ")
    root@rok-tools:~/ops/deployments# git tag -a deploy/${DATE?}/release-1.5/eks-nodegroup-managed \
    > -m "Create EKS Managed Node Group"
  17. Deploy the CloudFormation stack. Choose one of the following options, based on the type of node group you want to create:

    For a non-GPU node group:

    root@rok-tools:~/ops/deployments# aws cloudformation deploy \
    > --stack-name ${EKS_NODEGROUP_CF_STACK?} \
    > --template-file rok/eks/eks-managed-nodegroup.yaml
    Waiting for changeset to be created..
    Waiting for stack create/update to complete
    Successfully created/updated stack - rok-us-west-2-arrikto-cluster-managed-nodegroup

    For a GPU node group:

    root@rok-tools:~/ops/deployments# aws cloudformation deploy \
    > --stack-name ${EKS_NODEGROUP_CF_STACK?} \
    > --template-file rok/eks/eks-managed-nodegroup-gpu.yaml
    Waiting for changeset to be created..
    Waiting for stack create/update to complete
    Successfully created/updated stack - rok-us-west-2-arrikto-cluster-managed-nodegroup

    Troubleshooting

    AccessDenied

    If the above command fails with an error message similar to the following:

    An error occurred (AccessDenied) when calling the DescribeStacks operation: User: arn:aws:iam::123456789012:user/user is not authorized to perform: cloudformation:DescribeStacks on resource: arn:aws:cloudformation:us-west-2:123456789012:stack/rok-us-west-2-arrikto-cluster-managed-nodegroup

    it means that your IAM user does not have sufficient permissions to perform an action necessary to deploy an AWS CloudFormation stack.

    To proceed, Check Your Environment and contact your AWS administrator to grant sufficient permissions to your IAM user or deploy the AWS CloudFormation stack for you.

    Failed to create/update the stack

    If the above command fails with an error message similar to the following:

    Failed to create/update the stack. Run the following command to fetch the list of events leading up to the failure aws cloudformation describe-stack-events --stack-name rok-us-west-2-arrikto-cluster-managed-nodegroup

    describe the events of the CloudFormation stack to identify the root cause of the failure:

    root@rok-tools:~/ops/deployments# aws cloudformation describe-stack-events --stack-name ${EKS_NODEGROUP_CF_STACK?}
    • A stack event like the following:

      { "StackId": "arn:aws:cloudformation:us-west-2:123456789012:stack/rok-us-west-2-arrikto-cluster-managed-nodegroup/abdc3710-31cf-11ec-bcc2-0638bc4e4432", "EventId": "EKSNodegroup-CREATE_FAILED-2021-10-20T18:01:06.023Z", "StackName": "rok-us-west-2-arrikto-cluster-managed-nodegroup", "LogicalResourceId": "EKSNodegroup", "PhysicalResourceId": "", "ResourceType": "AWS::EKS::Nodegroup", "Timestamp": "2021-10-20T18:01:06.023000+00:00", "ResourceStatus": "CREATE_FAILED", "ResourceStatusReason": "arrikto-cluster|general-workers already exists in stack arn:aws:cloudformation:us-west-2:123456789012:stack/rok-us-west-2-arrikto-cluster-managed-nodegroup/15096010-31cf-11ec-a9ba-02ff3360d2ec", "ResourceProperties": "{\"NodegroupName\":\"general-workers\",\"NodeRole\":\"arn:aws:iam::123456789012:role/eksNodeRole\",\"Subnets\":[\"subnet-0df5835ed92a7a4e3\"],\"AmiType\":\"AL2_x86_64\",\"ScalingConfig\":{\"MinSize\":\"1\",\"DesiredSize\":\"2\",\"MaxSize\":\"3\"},\"Version\":\"1.21\",\"DiskSize\":\"200\",\"ClusterName\":\"arrikto-cluster\",\"Labels\":{\"role\":\"general-worker\"},\"InstanceTypes\":[\"m5d.4xlarge\"],\"ReleaseVersion\":\"1.21.5-20220429\"}" }

      means that the node group that the AWS CloudFormation stack defines already exists, leading to name conflicts. To proceed, go back to step 3, specify a different name for the node group and follow the rest of the guide.

      Important

      Rok will run in all the node groups of the cluster, including the existing one.

    • A stack event like the following:

      { "StackId": "arn:aws:cloudformation:us-west-2:123456789012:stack/rok-us-west-2-arrikto-cluster-managed-nodegroup/32095760-31d9-11ec-8997-0a68dd1c0600", "EventId": "EKSNodegroup-CREATE_FAILED-2021-10-20T19:09:16.574Z", "StackName": "rok-us-west-2-arrikto-cluster-managed-nodegroup", "LogicalResourceId": "EKSNodegroup", "PhysicalResourceId": "", "ResourceType": "AWS::EKS::Nodegroup", "Timestamp": "2021-10-20T19:09:16.574000+00:00", "ResourceStatus": "CREATE_FAILED", "ResourceStatusReason": "User: arn:aws:iam::123456789012:user/user/AWSCloudFormation is not authorized to perform: eks:TagResource on resource: arn:aws:eks:us-west-2:123456789012:cluster/arrikto-cluster (Service: AmazonEKS; Status Code: 403; Error Code: AccessDeniedException; Request ID: d68d784d-6a4c-48aa-b789-aa87e674849d; Proxy: null)", "ResourceProperties": "{\"NodegroupName\":\"general-workers\",\"NodeRole\":\"arn:aws:iam::123456789012:role/eksNodeRole\",\"Subnets\":[\"subnet-0df5835ed92a7a4e3\"],\"AmiType\":\"AL2_x86_64\",\"ScalingConfig\":{\"MinSize\":\"1\",\"DesiredSize\":\"2\",\"MaxSize\":\"3\"},\"Version\":\"1.21\",\"DiskSize\":\"200\",\"ClusterName\":\"arrikto-cluster\",\"Labels\":{\"role\":\"general-worker\"},\"InstanceTypes\":[\"m5d.4xlarge\"],\"ReleaseVersion\":\"1.21.5-20220429\"}" }

      means that your IAM user does not have sufficient permissions to perform an action necessary to create a node group via the AWS CloudFormation stack.

      To proceed, Check Your Environment and contact your AWS administrator to grant your IAM user sufficient permissions or deploy the AWS CloudFormation stack for you.

    • A stack event like the following:

      { "StackId": "arn:aws:cloudformation:us-west-2:123456789012:stack/rok-us-west-2-arrikto-cluster-managed-nodegroup/53539590-5e64-11ec-8d5a-0a0f12e4f94e", "EventId": "NodeLaunchTemplate-CREATE_FAILED-2021-12-16T11:36:06.483Z", "StackName": "rok-us-west-2-arrikto-cluster-managed-nodegroup", "LogicalResourceId": "NodeLaunchTemplate", "PhysicalResourceId": "", "ResourceType": "AWS::EC2::LaunchTemplate", "Timestamp": "2021-12-16T11:36:06.483000+00:00", "ResourceStatus": "CREATE_FAILED", "ResourceStatusReason": "You are not authorized to perform this operation. Encoded authorization failure message: EGZ4pJLWnc8Fj297xGavQaF2Jo5aMk4ZQ-xHQegd69Z6AHB5orDoamNsCHQgIEKqKUZzscuctTrf_XXkONeFk1SoHc5xUTCpG_Czm6-oCCiQYLG_w6RyDPbR8Er31oGuyP19WCafeIWlsEQ4L6KfLpQn_oJSxAOjxbI7WBnG5UNa1VvEPl1M9wqYkqdGyPxac4rlxTDa1DHzCQ81fnIB9mC7VGGRswyY5V4h6Wm-Iz_IkxEazf8cFjEy2QAGf2twvG9D4TQpVrY9AYyjZfKW6aEeR0teBz_QBiqXj8VWWuBfl1c0dq9OZef9kpK2H4sLHxSYv9fT76NEdVt8t7Vh7saq6l8ReDi3VDXoBjygLrwFrI4hKeDLka4ssKWn1YWYY6X8WI9nI8hn8jOdlnKErm3o6fA250fGP6tuT5t9jqYU04cFgKRpQsOemWFS-NGgbm2aK6zzFlDCKT-HR3cllhdIp9fhQZGaO4OijLVLnphsDvyJGetN8IzJmLH2S2CpXCmQQPzCjBgOURO-eKkXPuwN_gLFs-58lr-5g0eF (Service: AmazonEC2; Status Code: 403; Error Code: UnauthorizedOperation; Request ID: 83c42355-86e2-45ba-82db-6ecd4c5a439f; Proxy: null)", "ResourceProperties": "{\"LaunchTemplateData\":{\"BlockDeviceMappings\":[{\"Ebs\":{\"VolumeType\":\"gp2\",\"VolumeSize\":\"200\",\"DeleteOnTermination\":\"true\"},\"DeviceName\":\"/dev/xvda\"},{\"Ebs\":{\"VolumeType\":\"gp2\",\"VolumeSize\":\"1000\",\"DeleteOnTermination\":\"true\"},\"DeviceName\":\"/dev/sdf\"}]}}" }

      means that your IAM user does not have sufficient permissions to create a launch template via the AWS CloudFormation stack.

      To proceed, Check Your Environment and contact your AWS administrator to grant your IAM user sufficient permissions or deploy the AWS CloudFormation stack for you.

    • A stack event like the following:

      { "StackId": "arn:aws:cloudformation:us-west-2:123456789012:stack/rok-us-west-2-arrikto-cluster-managed-nodegroup/e7bbc180-31eb-11ec-8eae-069520e483b6", "EventId": "EKSNodegroup-CREATE_FAILED-2021-10-20T21:23:14.135Z", "StackName": "rok-us-west-2-arrikto-cluster-managed-nodegroup", "LogicalResourceId": "EKSNodegroup", "PhysicalResourceId": "", "ResourceType": "AWS::EKS::Nodegroup", "Timestamp": "2021-10-20T21:23:14.135000+00:00", "ResourceStatus": "CREATE_FAILED", "ResourceStatusReason": "Requested release version 1.16.15-20210310 is not valid for kubernetes version 1.21. (Service: AmazonEKS; Status Code: 400; Error Code: InvalidParameterException; Request ID: 943a6345-8a2d-4718-ac70-417027c2026a; Proxy: null)", "ResourceProperties": "{\"NodegroupName\":\"general-workers\",\"NodeRole\":\"arn:aws:iam::123456789012:role/eksNodeRole\",\"Subnets\":[\"subnet-0df5835ed92a7a4e3\"],\"AmiType\":\"AL2_x86_64\",\"ScalingConfig\":{\"MinSize\":\"1\",\"DesiredSize\":\"2\",\"MaxSize\":\"3\"},\"Version\":\"1.21\",\"DiskSize\":\"200\",\"ClusterName\":\"arrikto-cluster\",\"Labels\":{\"role\":\"general-worker\"},\"InstanceTypes\":[\"m5d.4xlarge\"],\"ReleaseVersion\":\"1.21.5-20220429\"}" }

      means that the selected AMI release version is invalid.

      Contact Arrikto

      Coordinate with Arrikto’s Tech Team to ensure that you are using the latest AMI and that you have container images that support the latest AMI and kernel.

    ValidationError

    If the above command fails with an error message similar to the following:

    An error occurred (ValidationError) when calling the CreateChangeSet operation: Stack:arn:aws:cloudformation:us-west-2:123456789012:stack/rok-us-west-2-arrikto-cluster-managed-nodegroup/88226820-31e9-11ec-a043-0609da0ab820 is in ROLLBACK_COMPLETE state and can not be updated.

    delete the stack and deploy it again (see the example below).
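
    For example, one way to delete the failed stack and wait for the deletion to complete before re-running the deploy command is:

    root@rok-tools:~/ops/deployments# aws cloudformation delete-stack \
    > --stack-name ${EKS_NODEGROUP_CF_STACK?}
    root@rok-tools:~/ops/deployments# aws cloudformation wait stack-delete-complete \
    > --stack-name ${EKS_NODEGROUP_CF_STACK?}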

Verify

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Restore the required context from previous sections:

    root@rok-tools:~/ops/deployments# source <(cat deploy/env.{eks-cluster,\
    > eks-nodegroup,aws-subnets,aws-vpc})
    root@rok-tools:~/ops/deployments# export EKS_CLUSTER EKS_NODEGROUP_SUBNETS \
    > AWS_SUBNETS_PUBLIC AWS_SUBNETS_PRIVATE AWS_VPC_ID
  3. Ensure all subnets that the EKS node group will use are among the public and private subnets:

    root@rok-tools:~/ops/deployments# echo ${EKS_NODEGROUP_SUBNETS?}
    subnet-018e3b5b3ec930ccb
    root@rok-tools:~/ops/deployments# echo ${AWS_SUBNETS_PUBLIC?} ${AWS_SUBNETS_PRIVATE?}
    subnet-0b936cdc4fae6862a subnet-0110cc3509ed64a7e subnet-018e3b5b3ec930ccb subnet-074cebd1b78c50066
  4. Ensure all the subnets that the EKS node group will use belong to the same availability zone. List the given subnets and ensure that the second column refers to exactly one AZ across all subnets:

    root@rok-tools:~/ops/deployments# aws ec2 describe-subnets \
    > --subnet-ids ${EKS_NODEGROUP_SUBNETS?} \
    > --filter Name=vpc-id,Values=${AWS_VPC_ID?} \
    > --query 'Subnets[].[SubnetId,AvailabilityZone]' \
    > --output table
    --------------------------------------------
    |              DescribeSubnets             |
    +---------------------------+--------------+
    |  subnet-018e3b5b3ec930ccb |  us-east-1a  |
    +---------------------------+--------------+
  5. Verify that EC2 instances have been created:

    root@rok-tools:~/ops/deployments# aws ec2 describe-instances \
    > --filters Name=tag-key,Values=kubernetes.io/cluster/${EKS_CLUSTER?}
    {
        "Reservations": [
            {
                "Groups": [],
                "Instances": [
                    {
                        "AmiLaunchIndex": 0,
                        "ImageId": "ami-012b81faa674369fc",
                        "InstanceId": "i-0a1795ed2c92c16d5",
                        "InstanceType": "p4d.24xlarge",
                        "LaunchTime": "2021-07-27T08:39:41+00:00",
                        "Monitoring": {
                            "State": "disabled"
                        },
                        "Placement": {
                            "AvailabilityZone": "eu-central-1b",
                            "GroupName": "",
                            "Tenancy": "default"
                        },
                        ...
  6. Verify that all EC2 instances use IMDSv2 only:

    1. Retrieve the EC2 instance IDs of your cluster:

      root@rok-tools:~/ops/deployments# aws ec2 describe-instances \
      > --filters Name=tag:kubernetes.io/cluster/${EKS_CLUSTER?},Values=owned \
      > --query "Reservations[*].Instances[*].InstanceId" --output text
      i-075363bbf64a60e04 i-06a5ee72c6eed1bad
    2. Repeat the steps below for each one of the EC2 instance IDs in the list of the previous step.

      1. Specify the ID of the EC2 instance to operate on:

        root@rok-tools:~/ops/deployments# export INSTANCE_ID=<INSTANCE_ID>

        Replace <INSTANCE_ID> with one of the IDs you found in the previous step, for example:

        root@rok-tools:~/ops/deployments# export INSTANCE_ID=i-075363bbf64a60e04
      2. Verify that the HttpTokens metadata option is set to required:

        root@rok-tools:~/ops/deployments# aws ec2 get-launch-template-data \
        > --instance-id ${INSTANCE_ID?} \
        > --query 'LaunchTemplateData.MetadataOptions.HttpTokens == `required`'
        true
      3. Verify that the HttpPutResponseHopLimit metadata option is set to 1:

        root@rok-tools:~/ops/deployments# aws ec2 get-launch-template-data \
        > --instance-id ${INSTANCE_ID?} \
        > --query 'LaunchTemplateData.MetadataOptions.HttpPutResponseHopLimit == `1`'
        true
      4. Go back to substep 1 and repeat these substeps for the remaining instance IDs.
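
      Alternatively, to avoid repeating these checks by hand, you could run a small shell loop like the following sketch, which prints both metadata options for every instance of the cluster; each output line should read required and 1:

      root@rok-tools:~/ops/deployments# for id in $(aws ec2 describe-instances \
      > --filters Name=tag:kubernetes.io/cluster/${EKS_CLUSTER?},Values=owned \
      > --query "Reservations[*].Instances[*].InstanceId" --output text); do \
      > aws ec2 get-launch-template-data --instance-id ${id} \
      > --query 'LaunchTemplateData.MetadataOptions.[HttpTokens,HttpPutResponseHopLimit]' \
      > --output text; done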

  7. Verify that Kubernetes nodes have appeared:

    root@rok-tools:~/ops/deployments# kubectl get nodes
    NAME                                         STATUS   ROLES    AGE    VERSION
    ip-172-31-0-86.us-west-2.compute.internal    Ready    <none>   8m2s   v1.21.5-eks-bc4871b
    ip-172-31-24-96.us-west-2.compute.internal   Ready    <none>   8m4s   v1.21.5-eks-bc4871b
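
    As an additional optional check, you can confirm that EKS reports the node group as ACTIVE. This assumes EKS_NODEGROUP is still set from the previous section; if it is not, re-export it from deploy/env.eks-nodegroup first:

    root@rok-tools:~/ops/deployments# aws eks describe-nodegroup \
    > --cluster-name ${EKS_CLUSTER?} \
    > --nodegroup-name ${EKS_NODEGROUP?} \
    > --query 'nodegroup.status' \
    > --output text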

Summary

You have successfully created a managed node group.

What’s Next

The next step is to disable unsafe operations for your EKS cluster.