Upgrade AKS Node Pools

This section will guide you through upgrading the node pools of your AKS cluster to match the Kubernetes version of the control plane.
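
Before you start, you can optionally confirm the Kubernetes version that the control plane is currently running, so that you know which version your node pools should end up on. The following is a minimal sketch, assuming the AZ_RESOURCE_GROUP and AKS_CLUSTER environment variables used throughout this procedure are already set; the version shown is just an example:

    root@rok-tools:~# az aks show \
    >    --resource-group ${AZ_RESOURCE_GROUP?} \
    >    --name ${AKS_CLUSTER?} \
    >    --query kubernetesVersion -o tsv
    1.24.9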

What You’ll Need

This guide assumes that:

  • You have a working rok-tools management environment with kubectl and the az CLI configured against your AKS cluster.
  • You have already upgraded the AKS control plane to the target Kubernetes version.
  • The AZ_RESOURCE_GROUP, AKS_CLUSTER, and CLUSTER_VERSION environment variables used throughout this procedure are set.

Procedure

  1. Ensure that Rok is up and running:

    root@rok-tools:~# kubectl get rokcluster -n rok rok \
    >    -o jsonpath='{.status.health}{"\n"}'
    OK
  2. Ensure that the rest of the Pods are running. Verify that field STATUS is Running and field READY is N/N for all Pods:

    root@rok-tools:~# kubectl get pods -A
    NAMESPACE      NAME                      READY   STATUS    RESTARTS   AGE
    auth           dex-0                     2/2     Running   0          1h
    cert-manager   cert-manager-686bcc964d   1/1     Running   0          1h
    ...
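
    If the list is long, you can optionally narrow it down with a field selector. This is a minimal sketch that filters out Pods in the Running and Succeeded phases; note that it only checks the Pod phase, so still verify the READY column in the full listing:

    root@rok-tools:~# kubectl get pods -A \
    >    --field-selector=status.phase!=Running,status.phase!=Succeeded
    No resources found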
  3. List the node pools of your cluster and the corresponding Kubernetes version:

    root@rok-tools:~# az aks nodepool list -o table \
    >    --resource-group ${AZ_RESOURCE_GROUP} \
    >    --cluster-name ${AKS_CLUSTER}
    Name       OsType    KubernetesVersion    VmSize           Count    MaxPods    ProvisioningState    Mode
    ---------  --------  -------------------  ---------------  -------  ---------  -------------------  ------
    agentpool  Linux     1.23.8               Standard_DS2_v2  2        110        Succeeded            System
    workers    Linux     1.23.8               standard_l8s_v2  3        250        Succeeded            User
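
    Optionally, you can also check which Kubernetes versions a given node pool can be upgraded to. This is a sketch using the agentpool node pool from the example output above:

    root@rok-tools:~# az aks nodepool get-upgrades \
    >    --resource-group ${AZ_RESOURCE_GROUP?} \
    >    --cluster-name ${AKS_CLUSTER?} \
    >    --nodepool-name agentpool \
    >    -o table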
  4. Specify the name of the system node pool to upgrade:

    root@rok-tools:~# export SYSTEM_NODE_POOL_NAME=<NODE_POOL>

    Replace <NODE_POOL> with the name of the system node pool running the old Kubernetes version. For example:

    root@rok-tools:~# export SYSTEM_NODE_POOL_NAME=agentpool
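
    If you are not sure which node pool is the system one, you can optionally query it by its mode. This is a minimal sketch that filters on the Mode field shown in the previous step:

    root@rok-tools:~# az aks nodepool list \
    >    --resource-group ${AZ_RESOURCE_GROUP?} \
    >    --cluster-name ${AKS_CLUSTER?} \
    >    --query "[?mode=='System'].name" -o tsv
    agentpool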
  5. Upgrade the system node pool to the new Kubernetes version:

    root@rok-tools:~# az aks nodepool upgrade \
    >    --cluster-name ${AKS_CLUSTER?} \
    >    --name ${SYSTEM_NODE_POOL_NAME?} \
    >    --resource-group ${AZ_RESOURCE_GROUP?} \
    >    --kubernetes-version ${CLUSTER_VERSION?}
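
    The upgrade can take several minutes. Optionally, you can track its progress by checking the node pool's provisioning state, which transitions back to Succeeded once the upgrade completes. A minimal sketch:

    root@rok-tools:~# az aks nodepool show \
    >    --cluster-name ${AKS_CLUSTER?} \
    >    --name ${SYSTEM_NODE_POOL_NAME?} \
    >    --resource-group ${AZ_RESOURCE_GROUP?} \
    >    --query provisioningState -o tsv
    Succeeded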
  6. Specify the name of the user node pool to upgrade:

    root@rok-tools:~# export USER_NODE_POOL_NAME=<NODE_POOL>

    Replace <NODE_POOL> with the name of the user node pool running the old Kubernetes version. For example:

    root@rok-tools:~# export USER_NODE_POOL_NAME=workers
  7. Inspect the configuration of the old user node pool and note down the following values, as you will need them later (an optional sketch after this list shows how to capture them in environment variables):

    • the number of nodes:

      root@rok-tools:~# az aks nodepool show \
      >    --cluster-name ${AKS_CLUSTER} \
      >    --name ${USER_NODE_POOL_NAME?} \
      >    --resource-group ${AZ_RESOURCE_GROUP} \
      >    --query count
      3
    • the VM size:

      root@rok-tools:~# az aks nodepool show \
      >    --cluster-name ${AKS_CLUSTER} \
      >    --name ${USER_NODE_POOL_NAME?} \
      >    --resource-group ${AZ_RESOURCE_GROUP} \
      >    --query vmSize
      "standard_l8s_v2"
    • the zones in which to deploy the node pool:

      root@rok-tools:~# az aks nodepool show \
      >    --cluster-name ${AKS_CLUSTER} \
      >    --name ${USER_NODE_POOL_NAME?} \
      >    --resource-group ${AZ_RESOURCE_GROUP} \
      >    --query "availabilityZones[0]"
      "1"
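
    Optionally, you can capture these values in environment variables so that you can reuse them when creating the new node pool. This is a convenience sketch; the variable names NODE_POOL_COUNT, NODE_POOL_VM_SIZE, and NODE_POOL_ZONE are illustrative and are not used elsewhere in this guide:

    root@rok-tools:~# export NODE_POOL_COUNT=$(az aks nodepool show \
    >    --cluster-name ${AKS_CLUSTER?} \
    >    --name ${USER_NODE_POOL_NAME?} \
    >    --resource-group ${AZ_RESOURCE_GROUP?} \
    >    --query count -o tsv)
    root@rok-tools:~# export NODE_POOL_VM_SIZE=$(az aks nodepool show \
    >    --cluster-name ${AKS_CLUSTER?} \
    >    --name ${USER_NODE_POOL_NAME?} \
    >    --resource-group ${AZ_RESOURCE_GROUP?} \
    >    --query vmSize -o tsv)
    root@rok-tools:~# export NODE_POOL_ZONE=$(az aks nodepool show \
    >    --cluster-name ${AKS_CLUSTER?} \
    >    --name ${USER_NODE_POOL_NAME?} \
    >    --resource-group ${AZ_RESOURCE_GROUP?} \
    >    --query "availabilityZones[0]" -o tsv)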
  8. Follow the Create User Node Pool guide to create a new user node pool with a new name, the same Kubernetes minor version as the control plane, and the same VM size, number of nodes, and availability zones that you noted in the previous step. Then, come back to this guide and continue with this procedure.

  9. Find the nodes of the old user node pool:

    root@rok-tools:~# nodes=$(kubectl get nodes \
    >    -o jsonpath="{range .items[?(@.metadata.labels.agentpool==\"${USER_NODE_POOL_NAME?}\")]}{.metadata.name}{\"\n\"}{end}") \
    >    && echo "${nodes?}"
    aks-workers-42403446-vmss000000
    aks-workers-42403446-vmss000001
    aks-workers-42403446-vmss000002
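
    Before cordoning the old nodes, you can optionally verify that the nodes of the new user node pool have joined the cluster and are Ready. This is a sketch that uses the same agentpool node label; replace <NEW_NODE_POOL> with the name you gave to the new user node pool in step 8, and expect output along these lines:

    root@rok-tools:~# kubectl get nodes -l agentpool=<NEW_NODE_POOL>
    NAME                                  STATUS   ROLES   AGE     VERSION
    aks-new-workers-24949525-vmss000000   Ready    agent   9m45s   v1.24.9
    ...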
  10. Cordon old nodes, that is, disable scheduling on them:

    root@rok-tools:~# for node in $nodes; do kubectl cordon $node; done
    node/aks-workers-42403446-vmss000000 cordoned
    node/aks-workers-42403446-vmss000001 cordoned
    node/aks-workers-42403446-vmss000002 cordoned
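
    Optionally, you can verify that scheduling is now disabled on the old nodes; their STATUS should include SchedulingDisabled. A minimal sketch, using the agentpool node label:

    root@rok-tools:~# kubectl get nodes -l agentpool=${USER_NODE_POOL_NAME?}
    NAME                              STATUS                     ROLES   AGE   VERSION
    aks-workers-42403446-vmss000000   Ready,SchedulingDisabled   agent   34m   v1.23.8
    aks-workers-42403446-vmss000001   Ready,SchedulingDisabled   agent   34m   v1.23.8
    aks-workers-42403446-vmss000002   Ready,SchedulingDisabled   agent   34m   v1.23.8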
  11. Drain the old nodes one by one. Repeat steps a-d for each of the old nodes:

    1. Pick a node from the old user node pool:

      root@rok-tools:~# export node=<NODE>

      Replace <NODE> with the node you want to drain, for example:

      root@rok-tools:~# export node=aks-workers-42403446-vmss000000
    2. Drain the node:

      root@rok-tools:~# kubectl drain --ignore-daemonsets --delete-emptydir-data $node
      node/aks-workers-42403446-vmss000000 already cordoned
      evicting pod "rok-redis-0"
      evicting pod "ml-pipeline-scheduledworkflow-7bddd546b-4f4j5"
      ...

      Note

      This may take a while, since Rok is unpinning all volumes on this node and will evict rok-csi-guard Pods last.

      Warning

      Do not delete rok-csi-guard Pods manually, since this might cause data loss.

      Troubleshooting

      The command does not complete.

      Most likely the unpinning of a Rok PVC fails. Inspect the logs of the Rok CSI Controller to debug further.
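
      For example, the following is a minimal sketch for inspecting those logs. It assumes the Rok CSI Controller runs as the rok-csi-controller StatefulSet in the rok namespace; the Pod name is an assumption, so adjust it to match your installation:

      root@rok-tools:~# kubectl logs -n rok rok-csi-controller-0 --all-containers | tail -n 50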

    3. Wait for the drain command to finish successfully.

    4. Ensure that all Pods that got evicted have migrated correctly and are up and running again.

      1. Ensure that Rok has scaled up and is up and running:

        root@rok-tools:~# kubectl get rokcluster -n rok rok \
        >    -o jsonpath='{.status.health}{"\n"}'
        OK
      2. Ensure that the rest of the Pods are running. Verify that field STATUS is Running and field READY is N/N for all Pods:

        root@rok-tools:~# kubectl get pods -A
        NAMESPACE      NAME                      READY   STATUS    RESTARTS   AGE
        auth           dex-0                     2/2     Running   0          1h
        cert-manager   cert-manager-686bcc964d   1/1     Running   0          1h
        ...

        Note

        rok-csi-guard Pods are expected to be in Pending status.

    5. Go back to step a, and repeat the steps for the remaining old nodes.
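
    Once you have drained all the old nodes, you can optionally confirm that only DaemonSet-managed Pods remain on them. This is a minimal sketch that reuses the nodes variable from step 9 and a standard field selector:

    root@rok-tools:~# for node in $nodes; do kubectl get pods -A --field-selector spec.nodeName=$node; done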

  12. Delete the old user node pool:

    root@rok-tools:~# az aks nodepool delete \
    >    --cluster-name ${AKS_CLUSTER?} \
    >    --name ${USER_NODE_POOL_NAME?} \
    >    --resource-group ${AZ_RESOURCE_GROUP?}
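
    Optionally, you can verify that the old user node pool is gone by listing the node pools again, as in step 3; the output should now show only the system node pool and the new user node pool:

    root@rok-tools:~# az aks nodepool list -o table \
    >    --resource-group ${AZ_RESOURCE_GROUP?} \
    >    --cluster-name ${AKS_CLUSTER?}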

Verify

  1. Ensure that all nodes in the node pools are ready and run the new Kubernetes version. Verify that field STATUS is Ready and field VERSION shows the new Kubernetes version. Choose one of the following options, based on the Kubernetes version you upgraded to:

    • If you upgraded to Kubernetes 1.24:

      root@rok-tools:~# kubectl get nodes
      NAME                                  STATUS   ROLES   AGE     VERSION
      aks-agentpool-42403446-vmss000000     Ready    agent   34m     v1.24.9
      aks-agentpool-42403446-vmss000001     Ready    agent   34m     v1.24.9
      aks-new-workers-24949525-vmss000000   Ready    agent   9m45s   v1.24.9
      aks-new-workers-24949525-vmss000001   Ready    agent   9m45s   v1.24.9
      aks-new-workers-24949525-vmss000002   Ready    agent   9m45s   v1.24.9

    • If you upgraded to Kubernetes 1.23:

      root@rok-tools:~# kubectl get nodes
      NAME                                  STATUS   ROLES   AGE     VERSION
      aks-agentpool-42403446-vmss000000     Ready    agent   34m     v1.23.8
      aks-agentpool-42403446-vmss000001     Ready    agent   34m     v1.23.8
      aks-new-workers-24949525-vmss000000   Ready    agent   9m45s   v1.23.8
      aks-new-workers-24949525-vmss000001   Ready    agent   9m45s   v1.23.8
      aks-new-workers-24949525-vmss000002   Ready    agent   9m45s   v1.23.8
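
    Optionally, you can also list just the node names and kubelet versions, to compare them against the control plane version at a glance. A minimal sketch:

    root@rok-tools:~# kubectl get nodes \
    >    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\n"}{end}'
    aks-agentpool-42403446-vmss000000     v1.24.9
    ...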

Summary

You have successfully upgraded your node pools.

What’s Next

The next step is to configure the Rok Scheduler for the Kubernetes version of your AKS cluster.