Recover Pods From Out of Space Errors

Starting with release 1.5, Rok ships with the Rok Scheduler, a custom extension of the Kubernetes scheduler that supports capacity-aware scheduling.

The Rok Scheduler schedules Pods to nodes with sufficient free space to provision their new volumes. However, it does not yet support capacity-aware scheduling for unpinned volumes. As a consequence, when a user drains a node and Rok unpins its volumes, the scheduler may place the Pods of the drained node on nodes without enough storage available for their Rok volumes.
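
For example, draining a node with a command like the following unpins its Rok volumes and can trigger this situation. This is only an illustration; the node name is hypothetical:

    root@rok-tools:~# kubectl drain ip-192-168-173-13.eu-central-1.compute.internal \
    >    --ignore-daemonsets \
    >    --delete-emptydir-data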

Scheduling a Pod on a node with insufficient space for its unpinned volumes will result in the Pod getting stuck in the Init state, because there is not enough free space for Rok to recover the volumes.

This guide will walk you through recovering such Pods and migrating their volumes to new nodes.

Check Your Environment

  1. Check if a Pod using one or more Rok PVCs is stuck in the Init state:

    root@rok-tools:~# kubectl get pods -n personal-user
    NAME              READY   STATUS     RESTARTS   AGE
    test-notebook-0   0/2     Init:0/1   0          60s
  2. Specify the name of the Pod:

    root@rok-tools:~# export POD=<POD_NAME>

    Replace <POD_NAME> with the name of the Pod, for example:

    root@rok-tools:~# export POD=test-notebook-0
  3. Specify the Pod’s namespace:

    root@rok-tools:~# export NAMESPACE=<POD_NAMESPACE>

    Replace <POD_NAMESPACE> with the namespace of the Pod, for example:

    root@rok-tools:~# export NAMESPACE=personal-user
  4. List all Rok PVCs used by the Pod, along with their access mode:

    root@rok-tools:~# kubectl get pods -n ${NAMESPACE:?} ${POD:?} -ojson \
    >    | jq -r '.spec.volumes[]?.persistentVolumeClaim.claimName | values' \
    >    | xargs -r -n1 kubectl get pvc -n ${NAMESPACE:?} -ojson \
    >    | jq -r 'select(.spec.storageClassName=="rok") | .metadata.name,.spec.accessModes[0]' \
    >    | paste - -
    test-notebook-datavol-1-6grlb ReadWriteMany
    test-notebook-workspace-65qvh ReadWriteOnce
  5. Verify that the Pod is stuck because the node does not have enough free space to restore the Pod’s volumes. For each PVC listed in the output of the previous step:

    1. Specify the name of the PVC:

      root@rok-tools:~# export PVC=<PVC_NAME>

      Replace <PVC_NAME> with the name of the PVC, for example:

      root@rok-tools:~# export PVC=test-notebook-datavol-1-6grlb
    2. Describe the PVC. If the node doesn’t have enough free space to restore the volume, you will see events like the following:

      root@rok-tools:~# kubectl describe pvc -n ${NAMESPACE:?} ${PVC:?}
      Events:
        Type     Reason     Age   From     Message
        ----     ------     ----  ----     -------
        Warning  JobFailed  2m9s  rok-csi  Job Failed: Insufficient free space: 318901321728 bytes required, but only 258708865024 bytes available: Command `<ExtCommand [2mwNle-iCr8] `lvcreate -n roklvm-9e0e95a6-6d3d-4086-bf4f-b92746d2044c-data -L 318901321728B rokvg --wipesignatures n --config "devices { global_filter = [ 'r|^/dev/.*roklvm.*|' ] }"', status=FINISHED (ret: 5), PID=7360, shell=False>' failed. Error log: Volume group "rokvg" has insufficient free space (61681 extents): 76032 required.\n: Run `kubectl logs -n rok rok-csi-node-pmv4z -c csi-node' for more information
    3. Note the name of each PVC that has failed and its access mode, as you will use them later.
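
      As an optional shortcut, the following sketch scans all of the Pod's PVCs at once and reports the ones whose events mention insufficient free space. The grep pattern is an assumption based on the event message shown above:

      root@rok-tools:~# kubectl get pods -n ${NAMESPACE:?} ${POD:?} -ojson \
      >    | jq -r '.spec.volumes[]?.persistentVolumeClaim.claimName | values' \
      >    | while read pvc; do \
      >          kubectl describe pvc -n ${NAMESPACE:?} ${pvc:?} \
      >              | grep -q "Insufficient free space" \
      >              && echo "${pvc:?}: insufficient free space"; \
      >      done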

Procedure

  1. For each failed RWX (ReadWriteMany) PVC you noted earlier, do the following. If there are no affected RWX PVCs, skip this step.

    1. Specify the name of the RWX PVC:

      root@rok-tools:~# export RWX_PVC=<RWX_PVC_NAME>

      Replace <RWX_PVC_NAME> with the name of the RWX PVC, for example:

      root@rok-tools:~# export RWX_PVC=test-notebook-datavol-1-6grlb
    2. Get the name of the RWX PV:

      root@rok-tools:~# export RWX_PV=$(kubectl get pvc \
      >    -n ${NAMESPACE:?} ${RWX_PVC:?} -ojson \
      >    | jq -r '.spec.volumeName')
    3. Get the name of the RWO PV backing the RWX volume:

      root@rok-tools:~# export ACCESS_SERVER_PV=$(kubectl get pvc \
      >    -n rok vol-rok-access-${RWX_PV:?}-0 -ojson \
      >    | jq -r '.spec.volumeName')
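
      The RWX volume is served by an access-server Pod in the rok namespace. As an optional sanity check before cordoning, you can see where that Pod currently runs with a standard wide listing:

      root@rok-tools:~# kubectl get pod -n rok rok-access-${RWX_PV:?}-0 -o wide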
    4. Initialize an empty array to store the cordoned nodes:

      root@rok-tools:~# NODES=()
    5. Find the node where the volume lives:

      root@rok-tools:~# export NODE=$(kubectl get pv ${ACCESS_SERVER_PV:?} -ojson \
      >    | jq -r '.spec.nodeAffinity.required.nodeSelectorTerms[]?.matchExpressions[].values[]')
    6. Append the node to the array of cordoned nodes:

      root@rok-tools:~# NODES+=(${NODE:?})
    7. Cordon the node:

      root@rok-tools:~# kubectl cordon ${NODE:?}
      node/ip-192-168-173-13.eu-central-1.compute.internal cordoned
    8. Delete the access-server Pod:

      root@rok-tools:~# kubectl delete pods -n rok rok-access-${RWX_PV:?}-0
      pod "rok-access-pvc-4d7b0f3d-b9da-49af-a089-a468e912d531-0" deleted
    9. Wait until Rok moves the volume to a new node. Describe the PVC and wait until you see the following events:

      root@rok-tools:~# watch "kubectl describe pvc -n ${NAMESPACE:?} ${RWX_PVC:?} | tail"
      Every 2.0s: kubectl describe pvc -n personal-user test-notebook-datavol-1-6grlb | tail    rok-tools: Thu Jun 23 11:07:17 2022

      ...
      Normal  INFO  47s  rok-csi  Successfully pinned PVC `vol-rok-access-pvc-4d7b0f3d-b9da-49af-a089-a468e912d531-0' (PV `pvc-1bf297c4-e8b4-4ef6-890f-bbed465dd565') to node `ip-192-168-144-56.eu-central-1.compute.internal'
      ...
      Normal  INFO  45s  rok-csi  Successfully recovered volume `pvc-1bf297c4-e8b4-4ef6-890f-bbed465dd565'
      ...

      Troubleshooting

      The new node has insufficient free space.

      If the new node Kubernetes picks for the volume doesn’t have enough free space either, you will see the following events in the output of kubectl describe:

      Events:
        Type     Reason     Age   From     Message
        ----     ------     ----  ----     -------
        Warning  JobFailed  2m9s  rok-csi  Job Failed: Insufficient free space: 318901321728 bytes required, but only 258708865024 bytes available: Command `<ExtCommand [2mwNle-iCr8] `lvcreate -n roklvm-9e0e95a6-6d3d-4086-bf4f-b92746d2044c-data -L 318901321728B rokvg --wipesignatures n --config "devices { global_filter = [ 'r|^/dev/.*roklvm.*|' ] }"', status=FINISHED (ret: 5), PID=7360, shell=False>' failed. Error log: Volume group "rokvg" has insufficient free space (61681 extents): 76032 required.\n: Run `kubectl logs -n rok rok-csi-node-pmv4z -c csi-node' for more information

      In this case, go back to step 1e and repeat the steps for the new node.

      Note

      You may need to repeat these steps more than once, until the new node has sufficient free space for the Pod’s volume.

      No more nodes available in the cluster.

      If you have cordoned all nodes in the cluster, which means that none of the existing nodes has sufficient free space available for the Pod's volume, you will see the following events when describing the access-server Pod:

      root@rok-tools:~# kubectl describe pod -n rok rok-access-${RWX_PV:?}-0
      Events:
        Type     Reason             Age  From                Message
        ----     ------             ---- ----                -------
        Warning  FailedScheduling   35s  rok-scheduler       0/2 nodes are available: 2 node(s) were unschedulable.
        Warning  FailedScheduling   35s  rok-scheduler       0/2 nodes are available: 2 node(s) were unschedulable.
        Normal   NotTriggerScaleUp  30s  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 max node group size reached

      In this case, you have to manually scale up your cluster and add one or more nodes with enough storage capacity to accommodate the Pod's volume, or increase the maximum size of the appropriate node group so the Cluster Autoscaler can scale up the cluster automatically. For EKS, check out the Scale Out EKS Cluster guide.
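
      For example, on EKS you could bump the node group size with eksctl. This is only a sketch; the cluster and nodegroup names below are hypothetical, and the right sizes depend on your capacity needs:

      root@rok-tools:~# eksctl scale nodegroup \
      >    --cluster arrikto-cluster \
      >    --name general-workers \
      >    --nodes 3 \
      >    --nodes-max 4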

    10. Uncordon the nodes:

      root@rok-tools:~# kubectl uncordon ${NODES[@]:?}
      node/ip-192-168-173-13.eu-central-1.compute.internal uncordoned
  2. For affected RWO (ReadWriteOnce) PVCs, do the following. If there are no affected RWO PVCs, skip this step.

    1. List all RWO Rok PVCs used by the Pod:

      root@rok-tools:~# kubectl get pods -n ${NAMESPACE:?} ${POD:?} -ojson \
      >    | jq -r '.spec.volumes[]?.persistentVolumeClaim.claimName | values' \
      >    | xargs -r -n1 kubectl get pvc -n ${NAMESPACE:?} -ojson \
      >    | jq -r 'select(.spec.storageClassName=="rok" and .spec.accessModes[0]=="ReadWriteOnce") | .metadata.name'
      test-notebook-workspace-65qvh
    2. Initialize an empty array to store the cordoned nodes:

      root@rok-tools:~# NODES=()
    3. Pick an affected RWO PVC and specify its name:

      root@rok-tools:~# export RWO_PVC=<RWO_PVC_NAME>

      Since all RWO PVCs live on the same node, it doesn't matter which one you pick; the optional check after this step confirms they are colocated. Replace <RWO_PVC_NAME> with the name of the RWO PVC, for example:

      root@rok-tools:~# export RWO_PVC=test-notebook-workspace-65qvh
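
      If you want to verify that the Pod's RWO volumes are indeed colocated, the following sketch prints the node of each RWO Rok PV the Pod uses; all lines should show the same node:

      root@rok-tools:~# kubectl get pods -n ${NAMESPACE:?} ${POD:?} -ojson \
      >    | jq -r '.spec.volumes[]?.persistentVolumeClaim.claimName | values' \
      >    | xargs -r -n1 kubectl get pvc -n ${NAMESPACE:?} -ojson \
      >    | jq -r 'select(.spec.storageClassName=="rok" and .spec.accessModes[0]=="ReadWriteOnce") | .spec.volumeName' \
      >    | xargs -r -n1 kubectl get pv -ojson \
      >    | jq -r '.spec.nodeAffinity.required.nodeSelectorTerms[]?.matchExpressions[].values[]'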
    4. Get the name of the RWO PV:

      root@rok-tools:~# export RWO_PV=$(kubectl get pvc \
      >    -n ${NAMESPACE:?} ${RWO_PVC:?} -ojson \
      >    | jq -r '.spec.volumeName')
    5. Find the node where the volume lives:

      root@rok-tools:~# export NODE=$(kubectl get pv ${RWO_PV:?} -ojson \
      >    | jq -r '.spec.nodeAffinity.required.nodeSelectorTerms[]?.matchExpressions[].values[]')
    6. Append the node to the array of cordoned nodes:

      root@rok-tools:~# NODES+=(${NODE:?})
    7. Cordon the node:

      root@rok-tools:~# kubectl cordon ${NODE:?}
      node/ip-192-168-151-238.eu-central-1.compute.internal cordoned
    8. Delete all Pods using the same RWO PVCs as the affected Pod. This will allow Rok to move the volumes to a new node:

      root@rok-tools:~# kubectl get pods -n ${NAMESPACE:?} ${POD:?} -ojson \
      >    | jq -r '.spec.volumes[]?.persistentVolumeClaim.claimName | values' \
      >    | xargs -r -n1 kubectl get pvc -n ${NAMESPACE:?} -ojson \
      >    | jq -r 'select(.spec.storageClassName=="rok") | select(.spec.accessModes[]?=="ReadWriteOnce") | .metadata.name' \
      >    | while read pvc; do kubectl get pods -n ${NAMESPACE:?} -ojson \
      >    | jq -r --arg PVC "${pvc:?}" \
      >    '.items[] | select(.spec.volumes[]?.persistentVolumeClaim.claimName==$PVC) | .metadata.name' \
      >    | xargs -r -n1 kubectl delete pods -n ${NAMESPACE:?}; done
      pod "pvc-viewer-test-notebook-workspace-65qvh" deleted
      pod "test-notebook-0" deleted
    9. After deleting the Pod, its name might change. This can happen, for example, if the Pod is managed by a Deployment. In such a case, specify the name of the Pod again; the optional lookup after the command below can help you find it. Replace <POD_NAME> with the new name of the Pod:

      root@rok-tools:~# export POD=<POD_NAME>
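
      If you are unsure of the new name, the following lookup, modeled on the deletion pipeline above, lists the Pods that currently reference the RWO PVC you picked earlier. It may return nothing until the controller recreates the Pod:

      root@rok-tools:~# kubectl get pods -n ${NAMESPACE:?} -ojson \
      >    | jq -r --arg PVC "${RWO_PVC:?}" \
      >    '.items[] | select(.spec.volumes[]?.persistentVolumeClaim.claimName==$PVC) | .metadata.name'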
    10. For each RWO PVC listed in the output of step 2a:

      1. Specify the name of the RWO PVC:

        root@rok-tools:~# export RWO_PVC=<RWO_PVC_NAME>

        Replace <RWO_PVC_NAME> with the name of the RWO PVC, for example:

        root@rok-tools:~# export RWO_PVC=test-notebook-workspace-65qvh
      2. Wait until Rok moves the volume to a new node. Describe the PVC and wait until you see the following events:

        root@rok-tools:~# watch "kubectl describe pvc -n ${NAMESPACE:?} ${RWO_PVC:?} | tail"
        Every 2.0s: kubectl describe pvc -n personal-user test-notebook-workspace-65qvh | tail    rok-tools: Thu Jun 23 11:07:17 2022

        ...
        Normal  INFO  47s  rok-csi  Successfully pinned PVC `test-notebook-workspace-65qvh' (PV `pvc-9bf4843e-1630-403f-a418-cd279abc9813') to node `ip-192-168-144-56.eu-central-1.compute.internal'
        ...
        Normal  INFO  45s  rok-csi  Successfully recovered volume `pvc-9bf4843e-1630-403f-a418-cd279abc9813'
        ...

        Troubleshooting

        The new node has insufficient free space.

        If the new node Kubernetes picks for the volume doesn't have enough free space either, you will see the following events in the output of kubectl describe:

        Events:
          Type     Reason     Age   From     Message
          ----     ------     ----  ----     -------
          Warning  JobFailed  2m9s  rok-csi  Job Failed: Insufficient free space: 318901321728 bytes required, but only 258708865024 bytes available: Command `<ExtCommand [2mwNle-iCr8] `lvcreate -n roklvm-9e0e95a6-6d3d-4086-bf4f-b92746d2044c-data -L 318901321728B rokvg --wipesignatures n --config "devices { global_filter = [ 'r|^/dev/.*roklvm.*|' ] }"', status=FINISHED (ret: 5), PID=7360, shell=False>' failed. Error log: Volume group "rokvg" has insufficient free space (61681 extents): 76032 required.\n: Run `kubectl logs -n rok rok-csi-node-pmv4z -c csi-node' for more information

        In this case, go back to step 2c and repeat the steps for the new node.

        Note

        You may need to repeat these steps more than once, until the new node has sufficient free space for the Pod’s volume(s).

        No more nodes available in the cluster.

        If you have cordoned all nodes in the cluster, which means that none of the existing nodes has sufficient free space available for the Pod's volume(s), you will see the following events when describing the Pod:

        root@rok-tools:~# kubectl describe pod -n ${NAMESPACE:?} ${POD:?}
        Events:
          Type     Reason             Age  From                Message
          ----     ------             ---- ----                -------
          Warning  FailedScheduling   35s  rok-scheduler       0/2 nodes are available: 2 node(s) were unschedulable.
          Warning  FailedScheduling   35s  rok-scheduler       0/2 nodes are available: 2 node(s) were unschedulable.
          Normal   NotTriggerScaleUp  30s  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 max node group size reached

        In this case, you have to manually scale up your cluster and add one or more nodes with enough storage capacity to accommodate the Pod's volume(s), or increase the maximum size of the appropriate node group so the Cluster Autoscaler can scale up the cluster automatically, as shown in the eksctl sketch in step 1. For EKS, check out the Scale Out EKS Cluster guide.

    11. Uncordon the nodes:

      root@rok-tools:~# kubectl uncordon ${NODES[@]:?}
      node/ip-192-168-151-238.eu-central-1.compute.internal uncordoned

Verify

  1. Wait until the Pod is up and running:

    root@rok-tools:~# watch kubectl get pod -n ${NAMESPACE:?} ${POD:?}
    Every 2.0s: kubectl get pod -n personal-user test-notebook-0    rok-tools: Thu Jun 23 11:05:52 2022

    NAME              READY   STATUS    RESTARTS   AGE
    test-notebook-0   2/2     Running   0          8m31s

Summary

You have successfully recovered the stuck Pod and migrated its volumes to nodes with enough space to accommodate them.

What’s Next

Check out the rest of the maintenance operations that you can perform on your cluster.