Restrict Auto-Upgrades

This section will guide you through setting up a maintenance exclusion for your GKE cluster. This will prevent the following auto-upgrades:

  • minor and patch auto-upgrades on nodes
  • minor auto-upgrades on the control plane

The control plane will still be able to receive patch auto-upgrades.

Note

GKE minor auto-upgrades have an unpredictable cadence. This is why we have to disable them and perform them manually instead. We provide our own instructions for this.

Also, we cannot allow GKE to perform patch auto-upgrades on the nodes either. The reason is that GKE does not respect PodDisruptionBudgets indefinitely when upgrading a node. This could result in data loss, since Rok relies on PodDisruptionBudgets to ensure that a drain operation will not complete until Rok has migrated all PVCs from the node.

Procedure

  1. Specify the name of the maintenance exclusion:

    root@rok-tools:~# export GKE_EXCLUSION=${GKE_CLUSTER?}-${CLUSTER_VERSION%*.*.*?}
  2. Specify a start date and time for the exclusion:

    root@rok-tools:~# export GKE_EXCLUSION_START=$(date --iso-8601=seconds)
  3. Specify an end date and time for the exclusion. Choose one of the following options, based on your Kubernetes version:

    root@rok-tools:~# export GKE_EXCLUSION_END=$(date \
    >     -d "${GKE_EXCLUSION_START?}+180 days" \
    >     --iso-8601=seconds)
    root@rok-tools:~# export GKE_EXCLUSION_END="2023-01-31T23:59:59-00:00"

    Note

    Maintenance exclusions have a maximum duration of 180 days and they cannot exceed the end of life date of the cluster’s Kubernetes minor version. The end of life date of each minor version is determined by the GKE release schedule.

  4. Specify the scope of maintenance to exclude:

    root@rok-tools:~# export GKE_EXCLUSION_SCOPE=no_minor_or_node_upgrades

    Important

    Do not use the no_minor_upgrades scope since node patch upgrades might cause data loss.

  5. Add the maintenance exclusion to your GKE cluster:

    root@rok-tools:~# gcloud container clusters update ${GKE_CLUSTER?} \
    >     --add-maintenance-exclusion-name ${GKE_EXCLUSION?} \
    >     --add-maintenance-exclusion-start ${GKE_EXCLUSION_START?} \
    >     --add-maintenance-exclusion-end ${GKE_EXCLUSION_END?} \
    >     --add-maintenance-exclusion-scope ${GKE_EXCLUSION_SCOPE?}
    Updating arrikto-cluster...done.
    Updated [https://container.googleapis.com/v1/projects/myproject/zones/us-east1-b/clusters/arrikto-cluster].
    To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-east1-b/arrikto-cluster?project=myproject

    Troubleshooting

    End date exceeds minor version’s end of life

    If the above command fails with an error message similar to the following:

    ERROR: (gcloud.container.clusters.update) ResponseError: code=400, message=MaintenancePolicy.maintenanceExclusions["arrikto-cluster-1.20"].endTime needs to be before minor version 1.20 end of life: (2022-8). See release schedule at https://cloud.google.com/kubernetes-engine/docs/release-schedule.

    it means that the end date for the maintenance exclusion exceeds the end of life date of your cluster’s Kubernetes minor version.

    To proceed, run the following steps:

    1. Inspect the GKE release schedule for your cluster’s minor Kubernetes version in the Stable release channel and find the end of life date.

    2. Provide an end date for the exclusion that does not exceed the end of life date you found in the previous step:

      root@rok-tools:~# DATE=<DATE>

      Replace <DATE> with your desired end date. For example:

      root@rok-tools:~# DATE="2022-04-11"
    3. Specify the end date and time for the exclusion in ISO 8601 format, based on the date you provided in the previous step:

      root@rok-tools:~# export GKE_EXCLUSION_END="${DATE?}T23:59:59-00:00"
    4. Go back to step 5 and try again.
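
The two limits described in step 3 (at most 180 days, and no later than the end of life date of the minor version) can be combined into a single calculation. The following is a minimal sketch, not part of the official procedure: `exclusion_end` is a hypothetical helper name, it assumes GNU date, and it assumes you have already looked up the end of life date in the GKE release schedule yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the earlier of (start + 180 days) and the
# end of the given end-of-life day, as an ISO 8601 timestamp in UTC.
# Assumes GNU date; the end-of-life date must be supplied by the caller.
exclusion_end() {
    local start="$1" eol_date="$2"
    local max_epoch eol_epoch end_epoch

    # Upper bound 1: 180 days (in seconds) after the start of the exclusion.
    max_epoch=$(( $(date -u -d "$start" +%s) + 180 * 86400 ))
    # Upper bound 2: the last second of the end-of-life day, in UTC.
    eol_epoch=$(date -u -d "${eol_date}T23:59:59Z" +%s)

    # Pick whichever bound comes first.
    end_epoch=$(( max_epoch < eol_epoch ? max_epoch : eol_epoch ))
    date -u -d "@${end_epoch}" +%Y-%m-%dT%H:%M:%SZ
}
```

For example, `exclusion_end 2022-01-01T00:00:00Z 2022-08-31` prints `2022-06-30T00:00:00Z`, because the 180-day limit is reached first, while an end of life date of `2022-04-11` yields `2022-04-11T23:59:59Z`. The result could then be exported as `GKE_EXCLUSION_END` before running step 5.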

Verify

  1. Ensure that the maintenance exclusion exists:

    root@rok-tools:~# gcloud container clusters describe ${GKE_CLUSTER?} \
    >     --format=json \
    >     | jq -e \
    >         ".maintenancePolicy.window.maintenanceExclusions[\"${GKE_EXCLUSION?}\"]" >/dev/null \
    >     && echo OK || echo FAIL
    OK
  2. Verify that the start date and time of the maintenance exclusion is prior to the present date and time, that is, the maintenance exclusion is active:

    root@rok-tools:~# gcloud container clusters describe ${GKE_CLUSTER?} \
    >     --format=json \
    >     | jq -e \
    >         ".maintenancePolicy.window.maintenanceExclusions[\"${GKE_EXCLUSION?}\"].startTime \
    >             | fromdateiso8601 | select(. <= ($(date +%s)))" \
    >     >/dev/null && echo OK || echo FAIL
    OK

    Troubleshooting

    The output of the command is FAIL

    If the output of the above command is FAIL, it means that the maintenance exclusion has not started yet.

    To proceed, run the following steps:

    1. Delete the old maintenance exclusion:

      root@rok-tools:~# gcloud container clusters update ${GKE_CLUSTER?} \
      >     --remove-maintenance-exclusion ${GKE_EXCLUSION?}
      Updating arrikto-cluster...done.
      Updated [https://container.googleapis.com/v1/projects/myproject/zones/us-east1-b/clusters/arrikto-cluster].
      To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-east1-b/arrikto-cluster?project=myproject
    2. Create a new maintenance exclusion, by re-running this guide from the beginning.

  3. Verify that the maintenance exclusion will not expire within the next 60 days:

    root@rok-tools:~# gcloud container clusters describe ${GKE_CLUSTER?} \
    >     --format=json \
    >     | jq -e \
    >         ".maintenancePolicy.window.maintenanceExclusions[\"${GKE_EXCLUSION?}\"].endTime \
    >             | fromdateiso8601 | select(. >= ($(date -d "$(date --iso-8601=seconds)+ 60 days" +%s)))" \
    >     >/dev/null && echo OK || echo FAIL
    OK

    Troubleshooting

    The output of the command is FAIL

    If the output of the above command is FAIL, it means that the maintenance exclusion expires in less than 60 days.

    To proceed, check the release schedule for your cluster’s version. If the end of life date is less than 60 days away, then your cluster’s version has entered the maintenance period and is near its end of life. In this case, proceed with the rest of the installation, and then upgrade to a newer version immediately, following our upgrade instructions. Otherwise, run the following steps to extend the exclusion period:

    1. Save the name of the maintenance exclusion:

      root@rok-tools:~# export GKE_EXCLUSION_OLD=${GKE_EXCLUSION?}
    2. Create a new maintenance exclusion with a new name by following the Procedure.

    3. Delete the old maintenance exclusion:

      root@rok-tools:~# gcloud container clusters update ${GKE_CLUSTER?} \
      >     --remove-maintenance-exclusion ${GKE_EXCLUSION_OLD?}
      Updating arrikto-cluster...done.
      Updated [https://container.googleapis.com/v1/projects/myproject/zones/us-east1-b/clusters/arrikto-cluster].
      To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-east1-b/arrikto-cluster?project=myproject
    4. Rerun the Verify section for the new maintenance exclusion.

  4. Ensure that the scope of the maintenance exclusion disables minor and node patch upgrades:

    root@rok-tools:~# gcloud container clusters describe ${GKE_CLUSTER?} \
    >     --format="value(maintenancePolicy.window.maintenanceExclusions[\"${GKE_EXCLUSION?}\"].maintenanceExclusionOptions.scope)"
    NO_MINOR_OR_NODE_UPGRADES
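
The time comparisons in steps 2 and 3 can also be expressed without jq, which may make the logic easier to follow. The sketch below is illustrative only: `check_window` is a hypothetical helper, it assumes GNU date, and it expects the start and end timestamps to be passed in by hand (for instance, copied from the `startTime` and `endTime` fields that the commands above read). It approximates "60 days" as 60 × 86400 seconds, which may differ slightly from the calendar arithmetic of `date -d "+ 60 days"` across daylight saving changes.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring Verify steps 2 and 3 using GNU date only.
# Usage: check_window START END [NOW]   (all ISO 8601 timestamps)
# NOW defaults to the current time; it is a parameter only to ease testing.
check_window() {
    local start_epoch end_epoch now_epoch
    start_epoch=$(date -d "$1" +%s) || return 2
    end_epoch=$(date -d "$2" +%s) || return 2
    now_epoch=$(date -d "${3:-now}" +%s) || return 2

    # Step 2: the exclusion must already have started.
    if (( start_epoch > now_epoch )); then
        echo "FAIL: not started"
        return 1
    fi

    # Step 3: the exclusion must not expire within the next 60 days.
    if (( end_epoch < now_epoch + 60 * 86400 )); then
        echo "FAIL: expires within 60 days"
        return 1
    fi

    echo OK
}
```

Calling `check_window "$START" "$END"` with the two timestamps of your exclusion prints `OK` only when both conditions hold; otherwise the message indicates which Troubleshooting block above applies.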

Summary

You have successfully set up a maintenance exclusion for your GKE cluster.

What’s Next

The next step is to get access to your GKE cluster.