Create GKE Cluster

This section will guide you through creating a GKE cluster using the Google Cloud SDK. After completing this guide, you will have a GKE cluster with:

  • One of the Kubernetes versions that EKF supports, that is, 1.21, 1.22, or 1.23.
  • Worker nodes with local NVMe SSDs.

Procedure

To create the GKE cluster, follow the steps below:

  1. Switch to your management environment and specify the cluster name:

    root@rok-tools:~# export GKE_CLUSTER=arrikto-cluster
  2. Specify the Kubernetes version. Choose one of the supported Kubernetes versions below:

    root@rok-tools:~# export CLUSTER_VERSION=1.23.11-gke.300
    root@rok-tools:~# export CLUSTER_VERSION=1.22.15-gke.100
    root@rok-tools:~# export CLUSTER_VERSION=1.21.14-gke.7100
  3. Specify the name of the default node pool:

    root@rok-tools:~# export NODE_POOL_NAME=default-workers
  4. Specify the machine type:

    root@rok-tools:~# export MACHINE_TYPE=n1-standard-8
  5. Specify the number of nodes to create:

    root@rok-tools:~# export NUM_NODES=3
  6. Specify the number of local NVMe SSDs to add:

    root@rok-tools:~# export NUM_SSD=3

    Note

    Rok will automatically find and use all local SSDs, which are expected to be unformatted. Each local NVMe SSD is 375 GB in size. You can attach a maximum of 24 local SSD partitions for 9 TB per instance.
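
    For quota planning, the total local SSD capacity the node pool will request is NUM_SSD × NUM_NODES × 375 GB. A quick sanity check, using the example values set above:

    root@rok-tools:~# echo "$(( ${NUM_SSD?} * ${NUM_NODES?} * 375 )) GB"
    3375 GB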

  7. Create the cluster:

    root@rok-tools:~# gcloud alpha container clusters create ${GKE_CLUSTER?} \
    >     --account ${CLUSTER_ADMIN_ACCOUNT?} \
    >     --cluster-version ${CLUSTER_VERSION?} \
    >     --release-channel stable \
    >     --no-enable-basic-auth \
    >     --node-pool-name ${NODE_POOL_NAME?} \
    >     --machine-type ${MACHINE_TYPE?} \
    >     --image-type UBUNTU_CONTAINERD \
    >     --disk-type pd-ssd \
    >     --disk-size 200 \
    >     --local-ssd-volumes count=${NUM_SSD?},type=nvme,format=block \
    >     --metadata disable-legacy-endpoints=True \
    >     --workload-pool=${PROJECT_ID?}.svc.id.goog \
    >     --scopes gke-default \
    >     --num-nodes ${NUM_NODES?} \
    >     --logging=SYSTEM,WORKLOAD \
    >     --monitoring=SYSTEM \
    >     --enable-ip-alias \
    >     --default-max-pods-per-node 110 \
    >     --no-enable-master-authorized-networks \
    >     --no-enable-intra-node-visibility \
    >     --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
    >     --max-surge-upgrade 1 \
    >     --max-unavailable-upgrade 0 \
    >     --enable-autoupgrade \
    >     --enable-autorepair \
    >     --enable-shielded-nodes
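
    The command above assumes that CLUSTER_ADMIN_ACCOUNT and PROJECT_ID (as well as REGION, used in the troubleshooting below) are already set from earlier sections. If they are not, here is a minimal sketch that populates them from your active gcloud configuration, assuming it already points at the desired account, project, and region:

    root@rok-tools:~# export CLUSTER_ADMIN_ACCOUNT=$(gcloud config get-value account)
    root@rok-tools:~# export PROJECT_ID=$(gcloud config get-value project)
    root@rok-tools:~# export REGION=$(gcloud config get-value compute/region)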

    Troubleshooting

    The command fails with ‘Insufficient regional quota to satisfy request: resource “SSD_TOTAL_GB”’

    Ensure that your region has enough quota for local SSDs. To inspect the current usage and limit, run:

    root@rok-tools:~# gcloud compute regions describe ${REGION?} --format json | \
    >     jq -r '.quotas[] | select(.metric=="SSD_TOTAL_GB") | "\(.usage)/\(.limit)"'

    Either delete some resources or choose a different region/zone.
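
    If you need to find a region with enough free SSD quota, here is a rough sketch that loops over all regions and prints the SSD_TOTAL_GB usage and limit for each. It issues one API call per region, so it may take a while:

    root@rok-tools:~# for r in $(gcloud compute regions list --format="value(name)"); do \
    >     echo -n "${r}: "; \
    >     gcloud compute regions describe ${r} --format json | \
    >     jq -r '.quotas[] | select(.metric=="SSD_TOTAL_GB") | "\(.usage)/\(.limit)"'; \
    >     done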

    The command fails with ‘Master version is unsupported’

    If the above command fails with an error message similar to the following:

    ERROR: (gcloud.alpha.container.clusters.create) ResponseError: code=400, message=Master version "1.23.11-gke.300" is unsupported.

    it means that the Kubernetes version you have specified is not supported in your selected zone.

    To proceed, do the following:

    1. Check the Kubernetes versions that are available in your selected zone:

      root@rok-tools:~# gcloud container get-server-config \
      >     --flatten="channels" \
      >     --filter="channels.channel=STABLE" \
      >     --format="yaml(channels.channel,channels.validVersions)"
      Fetching server config for us-east1-b
      ---
      channels:
        channel: STABLE
        validVersions:
        - 1.23.12-gke.2300
        - 1.22.14-gke.4300
    2. Select one of the available Kubernetes versions you found in the previous step, or use the optional shortcut sketched after this list:

      root@rok-tools:~# export CLUSTER_VERSION=<CLUSTER_VERSION>

      Replace <CLUSTER_VERSION> with your selected Kubernetes version. For example:

      root@rok-tools:~# export CLUSTER_VERSION=1.23.12-gke.2300
      root@rok-tools:~# export CLUSTER_VERSION=1.22.14-gke.4300
      root@rok-tools:~# export CLUSTER_VERSION=1.21.15-gke.6300
    3. Go back to step 7 and create the cluster.
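
    As an optional shortcut for steps 1 and 2 above, the following sketch sets CLUSTER_VERSION to the first valid version in the STABLE channel, assuming that gcloud lists the newest version first, as in the example output above, and that the validVersions[0] projection index is supported by your gcloud version:

      root@rok-tools:~# export CLUSTER_VERSION=$(gcloud container get-server-config \
      >     --flatten="channels" \
      >     --filter="channels.channel=STABLE" \
      >     --format="value(channels.validVersions[0])")

    With the example server config shown in step 1, this would select 1.23.12-gke.2300. Then continue from step 3.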

    Note

    This will create a zonal cluster with 3 nodes in the cluster’s primary zone. It will use the default network and subnet in the zone.

Verify

  1. Ensure that the GKE cluster exists and its status is RUNNING:

    root@rok-tools:~# gcloud container clusters describe ${GKE_CLUSTER?} \
    >     --format="value(status)"
    RUNNING

    Troubleshooting

    The status is RECONCILING

    If the status of the GKE cluster is RECONCILING, it means that some work is actively being done on the cluster.

    One possibility is that there is an auto-upgrade in progress. Check for running control plane and node upgrade operations:

    root@rok-tools:~# gcloud container operations list \
    >     --filter="TYPE:(UPGRADE_MASTER OR UPGRADE_NODES) AND \
    >     TARGET:(${GKE_CLUSTER?} OR \
    >     ${NODE_POOL_NAME?}) AND STATUS:RUNNING"

    You can also check for other running operations:

    root@rok-tools:~# gcloud container operations list \
    >     --filter="STATUS:RUNNING"

    In any case, wait until the running operations complete and re-run this verification step.
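
    Optionally, instead of polling, you can block until a specific operation finishes. Replace <OPERATION_ID> with the operation name shown in the list output; this is a minimal sketch, so add --zone or --region if gcloud cannot infer the operation's location from your configuration:

    root@rok-tools:~# gcloud container operations wait <OPERATION_ID>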

  2. Ensure that the GKE cluster is enrolled in the STABLE release channel:

    root@rok-tools:~# gcloud container clusters describe ${GKE_CLUSTER?} \
    >     --format="value(releaseChannel.channel)"
    STABLE
  3. Obtain the Kubernetes version of the control plane:

    root@rok-tools:~# VERSION=$(gcloud container clusters describe ${GKE_CLUSTER?} \
    >     --format="value(currentMasterVersion)")
  4. Ensure that the control plane runs the desired Kubernetes minor version:

    root@rok-tools:~# [[ ${VERSION%*.*.*?} == ${CLUSTER_VERSION%*.*.*?} ]] \
    >     && echo OK || echo FAIL
    OK
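
    For reference, the check above compares only the major.minor part of each version: the ${VAR%*.*.*?} expansion strips the patch number and GKE suffix. A quick illustration with a hypothetical version string:

    root@rok-tools:~# EXAMPLE_VERSION=1.23.11-gke.300
    root@rok-tools:~# echo ${EXAMPLE_VERSION%*.*.*?}
    1.23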
  5. Get the list of the node pools:

    root@rok-tools:~# gcloud container node-pools list --cluster=${GKE_CLUSTER?}
    NAME             MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
    default-workers  n1-standard-8  200           1.23.11-gke.300
  6. Ensure the default node pool exists and its status is RUNNING:

    root@rok-tools:~# gcloud container node-pools describe ${NODE_POOL_NAME?} \
    >     --cluster=${GKE_CLUSTER?} \
    >     --format="value(status)"
    RUNNING
  7. Obtain the Kubernetes version of your default node pool:

    root@rok-tools:~# VERSION=$(gcloud container clusters describe ${GKE_CLUSTER?} \
    >     --format=json \
    >     | jq -r ".nodePools[] | select(.name == \"${NODE_POOL_NAME?}\") | .version")
  8. Ensure that the default node pool runs the desired Kubernetes version:

    root@rok-tools:~# [[ ${VERSION?} == ${CLUSTER_VERSION?} ]] \
    >     && echo OK || echo FAIL
    OK
  9. Verify that all instances of your node pool have the necessary storage attached:

    1. Find the instance group that corresponds to the default-workers node pool:

      root@rok-tools:~# export INSTANCE_GROUP=$(gcloud container node-pools describe ${NODE_POOL_NAME?} \
      >     --cluster=${GKE_CLUSTER?} \
      >     --format="value(instanceGroupUrls)")
    2. Find the template of the instance group:

      root@rok-tools:~# export TEMPLATE=$(gcloud compute instance-groups managed describe ${INSTANCE_GROUP?} \
      >     --format="value(instanceTemplate)")
    3. Inspect the template and ensure that the kube-env metadata key has the expected NODE_LOCAL_SSDS_EXT value:

      root@rok-tools:~# gcloud compute instance-templates describe ${TEMPLATE?} --format json | \
      >     jq -r '.properties.metadata.items[] | select(.key == "kube-env") | .value' | \
      >     grep NODE_LOCAL_SSDS
      NODE_LOCAL_SSDS_EXT: 3,nvme,block
    4. Inspect the template and ensure that it has NVMe local SSDs attached. The command below lists all disks of type SCRATCH and shows their interface, which should be NVME:

      root@rok-tools:~# gcloud compute instance-templates describe ${TEMPLATE?} --format json | \
      >     jq -r '.properties.disks[] | select(.type == "SCRATCH") | .index, .deviceName, .interface' | \
      >     paste - - -
      1  local-ssd-0  NVME
      2  local-ssd-1  NVME
      3  local-ssd-2  NVME
    5. Ensure that all instances inside the instance group run with the desired template:

      root@rok-tools:~# gcloud compute instance-groups managed describe ${INSTANCE_GROUP?} \
      >     --format="value(status.versionTarget.isReached)"
      True
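
      Optionally, you can also list the individual instances in the group and confirm that they are all RUNNING. This is a sketch; since INSTANCE_GROUP holds the full instance group URL, gcloud should infer the zone from it:

      root@rok-tools:~# gcloud compute instance-groups managed list-instances ${INSTANCE_GROUP?}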

Summary

You have successfully created your GKE cluster.

What’s Next

The next step is to restrict auto-upgrades for your GKE cluster.