Create GKE Cluster

This section will guide you through creating a GKE cluster using the Google Cloud SDK. After completing this guide, you will have a GKE cluster with:

  • One of the Kubernetes versions that EKF supports, that is, 1.21, 1.22, or 1.23.
  • Worker nodes with local NVMe SSDs.

Procedure

To create the GKE cluster, follow the steps below:

  1. Switch to your management environment and specify the cluster name:

    root@rok-tools:~# export GKE_CLUSTER=arrikto-cluster
  2. Specify the Kubernetes version. Choose one of the supported Kubernetes versions below:

    root@rok-tools:~# export CLUSTER_VERSION=1.23.11-gke.300
    root@rok-tools:~# export CLUSTER_VERSION=1.22.15-gke.100
    root@rok-tools:~# export CLUSTER_VERSION=1.21.14-gke.7100
  3. Specify the name of the default node pool:

    root@rok-tools:~# export NODE_POOL_NAME=default-workers
  4. Specify the machine type:

    root@rok-tools:~# export MACHINE_TYPE=n1-standard-8
  5. Specify the number of nodes to create:

    root@rok-tools:~# export NUM_NODES=3
  6. Specify the number of local NVMe SSDs to add:

    root@rok-tools:~# export NUM_SSD=3

    Note

    Rok will automatically find and use all local SSDs, which are expected to be unformatted. Each local NVMe SSD is 375 GB in size. You can attach a maximum of 24 local SSD partitions for 9 TB per instance.
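
    For quota planning, the total local SSD capacity the node pool will request is NUM_SSD × NUM_NODES × 375 GB. A quick sanity check, using the example values set above:

    root@rok-tools:~# echo "$(( ${NUM_SSD?} * ${NUM_NODES?} * 375 )) GB"
    3375 GB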

  7. Create the cluster:

    root@rok-tools:~# gcloud alpha container clusters create ${GKE_CLUSTER?} \
    >     --account ${CLUSTER_ADMIN_ACCOUNT?} \
    >     --cluster-version ${CLUSTER_VERSION?} \
    >     --release-channel stable \
    >     --no-enable-basic-auth \
    >     --node-pool-name ${NODE_POOL_NAME?} \
    >     --machine-type ${MACHINE_TYPE?} \
    >     --image-type UBUNTU_CONTAINERD \
    >     --disk-type pd-ssd \
    >     --disk-size 200 \
    >     --local-ssd-volumes count=${NUM_SSD?},type=nvme,format=block \
    >     --metadata disable-legacy-endpoints=True \
    >     --workload-pool=${PROJECT_ID?}.svc.id.goog \
    >     --scopes gke-default \
    >     --num-nodes ${NUM_NODES?} \
    >     --logging=SYSTEM,WORKLOAD \
    >     --monitoring=SYSTEM \
    >     --enable-ip-alias \
    >     --default-max-pods-per-node 110 \
    >     --no-enable-master-authorized-networks \
    >     --no-enable-intra-node-visibility \
    >     --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
    >     --max-surge-upgrade 1 \
    >     --max-unavailable-upgrade 0 \
    >     --enable-autoupgrade \
    >     --enable-autorepair \
    >     --enable-shielded-nodes
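
    The command above assumes that CLUSTER_ADMIN_ACCOUNT and PROJECT_ID (as well as REGION, used in the troubleshooting below) are already set from earlier sections. If they are not, here is a minimal sketch that populates them from your active gcloud configuration, assuming it already points at the desired account, project, and region:

    root@rok-tools:~# export CLUSTER_ADMIN_ACCOUNT=$(gcloud config get-value account)
    root@rok-tools:~# export PROJECT_ID=$(gcloud config get-value project)
    root@rok-tools:~# export REGION=$(gcloud config get-value compute/region)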

    Troubleshooting

    The command fails with ‘Insufficient regional quota to satisfy request: resource “SSD_TOTAL_GB”’

    Ensure that your region has enough quota for local SSDs. To inspect the current usage and limit, run:

    root@rok-tools:~# gcloud compute regions describe ${REGION?} --format json | \
    >     jq -r '.quotas[] | select(.metric=="SSD_TOTAL_GB") | "\(.usage)/\(.limit)"'

    Either delete some resources or choose a different region/zone.
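
    If you need to find a region with enough free SSD quota, here is a rough sketch that loops over all regions and prints the SSD_TOTAL_GB usage and limit for each. It issues one API call per region, so it may take a while:

    root@rok-tools:~# for r in $(gcloud compute regions list --format="value(name)"); do \
    >     echo -n "${r}: "; \
    >     gcloud compute regions describe ${r} --format json | \
    >     jq -r '.quotas[] | select(.metric=="SSD_TOTAL_GB") | "\(.usage)/\(.limit)"'; \
    >     done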

    The command fails with ‘Master version is unsupported’

    If the above command fails with an error message similar to the following:

    ERROR: (gcloud.alpha.container.clusters.create) ResponseError: code=400, message=Master version "1.23.11-gke.300" is unsupported.

    it means that the Kubernetes version you have specified is not supported in your selected zone.

    To proceed, do the following:

    1. Check the Kubernetes versions that are available in your selected zone:

      root@rok-tools:~# gcloud container get-server-config \
      >     --flatten="channels" \
      >     --filter="channels.channel=STABLE" \
      >     --format="yaml(channels.channel,channels.validVersions)"
      Fetching server config for us-east1-b
      ---
      channels:
        channel: STABLE
        validVersions:
        - 1.23.12-gke.2300
        - 1.22.14-gke.4300
    2. Select one of the available Kubernetes versions you found in the previous step, or use the optional shortcut sketched after this list:

      root@rok-tools:~# export CLUSTER_VERSION=<CLUSTER_VERSION>

      Replace <CLUSTER_VERSION> with your selected Kubernetes version. For example:

      root@rok-tools:~# export CLUSTER_VERSION=1.23.12-gke.2300
      root@rok-tools:~# export CLUSTER_VERSION=1.22.14-gke.4300
      root@rok-tools:~# export CLUSTER_VERSION=1.21.15-gke.6300
    3. Go back to step 7 and create the cluster.
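
    As an optional shortcut for steps 1 and 2 above, the following sketch sets CLUSTER_VERSION to the first valid version in the STABLE channel, assuming that gcloud lists the newest version first, as in the example output above, and that the validVersions[0] projection index is supported by your gcloud version:

      root@rok-tools:~# export CLUSTER_VERSION=$(gcloud container get-server-config \
      >     --flatten="channels" \
      >     --filter="channels.channel=STABLE" \
      >     --format="value(channels.validVersions[0])")

    With the example server config shown in step 1, this would select 1.23.12-gke.2300. Then continue from step 3.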

    Note

    This will create a zonal cluster with 3 nodes in the cluster’s primary zone. It will use the default network and subnet in the zone.

Verify

  1. Ensure that the GKE cluster exists and its status is RUNNING:

    root@rok-tools:~# gcloud container clusters describe ${GKE_CLUSTER?} \
    >     --format="value(status)"
    RUNNING

    Troubleshooting

    The status is RECONCILING

    If the status of the GKE cluster is RECONCILING, it means that some work is actively being done on the cluster.

    One possibility is that there is an auto-upgrade in progress. Check for running control plane and node upgrade operations:

    root@rok-tools:~# gcloud container operations list \
    >     --filter="TYPE:(UPGRADE_MASTER OR UPGRADE_NODES) AND \
    >     TARGET:(${GKE_CLUSTER?} OR \
    >     ${NODE_POOL_NAME?}) AND STATUS:RUNNING"

    You can also check for other running operations:

    root@rok-tools:~# gcloud container operations list \
    >     --filter="STATUS:RUNNING"

    In any case, wait until the running operations complete and re-run this verification step.
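
    Optionally, instead of polling, you can block until a specific operation finishes. Replace <OPERATION_ID> with the operation name shown in the list output; this is a minimal sketch, so add --zone or --region if gcloud cannot infer the operation's location from your configuration:

    root@rok-tools:~# gcloud container operations wait <OPERATION_ID>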

  2. Ensure that the GKE cluster is enrolled in the STABLE release channel:

    root@rok-tools:~# gcloud container clusters describe ${GKE_CLUSTER?} \
    >     --format="value(releaseChannel.channel)"
    STABLE
  3. Obtain the Kubernetes version of the control plane:

    root@rok-tools:~# VERSION=$(gcloud container clusters describe ${GKE_CLUSTER?} \
    >     --format="value(currentMasterVersion)")
  4. Ensure that the control plane runs the desired Kubernetes minor version:

    root@rok-tools:~# [[ ${VERSION%*.*.*?} == ${CLUSTER_VERSION%*.*.*?} ]] \
    >     && echo OK || echo FAIL
    OK
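
    For reference, the check above compares only the major.minor part of each version: the ${VAR%*.*.*?} expansion strips the patch number and GKE suffix. A quick illustration with a hypothetical version string:

    root@rok-tools:~# EXAMPLE_VERSION=1.23.11-gke.300
    root@rok-tools:~# echo ${EXAMPLE_VERSION%*.*.*?}
    1.23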
  5. Get the list of the node pools:

    root@rok-tools:~# gcloud container node-pools list --cluster=${GKE_CLUSTER?}
    NAME             MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
    default-workers  n1-standard-8  200           1.23.11-gke.300
  6. Ensure the default node pool exists and its status is RUNNING:

    root@rok-tools:~# gcloud container node-pools describe ${NODE_POOL_NAME?} \
    >     --cluster=${GKE_CLUSTER?} \
    >     --format="value(status)"
    RUNNING
  7. Obtain the Kubernetes version of your default node pool:

    root@rok-tools:~# VERSION=$(gcloud container clusters describe ${GKE_CLUSTER?} \
    >     --format=json \
    >     | jq -r ".nodePools[] | select(.name == \"${NODE_POOL_NAME?}\") | .version")
  8. Ensure that the default node pool runs the desired Kubernetes version:

    root@rok-tools:~# [[ ${VERSION?} == ${CLUSTER_VERSION?} ]] \
    >     && echo OK || echo FAIL
    OK
  9. Verify that all instances of your node pool have the necessary storage attached:

    1. Find the instance group that corresponds to the default-workers node pool:

      root@rok-tools:~# export INSTANCE_GROUP=$(gcloud container node-pools describe ${NODE_POOL_NAME?} \
      >     --cluster=${GKE_CLUSTER?} \
      >     --format="value(instanceGroupUrls)")
    2. Find the template of the instance group:

      root@rok-tools:~# export TEMPLATE=$(gcloud compute instance-groups managed describe ${INSTANCE_GROUP?} \
      >     --format="value(instanceTemplate)")
    3. Inspect the template and ensure that the kube-env metadata key has the expected NODE_LOCAL_SSDS_EXT value:

      root@rok-tools:~# gcloud compute instance-templates describe ${TEMPLATE?} --format json | \
      >     jq -r '.properties.metadata.items[] | select(.key == "kube-env") | .value' | \
      >     grep NODE_LOCAL_SSDS
      NODE_LOCAL_SSDS_EXT: 3,nvme,block
    4. Inspect the template and ensure that it has NVMe local SSDs attached. The command below lists all disks of type SCRATCH and shows their interface, which should be NVME:

      root@rok-tools:~# gcloud compute instance-templates describe ${TEMPLATE?} --format json | \
      >     jq -r '.properties.disks[] | select(.type == "SCRATCH") | .index, .deviceName, .interface' | \
      >     paste - - -
      1  local-ssd-0  NVME
      2  local-ssd-1  NVME
      3  local-ssd-2  NVME
    5. Ensure that all instances inside the instance group run with the desired template:

      root@rok-tools:~# gcloud compute instance-groups managed describe ${INSTANCE_GROUP?} \
      >     --format="value(status.versionTarget.isReached)"
      True
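
      Optionally, you can also list the individual instances in the group and confirm that they are all RUNNING. This is a sketch; since INSTANCE_GROUP holds the full instance group URL, gcloud should infer the zone from it:

      root@rok-tools:~# gcloud compute instance-groups managed list-instances ${INSTANCE_GROUP?}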

Summary

You have successfully created your GKE cluster.

What’s Next

The next step is to restrict auto-upgrades for your GKE cluster.