Upgrade Rok

This guide will walk you through upgrading Rok.

We assume that you are already running a Rok 1.4 cluster on Kubernetes and that you have access to the 1.5.3 kustomization tree you are upgrading to. Since a Rok cluster on Kubernetes consists of multiple components, you will upgrade each of them separately.

During the upgrade, Rok Operator will remove all members from the cluster and add a dedicated one to perform the upgrade. It will scale the cluster down to zero, and a Kubernetes Job will run to upgrade the cluster config on etcd and run any needed migrations. Finally, it will scale the cluster back to its initial size.
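
While the upgrade Job runs, you can follow the migration in real time by tailing its logs. This is an optional sketch; the exact Job name depends on the versions you are upgrading between, so substitute the name that kubectl reports:

    root@rok-tools:~# kubectl get jobs -n rok
    root@rok-tools:~# kubectl logs -n rok -f job/<UPGRADE_JOB_NAME>

Replace <UPGRADE_JOB_NAME> with the name of the upgrade Job, for example rok-upgrade-release-1.5-l0-release-1.5.3.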

What You’ll Need

  • An existing Rok 1.4 cluster running on Kubernetes.
  • Access to the 1.5.3 kustomization tree you are upgrading to, inside your GitOps repository.
  • A working rok-tools management environment.

Procedure

Note

To increase observability and gain insight into the status of the cluster upgrade, run the following commands in a separate window:

  • Get the live cluster status:

    root@rok-tools:~# watch kubectl get rokcluster -n rok
  • Get the live cluster events:

    root@rok-tools:~# watch 'kubectl describe rokcluster -n rok rok | tail -n 20'
  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Set the namespace in which you deployed Rok. Choose one of the following options:

    • Restore the required context from previous sections:

      root@rok-tools:~/ops/deployments# source deploy/env.cloudidentity
      root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE

    • Alternatively, set the namespace explicitly:

      root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE=rok
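
    As an optional sanity check, you can verify that the target namespace exists before proceeding:

      root@rok-tools:~/ops/deployments# kubectl get namespace ${ROK_CLUSTER_NAMESPACE?}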
  3. Upgrade Rok Disk Manager:

    1. Apply the latest Rok Disk Manager manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-disk-manager/overlays/deploy
    2. Ensure Rok Disk Manager has become ready. Verify field READY is 1/1 and field STATUS is Running for all Pods:

      root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok-system -l name=rok-disk-manager
      Every 2.0s: kubectl get pods -n rok-system -l name=rok-disk-manager    rok-tools: Thu Nov 25 09:36:49 2021

      NAME                     READY   STATUS    RESTARTS   AGE
      rok-disk-manager-4kk5m   1/1     Running   0          1m
      rok-disk-manager-prqzl   1/1     Running   0          1m
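
      If you prefer a command that blocks until the rollout completes, you can optionally use kubectl rollout status instead of watch. This is a sketch, assuming Rok Disk Manager is deployed as a DaemonSet named rok-disk-manager:

      root@rok-tools:~/ops/deployments# kubectl rollout status daemonset/rok-disk-manager -n rok-system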
  4. Upgrade Rok kmod:

    1. Apply the latest Rok kmod manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-kmod/overlays/deploy
    2. Ensure Rok kmod has become ready. Verify field READY is 1/1 and field STATUS is Running for all Pods:

      root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok-system -l app=rok-kmod
      Every 2.0s: kubectl get pods -n rok-system -l app=rok-kmod    rok-tools: Thu Nov 25 09:39:58 2021

      NAME             READY   STATUS    RESTARTS   AGE
      rok-kmod-j9bpw   1/1     Running   0          1m
      rok-kmod-pqbxb   1/1     Running   0          1m

      Troubleshooting

      The STATUS field of some or all Rok kmod Pods is CrashLoopBackOff or Error

      1. Inspect the logs of the Pods in question:

        root@rok-tools:~/ops/deployments# kubectl logs -n rok-system <POD_NAME>

        Replace <POD_NAME> with the name of the failing Pod.

      2. If you see the following error in the logs:

        modprobe: FATAL: Module dm_era is in use.

        it means that Rok kmod failed to install the new version of the kernel module, because the old version is in use by some device.

        This is expected. Continue with this guide, and then follow the Upgrade Kernel Modules guide to finish upgrading the kernel modules.
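
        To see which nodes are affected, you can list the Rok kmod Pods together with the nodes they are scheduled on; the NODE column of the wide output shows where the old module is still loaded. An optional sketch:

        root@rok-tools:~/ops/deployments# kubectl get pods -n rok-system -l app=rok-kmod -o wide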

  5. Upgrade Rok Operator:

    1. Apply the latest Rok Operator manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-operator/overlays/deploy

      Note

      The above command also updates the RokCluster CRD.

    2. Ensure Rok Operator has become ready. Verify field READY is 1/1 and field STATUS is Running:

      root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok-system -l app=rok-operator
      Every 2.0s: kubectl get pods -n rok-system -l app=rok-operator    rok-tools: Thu Nov 25 09:47:35 2021

      NAME             READY   STATUS    RESTARTS   AGE
      rok-operator-0   1/1     Running   0          1m
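
      As with the previous components, you can optionally block until the rollout finishes. A sketch, assuming Rok Operator is deployed as a StatefulSet named rok-operator:

      root@rok-tools:~/ops/deployments# kubectl rollout status statefulset/rok-operator -n rok-system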
  6. Upgrade the Rok cluster:

    1. Apply the latest Rok cluster manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-cluster/overlays/deploy
    2. Ensure Rok cluster has been upgraded:

      1. Check the status of the cluster upgrade Job:

        root@rok-tools:~/ops/deployments# kubectl get job -n ${ROK_CLUSTER_NAMESPACE?} rok-upgrade-release-1.5-l0-release-1.5.3
        NAME                                       COMPLETIONS   DURATION   AGE
        rok-upgrade-release-1.5-l0-release-1.5.3   1/1           45s        3m
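
        If you prefer to block until the Job completes instead of polling, kubectl wait can do this. An optional sketch; adjust the timeout to your environment:

        root@rok-tools:~/ops/deployments# kubectl wait -n ${ROK_CLUSTER_NAMESPACE?} \
        >     --for=condition=complete --timeout=15m job/rok-upgrade-release-1.5-l0-release-1.5.3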
      2. Ensure that Rok is up and running after the upgrade Job finishes. Verify field HEALTH is OK and field PHASE is Running:

        root@rok-tools:~/ops/deployments# kubectl get rokcluster -n ${ROK_CLUSTER_NAMESPACE?} rok
        NAME   VERSION                        HEALTH   TOTAL MEMBERS   READY MEMBERS   PHASE     AGE
        rok    release-1.5-l0-release-1.5.3   OK       2               2               Running   1m
  7. Upgrade Rok etcd:

    1. Apply the latest Rok etcd manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-external-services/etcd/overlays/deploy \
      >     --force --force-kinds StatefulSet

      Note

      You need to re-create the StatefulSet because Rok 1.5 changes the port names of the container, which are immutable fields. The underlying PVC will not be deleted.

    2. Ensure that Rok etcd has become ready. Verify field READY is 1/1 and field STATUS is Running:

      root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok -l app=etcd
      Every 2.0s: kubectl get pods -n rok -l app=etcd    rok-tools: Thu Nov 25 09:47:35 2021

      NAME         READY   STATUS    RESTARTS   AGE
      rok-etcd-0   1/1     Running   0          1m
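
      Optionally, you can also query etcd's own health endpoint from inside the Pod. This is a sketch that assumes the container image ships the etcdctl binary and that etcd serves clients on its default port:

      root@rok-tools:~/ops/deployments# kubectl exec -n rok rok-etcd-0 -- etcdctl endpoint health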
  8. Upgrade Rok Monitoring stack:

    1. Apply the latest Rok Monitoring manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/monitoring/overlays/deploy \
      >     --force --force-kinds Deployment DaemonSet RoleBinding

      Note

      You need to re-create these resources because Rok 1.5 renames the Kube State Metrics cluster-scoped RBAC resources, and the references to them are immutable fields.

    2. Remove a stale RBAC resource that is left behind by the previous version of Rok:

      root@rok-tools:~/ops/deployments# kubectl delete role -n monitoring kube-state-metrics \
      >     --ignore-not-found
      role.rbac.authorization.k8s.io "kube-state-metrics" deleted
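
      To confirm that the renamed Kube State Metrics RBAC resources are in place, you can list the cluster-scoped ones. An optional sketch; the exact names depend on the manifests you applied:

      root@rok-tools:~/ops/deployments# kubectl get clusterrole,clusterrolebinding | grep kube-state-metrics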
  9. Upgrade the rest of the Rok installation components by applying the latest Rok manifests:

    root@rok-tools:~/ops/deployments# rok-deploy --apply install/rok

Verify

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Set the namespace in which you deployed Rok. Choose one of the following options:

    • Restore the required context from previous sections:

      root@rok-tools:~/ops/deployments# source deploy/env.cloudidentity
      root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE

    • Alternatively, set the namespace explicitly:

      root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE=rok
  3. Ensure all pods in the rok-system namespace are up-and-running. Verify field READY is 1/1 and field STATUS is Running for all Pods:

    root@rok-tools:~/ops/deployments# kubectl get pods -n rok-system
    NAME                     READY   STATUS    RESTARTS   AGE
    rok-disk-manager-4kk5m   1/1     Running   0          1m
    rok-disk-manager-prqzl   1/1     Running   0          1m
    rok-kmod-j9bpw           1/1     Running   0          1m
    rok-kmod-pqbxb           1/1     Running   0          1m
    rok-operator-0           1/1     Running   0          1m
  4. Ensure all pods in the Rok namespace are up-and-running. Verify field READY is n/n and field STATUS is Running for all Pods:

    root@rok-tools:~/ops/deployments# kubectl get pods -n ${ROK_CLUSTER_NAMESPACE?}
    NAME                                                              READY   STATUS    RESTARTS   AGE
    rok-csi-controller-0                                              4/4     Running   0          1m
    rok-csi-guard-ip-172-31-34-181.eu-central-1.compute.interntthrs   1/1     Running   0          1m
    rok-csi-guard-ip-172-31-47-250.eu-central-1.compute.internnsgb5   1/1     Running   0          1m
    rok-csi-node-27422                                                2/2     Running   0          1m
    rok-csi-node-qs7pm                                                2/2     Running   0          1m
    rok-etcd-0                                                        1/1     Running   0          1m
    rok-p7kqh                                                         1/1     Running   0          1m
    rok-postgresql-0                                                  1/1     Running   0          1m
    rok-redis-0                                                       2/2     Running   0          1m
    rok-vd5lp                                                         1/1     Running   0          1m
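
    A quick way to spot Pods that are not Running is to filter on the Pod phase. An optional sketch; note that the filter also matches Succeeded Pods, so for these long-running workloads the expected output is empty:

    root@rok-tools:~/ops/deployments# kubectl get pods -n ${ROK_CLUSTER_NAMESPACE?} \
    >     --field-selector=status.phase!=Running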
  5. Ensure that Dex is up-and-running. Verify that field READY is 1/1:

    root@rok-tools:~/ops/deployments# kubectl get deploy -n auth
    NAME   READY   UP-TO-DATE   AVAILABLE   AGE
    dex    1/1     1            1           1m
  6. Ensure that AuthService is up-and-running. Verify that field READY is 1/1:

    root@rok-tools:~/ops/deployments# kubectl get sts -n istio-system authservice
    NAME          READY   AVAILABLE   AGE
    authservice   1/1     1           1m
  7. Ensure that Reception is up-and-running. Verify that field READY is 1/1:

    root@rok-tools:~/ops/deployments# kubectl get deploy -n kubeflow kubeflow-reception
    NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
    kubeflow-reception   1/1     1            1           1m
  8. Ensure that the Profiles Controller is up-and-running. Verify that field READY is 1/1:

    root@rok-tools:~/ops/deployments# kubectl get deploy -n kubeflow profiles-deployment
    NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
    profiles-deployment   1/1     1            1           1m
  9. Ensure that the cert-manager Pods are up-and-running. Verify field READY is 1/1 and field STATUS is Running for all Pods:

    root@rok-tools:~/ops/deployments# kubectl -n cert-manager get pods
    NAME                                       READY   STATUS    RESTARTS   AGE
    cert-manager-6d86476c77-bl9rs              1/1     Running   0          1m
    cert-manager-cainjector-5b9cd446fd-n5jpd   1/1     Running   0          1m
    cert-manager-webhook-64d967c45-cdfwh       1/1     Running   0          1m
  10. Ensure that the Rok Monitoring stack is up-and-running. Verify field READY is n/n and field STATUS is Running for all Pods:

    root@rok-tools:~/ops/deployments# kubectl get pods -n monitoring
    NAME                                   READY   STATUS    RESTARTS   AGE
    grafana-6d7d7b78f7-6flm7               2/2     Running   0          1m
    kube-state-metrics-765c7c7f95-chkzn    4/4     Running   0          1m
    node-exporter-zng26                    2/2     Running   0          1m
    prometheus-k8s-0                       2/2     Running   1          1m
    prometheus-operator-5f75d76f9f-fmpp5   3/3     Running   0          1m
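
    If you want to inspect the monitoring dashboards after the upgrade, you can port-forward the Grafana Service locally. An optional sketch, assuming the Service is named grafana and exposes port 3000:

    root@rok-tools:~/ops/deployments# kubectl port-forward -n monitoring svc/grafana 3000:3000

    Then point your browser at http://localhost:3000.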

Summary

You have successfully upgraded Rok.

What’s Next

The next step is to upgrade the kernel modules that Rok uses.