Upgrade Rok

This guide will walk you through upgrading Rok.

We assume that you are already running a Rok cluster on Kubernetes and that you also have access to the 2.0.2 kustomization tree you are upgrading to. Since a Rok cluster on Kubernetes consists of multiple components, you will upgrade each one of them separately.

During the upgrade, Rok Operator will remove all members from the cluster and add a dedicated one to perform the upgrade. It will scale the cluster down to zero, and a Kubernetes Job will run to upgrade the cluster config on etcd and run any needed migrations. Finally, it will scale the cluster back to its initial size.

What You’ll Need

  • An existing Rok cluster running on Kubernetes.
  • Access to the 2.0.2 kustomization tree you are upgrading to.

Procedure

Note

To increase observability and gain insight into the status of the cluster upgrade, run the following commands in a separate window:

  • Get the live cluster status:

    root@rok-tools:~# watch kubectl get rokcluster -n rok
  • Get the live cluster events:

    root@rok-tools:~# watch 'kubectl describe rokcluster -n rok rok | tail -n 20'
  1. If you are running on EKS and had previously deployed FluentD to send logs to Amazon CloudWatch, follow the updated Enable Logging guide to migrate from FluentD to Fluent Bit for significant performance and security gains.

  2. If you are running on EKS, install the EBS CSI driver by following the two guides below:

    1. Create IAM Role for EBS CSI Driver
    2. Deploy EBS CSI Driver
  3. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  4. Set the namespace in which you deployed Rok. Choose one of the following options, based on your cloud provider.

    • Restore the required context from previous sections:

      root@rok-tools:~/ops/deployments# source deploy/env.cloudidentity
      root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE

    • Set the namespace explicitly:

      root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE=rok
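
    Optionally, confirm that the variable now holds the namespace you expect. This is a plain shell check; the value shown assumes the default rok namespace:

    root@rok-tools:~/ops/deployments# echo "${ROK_CLUSTER_NAMESPACE?}"
    rok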
  5. Upgrade Rok namespaces:

    1. Apply the latest Rok namespaces manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply \
      >     rok/rok-namespaces/overlays/deploy

      Note

      The above command updates namespace labels in order to enable Istio sidecar injection for workloads running in the Rok namespaces.

    2. Ensure that Istio sidecar injection is enabled for the rok and rok-system namespaces:

      root@rok-tools:~/ops/deployments# kubectl get ns -l istio-injection=enabled | grep rok
      rok          Active   2d
      rok-system   Active   2d
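
      Alternatively, you can inspect the labels of the two namespaces directly and look for istio-injection=enabled. This is a generic kubectl check, shown here only as an optional sanity test:

      root@rok-tools:~/ops/deployments# kubectl get ns rok rok-system --show-labels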
  6. Upgrade Rok Disk Manager:

    1. Apply the latest Rok Disk Manager manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-disk-manager/overlays/deploy
    2. Ensure Rok Disk Manager has become ready. Verify that field READY is 1/1 and field STATUS is Running for all Pods:

      root@rok-tools:~# watch kubectl get pods -n rok-system -l name=rok-disk-manager
      Every 2.0s: kubectl get pods -n rok-system -l name=rok-disk-manager    rok-tools: Thu Nov 25 09:36:49 2021
      NAME                     READY   STATUS    RESTARTS   AGE
      rok-disk-manager-4kk5m   1/1     Running   0          1m
      rok-disk-manager-prqzl   1/1     Running   0          1m
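
      If you prefer a command that blocks until the rollout completes instead of watching, you can use kubectl rollout status. This assumes that Rok Disk Manager runs as a DaemonSet named rok-disk-manager in the rok-system namespace with a RollingUpdate update strategy:

      root@rok-tools:~/ops/deployments# kubectl rollout status daemonset/rok-disk-manager -n rok-system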
  7. Upgrade Rok kmod:

    1. Apply the latest Rok kmod manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-kmod/overlays/deploy
    2. Ensure Rok kmod has become ready. Verify that field READY is 1/1 and field STATUS is Running for all Pods:

      root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok-system -l app=rok-kmod
      Every 2.0s: kubectl get pods -n rok-system -l app=rok-kmod    rok-tools: Thu Nov 25 09:39:58 2021
      NAME             READY   STATUS    RESTARTS   AGE
      rok-kmod-j9bpw   1/1     Running   0          1m
      rok-kmod-pqbxb   1/1     Running   0          1m
  8. Upgrade Rok Operator:

    1. Apply the latest Rok Operator manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply \
      >     rok/rok-operator/overlays/deploy

      Note

      The above command also updates the RokCluster CRD.

    2. Ensure Rok Operator has become ready. Verify field READY is 2/2 and field STATUS is Running:

      root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok-system -l app=rok-operator
      Every 2.0s: kubectl get pods -n rok-system -l app=rok-operator    rok-tools: Thu Nov 25 09:47:35 2021
      NAME             READY   STATUS    RESTARTS   AGE
      rok-operator-0   2/2     Running   0          1m
  9. Configure the port names of the Rok etcd Service:

    root@rok-tools:~/ops/deployments# kubectl patch service -n ${ROK_CLUSTER_NAMESPACE?} rok-etcd \
    >     --type='json' \
    >     --patch-file=rok/rok-external-services/etcd/overlays/arrikto/patches/service-port-names.yaml
    service/rok-etcd patched

    Note

    This is required so that Istio-enabled Pods can access Rok etcd.
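
    Istio decides how to treat each Service port based on its name (or appProtocol), so after patching you can optionally list the port names of the rok-etcd Service and confirm they now carry a protocol prefix such as tcp-. This is a generic kubectl check:

    root@rok-tools:~/ops/deployments# kubectl get service -n ${ROK_CLUSTER_NAMESPACE?} rok-etcd \
    >     -o jsonpath='{.spec.ports[*].name}{"\n"}'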

  10. Upgrade the Rok cluster:

    1. Apply the latest Rok cluster manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-cluster/overlays/deploy
    2. Ensure Rok cluster has been upgraded:

      1. Check the status of the cluster upgrade Job. Ensure that field COMPLETIONS is 1/1:

        root@rok-tools:~/ops/deployments# kubectl get job -n ${ROK_CLUSTER_NAMESPACE?} rok-upgrade-release-2.0-l0-release-2.0.2
        NAME              COMPLETIONS   DURATION   AGE
        rok-upgrade-...   1/1           45s        3m
      2. Ensure that Rok is up and running after the upgrade Job finishes. Verify that field Health is OK and field Phase is Running:

        root@rok-tools:~/ops/deployments# kubectl describe rokcluster -n ${ROK_CLUSTER_NAMESPACE?} rok
        ...
        Status:
          Health:         OK
          Phase:          Running
          Ready Members:  2
          Total Members:  2
          Version:        release-2.0-l0-release-2.0.2
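
      Instead of polling the upgrade Job by hand in the step above, you can optionally block until it completes with kubectl wait, using the Job name shown above and a timeout of your choosing:

      root@rok-tools:~/ops/deployments# kubectl wait --for=condition=complete --timeout=30m \
      >     -n ${ROK_CLUSTER_NAMESPACE?} job/rok-upgrade-release-2.0-l0-release-2.0.2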
  11. Upgrade Rok Redis:

    1. Delete the old headless service:

      root@rok-tools:~/ops/deployments# kubectl delete service -n rok rok-redis
      service "rok-redis" deleted

      Note

      This is needed so that the new non-headless Service gets an IP allocated.

    2. Apply the latest Rok Redis manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-external-services/redis/overlays/deploy \
      >     --force --force-kinds StatefulSet

      Note

      You need to recreate the StatefulSet because Rok 2.0.X renames the headless Service, which requires changing immutable fields of the StatefulSet.

    3. Ensure that Rok Redis has become ready. Verify field READY is 3/3 and field STATUS is Running:

      root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok -l app=redis
      Every 2.0s: kubectl get pods -n rok -l app=redis    rok-tools: Thu Nov 25 09:47:35 2021
      NAME          READY   STATUS    RESTARTS   AGE
      rok-redis-0   3/3     Running   0          1m
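
      Since the new rok-redis Service is no longer headless, you can optionally confirm that it has received an IP by checking that its CLUSTER-IP field is not None:

      root@rok-tools:~/ops/deployments# kubectl get service -n rok rok-redis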
  12. Deploy the Kyverno resources that are needed to upgrade the Profile Controller:

    1. Apply the Kyverno manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/kyverno/overlays/deploy
    2. Verify that the Kyverno Pod is up and running. Check the Pod status and verify field STATUS is Running and field READY is 1/1:

      root@rok-tools:~/ops/deployments# kubectl -n kyverno get pods
      NAME                       READY   STATUS    RESTARTS   AGE
      kyverno-544fc576bb-gbc9l   1/1     Running   0          9m
    3. Prune stale Kyverno resources:

      root@rok-tools:~/ops/deployments# rok-kf-prune --app kyverno
  13. Upgrade the Profile Controller:

    1. Apply the latest Profile Controller manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply \
      >     kubeflow/manifests/apps/profiles/upstream/overlays/deploy \
      >     --force --force-kinds Deployment

      Note

      You need to recreate the Deployment because Rok 2.0.X updates its label selector, which is an immutable field.

    2. Ensure that the Profile Controller is up and running. Verify that field READY is 1/1:

      root@rok-tools:~/ops/deployments# kubectl get deploy -n kubeflow profiles-deployment
      NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
      profiles-deployment   1/1     1            1           1m
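
      If you want a command that waits for the recreated Deployment to finish rolling out, you can optionally use kubectl rollout status instead of inspecting the Deployment by hand:

      root@rok-tools:~/ops/deployments# kubectl rollout status deploy/profiles-deployment -n kubeflow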
  14. Remove Rok PostgreSQL:

    Note

    Rok 2.0.2 no longer requires a PostgreSQL instance to operate, so you can safely remove the PostgreSQL instance that you installed as part of an older EKF version.

    1. Delete the Rok PostgreSQL Kustomize package:

      root@rok-tools:~/ops/deployments# rok-deploy \
      >     --delete rok/rok-external-services/postgresql/overlays/deploy
    2. To free up space in your cluster, delete the Kubernetes volume that Rok PostgreSQL was using:

      root@rok-tools:~/ops/deployments# kubectl delete pvc -n rok -l app=postgresql \
      >     --ignore-not-found
    3. Ensure that Rok PostgreSQL resources no longer exist:

      root@rok-tools:~/ops/deployments# kubectl get svc,sts,pod,pvc -n rok -l app=postgresql
      No resources found in rok namespace.
  15. Upgrade Rok Monitoring stack:

    1. Apply the latest Rok Monitoring manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/monitoring/overlays/deploy \
      >     --force --force-kinds Deployment

      Note

      Rok 2.0.2 ships with an upgraded version of the Prometheus Operator that uses a different label selector to match Pods. Since this is an immutable field, you need to forcefully apply the corresponding Deployment.

    2. Ensure that Prometheus Operator is up and running. Verify that field READY is 1/1:

      root@rok-tools:~/ops/deployments# kubectl get deploy -n monitoring prometheus-operator
      NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
      prometheus-operator   1/1     1            1           1m
    3. Ensure that Prometheus Operator has endpoints. Verify that field ENDPOINTS is not <none>:

      root@rok-tools:~/ops/deployments# kubectl get endpoints -n monitoring prometheus-operator
      NAME                  ENDPOINTS              AGE
      prometheus-operator   192.168.131.168:8080   1m

Verify

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Set the namespace in which you deployed Rok. Choose one of the following options, based on your cloud provider.

    • Restore the required context from previous sections:

      root@rok-tools:~/ops/deployments# source deploy/env.cloudidentity
      root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE

    • Set the namespace explicitly:

      root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE=rok
  3. Ensure that all Pods in the rok-system namespace are up and running. Verify that field READY is N/N and field STATUS is Running for all Pods:

    root@rok-tools:~/ops/deployments# kubectl get pods -n rok-system
    NAME                                    READY   STATUS    RESTARTS   AGE
    rok-disk-manager-4kk5m                  1/1     Running   0          1m
    rok-disk-manager-prqzl                  1/1     Running   0          1m
    rok-kmod-j9bpw                          1/1     Running   0          1m
    rok-kmod-pqbxb                          1/1     Running   0          1m
    rok-operator-0                          2/2     Running   0          1m
    rok-scheduler-d86974c9b-n2kln           1/1     Running   0          1m
    rok-scheduler-webhook-7cdb779b8-89g9z   1/1     Running   0          1m
  4. Ensure that all Pods in the Rok namespace are up and running. Verify that field READY is N/N and field STATUS is Running for all Pods:

    root@rok-tools:~/ops/deployments# kubectl get pods -n ${ROK_CLUSTER_NAMESPACE?}
    NAME                                                               READY   STATUS    RESTARTS   AGE
    rok-csi-controller-0                                               5/5     Running   0          1m
    rok-csi-guard-ip-172-31-34-181.eu-central-1.compute.interntthrs   1/1     Running   0          1m
    rok-csi-guard-ip-172-31-47-250.eu-central-1.compute.internnsgb5   1/1     Running   0          1m
    rok-csi-node-27422                                                 3/3     Running   0          1m
    rok-csi-node-qs7pm                                                 3/3     Running   0          1m
    rok-etcd-0                                                         2/2     Running   0          1m
    rok-etcd-1                                                         2/2     Running   0          1m
    rok-etcd-2                                                         2/2     Running   0          1m
    rok-p7kqh                                                          2/2     Running   0          1m
    rok-redis-0                                                        3/3     Running   0          1m
    rok-vd5lp                                                          2/2     Running   0          1m
  5. Ensure that AuthService is up and running. Verify that field READY is 1/1:

    root@rok-tools:~/ops/deployments# kubectl get sts -n istio-system authservice
    NAME          READY   AGE
    authservice   1/1     1m
  6. Ensure that Reception is up and running. Verify that field READY is 1/1:

    root@rok-tools:~/ops/deployments# kubectl get deploy -n kubeflow kubeflow-reception
    NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
    kubeflow-reception   1/1     1            1           1m
  7. Ensure that the Profile Controller is up and running. Verify that field READY is 1/1:

    root@rok-tools:~/ops/deployments# kubectl get deploy -n kubeflow profiles-deployment
    NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
    profiles-deployment   1/1     1            1           1m
  8. Verify that the cert-manager Pods are up and running. Check the Pod status and verify that field STATUS is Running and field READY is 1/1 for all Pods:

    root@rok-tools:~/ops/deployments# kubectl -n cert-manager get pods
    NAME                                       READY   STATUS    RESTARTS   AGE
    cert-manager-6d86476c77-bl9rs              1/1     Running   0          1m
    cert-manager-cainjector-5b9cd446fd-n5jpd   1/1     Running   0          1m
    cert-manager-webhook-64d967c45-cdfwh       1/1     Running   0          1m
  9. Verify that the Rok Monitoring Stack is up and running:

    root@rok-tools:~/ops/deployments# kubectl get pods -n monitoring
    NAME                                   READY   STATUS    RESTARTS   AGE
    grafana-6d7d7b78f7-6flm7               2/2     Running   0          1m
    kube-state-metrics-765c7c7f95-chkzn    4/4     Running   0          1m
    node-exporter-zng26                    2/2     Running   0          1m
    prometheus-k8s-0                       3/3     Running   1          1m
    prometheus-operator-5f75d76f9f-fmpp5   3/3     Running   0          1m
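  10. Optionally, confirm that the Rok cluster reports the version you upgraded to. The exact columns shown depend on the printer columns of the RokCluster CRD, so treat the output layout as indicative; the version should match the 2.0.2 release:

    root@rok-tools:~/ops/deployments# kubectl get rokcluster -n ${ROK_CLUSTER_NAMESPACE?} rok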

Summary

You have successfully upgraded Rok.

What’s Next

The next step is to upgrade Rok etcd.