Upgrade Rok

This guide will walk you through upgrading Rok.

We assume that you are already running a Rok 1.4 cluster on Kubernetes and that you have access to the 1.5.3 kustomization tree you are upgrading to. Since a Rok cluster on Kubernetes consists of multiple components, you will upgrade each of them separately.

During the upgrade, Rok Operator will remove all members from the cluster and add a dedicated one to perform the upgrade. It will scale the cluster down to zero, and a Kubernetes Job will run to upgrade the cluster config on etcd and run any needed migrations. Finally, it will scale the cluster back to its initial size.
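
While the upgrade Job runs, you can follow the migration in real time by tailing its logs. This is an optional sketch; the exact Job name depends on the versions you are upgrading between, so substitute the name that kubectl reports:

    root@rok-tools:~# kubectl get jobs -n rok
    root@rok-tools:~# kubectl logs -n rok -f job/<UPGRADE_JOB_NAME>

Replace <UPGRADE_JOB_NAME> with the name of the upgrade Job, for example rok-upgrade-release-1.5-l0-release-1.5.3.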

What You’ll Need

  • An existing Rok 1.4 cluster running on Kubernetes.
  • Access to the 1.5.3 kustomization tree you are upgrading to, inside your GitOps repository.
  • A working rok-tools management environment.

Procedure

Note

To increase observability and gain insight into the status of the cluster upgrade, run the following commands in a separate window:

  • Get the live cluster status:

    root@rok-tools:~# watch kubectl get rokcluster -n rok
  • Get the live cluster events:

    root@rok-tools:~# watch 'kubectl describe rokcluster -n rok rok | tail -n 20'
  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Set the namespace in which you deployed Rok. Choose one of the following options:

    • Restore the required context from previous sections:

      root@rok-tools:~/ops/deployments# source deploy/env.cloudidentity
      root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE

    • Alternatively, set the namespace explicitly:

      root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE=rok
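
    As an optional sanity check, you can verify that the target namespace exists before proceeding:

      root@rok-tools:~/ops/deployments# kubectl get namespace ${ROK_CLUSTER_NAMESPACE?}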
  3. Upgrade Rok Disk Manager:

    1. Apply the latest Rok Disk Manager manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-disk-manager/overlays/deploy
    2. Ensure Rok Disk Manager has become ready. Verify field READY is 1/1 and field STATUS is Running for all Pods:

      root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok-system -l name=rok-disk-manager
      Every 2.0s: kubectl get pods -n rok-system -l name=rok-disk-manager    rok-tools: Thu Nov 25 09:36:49 2021

      NAME                     READY   STATUS    RESTARTS   AGE
      rok-disk-manager-4kk5m   1/1     Running   0          1m
      rok-disk-manager-prqzl   1/1     Running   0          1m
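
      If you prefer a command that blocks until the rollout completes, you can optionally use kubectl rollout status instead of watch. This is a sketch, assuming Rok Disk Manager is deployed as a DaemonSet named rok-disk-manager:

      root@rok-tools:~/ops/deployments# kubectl rollout status daemonset/rok-disk-manager -n rok-system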
  4. Upgrade Rok kmod:

    1. Apply the latest Rok kmod manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-kmod/overlays/deploy
    2. Ensure Rok kmod has become ready. Verify field READY is 1/1 and field STATUS is Running for all Pods:

      root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok-system -l app=rok-kmod
      Every 2.0s: kubectl get pods -n rok-system -l app=rok-kmod    rok-tools: Thu Nov 25 09:39:58 2021

      NAME             READY   STATUS    RESTARTS   AGE
      rok-kmod-j9bpw   1/1     Running   0          1m
      rok-kmod-pqbxb   1/1     Running   0          1m

      Troubleshooting

      The STATUS field of some or all Rok kmod Pods is CrashLoopBackOff or Error

      1. Inspect the logs of the Pods in question:

        root@rok-tools:~/ops/deployments# kubectl logs -n rok-system <POD_NAME>

        Replace <POD_NAME> with the name of the failing Pod.

      2. If you see the following error in the logs:

        modprobe: FATAL: Module dm_era is in use.

        it means that Rok kmod failed to install the new version of the kernel module, because the old version is in use by some device.

        This is expected. Continue with this guide, and then follow the Upgrade Kernel Modules guide to finish upgrading the kernel modules.
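
        To see which nodes are affected, you can list the Rok kmod Pods together with the nodes they are scheduled on; the NODE column of the wide output shows where the old module is still loaded. An optional sketch:

        root@rok-tools:~/ops/deployments# kubectl get pods -n rok-system -l app=rok-kmod -o wide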

  5. Upgrade Rok Operator:

    1. Apply the latest Rok Operator manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-operator/overlays/deploy

      Note

      The above command also updates the RokCluster CRD.

    2. Ensure Rok Operator has become ready. Verify field READY is 1/1 and field STATUS is Running:

      root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok-system -l app=rok-operator
      Every 2.0s: kubectl get pods -n rok-system -l app=rok-operator    rok-tools: Thu Nov 25 09:47:35 2021

      NAME             READY   STATUS    RESTARTS   AGE
      rok-operator-0   1/1     Running   0          1m
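
      As with the previous components, you can optionally block until the rollout finishes. A sketch, assuming Rok Operator is deployed as a StatefulSet named rok-operator:

      root@rok-tools:~/ops/deployments# kubectl rollout status statefulset/rok-operator -n rok-system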
  6. Upgrade the Rok cluster:

    1. Apply the latest Rok cluster manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-cluster/overlays/deploy
    2. Ensure Rok cluster has been upgraded:

      1. Check the status of the cluster upgrade Job:

        root@rok-tools:~/ops/deployments# kubectl get job -n ${ROK_CLUSTER_NAMESPACE?} rok-upgrade-release-1.5-l0-release-1.5.3
        NAME                                       COMPLETIONS   DURATION   AGE
        rok-upgrade-release-1.5-l0-release-1.5.3   1/1           45s        3m
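
        If you prefer to block until the Job completes instead of polling, kubectl wait can do this. An optional sketch; adjust the timeout to your environment:

        root@rok-tools:~/ops/deployments# kubectl wait -n ${ROK_CLUSTER_NAMESPACE?} \
        >     --for=condition=complete --timeout=15m job/rok-upgrade-release-1.5-l0-release-1.5.3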
      2. Ensure that Rok is up and running after the upgrade Job finishes. Verify field HEALTH is OK and field PHASE is Running:

        root@rok-tools:~/ops/deployments# kubectl get rokcluster -n ${ROK_CLUSTER_NAMESPACE?} rok
        NAME   VERSION                        HEALTH   TOTAL MEMBERS   READY MEMBERS   PHASE     AGE
        rok    release-1.5-l0-release-1.5.3   OK       2               2               Running   1m
  7. Upgrade Rok etcd:

    1. Apply the latest Rok etcd manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-external-services/etcd/overlays/deploy \
      >     --force --force-kinds StatefulSet

      Note

      You need to re-create the StatefulSet because Rok 1.5 changes the port names of the container, which are immutable fields. The underlying PVC will not be deleted.

    2. Ensure that Rok etcd has become ready. Verify field READY is 1/1 and field STATUS is Running:

      root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok -l app=etcd
      Every 2.0s: kubectl get pods -n rok -l app=etcd    rok-tools: Thu Nov 25 09:47:35 2021

      NAME         READY   STATUS    RESTARTS   AGE
      rok-etcd-0   1/1     Running   0          1m
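
      Optionally, you can also query etcd's own health endpoint from inside the Pod. This is a sketch that assumes the container image ships the etcdctl binary and that etcd serves clients on its default port:

      root@rok-tools:~/ops/deployments# kubectl exec -n rok rok-etcd-0 -- etcdctl endpoint health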
  8. Upgrade Rok Monitoring stack:

    1. Apply the latest Rok Monitoring manifests:

      root@rok-tools:~/ops/deployments# rok-deploy --apply rok/monitoring/overlays/deploy \
      >     --force --force-kinds Deployment DaemonSet RoleBinding

      Note

      You need to re-create these resources because Rok 1.5 renames the Kube State Metrics cluster-scoped RBAC resources, and the references to them are immutable fields.

    2. Remove a stale RBAC resource that is left behind by the previous version of Rok:

      root@rok-tools:~/ops/deployments# kubectl delete role -n monitoring kube-state-metrics \
      >     --ignore-not-found
      role.rbac.authorization.k8s.io "kube-state-metrics" deleted
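
      To confirm that the renamed Kube State Metrics RBAC resources are in place, you can list the cluster-scoped ones. An optional sketch; the exact names depend on the manifests you applied:

      root@rok-tools:~/ops/deployments# kubectl get clusterrole,clusterrolebinding | grep kube-state-metrics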
  9. Upgrade the rest of the Rok installation components by applying the latest Rok manifests:

    root@rok-tools:~/ops/deployments# rok-deploy --apply install/rok

Verify

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Set the namespace in which you deployed Rok. Choose one of the following options:

    • Restore the required context from previous sections:

      root@rok-tools:~/ops/deployments# source deploy/env.cloudidentity
      root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE

    • Alternatively, set the namespace explicitly:

      root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE=rok
  3. Ensure all pods in the rok-system namespace are up-and-running. Verify field READY is 1/1 and field STATUS is Running for all Pods:

    root@rok-tools:~/ops/deployments# kubectl get pods -n rok-system
    NAME                     READY   STATUS    RESTARTS   AGE
    rok-disk-manager-4kk5m   1/1     Running   0          1m
    rok-disk-manager-prqzl   1/1     Running   0          1m
    rok-kmod-j9bpw           1/1     Running   0          1m
    rok-kmod-pqbxb           1/1     Running   0          1m
    rok-operator-0           1/1     Running   0          1m
  4. Ensure all pods in the Rok namespace are up-and-running. Verify field READY is n/n and field STATUS is Running for all Pods:

    root@rok-tools:~/ops/deployments# kubectl get pods -n ${ROK_CLUSTER_NAMESPACE?}
    NAME                                                              READY   STATUS    RESTARTS   AGE
    rok-csi-controller-0                                              4/4     Running   0          1m
    rok-csi-guard-ip-172-31-34-181.eu-central-1.compute.interntthrs   1/1     Running   0          1m
    rok-csi-guard-ip-172-31-47-250.eu-central-1.compute.internnsgb5   1/1     Running   0          1m
    rok-csi-node-27422                                                2/2     Running   0          1m
    rok-csi-node-qs7pm                                                2/2     Running   0          1m
    rok-etcd-0                                                        1/1     Running   0          1m
    rok-p7kqh                                                         1/1     Running   0          1m
    rok-postgresql-0                                                  1/1     Running   0          1m
    rok-redis-0                                                       2/2     Running   0          1m
    rok-vd5lp                                                         1/1     Running   0          1m
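
    A quick way to spot Pods that are not Running is to filter on the Pod phase. An optional sketch; note that the filter also matches Succeeded Pods, so for these long-running workloads the expected output is empty:

    root@rok-tools:~/ops/deployments# kubectl get pods -n ${ROK_CLUSTER_NAMESPACE?} \
    >     --field-selector=status.phase!=Running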
  5. Ensure that Dex is up-and-running. Verify that field READY is 1/1:

    root@rok-tools:~/ops/deployments# kubectl get deploy -n auth
    NAME   READY   UP-TO-DATE   AVAILABLE   AGE
    dex    1/1     1            1           1m
  6. Ensure that AuthService is up-and-running. Verify that field READY is 1/1:

    root@rok-tools:~/ops/deployments# kubectl get sts -n istio-system authservice
    NAME          READY   AVAILABLE   AGE
    authservice   1/1     1           1m
  7. Ensure that Reception is up-and-running. Verify that field READY is 1/1:

    root@rok-tools:~/ops/deployments# kubectl get deploy -n kubeflow kubeflow-reception
    NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
    kubeflow-reception   1/1     1            1           1m
  8. Ensure that the Profiles Controller is up-and-running. Verify that field READY is 1/1:

    root@rok-tools:~/ops/deployments# kubectl get deploy -n kubeflow profiles-deployment
    NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
    profiles-deployment   1/1     1            1           1m
  9. Ensure that the cert-manager Pods are up-and-running. Verify field READY is 1/1 and field STATUS is Running for all Pods:

    root@rok-tools:~/ops/deployments# kubectl -n cert-manager get pods
    NAME                                       READY   STATUS    RESTARTS   AGE
    cert-manager-6d86476c77-bl9rs              1/1     Running   0          1m
    cert-manager-cainjector-5b9cd446fd-n5jpd   1/1     Running   0          1m
    cert-manager-webhook-64d967c45-cdfwh       1/1     Running   0          1m
  10. Ensure that the Rok Monitoring stack is up-and-running. Verify field READY is n/n and field STATUS is Running for all Pods:

    root@rok-tools:~/ops/deployments# kubectl get pods -n monitoring
    NAME                                   READY   STATUS    RESTARTS   AGE
    grafana-6d7d7b78f7-6flm7               2/2     Running   0          1m
    kube-state-metrics-765c7c7f95-chkzn    4/4     Running   0          1m
    node-exporter-zng26                    2/2     Running   0          1m
    prometheus-k8s-0                       2/2     Running   1          1m
    prometheus-operator-5f75d76f9f-fmpp5   3/3     Running   0          1m
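
    If you want to inspect the monitoring dashboards after the upgrade, you can port-forward the Grafana Service locally. An optional sketch, assuming the Service is named grafana and exposes port 3000:

    root@rok-tools:~/ops/deployments# kubectl port-forward -n monitoring svc/grafana 3000:3000

    Then point your browser at http://localhost:3000.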

Summary

You have successfully upgraded Rok.

What’s Next

The next step is to upgrade the kernel modules that Rok uses.