Upgrade Rok¶
This guide will walk you through upgrading Rok.
Overview¶
We assume that you are already running a Rok cluster on Kubernetes and that you also have access to the 2.0.1 kustomization tree you are upgrading to. Since a Rok cluster on Kubernetes consists of multiple components, you will upgrade each one of them separately.
During the upgrade, Rok Operator will remove all members from the cluster and add a dedicated one to perform the upgrade. It will scale the cluster down to zero, and a Kubernetes Job will run to upgrade the cluster config on etcd and run any needed migrations. Finally, it will scale the cluster back to its initial size.
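Because the upgrade is driven by a Kubernetes Job that Rok Operator creates, one quick way to see it appear and complete is to watch the Jobs in the Rok namespace. The `rok` namespace below is the default this guide uses, so adjust it if yours differs:
root@rok-tools:~# kubectl get jobs -n rok -w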
What You’ll Need¶
- An upgraded management environment.
- An existing Kubernetes cluster.
- An existing Rok 1.5.3 or later deployment.
- Your local clone of the Arrikto GitOps repository.
- Arrikto manifests for EKF version 2.0.1.
Procedure¶
Note
To increase observability and gain insight into the status of the cluster upgrade, run the following commands in a separate window:
Get the live cluster status:
root@rok-tools:~# watch kubectl get rokcluster -n rok
Get the live cluster events:
root@rok-tools:~# watch 'kubectl describe rokcluster -n rok rok | tail -n 20'
If you are running on EKS and have previously deployed Fluentd to send logs to Amazon CloudWatch, follow the updated Enable Logging guide to migrate from Fluentd to Fluent Bit for significant performance and security gains.
If you are running on EKS, install the EBS CSI driver by following the two guides below:
Go to your GitOps repository, inside your rok-tools management environment:
root@rok-tools:~# cd ~/ops/deployments
Set the namespace in which you deployed Rok. Choose one of the following options, based on your cloud provider.
Restore the required context from previous sections:
root@rok-tools:~/ops/deployments# source deploy/env.cloudidentity
root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE
Alternatively, set the namespace explicitly:
root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE=rok
Upgrade Rok namespaces:
Apply the latest Rok namespaces manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply \
>    rok/rok-namespaces/overlays/deploy
Note
The above command updates namespace labels in order to enable Istio sidecar injection for workloads running in the Rok namespaces.
Ensure that Istio sidecar injection is enabled for the rok and rok-system namespaces:
root@rok-tools:~/ops/deployments# kubectl get ns -l istio-injection=enabled | grep rok
rok          Active   2d
rok-system   Active   2d
Upgrade Rok Disk Manager:
Apply the latest Rok Disk Manager manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-disk-manager/overlays/deploy
Ensure Rok Disk Manager has become ready. Verify that field READY is 1/1 and field STATUS is Running for all Pods:
root@rok-tools:~# watch kubectl get pods -n rok-system -l name=rok-disk-manager
Every 2.0s: kubectl get pods -n rok-system -l name=rok-disk-manager    rok-tools: Thu Nov 25 09:36:49 2021

NAME                     READY   STATUS    RESTARTS   AGE
rok-disk-manager-4kk5m   1/1     Running   0          1m
rok-disk-manager-prqzl   1/1     Running   0          1m
Upgrade Rok kmod:
Apply the latest Rok kmod manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-kmod/overlays/deploy
Ensure Rok kmod has become ready. Verify that field READY is 1/1 and field STATUS is Running for all Pods:
root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok-system -l app=rok-kmod
Every 2.0s: kubectl get pods -n rok-system -l app=rok-kmod    rok-tools: Thu Nov 25 09:39:58 2021

NAME             READY   STATUS    RESTARTS   AGE
rok-kmod-j9bpw   1/1     Running   0          1m
rok-kmod-pqbxb   1/1     Running   0          1m
Upgrade Rok Operator:
Apply the latest Rok Operator manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply \
>    rok/rok-operator/overlays/deploy
Note
The above command also updates the RokCluster CRD.
Ensure Rok Operator has become ready. Verify field READY is 2/2 and field STATUS is Running:
root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok-system -l app=rok-operator
Every 2.0s: kubectl get pods -n rok-system -l app=rok-operator    rok-tools: Thu Nov 25 09:47:35 2021

NAME             READY   STATUS    RESTARTS   AGE
rok-operator-0   2/2     Running   0          1m
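If you want to confirm that the RokCluster CRD is present after the update, you can look it up by name. Grepping keeps this robust, since the fully qualified CRD name is not spelled out in this guide:
root@rok-tools:~/ops/deployments# kubectl get crd | grep -i rokcluster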
Configure port names of Rok etcd service:
root@rok-tools:~/ops/deployments# kubectl patch service -n ${ROK_CLUSTER_NAMESPACE?} rok-etcd \
>    --type='json' \
>    --patch-file=rok/rok-external-services/etcd/overlays/arrikto/patches/service-port-names.yaml
service/rok-etcd patched
Note
This is required so that Istio-enabled Pods can access Rok etcd.
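Optionally, you can inspect the resulting port names on the Service. The exact names depend on the contents of the patch file, so treat this as a quick sanity check rather than an authoritative expected output:
root@rok-tools:~/ops/deployments# kubectl get svc -n ${ROK_CLUSTER_NAMESPACE?} rok-etcd \
>    -o jsonpath='{range .spec.ports[*]}{.name}{"\n"}{end}'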
Upgrade the Rok cluster:
Apply the latest Rok cluster manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-cluster/overlays/deploy
Ensure Rok cluster has been upgraded:
Check the status of the cluster upgrade Job. Ensure that field COMPLETIONS is 1/1:
root@rok-tools:~/ops/deployments# kubectl get job -n ${ROK_CLUSTER_NAMESPACE?} rok-upgrade-release-2.0-l0-release-2.0.1
NAME              COMPLETIONS   DURATION   AGE
rok-upgrade-...   1/1           45s        3m
Ensure that Rok is up and running after the upgrade Job finishes. Verify that field Health is OK and field Phase is Running:
root@rok-tools:~/ops/deployments# kubectl describe rokcluster -n ${ROK_CLUSTER_NAMESPACE?} rok
...
Status:
  Health:         OK
  Phase:          Running
  Ready Members:  2
  Total Members:  2
  Version:        release-2.0-l0-release-2.0.1
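If you prefer to block until the upgrade Job finishes instead of polling, kubectl wait can do so. The 15-minute timeout below is an assumption; adjust it to your environment:
root@rok-tools:~/ops/deployments# kubectl wait -n ${ROK_CLUSTER_NAMESPACE?} \
>    --for=condition=complete --timeout=15m \
>    job/rok-upgrade-release-2.0-l0-release-2.0.1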
Upgrade Rok Redis:
Delete the old headless service:
root@rok-tools:~/ops/deployments# kubectl delete service -n rok rok-redis
service "rok-redis" deleted
Note
This is needed so that the new non-headless service gets an IP allocated.
Apply the latest Rok Redis manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-external-services/redis/overlays/deploy \
>    --force --force-kinds StatefulSet
Note
You need to recreate the StatefulSet because Rok 2.0.X renames the headless service, which requires changing immutable fields.
Ensure that Rok Redis has become ready. Verify field READY is 3/3 and field STATUS is Running:
root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok -l app=redis
Every 2.0s: kubectl get pods -n rok -l app=redis    rok-tools: Thu Nov 25 09:47:35 2021

NAME          READY   STATUS    RESTARTS   AGE
rok-redis-0   3/3     Running   0          1m
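Optionally, you can confirm that the new Service is no longer headless, that is, that it now has a ClusterIP allocated rather than None:
root@rok-tools:~/ops/deployments# kubectl get svc -n rok rok-redis -o jsonpath='{.spec.clusterIP}'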
Deploy the Kyverno resources that are needed to upgrade the Profile Controller:
Apply the Kyverno manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply rok/kyverno/overlays/deploy
Verify that the Kyverno Pod is up and running. Check the Pod status and verify field STATUS is Running and field READY is 1/1:
root@rok-tools:~/ops/deployments# kubectl -n kyverno get pods
NAME                       READY   STATUS    RESTARTS   AGE
kyverno-544fc576bb-gbc9l   1/1     Running   0          9m
Prune stale Kyverno resources:
root@rok-tools:~/ops/deployments# rok-kf-prune --app kyverno
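If you would like to see which Kyverno policies remain installed after pruning, listing the ClusterPolicy resources is one way to do it. This assumes the deployed overlay creates cluster-wide policies, which may not hold for every setup:
root@rok-tools:~/ops/deployments# kubectl get clusterpolicies.kyverno.io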
Upgrade the Profile Controller:
Apply the latest Profile Controller manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply \
>    kubeflow/manifests/apps/profiles/upstream/overlays/deploy \
>    --force --force-kinds Deployment
Note
You need to recreate the Deployment because Rok 2.0.X updates its label selector, which is an immutable field.
Ensure that the Profile Controller is up and running. Verify that field READY is 1/1:
root@rok-tools:~/ops/deployments# kubectl get deploy -n kubeflow profiles-deployment
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
profiles-deployment   1/1     1            1           1m
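To follow the recreation as it progresses, you can optionally watch the rollout; this is a convenience on top of the check above, not an extra required step:
root@rok-tools:~/ops/deployments# kubectl rollout status -n kubeflow deploy/profiles-deployment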
Remove Rok PostgreSQL:
Note
Rok 2.0.1 no longer requires a PostgreSQL instance to operate. That is, you can safely remove the existing PostgreSQL instance that you installed in an older EKF version.
Delete the Rok PostgreSQL Kustomize package:
root@rok-tools:~/ops/deployments# rok-deploy \
>    --delete rok/rok-external-services/postgresql/overlays/deploy
Delete the Kubernetes volume that Rok PostgreSQL was using to free up space in your cluster:
root@rok-tools:~/ops/deployments# kubectl delete pvc -n rok -l app=postgresql \
>    --ignore-not-found
Ensure that Rok PostgreSQL resources no longer exist:
root@rok-tools:~/ops/deployments# kubectl get svc,sts,pod,pvc -n rok -l app=postgresql
No resources found in rok namespace.
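Depending on the StorageClass reclaim policy, the underlying PersistentVolume may linger in a Released state after the PVC is deleted. If you want to check for such leftovers, a simple filter like the following can help; the grep pattern is an assumption about how the volume claim is named, so adapt it as needed:
root@rok-tools:~/ops/deployments# kubectl get pv | grep postgresql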
Upgrade Rok Monitoring stack:
Apply the latest Rok Monitoring manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply rok/monitoring/overlays/deploy \
>    --force --force-kinds Deployment
Note
Rok 2.0.1 ships with an upgraded version of the Prometheus Operator that uses a different label selector to match Pods. Since this is an immutable field, you need to forcefully apply the corresponding Deployment.
Ensure that Prometheus Operator is up and running. Verify that field READY is 1/1:
root@rok-tools:~/ops/deployments# kubectl get deploy -n monitoring prometheus-operator
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
prometheus-operator   1/1     1            1           1m
Ensure that Prometheus Operator has endpoints. Verify that field ENDPOINTS is not <none>:
root@rok-tools:~/ops/deployments# kubectl get endpoints -n monitoring prometheus-operator
NAME                  ENDPOINTS              AGE
prometheus-operator   192.168.131.168:8080   1m
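If you are curious which label selector the upgraded Deployment now uses, you can print it directly; the output shape depends on the manifests, so this is informational only:
root@rok-tools:~/ops/deployments# kubectl get deploy -n monitoring prometheus-operator \
>    -o jsonpath='{.spec.selector.matchLabels}'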
Verify¶
Go to your GitOps repository, inside your rok-tools management environment:
root@rok-tools:~# cd ~/ops/deployments
Set the namespace in which you deployed Rok. Choose one of the following options, based on your cloud provider.
Restore the required context from previous sections:
root@rok-tools:~/ops/deployments# source deploy/env.cloudidentity
root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE
Alternatively, set the namespace explicitly:
root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE=rok
Ensure that all Pods in the rok-system namespace are up and running. Verify that field READY is N/N and field STATUS is Running for all Pods:
root@rok-tools:~/ops/deployments# kubectl get pods -n rok-system
NAME                                    READY   STATUS    RESTARTS   AGE
rok-disk-manager-4kk5m                  1/1     Running   0          1m
rok-disk-manager-prqzl                  1/1     Running   0          1m
rok-kmod-j9bpw                          1/1     Running   0          1m
rok-kmod-pqbxb                          1/1     Running   0          1m
rok-operator-0                          2/2     Running   0          1m
rok-scheduler-d86974c9b-n2kln           1/1     Running   0          1m
rok-scheduler-webhook-7cdb779b8-89g9z   1/1     Running   0          1m
Ensure that all Pods in the Rok namespace are up and running. Verify that field READY is N/N and field STATUS is Running for all Pods:
root@rok-tools:~/ops/deployments# kubectl get pods -n ${ROK_CLUSTER_NAMESPACE?}
NAME                                                              READY   STATUS    RESTARTS   AGE
rok-csi-controller-0                                              5/5     Running   0          1m
rok-csi-guard-ip-172-31-34-181.eu-central-1.compute.interntthrs   1/1     Running   0          1m
rok-csi-guard-ip-172-31-47-250.eu-central-1.compute.internnsgb5   1/1     Running   0          1m
rok-csi-node-27422                                                3/3     Running   0          1m
rok-csi-node-qs7pm                                                3/3     Running   0          1m
rok-etcd-0                                                        2/2     Running   0          1m
rok-etcd-1                                                        2/2     Running   0          1m
rok-etcd-2                                                        2/2     Running   0          1m
rok-p7kqh                                                         2/2     Running   0          1m
rok-redis-0                                                       3/3     Running   0          1m
rok-vd5lp                                                         2/2     Running   0          1m
Ensure that AuthService is up and running. Verify that field READY is 1/1:
root@rok-tools:~/ops/deployments# kubectl get sts -n istio-system authservice
NAME          READY   AGE
authservice   1/1     1m
Ensure that Reception is up and running. Verify that field READY is 1/1:
root@rok-tools:~/ops/deployments# kubectl get deploy -n kubeflow kubeflow-reception
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
kubeflow-reception   1/1     1            1           1m
Ensure that the Profile Controller is up and running. Verify that field READY is 1/1:
root@rok-tools:~/ops/deployments# kubectl get deploy -n kubeflow profiles-deployment
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
profiles-deployment   1/1     1            1           1m
Verify that the cert-manager Pods are up and running. Check the Pod status and verify that field STATUS is Running and field READY is 1/1 for all Pods:
root@rok-tools:~/ops/deployments# kubectl -n cert-manager get pods
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-6d86476c77-bl9rs              1/1     Running   0          1m
cert-manager-cainjector-5b9cd446fd-n5jpd   1/1     Running   0          1m
cert-manager-webhook-64d967c45-cdfwh      1/1     Running   0          1m
Verify that the Rok Monitoring Stack is up and running:
root@rok-tools:~/ops/deployments# kubectl get pods -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
grafana-6d7d7b78f7-6flm7               2/2     Running   0          1m
kube-state-metrics-765c7c7f95-chkzn    4/4     Running   0          1m
node-exporter-zng26                    2/2     Running   0          1m
prometheus-k8s-0                       3/3     Running   1          1m
prometheus-operator-5f75d76f9f-fmpp5   3/3     Running   0          1m
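As a final optional sweep, you can look for any Pod that is not in the Running phase across the namespaces this upgrade touched. An empty result for each namespace means everything is healthy; note that this simple filter also surfaces Succeeded Pods, such as completed Jobs, so review the output before acting on it:
root@rok-tools:~/ops/deployments# for ns in rok rok-system kubeflow istio-system cert-manager monitoring kyverno; do \
>    kubectl get pods -n $ns --field-selector=status.phase!=Running; \
> done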