Upgrade Rok Registry etcd

EKF 2.0 uses etcd 3.5 for Rok Registry. This guide walks you through upgrading your Rok Registry etcd cluster from 3.3, which previous EKF versions used, to 3.5, with an intermediate upgrade to 3.4.

Note

Rok Registry 2.0 uses etcd v3.5.4, but etcd does not support upgrading directly from v3.3.27 to v3.5.4. This is why you need to first upgrade Rok etcd to v3.4.20.
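The one-minor-version-at-a-time rule in the note above can be sketched in shell. This is only an illustration and not part of the EKF tooling: `etcd_upgrade_path` is a hypothetical helper name, and the sketch assumes both arguments are 3.x releases and looks only at the minor number.

```shell
# Hypothetical helper: print the minor-version hops between two etcd 3.x
# releases. Only the MINOR component of each version is inspected.
etcd_upgrade_path() {
    from_minor=$(echo "$1" | cut -d. -f2)
    to_minor=$(echo "$2" | cut -d. -f2)
    minor=$from_minor
    path="3.$minor"
    # Add one hop per minor version until the target is reached.
    while [ "$minor" -lt "$to_minor" ]; do
        minor=$((minor + 1))
        path="$path -> 3.$minor"
    done
    echo "$path"
}

etcd_upgrade_path v3.3.27 v3.5.4   # prints: 3.3 -> 3.4 -> 3.5
```

For the versions in this guide the path has two hops, which is why the procedure deploys a 3.4 image before switching to the default 3.5 image.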

What You’ll Need

Check Your Environment

  1. Ensure that the Rok Registry etcd cluster is healthy. Verify that the HEALTH field is equal to true:

    root@rok-tools:~# kubectl exec -ti -n rok-registry sts/rok-registry-etcd -- \
    >     env ETCDCTL_API=3 etcdctl endpoint health -w table
    +----------------+--------+-----------+-------+
    |    ENDPOINT    | HEALTH |   TOOK    | ERROR |
    +----------------+--------+-----------+-------+
    | 127.0.0.1:2379 |   true | 1.95858ms |       |
    +----------------+--------+-----------+-------+
  2. Ensure that the Rok Registry etcd cluster is running an older version. Verify that the output of the following command is 3.3.0:

    root@rok-tools:~# kubectl exec -ti -n rok-registry svc/rok-registry -c rok-registry -- \
    >     curl rok-registry-etcd.rok-registry.svc.cluster.local:2379/version \
    >     | jq -r .etcdcluster
    3.3.0
  3. Ensure that the size of the v2 data set of your etcd cluster does not exceed 50MB.

    1. Create a snapshot of the database:

      root@rok-tools:~# kubectl exec -ti -n rok-registry sts/rok-registry-etcd -- \
      >     env ETCDCTL_API=2 etcdctl backup \
      >     --data-dir /var/etcd/data \
      >     --wal-dir /var/etcd/data/member/wal \
      >     --backup-dir backup \
      >     --backup-wal-dir backup-wal
    2. Inspect the size of the snapshot:

      root@rok-tools:~# kubectl exec -ti -n rok-registry sts/rok-registry-etcd -- \
      >     du -sh backup
      5M      backup

      Important

      If the cluster is serving a v2 data set larger than 50MB, each newly upgraded member may take up to two minutes to catch up with the existing cluster. Upgrading a cluster with a v2 data set larger than 100MB may take even longer. In such a case, please contact the Arrikto Tech Team before upgrading.

    3. Delete the backup:

      root@rok-tools:~# kubectl exec -ti -n rok-registry sts/rok-registry-etcd -- \
      >     rm -rfv backup backup-wal
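The size check in step 3 can be expressed as a small shell comparison. This is a minimal sketch, not part of the official procedure: `v2_size_status` is a hypothetical helper, and it assumes you already have the snapshot size as a whole number of megabytes (for example from `du -sm backup | cut -f1`, assuming the `du` in the etcd container supports `-m`).

```shell
# Hypothetical helper: compare a v2 data set size, given in whole megabytes,
# with the 50MB limit mentioned in this guide.
v2_size_status() {
    size_mb=$1
    limit_mb=50
    if [ "$size_mb" -le "$limit_mb" ]; then
        echo "within-limit"
    else
        echo "exceeds-limit"
    fi
}

v2_size_status 5      # the 5M snapshot from the example output: within-limit
v2_size_status 120    # exceeds-limit
```

A result of `exceeds-limit` corresponds to the case where the guide asks you to contact the Arrikto Tech Team before upgrading.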

Procedure

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Retrieve the ID of the existing etcd cluster member:

    root@rok-tools:~/ops/deployments# export ID=$(kubectl exec -ti -n rok-registry sts/rok-registry-etcd -- \
    >     env ETCDCTL_API=3 etcdctl member list \
    >     | grep rok-registry-etcd-0 \
    >     | cut -f1 -d,)
  3. Set the peer URL for the etcd member:

    root@rok-tools:~/ops/deployments# export \
    >     PEER_URL=http://rok-registry-etcd-0.rok-registry-etcd-cluster.rok-registry:2380

    Note

    EKF 2.0 renames the headless service that the etcd members use to communicate with each other. Therefore, you need to update the configuration of the existing member to reflect this change.

  4. Update the peer URL of the existing etcd member:

    root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok-registry sts/rok-registry-etcd -- \
    >     env ETCDCTL_API=3 etcdctl member update ${ID?} --peer-urls ${PEER_URL?}
    Member 69b60ba94f21e626 updated in cluster 844c2991de84c0b
  5. Use a custom image for etcd 3.4.

    1. Specify the image for etcd 3.4:

      root@rok-tools:~/ops/deployments# export IMAGE=gcr.io/arrikto/etcd:v3.4.20-bullseye-20220912
    2. Update the deploy overlay to use it:

      root@rok-tools:~/ops/deployments# pushd rok/rok-external-services/etcd/overlays/registry/deploy \
      >     && kustomize edit set image gcr.io/arrikto/etcd=${IMAGE?} \
      >     && popd

    Note

    Rok Registry 2.0 uses etcd v3.5.4. Since etcd does not support upgrading directly from v3.3.27 to v3.5.4, you need to first upgrade Rok Registry etcd to v3.4.20. This is a temporary change that will be reverted in a later step.

  6. Delete the old headless service:

    root@rok-tools:~/ops/deployments# kubectl delete service -n rok-registry rok-registry-etcd
    service "rok-registry-etcd" deleted

    Note

    This is needed so that the new non-headless service gets an IP allocated.

  7. Apply the latest Rok Registry etcd manifests:

    root@rok-tools:~/ops/deployments# rok-deploy \
    >     --apply rok/rok-external-services/etcd/overlays/registry/deploy \
    >     --force --force-kinds StatefulSet

    Note

    You need to recreate the StatefulSet because Rok Registry 2.0 renames the headless service and container ports, and as such needs to change some immutable fields. The underlying PVC will not be affected.

  8. Ensure that Rok Registry etcd has become ready. Verify that it consists of one member, field READY is equal to 2/2, and field STATUS is equal to Running:

    root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok-registry -l app=etcd
    Every 2.0s: kubectl get pods -n rok-registry -l app=etcd      rok-tools: Thu Jul 15 09:47:35 2021
    NAME                  READY   STATUS    RESTARTS   AGE
    rok-registry-etcd-0   2/2     Running   0          1m

    Note

    The Pod will become ready once the migration completes. This may take time and depends on the size of the v2 data set you have in the cluster. For example, for data sets larger than 50MB it may take up to two minutes.

  9. Ensure that the Rok Registry etcd cluster has been successfully upgraded to version 3.4. Verify that the output of the following command is 3.4.0:

    root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok-registry svc/rok-registry -c rok-registry -- \
    >     curl rok-registry-etcd.rok-registry.svc.cluster.local:2379/version \
    >     | jq -r .etcdcluster
    3.4.0
  10. Ensure that the Rok Registry etcd cluster is healthy. Verify that the HEALTH field is equal to true:

    root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok-registry sts/rok-registry-etcd -- \
    >     env ETCDCTL_API=3 etcdctl endpoint health -w table
    +----------------+--------+------------+-------+
    |    ENDPOINT    | HEALTH |    TOOK    | ERROR |
    +----------------+--------+------------+-------+
    | 127.0.0.1:2379 |   true | 1.982921ms |       |
    +----------------+--------+------------+-------+
  11. Revert the earlier change that set a custom image for etcd 3.4, so that you use the default image, which is for etcd 3.5:

    root@rok-tools:~/ops/deployments# git checkout -- \
    >     rok/rok-external-services/etcd/overlays/registry/deploy/kustomization.yaml
  12. Apply the kustomization:

    root@rok-tools:~/ops/deployments# rok-deploy \
    >     --apply rok/rok-external-services/etcd/overlays/registry/deploy
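Step 2 of the procedure extracts the member ID with `grep` and `cut`. The sketch below shows a stricter variant that matches the member name exactly and strips the carriage returns that `kubectl exec -t` can add to each line. It is an illustration only: `member_id_by_name` is a hypothetical helper, the sample line uses placeholder URLs, and it assumes the simple `etcdctl member list` output in which the ID is the first comma-separated field and the name is the third.

```shell
# Hypothetical helper: read `etcdctl member list` output (simple format) on
# stdin and print the ID of the member whose name matches $1 exactly.
member_id_by_name() {
    # Drop carriage returns from TTY output, then compare the NAME column.
    tr -d '\r' | awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Illustrative sample line; the peer and client URLs are placeholders.
sample='69b60ba94f21e626, started, rok-registry-etcd-0, http://peer.example:2380, http://client.example:2379'
printf '%s\r\n' "$sample" | member_id_by_name rok-registry-etcd-0
```

With the sample line this prints `69b60ba94f21e626`, the same value the procedure stores in `ID`. Matching on the whole name field avoids picking up another member whose name merely contains the search string.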

Verify

  1. Ensure that Rok Registry etcd has become ready. Verify that field READY is 2/2 and field STATUS is equal to Running:

    root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok-registry -l app=etcd
    Every 2.0s: kubectl get pods -n rok-registry -l app=etcd      rok-tools: Thu Jul 15 09:47:35 2021
    NAME                  READY   STATUS    RESTARTS   AGE
    rok-registry-etcd-0   2/2     Running   0          1m
  2. Ensure that the Rok Registry etcd cluster is healthy. Verify that the HEALTH field is equal to true:

    root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok-registry sts/rok-registry-etcd -- \
    >     etcdctl endpoint health -w table
    +----------------+--------+------------+-------+
    |    ENDPOINT    | HEALTH |    TOOK    | ERROR |
    +----------------+--------+------------+-------+
    | 127.0.0.1:2379 |   true | 1.982921ms |       |
    +----------------+--------+------------+-------+
  3. Ensure that the Rok Registry etcd cluster runs the latest version. Verify that the output of the following command is 3.5.0:

    root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok-registry svc/rok-registry -c rok-registry -- \
    >     curl rok-registry-etcd.rok-registry.svc.cluster.local:2379/version \
    >     | jq -r .etcdcluster
    3.5.0
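If you prefer to check the health table in a script rather than by eye, the HEALTH column can be parsed as sketched below. This is an assumption-laden illustration, not part of the official procedure: `endpoint_health_summary` is a hypothetical helper, and it assumes the table layout shown in this guide, with HEALTH as the second column.

```shell
# Hypothetical helper: read an `etcdctl endpoint health -w table` table on
# stdin and print "healthy" only if every endpoint row reports HEALTH true.
endpoint_health_summary() {
    tr -d '\r' | awk -F'|' '
        # Data rows contain column separators; skip borders and the header.
        NF >= 5 && $2 !~ /ENDPOINT/ {
            rows++
            health = $3
            gsub(/ /, "", health)
            if (health != "true") bad++
        }
        END {
            result = (rows > 0 && bad == 0) ? "healthy" : "unhealthy"
            print result
        }
    '
}

# The table from this guide, fed in as sample input.
endpoint_health_summary <<'EOF'
+----------------+--------+------------+-------+
|    ENDPOINT    | HEALTH |    TOOK    | ERROR |
+----------------+--------+------------+-------+
| 127.0.0.1:2379 |   true | 1.982921ms |       |
+----------------+--------+------------+-------+
EOF
```

For the sample table this prints `healthy`. Empty input is reported as `unhealthy`, so a failed `kubectl exec` is not mistaken for a healthy cluster.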

Summary

You have successfully upgraded the Rok Registry etcd cluster.

What’s Next

The next step is to upgrade the NVIDIA device plugin.