Upgrade and Scale Up Rok etcd

EKF 2.0.X uses etcd 3.5 and supports running multiple etcd replicas. This guide will walk you through:

  • upgrading your Rok etcd cluster from 3.3 to 3.5, by upgrading first to 3.4, and
  • scaling up your single-member cluster so that it has at least three replicas.

Note

Rok 2.0.X uses etcd v3.5.4, but etcd does not support upgrading directly from v3.3.27 to v3.5.4. This is why you need to first upgrade Rok etcd to v3.4.20.
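
The one-minor-version-at-a-time rule behind this two-step upgrade can be sketched as a small shell function. This is an illustrative sketch only: etcd_upgrade_path is a hypothetical helper that is not part of etcd or Rok, and it assumes both versions are given as MAJOR.MINOR within the same major release.

```shell
# Print the minor versions to step through when moving between two etcd
# releases, one minor version at a time.
# etcd_upgrade_path is a hypothetical helper, not part of etcd or Rok.
# Arguments must have the form MAJOR.MINOR (for example 3.3), same major.
etcd_upgrade_path() (
    major=${1%%.*}       # "3.3" -> "3"
    from_minor=${1#*.}   # "3.3" -> "3"
    to_minor=${2#*.}     # "3.5" -> "5"
    path=""
    minor=$((from_minor + 1))
    while [ "$minor" -le "$to_minor" ]; do
        # Append the next hop, separated by a space once path is non-empty.
        path="${path}${path:+ }${major}.${minor}"
        minor=$((minor + 1))
    done
    echo "$path"
)

etcd_upgrade_path 3.3 3.5   # prints: 3.4 3.5
```

For the versions in this guide the function prints 3.4 3.5, which matches the order of the upgrade steps in the Procedure section.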

What You’ll Need

Check Your Environment

  1. Check the Rok etcd version:

    root@rok-tools:~# kubectl exec -ti -n rok svc/rok -- \
    >     curl rok-etcd.rok.svc.cluster.local:2379/version \
    >     | jq -r .etcdcluster
    3.3.0
  2. Ensure that the Rok etcd cluster is healthy. Verify that the HEALTH field is equal to true:

    root@rok-tools:~# kubectl exec -ti -n rok sts/rok-etcd -- \
    >     env ETCDCTL_API=3 etcdctl endpoint health -w table
    +----------------+--------+-----------+-------+
    |    ENDPOINT    | HEALTH |   TOOK    | ERROR |
    +----------------+--------+-----------+-------+
    | 127.0.0.1:2379 |   true | 1.95858ms |       |
    +----------------+--------+-----------+-------+
  3. Ensure that the Rok etcd cluster has one member. Verify that the output of the following command is 1:

    root@rok-tools:~# kubectl exec -ti -n rok sts/rok-etcd -- \
    >     env ETCDCTL_API=3 etcdctl member list | wc -l
    1
  4. Ensure that the size of the v2 data set of your etcd cluster does not exceed 50MB.

    1. Create a snapshot of the database:

      root@rok-tools:~# kubectl exec -ti -n rok sts/rok-etcd -- \
      >     env ETCDCTL_API=2 etcdctl backup \
      >     --data-dir /var/etcd/data \
      >     --wal-dir /var/etcd/data/member/wal \
      >     --backup-dir backup \
      >     --backup-wal-dir backup-wal
    2. Inspect the size of the snapshot:

      root@rok-tools:~# kubectl exec -ti -n rok sts/rok-etcd -- \
      >     du -sh backup
      5M      backup

      Important

      If the cluster is serving a v2 data set larger than 50MB, each newly upgraded member may take up to two minutes to catch up with the existing cluster. Upgrading clusters with a v2 data set larger than 100MB may take even more time. In that case, please contact the Arrikto Tech Team before upgrading.

    3. Delete the backup:

      root@rok-tools:~# kubectl exec -ti -n rok sts/rok-etcd -- \
      >     rm -rfv backup backup-wal
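
The size check in step 4 can also be scripted. The following is a minimal sketch, assuming that du in the etcd container accepts the -k flag (sizes in 1024-byte units) and that 50MB is treated as 50 x 1024 KiB. The names v2_size_ok and V2_LIMIT_KB are hypothetical.

```shell
# Threshold from this guide: 50MB, expressed in KiB to match `du -sk`.
# Treating 50MB as 50 * 1024 KiB is an assumption of this sketch.
V2_LIMIT_KB=$((50 * 1024))

# Read `du -sk` output ("<kilobytes> <path>") on stdin and succeed only if
# the reported size does not exceed the limit.
# v2_size_ok is a hypothetical helper name.
v2_size_ok() (
    # Exit with status 2 when no size could be read at all.
    read -r size_kb _path || [ -n "$size_kb" ] || exit 2
    [ "$size_kb" -le "$V2_LIMIT_KB" ]
)

# Demo with a 5MB snapshot, as in the sample output of step 4.
printf '5120\tbackup\n' | v2_size_ok && echo "v2 data set is within the limit"
```

To use it against the cluster, pipe the output of du -sk backup, run in the rok-etcd Pod as in step 4, into v2_size_ok.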

Procedure

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Retrieve the ID of the existing etcd cluster member:

    root@rok-tools:~/ops/deployments# export ID=$(kubectl exec -ti -n rok sts/rok-etcd -- \
    >     env ETCDCTL_API=3 etcdctl member list \
    >     | grep rok-etcd-0 \
    >     | cut -f1 -d,)
  3. Set the peer URL for the etcd member:

    root@rok-tools:~/ops/deployments# export PEER_URL=http://rok-etcd-0.rok-etcd-cluster.rok:2380

    Note

    EKF 2.0.X renames the headless service that the etcd members use to communicate with each other. Therefore, you need to update the configuration of the existing member to reflect this change.

  4. Update the endpoint of the existing etcd member:

    root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok sts/rok-etcd -- \
    >     env ETCDCTL_API=3 etcdctl member update ${ID?} --peer-urls ${PEER_URL?}
    Member 69b60ba94f21e626 updated in cluster 844c2991de84c0b
  5. Use a custom image for etcd 3.4.

    1. Specify the image for etcd 3.4:

      root@rok-tools:~/ops/deployments# export \
      >     IMAGE=gcr.io/arrikto/etcd:v3.4.20-bullseye-20220912
    2. Update the deploy overlay to use it:

      root@rok-tools:~/ops/deployments# pushd rok/rok-external-services/etcd/overlays/deploy \
      >     && kustomize edit set image gcr.io/arrikto/etcd=${IMAGE?} \
      >     && popd

    Note

    Rok 2.0.X uses etcd v3.5.4. Since etcd does not support upgrading directly from v3.3.27 to v3.5.4, you need to first upgrade Rok etcd to v3.4.20. This is a temporary change that will be reverted in a later step.

  6. Delete the old headless service:

    root@rok-tools:~/ops/deployments# kubectl delete service -n rok rok-etcd
    service "rok-etcd" deleted

    Note

    This is needed so that the new non-headless service gets an IP allocated.

  7. Apply the latest Rok etcd manifests:

    root@rok-tools:~/ops/deployments# rok-deploy \
    >     --apply rok/rok-external-services/etcd/overlays/deploy \
    >     --force --force-kinds StatefulSet

    Note

    You need to recreate the StatefulSet because Rok 2.0.X renames the headless service and container ports, and as such needs to change some immutable fields. The underlying PVC will not be affected.

  8. Ensure that Rok etcd has become ready. Verify that it consists of one member, field READY is equal to 2/2, and field STATUS is equal to Running:

    root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok -l app=etcd
    Every 2.0s: kubectl get pods -n rok -l app=etcd     rok-tools: Thu Jul 15 09:47:35 2021

    NAME         READY   STATUS    RESTARTS   AGE
    rok-etcd-0   2/2     Running   0          1m

    Note

    The Pod will become ready once the migration completes. This may take time and depends on the size of the v2 data set you have in the cluster. For example, for data sets larger than 50MB it may take up to two minutes.

  9. Ensure that the Rok etcd cluster has been successfully upgraded to version 3.4. Verify that the output of the following command is 3.4.0:

    root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok svc/rok -- \
    >     curl rok-etcd.rok.svc.cluster.local:2379/version \
    >     | jq -r .etcdcluster
    3.4.0
  10. Ensure that the Rok etcd cluster is healthy. Verify that the HEALTH field is equal to true:

    root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok sts/rok-etcd -- \
    >     env ETCDCTL_API=3 etcdctl endpoint health -w table
    +----------------+--------+------------+-------+
    |    ENDPOINT    | HEALTH |    TOOK    | ERROR |
    +----------------+--------+------------+-------+
    | 127.0.0.1:2379 |   true | 1.982921ms |       |
    +----------------+--------+------------+-------+
  11. Revert the custom etcd 3.4 image change so that the deploy overlay uses the default image again, which is the one for etcd 3.5:

    root@rok-tools:~/ops/deployments# git checkout -- \
    >     rok/rok-external-services/etcd/overlays/deploy/kustomization.yaml
  12. Apply the kustomization:

    root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-external-services/etcd/overlays/deploy
  13. Ensure that Rok etcd has become ready. Verify that it consists of one member, READY field is equal to 2/2, and STATUS field is equal to Running:

    root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok -l app=etcd
    Every 2.0s: kubectl get pods -n rok -l app=etcd     rok-tools: Thu Jul 15 09:47:35 2021

    NAME         READY   STATUS    RESTARTS   AGE
    rok-etcd-0   2/2     Running   0          1m
  14. Ensure that the Rok etcd cluster has been successfully upgraded to version 3.5. Verify that the output of the following command is 3.5.0:

    root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok svc/rok -- \
    >     curl rok-etcd.rok.svc.cluster.local:2379/version \
    >     | jq -r .etcdcluster
    3.5.0
  15. Ensure that the Rok etcd cluster is healthy. Verify that the HEALTH field is equal to true:

    root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok sts/rok-etcd -- \
    >     etcdctl endpoint health -w table
    +----------------+--------+------------+-------+
    |    ENDPOINT    | HEALTH |    TOOK    | ERROR |
    +----------------+--------+------------+-------+
    | 127.0.0.1:2379 |   true | 1.982921ms |       |
    +----------------+--------+------------+-------+
  16. Follow the instructions in Scale Up Rok etcd at least twice to scale up Rok etcd to at least three members.
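
Steps 8, 9, 13, and 14 ask you to wait until the Pod is ready and the cluster reports the expected version. As an alternative to watching manually, a polling loop along these lines could be used. This is a sketch under assumptions: wait_for_output is a hypothetical helper, and the attempt count and delay are arbitrary values that you would tune to your cluster.

```shell
# Run a command repeatedly until its output equals the expected string.
# Usage: wait_for_output EXPECTED ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 on a match, 1 if all attempts are exhausted.
# wait_for_output is a hypothetical helper name.
wait_for_output() (
    expected=$1 attempts=$2 delay=$3
    shift 3
    i=1
    while [ "$i" -le "$attempts" ]; do
        actual=$("$@" 2>/dev/null) && [ "$actual" = "$expected" ] && exit 0
        # Sleep only if another attempt follows.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    exit 1
)

# Local demo: echo stands in for the version query.
wait_for_output 3.4.0 3 0 echo 3.4.0 && echo "cluster reports 3.4.0"

# Example against the cluster (not run here; curl -s and the absence of -ti
# are assumptions for non-interactive use):
#   wait_for_output 3.4.0 30 10 sh -c \
#     'kubectl exec -n rok svc/rok -- curl -s rok-etcd.rok.svc.cluster.local:2379/version | jq -r .etcdcluster'
```

The helper compares the whole output of the command, so the command should print only the value of interest, as the jq filter in steps 9 and 14 does.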

Verify

  1. Ensure that Rok etcd has become ready. Verify that field READY is 2/2 and field STATUS is Running for all the Pods:

    root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok -l app=etcd
    Every 2.0s: kubectl get pods -n rok -l app=etcd     rok-tools: Thu Jul 15 09:47:35 2021

    NAME         READY   STATUS    RESTARTS   AGE
    rok-etcd-0   2/2     Running   0          1m
    rok-etcd-1   2/2     Running   0          1m
    rok-etcd-2   2/2     Running   0          1m
  2. Retrieve the endpoints of all etcd cluster members:

    root@rok-tools:~/ops/deployments# export ETCD_ENDPOINTS=$(kubectl \
    >     exec -ti -n rok sts/rok-etcd -- etcdctl member list -w json \
    >     | jq -r '.members[].clientURLs[]' | paste -sd, -)
  3. Ensure that the etcd cluster is currently healthy. Inspect the etcd endpoints and verify that the HEALTH field is true for all endpoints:

    root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok sts/rok-etcd -c etcd -- \
    >     etcdctl --endpoints ${ETCD_ENDPOINTS?} endpoint health -w table
    +--------------------------------------+--------+------------+-------+
    |               ENDPOINT               | HEALTH |    TOOK    | ERROR |
    +--------------------------------------+--------+------------+-------+
    | rok-etcd-0.rok-etcd-cluster.rok:2379 |   true | 9.302141ms |       |
    | rok-etcd-1.rok-etcd-cluster.rok:2379 |   true | 9.325642ms |       |
    | rok-etcd-2.rok-etcd-cluster.rok:2379 |   true | 9.317423ms |       |
    +--------------------------------------+--------+------------+-------+
  4. Ensure that the Rok etcd cluster has at least three members. Verify that the output of the following command is 3 or greater, for example:

    root@rok-tools:~# kubectl exec -ti -n rok sts/rok-etcd -- \
    >     etcdctl member list | wc -l
    3

    Note

    We highly recommend you use an odd number of members and no more than seven members.

  5. Ensure that the Rok etcd cluster runs the latest version. Verify that the output of the following command is 3.5.0:

    root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok svc/rok -- \
    >     curl rok-etcd.rok.svc.cluster.local:2379/version \
    >     | jq -r .etcdcluster
    3.5.0
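
The member-count requirement from step 4, together with the recommendation in its note (an odd number of members, no more than seven), can be expressed as a simple check. The following is a minimal sketch, and member_count_ok is a hypothetical helper name.

```shell
# Succeed only if the member count is at least 3, at most 7, and odd.
# member_count_ok is a hypothetical helper name.
member_count_ok() (
    count=$1
    [ "$count" -ge 3 ] && [ "$count" -le 7 ] && [ $((count % 2)) -eq 1 ]
)

# Demo with the count shown in step 4.
member_count_ok 3 && echo "member count is acceptable"
```

An odd count matters because etcd needs a majority of members to agree: a three-member cluster tolerates one failed member, and adding a fourth member does not raise that tolerance.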

Summary

You have successfully upgraded and scaled up the Rok etcd cluster.

What’s Next

The next step is to upgrade the rest of the Rok components.