Upgrade and Scale Up Rok etcd

EKF 2.0.X uses etcd 3.5 and supports running multiple etcd replicas. This guide walks you through:

- upgrading your Rok etcd cluster from 3.3 to 3.5 by first upgrading it to 3.4, and
- scaling up your single-member cluster so that it has at least three replicas.

Note

Rok 2.0.X uses etcd v3.5.4, but etcd does not support upgrading directly from v3.3.27 to v3.5.4. This is why you need to first upgrade Rok etcd to v3.4.20.
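etcd supports upgrades of only one minor version at a time, so the path from 3.3 always passes through 3.4. The rule can be sketched as a small POSIX shell function. `next_etcd_hop` is a hypothetical helper for illustration, not part of Rok or etcd, and it only knows about the versions this guide covers.

```shell
# Hypothetical helper (POSIX sh): given the "etcdcluster" version reported by
# the /version endpoint, print the next minor version to upgrade to.
# etcd supports upgrading one minor version at a time: 3.3 -> 3.4 -> 3.5.
next_etcd_hop() {
  case "$1" in
    3.3.*) echo "3.4" ;;          # 3.3 cannot jump to 3.5; go through 3.4
    3.4.*) echo "3.5" ;;          # 3.4 can be upgraded to 3.5
    3.5.*) echo "none" ;;         # already at the version EKF 2.0.X uses
    *)     echo "unsupported"     # outside the scope of this guide
           return 1 ;;
  esac
}
```

For example, `next_etcd_hop 3.3.0` prints `3.4`, which is the first hop in this guide.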

What You’ll Need

- An upgraded management environment.
- Your clone of the Arrikto GitOps repository.
- Arrikto manifests for EKF version 2.0.2.

Check Your Environment

Check the Rok etcd version:

root@rok-tools:~# kubectl exec -ti -n rok svc/rok -- \
> curl rok-etcd.rok.svc.cluster.local:2379/version \
> | jq -r .etcdcluster
3.3.0
Fast Forward

If the Rok etcd version is 3.5, fast-forward as follows:
1. Proceed to the Verify section.
Ensure that the Rok etcd cluster is healthy. Verify that the HEALTH field is equal to true:

root@rok-tools:~# kubectl exec -ti -n rok sts/rok-etcd -- \
> env ETCDCTL_API=3 etcdctl endpoint health -w table
+----------------+--------+-----------+-------+
|    ENDPOINT    | HEALTH |   TOOK    | ERROR |
+----------------+--------+-----------+-------+
| 127.0.0.1:2379 |   true | 1.95858ms |       |
+----------------+--------+-----------+-------+
Ensure that the Rok etcd cluster has one member. Verify that the output of the following command is 1:

root@rok-tools:~# kubectl exec -ti -n rok sts/rok-etcd -- \
> env ETCDCTL_API=3 etcdctl member list | wc -l
1
Ensure that the size of the v2 data set of your etcd cluster does not exceed 50MB.
1. Create a backup of the etcd data directory:

  root@rok-tools:~# kubectl exec -ti -n rok sts/rok-etcd -- \
  > env ETCDCTL_API=2 etcdctl backup \
  > --data-dir /var/etcd/data \
  > --wal-dir /var/etcd/data/member/wal \
  > --backup-dir backup \
  > --backup-wal-dir backup-wal
2. Inspect the size of the backup:

  root@rok-tools:~# kubectl exec -ti -n rok sts/rok-etcd -- \
  > du -sh backup
  5M      backup
  
  Important
  
  If the cluster is serving a v2 data set larger than 50MB, each newly upgraded member may take up to two minutes to catch up with the existing cluster. Upgrading clusters with a v2 data set larger than 100MB may take even longer. In such a case, please contact the Arrikto Tech Team before upgrading.
3. Delete the backup:
  
  root@rok-tools:~# kubectl exec -ti -n rok sts/rok-etcd -- \
  > rm -rfv backup backup-wal
See also
- etcd 3.4 upgrade limitations
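If you prefer a scripted check, the following sketch turns the backup size into a decision against the 50MB and 100MB thresholds from the Important note. It assumes you measure the size in whole megabytes with `du -sm`, which rounds up, instead of `du -sh`. `classify_v2_size` and its output labels are hypothetical, and mapping the 100MB case to contacting Arrikto is one reading of the note.

```shell
# Hypothetical helper (POSIX sh): classify a v2 data set size, given in whole
# megabytes, against the 50MB and 100MB thresholds from the Important note.
classify_v2_size() {
  size_mb="$1"
  if [ "$size_mb" -gt 100 ]; then
    echo "contact-support"   # larger than 100MB: contact Arrikto before upgrading
  elif [ "$size_mb" -gt 50 ]; then
    echo "slow-catch-up"     # larger than 50MB: up to ~2 minutes per upgraded member
  else
    echo "ok"                # 50MB or less
  fi
}
```

For example, run `classify_v2_size "$(kubectl exec -n rok sts/rok-etcd -- du -sm backup | cut -f1)"` before you delete the backup.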

Procedure

Go to your GitOps repository, inside your rok-tools management environment:

root@rok-tools:~# cd ~/ops/deployments
Retrieve the ID of the existing etcd cluster member:

root@rok-tools:~/ops/deployments# export ID=$(kubectl exec -ti -n rok sts/rok-etcd -- \
> env ETCDCTL_API=3 etcdctl member list \
> | grep rok-etcd-0 \
> | cut -f1 -d,)
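The `grep rok-etcd-0 | cut -f1 -d,` pipeline matches the member name anywhere on the line. A slightly stricter sketch matches on the name column only. It assumes the default comma-separated output of `etcdctl member list` (ID, status, name, peer URLs, client URLs, and so on). `member_id_for` is a hypothetical helper name.

```shell
# Hypothetical helper (POSIX sh): read `etcdctl member list` output (default
# ", "-separated format) on stdin and print the ID of the member whose name
# column ($3) is exactly $1.
member_id_for() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}
```

For example: `export ID=$(kubectl exec -n rok sts/rok-etcd -- env ETCDCTL_API=3 etcdctl member list | member_id_for rok-etcd-0)`.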
Set the peer URL for the etcd member:

root@rok-tools:~/ops/deployments# export PEER_URL=http://rok-etcd-0.rok-etcd-cluster.rok:2380

Note

EKF 2.0.X renames the headless service that the etcd members use to communicate with each other. Therefore, you need to update the configuration of the existing member to reflect this change.
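The peer URL follows the Kubernetes DNS naming scheme for StatefulSet Pods, `<pod>.<headless-service>.<namespace>`, followed by the etcd peer port 2380. The minimal sketch below builds the URL for any replica index. It assumes the `rok-etcd-cluster` service and `rok` namespace used above, and `peer_url` is a hypothetical helper.

```shell
# Hypothetical helper (POSIX sh): build the peer URL of the etcd member that
# runs in StatefulSet Pod rok-etcd-<index>, using the per-Pod DNS name
# <pod>.<headless-service>.<namespace> and the etcd peer port 2380.
peer_url() {
  index="$1"
  service="${2:-rok-etcd-cluster}"   # headless service name assumed from this guide
  namespace="${3:-rok}"
  echo "http://rok-etcd-${index}.${service}.${namespace}:2380"
}
```

For example, `peer_url 0` prints the same value as the `PEER_URL` exported above.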
Update the endpoint of the existing etcd member:

root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok sts/rok-etcd -- \
> env ETCDCTL_API=3 etcdctl member update ${ID?} --peer-urls ${PEER_URL?}
Member 69b60ba94f21e626 updated in cluster 844c2991de84c0b
Use a custom image for etcd 3.4.
1. Specify the image for etcd 3.4:
  
  root@rok-tools:~/ops/deployments# export \
  > IMAGE=gcr.io/arrikto/etcd:v3.4.20-bullseye-20220912
  
  Air Gapped
  
  Use the mirrored etcd image from your internal registry. For example:
  
  root@rok-tools:~/ops/deployments# export \
  > IMAGE=${INTERNAL_REGISTRY?}/gcr.io/arrikto/etcd:v3.4.20-bullseye-20220912
2. Update the deploy overlay to use it:
  
  root@rok-tools:~/ops/deployments# pushd rok/rok-external-services/etcd/overlays/deploy \
  > && kustomize edit set image gcr.io/arrikto/etcd=${IMAGE?} \
  > && popd
Note

This is a temporary change. Rok 2.0.X uses etcd v3.5.4, but etcd does not support upgrading directly from v3.3.27 to v3.5.4, so you first need to upgrade Rok etcd to v3.4.20. You will revert this change in a later step.
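Before you apply the overlay, you may want to confirm that `IMAGE` points at a 3.4 tag, including when the reference is prefixed with an internal registry. The rough sketch below assumes tag-based image references and does not handle digests. `image_tag` and `is_etcd_34_image` are hypothetical helpers.

```shell
# Hypothetical helper (POSIX sh): print the tag of an image reference, that is,
# everything after the last ':'. Fails if there is no tag, or if the last ':'
# belongs to a registry port (e.g. registry:5000/repo). Digests are not handled.
image_tag() {
  tag="${1##*:}"
  if [ "$tag" = "$1" ] || [ "${tag#*/}" != "$tag" ]; then
    return 1
  fi
  echo "$tag"
}

# Succeed only if the image tag starts with v3.4 (the intermediate etcd hop).
is_etcd_34_image() {
  case "$(image_tag "$1")" in
    v3.4.*) return 0 ;;
    *)      return 1 ;;
  esac
}
```

For example: `is_etcd_34_image "${IMAGE?}" && echo "IMAGE uses etcd 3.4"`.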
Delete the old headless service:

root@rok-tools:~/ops/deployments# kubectl delete service -n rok rok-etcd
service "rok-etcd" deleted

Note

This is needed so that the new, non-headless service gets a cluster IP allocated.
Apply the latest Rok etcd manifests:

root@rok-tools:~/ops/deployments# rok-deploy \
> --apply rok/rok-external-services/etcd/overlays/deploy \
> --force --force-kinds StatefulSet

Note

You need to recreate the StatefulSet because Rok 2.0.X renames the headless service and the container ports, which requires changing immutable fields of the StatefulSet. The underlying PVC will not be affected.
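After you apply the manifests, you can check that the recreated `rok-etcd` Service is no longer headless. `kubectl get service -n rok rok-etcd -o jsonpath='{.spec.clusterIP}'` prints `None` for a headless Service and an IP address otherwise. The hypothetical `has_cluster_ip` function below encodes that rule.

```shell
# Hypothetical helper (POSIX sh): succeed if a Service's spec.clusterIP value
# belongs to a normal (non-headless) Service. Headless Services report "None".
has_cluster_ip() {
  case "$1" in
    ""|None) return 1 ;;   # missing value or headless Service
    *)       return 0 ;;   # an allocated cluster IP
  esac
}
```

For example: `has_cluster_ip "$(kubectl get service -n rok rok-etcd -o jsonpath='{.spec.clusterIP}')" && echo "rok-etcd has a cluster IP"`.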
Ensure that Rok etcd has become ready. Verify that it consists of one member, field READY is equal to 2/2, and field STATUS is equal to Running:

root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok -l app=etcd
Every 2.0s: kubectl get pods -n rok -l app=etcd  rok-tools: Thu Jul 15 09:47:35 2021

NAME         READY   STATUS    RESTARTS   AGE
rok-etcd-0   2/2     Running   0          1m

Note

The Pod will become ready once the migration completes. This may take time and depends on the size of the v2 data set you have in the cluster. For example, for data sets larger than 50MB it may take up to two minutes.
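The migration time varies, so a non-interactive alternative to `watch` can be useful in scripts. The sketch below is a generic, hypothetical `wait_for` retry helper. The built-in `kubectl wait --for=condition=Ready pod -l app=etcd -n rok --timeout=300s` is another option. The timeout and interval values are assumptions; adjust them to the size of your data set.

```shell
# Hypothetical helper (POSIX sh): re-run a command every <interval_s> seconds
# until it succeeds, giving up after roughly <timeout_s> seconds.
# Usage: wait_for <timeout_s> <interval_s> <command> [args...]
wait_for() {
  timeout_s="$1"
  interval_s="$2"
  shift 2
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      echo "timed out after ${elapsed}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
}
```

For example: `wait_for 300 5 sh -c 'kubectl get pods -n rok -l app=etcd --no-headers | grep -q "2/2 *Running"'`.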
Ensure that the Rok etcd cluster has been successfully upgraded to version 3.4. Verify that the output of the following command is 3.4.0:

root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok svc/rok -- \
> curl rok-etcd.rok.svc.cluster.local:2379/version \
> | jq -r .etcdcluster
3.4.0
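The `/version` endpoint returns a small JSON object with `etcdserver` and `etcdcluster` fields. If `jq` is not available, you can use a sketch like the one below, which extracts `etcdcluster` with `sed` and compares it to the expected value. `etcd_cluster_version` and `expect_cluster_version` are hypothetical helpers, and the `sed` expression assumes the compact, single-line JSON that the endpoint returns.

```shell
# Hypothetical helper (POSIX sh): extract the "etcdcluster" field from the
# /version JSON on stdin, e.g. {"etcdserver":"3.4.20","etcdcluster":"3.4.0"}.
etcd_cluster_version() {
  sed -n 's/.*"etcdcluster" *: *"\([^"]*\)".*/\1/p'
}

# Succeed if the cluster version read from stdin equals the expected one ($1).
expect_cluster_version() {
  actual="$(etcd_cluster_version)"
  if [ "$actual" != "$1" ]; then
    echo "expected etcdcluster $1, got '${actual}'" >&2
    return 1
  fi
}
```

For example: `kubectl exec -n rok svc/rok -- curl -s rok-etcd.rok.svc.cluster.local:2379/version | expect_cluster_version 3.4.0`.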
Ensure that the Rok etcd cluster is healthy. Verify that the HEALTH field is equal to true:

root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok sts/rok-etcd -- \
> env ETCDCTL_API=3 etcdctl endpoint health -w table
+----------------+--------+------------+-------+
|    ENDPOINT    | HEALTH |    TOOK    | ERROR |
+----------------+--------+------------+-------+
| 127.0.0.1:2379 |   true | 1.982921ms |       |
+----------------+--------+------------+-------+
Revert the previous change that set a custom image for etcd 3.4, so that you use the default etcd 3.5 image:

root@rok-tools:~/ops/deployments# git checkout -- \
> rok/rok-external-services/etcd/overlays/deploy/kustomization.yaml
Apply the kustomization:

root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-external-services/etcd/overlays/deploy
Ensure that Rok etcd has become ready. Verify that it consists of one member, READY field is equal to 2/2, and STATUS field is equal to Running:

root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok -l app=etcd
Every 2.0s: kubectl get pods -n rok -l app=etcd  rok-tools: Thu Jul 15 09:47:35 2021

NAME         READY   STATUS    RESTARTS   AGE
rok-etcd-0   2/2     Running   0          1m
Ensure that the Rok etcd cluster has been successfully upgraded to version 3.5. Verify that the output of the following command is 3.5.0:

root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok svc/rok -- \
> curl rok-etcd.rok.svc.cluster.local:2379/version \
> | jq -r .etcdcluster
3.5.0
Ensure that the Rok etcd cluster is healthy. Verify that the HEALTH field is equal to true:

root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok sts/rok-etcd -- \
> etcdctl endpoint health -w table
+----------------+--------+------------+-------+
|    ENDPOINT    | HEALTH |    TOOK    | ERROR |
+----------------+--------+------------+-------+
| 127.0.0.1:2379 |   true | 1.982921ms |       |
+----------------+--------+------------+-------+
Follow the instructions in Scale Up Rok etcd at least twice to scale up Rok etcd to at least three members.

Verify

Ensure that Rok etcd has become ready. Verify that field READY is 2/2 and field STATUS is Running for all the Pods:

root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok -l app=etcd
Every 2.0s: kubectl get pods -n rok -l app=etcd  rok-tools: Thu Jul 15 09:47:35 2021

NAME         READY   STATUS    RESTARTS   AGE
rok-etcd-0   2/2     Running   0          1m
rok-etcd-1   2/2     Running   0          1m
rok-etcd-2   2/2     Running   0          1m
Retrieve the endpoints of all etcd cluster members:

root@rok-tools:~/ops/deployments# export ETCD_ENDPOINTS=$(kubectl \
> exec -ti -n rok sts/rok-etcd -- etcdctl member list -w json \
> | jq -r '.members[].clientURLs[]' | paste -sd, -)
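The `paste -sd, -` step joins one URL per line into the comma-separated form that `--endpoints` expects. If you adapt the pipeline, for example to post-process raw `kubectl exec -t` output without `jq`, carriage returns from the TTY can end up inside the list. The hedged sketch below shows a more defensive join. `join_endpoints` is a hypothetical helper.

```shell
# Hypothetical helper (POSIX sh): join newline-separated endpoints from stdin
# into a single comma-separated list, dropping carriage returns (which a TTY
# allocated by `kubectl exec -t` can add) and empty lines.
join_endpoints() {
  tr -d '\r' | grep -v '^$' | paste -sd, -
}
```

For example: `export ETCD_ENDPOINTS=$(kubectl exec -n rok sts/rok-etcd -- etcdctl member list -w json | jq -r '.members[].clientURLs[]' | join_endpoints)`.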
Ensure that the etcd cluster is currently healthy. Inspect the etcd endpoints and verify that the HEALTH field is true for all endpoints:

root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok sts/rok-etcd -c etcd -- \
> etcdctl --endpoints ${ETCD_ENDPOINTS?} endpoint health -w table
+--------------------------------------+--------+------------+-------+
|               ENDPOINT               | HEALTH |    TOOK    | ERROR |
+--------------------------------------+--------+------------+-------+
| rok-etcd-0.rok-etcd-cluster.rok:2379 |   true | 9.302141ms |       |
| rok-etcd-1.rok-etcd-cluster.rok:2379 |   true | 9.325642ms |       |
| rok-etcd-2.rok-etcd-cluster.rok:2379 |   true | 9.317423ms |       |
+--------------------------------------+--------+------------+-------+
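To check the table programmatically, the following sketch prints every endpoint whose HEALTH column is not `true`, so an empty output means that all members are healthy. It assumes the `-w table` layout shown above, and `unhealthy_endpoints` is a hypothetical helper. For scripting, you may prefer to check the exit status of `etcdctl endpoint health` itself.

```shell
# Hypothetical helper (POSIX sh + awk): read `etcdctl endpoint health -w table`
# output on stdin and print the endpoints whose HEALTH column is not "true".
# Border lines start with '+' and are skipped; the header row is skipped too.
unhealthy_endpoints() {
  awk -F'|' '/^[|]/ {
    endpoint = $2; health = $3
    gsub(/[ \t\r]/, "", endpoint)
    gsub(/[ \t\r]/, "", health)
    if (health != "HEALTH" && health != "true") print endpoint
  }'
}
```

For example, pipe the output of the previous command into `unhealthy_endpoints` and expect no output.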
Ensure that the Rok etcd cluster has at least 3 members. Verify that the output of the following command is at least 3 (for example, 3):

root@rok-tools:~# kubectl exec -ti -n rok sts/rok-etcd -- \
> etcdctl member list | wc -l
3

Note

We highly recommend that you use an odd number of members and no more than seven.
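This recommendation follows from etcd's Raft quorum. A cluster of n members needs floor(n/2)+1 of them to agree, so it tolerates n minus that many failed members. The quick shell sketch below shows the arithmetic (the function names are illustrative). It also shows that 4 members tolerate the same single failure as 3.

```shell
# Quorum arithmetic for a Raft-based cluster such as etcd (POSIX sh).
# quorum: number of members that must agree, floor(n/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance: number of members that can fail while keeping quorum.
fault_tolerance() {
  echo $(( $1 - $(quorum "$1") ))
}
```

For example, `fault_tolerance 5` prints `2`, while both `fault_tolerance 3` and `fault_tolerance 4` print `1`.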
Ensure that the Rok etcd cluster runs the latest version. Verify that the output of the following command is 3.5.0:

root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok svc/rok -- \
> curl rok-etcd.rok.svc.cluster.local:2379/version \
> | jq -r .etcdcluster
3.5.0

Summary

You have successfully upgraded the Rok etcd cluster.

What’s Next

The next step is to upgrade the rest of the Rok components.

Upgrade Rest Rok Components