Back Up Rok etcd EBS Volume

Rok uses EBS volumes for persisting various metadata for your Rok volumes and snapshots. While EBS volumes are highly durable, there might be cases where EBS volumes get lost. For example:

  • An admin accidentally deletes an EBS volume.
  • A custodian job deletes detached/unused EBS volumes.
  • An admin formats the contents of an EBS volume.
  • Amazon EBS fails.

This guide will walk you through setting up an AWS lifecycle policy for the Rok etcd EBS volume so that you can go back in time and partially recover your data.

Warning

If the EBS volumes that Rok uses are lost, data on Rok volumes or Rok snapshots will be permanently lost and you will not be able to recover them.

Warning

You should never delete detached EBS volumes that belong to an active EKS cluster.

Procedure

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Create the default AWSDataLifecycleManagerDefaultRole IAM role that the Amazon Data Lifecycle Manager (DLM) service will use:

    root@rok-tools:~/ops/deployments# aws dlm create-default-role

    Note

    If the role already exists, this will be a no-op.

  3. Obtain the ARN of the IAM role:

    root@rok-tools:~/ops/deployments# export DLM_ROLE_ARN=$(aws iam get-role \
    >     --role-name AWSDataLifecycleManagerDefaultRole \
    >     --query Role.Arn --output text) && echo ${DLM_ROLE_ARN}
    arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole
  4. Get the PV name of the PVC that Rok etcd is using:

    root@rok-tools:~/ops/deployments# export PV_NAME=$(kubectl get pvc \
    >     -n rok data-rok-etcd-0 \
    >     -o jsonpath={.spec.volumeName}) && echo ${PV_NAME}
    pvc-0916cd35-bbc2-4456-be14-a9ec49b74fbd
  5. Set the number of snapshots that DLM will retain:

    root@rok-tools:~/ops/deployments# export COUNT=30

    Note

    Since the policy takes one snapshot per day, retaining 30 snapshots means the oldest retained snapshot will be about one month old.

  6. Render the policy details to use the underlying PV:

    root@rok-tools:~/ops/deployments# j2 rok/eks/lifecycle-policy-details.json.j2 -o rok/eks/rok-etcd-lifecycle-policy-details.json

    Alternatively, download the lifecycle-policy-details.json.j2 configuration file provided below and use it locally.

    lifecycle-policy-details.json.j2
    {
        "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
        "ResourceTypes": [
            "VOLUME"
        ],
        "TargetTags": [
            {
                "Key": "kubernetes.io/created-for/pv/name",
                "Value": "{{PV_NAME}}"
            }
        ],
        "Schedules": [
            {
                "Name": "daily backup",
                "CopyTags": true,
                "CreateRule": {
                    "Interval": 24,
                    "IntervalUnit": "HOURS",
                    "Times": [
                        "09:00"
                    ]
                },
                "RetainRule": {
                    "Count": {{COUNT}}
                }
            }
        ]
    }
  7. Stage your changes:

    root@rok-tools:~/ops/deployments# git add rok/eks/rok-etcd-lifecycle-policy-details.json
  8. Commit your changes:

    root@rok-tools:~/ops/deployments# git commit -am "Back Up Rok etcd EBS Volume"
  9. Set the name for your lifecycle policy:

    root@rok-tools:~/ops/deployments# export DLM_NAME="rok-${AWS_DEFAULT_REGION?}-${EKS_CLUSTER?}-etcd"
  10. Set the description for your lifecycle policy:

    root@rok-tools:~/ops/deployments# export DLM_DESCRIPTION="Backup EBS volume of Rok etcd running in EKS cluster ${EKS_CLUSTER?}"
  11. Create the policy:

    root@rok-tools:~/ops/deployments# aws dlm create-lifecycle-policy \
    >     --execution-role-arn ${DLM_ROLE_ARN?} \
    >     --description "${DLM_DESCRIPTION?}" \
    >     --state ENABLED \
    >     --policy-details file://rok/eks/rok-etcd-lifecycle-policy-details.json \
    >     --tags Name=${DLM_NAME?}
    {
        "PolicyId": "policy-0db673046cec41a45"
    }
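
Before creating the policy, it can be worth confirming that the file rendered in step 6 contains no leftover template placeholders. The following is a minimal sketch of such a check, not part of rok-tools: the helper name check_rendered_policy is hypothetical, and the demonstration runs against a throwaway file rather than your repository.

```shell
# Sketch: sanity-check a rendered policy file before passing it to the AWS CLI.
# check_rendered_policy is a hypothetical helper, not part of rok-tools.
check_rendered_policy() {
    file="$1"
    # The file must exist and be non-empty.
    if [ ! -s "${file}" ]; then
        echo "missing or empty: ${file}" >&2
        return 1
    fi
    # No Jinja2 placeholder such as {{PV_NAME}} should survive rendering.
    if grep -qF '{{' "${file}"; then
        echo "unrendered placeholder in ${file}" >&2
        return 1
    fi
    # The tag key that the policy targets must be present.
    grep -qF '"kubernetes.io/created-for/pv/name"' "${file}"
}

# Demonstration on a throwaway file that holds only a fragment of the policy.
demo=$(mktemp)
printf '%s\n' '{"TargetTags": [{"Key": "kubernetes.io/created-for/pv/name", "Value": "pvc-0916cd35-bbc2-4456-be14-a9ec49b74fbd"}]}' > "${demo}"
check_rendered_policy "${demo}" && echo "rendered policy looks sane"
```

In practice you would point the helper at rok/eks/rok-etcd-lifecycle-policy-details.json instead of the demonstration file.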

Important

This policy will create EBS snapshots with the following tags:

  • dlm:managed
  • aws:dlm:lifecycle-policy-id
  • aws:dlm:lifecycle-schedule-name
  • kubernetes.io/cluster/EKS_CLUSTER
  • kubernetes.io/created-for/pv/name
  • kubernetes.io/created-for/pvc/name
  • kubernetes.io/created-for/pvc/namespace

Do not delete these snapshots manually. DLM automatically deletes the oldest snapshots and retains the most recent ones, up to the count you configured.
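
The retention window follows from the snapshot count and the schedule interval. As a rough sketch, assuming the values used in this guide (COUNT=30 and a 24-hour interval), the arithmetic looks like this; INTERVAL_HOURS is an illustrative variable name, not one the procedure sets:

```shell
# Sketch: estimate how far back the retained snapshots reach.
# Assumptions: COUNT matches step 5, and INTERVAL_HOURS mirrors the
# "Interval": 24 / "IntervalUnit": "HOURS" values in the policy template.
COUNT=30
INTERVAL_HOURS=24

retention_hours=$(( COUNT * INTERVAL_HOURS ))
retention_days=$(( retention_hours / 24 ))
echo "Oldest retained snapshot is about ${retention_days} days old"
```

If you change either the count or the interval, the window scales accordingly, so a larger COUNT lets you go further back in time at the cost of storing more snapshots.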

Verify

  1. Ensure that there is an enabled snapshot policy that targets the desired PV:

    root@rok-tools:~/ops/deployments# aws dlm get-lifecycle-policies \
    >     --target-tags kubernetes.io/created-for/pv/name=${PV_NAME?} \
    >     --query Policies[].[PolicyId,Tags.Name,State] --output text
    policy-0ef0d7cb035f9b659    rok-us-west-1-arrikto-cluster-etcd    ENABLED
  2. Ensure that there is an EBS volume that the policy targets:

    root@rok-tools:~/ops/deployments# aws ec2 describe-volumes \
    >     --filter Name=tag:kubernetes.io/created-for/pv/name,Values=${PV_NAME?} \
    >     --query Volumes[].VolumeId --output text
    vol-0da181700a6c8db5e
  3. Ensure that the EBS volume found above is the one that Rok etcd uses:

    root@rok-tools:~/ops/deployments# kubectl get pvc \
    >     -n rok data-rok-etcd-0 -o jsonpath={.spec.volumeName} | \
    >     xargs -n1 kubectl get pv -o jsonpath={.spec.awsElasticBlockStore.volumeID} | \
    >     cut -d/ -f4
    vol-0da181700a6c8db5e
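
The last two checks amount to comparing two volume IDs. The sketch below shows that comparison on sample values only. It assumes the PV records the volume as aws://<availability-zone>/<volume-id>, which is the layout the cut -d/ -f4 in step 3 relies on. The helper names and the availability zone are hypothetical.

```shell
# Sketch: compare the volume ID reported by EC2 with the one recorded in the PV.
# Assumption: the PV stores the ID as aws://<availability-zone>/<volume-id>.

# Extract the bare volume ID (fourth "/"-separated field) from a PV volumeID string.
volume_id_from_pv() {
    printf '%s\n' "$1" | cut -d/ -f4
}

# Succeed only when the EC2 volume ID is non-empty and equals the ID in the PV.
volumes_match() {
    [ -n "$1" ] && [ "$1" = "$(volume_id_from_pv "$2")" ]
}

# Sample values for illustration; in practice, use the outputs of steps 2 and 3.
ec2_volume="vol-0da181700a6c8db5e"
pv_volume="aws://us-west-1a/vol-0da181700a6c8db5e"   # hypothetical availability zone

if volumes_match "${ec2_volume}" "${pv_volume}"; then
    echo "policy target matches the Rok etcd volume"
else
    echo "volume mismatch" >&2
fi
```

A mismatch would suggest that the policy targets a volume other than the one backing Rok etcd, in which case the PV name used in the policy is worth rechecking.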

Summary

You have successfully set up a lifecycle policy for the EBS volume that Rok etcd uses.

What’s Next

Check out the rest of the maintenance operations that you can perform on your cluster.