Back Up EKF Cluster

In this guide, you will use our rok-backup tool to snapshot all the EKF resources of an Arrikto EKF cluster, add the snapshots to Rok buckets, and publish the buckets to a Rok Registry. This backs up your EKF cluster so that you can later restore it and migrate it to a new destination cluster.

What You’ll Need

Procedure

  1. Connect to the privileged notebook server and open a new terminal.

  2. Set the Rok Registry token:

    1. Read a line from the standard input:

      jovyan@mynotebook-0:~$ read -s ROK_REGISTRY_TOKEN
    2. Paste the Rok Registry token that you issued by following the relevant guide, then press Enter. The token is not echoed to the terminal.

    3. Export the Rok Registry token:

      jovyan@mynotebook-0:~$ export ROK_REGISTRY_TOKEN

    Note

    You can also provide the Rok Registry token in a file:

    jovyan@mynotebook-0:~$ export ROK_REGISTRY_TOKEN="file:<PATH_TO_FILE>"

    Replace <PATH_TO_FILE> with the path to the file that contains your Rok Registry token, for example:

    jovyan@mynotebook-0:~$ export ROK_REGISTRY_TOKEN="file:/home/jovyan/registry.token"
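    To confirm that the token is set without printing it, you can use a small helper such as the following minimal sketch. The function name token_source is hypothetical and not part of rok-backup; it only inspects ROK_REGISTRY_TOKEN and, for the file: form shown in the note above, checks that the token file is readable.

    ```bash
    # Hypothetical helper (not part of rok-backup): report how ROK_REGISTRY_TOKEN
    # is provided without printing the secret itself.
    token_source() {
        local token="${ROK_REGISTRY_TOKEN:-}"
        if [[ -z "$token" ]]; then
            echo "unset"
            return 1
        fi
        if [[ "$token" == file:* ]]; then
            # Strip the "file:" prefix to get the path of the token file.
            if [[ -r "${token#file:}" ]]; then
                echo "file"
            else
                echo "file-missing"
                return 1
            fi
        else
            echo "inline"
        fi
    }

    # Usage:
    #   token_source    # prints "inline", "file", "file-missing", or "unset"
    ```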
  3. Set the Rok Registry URL:

    jovyan@mynotebook-0:~$ export ROK_REGISTRY_URL=<URL>

    Replace <URL> with the base URL of your Rok Registry installation. For example:

    jovyan@mynotebook-0:~$ export ROK_REGISTRY_URL=https://arr-cluster.example.com/registry
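    Before you go on, you can sanity-check the value. The sketch below uses hypothetical helpers that are not part of the Rok tooling: the first only checks that ROK_REGISTRY_URL looks like an http(s) URL, and the second, which assumes curl is installed in the notebook, prints the HTTP status code the URL returns.

    ```bash
    # Hypothetical helper: succeed if ROK_REGISTRY_URL looks like an http(s) URL.
    registry_url_looks_valid() {
        local url="${ROK_REGISTRY_URL:-}"
        [[ "$url" =~ ^https?://[^/[:space:]]+(/[^[:space:]]*)?$ ]]
    }

    # Hypothetical helper: print the HTTP status code returned by the URL.
    # curl prints 000 if it cannot connect at all; any other code means the
    # host answered.
    registry_url_status() {
        curl -sS -o /dev/null -w '%{http_code}\n' --max-time 10 "${ROK_REGISTRY_URL:?}"
    }

    # Usage:
    #   registry_url_looks_valid && echo "URL format OK"
    #   registry_url_status
    ```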
  4. Choose whether to start notebooks, depending on the EKF version of your source cluster:

    If your Rok cluster version is older than 1.4, have rok-backup start any stopped notebooks before snapshotting them and stop them again afterwards:

    jovyan@mynotebook-0:~$ export START_NOTEBOOKS=true
    jovyan@mynotebook-0:~$ export STOP_NOTEBOOKS=true

    If your Rok cluster version is 1.4 or greater, do not start the notebooks before snapshotting them:

    jovyan@mynotebook-0:~$ export START_NOTEBOOKS=false
    jovyan@mynotebook-0:~$ export STOP_NOTEBOOKS=false
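    If you prefer to derive both variables from the version number instead of copying one pair of commands, the following sketch does the comparison with sort -V (GNU coreutils). The function names are hypothetical, and you still need to look up the version of your source cluster yourself and pass it as the argument.

    ```bash
    # Hypothetical helper: succeed if version $1 sorts strictly before version $2.
    version_lt() {
        [[ "$1" != "$2" ]] &&
            [[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" == "$1" ]]
    }

    # Export START_NOTEBOOKS and STOP_NOTEBOOKS for the given source version:
    # "true" for versions older than 1.4, "false" for 1.4 or greater.
    set_notebook_flags() {
        if version_lt "$1" "1.4"; then
            export START_NOTEBOOKS=true STOP_NOTEBOOKS=true
        else
            export START_NOTEBOOKS=false STOP_NOTEBOOKS=false
        fi
    }

    # Usage:
    #   set_notebook_flags 1.3
    ```

    Using sort -V instead of a plain string comparison keeps multi-digit components in order, so 1.10 counts as newer than 1.4.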
  5. Set an identifier for the bucket prefix that rok-backup will use when creating the Rok and Rok Registry buckets:

    jovyan@mynotebook-0:~$ export ROK_BUCKET_PREFIX=<MIGRATION_ID>
    jovyan@mynotebook-0:~$ export ROK_REGISTRY_BUCKET_PREFIX=${ROK_BUCKET_PREFIX?}

    Replace <MIGRATION_ID> with a custom unique name for the backup. For example, to include the date and a UID, run:

    jovyan@mynotebook-0:~$ export ROK_BUCKET_PREFIX=$(python3 -c \
    > 'import uuid, datetime; \
    > print("cluster-migration-%s-%s" \
    > % (datetime.date.today(), uuid.uuid4().hex[:5]))')
    jovyan@mynotebook-0:~$ export ROK_REGISTRY_BUCKET_PREFIX=${ROK_BUCKET_PREFIX?}

    Important

    Use a unique name for ROK_BUCKET_PREFIX and ROK_REGISTRY_BUCKET_PREFIX. This prefix distinguishes this backup run from others. If you have already used the same identifier for ROK_BUCKET_PREFIX or ROK_REGISTRY_BUCKET_PREFIX in a previous backup run, you will overwrite the previous backup.
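    As an alternative to the Python one-liner above, a bash-only sketch that builds a prefix of the same shape (today's date plus five random hex characters) could look like this. make_migration_prefix is a hypothetical name; its random part comes from /dev/urandom rather than Python's uuid module, so only the format matches.

    ```bash
    # Hypothetical helper: build "cluster-migration-<YYYY-MM-DD>-<5 hex chars>"
    # using only coreutils.
    make_migration_prefix() {
        local suffix
        # Read random bytes, hex-encode them, and keep the first 5 characters.
        suffix=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n' | cut -c 1-5)
        printf 'cluster-migration-%s-%s\n' "$(date +%F)" "$suffix"
    }

    # Usage:
    #   export ROK_BUCKET_PREFIX=$(make_migration_prefix)
    #   export ROK_REGISTRY_BUCKET_PREFIX=${ROK_BUCKET_PREFIX?}
    ```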

  6. Run the backup script to snapshot the EKF resources and publish them to the Rok Registry. Choose one of the following options, depending on whether you want the script to get its configuration options through environment variables or through a preseed file.

    Environment variables

    Choose one of the following options, depending on whether you want to run the script interactively or non-interactively.

    Note

    In an interactive run, rok-backup prompts you for input. In a non-interactive run, it does not prompt you; for any question you have not answered explicitly, it assumes the default answer. The log output is redirected to stdout.

    Interactive

    jovyan@mynotebook-0:~$ rok-backup

    Troubleshooting

    dialog.ExecutableNotFound

    If the above command fails with an error message similar to the following:

    dialog.ExecutableNotFound: Executable not found: can't find the executable for the dialog-like program

    it means your notebook does not have the dialog package installed. You can install it with:

    jovyan@mynotebook-0:~$ sudo apt install dialog

    and retry the command.

    Non-interactive

    jovyan@mynotebook-0:~$ rok-backup \
    > --frontend non-interactive
    Preseed file

    1. Copy the backup-preseed.py.j2 Jinja2 template into your privileged notebook:

      backup-preseed.py.j2

      # Copyright © 2022 Arrikto Inc. All Rights Reserved.

      """EKF Migration Backup Preseed File."""

      SEEDS = {
          # Resources to back up
          'question/resources': ['bucket',
                                 'katib',
                                 'mlmd',
                                 'model',
                                 'notebook',
                                 'pipeline',
                                 'profile',
                                 'pvc'],
          # The token to connect to Rok
          # 'question/rok_token': <protected>,
          # The URL of the Rok cluster
          'question/rok_url': 'http://rok.rok.svc.cluster.local',
          # The token to connect to Rok Registry
          'question/rok_registry_token': '{{ROK_REGISTRY_TOKEN}}',
          # The URL of the Rok Registry cluster
          'question/rok_registry_url': '{{ROK_REGISTRY_URL}}',
          # The prefix for the local Rok buckets
          'question/rok_bucket_prefix': '{{ROK_BUCKET_PREFIX}}',
          # The prefix for the Registry buckets
          'question/rok_registry_bucket_prefix': '{{ROK_REGISTRY_BUCKET_PREFIX}}',
          # Namespaces to back up / exclude per resource
          'question/buckets/exclude_namespaces': [],
          'question/buckets/namespaces': ['ALL'],
          'question/katib/exclude_namespaces': [],
          'question/katib/namespaces': ['ALL'],
          'question/models/exclude_namespaces': [],
          'question/models/namespaces': ['ALL'],
          'question/notebooks/exclude_namespaces': [],
          'question/notebooks/namespaces': ['ALL'],
          'question/pvcs/exclude_namespaces': [],
          'question/pvcs/namespaces': ['ALL'],
          # Skip notebooks for which a snapshot exists
          'question/skip_existing_notebooks': False,
          'question/skip_existing_profiles': False,
          # Start stopped notebooks so that a snapshot can be taken
          'question/start_notebooks': '{{START_NOTEBOOKS}}',
          # Stop notebooks started by the script
          'question/stop_notebooks': '{{STOP_NOTEBOOKS}}'
      }
    2. Render the preseed file:

      jovyan@mynotebook-0:~$ j2 backup-preseed.py.j2 \
      > -o backup-preseed.py

      Troubleshooting

      bash: j2: command not found

      If the above command fails with an error message similar to the following:

      bash: j2: command not found

      it means your notebook does not have the j2 Python package installed. You can install it with:

      jovyan@mynotebook-0:~$ pip3 install j2

      and retry the command.

      Note

      After rendering the preseed file, you can edit it to change the default value for any question and specify a custom answer.
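      Because rok-backup reads the rendered file, it can help to check it for leftover template placeholders or empty answers before you continue. The sketch below is a hypothetical grep-based check: it flags any remaining {{ markers and any answer rendered as an empty string, which happens, for example, if one of the environment variables was empty when you rendered the template. The rendered file also contains your Rok Registry token, so consider restricting its permissions, for example with chmod 600 backup-preseed.py.

      ```bash
      # Hypothetical helper: fail if a rendered preseed file still contains
      # template placeholders or empty-string answers.
      check_preseed_rendered() {
          local file="$1" status=0
          if grep -nF '{{' "$file"; then
              echo "unrendered placeholders in ${file}" >&2
              status=1
          fi
          if grep -nF ": ''" "$file"; then
              echo "empty answers in ${file}" >&2
              status=1
          fi
          return "$status"
      }

      # Usage:
      #   check_preseed_rendered backup-preseed.py && echo "OK"
      ```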

    3. Unset all exported environment variables:

      jovyan@mynotebook-0:~$ unset ROK_REGISTRY_TOKEN ROK_REGISTRY_URL \
      > ROK_BUCKET_PREFIX ROK_REGISTRY_BUCKET_PREFIX START_NOTEBOOKS \
      > STOP_NOTEBOOKS
    4. Run the backup script. Choose one of the following options depending on whether you want to run the script interactively or non-interactively.

      Note

      In an interactive run, rok-backup prompts you for input. In a non-interactive run, it does not prompt you; for any question you have not answered explicitly, it assumes the default answer. The log output is redirected to stdout.

      Interactive

      jovyan@mynotebook-0:~$ rok-backup \
      > --preseed-load backup-preseed.py

      Troubleshooting

      dialog.ExecutableNotFound

      If the above command fails with an error message similar to the following:

      dialog.ExecutableNotFound: Executable not found: can't find the executable for the dialog-like program

      it means your notebook does not have the dialog package installed. You can install it with:

      jovyan@mynotebook-0:~$ sudo apt install dialog

      and retry the command.

      Non-interactive

      jovyan@mynotebook-0:~$ rok-backup \
      > --frontend non-interactive \
      > --preseed-load backup-preseed.py
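The Troubleshooting notes above mention tools that the notebook may lack: dialog for interactive runs, j2 for rendering the preseed file, and jq for the verification commands that follow. A minimal pre-flight check, written as a hypothetical helper, can report every missing tool at once instead of failing on the first one:

```bash
# Hypothetical helper: report which of the given commands are not on PATH.
missing_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: ${tool}"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   missing_tools rok-backup rok kubectl dialog j2 jq || echo "install the tools listed above"
```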

Verify

  1. Connect to the privileged notebook server and open a new terminal.

  2. Export the Rok bucket prefix:

    jovyan@mynotebook-0:~$ export ROK_BUCKET_PREFIX=<ROK_BUCKET_PREFIX>

    Replace <ROK_BUCKET_PREFIX> with the Rok bucket prefix you specified in step 5 of the procedure. For example:

    jovyan@mynotebook-0:~$ export ROK_BUCKET_PREFIX="cluster-migration-2022-07-07-d3674"
  3. Set the names of the migration buckets:

    jovyan@mynotebook-0:~$ export MLMD_BUCKET="${ROK_BUCKET_PREFIX?}-mlmd" \
    > PIPELINES_BUCKET="${ROK_BUCKET_PREFIX?}-pipeline" \
    > PROFILES_BUCKET="${ROK_BUCKET_PREFIX?}-profile" \
    > NOTEBOOKS_BUCKET="${ROK_BUCKET_PREFIX?}-notebook" \
    > MODELS_BUCKET="${ROK_BUCKET_PREFIX?}-model" \
    > KATIB_BUCKET="${ROK_BUCKET_PREFIX?}-katib" \
    > PVC_BUCKET="${ROK_BUCKET_PREFIX?}-pvc"
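    To double-check the names before you run the checks that follow, you can print every variable you just set. This sketch uses bash indirect expansion (${!name}); show_migration_buckets is a hypothetical name.

    ```bash
    # Hypothetical helper: print each migration bucket variable and its value,
    # failing if any of them is empty or unset.
    show_migration_buckets() {
        local name rc=0
        for name in MLMD_BUCKET PIPELINES_BUCKET PROFILES_BUCKET NOTEBOOKS_BUCKET \
                    MODELS_BUCKET KATIB_BUCKET PVC_BUCKET; do
            # ${!name} expands to the value of the variable whose name is in $name.
            if [[ -z "${!name:-}" ]]; then
                echo "${name} is not set" >&2
                rc=1
            else
                printf '%-16s %s\n' "$name" "${!name}"
            fi
        done
        return "$rc"
    }
    ```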
  4. Ensure that you have snapshotted and published the MLMD.

    1. Make sure that the metadata-mysql object exists in the MLMD migration bucket:

      jovyan@mynotebook-0:~$ rok --account kubeflow -o json \
      > object-list ${MLMD_BUCKET?} \
      > | jq -r '.[].object_name'
      metadata-mysql

      Troubleshooting

      bash: jq: command not found

      If the above command fails with an error message similar to the following:

      bash: jq: command not found

      it means your notebook does not have the jq package installed. You can install it with:

      jovyan@mynotebook-0:~$ sudo apt install jq

      and retry the command.

    2. Make sure that the MLMD bucket has been successfully published to the Rok Registry:

      jovyan@mynotebook-0:~$ rok --account kubeflow -o json \
      > bucket-show ${MLMD_BUCKET?} \
      > | jq -r '.throw_type'
      published
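    Instead of comparing the printed object names by eye, you can make the check return a pass or fail status. The following hypothetical helper is built on the same rok object-list and jq commands used above, and it also covers the pipelines objects in the next step, for example objects_present kubeflow "${PIPELINES_BUCKET?}" minio-pv-claim mysql-pv-claim.

    ```bash
    # Hypothetical helper: check that every named object exists in a Rok bucket.
    #   $1: Rok account, $2: bucket, remaining arguments: expected object names.
    objects_present() {
        local account="$1" bucket="$2" listing object rc=0
        shift 2
        listing=$(rok --account "$account" -o json object-list "$bucket" \
            | jq -r '.[].object_name')
        for object in "$@"; do
            # -x matches whole lines only; -F treats the name as a fixed string.
            if grep -qxF -- "$object" <<< "$listing"; then
                echo "OK   ${object}"
            else
                echo "FAIL ${object} not found in ${bucket}"
                rc=1
            fi
        done
        return "$rc"
    }

    # Usage:
    #   objects_present kubeflow "${MLMD_BUCKET?}" metadata-mysql
    ```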
  5. Ensure that you have snapshotted and published all pipelines.

    1. Make sure that the minio and mysql PVCs exist in the pipelines migration bucket:

      jovyan@mynotebook-0:~$ rok --account kubeflow -o json \
      > object-list ${PIPELINES_BUCKET?} \
      > | jq -r '.[].object_name'
      minio-pv-claim
      mysql-pv-claim
    2. Make sure that the pipelines bucket has been successfully published to the Rok Registry:

      jovyan@mynotebook-0:~$ rok --account kubeflow -o json \
      > bucket-show ${PIPELINES_BUCKET?} \
      > | jq -r '.throw_type'
      published
  6. Ensure that you have snapshotted and published the Kubeflow profiles.

    1. List all the Kubeflow profiles of the cluster and make sure that they also exist in the profiles migration bucket:

      jovyan@mynotebook-0:~$ diff <(kubectl get profiles -n kubeflow -o json \
      > | jq -r '.items[].metadata.name' | sort) \
      > <(rok --account kubeflow -o json object-list ${PROFILES_BUCKET?} \
      > | jq -r '.[].object_name' | sort) \
      > && echo "OK" || echo "FAIL"
      OK
    2. Make sure that the profiles bucket has been successfully published to the Rok Registry:

      jovyan@mynotebook-0:~$ rok --account kubeflow -o json \
      > bucket-show ${PROFILES_BUCKET?} \
      > | jq -r '.throw_type'
      published
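    The three throw_type checks for the kubeflow account can be collapsed into one loop. In this sketch, check_published is a hypothetical helper that runs the same rok bucket-show and jq commands for each bucket and returns a non-zero status if any bucket does not report published.

    ```bash
    # Hypothetical helper: verify that each bucket reports throw_type "published".
    #   $1: Rok account, remaining arguments: bucket names.
    check_published() {
        local account="$1" bucket state rc=0
        shift
        for bucket in "$@"; do
            state=$(rok --account "$account" -o json bucket-show "$bucket" \
                | jq -r '.throw_type')
            if [[ "$state" == "published" ]]; then
                echo "OK   ${bucket}"
            else
                echo "FAIL ${bucket} (throw_type: ${state:-unknown})"
                rc=1
            fi
        done
        return "$rc"
    }

    # Usage:
    #   check_published kubeflow "${MLMD_BUCKET?}" "${PIPELINES_BUCKET?}" "${PROFILES_BUCKET?}"
    ```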
  7. Choose a user namespace and verify that all EKF resources in that namespace have been snapshotted and published.

    1. Export the user namespace:

      jovyan@mynotebook-0:~$ export NAMESPACE=<NAMESPACE>

      Replace <NAMESPACE> with the namespace of the user, for example:

      jovyan@mynotebook-0:~$ export NAMESPACE=kubeflow-user1
    2. Ensure that you have snapshotted and published all notebooks.

      1. List all the notebooks of the source cluster and make sure that they also exist in the notebooks migration bucket:

        jovyan@mynotebook-0:~$ diff <(kubectl get notebooks -n ${NAMESPACE} -o json \
        > | jq -r '.items[].metadata.name' | sort) \
        > <(rok --account ${NAMESPACE?} -o json object-list ${NOTEBOOKS_BUCKET?} \
        > | jq -r '.[].object_name' | sort) \
        > && echo "OK" || echo "FAIL"
        OK
      2. Make sure that the notebooks bucket has been successfully published to the Rok Registry:

        jovyan@mynotebook-0:~$ rok --account ${NAMESPACE?} -o json \
        > bucket-show ${NOTEBOOKS_BUCKET?} \
        > | jq -r '.throw_type'
        published
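      When a diff check prints FAIL, the raw diff output can be hard to read. The sketch below is a hypothetical alternative that uses comm to list the names found in the cluster but missing from the bucket, and the reverse. It accepts any two name lists, such as the ones used in the notebooks, profiles, and PVC checks.

      ```bash
      # Hypothetical helper: compare two name lists (one name per line).
      #   $1: file with the names found in the cluster
      #   $2: file with the object names found in the bucket
      # Prints the names missing from either side and fails if the lists differ.
      compare_names() {
          local cluster bucket missing extra
          # Read each input once: process substitutions cannot be re-read.
          cluster=$(sort -u "$1")
          bucket=$(sort -u "$2")
          missing=$(comm -23 <(printf '%s\n' "$cluster") <(printf '%s\n' "$bucket"))
          extra=$(comm -13 <(printf '%s\n' "$cluster") <(printf '%s\n' "$bucket"))
          if [[ -n "$missing" ]]; then
              sed 's/^/missing from bucket: /' <<< "$missing"
          fi
          if [[ -n "$extra" ]]; then
              sed 's/^/only in bucket: /' <<< "$extra"
          fi
          [[ -z "$missing" && -z "$extra" ]]
      }

      # Usage:
      #   compare_names \
      #       <(kubectl get notebooks -n "${NAMESPACE?}" -o json | jq -r '.items[].metadata.name') \
      #       <(rok --account "${NAMESPACE?}" -o json object-list "${NOTEBOOKS_BUCKET?}" | jq -r '.[].object_name')
      ```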
    3. Ensure that you have snapshotted and published all models.

      1. List all the Inference Services of the source cluster and make sure that they also exist in the models migration bucket:

        jovyan@mynotebook-0:~$ MODELS=$(for object in \
        > $(rok --account ${NAMESPACE?} -o json object-list ${MODELS_BUCKET?} \
        > | jq -r '.[].object_name') ; do \
        > if [[ "$(rok --account ${NAMESPACE?} -o json object-meta-show ${MODELS_BUCKET?} $object \
        > | jq -r '.type')" == "CR" ]]
        > then echo $object ; fi ; done)
        jovyan@mynotebook-0:~$ diff <(kubectl get inferenceservices -n ${NAMESPACE} -o json \
        > | jq -r '.items[].metadata.name' | sort) \
        > <(echo "${MODELS?}" | sort) \
        > && echo "OK" || echo "FAIL"
        OK
      2. Make sure that the models bucket has been successfully published to the Rok Registry:

        jovyan@mynotebook-0:~$ rok --account ${NAMESPACE?} -o json \
        > bucket-show ${MODELS_BUCKET?} \
        > | jq -r '.throw_type'
        published
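      The MODELS loop above filters bucket objects by the type recorded in their metadata. Wrapped in a hypothetical helper, the same rok object-list and object-meta-show calls can serve both this check (type CR) and the Katib check that follows (type experiment):

      ```bash
      # Hypothetical helper: print the objects of a bucket whose metadata type
      # matches the given value.
      #   $1: Rok account, $2: bucket, $3: wanted type (for example CR).
      objects_of_type() {
          local account="$1" bucket="$2" wanted="$3" object type
          for object in $(rok --account "$account" -o json object-list "$bucket" \
                  | jq -r '.[].object_name'); do
              type=$(rok --account "$account" -o json object-meta-show "$bucket" "$object" \
                  | jq -r '.type')
              if [[ "$type" == "$wanted" ]]; then
                  echo "$object"
              fi
          done
      }

      # Usage:
      #   MODELS=$(objects_of_type "${NAMESPACE?}" "${MODELS_BUCKET?}" CR)
      #   EXPERIMENTS=$(objects_of_type "${NAMESPACE?}" "${KATIB_BUCKET?}" experiment)
      ```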
    4. Ensure that you have snapshotted and published all Katib experiments.

      1. List all the Katib experiments of the source cluster and make sure that they also exist in the Katib migration bucket:

        jovyan@mynotebook-0:~$ EXPERIMENTS=$(for object in \
        > $(rok --account ${NAMESPACE?} -o json object-list ${KATIB_BUCKET?} \
        > | jq -r '.[].object_name') ; do \
        > if [[ "$(rok --account ${NAMESPACE?} -o json object-meta-show ${KATIB_BUCKET?} $object \
        > | jq -r '.type')" == "experiment" ]]
        > then echo $object ; fi ; done)
        jovyan@mynotebook-0:~$ diff <(kubectl get experiments -n ${NAMESPACE} -o json \
        > | jq -r '.items[].metadata.name' | sort | xargs -I {} echo "experiment-{}") \
        > <(echo "${EXPERIMENTS?}" | sort) \
        > && echo "OK" || echo "FAIL"
        OK
      2. Make sure that the Katib bucket has been successfully published to the Rok Registry:

        jovyan@mynotebook-0:~$ rok --account ${NAMESPACE?} -o json \
        > bucket-show ${KATIB_BUCKET?} \
        > | jq -r '.throw_type'
        published
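      The Katib check maps each experiment name to an experiment-<name> object name with xargs, which starts one echo process per experiment. A sketch of the same mapping with a single sed process, assuming (as the check above does) that the bucket objects use the experiment- prefix:

      ```bash
      # Hypothetical helper: read experiment names on stdin and print, sorted,
      # the object names the Katib check expects in the bucket.
      expected_katib_objects() {
          sed 's/^/experiment-/' | sort
      }

      # Usage:
      #   kubectl get experiments -n "${NAMESPACE?}" -o json \
      #       | jq -r '.items[].metadata.name' | expected_katib_objects
      ```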
    5. Ensure that you have snapshotted and published all PVCs backed by the Rok storage class.

      1. List all the PVCs of the source cluster backed by the Rok storage class. Make sure they also exist in the Rok PVCs migration bucket:

        jovyan@mynotebook-0:~$ diff <(kubectl get pvc -n ${NAMESPACE} -o json \
        > | jq -r '.items[].metadata.name' | sort) \
        > <(rok --account ${NAMESPACE?} -o json object-list ${PVC_BUCKET?} \
        > | jq -r '.[].object_name' | sort) \
        > && echo "OK" || echo "FAIL"
        OK
      2. Make sure that the Rok PVCs bucket has been successfully published to the Rok Registry:

        jovyan@mynotebook-0:~$ rok --account ${NAMESPACE?} -o json \
        > bucket-show ${PVC_BUCKET?} \
        > | jq -r '.throw_type'
        published
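      The kubectl command in the PVC check lists every PVC in the namespace, so PVCs that use another storage class also show up as differences. To restrict the cluster-side list, you could filter on spec.storageClassName. In the sketch below, the helper name rok_pvcs and the default storage class name rok are assumptions: check the real name with kubectl get storageclass and pass it as the second argument.

      ```bash
      # Hypothetical helper: print, sorted, the names of the PVCs in a namespace
      # whose storage class matches the given name.
      # ASSUMPTION: the Rok storage class is called "rok" unless you pass $2.
      rok_pvcs() {
          local namespace="$1" storage_class="${2:-rok}"
          kubectl get pvc -n "$namespace" --no-headers \
              -o custom-columns=NAME:.metadata.name,CLASS:.spec.storageClassName \
              | awk -v sc="$storage_class" '$2 == sc { print $1 }' \
              | sort
      }

      # Usage:
      #   diff <(rok_pvcs "${NAMESPACE?}") \
      #       <(rok --account "${NAMESPACE?}" -o json object-list "${PVC_BUCKET?}" \
      #         | jq -r '.[].object_name' | sort) \
      #       && echo "OK" || echo "FAIL"
      ```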
    6. Navigate to the Rok UI and make sure that all the Rok buckets you chose to back up have been successfully published to the Rok Registry.

  8. Repeat step 7 for every user namespace for which you want to verify that notebooks, models, Katib experiments, PVCs, and buckets have been successfully snapshotted and published.
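To repeat the checks for every user namespace without editing NAMESPACE by hand, you can loop over the Kubeflow profiles, since each profile owns a namespace with the same name. The sketch below is hypothetical: for_each_profile_namespace runs any per-namespace check function you pass it, and namespace_buckets_published is one example check that reuses the bucket variables from step 3 of the verification. If rok-backup does not create a bucket for a resource type that a namespace does not use, expect FAIL lines for those buckets.

```bash
# Hypothetical helper: run a per-namespace check function for every Kubeflow
# profile, assuming each profile's namespace has the same name as the profile.
for_each_profile_namespace() {
    local check="$1" ns rc=0
    for ns in $(kubectl get profiles -o jsonpath='{.items[*].metadata.name}'); do
        echo "== ${ns}"
        if ! "$check" "$ns"; then
            rc=1
        fi
    done
    return "$rc"
}

# Example check: every per-namespace migration bucket reports "published".
namespace_buckets_published() {
    local ns="$1" bucket state rc=0
    for bucket in "${NOTEBOOKS_BUCKET?}" "${MODELS_BUCKET?}" \
                  "${KATIB_BUCKET?}" "${PVC_BUCKET?}"; do
        state=$(rok --account "$ns" -o json bucket-show "$bucket" \
            | jq -r '.throw_type')
        if [[ "$state" == "published" ]]; then
            echo "OK   ${ns}/${bucket}"
        else
            echo "FAIL ${ns}/${bucket} (throw_type: ${state:-unknown})"
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   for_each_profile_namespace namespace_buckets_published
```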

Summary

You have snapshotted all the EKF resources of the cluster, added them into Rok buckets, and published the buckets to a Rok Registry.

What’s Next

The next step is to subscribe to the buckets you just published, and present all EKF resources to the destination cluster.