Back Up EKF cluster

In this guide you will use our rok-backup tool to snapshot all the EKF resources of an Arrikto EKF cluster, add them into Rok buckets, and publish the buckets to a Rok Registry. This way you will back up your EKF cluster so you can later restore and migrate it to a new destination cluster.

Overview

What You’ll Need
Procedure
Verify
Summary
What’s Next

What You’ll Need

An existing Arrikto EKF and Rok Registry deployment.
A Rok cluster registered to the Rok Registry.
A Rok cluster configured for syncing.
An issued token for a Rok Registry user.
An EKF user to act as the admin EKF user.
A privileged notebook server in the namespace of the admin EKF user.
The latest Arrikto wheels installed in the notebook.

Procedure

Connect to the privileged notebook server and open a new terminal.
Set the Rok Registry token:
1. Read a line from the standard input:
  
  jovyan@mynotebook-0:~$ read -s ROK_REGISTRY_TOKEN
2. Paste the Rok Registry token you issued by following the relevant guide.
3. Export the Rok Registry token:
  
  jovyan@mynotebook-0:~$ export ROK_REGISTRY_TOKEN
Note

You can also provide the Rok Registry token in a file:

jovyan@mynotebook-0:~$ export ROK_REGISTRY_TOKEN="file:<PATH_TO_FILE>"

Replace <PATH_TO_FILE> with the path of your Rok Registry token, for example:

jovyan@mynotebook-0:~$ export ROK_REGISTRY_TOKEN="file:/home/jovyan/registry.token"
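Because read -s does not echo what you paste, it can be useful to confirm that the variable really was set and exported before moving on. The following is a minimal sketch; check_token is a hypothetical helper name and is not part of rok-backup. It never prints the token itself.

```shell
# Hypothetical helper (not part of rok-backup): confirm that
# ROK_REGISTRY_TOKEN is exported and non-empty without printing the secret.
check_token() {
  # A child process only sees the variable if it has been exported.
  if sh -c 'test -n "${ROK_REGISTRY_TOKEN:-}"'; then
    echo "ROK_REGISTRY_TOKEN is exported and non-empty"
  else
    echo "ROK_REGISTRY_TOKEN is empty, unset, or not exported" >&2
    return 1
  fi
}
```

Running check_token right after the export step should print the confirmation line. The check only looks at whether the variable is non-empty, so it behaves the same for a pasted token and for the file: form.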
Set the Rok Registry URL:

jovyan@mynotebook-0:~$ export ROK_REGISTRY_URL=<URL>

Replace <URL> with the base URL of your Rok Registry installation. For example:

jovyan@mynotebook-0:~$ export ROK_REGISTRY_URL=https://arr-cluster.example.com/registry
Choose whether to start notebooks, depending on the EKF version of your source cluster:

EKF version < 1.4

If your Rok cluster version is older than 1.4, start the notebooks before snapshotting them:

jovyan@mynotebook-0:~$ export START_NOTEBOOKS=true
jovyan@mynotebook-0:~$ export STOP_NOTEBOOKS=true

EKF version >= 1.4

If your Rok cluster version is 1.4 or greater, do not start the notebooks before snapshotting them:

jovyan@mynotebook-0:~$ export START_NOTEBOOKS=false
jovyan@mynotebook-0:~$ export STOP_NOTEBOOKS=false
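If you prefer to derive the two flags from a version string instead of choosing by hand, the rule above can be sketched as follows. The helper names version_lt and start_notebooks_for are hypothetical, and the sketch assumes a sort that supports version ordering with -V, as GNU coreutils does.

```shell
# version_lt A B: succeed if version A sorts strictly before version B.
# Assumes "sort -V" (version sort) is available, as in GNU coreutils.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# start_notebooks_for VERSION: print "true" for versions older than 1.4,
# "false" otherwise, matching the two cases described in this step.
start_notebooks_for() {
  if version_lt "$1" 1.4; then
    echo true
  else
    echo false
  fi
}

# Example with an illustrative version number:
export START_NOTEBOOKS="$(start_notebooks_for 1.3)"
export STOP_NOTEBOOKS="$START_NOTEBOOKS"
echo "START_NOTEBOOKS=$START_NOTEBOOKS STOP_NOTEBOOKS=$STOP_NOTEBOOKS"
```

Version sort is used rather than a plain string comparison so that a version such as 1.10 is treated as newer than 1.4.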
Set an identifier for the bucket prefix that the rok-backup will use when creating the Rok and Rok Registry buckets:

jovyan@mynotebook-0:~$ export ROK_BUCKET_PREFIX=<MIGRATION_ID>
jovyan@mynotebook-0:~$ export ROK_REGISTRY_BUCKET_PREFIX=${ROK_BUCKET_PREFIX?}

Replace <MIGRATION_ID> with a custom unique name for the backup. For example, to include the date and a UID, run:

jovyan@mynotebook-0:~$ export ROK_BUCKET_PREFIX=$(python3 -c \
> 'import uuid, datetime; \
> print("cluster-migration-%s-%s" \
> % (datetime.date.today(), uuid.uuid4().hex[:5]))')
jovyan@mynotebook-0:~$ export ROK_REGISTRY_BUCKET_PREFIX=${ROK_BUCKET_PREFIX?}

Important

Use a unique name for the ROK_BUCKET_PREFIX and the ROK_REGISTRY_BUCKET_PREFIX. This prefix distinguishes this backup run from others. If you have already used the same identifier for ROK_BUCKET_PREFIX or ROK_REGISTRY_BUCKET_PREFIX in a previous backup run, you will overwrite the previous backup.
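If Python is not convenient, a prefix of the same shape (date plus five hexadecimal characters) can also be built with standard shell tools. This is only a sketch: make_prefix is a hypothetical helper name, and the random suffix comes from /dev/urandom rather than from a UUID.

```shell
# Hypothetical helper: print a prefix such as
# cluster-migration-2022-07-07-d3674 using only standard tools.
make_prefix() {
  # Read 3 random bytes, print them as hex, keep the first 5 characters.
  suffix="$(od -An -N3 -tx1 /dev/urandom | tr -d ' \n' | cut -c1-5)"
  printf 'cluster-migration-%s-%s\n' "$(date +%F)" "$suffix"
}

make_prefix
```

You could then export the result in the same way as above, for example with export ROK_BUCKET_PREFIX="$(make_prefix)". Note the value somewhere, since the Verify section asks for it again.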

Run the backup script to snapshot the EKF resources and publish them to the Rok Registry. Choose one of the following options, depending on whether you want the script to get its configuration options through environment variables or through a preseed file.

Environment Variables

Choose one of the following options depending on whether you want to run the script interactively or non-interactively.

Note

In a non-interactive run you will not be prompted for input, while in an interactive run you will. If you have not explicitly specified an answer in the case of a non-interactive run, rok-backup will assume the default answer. The log output is redirected to stdout.

Interactive

jovyan@mynotebook-0:~$ rok-backup

Troubleshooting

dialog.ExecutableNotFound

If the above command fails with an error message similar to the following:

dialog.ExecutableNotFound: Executable not found: can't find the executable for the dialog-like program

it means your notebook does not have the dialog package installed. You can install it with:

jovyan@mynotebook-0:~$ sudo apt install dialog

and retry the command.

Non-Interactive

jovyan@mynotebook-0:~$ rok-backup \
> --frontend non-interactive

Preseed File

Copy the backup-preseed.py.j2 Jinja2 template inside your privileged notebook:

backup-preseed.py.j2

# Copyright © 2022 Arrikto Inc.  All Rights Reserved.

"""EKF Migration Backup Preseed File."""

SEEDS = {
  # Resources to back up
  'question/resources': ['bucket',
                         'katib',
                         'mlmd',
                         'model',
                         'notebook',
                         'pipeline',
                         'profile',
                         'pvc'],
  # The token to connect to Rok
  # 'question/rok_token': <protected>,
  # The URL of the Rok cluster
  'question/rok_url': 'http://rok.rok.svc.cluster.local',
  # The token to connect to Rok Registry
  'question/rok_registry_token': '{{ROK_REGISTRY_TOKEN}}',
  # The URL of the Rok Registry cluster
  'question/rok_registry_url': '{{ROK_REGISTRY_URL}}',
  # The prefix for the local Rok buckets
  'question/rok_bucket_prefix': 'cluster-migration',
  # The prefix for the Registry buckets
  'question/rok_registry_bucket_prefix': '{{ROK_REGISTRY_BUCKET_PREFIX}}',
  # Namespaces to back up / exclude per resource
  'question/buckets/exclude_namespaces': [],
  'question/buckets/namespaces': ['ALL'],
  'question/katib/exclude_namespaces': [],
  'question/katib/namespaces': ['ALL'],
  'question/models/exclude_namespaces': [],
  'question/models/namespaces': ['ALL'],
  'question/notebooks/exclude_namespaces': [],
  'question/notebooks/namespaces': ['ALL'],
  'question/pvcs/exclude_namespaces': [],
  'question/pvcs/namespaces': ['ALL'],
  # Skip notebooks for which a snapshot exists
  'question/skip_existing_notebooks': False,
  'question/skip_existing_profiles': False,
  # Start stopped notebooks so that a snapshot can be taken
  'question/start_notebooks': '{{START_NOTEBOOKS}}',
  # Stop notebooks started by the script
  'question/stop_notebooks': '{{STOP_NOTEBOOKS}}'
}

Render the preseed file:

jovyan@mynotebook-0:~$ j2 backup-preseed.py.j2 \
> -o backup-preseed.py

Troubleshooting

bash: j2: command not found

If the above command fails with an error message similar to the following:

bash: j2: command not found

it means your notebook does not have the j2 command installed, which is provided by the j2cli Python package. You can install it with:

jovyan@mynotebook-0:~$ pip3 install j2cli

and retry the command.

Note

After rendering the preseed file, you can edit it to change the default value for any question and specify a custom answer.
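Before running the backup, you may want to confirm that the rendered file no longer contains any template placeholders, and to restrict its permissions, since it now holds the Rok Registry token in plain text. A minimal sketch follows; check_preseed is a hypothetical helper name, not something rok-backup provides.

```shell
# Hypothetical helper: fail if the rendered preseed file still contains
# Jinja2 placeholders, otherwise make it readable by the owner only.
check_preseed() {
  file="$1"
  # grep exits 0 when it finds a match, that is, when "{{" is still present.
  if grep -nF '{{' "$file"; then
    echo "unrendered placeholders remain in $file" >&2
    return 1
  fi
  chmod 600 "$file"
  echo "$file: no placeholders left, permissions set to 600"
}

# Usage on the notebook:
#   check_preseed backup-preseed.py
```

If the check fails, grep prints the offending lines with their line numbers so you can see which variable was not substituted.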
Unset all exported environment variables:

jovyan@mynotebook-0:~$ unset ROK_REGISTRY_TOKEN ROK_REGISTRY_URL \
> ROK_BUCKET_PREFIX ROK_REGISTRY_BUCKET_PREFIX START_NOTEBOOKS \
> STOP_NOTEBOOKS
Run the backup script. Choose one of the following options depending on whether you want to run the script interactively or non-interactively.

Note

In a non-interactive run you will not be prompted for input, while in an interactive run you will. If you have not explicitly specified an answer in the case of a non-interactive run, rok-backup will assume the default answer. The log output is redirected to stdout.

Interactive

jovyan@mynotebook-0:~$ rok-backup \
> --preseed-load backup-preseed.py

Troubleshooting

dialog.ExecutableNotFound

If the above command fails with an error message similar to the following:

dialog.ExecutableNotFound: Executable not found: can't find the executable for the dialog-like program

it means your notebook does not have the dialog package installed. You can install it with:

jovyan@mynotebook-0:~$ sudo apt install dialog

and retry the command.

Non-Interactive

jovyan@mynotebook-0:~$ rok-backup \
> --frontend non-interactive \
> --preseed-load backup-preseed.py

Verify

Connect to the privileged notebook server and open a new terminal.
Export the Rok bucket prefix:

jovyan@mynotebook-0:~$ export ROK_BUCKET_PREFIX=<ROK_BUCKET_PREFIX>

Replace <ROK_BUCKET_PREFIX> with the name of the Rok bucket prefix you specified in step 5. For example:

jovyan@mynotebook-0:~$ export ROK_BUCKET_PREFIX="cluster-migration-2022-07-07-d3674"
Format the names of the migration buckets:

jovyan@mynotebook-0:~# export MLMD_BUCKET="${ROK_BUCKET_PREFIX?}-mlmd" \
> PIPELINES_BUCKET="${ROK_BUCKET_PREFIX?}-pipeline" \
> PROFILES_BUCKET="${ROK_BUCKET_PREFIX?}-profile" \
> NOTEBOOKS_BUCKET="${ROK_BUCKET_PREFIX?}-notebook" \
> MODELS_BUCKET="${ROK_BUCKET_PREFIX?}-model" \
> KATIB_BUCKET="${ROK_BUCKET_PREFIX?}-katib" \
> PVC_BUCKET="${ROK_BUCKET_PREFIX?}-pvc"
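Each migration bucket name is the prefix followed by a resource suffix. To double-check the names before querying Rok, you can print them with a small loop. This is a sketch only: bucket_names is a hypothetical helper, and the suffix list is copied from the export command above.

```shell
# Hypothetical helper: print the seven migration bucket names that the
# export command above derives from a given prefix.
bucket_names() {
  prefix="$1"
  for resource in mlmd pipeline profile notebook model katib pvc; do
    echo "${prefix}-${resource}"
  done
}

# Example with the prefix used earlier in this guide:
bucket_names "cluster-migration-2022-07-07-d3674"
```

The first line printed for the example prefix is cluster-migration-2022-07-07-d3674-mlmd, which should match the value of MLMD_BUCKET.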
Ensure that you have snapshotted and published the MLMD.
1. Make sure that the metadata-mysql exists in the MLMD migration bucket:
  
  jovyan@mynotebook-0:~# rok --account kubeflow -o json \
  > object-list ${MLMD_BUCKET?} \
  > | jq -r '.[].object_name'
  metadata-mysql
  
  Troubleshooting
  
  bash: jq: command not found
  
  If the above command fails with an error message similar to the following:
  
  bash: jq: command not found
  
  it means your notebook does not have the jq package installed. You can install it with:
  
  jovyan@mynotebook-0:~$ sudo apt install jq
  
  and retry the command.
2. Make sure that the MLMD bucket has been successfully published to the Rok Registry:
  
  jovyan@mynotebook-0:~# rok --account kubeflow -o json \
  > bucket-show ${MLMD_BUCKET?} \
  > | jq -r '.throw_type'
  published
Ensure that you have snapshotted and published all pipelines.
1. Make sure that the minio and mysql PVCs exist in the pipelines migration bucket:
  
  jovyan@mynotebook-0:~# rok --account kubeflow -o json \
  > object-list ${PIPELINES_BUCKET?} \
  > | jq -r '.[].object_name'
  minio-pv-claim
  mysql-pv-claim
2. Make sure that the pipelines bucket has been successfully published to the Rok Registry:
  
  jovyan@mynotebook-0:~# rok --account kubeflow -o json \
  > bucket-show ${PIPELINES_BUCKET?} \
  > | jq -r '.throw_type'
  published
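The publication checks in this section rely on reading the word published in the output. If you would rather get an explicit OK or FAIL, as the diff-based checks do, you can wrap the comparison in a small function. This is a sketch; expect_eq is a hypothetical helper and not part of the rok CLI.

```shell
# Hypothetical helper: compare an expected value with an actual one and
# report the result in the same OK/FAIL style as the diff-based checks.
expect_eq() {
  expected="$1"
  actual="$2"
  if [ "$actual" = "$expected" ]; then
    echo "OK"
  else
    echo "FAIL: expected '$expected', got '$actual'"
    return 1
  fi
}

# Usage on the cluster, reusing the command from the step above:
#   expect_eq published "$(rok --account kubeflow -o json \
#     bucket-show ${PIPELINES_BUCKET?} | jq -r '.throw_type')"
```

The non-zero return status on a mismatch makes the check usable in scripts as well as at the prompt.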
Ensure that you have snapshotted and published the Kubeflow profiles.
1. List all the Kubeflow profiles of the cluster and make sure that they also exist in the profiles migration bucket:
  
  jovyan@mynotebook-0:~# diff <(kubectl get profiles -n kubeflow -o json \
  > | jq -r '.items[].metadata.name' | sort) \
  > <(rok --account kubeflow -o json object-list ${PROFILES_BUCKET?} \
  > | jq -r '.[].object_name' | sort) \
  > && echo "OK" || echo "FAIL"
  OK
2. Make sure that the profiles bucket has been successfully published to the Rok Registry:
  
  jovyan@mynotebook-0:~# rok --account kubeflow -o json \
  > bucket-show ${PROFILES_BUCKET?} \
  > | jq -r '.throw_type'
  published
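The diff-based checks report FAIL for any difference between the two lists, including objects that exist only in the bucket. When a check fails, it can help to list just the names that are present in the cluster but missing from the bucket. The sketch below does this with comm; missing_from is a hypothetical helper name.

```shell
# Hypothetical helper: print the names that appear in the first list but
# not in the second. Both arguments are newline-separated lists.
missing_from() {
  tmp_a="$(mktemp)"
  tmp_b="$(mktemp)"
  # comm requires sorted input; sed drops empty lines from empty lists.
  printf '%s\n' "$1" | sed '/^$/d' | sort -u > "$tmp_a"
  printf '%s\n' "$2" | sed '/^$/d' | sort -u > "$tmp_b"
  # -23 suppresses lines unique to the second file and lines common to both.
  comm -23 "$tmp_a" "$tmp_b"
  rm -f "$tmp_a" "$tmp_b"
}

# Usage on the cluster, reusing the two commands from the profiles check:
#   missing_from \
#     "$(kubectl get profiles -n kubeflow -o json | jq -r '.items[].metadata.name')" \
#     "$(rok --account kubeflow -o json object-list ${PROFILES_BUCKET?} | jq -r '.[].object_name')"
```

An empty output means every name in the first list was found in the second.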
Choose a user namespace and verify that all EKF resources in that namespace have been snapshotted and published.
1. Export the user namespace:
  
  jovyan@mynotebook-0:~$ export NAMESPACE=<NAMESPACE>
  
  Replace <NAMESPACE> with the namespace of the user, for example:
  
  jovyan@mynotebook-0:~$ export NAMESPACE=kubeflow-user1
2. Ensure that you have snapshotted and published all notebooks.
  1. List all the notebooks of the source cluster and make sure that they also exist in the notebooks migration bucket:
    
    jovyan@mynotebook-0:~# diff <(kubectl get notebooks -n ${NAMESPACE} -o json \
    > | jq -r '.items[].metadata.name' | sort) \
    > <(rok --account ${NAMESPACE?} -o json object-list ${NOTEBOOKS_BUCKET?} \
    > | jq -r '.[].object_name' | sort) \
    > && echo "OK" || echo "FAIL"
    OK
  2. Make sure that the notebooks bucket has been successfully published to the Rok Registry:
    
    jovyan@mynotebook-0:~# rok --account ${NAMESPACE?} -o json \
    > bucket-show ${NOTEBOOKS_BUCKET?} \
    > | jq -r '.throw_type'
    published
3. Ensure that you have snapshotted and published all models.
  1. List all the Inference Services of the source cluster and make sure that they also exist in the models migration bucket:
    
    jovyan@mynotebook-0:~# MODELS=$(for object in \
    > $(rok --account ${NAMESPACE?} -o json object-list ${MODELS_BUCKET?} \
    > | jq -r '.[].object_name') ; do \
    > if [[ "$(rok --account ${NAMESPACE?} -o json object-meta-show ${MODELS_BUCKET?} $object \
    > | jq -r '.type')" == "CR" ]]
    > then echo $object ; fi ; done)
    
    jovyan@mynotebook-0:~# diff <(kubectl get inferenceservices -n ${NAMESPACE} -o json \
    > | jq -r '.items[].metadata.name' | sort) \
    > <(echo "${MODELS?}" | sort) \
    > && echo "OK" || echo "FAIL"
    OK
  2. Make sure that the models bucket has been successfully published to the Rok Registry:
    
    jovyan@mynotebook-0:~# rok --account ${NAMESPACE?} -o json \
    > bucket-show ${MODELS_BUCKET?} \
    > | jq -r '.throw_type'
    published
4. Ensure that you have snapshotted and published all Katib experiments.
  1. List all the Katib experiments of the source cluster and make sure that they also exist in the Katib migration bucket:
    
    jovyan@mynotebook-0:~# EXPERIMENTS=$(for object in \
    > $(rok --account ${NAMESPACE?} -o json object-list ${KATIB_BUCKET?} \
    > | jq -r '.[].object_name') ; do \
    > if [[ "$(rok --account ${NAMESPACE?} -o json object-meta-show ${KATIB_BUCKET?} $object \
    > | jq -r '.type')" == "experiment" ]]
    > then echo $object ; fi ; done)
    
    jovyan@mynotebook-0:~# diff <(kubectl get experiments -n ${NAMESPACE} -o json \
    > | jq -r '.items[].metadata.name' | sort | xargs -I {} echo "experiment-{}") \
    > <(echo "${EXPERIMENTS?}" | sort) \
    > && echo "OK" || echo "FAIL"
    OK
  2. Make sure that the Katib bucket has been successfully published to the Rok Registry:
    
    jovyan@mynotebook-0:~# rok --account ${NAMESPACE?} -o json \
    > bucket-show ${KATIB_BUCKET?} \
    > | jq -r '.throw_type'
    published
5. Ensure that you have snapshotted and published all PVCs backed by the Rok storage class.
  1. List all the PVCs of the source cluster backed by the Rok storage class. Make sure they also exist in the Rok PVCs migration bucket:
    
    jovyan@mynotebook-0:~# diff <(kubectl get pvc -n ${NAMESPACE} -o json \
    > | jq -r '.items[].metadata.name' | sort) \
    > <(rok --account ${NAMESPACE?} -o json object-list ${PVC_BUCKET?} \
    > | jq -r '.[].object_name' | sort) \
    > && echo "OK" || echo "FAIL"
    OK
  2. Make sure that the Rok PVCs bucket has been successfully published to the Rok Registry:
    
    jovyan@mynotebook-0:~# rok --account ${NAMESPACE?} -o json \
    > bucket-show ${PVC_BUCKET?} \
    > | jq -r '.throw_type'
    published
6. Navigate to the Rok UI and make sure that all the Rok buckets you chose to back up have been successfully published to the Rok Registry.
Go back to step 7, and repeat the steps for all the user namespaces for which you want to verify that notebooks, models, Katib experiments, PVCs, and buckets have been successfully snapshotted and published.
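Repeating the per-namespace checks by hand becomes tedious on clusters with many users. One way to organize the repetition is a loop that reads namespace names and runs a check for each of them. The sketch below is an outline under stated assumptions: check_namespaces is a hypothetical helper, and the check command you pass to it (verify_namespace in the usage comment) is something you would write yourself from the commands in step 7.

```shell
# Hypothetical helper: read namespace names from standard input, one per
# line, and run the given command with each namespace as its last argument.
check_namespaces() {
  failed=0
  while IFS= read -r ns; do
    # Skip empty lines.
    [ -n "$ns" ] || continue
    # Redirect stdin so the command cannot consume the namespace list.
    if "$@" "$ns" </dev/null; then
      echo "$ns: OK"
    else
      echo "$ns: FAIL"
      failed=1
    fi
  done
  return "$failed"
}

# Usage on the cluster, assuming each profile name is also the name of the
# user namespace and verify_namespace is a function you have defined:
#   kubectl get profiles -n kubeflow -o json \
#     | jq -r '.items[].metadata.name' \
#     | check_namespaces verify_namespace
```

The function returns a non-zero status if any namespace fails, so a wrapper script can stop or alert on the first incomplete backup.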

Summary

You have snapshotted all the EKF resources of the cluster, added them into Rok buckets, and published the buckets to a Rok Registry.

What’s Next

The next step is to subscribe to the buckets you just published, and present all EKF resources to the destination cluster.

Restore EKF cluster
