Restore EKF cluster¶
In this guide you will use our rok-restore
tool to subscribe to Rok Registry
buckets, download the snapshots of all EKF resources of a source Arrikto EKF
cluster, and present them to the destination Arrikto EKF cluster. This way you
will complete the migration of an Arrikto EKF cluster.
What You’ll Need¶
- An existing Arrikto EKF deployment to use as the destination cluster and a Rok Registry deployment.
- A Rok cluster registered to the Rok Registry.
- A Rok cluster configured for syncing.
- An issued token for a Rok Registry user.
- An EKF user to act as the admin EKF user.
- A privileged notebook server in the namespace of the admin EKF user.
- The latest Arrikto wheels installed in the notebook.
- A backup for the source Arrikto EKF cluster.
Check Your Environment¶
Get the version of the Katib mysql database in the source cluster:

    root@rok-tools-src:~# kubectl exec -n kubeflow svc/katib-mysql -c katib-mysql \
    >     -- mysql --version \
    >     | tr -s ' ' \
    >     | cut -d ' ' -f 3
    8.0.23

Get the version of the Katib mysql database in the destination cluster:

    root@rok-tools-dst:~# kubectl exec -n kubeflow svc/katib-mysql -c katib-mysql \
    >     -- mysql --version \
    >     | tr -s ' ' \
    >     | cut -d ' ' -f 3
    8.0.26

Ensure that the Katib mysql version in the destination cluster is equal to or greater than the version in the source cluster. If it is not, update the Katib mysql in the destination cluster to the mysql version of the source cluster:

Export the mysql version of the source cluster:

    root@rok-tools-dst:~# export MYSQL_VERSION_SOURCE=<MYSQL_VERSION_SOURCE>

Replace <MYSQL_VERSION_SOURCE> with the mysql version that you found in the first step. For example:

    root@rok-tools-dst:~# export MYSQL_VERSION_SOURCE=8.0.23

Update the Katib mysql in the destination cluster:

    root@rok-tools-dst:~# kubectl patch -n kubeflow deploy katib-mysql \
    >     -p "{\"spec\": {\"template\": {\"spec\":{\"containers\":[{\"name\":\"katib-mysql\",\"image\":\"mysql:${MYSQL_VERSION_SOURCE?}\"}]}}}}"
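The version check above can be sketched in Python. This is a minimal illustration, not part of the Arrikto tooling: the function names are hypothetical, and it assumes the version strings are plain dotted numbers such as the `8.0.23` and `8.0.26` printed by the commands in this section. It compares versions numerically (so `8.0.9` sorts before `8.0.23`) and builds the same patch body with `json.dumps` instead of hand-escaped quotes.

```python
import json


def parse_version(text):
    """Turn a dotted version such as '8.0.23' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def destination_needs_update(source, destination):
    """Return True when the destination mysql is older than the source."""
    return parse_version(destination) < parse_version(source)


def katib_mysql_patch(version):
    """Build the JSON body passed to `kubectl patch ... -p` in this guide."""
    patch = {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": "katib-mysql", "image": f"mysql:{version}"}
                    ]
                }
            }
        }
    }
    return json.dumps(patch)


if __name__ == "__main__":
    # Example values taken from the command output shown in this guide.
    source, destination = "8.0.23", "8.0.26"
    if destination_needs_update(source, destination):
        print(katib_mysql_patch(source))
    else:
        print("destination mysql does not need to be updated")
```

The printed JSON could be passed to `kubectl patch -n kubeflow deploy katib-mysql -p '<JSON>'` in place of the escaped string shown above.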
Procedure¶
Connect to a privileged notebook server and open a new terminal.

Set the Rok Registry token:

Read a line from the standard input:

    jovyan@mynotebook-0:~$ read -s ROK_REGISTRY_TOKEN

Paste the Rok Registry token you issued by following the relevant guide.

Export the Rok Registry token:

    jovyan@mynotebook-0:~$ export ROK_REGISTRY_TOKEN

Note
You can also provide the Rok Registry token in a file:

    jovyan@mynotebook-0:~$ export ROK_REGISTRY_TOKEN="file:<PATH_TO_FILE>"

Replace <PATH_TO_FILE> with the path of your Rok Registry token, for example:

    jovyan@mynotebook-0:~$ export ROK_REGISTRY_TOKEN="file:/home/jovyan/registry.token"

Set the Rok Registry URL:

    jovyan@mynotebook-0:~$ export ROK_REGISTRY_URL=<URL>

Replace <URL> with the base URL of your Rok Registry installation. For example:

    jovyan@mynotebook-0:~$ export ROK_REGISTRY_URL=https://arr-cluster.example.com/registry

Set the Rok and Rok Registry bucket prefix you used when running the backup script in the source cluster, to distinguish which backup run to restore (step 5 of the backup guide):

    jovyan@mynotebook-0:~$ export ROK_BUCKET_PREFIX=<MIGRATION_ID>
    jovyan@mynotebook-0:~$ export ROK_REGISTRY_BUCKET_PREFIX=${ROK_BUCKET_PREFIX?}

Replace <MIGRATION_ID> with the identifier you specified in the corresponding invocation of rok-backup in the source cluster, for example:

    jovyan@mynotebook-0:~$ export ROK_BUCKET_PREFIX="cluster-migration-2022-07-07-d3674"
    jovyan@mynotebook-0:~$ export ROK_REGISTRY_BUCKET_PREFIX=${ROK_BUCKET_PREFIX?}

Important
Use exactly the same Rok and Rok Registry bucket prefix you used in the corresponding backup run you want to restore.
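Before running the restore script, it may help to confirm that the four variables exported above are actually set. The following is a minimal sketch of such a pre-flight check, under stated assumptions: `check_restore_env` is a hypothetical helper (it is not part of `rok-restore`), it only inspects the environment, and it treats differing bucket prefixes as a problem because this guide sets both to the same value.

```python
import os

# Variables exported in the steps of this guide.
REQUIRED_VARS = (
    "ROK_REGISTRY_TOKEN",
    "ROK_REGISTRY_URL",
    "ROK_BUCKET_PREFIX",
    "ROK_REGISTRY_BUCKET_PREFIX",
)


def check_restore_env(env):
    """Return a list of human-readable problems found in the mapping `env`."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS
                if not env.get(name)]

    # The guide allows the token to be given as "file:<PATH_TO_FILE>".
    token = env.get("ROK_REGISTRY_TOKEN", "")
    if token.startswith("file:"):
        path = token[len("file:"):]
        if not os.path.isfile(path):
            problems.append(f"token file {path} does not exist")

    # This guide exports both prefixes with the same value.
    local = env.get("ROK_BUCKET_PREFIX")
    registry = env.get("ROK_REGISTRY_BUCKET_PREFIX")
    if local and registry and local != registry:
        problems.append("ROK_BUCKET_PREFIX and ROK_REGISTRY_BUCKET_PREFIX differ")

    return problems


if __name__ == "__main__":
    for problem in check_restore_env(os.environ):
        print("WARNING:", problem)
```

Run it from the same terminal in which you exported the variables, so that the script inherits them.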
Run the restore script to subscribe to Rok Registry and present the EKF resources to the destination cluster. Choose one of the following options, depending on whether you want the script to get its configuration options through environment variables or through a preseed file.

Environment variables:

Choose one of the following options depending on whether you want to run the script interactively or non-interactively.

Note
In a non-interactive run you will not be prompted for input, while in an interactive run you will. If you have not explicitly specified an answer, rok-restore will assume the default answer. The log output is redirected to stdout.

Interactive:

    jovyan@mynotebook-0:~$ rok-restore

Troubleshooting
dialog.ExecutableNotFound
If the above command fails with an error message similar to the following:

    dialog.ExecutableNotFound: Executable not found: can't find the executable for the dialog-like program

it means your notebook does not have the dialog package installed. You can install it with:

    jovyan@mynotebook-0:~$ sudo apt install dialog

and retry the command.

Non-interactive:

    jovyan@mynotebook-0:~$ rok-restore \
    >     --frontend non-interactive

Preseed file:

Copy the
restore-preseed.py.j2 Jinja2 template inside your privileged notebook:

restore-preseed.py.j2

    # Copyright © 2022 Arrikto Inc. All Rights Reserved.

    """EKF Migration Restoration Preseed File."""

    SEEDS = {
        # Resources to restore
        'question/resources': ['bucket',
                               'katib',
                               'mlmd',
                               'model',
                               'notebook',
                               'pipeline',
                               'profile',
                               'pvc'],
        # The token to connect to Rok
        # 'question/rok_token': <protected>,
        # The URL of the Rok cluster
        'question/rok_url': 'http://rok.rok.svc.cluster.local',
        # The token to connect to Rok Registry
        'question/rok_registry_token': '{{ROK_REGISTRY_TOKEN}}',
        # The URL of the Rok Registry cluster
        'question/rok_registry_url': '{{ROK_REGISTRY_URL}}',
        # The prefix for the local Rok buckets
        'question/rok_bucket_prefix': 'cluster-migration',
        # The prefix for the Registry buckets
        # This MUST be the same as the one provided when running the backup
        'question/rok_registry_bucket_prefix': '{{ROK_REGISTRY_BUCKET_PREFIX}}',
        # Namespaces to restore / exclude per resource
        'question/buckets/exclude_namespaces': [],
        'question/buckets/namespaces': ['ALL'],
        'question/katib/exclude_namespaces': [],
        'question/katib/namespaces': ['ALL'],
        'question/models/exclude_namespaces': [],
        'question/models/namespaces': ['ALL'],
        'question/notebooks/exclude_namespaces': [],
        'question/notebooks/namespaces': ['ALL'],
        'question/pvcs/exclude_namespaces': [],
        'question/pvcs/namespaces': ['ALL'],
        # Delete Kubernetes resources for which a copy has been found on the
        # Registry, in order to restore the new version
        'question/overwrite_all_buckets': True,
        'question/overwrite_all_experiments': True,
        'question/overwrite_all_models': True,
        'question/overwrite_all_notebooks': True,
        'question/overwrite_all_profiles': True,
        'question/overwrite_all_pvcs': True,
        # Stop migrated resources after restoring them
        'question/stop_notebooks': True,
        'question/stop_models': True,
        'question/stop_recurring_runs': True,
        # Low priority question, applying default notebook configurations
        # If no configurations are provided, this will default to whatever is
        # included in the Notebook CR
        'question/notebook_configurations': []
    }

Render the preseed file:

    jovyan@mynotebook-0:~$ j2 restore-preseed.py.j2 \
    >     -o restore-preseed.py

Troubleshooting
bash: j2: command not found
If the above command fails with an error message similar to the following:

    bash: j2: command not found

it means your notebook does not have the j2 Python package installed. You can install it with:

    jovyan@mynotebook-0:~$ pip3 install j2

and retry the command.
Note
After rendering the preseed file, you can edit it to change the default value for any question and specify a custom answer.
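To make the rendering step more concrete, here is a minimal sketch of what it amounts to for this particular template: each `{{NAME}}` placeholder is replaced with the value of the exported variable `NAME`. It uses only the Python standard library and is not a substitute for Jinja2 or the `j2` command; it handles nothing beyond simple placeholders. The helper names are hypothetical. Unlike a permissive renderer, it raises an error when a variable is missing, so a half-rendered file is never written.

```python
import re

# Matches simple placeholders such as {{ROK_REGISTRY_URL}} or {{ ROK_REGISTRY_URL }}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_preseed(template_text, env):
    """Replace every {{NAME}} placeholder with env[NAME].

    Raises KeyError if a placeholder has no matching variable.
    """
    def substitute(match):
        return env[match.group(1)]

    return PLACEHOLDER.sub(substitute, template_text)


def unrendered_placeholders(text):
    """Return the sorted names of placeholders still present in `text`."""
    return sorted(set(PLACEHOLDER.findall(text)))


def render_file(template_path, output_path, env):
    """Render `template_path` into `output_path` using the mapping `env`."""
    with open(template_path) as template_file:
        rendered = render_preseed(template_file.read(), env)
    with open(output_path, "w") as output_file:
        output_file.write(rendered)
```

For example, `render_file("restore-preseed.py.j2", "restore-preseed.py", os.environ)` (after `import os`) would write the rendered file, and `unrendered_placeholders` can be run on a preseed file you edited by hand to confirm that no placeholder was left behind.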
Unset all exported environment variables:

    jovyan@mynotebook-0:~$ unset ROK_REGISTRY_TOKEN ROK_REGISTRY_URL \
    >     ROK_BUCKET_PREFIX ROK_REGISTRY_BUCKET_PREFIX

Run the restore script. Choose one of the following options depending on whether you want to run the script interactively or non-interactively.

Note
In a non-interactive run you will not be prompted for input, while in an interactive run you will. If you have not explicitly specified an answer, rok-restore will assume the default answer. The log output is redirected to stdout.

Interactive:

    jovyan@mynotebook-0:~$ rok-restore \
    >     --preseed-load restore-preseed.py

Troubleshooting
dialog.ExecutableNotFound
If the above command fails with an error message similar to the following:

    dialog.ExecutableNotFound: Executable not found: can't find the executable for the dialog-like program

it means your notebook does not have the dialog package installed. You can install it with:

    jovyan@mynotebook-0:~$ sudo apt install dialog

and retry the command.

Non-interactive:

    jovyan@mynotebook-0:~$ rok-restore \
    >     --frontend non-interactive \
    >     --preseed-load restore-preseed.py
Note
Notebooks, models, and recurring pipelines are restored in a stopped state by default to avoid a cluster scale-out. Add the CLI arguments --no-stop-notebooks, --no-stop-models, and --no-stop-recurring-runs to restore the corresponding resources in the state they were in on the source cluster.
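The invocations in this section differ only in which flags they carry. As an illustration, the sketch below assembles a `rok-restore` command line from the flags that appear in this guide (`--frontend non-interactive`, `--preseed-load`, and the three `--no-stop-*` arguments). The function and its parameter names are hypothetical, and it assumes that these flags can be combined freely, which the guide implies but does not state explicitly. It only builds the argument list; it does not run anything.

```python
import shlex


def rok_restore_argv(preseed=None, interactive=True, stop_notebooks=True,
                     stop_models=True, stop_recurring_runs=True):
    """Build the argument list for a rok-restore invocation."""
    argv = ["rok-restore"]
    if not interactive:
        argv += ["--frontend", "non-interactive"]
    if preseed:
        argv += ["--preseed-load", preseed]
    # Resources are restored stopped by default; each flag opts one kind out.
    if not stop_notebooks:
        argv.append("--no-stop-notebooks")
    if not stop_models:
        argv.append("--no-stop-models")
    if not stop_recurring_runs:
        argv.append("--no-stop-recurring-runs")
    return argv


def as_shell_command(argv):
    """Quote the argument list so it can be pasted into a terminal."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    print(as_shell_command(
        rok_restore_argv(preseed="restore-preseed.py", interactive=False,
                         stop_notebooks=False)))
```

Building the list first and quoting it afterwards avoids mistakes with paths that contain spaces.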
Verify¶
Connect to the privileged notebook servers in the source and destination clusters, and open a new terminal in each one of them.
List all Kubeflow profiles in the source and destination clusters. Ensure that all Kubeflow profiles are the same, that is, the following command produces the same output in both clusters:

    jovyan@notebook-0:~$ kubectl get profiles -A -o json \
    >     | jq -r '.items[].metadata.name'
    kubeflow-user1
    kubeflow-user2

Troubleshooting
bash: jq: command not found
If the above command fails with an error message similar to the following:

    bash: jq: command not found

it means your notebook does not have the jq package installed. You can install it with:

    jovyan@mynotebook-0:~$ sudo apt install jq

and retry the command.
List all notebooks in the source and destination clusters. Ensure that all notebooks are the same, that is, the following command produces the same output in both clusters:

    jovyan@notebook-0:~$ kubectl get notebooks -A -o json \
    >     | jq -r '.items[].metadata.name'
    notebook1
    notebook2

List all pipelines in the source and destination clusters. Ensure that all pipelines are the same, that is, the following command produces the same output in both clusters:

    jovyan@notebook-0:~$ python3 -c \
    >     "import kfp; print([p.name for p in kfp.Client().list_pipelines().pipelines])"
    ['pipeline1', 'pipeline2']

List all models in the source and destination clusters. Ensure that all models are the same, that is, the following command produces the same output in both clusters:

    jovyan@notebook-0:~$ kubectl get inferenceservices -A -o json \
    >     | jq -r '.items[].metadata.name'
    model1
    model2

List all Katib experiments in the source and destination clusters. Ensure that all Katib experiments are the same, that is, the following command produces the same output in both clusters:

    jovyan@notebook-0:~$ kubectl get experiments -A -o json \
    >     | jq -r '.items[].metadata.name'
    experiment1
    experiment2

List all the PVCs backed by the Rok storage class in the source and destination clusters. Ensure that all PVCs backed by the Rok storage class are the same, that is, the following command produces the same output in both clusters:

    jovyan@notebook-0:~$ kubectl get pvc -A -o json \
    >     | jq -r '.items[] | select(.spec.storageClassName=="rok") | .metadata.name'
    pvc1
    pvc2

Navigate to the Rok UI and make sure that all of the Rok buckets you chose to back up from the source cluster exist in the destination cluster.
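Comparing the outputs by eye becomes error-prone on larger clusters, and the commands above print only resource names, so two resources with the same name in different namespaces look identical. The sketch below is one possible way to automate the comparison for the `kubectl`-based checks. It assumes you saved the raw `kubectl get ... -A -o json` output from each cluster (without the `jq` filter); the helper names are hypothetical. It keys each resource by `namespace/name` and can optionally filter by storage class, mirroring the `select(.spec.storageClassName=="rok")` filter used for PVCs.

```python
import json


def resource_keys(kubectl_json, storage_class=None):
    """Return the set of 'namespace/name' keys in a kubectl JSON list.

    Cluster-scoped resources (such as profiles) have no namespace and are
    keyed by name only.
    """
    keys = set()
    for item in json.loads(kubectl_json)["items"]:
        if storage_class is not None:
            if item.get("spec", {}).get("storageClassName") != storage_class:
                continue
        meta = item["metadata"]
        namespace = meta.get("namespace")
        keys.add(f"{namespace}/{meta['name']}" if namespace else meta["name"])
    return keys


def compare_clusters(source_json, destination_json, storage_class=None):
    """Return (missing_in_destination, extra_in_destination) as sorted lists."""
    source = resource_keys(source_json, storage_class)
    destination = resource_keys(destination_json, storage_class)
    return sorted(source - destination), sorted(destination - source)
```

Two empty lists mean that the resource sets match. For PVCs, pass `storage_class="rok"` to consider only the volumes backed by the Rok storage class.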
Summary¶
You have subscribed to the Rok Registry buckets that contain the snapshots of the EKF resources of a source Arrikto EKF cluster, and presented them to the destination Arrikto EKF cluster.
What’s Next¶
Check out the rest of the maintenance operations that you can perform on your cluster.