Upgrade Kubeflow

This section describes how to upgrade Kubeflow.

What You’ll Need

Procedure

  1. Go to your GitOps repository inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Upgrade your Spark Operator installation first:

    root@rok-tools:~/ops/deployments# rok-deploy \
    >     --apply kubeflow/manifests/contrib/spark/spark-operator/overlays/deploy \
    >     --force --force-kinds Deployment
  3. Upgrade your Kubeflow installation:

    root@rok-tools:~/ops/deployments# rok-deploy --apply install/kubeflow
  4. Remove the deprecated resources left by the previous version of Kubeflow:

    root@rok-tools:~/ops/deployments# rok-kf-prune --app kubeflow
  5. Remove the deprecated resources left by the previous version of Knative:

    root@rok-tools:~/ops/deployments# rok-kf-prune --app knative
  6. Migrate from KFServing to KServe.

    1. Migrate any existing KFServing InferenceServices to KServe:

      root@rok-tools:~/ops/deployments# rok-kserve-migrate

      Important

      During the migration, KServe will use the default ServingRuntime of the specific framework as the backend of your new InferenceService. As such, your InferenceService may not become ready if:

      • You were using a custom image to serve your model, for example, using a specific runtimeVersion in the InferenceService spec or changing the inferenceservice-config ConfigMap.
      • Your model is not compatible with the new default ServingRuntime, for example, there is a mismatch between the version of the library used to build your model and the version of the library the default ServingRuntime is using.

      In such cases, you need to:

      • Create a new ServingRuntime with your custom image or an image compatible with the model you want to serve.
      • Patch your InferenceService to use this ServingRuntime.

      Note that the old Revision remains up and running until the new one becomes ready.

    2. Verify that there are no KFServing inference services present:

      root@rok-tools:~/ops/deployments# kubectl get inferenceservices.serving.kubeflow.org -A
      No resources found
    3. Delete the deprecated KFServing resources:

      root@rok-tools:~/ops/deployments# rok-deploy --delete kubeflow/manifests/apps/kfserving/upstream/overlays/deploy
    4. Optional

      KServe 0.8 supports path-based serving. If you have already exposed serving and want to switch from host-based to path-based serving, follow the corresponding Operations guide.
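
Steps 6.2 and 6.3 can be combined into a guard, so that the deletion only runs once no KFServing InferenceServices remain. The following is a minimal sketch and not part of the Rok tooling: the function name no_resources_left is hypothetical, and the sketch assumes that kubectl get with --no-headers prints nothing to standard output when no resources match.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of rok-tools.
# Reads a resource listing on stdin and succeeds only if the listing
# contains no non-blank lines.
no_resources_left() {
    ! grep -q '[^[:space:]]'
}

# Example usage (commented out so that the sketch has no side effects).
# `set -o pipefail` makes the condition fail if kubectl itself fails,
# so a connection error is never mistaken for an empty listing:
#
#   set -o pipefail
#   if kubectl get inferenceservices.serving.kubeflow.org -A --no-headers \
#           | no_resources_left; then
#       rok-deploy --delete kubeflow/manifests/apps/kfserving/upstream/overlays/deploy
#   else
#       echo "KFServing InferenceServices still exist, or kubectl failed" >&2
#   fi
```

If the guard takes the else branch, inspect the listing manually as in step 6.2 before retrying the deletion.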

Verify

  1. Verify that the Dex Pod is up and running. Check the Pod status and verify that field STATUS is Running and field READY is 2/2:

    root@rok-tools:~# kubectl -n auth get pods
    NAME    READY   STATUS    RESTARTS   AGE
    dex-0   2/2     Running   3          1m
  2. Verify that the Pods in the cert-manager namespace are up and running. Check the Pod status and verify that field STATUS is Running and field READY is 1/1 for all Pods:

    root@rok-tools:~# kubectl -n cert-manager get pods
    NAME                                       READY   STATUS    RESTARTS   AGE
    cert-manager-6d86476c77-qwgnj              1/1     Running   0          1m
    cert-manager-cainjector-5b9cd446fd-kl9gg   1/1     Running   0          1m
    cert-manager-webhook-64d967c45-jmxcz       1/1     Running   0          1m
  3. Verify that the Pods in the istio-system namespace are up and running. Check the Pod status and verify that field STATUS is Running and field READY is 1/1 for all Pods:

    root@rok-tools:~# kubectl -n istio-system get pods
    NAME                                    READY   STATUS    RESTARTS   AGE
    authservice-0                           1/1     Running   0          1m
    istio-ingressgateway-57f58bf544-x45kw   1/1     Running   0          1m
    istiod-68f6c899f5-wzjfm                 1/1     Running   0          1m
  4. Verify that the Pods in the knative-monitoring namespace are up and running. Check the Pod status and verify that field STATUS is Running and field READY is N/N for all Pods:

    root@rok-tools:~# kubectl -n knative-monitoring get pods
    NAME                                  READY   STATUS    RESTARTS   AGE
    grafana-6695587d6f-ktf86              1/1     Running   0          1m
    kube-state-metrics-79ddb7fc64-w7s5m   1/1     Running   0          1m
    node-exporter-xlj2v                   2/2     Running   0          1m
    node-exporter-zfjh5                   2/2     Running   0          1m
    prometheus-system-0                   1/1     Running   0          1m
    prometheus-system-1                   1/1     Running   0          1m
  5. Verify that the Pods in the knative-serving namespace are up and running. Check the Pod status and verify that field STATUS is Running and field READY is 2/2 for all Pods:

    root@rok-tools:~# kubectl -n knative-serving get pods
    NAME                                READY   STATUS    RESTARTS   AGE
    activator-5d6754bc67-qb2ct          2/2     Running   0          1m
    autoscaler-6dd6dbbb84-zgwkf         2/2     Running   0          1m
    controller-687f6c6995-27fkw         2/2     Running   0          1m
    istio-webhook-8d4f5fbfb-tg6h4       2/2     Running   0          1m
    networking-istio-785675596f-nnqbr   2/2     Running   0          1m
    webhook-6d776d968c-gmnbz            2/2     Running   0          1m
  6. Verify that the Pods in the kubeflow namespace are up and running. Check the Pod status and verify that field STATUS is Running and field READY is N/N for all Pods:

    root@rok-tools:~# kubectl -n kubeflow get pods
    NAME                                                         READY   STATUS    RESTARTS   AGE
    admission-webhook-deployment-5d4cf6bbdb-jszsw                2/2     Running   0          1m
    cache-server-68ffc8d4ff-ltl8q                                2/2     Running   0          1m
    centraldashboard-fd8774874-56587                             2/2     Running   0          1m
    jupyter-web-app-deployment-7987d45c7d-5gwss                  2/2     Running   0          1m
    katib-controller-54f895f874-g29bx                            2/2     Running   2          1m
    katib-db-manager-6f5d8f5945-wmmnb                            2/2     Running   1          1m
    katib-mysql-857bfdb7f9-w5zj8                                 2/2     Running   0          1m
    katib-ui-696fc69ddc-jkk2x                                    2/2     Running   2          1m
    kfp-cache-d96f57c8b-5cjht                                    3/3     Running   4          1m
    kfserving-controller-manager-0                               3/3     Running   1          1m
    kfserving-models-web-app-77cc4c8dd6-86v92                    2/2     Running   0          1m
    kubeflow-reception-9c67996fc-46djf                           2/2     Running   1          1m
    metadata-db-d48d67699-89fg9                                  2/2     Running   0          1m
    metadata-envoy-deployment-775b466c45-4gbkx                   1/1     Running   0          1m
    metadata-grpc-deployment-5c975cb96d-tq5vr                    2/2     Running   4          1m
    minio-7c9b6578cd-7f2tb                                       2/2     Running   0          1m
    ml-pipeline-7867b5b879-dgmnj                                 2/2     Running   0          1m
    ml-pipeline-persistenceagent-8495768cbb-vpfjt                2/2     Running   0          1m
    ml-pipeline-scheduledworkflow-7f58d84f9f-4pf7d               2/2     Running   0          1m
    ml-pipeline-ui-678cb55d6f-z9spc                              2/2     Running   0          1m
    ml-pipeline-viewer-crd-57768dc6c6-wtxjm                      2/2     Running   1          1m
    ml-pipeline-visualizationserver-68498d6df6-ms74w             2/2     Running   0          1m
    mysql-55d57856d7-bzvgd                                       2/2     Running   0          1m
    notebook-controller-deployment-6cf9974cd9-2p9mj              2/2     Running   1          1m
    profiles-deployment-64cf74dfd4-b6dx2                         3/3     Running   1          1m
    pvcviewer-controller-controller-manager-6dd55d9dfd-m5j8s     3/3     Running   1          1m
    spark-operatorsparkoperator-5775c699bb-4xgn2                 2/2     Running   0          1m
    tensorboard-controller-controller-manager-7f766c8676-8g6fq   3/3     Running   2          1m
    tensorboards-web-app-deployment-6b4dfd598c-r9xgk             1/1     Running   0          1m
    training-operator-747f797684-f6jhd                           2/2     Running   0          1m
    volumes-web-app-deployment-7b58b4c478-btfmw                  2/2     Running   0          1m
    workflow-controller-76579565dd-8f6vw                         2/2     Running   1          1m
  7. Verify that the Pods in the kserve namespace are up and running. Check the Pod status and verify that field STATUS is Running and field READY is N/N for all Pods:

    root@rok-tools:~# kubectl -n kserve get pods
    NAME                          READY   STATUS    RESTARTS   AGE
    kserve-controller-manager-0   3/3     Running   1          1m
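
The checks above all apply the same rule to the kubectl output: STATUS must be Running and READY must be N/N. They could be scripted along the following lines. This is a minimal sketch, not part of the Rok tooling: the function name not_ready_pods is hypothetical, and the sketch assumes the default column order of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE) with headers suppressed via --no-headers.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of rok-tools.
# Reads `kubectl get pods --no-headers` output on stdin, prints the name
# of every Pod whose STATUS is not Running or whose READY count is not
# N/N, and returns non-zero if it printed at least one name.
not_ready_pods() {
    awk '
        NF == 0 { next }                    # skip blank lines
        {
            split($2, ready, "/")           # READY column, e.g. "2/2"
            if ($3 != "Running" || ready[1] != ready[2]) {
                print $1
                bad = 1
            }
        }
        END { exit bad ? 1 : 0 }
    '
}

# Example usage (commented out so that the sketch has no side effects):
#
#   for ns in auth cert-manager istio-system knative-monitoring \
#             knative-serving kubeflow kserve; do
#       kubectl -n "$ns" get pods --no-headers | not_ready_pods \
#           || echo "namespace $ns has Pods that are not ready" >&2
#   done
```

Note that this sketch also reports Pods of finished Jobs, whose STATUS is Completed, so review any reported names before treating them as failures.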

Summary

You have successfully upgraded Kubeflow.

What’s Next

The next step is to upgrade the Rok Registry etcd.