Upgrade Kubeflow

This section describes how to upgrade Kubeflow. If you have not deployed Kubeflow in your cluster, you can safely skip this section.

What You’ll Need

Procedure

  1. Go to your GitOps repository inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Upgrade your Spark Operator installation:

    root@rok-tools:~/ops/deployments# rok-deploy --apply \
    >     kubeflow/manifests/contrib/spark/spark-operator/overlays/deploy \
    >     --force --force-kinds Deployment
  3. Upgrade your Kubeflow installation:

    root@rok-tools:~/ops/deployments# rok-deploy --apply install/kubeflow
  4. Configure the Argo workflows executor if necessary. Whether you need to act depends on your cloud provider:

    If EKF already uses the correct executor on your cloud provider, you can skip this step.

    Otherwise, follow the Configure Argo Workflow Executor guide to set the executor to PNS. Then, come back to this guide and follow the rest of the procedure.

  5. Enable Istio sidecar injection for existing inference services:

    root@rok-tools:~/ops/deployments# rok-serving-upgrade

Verify

  1. Verify that the Dex pod is up and running. Check the pod status and verify that the STATUS field is Running and the READY field is 2/2:

    root@rok-tools:~# kubectl -n auth get pods
    NAME                   READY   STATUS    RESTARTS   AGE
    dex-57c98bb9bb-l466d   2/2     Running   3          1m
  2. Verify that the pods in the cert-manager namespace are up and running. Check the pod status and verify that the STATUS field is Running and the READY field is 1/1 for all Pods:

    root@rok-tools:~# kubectl -n cert-manager get pods
    NAME                                       READY   STATUS    RESTARTS   AGE
    cert-manager-6d86476c77-qwgnj              1/1     Running   0          1m
    cert-manager-cainjector-5b9cd446fd-kl9gg   1/1     Running   0          1m
    cert-manager-webhook-64d967c45-jmxcz       1/1     Running   0          1m
  3. Verify that the pods in the istio-system namespace are up and running. Check the pod status and verify that the STATUS field is Running and the READY field is 1/1 for all Pods:

    root@rok-tools:~# kubectl -n istio-system get pods
    NAME                                    READY   STATUS    RESTARTS   AGE
    authservice-0                           1/1     Running   0          1m
    cluster-local-gateway-b76ff5885-2rjg5   1/1     Running   0          1m
    istio-ingressgateway-57f58bf544-x45kw   1/1     Running   0          1m
    istiod-68f6c899f5-wzjfm                 1/1     Running   0          1m
  4. Verify that the pods in the knative-monitoring namespace are up and running. Check the pod status and verify that the STATUS field is Running and the READY field is n/n for all Pods:

    root@rok-tools:~# kubectl -n knative-monitoring get pods
    NAME                                  READY   STATUS    RESTARTS   AGE
    grafana-6695587d6f-ktf86              1/1     Running   0          1m
    kube-state-metrics-79ddb7fc64-w7s5m   1/1     Running   0          1m
    node-exporter-xlj2v                   2/2     Running   0          1m
    node-exporter-zfjh5                   2/2     Running   0          1m
    prometheus-system-0                   1/1     Running   0          1m
    prometheus-system-1                   1/1     Running   0          1m
  5. Verify that the pods in the knative-serving namespace are up and running. Check the pod status and verify that the STATUS field is Running and the READY field is 2/2 for all Pods:

    root@rok-tools:~# kubectl -n knative-serving get pods
    NAME                                READY   STATUS    RESTARTS   AGE
    activator-5d6754bc67-qb2ct          2/2     Running   0          1m
    autoscaler-6dd6dbbb84-zgwkf         2/2     Running   0          1m
    controller-687f6c6995-27fkw         2/2     Running   0          1m
    istio-webhook-8d4f5fbfb-tg6h4       2/2     Running   0          1m
    networking-istio-785675596f-nnqbr   2/2     Running   0          1m
    webhook-6d776d968c-gmnbz            2/2     Running   0          1m
  6. Verify that the pods in the kubeflow namespace are up and running. Check the pod status and verify that the STATUS field is Running and the READY field is n/n for all Pods:

    root@rok-tools:~# kubectl -n kubeflow get pods
    NAME                                                         READY   STATUS    RESTARTS   AGE
    admission-webhook-deployment-5d4cf6bbdb-jszsw                2/2     Running   0          1m
    centraldashboard-fd8774874-56587                             2/2     Running   0          1m
    jupyter-web-app-deployment-7987d45c7d-5gwss                  2/2     Running   0          1m
    katib-controller-54f895f874-g29bx                            2/2     Running   2          1m
    katib-db-manager-6f5d8f5945-wmmnb                            2/2     Running   1          1m
    katib-mysql-857bfdb7f9-w5zj8                                 2/2     Running   0          1m
    katib-ui-696fc69ddc-jkk2x                                    2/2     Running   2          1m
    kfp-cache-d96f57c8b-5cjht                                    3/3     Running   4          1m
    kfserving-controller-manager-0                               3/3     Running   1          1m
    kubeflow-reception-9c67996fc-46djf                           2/2     Running   1          1m
    metadata-db-d48d67699-89fg9                                  2/2     Running   0          1m
    metadata-envoy-deployment-775b466c45-4gbkx                   1/1     Running   0          1m
    metadata-grpc-deployment-5c975cb96d-tq5vr                    2/2     Running   4          1m
    minio-7c9b6578cd-7f2tb                                       2/2     Running   0          1m
    ml-pipeline-7867b5b879-dgmnj                                 2/2     Running   0          1m
    ml-pipeline-persistenceagent-8495768cbb-vpfjt                2/2     Running   0          1m
    ml-pipeline-scheduledworkflow-7f58d84f9f-4pf7d               2/2     Running   0          1m
    ml-pipeline-ui-678cb55d6f-z9spc                              2/2     Running   0          1m
    ml-pipeline-viewer-crd-57768dc6c6-wtxjm                      2/2     Running   1          1m
    ml-pipeline-visualizationserver-68498d6df6-ms74w             2/2     Running   0          1m
    models-web-app-748f8776df-zrc66                              2/2     Running   0          1m
    mpi-operator-f658c675b-6jrln                                 1/1     Running   0          1m
    mxnet-operator-6594fb56b-q68pp                               1/1     Running   0          1m
    mysql-55d57856d7-bzvgd                                       2/2     Running   0          1m
    notebook-controller-deployment-6cf9974cd9-2p9mj              2/2     Running   1          1m
    profiles-deployment-64cf74dfd4-b6dx2                         3/3     Running   1          1m
    pvcviewer-controller-controller-manager-6dd55d9dfd-m5j8s     3/3     Running   1          1m
    pytorch-operator-74788b9d8c-prdsb                            2/2     Running   0          1m
    spark-operatorsparkoperator-5775c699bb-4xgn2                 2/2     Running   0          1m
    tensorboard-controller-controller-manager-7f766c8676-8g6fq   3/3     Running   2          1m
    tensorboards-web-app-deployment-6b4dfd598c-r9xgk             1/1     Running   0          1m
    tf-job-operator-d8b96567b-qj48v                              2/2     Running   1          1m
    volumes-web-app-deployment-7b58b4c478-btfmw                  2/2     Running   0          1m
    workflow-controller-76579565dd-8f6vw                         2/2     Running   1          1m
    xgboost-operator-deployment-7dcff8bf85-t9hvr                 2/2     Running   1          1m
  7. Verify that no inference services have Istio sidecar injection disabled. The following command should produce no output:

    root@rok-tools:~# kubectl get isvc -A -o json \
    >     | jq -r '.items[] | select(.metadata.annotations["sidecar.istio.io/inject"]=="false") | .metadata.namespace, .metadata.name' \
    >     | paste - -
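
The pod checks in this Verify section compare the STATUS and READY columns by eye, which gets tedious for the larger namespaces. As a rough sketch, a small shell function can do the same comparison on the output of `kubectl get pods`. The function name `not_ready_pods` is hypothetical; it is not part of rok-tools or kubectl, and it assumes the default column layout shown in this guide (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper, not shipped with rok-tools or kubectl.
# Reads `kubectl get pods` output on stdin and prints the name of every pod
# whose STATUS is not Running or whose READY count is not n/n.
# Returns non-zero if at least one such pod is found.
not_ready_pods() {
    awk '
        $1 == "NAME" { next }          # skip the header row
        NF < 3       { next }          # skip blank or malformed lines
        {
            split($2, ready, "/")      # READY looks like "2/2"
            if ($3 != "Running" || ready[1] != ready[2]) {
                print $1
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

For example, `kubectl -n kubeflow get pods | not_ready_pods` should print nothing and return zero when every pod in the namespace matches the expected state. Note that this sketch also reports pods in other states, such as Completed, so treat any output as a prompt to inspect the pod rather than as a definite failure.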

Summary

You have successfully upgraded Kubeflow.

What’s Next

The next step is to upgrade Cluster Autoscaler.