Install Kubeflow

This section will guide you through installing Kubeflow alongside Rok, using the rok-deploy tool.

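Choose one of the following options to install Kubeflow: Option 1 installs it automatically with rok-deploy (preferred), and Option 2 installs it manually from your GitOps repository.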

What You’ll Need

Option 1: Install Kubeflow Automatically (preferred)

Choose one of the following options, based on your cloud provider.

On AWS, install Kubeflow by following the on-screen instructions in the rok-deploy user interface.

If rok-deploy is not already running, start it with:

root@rok-tools:~# rok-deploy --run-from kubeflow-deploy

Proceed to the Summary section.

On Azure, Rok does not currently support automatic deployment. Follow the instructions in the Option 2: Install Kubeflow Manually section to install Kubeflow manually.

On Google Cloud, Rok does not currently support automatic deployment. Follow the instructions in the Option 2: Install Kubeflow Manually section to install Kubeflow manually.

Option 2: Install Kubeflow Manually

If you want to install Kubeflow manually, follow the instructions below.

Procedure

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Deploy Kubeflow:

    root@rok-tools:~/ops/deployments# rok-deploy --apply install/kubeflow

    Troubleshooting

    Cannot create resources in user namespaces

    If you have previously uninstalled Kubeflow and are re-applying your existing manifests to reinstall it, the namespace resources may fail to apply because the user namespaces do not exist yet. In this case, follow the steps below to apply the profiles before installing Kubeflow, so that the user namespaces are created during the deployment.

    1. Go to your GitOps repository, inside your rok-tools management environment:

      root@rok-tools:~# cd ~/ops/deployments
    2. Apply the Profile CRD:

      root@rok-tools:~/ops/deployments# rok-deploy --apply \
      > kubeflow/manifests/apps/profiles/upstream/crd/
    3. Create all user profiles:

      root@rok-tools:~/ops/deployments# find kubeflow/manifests/common/namespace-resources/profiles/*.yaml \
      > | xargs -n1 kubectl apply -f
    4. Deploy Kubeflow:

      root@rok-tools:~/ops/deployments# rok-deploy --apply install/kubeflow
  3. Configure the Argo workflow executor, if necessary. Choose one of the following options, based on your cloud provider:

    AWS: Skip this step. The executor that EKF configures by default is compatible with AWS.

    Azure: Follow the Configure Argo Workflow Executor guide to set the executor to PNS. Then, come back to this guide and follow the rest of the procedure.

    Google Cloud: Skip this step. The executor that EKF configures by default is compatible with Google Cloud.
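In the troubleshooting case under step 2, it can help to confirm which user namespaces are actually missing, and whether the Profile CRD is registered, before re-applying manifests. The following is a minimal sketch under stated assumptions: it assumes bash and a kubectl context that points at your cluster; the function names `missing_namespaces` and `profile_crd_installed` are hypothetical and not part of rok-deploy; and the CRD name `profiles.kubeflow.org` is assumed from upstream Kubeflow.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of rok-deploy) for diagnosing missing
# user namespaces before reinstalling Kubeflow.

# Print every namespace given as an argument that kubectl cannot find.
# `kubectl get namespace NAME` exits non-zero when the namespace is absent.
# Caveat: it also exits non-zero if the cluster is unreachable, in which
# case every namespace is reported as missing.
missing_namespaces() {
  local ns
  for ns in "$@"; do
    if ! kubectl get namespace "$ns" >/dev/null 2>&1; then
      echo "$ns"
    fi
  done
}

# Succeed only if the Profile CRD is registered in the cluster.
# Assumption: the CRD uses the upstream name profiles.kubeflow.org.
profile_crd_installed() {
  kubectl get crd profiles.kubeflow.org >/dev/null 2>&1
}
```

For example, after step 2 of the troubleshooting procedure, with `my-user-namespace` standing in for one of your user namespaces (a hypothetical name):

    root@rok-tools:~/ops/deployments# profile_crd_installed && echo "Profile CRD is registered"
    root@rok-tools:~/ops/deployments# missing_namespaces my-user-namespace

Any namespace that the second command prints does not exist yet in the cluster.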

Verify

  1. Verify that the Dex pod is up and running. Check the pod status and verify that the STATUS field is Running and the READY field is 2/2:

    root@rok-tools:~# kubectl -n auth get pods
    NAME                   READY   STATUS    RESTARTS   AGE
    dex-57c98bb9bb-l466d   2/2     Running   3          17m
  2. Verify that the pods in the cert-manager namespace are up and running. Check the pod status and verify that the STATUS field is Running and the READY field is 1/1:

    root@rok-tools:~# kubectl -n cert-manager get pods
    NAME                                       READY   STATUS    RESTARTS   AGE
    cert-manager-6d86476c77-qwgnj              1/1     Running   0          16m
    cert-manager-cainjector-5b9cd446fd-kl9gg   1/1     Running   0          16m
    cert-manager-webhook-64d967c45-jmxcz       1/1     Running   0          16m
  3. Verify that the pods in the istio-system namespace are up and running. Check the pod status and verify that the STATUS field is Running and the READY field is 1/1:

    root@rok-tools:~# kubectl -n istio-system get pods
    NAME                                    READY   STATUS    RESTARTS   AGE
    authservice-0                           1/1     Running   0          17m
    cluster-local-gateway-b76ff5885-2rjg5   1/1     Running   0          2m23s
    istio-ingressgateway-57f58bf544-x45kw   1/1     Running   0          19m
    istiod-68f6c899f5-wzjfm                 1/1     Running   0          19m
  4. Verify that the pods in the knative-monitoring namespace are up and running. Check the pod status and verify that the STATUS field is Running and the READY field is N/N, that is, all containers in each pod are ready:

    root@rok-tools:~# kubectl -n knative-monitoring get pods
    NAME                                  READY   STATUS    RESTARTS   AGE
    grafana-6695587d6f-ktf86              1/1     Running   0          2m41s
    kube-state-metrics-79ddb7fc64-w7s5m   1/1     Running   0          2m38s
    node-exporter-xlj2v                   2/2     Running   0          2m3s
    node-exporter-zfjh5                   2/2     Running   0          2m3s
    prometheus-system-0                   1/1     Running   0          2m3s
    prometheus-system-1                   1/1     Running   0          2m3s
  5. Verify that the pods in the knative-serving namespace are up and running. Check the pod status and verify that the STATUS field is Running and the READY field is 2/2:

    root@rok-tools:~# kubectl -n knative-serving get pods
    NAME                                READY   STATUS    RESTARTS   AGE
    activator-5d6754bc67-qb2ct          2/2     Running   0          2m47s
    autoscaler-6dd6dbbb84-zgwkf         2/2     Running   0          2m46s
    controller-687f6c6995-27fkw         2/2     Running   0          2m42s
    istio-webhook-8d4f5fbfb-tg6h4       2/2     Running   0          2m40s
    networking-istio-785675596f-nnqbr   2/2     Running   0          2m43s
    webhook-6d776d968c-gmnbz            2/2     Running   0          2m43s
  6. Verify that the pods in the kubeflow namespace are up and running. Check the pod status and verify that the STATUS field is Running and the READY field is N/N, that is, all containers in each pod are ready:

    root@rok-tools:~# kubectl -n kubeflow get pods
    NAME                                                         READY   STATUS    RESTARTS   AGE
    admission-webhook-deployment-5d4cf6bbdb-jszsw                2/2     Running   0          16m
    centraldashboard-fd8774874-56587                             2/2     Running   0          2m42s
    jupyter-web-app-deployment-7987d45c7d-5gwss                  2/2     Running   0          2m42s
    katib-controller-54f895f874-g29bx                            2/2     Running   2          2m41s
    katib-db-manager-6f5d8f5945-wmmnb                            2/2     Running   1          2m48s
    katib-mysql-857bfdb7f9-w5zj8                                 2/2     Running   0          2m39s
    katib-ui-696fc69ddc-jkk2x                                    2/2     Running   2          2m38s
    kfp-cache-d96f57c8b-5cjht                                    3/3     Running   4          2m46s
    kfserving-controller-manager-0                               3/3     Running   1          2m20s
    kubeflow-reception-9c67996fc-46djf                           2/2     Running   1          15m
    metadata-db-d48d67699-89fg9                                  2/2     Running   0          2m44s
    metadata-envoy-deployment-775b466c45-4gbkx                   1/1     Running   0          2m38s
    metadata-grpc-deployment-5c975cb96d-tq5vr                    2/2     Running   4          2m37s
    minio-7c9b6578cd-7f2tb                                       2/2     Running   0          2m35s
    ml-pipeline-7867b5b879-dgmnj                                 2/2     Running   0          2m41s
    ml-pipeline-persistenceagent-8495768cbb-vpfjt                2/2     Running   0          2m33s
    ml-pipeline-scheduledworkflow-7f58d84f9f-4pf7d               2/2     Running   0          2m37s
    ml-pipeline-ui-678cb55d6f-z9spc                              2/2     Running   0          2m32s
    ml-pipeline-viewer-crd-57768dc6c6-wtxjm                      2/2     Running   1          2m30s
    ml-pipeline-visualizationserver-68498d6df6-ms74w             2/2     Running   0          2m28s
    models-web-app-748f8776df-zrc66                              2/2     Running   0          2m34s
    mpi-operator-f658c675b-6jrln                                 1/1     Running   0          2m34s
    mxnet-operator-6594fb56b-q68pp                               1/1     Running   0          2m25s
    mysql-55d57856d7-bzvgd                                       2/2     Running   0          2m25s
    notebook-controller-deployment-6cf9974cd9-2p9mj              2/2     Running   1          2m25s
    profiles-deployment-64cf74dfd4-b6dx2                         3/3     Running   1          15m
    pvcviewer-controller-controller-manager-6dd55d9dfd-m5j8s     3/3     Running   1          2m23s
    pytorch-operator-74788b9d8c-prdsb                            2/2     Running   0          2m29s
    spark-operatorsparkoperator-5775c699bb-4xgn2                 2/2     Running   0          2m27s
    tensorboard-controller-controller-manager-7f766c8676-8g6fq   3/3     Running   2          2m22s
    tensorboards-web-app-deployment-6b4dfd598c-r9xgk             1/1     Running   0          2m25s
    tf-job-operator-d8b96567b-qj48v                              2/2     Running   1          2m22s
    volumes-web-app-deployment-7b58b4c478-btfmw                  2/2     Running   0          2m24s
    workflow-controller-76579565dd-8f6vw                         2/2     Running   1          2m22s
    xgboost-operator-deployment-7dcff8bf85-t9hvr                 2/2     Running   1          2m22s
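The six checks above follow the same pattern, so they can be scripted. The following is a minimal sketch, assuming bash, awk, and the default `kubectl get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE). The function names `not_ready_pods` and `check_namespace` are hypothetical and not part of rok-deploy or kubectl.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of rok-deploy or kubectl) that automate the
# manual pod checks in the Verify section.

# Read `kubectl get pods` output on stdin and print the name of every pod
# whose READY column is not N/N or whose STATUS column is not Running.
not_ready_pods() {
  awk '$1 == "NAME" { next }              # skip the header row
       {
         split($2, ready, "/")             # READY looks like 2/2
         if (ready[1] != ready[2] || $3 != "Running") print $1
       }'
}

# Check one namespace: print a one-line verdict and return non-zero if
# kubectl fails, the namespace has no pods, or any pod is not ready.
check_namespace() {
  local ns="$1" pods bad
  if ! pods="$(kubectl -n "$ns" get pods)"; then
    echo "$ns: kubectl failed"
    return 1
  fi
  if [ -z "$pods" ]; then
    echo "$ns: no pods found"
    return 1
  fi
  bad="$(printf '%s\n' "$pods" | not_ready_pods)"
  if [ -n "$bad" ]; then
    # Join the offending pod names onto a single line.
    echo "$ns: not ready: $(printf '%s\n' "$bad" | paste -s -d ' ' -)"
    return 1
  fi
  echo "$ns: all pods ready"
}
```

With the functions defined in your shell, you could then check all six namespaces in one pass:

    root@rok-tools:~# for ns in auth cert-manager istio-system knative-monitoring knative-serving kubeflow; do check_namespace "$ns"; done

Note that this check also reports pods in the Completed state, because it follows the Running and N/N criteria above literally. If you prefer to block until pods become ready rather than inspect a snapshot, `kubectl -n kubeflow wait --for=condition=Ready pods --all --timeout=300s` is a built-in alternative.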

Summary

You have successfully installed Kubeflow.

What’s Next

The next step is to integrate Rok with the Kubeflow dashboard.