Install Kubeflow

This section will guide you through installing Kubeflow alongside Rok, using the rok-deploy tool.

Choose one of the following options to install Kubeflow:

What You'll Need

Option 1: Install Kubeflow Automatically (preferred)

Install Kubeflow by following the on-screen instructions in the rok-deploy user interface.

If rok-deploy is not already running, start it with:

root@rok-tools:~# rok-deploy --run-from kubeflow-deploy

Proceed to the Summary section.

Option 2: Install Kubeflow Manually

If you want to install Kubeflow manually, follow the instructions below.

Procedure

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
    
  2. Deploy Kubeflow:

    root@rok-tools:~/ops/deployments# rok-deploy --apply install/kubeflow
    

    Troubleshooting

    Cannot create resources in user namespaces

    If you previously uninstalled Kubeflow and are now re-applying your existing manifests to reinstall it, applying the namespace resources may fail because the user namespaces do not exist yet. In this case, follow the steps below to apply the profiles before installing Kubeflow, so that the user namespaces are created during the deployment.

    1. Go to your GitOps repository, inside your rok-tools management environment:

      root@rok-tools:~# cd ~/ops/deployments
      
    2. Apply the Profile CRD:

      root@rok-tools:~/ops/deployments# rok-deploy --apply \
      > kubeflow/manifests/apps/profiles/upstream/crd/
      
    3. Create all user profiles:

      root@rok-tools:~/ops/deployments# find kubeflow/manifests/common/namespace-resources/profiles/*.yaml \
      > | xargs -n1 kubectl apply -f
      
    4. Deploy Kubeflow:

      root@rok-tools:~/ops/deployments# rok-deploy --apply install/kubeflow
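
Step 3 of the troubleshooting procedure above applies every profile manifest one by one. The following is a minimal sketch of the same loop with two guards: it stops at the first failure and it reports an error if the directory contains no manifests. The helper name apply_each is hypothetical; it is not part of rok-deploy or kubectl.

```shell
#!/bin/sh
# apply_each DIR CMD [ARGS...]
# Hypothetical helper: run CMD once per *.yaml file in DIR, passing the
# file path as the last argument.
apply_each() {
    dir=$1
    shift
    found=0
    for f in "$dir"/*.yaml; do
        [ -e "$f" ] || continue      # an unmatched glob stays literal; skip it
        found=1
        "$@" "$f" || return 1        # stop at the first failure
    done
    if [ "$found" -eq 0 ]; then
        echo "no .yaml manifests found in $dir" >&2
        return 1
    fi
}

# Example (run from ~/ops/deployments):
#   apply_each kubeflow/manifests/common/namespace-resources/profiles kubectl apply -f
```

Passing the command as arguments lets you preview what would run, for example by using echo in place of kubectl, before applying anything to the cluster.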
      

Verify

  1. Verify that the Dex pod is up and running. Check the pod status and verify that the STATUS field is Running and the READY field is 2/2:

    root@rok-tools:~# kubectl -n auth get pods
    NAME                   READY   STATUS    RESTARTS   AGE
    dex-57c98bb9bb-l466d   2/2     Running   3          17m
    
  2. Verify that the pods in the cert-manager namespace are up and running. Check the pod status and verify that the STATUS field is Running and the READY field is 1/1:

    root@rok-tools:~# kubectl -n cert-manager get pods
    NAME                                       READY   STATUS    RESTARTS   AGE
    cert-manager-6d86476c77-qwgnj              1/1     Running   0          16m
    cert-manager-cainjector-5b9cd446fd-kl9gg   1/1     Running   0          16m
    cert-manager-webhook-64d967c45-jmxcz       1/1     Running   0          16m
    
  3. Verify that the pods in the istio-system namespace are up and running. Check the pod status and verify that the STATUS field is Running and the READY field is 1/1:

    root@rok-tools:~# kubectl -n istio-system get pods
    NAME                                    READY   STATUS    RESTARTS   AGE
    authservice-0                           1/1     Running   0          17m
    cluster-local-gateway-b76ff5885-2rjg5   1/1     Running   0          2m23s
    istio-ingressgateway-57f58bf544-x45kw   1/1     Running   0          19m
    istiod-68f6c899f5-wzjfm                 1/1     Running   0          19m
    
  4. Verify that the pods in the knative-monitoring namespace are up and running. Check the pod status and verify that the STATUS field is Running and the READY field is N/N, that is, all containers of each pod are ready:

    root@rok-tools:~# kubectl -n knative-monitoring get pods
    NAME                                  READY   STATUS    RESTARTS   AGE
    grafana-6695587d6f-ktf86              1/1     Running   0          2m41s
    kube-state-metrics-79ddb7fc64-w7s5m   1/1     Running   0          2m38s
    node-exporter-xlj2v                   2/2     Running   0          2m3s
    node-exporter-zfjh5                   2/2     Running   0          2m3s
    prometheus-system-0                   1/1     Running   0          2m3s
    prometheus-system-1                   1/1     Running   0          2m3s
    
  5. Verify that the pods in the knative-serving namespace are up and running. Check the pod status and verify that the STATUS field is Running and the READY field is 2/2:

    root@rok-tools:~# kubectl -n knative-serving get pods
    NAME                                READY   STATUS    RESTARTS   AGE
    activator-5d6754bc67-qb2ct          2/2     Running   0          2m47s
    autoscaler-6dd6dbbb84-zgwkf         2/2     Running   0          2m46s
    controller-687f6c6995-27fkw         2/2     Running   0          2m42s
    istio-webhook-8d4f5fbfb-tg6h4       2/2     Running   0          2m40s
    networking-istio-785675596f-nnqbr   2/2     Running   0          2m43s
    webhook-6d776d968c-gmnbz            2/2     Running   0          2m43s
    
  6. Verify that the pods in the kubeflow namespace are up and running. Check the pod status and verify that the STATUS field is Running and the READY field is N/N, that is, all containers of each pod are ready:

    root@rok-tools:~# kubectl -n kubeflow get pods
    NAME                                                         READY   STATUS    RESTARTS   AGE
    admission-webhook-deployment-5d4cf6bbdb-jszsw                2/2     Running   0          16m
    centraldashboard-fd8774874-56587                             2/2     Running   0          2m42s
    jupyter-web-app-deployment-7987d45c7d-5gwss                  2/2     Running   0          2m42s
    katib-controller-54f895f874-g29bx                            2/2     Running   2          2m41s
    katib-db-manager-6f5d8f5945-wmmnb                            2/2     Running   1          2m48s
    katib-mysql-857bfdb7f9-w5zj8                                 2/2     Running   0          2m39s
    katib-ui-696fc69ddc-jkk2x                                    2/2     Running   2          2m38s
    kfp-cache-d96f57c8b-5cjht                                    3/3     Running   4          2m46s
    kfserving-controller-manager-0                               3/3     Running   1          2m20s
    kubeflow-reception-9c67996fc-46djf                           2/2     Running   1          15m
    metadata-db-d48d67699-89fg9                                  2/2     Running   0          2m44s
    metadata-envoy-deployment-775b466c45-4gbkx                   1/1     Running   0          2m38s
    metadata-grpc-deployment-5c975cb96d-tq5vr                    2/2     Running   4          2m37s
    minio-7c9b6578cd-7f2tb                                       2/2     Running   0          2m35s
    ml-pipeline-7867b5b879-dgmnj                                 2/2     Running   0          2m41s
    ml-pipeline-persistenceagent-8495768cbb-vpfjt                2/2     Running   0          2m33s
    ml-pipeline-scheduledworkflow-7f58d84f9f-4pf7d               2/2     Running   0          2m37s
    ml-pipeline-ui-678cb55d6f-z9spc                              2/2     Running   0          2m32s
    ml-pipeline-viewer-crd-57768dc6c6-wtxjm                      2/2     Running   1          2m30s
    ml-pipeline-visualizationserver-68498d6df6-ms74w             2/2     Running   0          2m28s
    models-web-app-748f8776df-zrc66                              2/2     Running   0          2m34s
    mpi-operator-f658c675b-6jrln                                 1/1     Running   0          2m34s
    mxnet-operator-6594fb56b-q68pp                               1/1     Running   0          2m25s
    mysql-55d57856d7-bzvgd                                       2/2     Running   0          2m25s
    notebook-controller-deployment-6cf9974cd9-2p9mj              2/2     Running   1          2m25s
    profiles-deployment-64cf74dfd4-b6dx2                         3/3     Running   1          15m
    pvcviewer-controller-controller-manager-6dd55d9dfd-m5j8s     3/3     Running   1          2m23s
    pytorch-operator-74788b9d8c-prdsb                            2/2     Running   0          2m29s
    spark-operatorsparkoperator-5775c699bb-4xgn2                 2/2     Running   0          2m27s
    tensorboard-controller-controller-manager-7f766c8676-8g6fq   3/3     Running   2          2m22s
    tensorboards-web-app-deployment-6b4dfd598c-r9xgk             1/1     Running   0          2m25s
    tf-job-operator-d8b96567b-qj48v                              2/2     Running   1          2m22s
    volumes-web-app-deployment-7b58b4c478-btfmw                  2/2     Running   0          2m24s
    workflow-controller-76579565dd-8f6vw                         2/2     Running   1          2m22s
    xgboost-operator-deployment-7dcff8bf85-t9hvr                 2/2     Running   1          2m22s
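
The verification steps above ask you to read each pod listing and compare the READY and STATUS columns by eye. As a rough sketch, the same check can be scripted by filtering the output of kubectl get pods. The function name not_ready_pods is hypothetical, and the filter assumes the default five-column table layout shown in the listings above.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods` table output on stdin and
# print every pod whose STATUS is not Running or whose READY count is not N/N.
# Exit status: 0 if all listed pods are ready, 1 otherwise.
not_ready_pods() {
    awk '
        NR == 1 { next }                 # skip the header row
        NF < 3  { next }                 # skip blank or malformed lines
        {
            split($2, ready, "/")        # READY column, for example "2/2"
            if ($3 != "Running" || ready[1] != ready[2]) {
                print $1, $2, $3         # pod name, READY, STATUS
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Example, using the namespaces from the verification steps:
#   for ns in auth cert-manager istio-system knative-monitoring \
#             knative-serving kubeflow; do
#       kubectl -n "$ns" get pods | not_ready_pods \
#           || echo "namespace $ns has pods that are not ready"
#   done
```

Note that this sketch flags any pod that is not in the Running state, so it also reports pods that are still starting. Re-run it after a short while if the deployment has only just finished.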
    

Summary

You have successfully installed Kubeflow.

What's Next

The next step is to integrate Rok with the Kubeflow dashboard.