Kubeflow

This guide describes how to deploy Kubeflow alongside Rok, using installation manifests provided by Arrikto. For example:

$ git clone https://github.com/arrikto/deployments
$ cd deployments

EKF (Enterprise Kubeflow)

Configure Authentication

EKF authenticates users using OIDC. We use Dex as our default OIDC Provider and AuthService as our OIDC Client (authenticating proxy). If you have another OIDC Provider (e.g., GitLab), you can skip installing Dex. In this section we describe how to set up authentication for EKF, using Dex and AuthService.

Specifically:

  • Change password of default user
  • Change credentials of OIDC client

By default, Dex is installed with a single static user. To change the default user’s password or create new users, one has to modify Dex’s ConfigMap. To change the password of the default user:

  1. Pick a password for the default user (whose handle is user) and hash it using bcrypt:

    $ python3 -c 'from passlib.hash import bcrypt; import getpass; print(bcrypt.using(rounds=12, ident="2y").hash(getpass.getpass()))'
    
  2. Edit kubeflow/manifests/dex-auth/dex-crds/overlays/deploy/patches/config-map.yaml and fill the relevant field with the hash of the password you chose:

    ...
      staticPasswords:
      - email: user
        hash: <enter the generated hash here>
    
  3. Generate OIDC Client credentials for the AuthService. These credentials are used by the AuthService to authenticate to Dex. The credentials must be filled in both Dex and AuthService kustomizations:

    $ export OIDC_CLIENT_ID="authservice"
    $ export OIDC_CLIENT_SECRET="$(openssl rand -base64 32)"
    $ j2 kubeflow/manifests/dex-auth/dex-crds/overlays/deploy/secret_params.env.j2 -o kubeflow/manifests/dex-auth/dex-crds/overlays/deploy/secret_params.env
    $ j2 kubeflow/manifests/istio/oidc-authservice/overlays/deploy/secret_params.env.j2 -o kubeflow/manifests/istio/oidc-authservice/overlays/deploy/secret_params.env
    
  4. Commit changes locally:

    $ git commit -am "kubeflow: Configure authentication"
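
Before committing, it can help to sanity-check that the value pasted into config-map.yaml has the shape of a bcrypt hash rather than a plaintext password. The sketch below is not part of the Arrikto tooling: looks_like_bcrypt is a hypothetical helper name, and the check assumes only the standard 60-character bcrypt layout (a $2a$, $2b$ or $2y$ prefix, a two-digit cost, then 53 characters of salt and digest). It validates the format only, not that the hash corresponds to your password.

```shell
# Hypothetical helper (not part of rok-deploy or Dex): checks that a string
# has the shape of a bcrypt hash: "$2a$", "$2b$" or "$2y$", a two-digit
# cost, then 53 characters from the bcrypt base64 alphabet.
looks_like_bcrypt() {
    printf '%s\n' "$1" | grep -Eq '^\$2[aby]\$[0-9]{2}\$[./A-Za-z0-9]{53}$'
}
```

Called with the generated hash as its only argument, the function returns zero when the format looks right and non-zero otherwise.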
    

Deploy Kubeflow

Kubeflow is deployed in an edit/commit/apply manner as well.

More specifically:

  1. Edit: Kubeflow comes with some default kustomizations. Change them by editing the deploy overlay of those kustomizations.
  2. Commit: Commit all changes to git.
  3. Apply: Apply everything to Kubernetes.

Follow the steps below to deploy Kubeflow:

  1. If you have already deployed Rok, delete Dex and AuthService, as we’re going to install them as part of Kubeflow:

    $ kubectl delete -k rok/rok-external-services/dex/overlays/deploy
    $ kubectl delete -k rok/rok-external-services/authservice/overlays/deploy
    
  2. Deploy Kubeflow with:

    $ rok-deploy --apply install/kubeflow
    

Integrate Rok with the Kubeflow Dashboard

To integrate Rok with the Kubeflow dashboard, so that you can visit it from the “Snapshot Store” tab in the Kubeflow UI, you need to:

  1. Go to the deployment repository:

    $ cd ~/ops/deployments
    
  2. Edit rok/rok-cluster/overlays/deploy/patches/configvars.yaml and add the gw.ui.kubeflow_dashboard_enabled: true config variable, like so:

    ...
    configVars:
      daemons.s3d.use_iam_role: true
      gw.ui.kubeflow_dashboard_enabled: true # <-- Copy this line.
    
  3. Commit the new option:

    $ git add rok/rok-cluster/overlays/deploy
    $ git commit -m "Enable Kubeflow dashboard integration"
    
  4. Re-apply the Rok cluster overlay:

    $ rok-deploy --apply rok/rok-cluster/overlays/deploy
    

Set up namespaces

When a user logs in to Kubeflow for the first time, a Profile is created. This, in turn, leads to the automatic creation of a dedicated namespace along with some necessary resources.

Still, for the user to be able to access Rok and Kubeflow Pipelines we need to set up some extra resources.

Create Profile (optional)

The steps below are performed automatically upon a user’s first login. The admin can “simulate” this procedure, i.e., create profiles manually, for example to set up namespaces in bulk. To do so, for each user:

  1. Specify the user ID, i.e., the one they will use to log in:

    $ export USER=user@example.com
    

    Note

    USER depends on the way authservice is configured, i.e., via USERID_CLAIM, and can be a username or an email.

    Note

    In case you want to create a shared namespace (see Enable namespace sharing below), the USER can have a dummy value, i.e., there is no need for having an actual user in your OIDC provider.

  2. Specify the name of the namespace that corresponds to this user. It should be a DNS-1123 compatible name, with a kubeflow- prefix:

    $ export NAMESPACE=kubeflow-${USER//[^a-zA-Z0-9\-]/-}
    
  3. Create the profile based on the given template:

    $ cd ~/ops/deployments/kubeflow/manifests
    $ mkdir -p namespace-resources/profiles
    $ j2 namespace-resources/profile.yaml.j2 -o namespace-resources/profiles/$NAMESPACE.yaml
    
  4. Commit changes:

    $ git add namespace-resources/profiles/$NAMESPACE.yaml
    $ git commit -am "Create Profile for $USER"
    
  5. Apply changes:

    $ kubectl apply -f namespace-resources/profiles/$NAMESPACE.yaml
    
  6. Wait for the namespace to be created:

    $ while ! kubectl get ns $NAMESPACE; do sleep 1; done
    
  7. Check that the necessary ServiceAccounts and RoleBindings are created:

    $ kubectl get -n $NAMESPACE serviceaccounts
    NAME              SECRETS   AGE
    default           1         19d
    default-editor    1         19d
    default-viewer    1         19d
    $ kubectl get -n $NAMESPACE rolebindings
    NAME                             AGE
    default-editor                   19d
    default-viewer                   19d
    namespaceAdmin                   19d
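
The namespace derivation in step 2 can be sketched as a small function. This is a minimal sketch, not Arrikto tooling: profile_namespace is a hypothetical name, and it uses sed instead of the bash-only ${USER//...} expansion so that it also runs under a POSIX sh. It additionally lowercases the result, on the assumption that your user IDs may contain uppercase letters, which DNS-1123 labels do not allow.

```shell
# Hypothetical helper: derive a namespace name from a user ID by replacing
# every character outside [a-zA-Z0-9-] with a dash, as in step 2 above.
# Lowercasing is an added assumption, since DNS-1123 labels are lowercase.
profile_namespace() {
    sanitized=$(printf '%s\n' "$1" | sed 's/[^a-zA-Z0-9-]/-/g' | tr 'A-Z' 'a-z')
    printf 'kubeflow-%s\n' "$sanitized"
}

# Prints: kubeflow-user-example-com
profile_namespace "user@example.com"
```

Note that DNS-1123 labels are also limited to 63 characters, so very long user IDs may need to be shortened by hand.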
    

Tweak namespace (mandatory)

The remainder of this section describes how to set up existing namespaces to enable access to Rok and Kubeflow Pipelines.

Important

You need to follow these instructions, and create a new overlay, for every namespace you wish to set up.

In kubeflow/manifests/ you will find a directory called namespace-resources/. This contains a base/ kustomization, along with a kustomization.yaml.j2.

  1. Specify the namespace to be configured:

    $ export NAMESPACE=kubeflow-user-example-com
    
  2. Switch to the kubeflow/manifests/ directory:

    $ cd ~/ops/deployments/kubeflow/manifests
    
  3. Create a new directory for the namespace:

    $ mkdir -p namespace-resources/overlays/$NAMESPACE
    
  4. Create the new overlay:

    $ j2 namespace-resources/kustomization.yaml.j2 -o namespace-resources/overlays/$NAMESPACE/kustomization.yaml
    
  5. Commit the changes:

    $ git add namespace-resources/overlays/$NAMESPACE
    $ git commit -m "Set up namespace '$NAMESPACE' with access to Rok and KFP"
    
  6. Apply the kustomization:

    $ rok-deploy --apply namespace-resources/overlays/$NAMESPACE
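
Because these steps must be repeated for every namespace, a dry-run helper that prints the commands for a given namespace can make bulk setup easier to review. The sketch below only echoes the commands from steps 3 to 6; print_namespace_setup is a hypothetical name, and nothing is executed against git or the cluster. It assumes the printed commands are run from kubeflow/manifests/ and that the j2 template reads NAMESPACE from the environment, as in the steps above.

```shell
# Hypothetical dry-run helper: prints, but does not run, the commands from
# steps 3-6 for one namespace, so they can be reviewed before execution.
print_namespace_setup() {
    ns=$1
    overlay="namespace-resources/overlays/$ns"
    echo "mkdir -p $overlay"
    echo "NAMESPACE=$ns j2 namespace-resources/kustomization.yaml.j2 -o $overlay/kustomization.yaml"
    echo "git add $overlay"
    echo "git commit -m \"Set up namespace '$ns' with access to Rok and KFP\""
    echo "rok-deploy --apply $overlay"
}

# Example: review the commands for two namespaces (the names are placeholders).
for ns in kubeflow-user-example-com kubeflow-shared; do
    print_namespace_setup "$ns"
done
```

Printing rather than executing keeps the edit/commit/apply flow explicit: the output can be inspected first and then run step by step.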
    

Enable namespace sharing

This section describes how to share a namespace among users. It makes use of the namespace-permissions base kustomization and creates an overlay using existing templates.

Important

The namespace should already exist and be configured as described in the Set up namespaces section.

Important

You need to follow the instructions below, i.e., generate, commit and apply an overlay, for every namespace and every user you wish to set up.

  1. Set the namespace to be shared, the ID of the user who will access the namespace, and the role that the user will have inside the namespace. For example:

    $ export NAMESPACE=kubeflow-shared
    $ export USER=user1@example.com
    $ export ROLE=edit
    

    Note

    ROLE can be one of view/edit/admin.

  2. Set the name prefix for the K8s resources that will be generated:

    $ export NAME=${USER//[^a-zA-Z0-9\-]/-}-$ROLE
    $ export OVERLAY=$NAMESPACE-$NAME
    

    Note

    We need a unique, DNS-1123 compatible name (prefix) for all K8s resources that will be created. Here we replace all invalid characters of USER (which is usually an email) with a dash.

  3. Switch to the kubeflow/manifests/ directory:

    $ cd kubeflow/manifests
    
  4. Create the new overlay:

    $ mkdir -p namespace-permissions/overlays/$OVERLAY
    $ j2 namespace-permissions/kustomization.yaml.j2 -o namespace-permissions/overlays/$OVERLAY/kustomization.yaml
    $ j2 namespace-permissions/params.env.j2 -o namespace-permissions/overlays/$OVERLAY/params.env
    
  5. Commit the changes:

    $ git add namespace-permissions/overlays/$OVERLAY
    $ git commit -m "Assign '$ROLE' access on namespace '$NAMESPACE' to user '$USER'"
    
  6. Apply the kustomization:

    $ rok-deploy --apply namespace-permissions/overlays/$OVERLAY
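
The naming scheme from steps 1 and 2 can be sketched as a function that also rejects roles other than view, edit and admin. This is a minimal sketch under stated assumptions: sharing_overlay_name is a hypothetical helper name, not part of the Arrikto tooling, and it uses sed in place of the bash-only ${USER//...} expansion so that it runs under a POSIX sh.

```shell
# Hypothetical helper: compute the overlay name used in steps 2-6 from a
# namespace, a user ID and a role, refusing roles other than view/edit/admin.
sharing_overlay_name() {
    namespace=$1 user=$2 role=$3
    case "$role" in
        view|edit|admin) ;;
        *) echo "unsupported role: $role" >&2; return 1 ;;
    esac
    # Same substitution as NAME above: invalid characters become dashes.
    name="$(printf '%s\n' "$user" | sed 's/[^a-zA-Z0-9-]/-/g')-$role"
    printf '%s-%s\n' "$namespace" "$name"
}

# Prints: kubeflow-shared-user1-example-com-edit
sharing_overlay_name kubeflow-shared user1@example.com edit
```

Validating the role up front avoids generating and committing an overlay that would later fail when applied.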
    

Access Private Registries

To be able to pull images from a private registry, first create a Secret following the Official Kubernetes Guide.

To enable Jupyter Web App to create Notebooks using images from a private registry, create a PodDefault using the previously created Secret as imagePullSecret like:

apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: access-prv-registry
spec:
  desc: Allow access to private registry
  selector:
    matchLabels:
      registry-pull-secret: "true"
  imagePullSecrets:
  - name: <secret_name>

Notice the selector.matchLabels field: this PodDefault will be applied to every Pod that carries the label registry-pull-secret: "true". Jupyter Web App will now show this new PodDefault in the “Configurations” section. If you want the PodDefault to be selected by default, edit kubeflow/manifests/jupyter/jupyter-web-app/overlays/deploy/patches/config-map.yaml and append the above label, i.e., registry-pull-secret, to the existing spawnerFormDefaults.configurations.value list.

Then commit and apply the changes:

$ git commit -am "Allow pulling private images when creating Notebooks"
$ rok-deploy --apply kubeflow/manifests/jupyter/jupyter-web-app/overlays/deploy

Finally, restart the JWA pod so that it “sees” the change in the jupyter-web-app-config ConfigMap:

$ kubectl delete pods -n kubeflow -l app.kubernetes.io/name=jupyter-web-app

Make sure you also commit both the Secret and the PodDefault to the GitOps repo.

Verify

Congratulations, you have just deployed Kubeflow! Please follow the sections below to ensure that all Kubeflow services are running as expected. To do so, we will guide you through running end-to-end data science workflows with Kale. Kale is a workflow tool that lets you orchestrate Kubeflow pipelines starting from a Jupyter notebook.

Dashboard access

Ensure that the Kubeflow dashboard is working:

  1. Navigate to https://demo.example.com.
  2. Log in to the dashboard as user with the password you generated in the Configure Authentication section.
  3. Verify you are redirected to the Kubeflow dashboard.

Notebook to Pipeline using Kale

Running this workflow will validate that:

  • You can self-serve Jupyter Notebooks.
  • Kale can communicate with Kubeflow and Rok.
  • Kale can create Kubeflow pipelines.
  • Rok can take snapshots of notebooks and pipeline steps.
  • You can exploit Rok snapshots to recover a notebook.

  1. Follow the Run a Pipeline section of our Kale tutorial to run a pipeline from inside your notebook.
  2. Follow the Reproducibility with Volume Snapshots section to reproduce a pipeline step using volume snapshots, inspect its state, debug it and finally fix the code.

Hyperparameter Tuning with Kale and Katib

Running this workflow will validate that:

  • Katib has been deployed successfully and works as expected.
  • Kale can create and submit a Katib experiment.
  • Repeatable pipeline steps can be cached, using Rok’s PVC caching mechanism.

Head over to the Run a pipeline from inside your notebook section of our DogBreed tutorial and follow the instructions.

Recurring pipeline

Running this workflow will validate that:

  • You can create recurring Kubeflow pipelines.
  • You can manage your recurring jobs from the Kubeflow UI.

This section will guide you through setting up a recurring run using a simple notebook that consists of three steps, each of which simply runs dd on local files. This simulates an I/O-intensive pipeline that also stresses the Rok snapshot mechanism.

  1. Navigate to the Notebooks dashboard.

  2. Create a notebook server using the default configuration.

  3. Connect to the Lab and open a terminal.

  4. Download the example notebook:

    $ wget <download_root>/kubeflow-validation.ipynb
    
  5. Click on the downloaded file to open the notebook.

  6. Enable Kale by clicking the corresponding icon in the left sidebar.

  7. Click Compile and Run, and watch the notebook get snapshotted, a pipeline get created, and a run get started.

  8. Then navigate to the Jobs dashboard.

  9. Click on Create run.

  10. Choose the pipeline that Kale just created.

  11. Pick a name for the run.

    Note

    This name will be used for naming generated workflows.

  12. Choose the experiment name.

    Note

    The name of the experiment will be used by Kale for creating Rok buckets.

  13. For Run type choose Recurring.

  14. For Trigger type choose Periodic.

  15. Set Maximum concurrent runs to 1 to avoid having more than one run in parallel.

  16. Un-check the Catchup checkbox to disable any backfilling in case the job gets paused.

  17. Set Run every to 15 Minutes.

  18. Click on Start.

  19. Go to the Runs/Trials -> Pipelines dashboard and watch your recurring runs start.

    Note

    The first run will be created after the given interval has passed.

Tweak Kubeflow

In this section, we document common tweaks needed in a Kubeflow deployment.

Extend the list of default images of Jupyter Web App

  1. Go to the deployment repository:

    $ cd ~/ops/deployments
    
  2. Edit the Jupyter Web App config kubeflow/manifests/jupyter/jupyter-web-app/overlays/deploy/patches/config-map.yaml and add your custom image to the Jupyter Web App configuration:

    data:
      spawner_ui_config.yaml: |
        spawnerFormDefaults:
          image:
            # The container Image for the user's Jupyter Notebook
            # If readonly, this value must be a member of the list below
            value: <your_custom_image>
            # The list of available standard container Images
            options:
              - <your_custom_image>
              - gcr.io/arrikto/jupyter-kale:4a5d7e63-9f74f267
    
  3. Commit the new configuration:

    $ git add kubeflow/manifests/jupyter/jupyter-web-app/overlays/deploy/patches/config-map.yaml
    $ git commit -m "kubeflow: Update Jupyter Web App image list"
    
  4. Follow the Upgrade guide to re-apply Kubeflow.

Tweak Dex

In this section we describe how to tweak the default Dex deployment (see also the corresponding section in the official Kubeflow docs). Specifically:

  • change password of default user
  • change frontend theme and issuer
  • change credentials of OIDC client

By default, Dex is installed with a single static user. To change its password or create new users, one has to modify the ConfigMap patch.

Note

Here we avoid modifying params.env and using variable substitution inside the ConfigMap, since that approach limits us to a single user.

First, pick a password and hash it, then generate a UUID:

$ python3 -c 'from passlib.hash import bcrypt; import getpass; print(bcrypt.using(rounds=12, ident="2y").hash(getpass.getpass()))'
$ cat /proc/sys/kernel/random/uuid

Edit kubeflow/manifests/dex-auth/dex-crds/overlays/deploy/patches/config-map.yaml and fill in the hash and userID of the static user:

apiVersion: v1
kind: ConfigMap
metadata:
  name: dex
data:
   ...
  staticPasswords:
  - email: user
    hash: <enter the generated hash here>
    username: user
    userID: <enter the random UUID here>

Change the default frontend theme and issuer. Edit the ConfigMap again and set the frontend fields:

apiVersion: v1
kind: ConfigMap
metadata:
  name: dex
data:
  config.yaml: |
    frontend:
      dir: /arrikto_web
      issuer: Kubeflow
      theme: ekf

Finally, to change the OIDC client credentials, edit kubeflow/manifests/dex-auth/dex-crds/overlays/deploy/secret_params.env and set:

OIDC_CLIENT_ID=kubeflow-oidc-authservice
OIDC_CLIENT_SECRET=pUBnBOY80SnXgjibTYM9ZWNzY2xreNGQok

Important

If you change the default values, you should also update the corresponding secret of the oidc-authservice component so that it matches, i.e., edit kubeflow/manifests/istio/oidc-authservice/overlays/deploy/secret_params.env.
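
One way to catch a mismatch before applying is to compare the two secret_params.env files. The following is a minimal sketch, assuming both files contain plain KEY=VALUE lines for OIDC_CLIENT_ID and OIDC_CLIENT_SECRET as shown above; oidc_creds_match is a hypothetical helper name and is not part of rok-deploy.

```shell
# Hypothetical helper: succeeds only if both env files define identical
# OIDC_CLIENT_ID and OIDC_CLIENT_SECRET lines (assumes plain KEY=VALUE files).
oidc_creds_match() {
    for key in OIDC_CLIENT_ID OIDC_CLIENT_SECRET; do
        # A missing key in either file counts as a mismatch.
        a=$(grep "^$key=" "$1") || return 1
        b=$(grep "^$key=" "$2") || return 1
        [ "$a" = "$b" ] || { echo "$key differs between $1 and $2" >&2; return 1; }
    done
}
```

Called with the paths of the Dex and AuthService secret_params.env files, it returns non-zero and names the key that differs, without printing the secret itself.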

Commit and apply changes:

$ git commit -am "dex: Change default user, OIDC client creds and theme"
$ rok-deploy --apply kubeflow/manifests/dex-auth/dex-crds/overlays/deploy

For changes to take effect we have to restart the pods manually:

$ kubectl delete pods -n auth -l app=dex
$ kubectl delete pods -n istio-system -l app=authservice

Upgrade

Kubeflow manifests are upgraded in a fetch/rebase/apply manner.

To upgrade Kubeflow:

  1. Follow the Upgrade manifests guide.

  2. Re-apply the manifests:

    $ rok-deploy --apply install/kubeflow
    
  3. Finally, make sure to validate the updated Kubeflow deployment by following the Verify section.

Cleanup

To delete a Kubeflow deployment:

  1. Go to your deployments repository:

    $ cd ~/ops/deployments/
    
  2. Purge your Kubeflow installation:

    $ kustomize build install/kubeflow | kubectl delete -f -
    

    Important

    This will delete all Kubeflow related resources including the kubeflow namespace and all user namespaces, i.e., kubeflow-XXX.

    Note

    If you come across the following error about a resource X, you may ignore it:

    Error from server (NotFound): error when deleting "STDIN": ...
    

    This occurs when the deletion of some other resource triggers the deletion of X prior to the kubectl delete.

    Note

    If a user namespace, e.g., kubeflow-user, is stuck in the Terminating phase, please take a look at our Troubleshooting FAQ, which may contain useful information on how to proceed.