Configure Notebook Culling

The Notebook Controller periodically checks the state of every Notebook Server. You can see the Last activity of each Notebook Server in the corresponding column of the Notebooks UI. Based on the execution state of the kernels, the Notebook Controller updates the notebooks.kubeflow.org/last-activity annotation of each Notebook CR (Custom Resource). If at least one kernel is busy performing computations, the Last activity of the Notebook Server is the current time. If no kernel is performing computations, the Last activity is the time at which the most recent kernel finished its computations.
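
The rule above can be sketched in a few lines of Python. This is only an illustration, not the controller's code: the function name last_activity is hypothetical, and the kernel records are assumed to be dictionaries with an execution_state string and an already parsed last_activity timestamp, similar to what the Jupyter kernels API reports. Returning None when there are no kernels is a choice made for this sketch.

```python
from datetime import datetime, timezone


def last_activity(kernels, now):
    """Return the Last activity implied by the kernels' execution states.

    `kernels` is an iterable of dicts with the (assumed) keys
    "execution_state" and "last_activity" (a datetime).
    """
    kernels = list(kernels)
    if not kernels:
        # Sketch-only choice: with no kernels there is nothing to report.
        return None
    # At least one busy kernel: the Last activity is the current time.
    if any(k["execution_state"] == "busy" for k in kernels):
        return now
    # No busy kernel: take the most recent time a kernel finished working.
    return max(k["last_activity"] for k in kernels)


# Example data: two kernels, both idle at check time.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
finished_early = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
finished_late = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)
idle_kernels = [
    {"execution_state": "idle", "last_activity": finished_early},
    {"execution_state": "idle", "last_activity": finished_late},
]
result = last_activity(idle_kernels, now)
```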

The culling feature allows you to stop a Notebook Server based on its Last activity. The following table lists the parameters you can define in the ConfigMap. Each parameter sets the value of the corresponding environment variable of the Notebook Controller. Together, these values form your culling policy.

Culling Policy Parameters
| Parameter Name | Default Value | Description |
| --- | --- | --- |
| ENABLE_CULLING | “false” | If set to “true” then the Notebook Controller will scale to zero all Notebooks with Last activity older than the CULL_IDLE_TIME. |
| CULL_IDLE_TIME | “1440” (minutes) | If a Notebook’s age from the Last activity until the current timestamp exceeds this value, then the Notebook will be scaled to zero (culled). ENABLE_CULLING must be set to “true” for this setting to take effect. |
| IDLENESS_CHECK_PERIOD | “1” (minutes) | How frequently the controller should poll each Notebook to update its Last activity. |

If you have enabled culling and the Last activity of a Notebook Server is older than CULL_IDLE_TIME, then the Notebook Controller will cull this Notebook Server.

Note

This means that the Notebook Server will stop. The Notebook Server will not get deleted and the PVCs will not be affected. When starting their Notebooks again, the users can resume their work without any data loss.
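
The culling decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the controller's implementation: the function name should_cull is hypothetical, the parameters are passed as strings because that is how they appear in the ConfigMap, and treating only the string "true" (case-insensitively) as enabled is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone


def should_cull(last_activity, now, enable_culling="false", cull_idle_time="1440"):
    """Decide whether a Notebook Server would be culled.

    The defaults mirror the default values listed in the parameter table.
    """
    # Assumption of this sketch: only "true" enables culling.
    if enable_culling.strip().lower() != "true":
        return False
    # Time elapsed since the Last activity, compared against CULL_IDLE_TIME (minutes).
    idle_for = now - last_activity
    return idle_for > timedelta(minutes=int(cull_idle_time))


# Example: a Notebook Server that has been idle for 60 minutes.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)
culled_with_30_minute_policy = should_cull(last, now, "true", "30")
culled_with_default_policy = should_cull(last, now, "true")
```

With a 30-minute policy the idle Notebook Server in this example qualifies for culling, while the default of 1440 minutes leaves it running.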

This guide will walk you through setting a culling policy for your Notebook Controller.

Procedure

  1. Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments
  2. Uncomment the following line in the kubeflow/manifests/apps/jupyter/notebook-controller/upstream/overlays/deploy/kustomization.yaml:

    patchesStrategicMerge:
    [...]
    #- patches/culler-config-map.yaml # <-- Uncomment this line to enable culling.
  3. Edit the Notebook Controller config at kubeflow/manifests/apps/jupyter/notebook-controller/upstream/overlays/deploy/patches/culler-config-map.yaml and set the values for the parameters:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: config
    data:
      ENABLE_CULLING: "true" # <-- Update this line with your desired value.
      CULL_IDLE_TIME: "30" # <-- Update this line with your desired value.
      IDLENESS_CHECK_PERIOD: "1" # <-- Update this line with your desired value.
  4. Commit your changes:

    root@rok-tools:~/ops/deployments# git commit -am "kubeflow: Configure Notebook Culling"
  5. Reapply the kustomization:

    root@rok-tools:~/ops/deployments# rok-deploy --apply kubeflow/manifests/apps/jupyter/notebook-controller/upstream/overlays/deploy

Verify

  1. Get the Notebook Controller pod name:

    root@rok-tools:~# export POD=$(kubectl get pod -n kubeflow \
    >     -l app=notebook-controller -o jsonpath="{.items[0].metadata.name}") \
    >     && echo ${POD}
    notebook-controller-deployment-54884d6854-gzs2r
  2. Get the environment variables of the Notebook Controller container:

    root@rok-tools:~# kubectl exec -n kubeflow ${POD} -c manager -- printenv | \
    >     grep -E "IDLE|CULL"
    ENABLE_CULLING=true
    CULL_IDLE_TIME=30
    IDLENESS_CHECK_PERIOD=1

    Note

    Make sure the above environment variables have the values you defined previously.
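
If you prefer to compare the values programmatically, the check above could be sketched as follows. The helper names parse_env and find_mismatches are hypothetical, and the sample text simply repeats the output shown in step 2; in practice you would paste or pipe in the output of your own command.

```python
def parse_env(printenv_output):
    """Parse KEY=VALUE lines, as printed by printenv, into a dict."""
    env = {}
    for line in printenv_output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that do not contain '='
            env[key.strip()] = value.strip()
    return env


def find_mismatches(expected, actual):
    """Return {name: (expected, actual)} for values that differ or are missing."""
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


# The values set in the ConfigMap during the procedure.
expected = {
    "ENABLE_CULLING": "true",
    "CULL_IDLE_TIME": "30",
    "IDLENESS_CHECK_PERIOD": "1",
}
# Sample output copied from the verification step.
sample_output = "ENABLE_CULLING=true\nCULL_IDLE_TIME=30\nIDLENESS_CHECK_PERIOD=1\n"
mismatches = find_mismatches(expected, parse_env(sample_output))
```

An empty mismatches dictionary means the running container has picked up the values you defined.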

Summary

You have successfully configured your Notebook Culling policy.

What’s Next

Check out the rest of the operations you can perform on your Kubeflow deployment.