Kale Notebook Cell Types

To create a Kubeflow Pipeline (KFP) from a Jupyter Notebook using Kale, annotate each cell of your notebook with one of the six Kale cell types. Some cell types require a small number of parameters.

Kale uses the annotations you supply to define a Kubeflow pipeline. Each step of the pipeline will run in its own container in a Kubernetes deployment. The annotations you apply to cells in your notebook enable Kale to manage dependencies for each step and marshal data correctly as inputs and outputs for each step of a pipeline. See below for the list of cell types and a brief summary of each.

Cell type | Cell should contain
--- | ---
Imports | Blocks of code that import other modules your machine learning pipeline requires and may be needed by more than one step.
Functions | Functions used later in your machine learning pipeline; global variable definitions (other than pipeline parameters); and code that initializes lists, dictionaries, objects, and other values used throughout your pipeline.
Pipeline Parameters | Definitions for global variables used to parameterize your machine learning workflow. These are often training hyperparameters.
Pipeline Metrics | Lines of code that log or print values used to measure the success of your model.
Pipeline Step | Code that implements the core logic of a discrete step in your workflow.
Skip Cell | Any code that you want Kale to ignore.

Imports Cells

Annotate notebook cells with the label Imports to identify blocks of code that import other modules your machine learning pipeline requires.

Purpose

Imports cells help Kale identify all dependencies for pipeline steps. Kale prepends the code in Imports cells to the code specific to a pipeline step in the execution environment it creates for that step. See How Kale Creates a Pipeline Step for more detail.
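As an illustration, an Imports cell might contain nothing but import statements. The modules below are standard-library placeholders chosen so the sketch runs anywhere; a real notebook would list its own machine learning libraries instead.

```python
# Cell type: Imports
# Only import statements belong in this cell. Because this code is prepended
# to each pipeline step, every step sees the same modules.
import json
import math
import statistics
```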

Annotate Imports Cells

To annotate imports, edit the first cell containing import statements: click the pencil icon in the upper right corner, then select Cell type > Imports.

../../../_images/imports-cell.png

Note

If you don’t see the pencil icon, please enable Kale from the Kale Deployment Panel.

Functions Cells

Annotate notebook cells with the label Functions to identify blocks of code containing:

  • Functions used later in your machine learning pipeline.
  • Global variable definitions (other than pipeline parameters) and code that initializes lists, dictionaries, objects, and other values used throughout your pipeline.

Note

Though pipeline parameters are often written as global variables, you should annotate pipeline parameters using the Pipeline Parameters label. This will enable Kale to configure the Kubeflow pipeline it defines with the appropriate input parameters.

Purpose

Functions cells help Kale identify all dependencies for pipeline steps. Kale creates pipeline steps by prepending Imports cells followed by Functions cells to the code specific to a pipeline step in the execution environment it creates for that step. See How Kale Creates a Pipeline Step for more detail.
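A Functions cell could look like the sketch below. Every name in it (CLASS_NAMES, LABEL_TO_INDEX, normalize) is hypothetical and only illustrates the kind of code that belongs in this cell type: shared helpers and non-parameter globals.

```python
# Cell type: Functions
# Shared initialization used by more than one step (hypothetical values)
CLASS_NAMES = ["cat", "dog", "bird"]
LABEL_TO_INDEX = {name: index for index, name in enumerate(CLASS_NAMES)}


def normalize(values):
    """Scale a list of numbers linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # Avoid division by zero when every value is identical
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]
```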

Annotate Functions Cells

To identify functions, global variable declarations, and initialization code, edit the first cell in a block containing this code: click the pencil icon in the upper right corner, then select Cell type > Functions.

Note

If you don’t see the pencil icon, please enable Kale from the Kale Deployment Panel.

../../../_images/functions-cell.png

Pipeline Parameters Cells

Annotate notebook cells as Pipeline Parameters to identify blocks of code that define global variables used as inputs controlling how a machine learning pipeline runs. These should be values that you might vary as you compare the performance of pipeline runs with different settings.

Purpose

Kale uses the values in Pipeline Parameters cells to define Kubeflow Pipelines (KFP) PipelineParam objects and initializes the Kubeflow pipeline with these parameters. KFP includes pipeline parameter values in the artifacts it creates for pipeline runs, which makes it easier to review results from experiments that compare multiple runs of a pipeline.
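A Pipeline Parameters cell might look like the following sketch. The variable names and values are hypothetical training hyperparameters, and the sketch assumes parameters are written as plain assignments of simple literal values.

```python
# Cell type: Pipeline Parameters
# Hypothetical hyperparameters, one plain assignment per parameter
LEARNING_RATE = 0.001
BATCH_SIZE = 32
EPOCHS = 10
OPTIMIZER = "adam"
```

Keeping this cell free of computed values or function calls makes each assignment easy to read as a parameter name with a default value.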

Annotate Pipeline Parameters Cells

To annotate pipeline parameters, edit the first cell containing pipeline parameters: click the pencil icon in the upper right corner, then select Cell type > Pipeline Parameters.

Note

If you don’t see the pencil icon, please enable Kale from the Kale Deployment Panel.

../../../_images/pipeline-parameters-cell.png

Pipeline Step Cells

Annotate notebook cells with the label Pipeline Step to identify code that implements one of the main components or tasks of a machine learning workflow. A pipeline step typically represents a milestone in data preparation, training, evaluation, tuning, prediction or other phases of a workflow.

Kale creates pipeline steps by prepending Imports cells followed by Functions cells to cells annotated for a particular Pipeline Step. These cells together comprise the code Kale uses in the execution environment it creates for a pipeline step. See How Kale Creates a Pipeline Step for more detail.
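The ordering described above can be sketched as simple string concatenation. This is only an illustration of the idea, not Kale's actual implementation; the function assemble_step_source and the sample cell contents are hypothetical.

```python
def assemble_step_source(imports_cells, functions_cells, step_cells):
    """Join cell sources in the order: Imports, Functions, then step code."""
    ordered = list(imports_cells) + list(functions_cells) + list(step_cells)
    return "\n\n".join(ordered)


# Hypothetical cell contents, one string per annotated notebook cell
imports_cells = ["import math"]
functions_cells = ["def circle_area(radius):\n    return math.pi * radius ** 2"]
step_cells = ["area = circle_area(2.0)"]

step_source = assemble_step_source(imports_cells, functions_cells, step_cells)

# Run the assembled code in a fresh namespace, standing in for the separate
# container in which a real pipeline step would execute.
namespace = {}
exec(step_source, namespace)
```

Because the step code runs in its own namespace, it can only use names that the prepended Imports and Functions cells define, or that are passed in from earlier steps.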

Annotate Pipeline Step Cells

To identify code that implements a step in a machine learning workflow:

  1. Edit the first cell containing this code: click the pencil icon in the upper right corner, then select Cell type > Pipeline Step.

    Note

    If you don’t see the pencil icon, please enable Kale from the Kale Deployment Panel.

  2. Specify a unique step name.

  3. (Optional) Select one or more steps that the step depends on.

  4. (Optional) Specify that this step should run on a GPU node.

../../../_images/pipeline-step-cell.png

Step name Parameter

Step name is the label by which you reference a step in a pipeline. Choose a name that is unique and descriptive, because you will use it to define dependency relationships between steps in your pipeline.

Note

The step name must consist of only lowercase alphanumeric characters or '_'. The first character must be a lowercase letter.

../../../_images/enter-step-name.png
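The naming rule in the note above can be checked with a short regular expression. The helper is_valid_step_name is a hypothetical name used for illustration; it is not part of Kale.

```python
import re

# A lowercase letter first, then any number of lowercase letters, digits, or '_'
STEP_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_step_name(name):
    """Return True if name satisfies the step name rule."""
    return STEP_NAME_PATTERN.fullmatch(name) is not None
```

Under this check, names such as eval_custom and vgg16_classifier pass, while Eval_Custom (uppercase letters), 2nd_step (leading digit), and load-data (hyphen) fail.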

Depends on Parameter

The values you select for Depends on list the other steps that must execute before the step you are annotating.

../../../_images/depends-on-parameter.png

To add dependencies, use the Depends on pull-down menu to select each step whose output will serve as input for the step you are annotating.

In the example below, since the step eval_custom evaluates the model created in the step custom_classifier, we select that step from the Depends on pull-down menu.

../../../_images/select-dependency.png

When selecting steps using the Depends on pull-down menu, identify only steps that are immediate dependencies. Do not include all dependencies back through the machine learning pipeline.
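To make the idea of an immediate dependency concrete, the sketch below flags declared dependencies that are already implied by another declared dependency. The function redundant_dependencies and the dependency lists are assumptions made for illustration, and the graph is assumed to contain no cycles.

```python
def redundant_dependencies(depends_on):
    """Map each step to the declared dependencies it already reaches
    indirectly through another declared dependency."""

    def ancestors(step, seen=None):
        # Collect every step reachable by following dependencies upstream
        seen = set() if seen is None else seen
        for parent in depends_on.get(step, ()):
            if parent not in seen:
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    redundant = {}
    for step, parents in depends_on.items():
        extra = {
            parent
            for parent in parents
            for other in parents
            if other != parent and parent in ancestors(other)
        }
        if extra:
            redundant[step] = extra
    return redundant


# eval_custom lists process_data even though custom_classifier already
# depends on it, so process_data is not an immediate dependency here.
declared = {
    "process_data": [],
    "custom_classifier": ["process_data"],
    "eval_custom": ["custom_classifier", "process_data"],
}
redundant = redundant_dependencies(declared)
```

For this input, redundant maps eval_custom to the set containing process_data, which is the entry you would leave unselected in the Depends on menu.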

Together, the dependencies for all steps in a pipeline define the execution graph for that pipeline. This helps Kale determine, for example, whether there are branches of your pipeline that can run in parallel.

../../../_images/execution-graph.png

Specify Multiple Dependencies: A given step may depend on the outputs from more than one other step. The Depends on pull-down menu enables you to select as many other steps as necessary. Select each dependency one at a time.

Remove Dependencies: To remove a dependency already selected, select the name of that step again from the Depends on pull-down menu. The items in this menu function as toggles for specifying other steps as dependencies.
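The toggle behavior of the menu can be modeled as a set operation: selecting a step adds it when absent and removes it when present. The function toggle_dependency below is a hypothetical helper that only mirrors this behavior; it does not reflect how the Kale user interface is implemented.

```python
def toggle_dependency(selected, step_name):
    """Return a new set with step_name added if absent, removed if present."""
    return selected ^ {step_name}


selected = set()
selected = toggle_dependency(selected, "custom_classifier")  # added
selected = toggle_dependency(selected, "process_data")       # added
selected = toggle_dependency(selected, "process_data")       # removed again
```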

GPU Parameter

To require a step to run on a GPU, click the GPU button when annotating that step. In the modal that appears, enable the requirement using the toggle, then specify the number of GPUs and the type of GPU requested.

../../../_images/gpu-parameter.png

Parallel Pipeline Steps

Pipeline steps that are independent of one another can run in parallel. For example, the two steps represented below each depend on a step named process_data, but are otherwise independent. Kale uses the dependency graph reflected in the way you define pipeline steps to orchestrate pipeline runs, taking advantage of your Kubernetes infrastructure to run a pipeline as efficiently as possible.

../../../_images/parallel-pipeline-steps.png

The step vgg16_classifier can run in parallel with the step custom_classifier.

../../../_images/parallel-step.png
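One way to see which steps could run side by side is to group the dependency graph into batches of steps whose dependencies are all satisfied. The sketch below uses graphlib from the Python standard library (Python 3.9 or later). The dependency sets are assumed for illustration, and this is not how Kale or Kubeflow Pipelines schedules work internally.

```python
from graphlib import TopologicalSorter

# step -> immediate dependencies (assumed for illustration)
depends_on = {
    "process_data": set(),
    "custom_classifier": {"process_data"},
    "vgg16_classifier": {"process_data"},
    "eval_custom": {"custom_classifier"},
}


def parallel_batches(graph):
    """Group steps into batches whose members have no unmet dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Every step returned here can start at the same time
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = parallel_batches(depends_on)
```

For this graph, batches contains process_data alone, then custom_classifier together with vgg16_classifier, then eval_custom. A real scheduler can do better than strict batches: it may start eval_custom as soon as custom_classifier finishes, without waiting for vgg16_classifier.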

Skip Cells

Use Skip to annotate notebook cells that you want Kale to ignore as it defines a Kubeflow pipeline.

Purpose

Common uses of the Skip annotation include marking console logging and other diagnostic code that is useful while developing a step of a pipeline but is not part of your machine learning workflow.
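A Skip cell might hold a quick inspection like the one below. The sample data is hypothetical and is defined inline only so the sketch runs on its own; in a real notebook the values would come from earlier cells.

```python
# Cell type: Skip Cell
# Diagnostic output that helps during development but should not become
# part of the generated pipeline.
sample_losses = [0.92, 0.61, 0.48, 0.41]
print(f"epochs run: {len(sample_losses)}")
print(f"latest loss: {sample_losses[-1]}")
```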

Annotate Skip Cells

To annotate skip cells, edit the first cell containing code you want Kale to ignore: click the pencil icon in the upper right corner, then select Cell type > Skip Cell.

../../../_images/skip-cell.png

Pipeline Metrics Cells

Annotate a notebook cell with the label Pipeline Metrics to identify code that outputs the results you want to evaluate for a pipeline run.

Purpose

Based on the variables referenced in a Pipeline Metrics cell, Kale defines pipeline metrics that the Kubeflow Pipelines (KFP) system produces for every pipeline run. Kale also associates each metric with the step that produced it. Tracking pipeline metrics is essential for evaluating performance across multiple runs of a pipeline that have been parameterized differently or modified while the model is still in the experimental phase of development.

Pipeline metrics are also key to the AutoML capabilities of Kubeflow and Kale. For example, you will need to choose a single pipeline metric as the search objective metric for hyperparameter tuning experiments.
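As a rough illustration of how variables referenced in a metrics cell could be identified, the sketch below parses a cell and collects the names passed directly to print(). It assumes metrics are written as one print call per variable. The function metric_names and the variable names are hypothetical, and this is not Kale's actual implementation.

```python
import ast


def metric_names(cell_source):
    """Return the variable names passed directly to print() in a cell."""
    names = []
    for node in ast.walk(ast.parse(cell_source)):
        is_print_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
        )
        if is_print_call:
            # Keep bare variable references; ignore literals and expressions
            names.extend(arg.id for arg in node.args if isinstance(arg, ast.Name))
    return names


# Hypothetical Pipeline Metrics cell: one print call per metric variable
metrics_cell = "print(accuracy)\nprint(validation_loss)"
found = metric_names(metrics_cell)
```

Here found holds accuracy and validation_loss. The cell source is only parsed, never executed, so the variables do not need to exist for the names to be discovered.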

Annotate Pipeline Metrics Cell

Note

Pipeline metrics should be considered the result of pipeline execution, not the result of an individual step. Annotate only one cell with Pipeline Metrics, and make that cell the last cell in your notebook.

To identify pipeline metrics, edit the cell containing pipeline metrics statements: click the pencil icon in the upper right corner, then select Cell type > Pipeline Metrics.

Note

If you don’t see the pencil icon, please enable Kale from the Kale Deployment Panel.

../../../_images/pipeline-metrics-cell.png