Create Parameterized Pipelines

This guide will walk you through parameterizing a Kubeflow Pipeline using the Kale SDK.

Overview

What You’ll Need
Procedure
Summary
What’s Next

What You’ll Need

An EKF or MiniKF deployment with the default Kale Docker image.
An understanding of how Kale SDK works.

Procedure

Create a new Notebook server using the default Kale Docker image. The image will have the following naming scheme:

gcr.io/arrikto/jupyter-kale-py38:<IMAGE_TAG>

Note

The <IMAGE_TAG> varies based on the MiniKF or EKF release.

Connect to the server, open a terminal, and install scikit-learn:

$ pip3 install --user scikit-learn==0.23.0

Create a new Python file and name it kale_parameters.py:

$ touch kale_parameters.py

Copy and paste the following code inside kale_parameters.py:

sdk.py

# Copyright © 2021-2022 Arrikto Inc.  All Rights Reserved.

"""Kale SDK.

This script trains an ML pipeline to solve a binary classification task.
"""

from kale.sdk import pipeline, step
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


@step(name="data_loading")
def load(random_state):
    """Create a random dataset for binary classification."""
    rs = int(random_state)
    x, y = make_classification(random_state=rs)
    return x, y


@step(name="data_split")
def split(x, y):
    """Split the data into train and test sets."""
    x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
    return x, x_test, y, y_test


@step(name="model_training")
def train(x, x_test, y, training_iterations):
    """Train a Logistic Regression model."""
    iters = int(training_iterations)
    model = LogisticRegression(max_iter=iters)
    model.fit(x, y)
    print(model.predict(x_test))


@pipeline(name="binary-classification", experiment="kale-tutorial")
def ml_pipeline(rs=42, iters=100):
    """Run the ML pipeline."""
    x, y = load(rs)
    x, x_test, y, y_test = split(x=x, y=y)
    train(x, x_test, y, training_iterations=iters)


if __name__ == "__main__":
    ml_pipeline(rs=42, iters=100)

This code sample is a standard Python script that trains a Logistic Regression model, with its functions decorated using the Kale SDK. To read more about how to create this file, head to the corresponding Kale SDK user guide.

The pipeline resulting from the compilation of this Python script will have two parameters:

rs: to pass a random seed to the dataset generator, with a default value of 42
iters: to set the maximum number of training iterations for the model (passed to LogisticRegression as max_iter), with a default value of 100
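
Both steps in the script cast their parameter with int() before using it. The guide does not say why; one plausible reason, stated here as an assumption, is that a value supplied from outside the script (for example, typed into a form) may arrive as a string. The plain-Python sketch below shows the effect of that cast. It does not use Kale, and coerce_params is a hypothetical helper introduced only for illustration.

```python
def coerce_params(rs=42, iters=100):
    """Hypothetical helper: normalize both pipeline parameters to integers.

    Mirrors the int() casts inside load() and train(); the defaults match
    the ones declared on ml_pipeline.
    """
    return int(rs), int(iters)


# Integer defaults pass through unchanged.
print(coerce_params())

# String inputs (an assumed scenario) are converted to integers.
print(coerce_params("7", "250"))
```

With the cast in place, the steps behave the same whether they receive 42 or "42".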

Note

You should always provide default values for the parameters. These defaults will end up in the definition of the uploaded pipeline. You can override them by calling the pipeline function with new argument values, or set different values when creating a Run from the KFP UI. Head to the KFP macros guide to learn how to provide dynamic values as input to your pipelines.
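
As a rough illustration of how defaults and overrides relate, the following plain-Python sketch uses a stand-in function with the same signature as ml_pipeline. It omits the Kale decorator, so it neither compiles nor submits anything. Reading the defaults through inspect.signature is only an illustration; it is not a description of how Kale extracts them. The helper declared_defaults is hypothetical.

```python
import inspect


def ml_pipeline(rs=42, iters=100):
    """Stand-in with the same signature as the pipeline function (no Kale)."""
    return {"rs": rs, "iters": iters}


def declared_defaults(func):
    """Hypothetical helper: map each parameter name to its default value."""
    return {
        name: param.default
        for name, param in inspect.signature(func).parameters.items()
        if param.default is not inspect.Parameter.empty
    }


# The defaults declared in the function signature.
defaults = declared_defaults(ml_pipeline)
print(defaults)

# Calling the function with new argument values overrides those defaults.
overridden = ml_pipeline(rs=7, iters=500)
print(overridden)
```

In kale_parameters.py itself, the equivalent override is to change the final call from ml_pipeline(rs=42, iters=100) to, for example, ml_pipeline(rs=7, iters=500).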

Run the script locally to test whether your code runs successfully using Kale’s marshalling mechanism:

$ python3 -m kale kale_parameters.py

(Optional) Produce a workflow YAML file that you can inspect:

$ python3 -m kale kale_parameters.py --compile

After this command completes successfully, look for the workflow YAML file in the .kale directory inside your working directory. You can upload this file and submit it to Kubeflow manually through its User Interface (KFP UI).
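
To locate the generated file without browsing by hand, a small helper along these lines could list the YAML files under .kale. This is a sketch under assumptions: the guide does not give the exact file name, so the code matches on the .yaml and .yml extensions, and find_workflow_files is a hypothetical name.

```python
from pathlib import Path


def find_workflow_files(workdir="."):
    """Return YAML files found under <workdir>/.kale, sorted by path.

    The exact name of the generated workflow file is not assumed; any file
    with a .yaml or .yml extension is reported.
    """
    kale_dir = Path(workdir) / ".kale"
    if not kale_dir.is_dir():
        return []
    return sorted(
        path
        for path in kale_dir.rglob("*")
        if path.is_file() and path.suffix in {".yaml", ".yml"}
    )


if __name__ == "__main__":
    # Run from the directory that contains kale_parameters.py.
    for path in find_workflow_files():
        print(path)
```

An empty result means either that the --compile step has not run in this directory or that the output uses a different extension than assumed here.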

Deploy and run your code as a KFP pipeline:

$ python3 -m kale kale_parameters.py --kfp

Note

To see the complete list of arguments and their respective usage, run python3 -m kale --help.
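
If you prefer to drive these three invocations from a Python script, the command lines can be assembled as follows. This is a minimal sketch: kale_command is a hypothetical helper, and it covers only the --compile and --kfp flags shown in this guide.

```python
def kale_command(script, mode=None):
    """Build the argument list for `python3 -m kale <script> [--compile|--kfp]`.

    mode=None corresponds to the local run, "compile" to --compile,
    and "kfp" to --kfp.
    """
    if mode not in (None, "compile", "kfp"):
        raise ValueError(f"unsupported mode: {mode!r}")
    cmd = ["python3", "-m", "kale", script]
    if mode is not None:
        cmd.append(f"--{mode}")
    return cmd


# Print the three command lines used in this guide.
for mode in (None, "compile", "kfp"):
    print(" ".join(kale_command("kale_parameters.py", mode)))
```

Passing the returned list to subprocess.run(cmd, check=True) would execute the command and raise an error if it exits with a non-zero status.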

Summary

You have successfully created a parameterized KFP Pipeline.

What’s Next

The next step is to create and log pipeline metrics.

Log KFP Metrics
