Create KFP HTML Artifacts

You can use Kale to create KFP HTML artifacts that present rich performance evaluation metrics and figures. This guide walks you through creating a KFP HTML artifact with the Kale SDK and viewing it in the KFP UI.

What You’ll Need

  • An EKF or MiniKF deployment with the default Kale Docker image.
  • An understanding of how the Kale SDK works.

Procedure

  1. Create a new Notebook server using the default Kale Docker image. The image will have the following naming scheme:

    gcr.io/arrikto/jupyter-kale-py38:<IMAGE_TAG>

    Note

    The <IMAGE_TAG> varies based on the MiniKF or EKF release.

  2. Connect to the server, open a terminal, and install scikit-learn and matplotlib:

    $ pip3 install --user scikit-learn==0.23.0 matplotlib==3.3.0

  3. Create a new Python file and name it kale_artifacts.py:

    $ touch kale_artifacts.py

  4. Copy and paste the following code inside kale_artifacts.py:

    sdk.py
    # Copyright © 2021-2022 Arrikto Inc. All Rights Reserved.

    """Kale SDK.

    This script trains an ML pipeline to solve a binary classification task.
    """

    from kale.sdk import pipeline, step
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split


    @step(name="data_loading")
    def load(random_state):
        """Create a random dataset for binary classification."""
        rs = int(random_state)
        x, y = make_classification(random_state=rs)
        return x, y


    @step(name="data_split")
    def split(x, y):
        """Split the data into train and test sets."""
        x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
        return x, x_test, y, y_test


    @step(name="model_training")
    def train(x, x_test, y, training_iterations):
        """Train a Logistic Regression model."""
        iters = int(training_iterations)
        model = LogisticRegression(max_iter=iters)
        model.fit(x, y)
        print(model.predict(x_test))


    @pipeline(name="binary-classification", experiment="kale-tutorial")
    def ml_pipeline(rs=42, iters=100):
        """Run the ML pipeline."""
        x, y = load(rs)
        x, x_test, y, y_test = split(x=x, y=y)
        train(x, x_test, y, training_iterations=iters)


    if __name__ == "__main__":
        ml_pipeline(rs=42, iters=100)

    This code sample is a standard Python script that trains a Logistic Regression model, with its functions decorated using the Kale SDK. To read more about how to create this file, see the corresponding Kale SDK user guide.

  5. Create a new function that will become the pipeline step that creates the HTML artifact. To achieve this, decorate the function with the step and artifact decorators. The artifact decorator takes two arguments: the name of the artifact and the path where the HTML artifact is stored (an absolute path in this guide). The following snippet summarizes the changes in the code:

    decorator.py
    -# Copyright © 2021-2022 Arrikto Inc. All Rights Reserved.
    +# Copyright © 2021 Arrikto Inc. All Rights Reserved.

     """Kale SDK.

     This script trains an ML pipeline to solve a binary classification task.
     """

    -from kale.sdk import pipeline, step
    +from kale.sdk import artifact, pipeline, step
     from sklearn.datasets import make_classification
     from sklearn.linear_model import LogisticRegression
     from sklearn.model_selection import train_test_split


     @step(name="data_loading")
     def load(random_state):
         """Create a random dataset for binary classification."""
         rs = int(random_state)
         x, y = make_classification(random_state=rs)
         return x, y


     @step(name="data_split")
     def split(x, y):
         """Split the data into train and test sets."""
         x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
         return x, x_test, y, y_test


    +@artifact(name="plot", path="/home/jovyan/plot.html")
    +@step(name="plot_data")
    +def plot(x, y):
    +    """Create an HTML artifact for KFP UI."""
    +    pass
    +
    +
     @step(name="model_training")
     def train(x, x_test, y, training_iterations):
         """Train a Logistic Regression model."""
         iters = int(training_iterations)
         model = LogisticRegression(max_iter=iters)
         model.fit(x, y)
         print(model.predict(x_test))


     @pipeline(name="binary-classification", experiment="kale-tutorial")
     def ml_pipeline(rs=42, iters=100):
         """Run the ML pipeline."""
         x, y = load(rs)
    -    x, x_test, y, y_test = split(x=x, y=y)
    -    train(x, x_test, y, training_iterations=iters)
    +    x, x_test, y, y_test = split(x, y)
    +    train(x, x_test, y, iters)


     if __name__ == "__main__":
         ml_pipeline(rs=42, iters=100)

  6. Create an HTML figure inside the plot function and save it as plot.html. Then, call the plot function inside the ml_pipeline function. The following snippet summarizes the changes in the code:

    artifacts.py
     # Copyright © 2021 Arrikto Inc. All Rights Reserved.

     """Kale SDK.

     This script trains an ML pipeline to solve a binary classification task.
     """

    +import base64
    +from io import BytesIO
    +
    +import matplotlib.pyplot as plt
     from kale.sdk import artifact, pipeline, step
     from sklearn.datasets import make_classification
     from sklearn.linear_model import LogisticRegression
     from sklearn.model_selection import train_test_split


     @step(name="data_loading")
     def load(random_state):
         """Create a random dataset for binary classification."""
         rs = int(random_state)
         x, y = make_classification(random_state=rs)
         return x, y


     @step(name="data_split")
     def split(x, y):
         """Split the data into train and test sets."""
         x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
         return x, x_test, y, y_test


     @artifact(name="plot", path="/home/jovyan/plot.html")
     @step(name="plot_data")
     def plot(x, y):
         """Create an HTML artifact for KFP UI."""
    -    pass
    +    fig = plt.figure()
    +    ax = fig.add_subplot(1, 1, 1)
    +    ax.scatter(x[:, 0], y)
    +
    +    tmpfile = BytesIO()
    +    fig.savefig(tmpfile, format='png')
    +    encoded = base64.b64encode(tmpfile.getvalue()).decode('utf-8')
    +
    +    html = '<img src=\'data:image/png;base64,{}\'>'.format(encoded)
    +    with open('plot.html', 'w') as f:
    +        f.write(html)


     @step(name="model_training")
     def train(x, x_test, y, training_iterations):
         """Train a Logistic Regression model."""
         iters = int(training_iterations)
         model = LogisticRegression(max_iter=iters)
         model.fit(x, y)
         print(model.predict(x_test))


     @pipeline(name="binary-classification", experiment="kale-tutorial")
     def ml_pipeline(rs=42, iters=100):
         """Run the ML pipeline."""
         x, y = load(rs)
         x, x_test, y, y_test = split(x, y)
    +    plot(x, y)
         train(x, x_test, y, iters)


     if __name__ == "__main__":
         ml_pipeline(rs=42, iters=100)

    Warning

    If the path does not point to a valid file, the step will fail with an error.

    Note

    You can generate more than one artifact per step by applying the same decorator multiple times:

    @artifact(name="plot_1", path="./plot_1.html")
    @artifact(name="plot_2", path="./plot_2.html")
    @step(name="plot_data")
    def plot(x, y):
        ...

    Note

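    This example assumes that you are running the Python script from your home directory (/home/jovyan) in a Notebook server, because the plot function writes plot.html to the current working directory. If you run the script from a different directory, update the path argument of the artifact decorator accordingly.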

  7. Run the script locally to test whether your code runs successfully using Kale’s marshalling mechanism:

    $ python3 -m kale kale_artifacts.py

  8. (Optional) Produce a workflow YAML file that you can inspect:

    $ python3 -m kale kale_artifacts.py --compile

    After this command completes successfully, look for the workflow YAML file in the .kale directory inside your working directory. You can upload and submit this file to Kubeflow manually through the KFP UI.

  9. Deploy and run your code as a KFP pipeline:

    $ python3 -m kale kale_artifacts.py --kfp

    Note

    To see the complete list of arguments and their respective usage, run python3 -m kale --help.

  10. Navigate to the KFP UI and find the HTML artifact you created in the Visualizations tab of the plot_data step:

    [Figure: the HTML artifact displayed in the Visualizations tab of the plot_data step in the KFP UI]
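
The embedding technique used in step 6 can be tried on its own, without Kale or matplotlib. The following is a minimal sketch that uses only the Python standard library: it wraps PNG bytes in an <img> tag as a base64 data URI, writes the result to a resolved absolute path, and checks that the file exists, which is the condition the Warning in step 6 describes. The helper names png_to_html and write_artifact are hypothetical and are not part of the Kale SDK, and the placeholder bytes stand in for the output of fig.savefig.

```python
import base64
import tempfile
from pathlib import Path


def png_to_html(png_bytes: bytes) -> str:
    """Return an <img> tag that embeds the PNG bytes as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("utf-8")
    return '<img src="data:image/png;base64,{}">'.format(encoded)


def write_artifact(html: str, path: str) -> Path:
    """Write the HTML to path and return the resolved absolute path.

    Hypothetical helper: resolving the path up front makes the output
    location independent of the current working directory.
    """
    target = Path(path).resolve()
    target.write_text(html, encoding="utf-8")
    if not target.is_file():
        raise FileNotFoundError(target)
    return target


if __name__ == "__main__":
    # Placeholder: only the PNG signature, not a renderable image. In the
    # plot step these bytes would come from fig.savefig(buf, format="png").
    placeholder_png = b"\x89PNG\r\n\x1a\n"
    with tempfile.TemporaryDirectory() as tmp:
        out = write_artifact(png_to_html(placeholder_png),
                             str(Path(tmp) / "plot.html"))
        print(out.is_absolute(), out.stat().st_size > 0)
```

Inside the plot function, passing the same absolute path to write_artifact that you pass to the artifact decorator would keep the two in sync regardless of where the script is run. This is a design suggestion under the assumptions above, not something the Kale SDK requires.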

Summary

You have successfully created a KFP HTML artifact that displays a simple matplotlib figure in the KFP UI.

What’s Next

The next step is to create KFP pipelines with conditional statements.