Create KFP HTML Artifacts
You can use Kale to create KFP HTML artifacts that present rich performance evaluation metrics and figures. This guide walks you through creating a KFP HTML artifact with the Kale SDK and viewing it in the KFP UI.
What You’ll Need
- An EKF or MiniKF deployment with the default Kale Docker image.
- An understanding of how the Kale SDK works.
Procedure
Create a new Notebook server using the default Kale Docker image. The image will have the following naming scheme:

    gcr.io/arrikto/jupyter-kale-py38:<IMAGE_TAG>

Note: The <IMAGE_TAG> varies based on the MiniKF or EKF release.

Connect to the server, open a terminal, and install scikit-learn and matplotlib:

    $ pip3 install --user scikit-learn==0.23.0 matplotlib==3.3.0

Create a new Python file and name it kale_artifacts.py:

    $ touch kale_artifacts.py

Copy and paste the following code inside kale_artifacts.py:

sdk.py

    # Copyright © 2021-2022 Arrikto Inc. All Rights Reserved.

    """Kale SDK.

    This script trains an ML pipeline to solve a binary classification task.
    """

    from kale.sdk import pipeline, step
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split


    @step(name="data_loading")
    def load(random_state):
        """Create a random dataset for binary classification."""
        rs = int(random_state)
        x, y = make_classification(random_state=rs)
        return x, y


    @step(name="data_split")
    def split(x, y):
        """Split the data into train and test sets."""
        x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
        return x, x_test, y, y_test


    @step(name="model_training")
    def train(x, x_test, y, training_iterations):
        """Train a Logistic Regression model."""
        iters = int(training_iterations)
        model = LogisticRegression(max_iter=iters)
        model.fit(x, y)
        print(model.predict(x_test))


    @pipeline(name="binary-classification", experiment="kale-tutorial")
    def ml_pipeline(rs=42, iters=100):
        """Run the ML pipeline."""
        x, y = load(rs)
        x, x_test, y, y_test = split(x=x, y=y)
        train(x, x_test, y, training_iterations=iters)


    if __name__ == "__main__":
        ml_pipeline(rs=42, iters=100)

In this code sample, you start with a standard Python script that trains a Logistic Regression model. Moreover, you have decorated the functions using the Kale SDK. To read more about how to create this file, head to the corresponding Kale SDK user guide.
Create a new function that will be a step in the pipeline and will create the HTML artifact. To achieve this, decorate the function with the step and artifact decorators. The artifact decorator takes two arguments: the name of the artifact and the absolute path where the HTML artifact is stored. The following snippet summarizes the changes in the code:

decorator.py

    - # Copyright © 2021-2022 Arrikto Inc. All Rights Reserved.
    + # Copyright © 2021 Arrikto Inc. All Rights Reserved.

      """Kale SDK.

      This script trains an ML pipeline to solve a binary classification task.
      """

    - from kale.sdk import pipeline, step
    + from kale.sdk import artifact, pipeline, step
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import train_test_split


      @step(name="data_loading")
      def load(random_state):
          """Create a random dataset for binary classification."""
          rs = int(random_state)
          x, y = make_classification(random_state=rs)
          return x, y


      @step(name="data_split")
      def split(x, y):
          """Split the data into train and test sets."""
          x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
          return x, x_test, y, y_test


    + @artifact(name="plot", path="/home/jovyan/plot.html")
    + @step(name="plot_data")
    + def plot(x, y):
    +     """Create an HTML artifact for KFP UI."""
    +     pass
    +
    +
      @step(name="model_training")
      def train(x, x_test, y, training_iterations):
          """Train a Logistic Regression model."""
          iters = int(training_iterations)
          model = LogisticRegression(max_iter=iters)
          model.fit(x, y)
          print(model.predict(x_test))


      @pipeline(name="binary-classification", experiment="kale-tutorial")
      def ml_pipeline(rs=42, iters=100):
          """Run the ML pipeline."""
          x, y = load(rs)
    -     x, x_test, y, y_test = split(x=x, y=y)
    -     train(x, x_test, y, training_iterations=iters)
    +     x, x_test, y, y_test = split(x, y)
    +     train(x, x_test, y, iters)


      if __name__ == "__main__":
          ml_pipeline(rs=42, iters=100)

Create an HTML figure inside the function and save it as plot.html. Then, call the plot function inside the ml_pipeline function. The following snippet summarizes the changes in the code:

artifacts.py

      # Copyright © 2021 Arrikto Inc. All Rights Reserved.

      """Kale SDK.

      This script trains an ML pipeline to solve a binary classification task.
      """

    + import base64
    + from io import BytesIO
    +
    + import matplotlib.pyplot as plt
      from kale.sdk import artifact, pipeline, step
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import train_test_split


      @step(name="data_loading")
      def load(random_state):
          """Create a random dataset for binary classification."""
          rs = int(random_state)
          x, y = make_classification(random_state=rs)
          return x, y


      @step(name="data_split")
      def split(x, y):
          """Split the data into train and test sets."""
          x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
          return x, x_test, y, y_test


      @artifact(name="plot", path="/home/jovyan/plot.html")
      @step(name="plot_data")
      def plot(x, y):
          """Create an HTML artifact for KFP UI."""
    -     pass
    +     fig = plt.figure()
    +     ax = fig.add_subplot(1, 1, 1)
    +     ax.scatter(x[:, 0], y)
    +
    +     tmpfile = BytesIO()
    +     fig.savefig(tmpfile, format='png')
    +     encoded = base64.b64encode(tmpfile.getvalue()).decode('utf-8')
    +
    +     html = '<img src=\'data:image/png;base64,{}\'>'.format(encoded)
    +     with open('plot.html', 'w') as f:
    +         f.write(html)


      @step(name="model_training")
      def train(x, x_test, y, training_iterations):
          """Train a Logistic Regression model."""
          iters = int(training_iterations)
          model = LogisticRegression(max_iter=iters)
          model.fit(x, y)
          print(model.predict(x_test))


      @pipeline(name="binary-classification", experiment="kale-tutorial")
      def ml_pipeline(rs=42, iters=100):
          """Run the ML pipeline."""
          x, y = load(rs)
          x, x_test, y, y_test = split(x, y)
    +     plot(x, y)
          train(x, x_test, y, iters)


      if __name__ == "__main__":
          ml_pipeline(rs=42, iters=100)

Warning: If the path does not point to a valid file, the step will fail with an error.
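The plot step writes plot.html relative to the current working directory, while the artifact decorator is given an absolute path, so the two can drift apart. A minimal sketch of one way to guard against that, using only the Python standard library: the helper write_html_artifact is hypothetical (it is not part of the Kale SDK), and the placeholder bytes stand in for the PNG data that fig.savefig would produce. It embeds the bytes in an img tag, writes the page, and returns the resolved absolute path so that you can compare it with the decorator's path argument.

```python
import base64
import tempfile
from pathlib import Path


def write_html_artifact(png_bytes: bytes, path: str) -> Path:
    """Embed PNG bytes in a one-image HTML page and write it to `path`.

    Hypothetical helper for illustration; it is not part of the Kale SDK.
    Returns the resolved absolute path of the written file.
    """
    encoded = base64.b64encode(png_bytes).decode("utf-8")
    html = '<img src="data:image/png;base64,{}">'.format(encoded)

    target = Path(path).expanduser().resolve()
    target.write_text(html, encoding="utf-8")

    # Fail early, inside the step, if the file is not where it is expected.
    if not target.is_file():
        raise FileNotFoundError(
            "HTML artifact was not written to {}".format(target)
        )
    return target


if __name__ == "__main__":
    # Placeholder bytes; in the plot step these would come from
    # tmpfile.getvalue() after fig.savefig(tmpfile, format="png").
    fake_png = b"\x89PNG\r\n\x1a\n placeholder"
    with tempfile.TemporaryDirectory() as tmpdir:
        written = write_html_artifact(fake_png, str(Path(tmpdir) / "plot.html"))
        print(written.name, written.is_absolute())
```

Inside the plot step, passing the same absolute path to both this helper and the artifact decorator would keep the two in sync regardless of the directory the script is started from.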
Note: You can generate more than one artifact per step by applying the same decorator multiple times:

    @artifact(name="plot_1", path="./plot_1.html")
    @artifact(name="plot_2", path="./plot_2.html")
    @step(name="plot_data")
    def plot(x, y):
        ...

Note: This example assumes that you are running the Python script from your home directory (/home/jovyan), in a Notebook server. If you run it from a different directory, update the path argument of the artifact decorator accordingly.

Run the script locally to test whether your code runs successfully using Kale’s marshalling mechanism:

    $ python3 -m kale kale_artifacts.py

(Optional) Produce a workflow YAML file that you can inspect:

    $ python3 -m kale kale_artifacts.py --compile

After the successful execution of this command, look for the workflow YAML file inside a .kale directory inside your working directory. You can upload this file and submit it to Kubeflow manually through the KFP UI.

Deploy and run your code as a KFP pipeline:

    $ python3 -m kale kale_artifacts.py --kfp

Note: To see the complete list of arguments and their respective usage, run python3 -m kale --help.

Navigate to the KFP UI and observe the HTML artifact you created inside the Visualizations tab of the plot_data step.
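Because the artifact is an ordinary HTML file, it should be able to carry more than an embedded image, for example a table of evaluation metrics. A minimal sketch using only the standard library: the helper name metrics_to_html and the metric values are hypothetical, chosen for illustration, and are not part of the Kale SDK. The returned string could be written to the file that the artifact decorator's path argument points to.

```python
from html import escape


def metrics_to_html(metrics: dict) -> str:
    """Render a mapping of metric names to values as a small HTML table.

    Hypothetical helper for illustration; it is not part of the Kale SDK.
    """
    rows = []
    for name, value in metrics.items():
        # Escape names and values so they cannot break the markup.
        rows.append(
            "<tr><td>{}</td><td>{}</td></tr>".format(
                escape(str(name)), escape(str(value))
            )
        )
    return (
        "<table>"
        "<tr><th>Metric</th><th>Value</th></tr>"
        + "".join(rows)
        + "</table>"
    )


if __name__ == "__main__":
    # Made-up values, only to show the shape of the output.
    print(metrics_to_html({"accuracy": 0.9, "f1": 0.88}))
```

The table string can also be concatenated with the img tag from the plot step so that a single artifact shows both the figure and the numbers.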
Summary
You have successfully created a KFP HTML artifact depicting a simple matplotlib figure to visualize through the KFP UI.
What’s Next
The next step is to create KFP pipelines with conditional statements.