Log KFP Metrics
This guide will walk you through creating and logging KFP metrics to evaluate and compare your Kubeflow Pipeline runs.
What You’ll Need
- An EKF or MiniKF deployment with the default Kale Docker image.
- An understanding of how the Kale SDK works.
Procedure
Create a new Notebook server using the default Kale Docker image. The image will have the following naming scheme:

    gcr.io/arrikto/jupyter-kale-py38:<IMAGE_TAG>

Note: The <IMAGE_TAG> varies based on the MiniKF or EKF release.

Connect to the server, open a terminal, and install scikit-learn:

    $ pip3 install --user scikit-learn==0.23.0

Create a new Python file and name it kale_metrics.py:

    $ touch kale_metrics.py

Copy and paste the following code inside kale_metrics.py:

sdk.py

    # Copyright © 2021-2022 Arrikto Inc. All Rights Reserved.

    """Kale SDK.

    This script trains an ML pipeline to solve a binary classification task.
    """

    from kale.sdk import pipeline, step
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split


    @step(name="data_loading")
    def load(random_state):
        """Create a random dataset for binary classification."""
        rs = int(random_state)
        x, y = make_classification(random_state=rs)
        return x, y


    @step(name="data_split")
    def split(x, y):
        """Split the data into train and test sets."""
        x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
        return x, x_test, y, y_test


    @step(name="model_training")
    def train(x, x_test, y, training_iterations):
        """Train a Logistic Regression model."""
        iters = int(training_iterations)
        model = LogisticRegression(max_iter=iters)
        model.fit(x, y)
        print(model.predict(x_test))


    @pipeline(name="binary-classification", experiment="kale-tutorial")
    def ml_pipeline(rs=42, iters=100):
        """Run the ML pipeline."""
        x, y = load(rs)
        x, x_test, y, y_test = split(x=x, y=y)
        train(x, x_test, y, training_iterations=iters)


    if __name__ == "__main__":
        ml_pipeline(rs=42, iters=100)

In this code sample, you start with a standard Python script that trains a Logistic Regression model. Moreover, you have decorated the functions using the Kale SDK. To read more about how to create this file, head to the corresponding Kale SDK user guide.
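If the `@step(name=...)` syntax is unfamiliar, the following minimal sketch may help. It shows the general Python pattern of a decorator that takes arguments. The `toy_step` helper and the `step_name` attribute are hypothetical names made up for illustration; this is not how Kale implements `step`, and it needs neither Kale nor scikit-learn to run.

```python
from typing import Callable


def toy_step(name: str) -> Callable:
    """Hypothetical stand-in for a step decorator; NOT Kale's implementation."""
    def decorator(func: Callable) -> Callable:
        # Attach metadata and return the function unchanged, so it can still
        # be called like any other Python function.
        func.step_name = name
        return func
    return decorator


@toy_step(name="data_loading")
def load(random_state):
    """Return a tiny deterministic dataset instead of calling scikit-learn."""
    rs = int(random_state)  # mirrors the int() cast used in the guide's steps
    return [rs, rs + 1], [0, 1]


x, y = load("42")
print(load.step_name, x, y)
```

The point of the sketch is only that `toy_step(name="data_loading")` is called first and returns the actual decorator, which then receives `load`. The decorated function remains an ordinary callable.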
Create a function that logs a metric and decorate it with the has_metrics decorator. Kale needs this decorator to know at compile time which steps will produce KFP metrics. The following snippet summarizes the changes in the code:

hasmetrics.py

    - # Copyright © 2021-2022 Arrikto Inc. All Rights Reserved.
    + # Copyright © 2021 Arrikto Inc. All Rights Reserved.

      """Kale SDK.

      This script trains an ML pipeline to solve a binary classification task.
      """

    - from kale.sdk import pipeline, step
    + from kale.sdk import has_metrics, pipeline, step
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import train_test_split


      @step(name="data_loading")
      def load(random_state):
          """Create a random dataset for binary classification."""
          rs = int(random_state)
          x, y = make_classification(random_state=rs)
          return x, y


      @step(name="data_split")
      def split(x, y):
          """Split the data into train and test sets."""
          x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
          return x, x_test, y, y_test


      @step(name="model_training")
      def train(x, x_test, y, training_iterations):
          """Train a Logistic Regression model."""
          iters = int(training_iterations)
          model = LogisticRegression(max_iter=iters)
          model.fit(x, y)
          print(model.predict(x_test))


    + @has_metrics
    + @step(name="model_evaluation")
    + def evaluate(model, x_test, y_test):
    +     """Evaluate the model on the test dataset."""
    +     pass
    +
    +
      @pipeline(name="binary-classification", experiment="kale-tutorial")
      def ml_pipeline(rs=42, iters=100):
          """Run the ML pipeline."""
          x, y = load(rs)
    -     x, x_test, y, y_test = split(x=x, y=y)
    -     train(x, x_test, y, training_iterations=iters)
    +     x, x_test, y, y_test = split(x, y)
    +     train(x, x_test, y, iters)


      if __name__ == "__main__":
          ml_pipeline(rs=42, iters=100)

Important: Make sure to apply @has_metrics on top of @step, not the other way around.

Use the
log_metric function to log a metric during the step execution. The following snippet summarizes the changes in the code:

log.py

      # Copyright © 2021 Arrikto Inc. All Rights Reserved.

      """Kale SDK.

      This script trains an ML pipeline to solve a binary classification task.
      """

      from kale.sdk import has_metrics, pipeline, step
    + from kale.sdk.logging import log_metric
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
    + from sklearn.metrics import accuracy_score
      from sklearn.model_selection import train_test_split


      @step(name="data_loading")
      def load(random_state):
          """Create a random dataset for binary classification."""
          rs = int(random_state)
          x, y = make_classification(random_state=rs)
          return x, y


      @step(name="data_split")
      def split(x, y):
          """Split the data into train and test sets."""
          x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
          return x, x_test, y, y_test


      @step(name="model_training")
      def train(x, x_test, y, training_iterations):
          """Train a Logistic Regression model."""
          iters = int(training_iterations)
          model = LogisticRegression(max_iter=iters)
          model.fit(x, y)
          print(model.predict(x_test))


      @has_metrics
      @step(name="model_evaluation")
      def evaluate(model, x_test, y_test):
          """Evaluate the model on the test dataset."""
    -     pass
    +     y_pred = model.predict(x_test)
    +     accuracy = accuracy_score(y_test, y_pred)
    +     log_metric(name="accuracy", value=accuracy)


      @pipeline(name="binary-classification", experiment="kale-tutorial")
      def ml_pipeline(rs=42, iters=100):
          """Run the ML pipeline."""
          x, y = load(rs)
          x, x_test, y, y_test = split(x, y)
          train(x, x_test, y, iters)


      if __name__ == "__main__":
          ml_pipeline(rs=42, iters=100)

Return the model from the
train function and call the evaluate function inside ml_pipeline with the right arguments. The following snippet summarizes the changes in the code:

metrics.py

      # Copyright © 2021 Arrikto Inc. All Rights Reserved.

      """Kale SDK.

      This script trains an ML pipeline to solve a binary classification task.
      """

      from kale.sdk import has_metrics, pipeline, step
      from kale.sdk.logging import log_metric
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import accuracy_score
      from sklearn.model_selection import train_test_split


      @step(name="data_loading")
      def load(random_state):
          """Create a random dataset for binary classification."""
          rs = int(random_state)
          x, y = make_classification(random_state=rs)
          return x, y


      @step(name="data_split")
      def split(x, y):
          """Split the data into train and test sets."""
          x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
          return x, x_test, y, y_test


      @step(name="model_training")
      def train(x, x_test, y, training_iterations):
          """Train a Logistic Regression model."""
          iters = int(training_iterations)
          model = LogisticRegression(max_iter=iters)
          model.fit(x, y)
    -     print(model.predict(x_test))
    +     return model


      @has_metrics
      @step(name="model_evaluation")
      def evaluate(model, x_test, y_test):
          """Evaluate the model on the test dataset."""
          y_pred = model.predict(x_test)
          accuracy = accuracy_score(y_test, y_pred)
          log_metric(name="accuracy", value=accuracy)


      @pipeline(name="binary-classification", experiment="kale-tutorial")
      def ml_pipeline(rs=42, iters=100):
          """Run the ML pipeline."""
          x, y = load(rs)
          x, x_test, y, y_test = split(x, y)
    -     train(x, x_test, y, iters)
    +     model = train(x, x_test, y, iters)
    +     evaluate(model, x_test, y_test)


      if __name__ == "__main__":
          ml_pipeline(rs=42, iters=100)

Run the script locally to test whether your code runs successfully using Kale’s marshalling
mechanism:

    $ python3 -m kale kale_metrics.py

(Optional) Produce a workflow YAML file that you can inspect:
    $ python3 -m kale kale_metrics.py --compile

After the successful execution of this command, look for the workflow YAML file inside a .kale directory in your working directory. You can upload and submit this file to Kubeflow manually through its user interface (KFP UI).

Deploy and run your code as a KFP pipeline:
    $ python3 -m kale kale_metrics.py --kfp

Note: To see the complete list of arguments and their respective usage, run python3 -m kale --help.
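To build intuition for the number that the model_evaluation step logs, the sketch below computes classification accuracy in plain Python (the fraction of predictions that match the true labels, which is what `accuracy_score` returns by default). It then wraps the value in a KFP v1-style metrics document. The labels are made up, and the `accuracy` and `kfp_v1_metrics_payload` helpers are hypothetical names for illustration. The JSON layout (a `metrics` list with `name`, `numberValue`, and `format`) is an assumption based on the KFP v1 metrics file format; this guide does not state what `log_metric` writes internally.

```python
import json


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def kfp_v1_metrics_payload(name, value):
    """Build a KFP v1-style metrics document (assumed layout, see the text above)."""
    return {"metrics": [{"name": name, "numberValue": value, "format": "RAW"}]}


y_test = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # made-up true labels
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]  # made-up predictions (two mistakes)
acc = accuracy(y_test, y_pred)
payload = kfp_v1_metrics_payload("accuracy", acc)
print(json.dumps(payload))
```

Here eight of the ten predictions are correct, so `acc` is 0.8. In the actual pipeline you do not build this document yourself; calling `log_metric(name="accuracy", value=accuracy)` inside a step decorated with `@has_metrics` is all the guide requires.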
What’s Next
The next step is to create KFP HTML artifacts using the Kale SDK.