Log KFP Metrics

This guide will walk you through creating and logging KFP metrics to evaluate and compare your Kubeflow Pipelines (KFP) runs.

What You’ll Need

  • An EKF or MiniKF deployment with the default Kale Docker image.
  • An understanding of how the Kale SDK works.

Procedure

  1. Create a new Notebook server using the default Kale Docker image. The image will have the following naming scheme:

    gcr.io/arrikto/jupyter-kale-py38:<IMAGE_TAG>

    Note

    The <IMAGE_TAG> varies based on the MiniKF or EKF release.

  2. Connect to the server, open a terminal, and install scikit-learn:

    $ pip3 install --user scikit-learn==0.23.0
  3. Create a new Python file and name it kale_metrics.py:

    $ touch kale_metrics.py
  4. Copy and paste the following code inside kale_metrics.py:

    sdk.py
    # Copyright © 2021-2022 Arrikto Inc. All Rights Reserved.

    """Kale SDK.

    This script trains an ML pipeline to solve a binary classification task.
    """

    from kale.sdk import pipeline, step
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split


    @step(name="data_loading")
    def load(random_state):
        """Create a random dataset for binary classification."""
        rs = int(random_state)
        x, y = make_classification(random_state=rs)
        return x, y


    @step(name="data_split")
    def split(x, y):
        """Split the data into train and test sets."""
        x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
        return x, x_test, y, y_test


    @step(name="model_training")
    def train(x, x_test, y, training_iterations):
        """Train a Logistic Regression model."""
        iters = int(training_iterations)
        model = LogisticRegression(max_iter=iters)
        model.fit(x, y)
        print(model.predict(x_test))


    @pipeline(name="binary-classification", experiment="kale-tutorial")
    def ml_pipeline(rs=42, iters=100):
        """Run the ML pipeline."""
        x, y = load(rs)
        x, x_test, y, y_test = split(x=x, y=y)
        train(x, x_test, y, training_iterations=iters)


    if __name__ == "__main__":
        ml_pipeline(rs=42, iters=100)

    In this code sample, you start with a standard Python script that trains a Logistic Regression model, and you decorate its functions using the Kale SDK. To read more about how to create this file, head to the corresponding Kale SDK user guide.

  5. Create a function that logs a metric and decorate it with the has_metrics decorator. Kale needs this decorator to know, at compilation time, which steps will produce KFP metrics. The following snippet summarizes the changes in the code:

    hasmetrics.py
    -# Copyright © 2021-2022 Arrikto Inc. All Rights Reserved.
    +# Copyright © 2021 Arrikto Inc. All Rights Reserved.

    """Kale SDK.

    This script trains an ML pipeline to solve a binary classification task.
    """

    -from kale.sdk import pipeline, step
    +from kale.sdk import has_metrics, pipeline, step
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split


    @step(name="data_loading")
    def load(random_state):
        """Create a random dataset for binary classification."""
        rs = int(random_state)
        x, y = make_classification(random_state=rs)
        return x, y


    @step(name="data_split")
    def split(x, y):
        """Split the data into train and test sets."""
        x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
        return x, x_test, y, y_test


    @step(name="model_training")
    def train(x, x_test, y, training_iterations):
        """Train a Logistic Regression model."""
        iters = int(training_iterations)
        model = LogisticRegression(max_iter=iters)
        model.fit(x, y)
        print(model.predict(x_test))


    +@has_metrics
    +@step(name="model_evaluation")
    +def evaluate(model, x_test, y_test):
    +    """Evaluate the model on the test dataset."""
    +    pass
    +
    +
    @pipeline(name="binary-classification", experiment="kale-tutorial")
    def ml_pipeline(rs=42, iters=100):
        """Run the ML pipeline."""
        x, y = load(rs)
    -    x, x_test, y, y_test = split(x=x, y=y)
    -    train(x, x_test, y, training_iterations=iters)
    +    x, x_test, y, y_test = split(x, y)
    +    train(x, x_test, y, iters)


    if __name__ == "__main__":
        ml_pipeline(rs=42, iters=100)

    Important

    Make sure to apply @has_metrics on top of @step, not the other way around.

  6. Use the log_metric function to log a metric during step execution. The following snippet summarizes the changes in the code:

    log.py
    # Copyright © 2021 Arrikto Inc. All Rights Reserved.

    """Kale SDK.

    This script trains an ML pipeline to solve a binary classification task.
    """

    from kale.sdk import has_metrics, pipeline, step
    +from kale.sdk.logging import log_metric
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    +from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split


    @step(name="data_loading")
    def load(random_state):
        """Create a random dataset for binary classification."""
        rs = int(random_state)
        x, y = make_classification(random_state=rs)
        return x, y


    @step(name="data_split")
    def split(x, y):
        """Split the data into train and test sets."""
        x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
        return x, x_test, y, y_test


    @step(name="model_training")
    def train(x, x_test, y, training_iterations):
        """Train a Logistic Regression model."""
        iters = int(training_iterations)
        model = LogisticRegression(max_iter=iters)
        model.fit(x, y)
        print(model.predict(x_test))


    @has_metrics
    @step(name="model_evaluation")
    def evaluate(model, x_test, y_test):
        """Evaluate the model on the test dataset."""
    -    pass
    +    y_pred = model.predict(x_test)
    +    accuracy = accuracy_score(y_test, y_pred)
    +    log_metric(name="accuracy", value=accuracy)


    @pipeline(name="binary-classification", experiment="kale-tutorial")
    def ml_pipeline(rs=42, iters=100):
        """Run the ML pipeline."""
        x, y = load(rs)
        x, x_test, y, y_test = split(x, y)
        train(x, x_test, y, iters)


    if __name__ == "__main__":
        ml_pipeline(rs=42, iters=100)
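
    As an aside (you do not need this change for the rest of this guide), the same pattern extends to tracking several metrics for a run. The following is a hedged sketch that assumes Kale accepts multiple log_metric calls within a single @has_metrics step; precision_score and recall_score come from scikit-learn and are not part of kale_metrics.py above:

    from kale.sdk import has_metrics, step
    from kale.sdk.logging import log_metric
    from sklearn.metrics import accuracy_score, precision_score, recall_score


    @has_metrics
    @step(name="model_evaluation")
    def evaluate(model, x_test, y_test):
        """Evaluate the model and log several KFP metrics."""
        y_pred = model.predict(x_test)
        # Assumption: Kale allows more than one log_metric call per step.
        # Each call logs one named metric for this pipeline run.
        log_metric(name="accuracy", value=accuracy_score(y_test, y_pred))
        log_metric(name="precision", value=precision_score(y_test, y_pred))
        log_metric(name="recall", value=recall_score(y_test, y_pred))
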
  7. Return the model from the train function and call the evaluate function inside ml_pipeline with the right arguments. The following snippet summarizes the changes in the code:

    metrics.py
    # Copyright © 2021 Arrikto Inc. All Rights Reserved.

    """Kale SDK.

    This script trains an ML pipeline to solve a binary classification task.
    """

    from kale.sdk import has_metrics, pipeline, step
    from kale.sdk.logging import log_metric
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split


    @step(name="data_loading")
    def load(random_state):
        """Create a random dataset for binary classification."""
        rs = int(random_state)
        x, y = make_classification(random_state=rs)
        return x, y


    @step(name="data_split")
    def split(x, y):
        """Split the data into train and test sets."""
        x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
        return x, x_test, y, y_test


    @step(name="model_training")
    def train(x, x_test, y, training_iterations):
        """Train a Logistic Regression model."""
        iters = int(training_iterations)
        model = LogisticRegression(max_iter=iters)
        model.fit(x, y)
    -    print(model.predict(x_test))
    +    return model


    @has_metrics
    @step(name="model_evaluation")
    def evaluate(model, x_test, y_test):
        """Evaluate the model on the test dataset."""
        y_pred = model.predict(x_test)
        accuracy = accuracy_score(y_test, y_pred)
        log_metric(name="accuracy", value=accuracy)


    @pipeline(name="binary-classification", experiment="kale-tutorial")
    def ml_pipeline(rs=42, iters=100):
        """Run the ML pipeline."""
        x, y = load(rs)
        x, x_test, y, y_test = split(x, y)
    -    train(x, x_test, y, iters)
    +    model = train(x, x_test, y, iters)
    +    evaluate(model, x_test, y_test)


    if __name__ == "__main__":
        ml_pipeline(rs=42, iters=100)
  8. Run the script locally, using Kale’s marshalling mechanism, to verify that your code runs successfully:

    $ python3 -m kale kale_metrics.py
  9. (Optional) Produce a workflow YAML file that you can inspect:

    $ python3 -m kale kale_metrics.py --compile

    After this command completes successfully, look for the workflow YAML file inside a .kale directory in your working directory. You can upload and submit this file to Kubeflow manually through the KFP user interface (UI).
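
    If you want a quick look at the compiled pipeline without opening the KFP UI, a short sketch like the following can list its steps. This is only a sketch: it assumes PyYAML is installed in the Notebook image, that the compiled file uses a .yaml extension, and that the output is an Argo Workflow, which is what KFP compiles pipelines to:

    import glob
    import os

    import yaml  # assumption: PyYAML is available in the Notebook image

    # Kale generates the workflow file name, so instead of hard-coding a path
    # we grab the most recently modified YAML file under the .kale directory.
    workflow_file = max(glob.glob(".kale/*.yaml"), key=os.path.getmtime)

    with open(workflow_file) as f:
        workflow = yaml.safe_load(f)

    # A compiled KFP pipeline is an Argo Workflow; its steps are listed
    # under spec.templates.
    for template in workflow["spec"]["templates"]:
        print(template["name"])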

  10. Deploy and run your code as a KFP pipeline:

    $ python3 -m kale kale_metrics.py --kfp

    Note

    To see the complete list of arguments and their respective usage, run python3 -m kale --help.
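
    Once the run completes, the logged metrics appear alongside the run in the KFP UI. If you prefer to retrieve them programmatically, the following is a hedged sketch using the KFP v1 SDK client; it assumes the kfp package is available in the Notebook image, that the client can authenticate from your Notebook server without extra arguments, and <RUN_ID> is a placeholder for the ID of the run you just created:

    import kfp

    # Assumption: inside the Notebook server the client can reach and
    # authenticate against the KFP API without extra arguments.
    client = kfp.Client()

    # Placeholder: paste the ID of the run submitted by the --kfp command.
    run = client.get_run("<RUN_ID>").run

    # In the KFP v1 API, logged metrics are attached to the run object.
    for metric in run.metrics or []:
        print(metric.name, metric.number_value)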

Summary

You have successfully created and logged metrics for a KFP Pipeline.

What’s Next

The next step is to create KFP HTML artifacts using the Kale SDK.