Log KFP metrics

This guide will walk you through creating and logging KFP metrics to evaluate and compare your Kubeflow Pipelines (KFP) runs.

What You’ll Need

  • An EKF or MiniKF deployment with the default Kale Docker image.
  • An understanding of how Kale SDK works.

Procedure

  1. Create a new Notebook server using the default Kale Docker image. The image will have the following naming scheme:

    gcr.io/arrikto/jupyter-kale-py36:<IMAGE_TAG>
    

    Note

    The <IMAGE_TAG> varies based on the MiniKF or EKF release.

  2. Connect to the server, open a terminal, and install scikit-learn:

    $ pip3 install --user scikit-learn==0.23.0
    
  3. Create a new Python file and name it kale_metrics.py:

    $ touch kale_metrics.py
    
  4. Copy and paste the following code inside kale_metrics.py:

    # Copyright © 2021 Arrikto Inc.  All Rights Reserved.
    
    """Kale SDK.
    
    This script trains an ML pipeline to solve a binary classification task.
    """
    
    from kale.sdk import pipeline, step
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    
    
    @step(name="data_loading")
    def load(random_state):
        """Create a random dataset for binary classification."""
        rs = int(random_state)
        x, y = make_classification(random_state=rs)
        return x, y
    
    
    @step(name="data_split")
    def split(x, y):
        """Split the data into train and test sets."""
        x, x_test, y, y_test = train_test_split(x, y, test_size=0.1)
        return x, x_test, y, y_test
    
    
    @step(name="model_training")
    def train(x, x_test, y, training_iterations):
        """Train a Logistic Regression model."""
        iters = int(training_iterations)
        model = LogisticRegression(max_iter=iters)
        model.fit(x, y)
        print(model.predict(x_test))
    
    
    @pipeline(name="binary-classification", experiment="kale-tutorial")
    def ml_pipeline(rs=42, iters=100):
        """Run the ML pipeline."""
        x, y = load(rs)
        x, x_test, y, y_test = split(x, y)
        train(x, x_test, y, iters)
    
    
    if __name__ == "__main__":
        ml_pipeline(rs=42, iters=100)
    

    Alternatively, download the kale_metrics_starter_code.py Python file.

    In this code sample, you start with a standard Python script that trains a Logistic Regression model, with its functions decorated using the Kale SDK. To read more about how to create this file, head to the corresponding Kale SDK user guide.
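    Before adding any metrics logic, you can sanity-check the underlying scikit-learn workflow on its own. The following sketch reproduces the load, split, and train steps without the Kale SDK; the values (random_state=42, max_iter=100) match the pipeline defaults above:

    ```python
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Same flow as the decorated steps, minus the Kale SDK.
    x, y = make_classification(random_state=42)
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)

    model = LogisticRegression(max_iter=100)
    model.fit(x_train, y_train)
    print(model.predict(x_test))  # predictions for the 10 held-out samples
    ```

    make_classification produces 100 samples by default, so a 0.1 test split leaves 10 samples for prediction.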

  5. Create a function that logs a metric and decorate it with the has_metrics decorator. Kale needs this decorator to know, at compilation time, which steps will produce KFP metrics. The following snippet summarizes the changes in the code:

    --- examples/sdk/sdk.py
    +++ examples/metrics/hasmetrics.py
    @@ -5,7 +5,7 @@
     This script trains an ML pipeline to solve a binary classification task.
     """
     
    -from kale.sdk import pipeline, step
    +from kale.sdk import has_metrics, pipeline, step
     from sklearn.datasets import make_classification
     from sklearn.linear_model import LogisticRegression
     from sklearn.model_selection import train_test_split
    @@ -35,6 +35,13 @@
         print(model.predict(x_test))
     
     
    +@has_metrics
    +@step(name="model_evaluation")
    +def evaluate(model, x_test, y_test):
    +    """Evaluate the model on the test dataset."""
    +    pass
    +
    +
     @pipeline(name="binary-classification", experiment="kale-tutorial")
     def ml_pipeline(rs=42, iters=100):
         """Run the ML pipeline."""
    

    Copy the resulting code below or download the kale_metrics_has_metrics_decorator.py Python file.

    Important

    Make sure to apply @has_metrics on top of @step, not the other way around.
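    The ordering matters because Python applies decorators bottom-up: @step wraps the plain function first, and @has_metrics then marks the resulting step. The following sketch uses stand-in decorators (not the real Kale ones) purely to illustrate that application order:

    ```python
    # Stand-in decorators that only record the order in which they run;
    # the real decorators are kale.sdk.has_metrics and kale.sdk.step.
    applied_order = []

    def has_metrics(fn):
        applied_order.append("has_metrics")
        return fn

    def step(name):
        def decorator(fn):
            applied_order.append("step")
            return fn
        return decorator

    @has_metrics                     # applied second (outermost)
    @step(name="model_evaluation")   # applied first (innermost)
    def evaluate(model, x_test, y_test):
        """Evaluate the model on the test dataset."""

    print(applied_order)  # ['step', 'has_metrics']
    ```

    With @has_metrics on top, it receives the step object that @step produced, which is what Kale inspects during compilation.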

  6. Use the log_metric function to log a metric during the step execution. The following snippet summarizes the changes in the code:

    --- examples/metrics/hasmetrics.py
    +++ examples/metrics/log.py
    @@ -6,8 +6,10 @@
     """
     
     from kale.sdk import has_metrics, pipeline, step
    +from kale.sdk.logging import log_metric
     from sklearn.datasets import make_classification
     from sklearn.linear_model import LogisticRegression
    +from sklearn.metrics import accuracy_score
     from sklearn.model_selection import train_test_split
     
     
    @@ -39,7 +41,9 @@
     @step(name="model_evaluation")
     def evaluate(model, x_test, y_test):
         """Evaluate the model on the test dataset."""
    -    pass
    +    y_pred = model.predict(x_test)
    +    accuracy = accuracy_score(y_test, y_pred)
    +    log_metric(name="accuracy", value=accuracy)
     
     
     @pipeline(name="binary-classification", experiment="kale-tutorial")
    

    Copy the resulting code below or download the kale_metrics_log_metric.py Python file.
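    The value passed to log_metric here is simply scikit-learn's accuracy_score, that is, the fraction of correct predictions. A pure-Python sketch of that computation, with made-up sample labels for illustration:

    ```python
    # Illustration of the accuracy value that the evaluate step logs;
    # sklearn.metrics.accuracy_score returns the same fraction.
    y_test = [0, 1, 1, 0, 1, 1]  # hypothetical true labels
    y_pred = [0, 1, 0, 0, 1, 1]  # hypothetical model predictions

    accuracy = sum(t == p for t, p in zip(y_test, y_pred)) / len(y_test)
    print(accuracy)  # 5 correct out of 6
    ```

    Note that log_metric expects a scalar value; metric names such as "accuracy" appear in the Run Output of the KFP UI, where you can compare them across runs.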

  7. Return the model from the train function and call the evaluate function inside ml_pipeline with the right arguments. The following snippet summarizes the changes in the code:

    --- examples/metrics/log.py
    +++ examples/metrics/metrics.py
    @@ -34,7 +34,7 @@
         iters = int(training_iterations)
         model = LogisticRegression(max_iter=iters)
         model.fit(x, y)
    -    print(model.predict(x_test))
    +    return model
     
     
     @has_metrics
    @@ -51,7 +51,8 @@
         """Run the ML pipeline."""
         x, y = load(rs)
         x, x_test, y, y_test = split(x, y)
    -    train(x, x_test, y, iters)
    +    model = train(x, x_test, y, iters)
    +    evaluate(model, x_test, y_test)
     
     
     if __name__ == "__main__":
    

    Copy the resulting code below or download the kale_metrics.py Python file.
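    With the decorators stripped away, the completed pipeline's data flow is plain Python. This sketch mirrors the final load, split, train, and evaluate chain using scikit-learn directly, including the model now returned by the train step:

    ```python
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # load -> split -> train -> evaluate, mirroring ml_pipeline()
    x, y = make_classification(random_state=42)
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)

    model = LogisticRegression(max_iter=100)
    model.fit(x_train, y_train)   # the train step now returns this model

    accuracy = accuracy_score(y_test, model.predict(x_test))
    print(accuracy)  # the value the evaluate step passes to log_metric
    ```

    In the real pipeline, Kale marshals the model between the train and evaluate steps for you; here the handoff is just a local variable.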

  8. Run the script locally, using Kale’s marshalling mechanism, to verify that your code runs successfully:

    $ python3 -m kale kale_metrics.py
    
  9. (Optional) Produce a workflow YAML file that you can inspect:

    $ python3 -m kale kale_metrics.py --compile
    

    After this command completes successfully, look for the workflow YAML file inside a .kale directory in your working directory. You can upload and submit this file to Kubeflow manually through the Kubeflow Pipelines user interface (KFP UI).

  10. Deploy and run your code as a KFP pipeline:

    $ python3 -m kale kale_metrics.py --kfp
    

    Note

    To see the complete list of arguments and their respective usage, run python3 -m kale --help.

Summary

You have successfully created and logged metrics for a KFP pipeline.

What’s Next

The next step is to create KFP HTML artifacts using the Kale SDK.