Serve LightGBM Models

This guide walks you through serving a LightGBM model using the Kale serve API.

What You’ll Need

  • An Arrikto EKF or MiniKF deployment with the default Kale Docker image.
  • An understanding of how the Kale SDK works.
  • An understanding of how the Kale serve API works.

Procedure

This guide comprises three sections. In the first section, you will load the dataset and split it into training and test subsets. In the second section, you will use the Kale SDK to build a Machine Learning (ML) pipeline that trains and serves a LightGBM model. Finally, in the third section, you will invoke the model endpoint to get predictions on the held-out test subset.

Load & Split the Dataset

In this guide, you will work with the Iris dataset, which contains 150 examples of Iris flowers, 50 from each of three species. Each example has four features (sepal length, sepal width, petal length, and petal width), and the goal is to predict the species of each example.

  1. Create a new notebook server using the default Kale Docker image. The image will have the following naming scheme:

    gcr.io/arrikto/jupyter-kale-py38:<IMAGE_TAG>

    Note

    The <IMAGE_TAG> varies based on the MiniKF or Arrikto EKF release.

  2. Connect to the Jupyter server and create a new Jupyter notebook (that is, an IPYNB file):

    ../../../_images/ipynb2.png
  3. Install the lightgbm library in the first code cell:

    # install the lightgbm library
    !pip3 install lightgbm==3.3.2

    This is what your notebook cell will look like:

    ../../../_images/lightgbm-libraries.png
  4. Restart the notebook’s kernel using the corresponding button in the UI:

    ../../../_images/restart-kernel.png
  5. Copy and paste the following import statements into the next code cell, and run it:

    import os
    import json
    import numpy as np
    import lightgbm as lgb
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import load_iris
    from kale.serve import Endpoint

    This is what your notebook cell will look like:

    ../../../_images/lightgbm-imports.png
  6. Load the features and targets of the dataset. Copy and paste the following code into a new code cell, and run it:

    iris = load_iris()
    x = iris.data
    y = iris.target

    This is what your notebook cell will look like:

    ../../../_images/lightgbm-dataset-load.png
  7. Split the dataset into training and test subsets. In a new cell, copy and paste the following code, and run it:

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.2, random_state=42)

    This is what your notebook cell will look like:

    ../../../_images/lightgbm-dataset-split.png
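Step 7 holds out 20% of the 150 Iris examples, so the test subset has 30 examples and the training subset has 120. Fixing random_state makes the shuffle, and therefore the split, identical on every run. Because the pipeline's data_loading step later calls train_test_split with the same test_size and random_state, the model trains on the same 120 examples, and the 30 test examples in this notebook stay unseen. The mechanism can be sketched in plain Python. split_indices is a hypothetical helper that uses random.Random rather than scikit-learn's own shuffling, so it illustrates the idea but does not reproduce the exact indices that train_test_split selects.

```python
import math
import random


def split_indices(n_samples: int, test_size: float, seed: int):
    """Shuffle indices with a fixed seed, then cut off a test fraction.

    Hypothetical helper: it mirrors the idea behind train_test_split,
    not scikit-learn's exact shuffling algorithm.
    """
    # Round the test count up, as train_test_split does for a float test_size
    n_test = math.ceil(n_samples * test_size)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(150, test_size=0.2, seed=42)
print(len(train_idx), len(test_idx))  # 120 30

# Re-running with the same seed yields the identical split
assert split_indices(150, test_size=0.2, seed=42) == (train_idx, test_idx)
```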

Serve LightGBM Model

  1. In the same notebook server, open a terminal, and create a new Python file. Name it serve_lightgbm_model.py:

    $ touch serve_lightgbm_model.py
  2. Copy and paste the following code inside serve_lightgbm_model.py:

    lightgbm_starter.py
    # Copyright © 2022 Arrikto Inc. All Rights Reserved.

    """Kale SDK.

    This script uses an ML pipeline to train and serve a LightGBM Model.
    """

    import lightgbm as lgb

    from typing import Tuple

    from sklearn.model_selection import train_test_split
    from sklearn.datasets import load_iris

    from kale.types import MarshalData
    from kale.sdk import pipeline, step


    @step(name="data_loading")
    def load_split_dataset() -> Tuple[MarshalData, MarshalData]:
        """Fetch Iris dataset."""
        # get data and target of the dataset
        iris = load_iris()
        x = iris.data
        y = iris.target

        # split the dataset
        x_train, _, y_train, _ = train_test_split(x, y, test_size=.2,
                                                  random_state=42)
        return x_train, y_train


    @step(name="model_training")
    def train(x: MarshalData, y: MarshalData):
        """Train a Booster model."""
        lgb_train = lgb.Dataset(x, y)

        params = {"objective": "multiclass",
                  "metric": "softmax",
                  "num_class": 3}

        lgb.train(params=params, train_set=lgb_train)


    @pipeline(name="regression", experiment="lightgbm-tutorial")
    def ml_pipeline():
        """Run the ML pipeline."""
        x_train, y_train = load_split_dataset()
        train(x_train, y_train)


    if __name__ == "__main__":
        ml_pipeline()

    This script uses the Kale SDK to define a Kubeflow Pipelines (KFP) pipeline with two steps:

    • The first step (data_loading) loads and splits the Iris dataset.
    • The second step (model_training) trains a LightGBM Booster model.
  3. Create a new step function that logs a LightGBMModel artifact using the Kale API. The following snippet summarizes the code changes:

    Important

    Running these pipelines locally won’t work. Once you add the register_model step, run the pipeline as a KFP pipeline, since this step creates a Kubeflow artifact.

    lightgbm_log_model_artifact.py
    # Copyright © 2022 Arrikto Inc. All Rights Reserved.

    """Kale SDK.

    This script uses an ML pipeline to train and serve a LightGBM Model.
    """

    import lightgbm as lgb

    from typing import Tuple

    from sklearn.model_selection import train_test_split
    from sklearn.datasets import load_iris

    +from kale.ml import Signature
    from kale.types import MarshalData
    from kale.sdk import pipeline, step
    +from kale.common import mlmdutils, artifacts


    @step(name="data_loading")
    def load_split_dataset() -> Tuple[MarshalData, MarshalData]:
        """Fetch Iris dataset."""
        # get data and target of the dataset
        iris = load_iris()
        x = iris.data
        y = iris.target

        # split the dataset
        x_train, _, y_train, _ = train_test_split(x, y, test_size=.2,
                                                  random_state=42)
        return x_train, y_train


    @step(name="model_training")
    -def train(x: MarshalData, y: MarshalData):
    +def train(x: MarshalData, y: MarshalData) -> MarshalData:
        """Train a Booster model."""
        lgb_train = lgb.Dataset(x, y)

        params = {"objective": "multiclass",
                  "metric": "softmax",
                  "num_class": 3}

    -    lgb.train(params=params, train_set=lgb_train)
    +    model = lgb.train(params=params, train_set=lgb_train)
    +
    +    return model
    +
    +
    +@step(name="register_model")
    +def register_model(model: MarshalData, x: MarshalData, y: MarshalData):
    +    """Register the model in the MLMD store."""
    +    mlmd = mlmdutils.get_mlmd_instance()
    +
    +    signature = Signature(
    +        input_size=[1] + list(x[0].shape),
    +        output_size=[1] + list(y[0].shape),
    +        input_dtype=x.dtype,
    +        output_dtype=y.dtype)
    +
    +    model_artifact = artifacts.LightGBMModel(
    +        model=model,
    +        description="A simple LightGBM classifier",
    +        version="1.0.0",
    +        author="Kale",
    +        signature=signature,
    +        tags={"app": "lightgbm-tutorial"}).submit_artifact()
    +
    +    mlmd.link_artifact_as_output(model_artifact.id)


    @pipeline(name="regression", experiment="lightgbm-tutorial")
    def ml_pipeline():
        """Run the ML pipeline."""
        x_train, y_train = load_split_dataset()
    -    train(x_train, y_train)
    +    model = train(x_train, y_train)
    +    register_model(model, x_train, y_train)


    if __name__ == "__main__":
        ml_pipeline()
  4. Create a new step function that serves the LightGBMModel artifact you created in the previous step, using the Kale serve API. To hand the model over to this step, register_model now returns the ID of the artifact it submits. The following snippet summarizes the code changes:

    lightgbm_serve.py
    # Copyright © 2022 Arrikto Inc. All Rights Reserved.

    """Kale SDK.

    This script uses an ML pipeline to train and serve a LightGBM Model.
    """

    import lightgbm as lgb

    from typing import Tuple

    from sklearn.model_selection import train_test_split
    from sklearn.datasets import load_iris

    +from kale.serve import serve
    from kale.ml import Signature
    from kale.types import MarshalData
    from kale.sdk import pipeline, step
    from kale.common import mlmdutils, artifacts


    @step(name="data_loading")
    def load_split_dataset() -> Tuple[MarshalData, MarshalData]:
        """Fetch Iris dataset."""
        # get data and target of the dataset
        iris = load_iris()
        x = iris.data
        y = iris.target

        # split the dataset
        x_train, _, y_train, _ = train_test_split(x, y, test_size=.2,
                                                  random_state=42)
        return x_train, y_train


    @step(name="model_training")
    def train(x: MarshalData, y: MarshalData) -> MarshalData:
        """Train a Booster model."""
        lgb_train = lgb.Dataset(x, y)

        params = {"objective": "multiclass",
                  "metric": "softmax",
                  "num_class": 3}

        model = lgb.train(params=params, train_set=lgb_train)

        return model


    @step(name="register_model")
    -def register_model(model: MarshalData, x: MarshalData, y: MarshalData):
    +def register_model(model: MarshalData, x: MarshalData, y: MarshalData) -> int:
        """Register the model in the MLMD store."""
        mlmd = mlmdutils.get_mlmd_instance()

        signature = Signature(
            input_size=[1] + list(x[0].shape),
            output_size=[1] + list(y[0].shape),
            input_dtype=x.dtype,
            output_dtype=y.dtype)

        model_artifact = artifacts.LightGBMModel(
            model=model,
            description="A simple LightGBM classifier",
            version="1.0.0",
            author="Kale",
            signature=signature,
            tags={"app": "lightgbm-tutorial"}).submit_artifact()

        mlmd.link_artifact_as_output(model_artifact.id)
    +    return model_artifact.id
    +
    +
    +@step(name="serve_model")
    +def serve_model(model_id: int):
    +    """Serve the registered model using the Kale serve API."""
    +    serve(name="lightgbm-tutorial", model_id=model_id)


    @pipeline(name="regression", experiment="lightgbm-tutorial")
    def ml_pipeline():
        """Run the ML pipeline."""
        x_train, y_train = load_split_dataset()
        model = train(x_train, y_train)
    -    register_model(model, x_train, y_train)
    +    model_id = register_model(model, x_train, y_train)
    +    serve_model(model_id)


    if __name__ == "__main__":
        ml_pipeline()
  5. Deploy and run your code as a KFP pipeline:

    $ python3 -m kale serve_lightgbm_model.py --kfp
  6. Select Runs to view the KFP run you just created. This is what it looks like when the pipeline completes successfully:

    ../../../_images/lightgbm-completed-run.png
  7. Wait until the pipeline completes. Check the Logs tab of the serve_model step to see whether the InferenceService is running.

    ../../../_images/lightgbm-logs.png
  8. Select Models and click on the endpoint you created:

    ../../../_images/lightgbm-endpoint.png
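In the register_model step, the Signature prepends a batch dimension of 1 to the shape of a single example. For the Iris data, one feature row has shape (4,) and one label is a scalar with shape (), so input_size evaluates to [1, 4] and output_size to [1]. The arithmetic can be checked in plain Python. In the sketch below, shape_of is a hypothetical stand-in for NumPy's .shape attribute, and the two rows are Iris examples used only for illustration.

```python
def shape_of(value) -> tuple:
    """Return the shape of a rectangular nested list, like NumPy's .shape.

    Hypothetical helper used only for this illustration.
    """
    if isinstance(value, (list, tuple)):
        return (len(value),) + (shape_of(value[0]) if value else ())
    return ()  # a scalar has an empty shape


# Two Iris rows (sepal length, sepal width, petal length, petal width in cm)
x = [[5.1, 3.5, 1.4, 0.2],
     [6.0, 2.9, 4.5, 1.5]]
y = [0, 1]  # class indices of the two rows

# Same arithmetic as in register_model: prepend a batch dimension of 1
input_size = [1] + list(shape_of(x[0]))
output_size = [1] + list(shape_of(y[0]))
print(input_size, output_size)  # [1, 4] [1]
```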

Get Predictions

In this section, you will query the model endpoint to get a prediction for an example from the test subset.

  1. Navigate to the Models UI to retrieve the name of the InferenceService. In this example, it is lightgbm-tutorial.

    ../../../_images/lightgbm-endpoint-name.png
  2. In the existing notebook, in a different code cell, initialize a Kale Endpoint object using the name of the InferenceService you retrieved in the previous step. Then, run the cell:

    endpoint = Endpoint(name="lightgbm-tutorial")

    Note

    When initializing an Endpoint, you can also pass the namespace of the InferenceService. For example, if your namespace is my-namespace:

    endpoint = Endpoint(name="lightgbm-tutorial", namespace="my-namespace")

    If you do not provide one, Kale assumes the namespace of the notebook server. In this example, the namespace is kubeflow-user.

    This is what your notebook cell will look like:

    ../../../_images/lightgbm-endpoint-define.png
  3. Examine a sample from the test subset. You will send its features to the model in the next steps:

    index_test = 3
    print(x_test[index_test])
    print("Iris type class: ", y_test[index_test])

    This is what your notebook cell will look like:

    ../../../_images/lightgbm-test-example.png
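  4. Prepare the data payload for the prediction request. The payload contains the four feature values of the test sample, keyed by Column_0 to Column_3, the default feature names that LightGBM assigns when a model is trained on a NumPy array without explicit feature names. Copy and paste the following code in a new cell, and run it:

    data = {"inputs": [{"Column_0": [6.0], "Column_1": [2.9], "Column_2": [4.5], "Column_3": [1.5]}]}

    This is what your notebook cell will look like: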

    ../../../_images/lightgbm-json-payload.png
  5. Invoke the model endpoint to get predictions. Copy and paste the following snippet in a different code cell, and run it:

    # get and print the prediction
    res = endpoint.predict(json.dumps(data))
    print(f"The prediction is '{np.argmax(res['predictions'])}'")

    This is what your notebook cell will look like:

    ../../../_images/lightgbm-pred.png
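Steps 4 and 5 can be reproduced in plain Python to see how the request body is built and how a class index is picked from per-class scores. This is a sketch under two assumptions: the endpoint expects the same layout as the payload in step 4 (LightGBM's default feature names with one single-element list per feature), and the predictions for one example form a flat list of per-class scores. to_payload and argmax are hypothetical helpers; argmax mimics np.argmax only for a flat list.

```python
import json


def to_payload(row):
    """Build the prediction request body for one feature row.

    Hypothetical helper: assumes LightGBM's default feature names
    (Column_0, Column_1, ...) and one single-element list per feature.
    """
    record = {f"Column_{i}": [float(value)] for i, value in enumerate(row)}
    return {"inputs": [record]}


def argmax(scores):
    """Return the index of the largest score (np.argmax for a flat list)."""
    return max(range(len(scores)), key=scores.__getitem__)


# Feature values from the payload in step 4
row = [6.0, 2.9, 4.5, 1.5]
body = json.dumps(to_payload(row))
print(body)
# {"inputs": [{"Column_0": [6.0], "Column_1": [2.9], "Column_2": [4.5], "Column_3": [1.5]}]}

# Illustrative per-class scores (not real model output) -> class index 1
print(argmax([0.1, 0.7, 0.2]))
```

The string printed by the sketch matches what json.dumps(data) sends in step 5, so the helper is a convenient way to build payloads for other test samples.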

Summary

You have successfully created a Kubeflow pipeline that trains a LightGBM model, logs it in MLMD, and creates a model endpoint using the Kale serve API.

What’s Next

Check out how you can serve a Python function using the Triton Inference Server.