Serve a Python Function using Triton

This section will guide you through serving a plain Python function using the Triton Inference Server and Kale.

What You’ll Need

Procedure

This guide comprises three sections: In the first section, you will create a Python function and wrap it in a way that you can use it with the Triton Inference Server. In the second section, you will leverage the Kale SDK to create an InferenceService using the Python function you created and the Triton backend. Finally, in the third section, you will invoke the model service to get back a prediction.

Create a Triton Inference Server Python Backend

This section will guide you through creating a Python function that performs a linear transformation on a given input. The function will be wrapped in a way that you can use it with the Triton Inference Server.

  1. Create a new notebook server using the default Kale Docker image. The image will have the following naming scheme:

    gcr.io/arrikto/jupyter-kale-py38:<IMAGE_TAG>

    Note

    The <IMAGE_TAG> varies based on the MiniKF or Arrikto EKF release.

  2. Connect to the notebook server, open a terminal, and create a new folder. Name it linear and navigate to it:

    $ mkdir linear && cd linear
  3. Create a file to place the configuration of your Python backend:

    $ touch config.pbtxt
  4. Copy and paste the following text inside the config.pbtxt file:

    name: "linear" backend: "python" input [{ name: "INPUT" data_type: TYPE_FP32 dims: [ 4 ] }] output [{ name: "OUTPUT" data_type: TYPE_FP32 dims: [ 4 ] }] instance_group [{ kind: KIND_CPU }]

    This configuration file defines the name of the model, the backend that will be used to serve it, the input and output data types, and the instance group that will be used to serve the model.

    You can see that the model expects a 4-dimensional input and returns a 4-dimensional output. The input and output data types are TYPE_FP32, which means that the model expects and returns 32-bit floating point numbers. The instance group is set to KIND_CPU, which means that the model will be served using a CPU instance.

    For more information on the configuration file, see the Triton Inference Server config documentation.

    Important

    The name of the model should match the name of the folder that contains the configuration file. In this case, the name of the model is linear, just like the name of the parent directory.

  5. Create a new folder to place the Python backend code:

    $ mkdir 1 && cd 1

    Important

    The name of the folder should match the version of the model. In this case, the version is 1. To learn more about the structure of a Triton model repository, see the Triton Inference Server model repository documentation.
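
    Once you create the model.py file in the following steps, the linear model repository will have this layout:

    linear/
    ├── config.pbtxt
    └── 1/
        └── model.py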

  6. Create a file to place the Python backend code:

    $ touch model.py
  7. Copy and paste the following code inside the model.py file:

    triton_python_function.py
    # Copyright © 2022 Arrikto Inc. All Rights Reserved.

    """Triton Python model.

    This script defines a Triton Python model that can be used to serve a
    simple Python function using the Triton Inference Server.
    """

    import json

    import triton_python_backend_utils as pb_utils


    class TritonPythonModel:
        """A Triton Python function backend."""

        def initialize(self, args):
            """Initialize any state associated with this model."""
            self.model_config = model_config = json.loads(args['model_config'])

            # Get OUTPUT configuration
            output_config = pb_utils.get_output_config_by_name(
                model_config, "OUTPUT")

            # Convert Triton types to numpy types
            self.output_dtype = pb_utils.triton_string_to_numpy(
                output_config['data_type'])

        def execute(self, requests):
            """Execute the function."""
            output_dtype = self.output_dtype

            responses = []
            for request in requests:
                input = pb_utils.get_input_tensor_by_name(request, "INPUT")

                out = 2 * input.as_numpy() + 1
                out_tensor = pb_utils.Tensor("OUTPUT",
                                             out.astype(output_dtype))

                inference_response = pb_utils.InferenceResponse(
                    output_tensors=[out_tensor])
                responses.append(inference_response)

            return responses

    Note

    Head over to the Triton Python backend documentation for more information on writing your Python backend.
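
    Optionally, before packaging the model, you can sanity-check the transformation that the execute method applies using plain NumPy. This is a hypothetical local test and not part of the Triton backend itself:

    import numpy as np

    # The INPUT tensor is a 4-element float32 vector, as declared in config.pbtxt.
    x = np.array([1, 2, 3, 4], dtype=np.float32)

    # The same linear transformation that execute() applies.
    out = 2 * x + 1
    print(out)  # [3. 5. 7. 9.]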

  8. Upload the linear folder to S3. You can complete this step manually or by using the aws CLI. Navigate back to the parent directory of linear and upload the folder:

    $ cd ../.. && aws s3 cp linear s3://<bucket-name>/linear --recursive

    Note

    You can use almost any object storage provider, such as AWS S3, Azure Blob Storage, or Google Cloud Storage. For a list of the KServe supported services and their configuration, see the KServe documentation.

  9. Retrieve the S3 URI of the location containing your linear folder from the S3 UI. For example, s3://<bucket-name>/.

    Important

    You should provide a URI pointing to, but not including, the linear folder. In this case, if your URI is s3://<bucket-name>/linear, you should provide s3://<bucket-name>/.
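
    If you want to verify the upload programmatically instead of through the S3 UI, you can list the uploaded objects with boto3. This is a minimal sketch, assuming your AWS credentials are configured locally and <bucket-name> is a placeholder for your bucket:

    import boto3

    # Placeholder bucket name; replace it with your own.
    BUCKET = "<bucket-name>"

    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=BUCKET, Prefix="linear/")
    for obj in response.get("Contents", []):
        print(obj["Key"])
    # You should see linear/config.pbtxt and linear/1/model.py listed.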

Serve a Python function with Triton

This section will guide you through creating an InferenceService using the Triton backend and the Python function you created in the previous section and uploaded to S3.

  1. In a new terminal window, create a new file named s3-creds.yaml:

    $ touch s3-creds.yaml
  2. Copy and paste the following code into the s3-creds.yaml file:

    apiVersion: v1
    kind: Secret
    metadata:
      name: s3-creds
      annotations:
        serving.kserve.io/s3-endpoint: s3.amazonaws.com
        serving.kserve.io/s3-region: <REGION>
        serving.kserve.io/s3-useanoncredential: "false"
        serving.kserve.io/s3-usehttps: "1"
    type: Opaque
    data:
      AWS_ACCESS_KEY_ID: <AWS-ACCESS-KEY-ID>
      AWS_SECRET_ACCESS_KEY: <AWS-SECRET-ACCESS-KEY>

    Replace the <REGION>, <AWS-ACCESS-KEY-ID>, and <AWS-SECRET-ACCESS-KEY> placeholders with your credentials. Note that values under the data field of a Secret must be base64-encoded. KServe reads the Secret annotations to inject the S3 environment variables into the storage initializer or model agent, which downloads the models from S3 storage.

  3. Create a new file for your ServiceAccount resource:

    $ touch kserve-sa.yaml
  4. Copy and paste the following code into the kserve-sa.yaml file:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: kserve-sa
    secrets:
    - name: s3-creds
  5. Apply the Secret and the ServiceAccount resources:

    $ kubectl apply -f s3-creds.yaml && kubectl apply -f kserve-sa.yaml

    Note

    If you are using a different object storage provider, read the KServe documentation to configure your environment.

  6. Create a new Jupyter notebook (that is, an IPYNB file):

    ../../../_images/ipynb2.png
  7. Copy and paste the import statements in the next code cell, and run it:

    import json

    from kale.serve import serve, Endpoint

    This is what your notebook cell will look like:

    ../../../_images/triton-imports.png
  8. Instruct Kale to serve the model using the S3 URI you retrieved in a previous step. The protocol_version is set to v2 because Triton serves models over the KServe V2 inference protocol, which is also the request format you will use to get predictions later. Copy and paste the following code in the next code cell, and run it:

    config = {"protocol_version": "v2", "predictor": {"service_account_name": "kserve-sa", "storage_uri": "s3://arrikto-docs-kale-serve/triton/", "model_format": {"name": "triton"}}} isvc = serve(name="linear", serve_config=config)

    This is what your notebook cell will look like:

    ../../../_images/triton-serve.png

Get Predictions

In this section, you will query the model endpoint to get predictions.

  1. Navigate to the Models UI to retrieve the name of the InferenceService. In this example, it is linear.

    ../../../_images/triton-endpoint-name.png
  2. In the existing notebook, in a different code cell, initialize a Kale Endpoint object using the name of the InferenceService you retrieved in the previous step. Then, run the cell:

    endpoint = Endpoint(name="linear")

    Note

    When initializing an Endpoint, you can also pass the namespace of the InferenceService. For example, if your namespace is my-namespace:

    endpoint = Endpoint(name="linear", namespace="my-namespace")

    If you do not provide one, Kale assumes the namespace of the notebook server. In this case, it is kubeflow-user.

    This is what your notebook cell will look like:

    ../../../_images/triton-endpoint-define.png
  3. Convert the test example into JSON format. Copy and paste the following code into a new code cell, and run it:

    data = {"inputs":[{ "name": "INPUT", "shape": [4], "datatype": "FP32", "data": [1, 2, 3, 4]}]}

    This is what your notebook cell will look like:

    ../../../_images/triton-test-example-json.png
  4. Invoke the server to get predictions. Copy and paste the following snippet in a different code cell, and run it:

    res = endpoint.predict(json.dumps(data))
    print(res)

    This is what your notebook cell will look like:

    ../../../_images/triton-pred.png
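
    Since the backend computes out = 2 * x + 1, the OUTPUT tensor for this request should contain [3.0, 5.0, 7.0, 9.0]. Optionally, you can recompute the expected values locally to compare against the response; this is a short sketch using NumPy:

    import numpy as np

    # Recompute the transformation locally for the same test input.
    expected = 2 * np.array(data["inputs"][0]["data"], dtype=np.float32) + 1
    print(expected)  # [3. 5. 7. 9.]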

Summary

You have successfully served a simple Python function using the Triton Inference Server and the Kale serve API.

What’s Next

Check out how you can serve a custom model.