Serve PyTorch Models

This section will guide you through serving a PyTorch model, using the Kale serve API.

What You’ll Need


This guide comprises three sections. In the first section, you explore and process the dataset. In the second section, you build, train, and package a PyTorch model using the Torch model archiver for TorchServe, and then serve it using the Kale serve API. Finally, in the third section, you invoke the model service to get predictions on a holdout test subset.

Explore Dataset

In this guide, you will work with the CIFAR10 dataset, which consists of 60000 32x32 RGB images. The dataset creators have categorized the images into 10 different classes, and the end goal is to correctly predict the object that each image depicts.

  1. Create a new notebook server using the Kale GPU Docker image. The image will have the following naming scheme:<IMAGE_TAG>


    The <IMAGE_TAG> varies based on the MiniKF or Arrikto EKF release.


    If you want access to a GPU device, you must specifically request one or more from the Jupyter Web App UI. This guide does not require a GPU device, but we recommend adding one so that you can get better results.

  2. Create a new Jupyter notebook (that is, an IPYNB file).

  3. Install the necessary dependencies in the first code cell. Copy and paste the following code, and run the cell:

    !pip3 install "torch>=1.3.1" torchvision==0.8.2 torch-model-archiver==0.6.0


  4. Restart the notebook’s kernel using the corresponding button in the UI.

  5. Copy and paste the import statements in the next code cell, and run it:

    import json
    import numpy as np
    import matplotlib.pyplot as plt
    import torch
    import torchvision
    import torch.nn as nn
    import torch.optim as optim
    import torchvision.transforms as transforms
    from kale.serve import Endpoint


  6. Load the dataset into train and test subsets. Copy and paste the following code into a new cell, and run it:

    batch_size = 4

    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    # Download the images and transform them to Tensors
    trainset = torchvision.datasets.CIFAR10(root="./data", train=True,
                                            download=True, transform=transform)
    testset = torchvision.datasets.CIFAR10(root="./data", train=False,
                                           download=True, transform=transform)

    # Load the data
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                              shuffle=True, num_workers=2)
    testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                             shuffle=False, num_workers=2)

    classes = ("plane", "car", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck")


  7. Visualize images from the dataset and print the category they belong to:

    # get some random training images
    dataiter = iter(trainloader)
    images, labels = next(dataiter)

    _, ax = plt.subplots(1, 4, figsize=(10, 5))
    for i in range(4):
        img = images[i]
        img = img / 2 + 0.5  # undo the normalization for display
        npimg = img.numpy()
        label = labels[i]
        ax[i % 4].imshow(np.transpose(npimg, (1, 2, 0)))
        ax[i % 4].set_title(classes[label])
        ax[i % 4].axis("off")

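
The loading and plotting cells rely on the same arithmetic: normalizing each channel with a mean and standard deviation of 0.5 maps pixel values from [0, 1] to [-1, 1], and `img / 2 + 0.5` maps them back for display. The following is a minimal sketch of that relationship in plain Python; the helper names `normalize` and `denormalize` are hypothetical and not part of torchvision.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Per-channel normalization: (pixel - mean) / std."""
    return (pixel - mean) / std


def denormalize(value):
    """Inverse of the step above for mean=0.5 and std=0.5."""
    return value / 2 + 0.5


# A pixel in [0, 1] lands in [-1, 1] and can be recovered for plotting.
for pixel in (0.0, 0.25, 0.5, 1.0):
    value = normalize(pixel)
    print(pixel, "->", value, "->", denormalize(value))
```

This is why the plotting code divides by 2 and adds 0.5 before calling `imshow`, which expects floating-point values in [0, 1].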


Serve PyTorch Model

  1. In the same notebook server, open a terminal, and create two new Python files:

    $ touch classifier.py
    $ touch cifar10_handler.py

    • Inside the first module you will define the PyTorch model.
    • Inside the second module you will define a handler component. TorchServe needs this module to process the data before passing it to the model.


    To learn more about TorchServe handlers, see the TorchServe handlers documentation. TorchServe provides several default handlers for common use cases, but you can also define your own.

  2. Define a simple PyTorch Convolutional Neural Network (CNN). Copy and paste the following code inside classifier.py:

    # Copyright © 2022 Arrikto Inc. All Rights Reserved.

    """PyTorch Model Definition.

    This script defines a simple PyTorch CNN.
    """

    import torch
    import torch.nn as nn
    import torch.nn.functional as f


    class Net(nn.Module):
        """Define CNN model."""

        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)
            self.conv2 = nn.Conv2d(6, 16, 5)
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            x = self.conv1(x)
            x = f.relu(x)
            x = f.max_pool2d(x, kernel_size=2)
            x = self.conv2(x)
            x = f.relu(x)
            x = f.max_pool2d(x, kernel_size=2)

            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = f.relu(x)
            x = self.fc2(x)
            x = f.relu(x)
            x = self.fc3(x)

            return x
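
The input size of the first fully connected layer, 16 * 5 * 5, follows from how the two convolution and pooling stages shrink a 32x32 image. The sketch below reproduces that arithmetic in plain Python, assuming the defaults used by the model (stride 1 and no padding for the convolutions, and a pooling stride equal to the kernel size). The helpers `conv_out` and `pool_out` are hypothetical names introduced for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Spatial output size of max pooling with stride equal to the kernel."""
    return (size - kernel) // kernel + 1


size = 32                  # CIFAR10 images are 32x32
size = conv_out(size, 5)   # conv1 with a 5x5 kernel
size = pool_out(size, 2)   # first 2x2 max pooling
size = conv_out(size, 5)   # conv2 with a 5x5 kernel
size = pool_out(size, 2)   # second 2x2 max pooling

channels = 16              # output channels of conv2
flat_features = channels * size * size
print(size, flat_features)
```

If you change the kernel sizes or the input resolution, the first argument of `fc1` has to change accordingly.
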

  3. Copy and paste the following code inside cifar10_handler.py:

    # Copyright © 2022 Arrikto Inc. All Rights Reserved.

    """CIFAR10 handler Definition.

    This script defines a simple TorchServe handler.
    """

    from torchvision import transforms
    from torch.profiler import ProfilerActivity
    from ts.torch_handler.image_classifier import ImageClassifier


    class Cifar10Classifier(ImageClassifier):
        """Cifar10Classifier Handler."""

        class_names = ["plane", "car", "bird", "cat", "deer",
                       "dog", "frog", "horse", "ship", "truck"]

        image_processing = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

        def __init__(self):
            super(Cifar10Classifier, self).__init__()
            self.profiler_args = {
                "activities": [ProfilerActivity.CPU],
                "record_shapes": True,
            }

        def postprocess(self, data):
            """Convert the predicted output response to a label.

            Args:
                data (list): The predicted output response.

            Returns:
                list : A list of dictionaries with processed predictions.
            """
            pred = data.argmax(1).tolist()
            labels = [self.class_names[p] for p in pred]
            return labels

    This script defines a custom image classifier handler, which processes the inputs before passing them to the model, and postprocesses the predictions before returning them to the client.
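
To make the postprocessing step concrete, the following sketch mirrors what `postprocess` does, using plain Python lists instead of tensors: for each row of class scores it picks the index of the highest score and looks up the matching class name. The scores are made-up values for illustration, and the `argmax` helper is a hypothetical stand-in for the tensor method.

```python
CLASS_NAMES = ["plane", "car", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]


def argmax(row):
    """Return the index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def postprocess(batch_scores):
    """Map each row of class scores to its most likely class name."""
    return [CLASS_NAMES[argmax(row)] for row in batch_scores]


# Two made-up score rows, one per image in the batch.
scores = [
    [0.1, 2.3, -1.0, 0.0, 0.5, 0.2, -0.3, 0.9, 1.1, 0.4],  # highest at index 1
    [0.0, 0.1, 0.2, 3.5, 0.3, 1.0, 0.4, 0.2, 0.1, 0.0],    # highest at index 3
]
print(postprocess(scores))  # ['car', 'cat']
```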

  4. Return to your Notebook file and run the following code to train your model for one epoch:

    from classifier import Net

    net = Net()
    epochs = 1  # Change this number if you have a GPU device

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    for epoch in range(epochs):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            # print every 2000 mini-batches
            if i % 2000 == 1999:
                print("Epoch:", epoch + 1,
                      "---- Mini batch:", i + 1,
                      "---- Loss:", "%.3f" % (running_loss / 2000))
                running_loss = 0.0




    If you have a GPU device, you can train the model for more epochs. To do so, change the value of the epochs variable to a higher number. This will improve the accuracy of the model, but producing an accurate model is not the focus of this user guide.
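
The loss reporting in the training loop can be hard to read at a glance. The following sketch isolates that logic in plain Python, with a small window instead of 2000 mini-batches and made-up loss values. The function name `windowed_mean_losses` is hypothetical.

```python
def windowed_mean_losses(losses, window):
    """Report the mean loss after every `window` mini-batches."""
    reports = []
    running = 0.0
    for i, loss in enumerate(losses):
        running += loss
        # Same condition as `i % 2000 == 1999` in the training loop.
        if i % window == window - 1:
            reports.append((i + 1, running / window))
            running = 0.0
    return reports


# Made-up per-batch losses; the last value does not fill a window.
losses = [4.0, 2.0, 3.0, 1.0, 5.0]
print(windowed_mean_losses(losses, window=2))  # [(2, 3.0), (4, 2.0)]
```

Note that mini-batches left over after the last full window are accumulated but never reported, which is also how the training loop behaves.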

  5. Serialize the model locally. Copy and paste the following code into a new cell, and run it:

    from kale.marshal import save

    save(net, "model")



    This command will create a directory containing the weights of the model and its definition. You will need these two files to package the model in a .mar file that can be deployed with KServe.

  6. In the terminal, run the torch-model-archiver CLI to package the model and the handler:

    jovyan@serve-pytorch-0:~$ torch-model-archiver \
    > --model-name cifar10 \
    > --version 1.0 \
    > --model-file /home/jovyan/ \
    > --serialized-file /home/jovyan/ \
    > --handler /home/jovyan/cifar10_handler

    This command will create a cifar10.mar file. You will need this file in the next step.


    To learn more about .mar archives, how to create them, and which options are available, see the TorchServe documentation.

  7. Create a folder named cifar10. This is the folder you will point TorchServe to. It should contain two directories:

    • a config directory containing a file
    • a model-store directory to hold the model archive

    The following file tree depicts the final structure of the folder:

    jovyan@serve-pytorch-0:~$ tree cifar10
    cifar10
    ├── config
    │   └──
    └── model-store
        └── cifar10.mar

  8. Place the following contents in the file:


    To learn more about the configuration options, head to the configuration guide for TorchServe.

  9. Copy the cifar10.mar file into the model-store folder:

    $ cp cifar10.mar cifar10/model-store
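
If you prefer to script the folder layout instead of creating it by hand, the following sketch builds the two directories and copies the archive into the model-store directory using only the Python standard library. The function name `build_model_folder` is hypothetical, the example runs against a temporary directory with a placeholder archive, and it does not create the configuration file.

```python
import shutil
import tempfile
from pathlib import Path


def build_model_folder(root, archive):
    """Create the config and model-store directories and copy the archive."""
    root = Path(root)
    config_dir = root / "config"
    store_dir = root / "model-store"
    config_dir.mkdir(parents=True, exist_ok=True)
    store_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(archive, store_dir)
    # Return the resulting layout, relative to the root folder.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder file standing in for the real cifar10.mar archive.
    archive = Path(tmp) / "cifar10.mar"
    archive.write_bytes(b"placeholder")
    print(build_model_folder(Path(tmp) / "cifar10", archive))
```
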
  10. Upload the cifar10 folder to S3. You can complete this step manually or by using the aws CLI:

    $ aws s3 cp cifar10 s3://<bucket-name>/cifar10 --recursive


    You can use almost any object storage provider, such as AWS S3, Azure Blob Storage, or Google Cloud Storage. For a list of the KServe supported services and their configuration, see the KServe documentation.

  11. Retrieve the S3 URI pointing to your cifar10 folder from the S3 UI. For example, s3://<bucket-name>/.


    Provide a URI that points to the location containing the cifar10 folder, not to the folder itself. For example, if the URI of the folder is s3://<bucket-name>/cifar10, provide s3://<bucket-name>/.
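
As a quick check, the following sketch derives the URI to provide from the URI of the cifar10 folder by dropping the last path component. The function name `storage_uri_for` and the bucket name are hypothetical, and the sketch assumes the input URI contains at least one path component after the bucket.

```python
def storage_uri_for(folder_uri):
    """Return the URI of the location that contains the given folder."""
    trimmed = folder_uri.rstrip("/")
    parent, _, _ = trimmed.rpartition("/")
    return parent + "/"


print(storage_uri_for("s3://my-bucket/cifar10"))          # s3://my-bucket/
print(storage_uri_for("s3://my-bucket/models/cifar10/"))  # s3://my-bucket/models/
```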

  12. In your terminal, create a new file named s3-creds.yaml:

    $ touch s3-creds.yaml
  13. Copy and paste the following code into the s3-creds.yaml file:

    apiVersion: v1
    kind: Secret
    metadata:
      name: s3-creds
      annotations: <REGION> "false" "1"
    type: Opaque
    data:
      AWS_ACCESS_KEY_ID: <AWS-ACCESS-KEY-ID>
      AWS_SECRET_ACCESS_KEY: <AWS-SECRET-ACCESS-KEY>

    Replace the <REGION>, <AWS-ACCESS-KEY-ID>, and <AWS-SECRET-ACCESS-KEY> placeholders with your credentials. KServe reads the secret annotations to inject the S3 environment variables on the storage initializer or model agent to download the models from S3 storage.
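
Kubernetes expects the values under the `data` field of a Secret to be base64-encoded, so raw credentials need to be encoded before they go into the manifest. The sketch below shows one way to do this with the Python standard library. The function names, the example credentials, and the region are hypothetical. The annotation keys are an assumption based on the KServe S3 documentation, so verify them against the KServe version you are running.

```python
import base64
import json


def b64(value):
    """Base64-encode a string, as required for Secret `data` values."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def build_s3_secret(access_key_id, secret_access_key, region):
    """Build a Secret manifest as a dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            "name": "s3-creds",
            "annotations": {
                # Assumed annotation keys; check the KServe documentation.
                "serving.kserve.io/s3-region": region,
                "serving.kserve.io/s3-useanoncredential": "false",
                "serving.kserve.io/s3-usehttps": "1",
            },
        },
        "type": "Opaque",
        "data": {
            "AWS_ACCESS_KEY_ID": b64(access_key_id),
            "AWS_SECRET_ACCESS_KEY": b64(secret_access_key),
        },
    }


# Example values only; never commit real credentials.
secret = build_s3_secret("EXAMPLE-KEY-ID", "EXAMPLE-SECRET", "us-east-1")
print(json.dumps(secret, indent=2))
```

Because JSON is valid YAML, you can save the printed manifest to a file and apply it with kubectl in the same way as the YAML file.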

  14. Create a new file for your ServiceAccount resource:

    $ touch kserve-sa.yaml
  15. Copy and paste the following code into the kserve-sa.yaml file:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: kserve-sa
    secrets:
      - name: s3-creds
  16. Apply the Secret and the ServiceAccount resources:

    $ kubectl apply -f s3-creds.yaml && kubectl apply -f kserve-sa.yaml


    If you are using a different object storage provider, read the KServe documentation to configure your environment.

  17. In the Notebook server you have running, instruct Kale to serve the model using the S3 URI you retrieved in a previous step:

    from kale.serve import serve

    config = {"predictor": {"service_account_name": "kserve-sa",
                            "storage_uri": "s3://kserve-examples/",
                            "model_format": {"name": "pytorch"}}}

    isvc = serve(name="cifar-10", serve_config=config)



Get Predictions

In this section, you will query the model endpoint to get predictions for the images in the test subset.

  1. Navigate to the Models UI to retrieve the name of the InferenceService. In this example, it is cifar10.

  2. In the existing notebook, in a different code cell, initialize a Kale Endpoint object using the name of the InferenceService you retrieved in the previous step. Then, run the cell:

    endpoint = Endpoint(name="cifar10")


    When initializing an Endpoint, you can also pass the namespace of the InferenceService. For example, if your namespace is my-namespace:

    endpoint = Endpoint(name="cifar10", namespace="my-namespace")

    If you do not provide one, Kale assumes the namespace of the notebook server. In our case it is kubeflow-user.


  3. Visualize a test sample:

    dataiter = iter(testloader)
    images, labels = next(dataiter)

    img = images[1]
    label = labels[1]
    img = img / 2 + 0.5
    npimg = img.numpy()
    plt.title(classes[label])
    plt.imshow(np.transpose(npimg, (1, 2, 0)))


  4. Transform the example image the same way you did during training:

    transform = transforms.Compose(
        [transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    transformed_data = transform(img)


  5. Convert the test example into JSON format. Copy and paste the following code into a new code cell, and run it:

    data = {"instances": [{"data": transformed_data.tolist()}]}


  6. Invoke the server to get predictions. Copy and paste the following snippet in a different code cell, and run it:

    res = endpoint.predict(json.dumps(data))
    print(res)

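
To inspect the request body before sending it, the following sketch builds the same payload structure from a plain nested list and checks its shape after a JSON round trip. The all-zero image is a hypothetical stand-in for `transformed_data.tolist()`, and the function name `build_request` is introduced for illustration only.

```python
import json


def build_request(image):
    """Wrap a nested-list image in the payload structure used in this guide."""
    return json.dumps({"instances": [{"data": image}]})


# Stand-in for a normalized CIFAR10 image: 3 channels of 32x32 values.
image = [[[0.0] * 32 for _ in range(32)] for _ in range(3)]

body = build_request(image)
decoded = json.loads(body)
data = decoded["instances"][0]["data"]
print(len(data), len(data[0]), len(data[0][0]))  # 3 32 32
```

The payload carries one entry in the instances list per image, and each entry holds the image as channels, rows, and columns.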



You have successfully served a PyTorch model stored on S3, using the Kale serve API.

What’s Next

Check out how to serve an XGBoost model.