Configure Kubernetes Spec for Model

In this section, you will configure the Kubernetes spec and metadata of the InferenceService, and of its underlying resources, for a trained Machine Learning (ML) model that you serve with Kale and KFServing.

What You'll Need

An Arrikto EKF or MiniKF deployment with the default Kale Docker image.
An understanding of how you can serve a model with Kale.

Procedure

Create a new Notebook server using the default Kale Docker image. The image will have the following naming scheme:
```
gcr.io/arrikto/jupyter-kale-py36:<IMAGE_TAG>
```
Note

The <IMAGE_TAG> varies based on the MiniKF or Arrikto EKF release.
Create a new Jupyter Notebook (that is, an IPYNB file).
Copy and paste the import statements in the first code cell, then run it:

```
import json

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

from kale.common.serveutils import serve
```
In a different code cell, prepare your dataset and train your model. Then, run it:

```
# load the data
x, y = make_classification(random_state=42)
x_raw, x_test_raw, y, y_test = train_test_split(x, y, test_size=0.1)

# process the data
scaler = StandardScaler()
x = scaler.fit_transform(x_raw)

# train the model
model = LogisticRegression(max_iter=1000)
model.fit(x, y)
```
In a different code cell, define the data processing function that you want to turn into a KFServing transformer component. Then, run it:

```
def process_raw(x):
    import numpy as np
    # convert your data to a NumPy array
    x = np.array(x)
    # expand your data's dimensions
    x = x[None,:]
    # scale your features
    x = scaler.transform(x)
    # turn your data back to a Python list and reduce the dimensions
    x = x.tolist()[0]
    return x
```
In a different code cell, specify the Kubernetes configuration you want for the deployment of your model. Then, run it:
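The function above handles one instance at a time: it wraps the flat feature list into a single-row batch, because `scaler.transform` works on two-dimensional input, and then unwraps the result. The following is a dependency-free sketch of that round trip. `TinyScaler` and `process_one` are hypothetical stand-ins used only for illustration; they are not part of scikit-learn or Kale.

```python
class TinyScaler:
    """Hypothetical stand-in for a fitted scaler: (value - mean) / scale per column."""

    def __init__(self, mean, scale):
        self.mean = mean
        self.scale = scale

    def transform(self, rows):
        # Operate on a batch (a list of rows), as scikit-learn scalers do.
        return [[(v - m) / s for v, m, s in zip(row, self.mean, self.scale)]
                for row in rows]


def process_one(x, scaler):
    batch = [list(x)]                 # expand: one instance becomes a batch of one row
    scaled = scaler.transform(batch)  # scale every feature column
    return scaled[0]                  # reduce: back to a flat list of features


scaler = TinyScaler(mean=[1.0, 10.0], scale=[2.0, 5.0])
result = process_one([3.0, 0.0], scaler)
```

The input and the output are both flat lists, which is the same contract that `process_raw` follows.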

```
deploy_config = {"limits": {"cpu": "1", "memory": "4Gi"},
                 "requests": {"cpu": "100m", "memory": "3Gi"},
                 "labels": {"my-kale-model": "logistic-regression"}}
```

Note

For the full list of configuration options you can specify, see the DeployConfig API.
Call the serve function and pass the trained model, the preprocessing function, its dependencies, and the deployment configuration as arguments. Then, run the cell:

```
kfserver = serve(model, preprocessing_fn=process_raw,
                 preprocessing_assets={'scaler': scaler},
                 deploy_config=deploy_config)
```

Important

Unless you explicitly specify limits and requests, the KFServing controller uses default values that the admin sets. At the same time, Kubernetes will not schedule Pods whose requests exceed their limits.

To ensure that the resulting resources have valid specs, whenever you set either limits or requests, set the other one as well and keep each request at or below its limit. If you provide a value only for requests and it exceeds the default limits, Kubernetes will not schedule the resulting Pods.

The deploy_config argument accepts either a dict, which Kale will use to initialize a DeployConfig object, or a DeployConfig directly.
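Before calling serve, you may want to check a configuration dict against the rule described above. The following is a minimal sketch, assuming the dict layout shown earlier in this procedure. The helpers `parse_quantity` and `check_resources` are hypothetical and are not part of Kale or Kubernetes. The check covers only the values you pass explicitly, so it cannot detect a conflict with defaults that the admin has set.

```python
from decimal import Decimal

# Kubernetes quantity suffixes and their multipliers.
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40, "Pi": Decimal(2) ** 50, "Ei": Decimal(2) ** 60,
    "m": Decimal("0.001"), "k": Decimal(10) ** 3, "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9, "T": Decimal(10) ** 12, "P": Decimal(10) ** 15,
    "E": Decimal(10) ** 18,
}


def parse_quantity(value):
    """Convert a quantity string such as "100m" or "4Gi" to a Decimal."""
    text = str(value).strip()
    # Try two-character suffixes before one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def check_resources(deploy_config):
    """Return a list of problems found in the limits and requests of a config dict."""
    limits = deploy_config.get("limits")
    requests = deploy_config.get("requests")
    if bool(limits) != bool(requests):
        return ["set both 'limits' and 'requests', or neither"]
    problems = []
    for name, requested in (requests or {}).items():
        if name not in limits:
            problems.append(f"no limit set for requested resource '{name}'")
        elif parse_quantity(requested) > parse_quantity(limits[name]):
            problems.append(
                f"{name} request {requested} exceeds limit {limits[name]}")
    return problems


deploy_config = {"limits": {"cpu": "1", "memory": "4Gi"},
                 "requests": {"cpu": "100m", "memory": "3Gi"},
                 "labels": {"my-kale-model": "logistic-regression"}}
problems = check_resources(deploy_config)
```

An empty list means that every explicit request has a matching limit that is at least as large.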
In a different code cell, invoke the server to get predictions. Then, run it:

```
data = json.dumps({"instances": x_test_raw.tolist()})
predictions = kfserver.predict(data)
```
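The request body is a JSON object with an `instances` list that holds one entry per raw example. The following sketch builds such a body from plain Python lists and rejects batches whose rows differ in length. `build_payload` is a hypothetical helper, not part of Kale, and the sample values stand in for `x_test_raw.tolist()`.

```python
import json


def build_payload(instances):
    """Serialize raw instances as a JSON string of the form {"instances": [...]}."""
    rows = [list(row) for row in instances]
    if not rows:
        raise ValueError("at least one instance is required")
    width = len(rows[0])
    for index, row in enumerate(rows):
        if len(row) != width:
            raise ValueError(
                f"instance {index} has {len(row)} features, expected {width}")
    return json.dumps({"instances": rows})


# Hypothetical raw feature vectors standing in for x_test_raw.tolist()
sample = [[0.5, -1.2, 3.0], [1.1, 0.0, -0.7]]
data = build_payload(sample)
```

The instances are sent unscaled, because the transformer created from the preprocessing function scales them before they reach the model.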