ServeConfig

ServeConfig is a Kale object that you can use to configure an InferenceService. Within a ServeConfig object you can define the backend you want to use to serve your model, limit its resources, and set the service account for your predictor and transformer Pods.

Import

The object lives in the kale.serve module. Import it as follows:

from kale.serve import ServeConfig

Attributes

Name               Type                    Default   Description
env                List[V1EnvVar]          []        Extends the env field of the container
env_from           List[V1EnvFromSource]   []        Extends the envFrom field of the container
requests           Dict[str, str]          {}        Sets resources.requests for the container
limits             Dict[str, str]          {}        Sets resources.limits for the container
annotations        Dict[str, str]          {}        Sets annotations for the Pod
predictor          Dict[str, Any]          {}        Sets the predictor’s spec, and the predictor’s Pod affinity, tolerations, and node_selector fields
transformer        Dict[str, Any]          {}        Sets the transformer’s spec, and the transformer’s Pod affinity, tolerations, and node_selector fields
labels             Dict[str, str]          {}        Sets labels for the Pod
node_selector      Dict[str, str]          {}        Sets the node_selector for the Pod
affinity           V1Affinity              None      Sets the affinity of the Pod
tolerations        List[V1Toleration]      []        Sets tolerations for the Pod
protocol_version   str                     None      The protocol version of the predictor
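
For instance, here is a minimal sketch that sets a few of the attributes above. The resource quantities, annotation key, and protocol version are illustrative placeholders, not recommended values:

from kale.serve import ServeConfig

config = ServeConfig(
    requests={"cpu": "100m", "memory": "512Mi"},   # resources.requests for the container
    limits={"cpu": "1", "memory": "1Gi"},          # resources.limits for the container
    annotations={"example.com/owner": "ml-team"},  # Pod annotations; key and value are placeholders
    protocol_version="v2",                         # protocol version of the predictor
)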

Important

If you set any of the env, env_from, requests, limits, affinity, or tolerations fields of the ServeConfig object, they populate the corresponding predictor and transformer fields. This functionality allows you to define values for both the predictor and transformer Pods and containers at the same time. For example, if you want the limits field to be equal to {"memory": "4Gi"} for both the predictor and transformer containers, the ServeConfig object can be the following:

serve_config = {"limits": {"memory": "4Gi"}}

Otherwise, you can set specific values for each Pod and container. If you want the limits field to differ between the predictor and transformer containers, the ServeConfig object should be the following:

serve_config = {"predictor": { "container": { "resources": "limits": {"memory": "4Gi"}}}, "transformer": { "container": { "resources": "limits": {"memory": "2Gi"}}}}

The way each generic field gets populated is the following (see the sketch after this list):

  • If a generic value is defined and a specific one is not, then the specific value gets populated with the generic one.
  • For the env, env_from, and tolerations fields, if both the generic and specific fields are defined, then the two fields get merged.
  • For the affinity, requests, limits, and node_selector fields, if the specific field is defined, the generic one is ignored.
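
For example, the following sketch combines generic and specific values; the names and quantities are illustrative. The comments state the outcome each rule above prescribes:

from kubernetes.client import V1EnvVar
from kale.serve import ServeConfig

config = ServeConfig(
    # Generic fields: candidates for both the predictor and the transformer.
    env=[V1EnvVar(name="SHARED", value="1")],
    limits={"memory": "4Gi"},
    predictor={
        "container": {
            # env is merged: the predictor container gets both SHARED and PREDICTOR_ONLY.
            "env": [{"name": "PREDICTOR_ONLY", "value": "1"}],
            # limits is specific here, so the generic 4Gi is ignored for the predictor.
            "resources": {"limits": {"memory": "8Gi"}},
        }
    },
)
# The transformer defines no specific values, so it inherits the generic env
# and the generic 4Gi memory limit.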

See also

In the table above, we also mention objects that are part of the Kubernetes Python client library, as well as the KServe Python client library. For details on the structure of the Kubernetes and KServe objects, refer to the Kubernetes Python client and KServe Python client documentation.

Initialization

You can initialize a ServeConfig as you would any other Python object:

from kubernetes.client import V1Container, V1EnvVar, V1Toleration

config = ServeConfig(
    env=[V1EnvVar(name="ENV1", value="VALUE1")],
    labels={"significant-label": "a-value"},
    runtime_version="2.6.2",
    predictor={
        "tolerations": [
            V1Toleration(key="key1", operator="Equal",
                         value="value1", effect="NoSchedule")
        ],
        "container": V1Container(name="container_name", image="image_str"),
    },
)

However, you can also initialize fields that expect Kubernetes objects by passing dictionaries, which Kale then deserializes into the corresponding Kubernetes objects. For example:

complex_env = {"name": "MY_POD_IP", "valueFrom": {"fieldRef": {"fieldPath": "status.podIP"}}} predictor_dict = { "affinity": { "nodeAffinity": { "requiredDuringSchedulingIgnoredDuringExecution": { "nodeSelectorTerms": [{ "matchExpressions": [{"key": "disktype", "operator": "In", "values": ["ssd"]}]}]}}}, "tolerations": [{"key": "key1", "operator": "Exists", "value": "value1", "effect": "NoExecute"}], "node_selector": {"node": "node1"}, "containers": [{"env":[{"name": "name_str", "value": "value_str"}], "name": "container_name", "image": "image_str", "resources": {"limits": {"memory": "4Gi"}}}]} config = ServeConfig(env=[V1EnvVar(name="ENV", value="VALUE"), complex_env], limits={"cpu": "100m", "memory": "1Gi"}, node_selector={"node-id": "1234"}, predictor=predictor_dict)

To configure an InferenceService using a ServeConfig object, pass it to the serve() function located in the same package:

from kale.serve import serve

isvc = serve(model=model, serve_config=config)

To learn more about common uses of the ServeConfig object, follow the user guides for the supported ML frameworks. For example:

  • Use the ServeConfig object to retrieve a model stored in an external object storage service, like S3, by following the PyTorch and Triton user guides.
  • Use the ServeConfig object to serve custom predictors and transformers by following the user guides in the custom inference services section.
  • Use the ServeConfig object to configure common parameters for the predictor and transformer Pods by following the InferenceService configuration user guide.