ServeConfig

ServeConfig is a Kale object that you can use to configure an InferenceService. Within a ServeConfig object you can define the backend you want to use to serve your model, limit its resources, and set the service account for your predictor and transformer Pods.

Import

The object lives in the kale.serve module. Import it as follows:

from kale.serve import ServeConfig

Attributes

Name               Type                    Default   Description
env                List[V1EnvVar]          []        Extends the env field of the container
env_from           List[V1EnvFromSource]   []        Extends the envFrom field of the container
requests           Dict[str, str]          {}        Sets resources.requests for the container
limits             Dict[str, str]          {}        Sets resources.limits for the container
annotations        Dict[str, str]          {}        Sets annotations for the Pod
predictor          Dict[str, Any]          {}        Sets the predictor’s spec, and the predictor’s Pod affinity, tolerations, and node_selector fields
transformer        Dict[str, Any]          {}        Sets the transformer’s spec, and the transformer’s Pod affinity, tolerations, and node_selector fields
labels             Dict[str, str]          {}        Sets labels for the Pod
node_selector      Dict[str, str]          {}        Sets the node_selector for the Pod
affinity           V1Affinity              None      Sets the affinity of the Pod
tolerations        List[V1Toleration]      []        Sets tolerations for the Pod
protocol_version   str                     None      The protocol version of the predictor
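
For instance, here is a minimal sketch that sets a few of the attributes above. The resource quantities, annotation key, and protocol version are illustrative placeholders, not recommended values:

from kale.serve import ServeConfig

config = ServeConfig(
    requests={"cpu": "100m", "memory": "512Mi"},   # resources.requests for the container
    limits={"cpu": "1", "memory": "1Gi"},          # resources.limits for the container
    annotations={"example.com/owner": "ml-team"},  # Pod annotations; key and value are placeholders
    protocol_version="v2",                         # protocol version of the predictor
)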

Important

If you set any of the env, env_from, requests, limits, affinity, or tolerations fields of the ServeConfig object, they populate the corresponding predictor and transformer fields. This functionality allows you to define values for both the predictor and transformer Pods and containers at the same time. For example, if you want the limits field to be equal to {"memory": "4Gi"} for both the predictor and transformer containers, the ServeConfig object can be the following:

serve_config = {"limits": {"memory": "4Gi"}}

Otherwise, you can set specific values for each Pod and container. If you want the limits field to differ between the predictor and transformer containers, the ServeConfig object should be the following:

serve_config = {"predictor": { "container": { "resources": "limits": {"memory": "4Gi"}}}, "transformer": { "container": { "resources": "limits": {"memory": "2Gi"}}}}

The way each generic field gets populated is the following (see the sketch after this list):

  • If a generic value is defined and a specific one is not, then the specific value gets populated with the generic one.
  • For the env, env_from, and tolerations fields, if both the generic and specific fields are defined, then the two fields get merged.
  • For the affinity, requests, limits, and node_selector fields, if the specific field is defined, the generic one is ignored.
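
For example, the following sketch combines generic and specific values; the names and quantities are illustrative. The comments state the outcome each rule above prescribes:

from kubernetes.client import V1EnvVar
from kale.serve import ServeConfig

config = ServeConfig(
    # Generic fields: candidates for both the predictor and the transformer.
    env=[V1EnvVar(name="SHARED", value="1")],
    limits={"memory": "4Gi"},
    predictor={
        "container": {
            # env is merged: the predictor container gets both SHARED and PREDICTOR_ONLY.
            "env": [{"name": "PREDICTOR_ONLY", "value": "1"}],
            # limits is specific here, so the generic 4Gi is ignored for the predictor.
            "resources": {"limits": {"memory": "8Gi"}},
        }
    },
)
# The transformer defines no specific values, so it inherits the generic env
# and the generic 4Gi memory limit.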

See also

In the table above, we also mention objects that are part of the Kubernetes Python client library, as well as the KServe Python client library. For details on the structure of the Kubernetes and KServe objects, refer to the Kubernetes Python client and KServe Python client documentation.

Initialization

You can initialize a ServeConfig as you would any other Python object:

from kubernetes.client import V1Container, V1EnvVar, V1Toleration

config = ServeConfig(
    env=[V1EnvVar(name="ENV1", value="VALUE1")],
    labels={"significant-label": "a-value"},
    runtime_version="2.6.2",
    predictor={
        "tolerations": [
            V1Toleration(key="key1", operator="Equal",
                         value="value1", effect="NoSchedule")
        ],
        "container": V1Container(name="container_name", image="image_str"),
    },
)

However, you can also initialize fields that expect Kubernetes objects by passing dictionaries, which Kale then deserializes into the corresponding Kubernetes objects. For example:

complex_env = {"name": "MY_POD_IP", "valueFrom": {"fieldRef": {"fieldPath": "status.podIP"}}} predictor_dict = { "affinity": { "nodeAffinity": { "requiredDuringSchedulingIgnoredDuringExecution": { "nodeSelectorTerms": [{ "matchExpressions": [{"key": "disktype", "operator": "In", "values": ["ssd"]}]}]}}}, "tolerations": [{"key": "key1", "operator": "Exists", "value": "value1", "effect": "NoExecute"}], "node_selector": {"node": "node1"}, "containers": [{"env":[{"name": "name_str", "value": "value_str"}], "name": "container_name", "image": "image_str", "resources": {"limits": {"memory": "4Gi"}}}]} config = ServeConfig(env=[V1EnvVar(name="ENV", value="VALUE"), complex_env], limits={"cpu": "100m", "memory": "1Gi"}, node_selector={"node-id": "1234"}, predictor=predictor_dict)

To configure an InferenceService using a ServeConfig object, pass it to the serve() function located in the same package:

from kale.serve import serve

isvc = serve(model=model, serve_config=config)

To learn more about common uses of the ServeConfig object, follow the user guides for the supported ML frameworks. For example:

  • Use the ServeConfig object to retrieve a model stored in an external object storage service, like S3, by following the PyTorch and Triton user guides.
  • Use the ServeConfig object to serve custom predictors and transformers by following the user guides in the custom inference services section.
  • Use the ServeConfig object to configure common parameters for the predictor and transformer Pods by following the InferenceService configuration user guide.