DistributedConfig is a Kale object that holds the configuration of distributed training experiments.


The object lives in the kale.distributed module. Import it as follows:

from kale.distributed import DistributedConfig


Name           Type                   Default  Description
env            List[V1EnvVar]         []       Extends the env field of the container
env_from       List[V1EnvFromSource]  []       Extends the envFrom field of the container
requests       Dict                   {}       Sets resources.requests for the container
limits         Dict                   {}       Sets resources.limits for the container
annotations    Dict                   {}       Sets annotations for the Pod
labels         Dict                   {}       Sets labels for the Pod
node_selector  Dict                   {}       Sets the nodeSelector for the Pod
affinity       V1Affinity             None     Sets the affinity of the Pod
tolerations    List[V1Toleration]     []       Sets tolerations for the Pod
run_policy     Dict | V1RunPolicy     None     Encapsulates various runtime policies of the distributed training job


The table above also references objects from the Kubernetes Python client library and the Kubeflow Training Operator Python client library. For details on the structure of the Kubernetes objects, refer to the official Python client library for Kubernetes. For details on the structure of the Kubeflow Training Operator objects, refer to the official Python client library for the Kubeflow Training Operator.


The options that you set in the configuration object, whether container-level (such as env, limits, and requests) or Pod-level (such as labels and annotations), are propagated to every replica that is part of the distributed training process. Thus, the master and every worker Pod will have the same options.
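The propagation described above can be pictured with a minimal, standard-library-only sketch. It is not Kale's implementation: the helpers build_replica_template and build_replica_specs, the container name "main", and the plain-dictionary layout (loosely modeled on Master and Worker replica specs) are all assumptions made for illustration.

```python
import copy


def build_replica_template(options):
    """Build a Pod template dict from shared options (hypothetical helper)."""
    container = {
        "name": "main",  # hypothetical container name
        "env": copy.deepcopy(options.get("env", [])),
        "resources": {
            "requests": dict(options.get("requests", {})),
            "limits": dict(options.get("limits", {})),
        },
    }
    return {
        # Pod-level options go on the Pod metadata and spec
        "metadata": {
            "labels": dict(options.get("labels", {})),
            "annotations": dict(options.get("annotations", {})),
        },
        "spec": {
            "containers": [container],
            "nodeSelector": dict(options.get("node_selector", {})),
        },
    }


def build_replica_specs(options, num_workers):
    """Give every replica type its own copy of the same template."""
    return {
        "Master": {"replicas": 1, "template": build_replica_template(options)},
        "Worker": {"replicas": num_workers, "template": build_replica_template(options)},
    }


options = {
    "env": [{"name": "ENV1", "value": "VALUE1"}],
    "limits": {"cpu": "100m", "memory": "1Gi"},
    "labels": {"significant-label": "a-value"},
}
specs = build_replica_specs(options, num_workers=2)
master = specs["Master"]["template"]
worker = specs["Worker"]["template"]
print(master == worker)  # True: both replica types carry identical options
```

Because each replica type receives a copy of the same options, there is no way in this sketch to give the master different limits or labels than the workers, which matches the behavior described above.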


You may initialize a DistributedConfig object similarly to any other Python object:

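from kubernetes.client import V1EnvVar
# V1RunPolicy ships with the Kubeflow Training Operator client;
# the import path below may differ depending on the installed version
from kubeflow.training import V1RunPolicy

config = DistributedConfig(
    env=[V1EnvVar(name="ENV1", value="VALUE1")],
    labels={"significant-label": "a-value"},
    run_policy=V1RunPolicy(clean_pod_policy="All"),
)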

However, you can also initialize a field that expects Kubernetes objects by passing a dictionary, which Kale will then deserialize into the corresponding Kubernetes object. For example:

from kubernetes.client import V1EnvVar

# Plain dictionary that follows the manifest structure of an env entry
complex_env = {
    "name": "MY_POD_IP",
    "valueFrom": {"fieldRef": {"fieldPath": "status.podIP"}},
}
config = DistributedConfig(
    env=[V1EnvVar(name="ENV2", value="VALUE2"), complex_env],
    limits={"cpu": "100m", "memory": "1Gi"},
    node_selector={"node-id": "1234"},
)
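To give a rough idea of what deserializing such a dictionary involves, here is a small sketch that uses only the standard library. It does not show how Kale actually performs the conversion: the dataclasses EnvVar, EnvVarSource, and FieldSelector are simplified, hypothetical stand-ins for the Kubernetes client models, and deserialize and to_snake_case are illustrative helpers.

```python
import re
from dataclasses import dataclass, is_dataclass
from typing import Optional, get_args, get_type_hints


def to_snake_case(key):
    """Convert a camelCase manifest key (valueFrom) to snake_case (value_from)."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", key).lower()


# Hypothetical, simplified stand-ins for the Kubernetes client models
@dataclass
class FieldSelector:
    field_path: str


@dataclass
class EnvVarSource:
    field_ref: Optional[FieldSelector] = None


@dataclass
class EnvVar:
    name: str
    value: Optional[str] = None
    value_from: Optional[EnvVarSource] = None


def deserialize(cls, data):
    """Turn a manifest-style dict into an instance of cls; pass objects through."""
    if isinstance(data, cls):
        return data
    hints = get_type_hints(cls)
    kwargs = {}
    for key, value in data.items():
        attr = to_snake_case(key)
        if attr not in hints:
            raise ValueError(f"unknown field {key!r} for {cls.__name__}")
        # Find a nested dataclass type inside Optional[...], if there is one
        nested = next((t for t in get_args(hints[attr]) if is_dataclass(t)), None)
        if nested is not None and isinstance(value, dict):
            value = deserialize(nested, value)
        kwargs[attr] = value
    return cls(**kwargs)


complex_env = {"name": "MY_POD_IP", "valueFrom": {"fieldRef": {"fieldPath": "status.podIP"}}}
mixed = [EnvVar(name="ENV2", value="VALUE2"), complex_env]
env = [deserialize(EnvVar, item) for item in mixed]
print(env[1].value_from.field_ref.field_path)  # status.podIP
```

The sketch accepts a list that mixes ready-made objects and dictionaries, as in the example above: objects pass through unchanged, while dictionaries are converted field by field, with camelCase keys mapped to snake_case attributes.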