DistributedConfig

DistributedConfig is a Kale object that holds the configuration of a distributed training experiment.

Overview

Import
Attributes
Initialization

Import

The object lives in the kale.distributed module. Import it as follows:

from kale.distributed import DistributedConfig

The attributes table on this page also references objects from the Kubernetes Python client library and from the Kubeflow Training Operator Python client library. For details on the structure of the Kubernetes objects, refer to the official Python client library for Kubernetes. For details on the structure of the Kubeflow Training Operator objects, refer to the official Python client library for the Kubeflow Training Operator.

Important

The container-level options that you set in the configuration object, such as env, labels, annotations, limits, requests, etc., are propagated to every container that is part of the distributed training process. Thus, the master and every worker pod will have the same container-level options.
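To make the propagation rule concrete, here is a minimal sketch that uses plain dictionaries instead of real Kubernetes or Kale objects. The `shared_options` dictionary and the `build_replica_specs` helper are hypothetical names introduced only for illustration; they are not part of the Kale API and do not reflect how Kale implements this internally.

```python
from copy import deepcopy

# Hypothetical stand-in for the container-level options of a DistributedConfig.
shared_options = {
    "env": [{"name": "ENV1", "value": "VALUE1"}],
    "resources": {"limits": {"cpu": "100m", "memory": "1Gi"}},
}


def build_replica_specs(options, num_workers):
    """Return one options dictionary per replica, each a copy of `options`."""
    replicas = {"master-0": deepcopy(options)}
    for i in range(num_workers):
        replicas[f"worker-{i}"] = deepcopy(options)
    return replicas


specs = build_replica_specs(shared_options, num_workers=2)
# The master and every worker end up with identical container-level options.
all_equal = all(spec == shared_options for spec in specs.values())
```

In this sketch there is no way to give one replica different options from another, which matches the behavior described in the note above.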

Initialization

You may initialize a DistributedConfig object similarly to any other Python object:

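from kubernetes.client import V1EnvVar
# Assumed import path; it may differ across Kubeflow Training SDK versions.
from kubeflow.training import V1RunPolicy

config = DistributedConfig(env=[V1EnvVar(name="ENV1", value="VALUE1")],
                           labels={"significant-label": "a-value"},
                           run_policy=V1RunPolicy(clean_pod_policy="All"))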

However, you can also initialize a field that expects Kubernetes objects by passing a dictionary, which Kale will then deserialize into the corresponding Kubernetes object. For example:

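# A plain dictionary in place of a V1EnvVar; Kale deserializes it.
complex_env = {"name": "MY_POD_IP",
               "valueFrom": {"fieldRef": {"fieldPath": "status.podIP"}}}
# Kubernetes objects and dictionaries can be mixed in the same list.
config = DistributedConfig(env=[V1EnvVar(name="ENV2", value="VALUE2"),
                                complex_env],
                           limits={"cpu": "100m", "memory": "1Gi"},
                           node_selector={"node-id": "1234"})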
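The dictionary above uses the camelCase keys of the Kubernetes API (`valueFrom`, `fieldRef`), whereas the Kubernetes Python client models expose the same fields as snake_case attributes (`value_from`, `field_ref`). The following is a rough, standard-library-only sketch of that key mapping. The helpers `camel_to_snake` and `snake_keys` are hypothetical and are not Kale's actual deserialization code.

```python
import re


def camel_to_snake(name):
    """Convert a camelCase key such as 'valueFrom' to 'value_from'."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def snake_keys(value):
    """Recursively rename dictionary keys; leaf values are left untouched."""
    if isinstance(value, dict):
        return {camel_to_snake(k): snake_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [snake_keys(item) for item in value]
    return value


complex_env = {"name": "MY_POD_IP",
               "valueFrom": {"fieldRef": {"fieldPath": "status.podIP"}}}
converted = snake_keys(complex_env)
```

Note that only keys are renamed, so the value `status.podIP` keeps its original spelling. A real deserializer also has to know each field's type, because blanket renaming like this would corrupt map-type fields such as labels, whose keys are user data.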

Attributes

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `env` | `List[V1EnvVar]` | `[]` | Extends the `env` field of the container |
| `env_from` | `List[V1EnvFromSource]` | `[]` | Extends the `envFrom` field of the container |
| `requests` | `Dict` | `{}` | Sets `resources.requests` for the container |
| `limits` | `Dict` | `{}` | Sets `resources.limits` for the container |
| `annotations` | `Dict` | `{}` | Sets `annotations` for the Pod |
| `labels` | `Dict` | `{}` | Sets `labels` for the Pod |
| `node_selector` | `Dict` | `{}` | Sets the `node_selector` for the Pod |
| `affinity` | `V1Affinity` | `None` | Sets the `affinity` of the Pod |
| `tolerations` | `List[V1Toleration]` | `[]` | Sets `tolerations` for the Pod |
| `run_policy` | `Dict \| V1RunPolicy` | `None` | Encapsulates various runtime policies of the distributed training job |
