Distributed Training

The Kale Distributed module offers a simple way to distribute the training of a Machine Learning (ML) model across multiple CPU cores or GPU devices. This guide walks through basic examples of creating distributed jobs on a cluster using Kale and the Kubeflow Training Operator.
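
For context on what Kale and the Training Operator automate, the sketch below shows the boilerplate that a hand-written distributed PyTorch job would otherwise need: process-group setup, model wrapping, and gradient synchronization. This is a generic PyTorch DistributedDataParallel (DDP) example, not Kale's API; it assumes a launcher such as `torchrun` (or a PyTorchJob's pods) sets the usual `RANK` and `WORLD_SIZE` environment variables.

```python
# Generic DDP sketch (NOT Kale's API): illustrates the setup that Kale and
# the Training Operator handle on your behalf.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # Join the process group; on GPU nodes the backend would be "nccl".
    dist.init_process_group(backend="gloo")
    rank = dist.get_rank()

    model = torch.nn.Linear(10, 1)
    ddp_model = DDP(model)  # gradients are all-reduced across ranks
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    # Dummy per-rank batch; a real job shards data with DistributedSampler.
    inputs = torch.randn(32, 10)
    targets = torch.randn(32, 1)

    for _ in range(5):
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), targets)
        loss.backward()  # triggers cross-rank gradient synchronization
        optimizer.step()

    if rank == 0:
        print(f"final loss: {loss.item():.4f}")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Run locally with, for example, `torchrun --nproc_per_node=2 train.py`. On a cluster, the Training Operator launches one such process per replica pod, and Kale generates and submits the job for you.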