Katib Concepts

This section describes the key concepts and terminology for automated hyperparameter tuning with Katib.

Experiment

An experiment is a single tuning run, also called an optimization run.

You specify configuration settings to define the experiment. The following are the main configurations:

  • Objective: What you want to optimize. This is the objective metric, also called the target variable. A common metric is the model’s accuracy in the validation pass of the training job (validation-accuracy). You also specify whether you want the hyperparameter tuning job to maximize or minimize the metric.
  • Search space: The set of all possible hyperparameter values that the hyperparameter tuning job should consider for optimization, and the constraints for each hyperparameter. Other names for search space include feasible set and solution space. For example, you provide the names of the hyperparameters that you want to optimize and, for each hyperparameter, either a minimum and maximum value or a list of allowable values.
  • Search algorithm: The algorithm to use when searching for the optimal hyperparameter values.
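As a rough illustration of how these three settings fit together, the following Python sketch models an experiment configuration as plain data classes. All class and field names here (ExperimentConfig, Objective, DoubleParameter, CategoricalParameter, max_trials) are hypothetical and chosen for this example only; they are not Katib's API, and the parameter names and ranges are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Objective:
    """What to optimize and in which direction (hypothetical model)."""
    metric_name: str               # e.g. "validation-accuracy"
    goal_type: str                 # "maximize" or "minimize"
    goal: Optional[float] = None   # optional target value for the metric


@dataclass
class DoubleParameter:
    """A hyperparameter constrained by a minimum and maximum value."""
    name: str
    min_value: float
    max_value: float


@dataclass
class CategoricalParameter:
    """A hyperparameter constrained to a list of allowable values."""
    name: str
    values: List[str]


@dataclass
class ExperimentConfig:
    """Objective, search space, and search algorithm in one place."""
    objective: Objective
    search_space: List[Union[DoubleParameter, CategoricalParameter]]
    algorithm: str                 # illustrative label, e.g. "random"
    max_trials: int = 12           # assumed upper bound on trials


# Example values are invented for illustration.
config = ExperimentConfig(
    objective=Objective("validation-accuracy", "maximize", goal=0.99),
    search_space=[
        DoubleParameter("lr", 0.01, 0.05),
        CategoricalParameter("optimizer", ["sgd", "adam"]),
    ],
    algorithm="random",
)
```

The sketch keeps the two kinds of constraint from the search space description separate: a range for numeric hyperparameters and a list for categorical ones.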

Trial

A trial is one iteration of the hyperparameter tuning process. A trial corresponds to one job instance with a list of parameter assignments. The list of parameter assignments corresponds to a suggestion, that is, a set of hyperparameter values that the search algorithm has proposed.

Each experiment runs several trials. The experiment runs trials until it reaches either the objective goal or the configured maximum number of trials.
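The stopping rule can be sketched as a simple loop. The code below is a minimal illustration, not Katib's implementation: it assumes a maximize-type objective, uses uniform random sampling as a stand-in for the search algorithm, and runs trials one at a time. The names run_experiment and toy_objective, and the "lr" range, are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Assignment = Dict[str, float]


def run_experiment(
    evaluate: Callable[[Assignment], float],
    search_space: Dict[str, Tuple[float, float]],
    goal: float,
    max_trials: int,
    seed: int = 0,
) -> List[Tuple[Assignment, float]]:
    """Run trials until the goal is reached or max_trials is exhausted."""
    rng = random.Random(seed)
    trials: List[Tuple[Assignment, float]] = []
    for _ in range(max_trials):
        # One suggestion: a value for every hyperparameter in the search space.
        assignment = {
            name: rng.uniform(low, high)
            for name, (low, high) in search_space.items()
        }
        # One trial: evaluate the suggestion and record its objective value.
        value = evaluate(assignment)
        trials.append((assignment, value))
        if value >= goal:  # objective goal reached (maximize case)
            break
    return trials


def toy_objective(params: Assignment) -> float:
    """Stand-in for a training job; peaks at lr = 0.03."""
    return 1.0 - abs(params["lr"] - 0.03)


trials = run_experiment(
    toy_objective, {"lr": (0.01, 0.05)}, goal=0.995, max_trials=12
)
best_assignment, best_value = max(trials, key=lambda t: t[1])
```

The returned list holds every trial that ran, so the caller can inspect the best assignment whether the loop stopped because of the goal or because of the trial limit.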

Job

A job is the process that runs to evaluate a trial and calculate its objective value. A job can be any type of Kubernetes resource, including a custom resource defined by a Kubernetes CRD (custom resource definition).
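To make the relationship between a trial and its job concrete, the sketch below runs a stand-in training script as a separate local process and reads the objective value from its output. This is only an analogy under stated assumptions: a real job runs as a Kubernetes resource rather than a local subprocess, and the script, the run_job helper, and the "name=value" output line are all hypothetical choices made for this example.

```python
import subprocess
import sys

# Stand-in training script: takes a learning rate as its first argument
# and prints the objective metric as a "name=value" line.
TRAINING_SCRIPT = """
import sys
lr = float(sys.argv[1])
accuracy = 1.0 - abs(lr - 0.03)
print(f"validation-accuracy={accuracy:.4f}")
"""


def run_job(lr: float, metric_name: str = "validation-accuracy") -> float:
    """Run one job for a trial and return the objective value it reports."""
    result = subprocess.run(
        [sys.executable, "-c", TRAINING_SCRIPT, str(lr)],
        capture_output=True,
        text=True,
        check=True,  # raise if the job process exits with an error
    )
    # Scan the job's output for the line that carries the objective metric.
    for line in result.stdout.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == metric_name:
            return float(value)
    raise ValueError(f"metric {metric_name!r} not found in job output")


objective_value = run_job(0.02)
```

The point of the sketch is the division of labor: the trial supplies the parameter assignment, the job does the work, and the objective value is whatever the job reports back.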

Katib supports the following job types:

  • Kubernetes Job
  • Kubeflow TFJob
  • Kubeflow PyTorchJob
  • Kubeflow MPIJob
  • Kubeflow XGBoostJob
  • Tekton Pipelines
  • Argo Workflows

Because it supports these job types, Katib works with multiple ML frameworks.