Kale Transformer Artifact

kale.Transformer is an MLMD ArtifactType which allows the logging of transformer components in MLMD.

Import

The object lives in the kale.common.artifacts module. Import it as follows:

from kale.common.artifacts import Transformer

Attributes

Name                Type                 Default  Description
name                str                  None     The name of the transformer
transformer_dir     str                  None     The path in which Kale stores the transformer package (assets and functions)
module_name         str                  None     The name of the module in which you define the transformer class
class_name          str                  None     The name of the transformer class
is_stateful         bool                 False    True if the transformer is written for a specific dataset, that is, if it depends on stateful global variables such as a word vectorizer; False otherwise
preprocess          function             None     The preprocess function the transformer will use
postprocess         function             None     The postprocess function the transformer will use
transformer_assets  Dict[str, variable]  None     Any global variables that the transformer may depend on

Initialization

There are two APIs that you can use to create a kale.Transformer artifact:

  • The Subclassing API
  • The Functional API

Important

The two APIs are mutually exclusive. You can use one or the other, but not both.

Choose one of the following options, based on the API you want to use.

To use the Subclassing API, create a Python module (a .py file) that defines the transformer class. The class extends the kserve.Model class that KServe provides and overrides the preprocess and postprocess methods. For example:

    import joblib
    import kserve
    from kale.serve import utils
    from typing import Dict

    class_names = [...]


    class Transformer(kserve.Model):
        """Transform the data."""

        def __init__(self, name: str, predictor_host: str, protocol: str = "v1"):
            super().__init__(name)
            self.predictor_host = predictor_host
            self.protocol = protocol

        def preprocess(self, inputs: Dict):
            """Preprocess the dataset."""
            transformed_data = ...
            return {'instances': transformed_data.tolist()}

        def postprocess(self, inputs: Dict):
            """Postprocess the predictions"""
            # Map each predicted class index to its label.
            return {"predictions": [class_names[i] for i in inputs["predictions"]]}
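To make the role of the two overridden methods concrete, the following is a minimal sketch of the request flow, assuming the usual transformer pattern in which preprocess runs before the predictor is called and postprocess runs on its response. It deliberately does not use kserve: SketchTransformer, fake_predictor, handle_request, and the label set are hypothetical stand-ins introduced only for illustration.

```python
from typing import Dict, List

# Hypothetical label set; the original example elides the real one as [...].
CLASS_NAMES: List[str] = ["negative", "positive"]


class SketchTransformer:
    """Stand-in exposing the same two hooks; it does NOT extend kserve.Model."""

    def preprocess(self, inputs: Dict) -> Dict:
        # Illustrative transformation: represent each text by its length.
        return {"instances": [[len(text)] for text in inputs["instances"]]}

    def postprocess(self, inputs: Dict) -> Dict:
        # Map predicted class indices back to readable labels.
        return {"predictions": [CLASS_NAMES[i] for i in inputs["predictions"]]}


def fake_predictor(payload: Dict) -> Dict:
    """Hypothetical predictor: class 1 when the text has more than 5 characters."""
    return {"predictions": [int(row[0] > 5) for row in payload["instances"]]}


def handle_request(transformer: SketchTransformer, request: Dict) -> Dict:
    # Assumed order of operations: preprocess, then predictor, then postprocess.
    payload = transformer.preprocess(request)
    raw_predictions = fake_predictor(payload)
    return transformer.postprocess(raw_predictions)


result = handle_request(SketchTransformer(), {"instances": ["hi", "hello world"]})
print(result)
```

In a real deployment the predictor call is made by the serving runtime rather than by your own code; the sketch only shows what each hook receives and returns.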

Define a kale.Transformer artifact and pass the name of the folder containing the transformer module you defined previously to the transformer_dir attribute. Also, pass the name of the module, for example transformer, as well as the name of the class, for example Transformer.

Important

Always use absolute paths when specifying the path to the transformer folder.


    from kale.common import artifacts

    transformer_artifact = artifacts.Transformer(
        name="transformer",
        transformer_dir=<path-to-the-transformer-module>,  # absolute path
        module_name="transformer",
        class_name="Transformer",
        is_stateful=True)
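Because transformer_dir must be an absolute path, one way to build it from a relative folder name is sketched below using only the standard library. The folder name transformer_pkg is a hypothetical placeholder, not a name that Kale requires.

```python
import os

# Hypothetical folder name, relative to the current working directory.
relative_dir = "transformer_pkg"

# os.path.abspath joins the name with the current working directory and
# normalizes the result; the folder does not need to exist for this to work.
transformer_dir = os.path.abspath(relative_dir)

print(transformer_dir)
```

The resulting string can then be passed as the transformer_dir argument. Note that the result depends on the working directory at the time the code runs.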

To use the Functional API, provide the preprocess and postprocess functions that the transformer will use. For example:

Important

The preprocess and postprocess functions must be standalone functions. Thus, import all the Python modules the function depends on within its body.

    def _preprocess(inputs):
        """Preprocess the dataset."""
        transformed_data = ...
        return {'instances': transformed_data.toarray().tolist()}


    def _postprocess(inputs):
        """Postprocess the predictions"""
        class_names = [...]
        # Map each predicted class index to its label.
        return {"predictions": [class_names[i] for i in inputs["predictions"]]}


    transformer_artifact = artifacts.Transformer(
        name="Vectorizer",
        preprocess=_preprocess,
        postprocess=_postprocess,
        # Global variables the functions depend on.
        transformer_assets={"vectorizer": vectorizer},
        is_stateful=True)