Example MLFlow Model#

This page explains how to build an MLFlow Model on your local machine, using a random classifier as an example.

Introduction#

To use the MaaS component in Squirro, you must first provide an MLFlow Model.

For examples of MLFlow Models, see the official MLFlow Documentation.

Creating an MLFlow Project#

First, you must create an MLFlow Project, a folder that consists of the following files:

└── random_classifier
    ├── MLProject
    ├── conda.yaml
    └── train.py

The MLProject file contains metadata that allows MLFlow to work with the project, as shown in the example below:

name: random_classifier

conda_env: conda.yaml

entry_points:
  main:
    command: "python train.py"

Next, the conda.yaml file defines the specification of the Python environment:

name: random_classifier
channels:
  - conda-forge
dependencies:
  - python=3.9.7
  - pip
  - pip:
    - mlflow>=1.0

Finally, train.py contains the code needed to train and execute the model:

import random

import mlflow.pyfunc
import pandas as pd  # noqa
from mlflow.models.signature import infer_signature
from mlflow.pyfunc import PythonModelContext


# Define the model class
class RandomClassifier(mlflow.pyfunc.PythonModel):
    def __init__(self, classes):
        self.classes = classes

    def predict(self, context, model_input):
        # Assign a random class to each row of the input DataFrame
        model_input["class"] = model_input["input"].apply(
            lambda value: random.choice(self.classes)
        )
        return model_input


def main():
    # Example input (x) and expected output (y) data, in pandas "split" orientation
    x = pd.read_json(
        '{"columns":["id","input"],"data":[["a","input1"],["b","input2"],["c","input3"]]}', "split"
    )
    y = pd.read_json(
        '{"columns":["id","input","class"],"data":[["a","input1","class1"],["b","input2","class2"],["c","input3","class3"]]}',
        "split",
    )

    with mlflow.start_run():
        model_path = "model"
        # Infer the input/output schema (including the "id" column) from the example data
        model_signature = infer_signature(x, y)
        random_model = RandomClassifier(classes=["class1", "class2", "class3"])
        # Log the model so MLFlow stores it under the run's artifacts folder
        mlflow.pyfunc.log_model(
            artifact_path=model_path,
            python_model=random_model,
            signature=model_signature,
        )

    # Quick local test: predict() adds a "class" column to the DataFrame in place
    data = pd.read_json(
        '{"columns":["input"],"data":["test","test","text text"]}', "split"
    )
    print(data.to_string())  # noqa
    random_model.predict(PythonModelContext([]), data)
    print(data.to_string())  # noqa


if __name__ == "__main__":
    main()

Note

  • The class needs to inherit from mlflow.pyfunc.PythonModel or another MLFlow built-in model flavor.

  • Furthermore, for the model to work in Squirro with the provided libNLP step mlflow_maas, it needs to return the id that is sent at request time. This can be ensured with infer_signature, as shown in the train.py example above and in the sketch below.
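The following minimal sketch (using data equivalent to the x and y DataFrames defined in train.py above) illustrates how infer_signature captures the id column in both the input and the output schema:

import pandas as pd
from mlflow.models.signature import infer_signature

# Example data: the input carries an "id" column and the output returns it
x = pd.DataFrame({"id": ["a"], "input": ["input1"]})
y = pd.DataFrame({"id": ["a"], "input": ["input1"], "class": ["class1"]})

signature = infer_signature(x, y)
print(signature.inputs)   # schema containing the columns "id" and "input"
print(signature.outputs)  # schema containing the columns "id", "input" and "class"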

Building the MLFlow Model#

After creating such an MLFlow Project, you can run it with the following command:

mlflow run random_classifier # the name refers to the MLFlow Project folder

If the MLFlow Project contained a training routine, it would be started by executing the command above. In the case of a static classifier, or when wrapping a pre-trained ML model, the command simply executes the provided code, tests it, and creates the MLFlow Model that is needed later.

During the execution of the command above, the following statement is printed:

...
Model saved in run <MODEL_ID>
...

The <MODEL_ID> is a hash that indicates where the MLFlow Model is stored. You can find it in the MLFlow base folder (mlruns/0/). This location is needed to proceed with the instructions for uploading the MLFlow Model to the Squirro server.
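Optionally, before uploading, you can verify the logged MLFlow Model locally by loading it back with mlflow.pyfunc.load_model. The sketch below assumes the default local mlruns layout; replace <MODEL_ID> with the hash printed above:

import mlflow.pyfunc
import pandas as pd

# Path of the logged model inside the local MLFlow base folder
model = mlflow.pyfunc.load_model("mlruns/0/<MODEL_ID>/artifacts/model")

# A small DataFrame matching the model signature (id and input columns)
data = pd.DataFrame({"id": ["a", "b"], "input": ["some text", "other text"]})
print(model.predict(data))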