Example MLFlow Model#
This page explains how to build an MLFlow Model on your local machine, using a random classifier as an example.
Introduction#
To use the MaaS component in Squirro, you must first provide an MLFlow Model.
For examples of MLFlow Models, see the official MLFlow Documentation.
Creating a MLFlow Project#
First, create an MLFlow Project, which is a folder containing the following files:
└── random_classifier
    ├── MLProject
    ├── conda.yaml
    └── train.py
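The layout above can also be created from a script. The following is a minimal sketch that uses only the Python standard library; the helper name `scaffold_project` is hypothetical and not part of MLFlow, and the files are created empty so that their contents can be filled in afterwards.

```python
from pathlib import Path

# File names taken from the project layout above.
PROJECT_FILES = ("MLProject", "conda.yaml", "train.py")


def scaffold_project(base_dir, name="random_classifier"):
    """Create the project folder with empty placeholder files.

    `scaffold_project` is a hypothetical helper, not part of MLFlow.
    Files that already exist are left untouched.
    """
    project_dir = Path(base_dir) / name
    project_dir.mkdir(parents=True, exist_ok=True)
    for file_name in PROJECT_FILES:
        (project_dir / file_name).touch(exist_ok=True)
    return project_dir
```

Calling `scaffold_project(".")` would create the `random_classifier` folder in the current directory.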
The MLProject file contains metadata that allows MLFlow to work with the project, as shown in the example below:
name: random_classifier
conda_env: conda.yaml
entry_points:
  main:
    command: "python train.py"
Next, the conda.yaml file specifies the Python environment that the project needs:
name: random_classifier
channels:
  - conda-forge
dependencies:
  - python=3.9.7
  - pip
  - pip:
    - mlflow>=1.0
Finally, the train.py file contains the code needed to train and execute the model:
import random

import mlflow.pyfunc
import pandas as pd  # noqa
from mlflow.models.signature import infer_signature
from mlflow.pyfunc import PythonModelContext


# Define the model class
class RandomClassifier(mlflow.pyfunc.PythonModel):
    def __init__(self, classes):
        self.classes = classes

    def predict(self, context, model_input):
        model_input["class"] = model_input["input"].apply(
            lambda column: random.choice(self.classes)
        )
        return model_input


def main():
    x = pd.read_json(
        '{"columns":["id","input"],"data":[["a","input1"],["b","input2"],["c","input3"]]}', orient="split"
    )
    y = pd.read_json(
        '{"columns":["id","input","class"],"data":[["a","input1","class1"],["b","input2","class2"],["c","input3","class3"]]}',
        orient="split",
    )

    with mlflow.start_run():
        model_path = "model"
        model_signature = infer_signature(x, y)
        random_model = RandomClassifier(classes=["class1", "class2", "class3"])
        mlflow.pyfunc.log_model(
            artifact_path=model_path,
            python_model=random_model,
            signature=model_signature,
        )

        data = pd.read_json(
            '{"columns":["input"],"data":["test","test","text text"]}', orient="split"
        )
        print(data.to_string())  # noqa
        random_model.predict(PythonModelContext([]), data)
        print(data.to_string())  # noqa


if __name__ == "__main__":
    main()
Note
It is important that the class inherits from mlflow.pyfunc.PythonModel or uses another MLFlow built-in model flavor (external link). In addition, for the model to work in Squirro with the provided libNLP step mlflow_maas, it must return the id that is sent at request time. You can ensure this by including the id column in the signature created with infer_signature, as shown in the infer_signature(x, y) call in train.py.
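To make the id requirement concrete, the following sketch mimics what the predict method does, but on plain Python dictionaries instead of a pandas DataFrame. It is a simplified stand-in that needs neither MLFlow nor pandas; the function name `predict_rows` is hypothetical, and the actual request format used by mlflow_maas may differ.

```python
import random


def predict_rows(rows, classes, rng=random):
    """Return a copy of each row with a randomly chosen "class" added.

    Hypothetical stand-in for RandomClassifier.predict: every field of the
    input row, including "id", is carried over to the output unchanged.
    """
    return [{**row, "class": rng.choice(classes)} for row in rows]


rows = [{"id": "a", "input": "input1"}, {"id": "b", "input": "input2"}]
predictions = predict_rows(rows, classes=["class1", "class2", "class3"])

# Each prediction can be matched to its request row through the id.
assert [p["id"] for p in predictions] == ["a", "b"]
```

Because the output rows keep the id, a caller can match every predicted class back to the item that was sent.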
Building of MLFlow Model#
After creating the MLFlow Project, run it with the following command:
mlflow run random_classifier # the name refers to the MLFlow Project folder
If the MLFlow Project contained a training routine, the command above would start it. In this case, with a static classifier (or when wrapping a pre-trained ML model), the command simply executes the provided code, tests it, and creates the MLFlow Model that is needed later.
While the command runs, it prints the following statement:
...
Model saved in run <MODEL_ID>
...
The <MODEL_ID> is a hash that indicates where the MLFlow Model is stored. You can find it in the MLFlow base folder (mlruns/0/). You need this location to proceed with the instructions for uploading the MLFlow Model to the Squirro server.
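If you missed the printed statement, you can also search the base folder for the stored model. The sketch below is a hedged example using only the standard library. It assumes the local layout mlruns/0/<MODEL_ID>/artifacts/model/MLmodel, where "model" is the artifact path used in train.py; other MLFlow versions or tracking backends may store models elsewhere. The helper name `find_model_dirs` is hypothetical.

```python
from pathlib import Path


def find_model_dirs(mlruns_dir="mlruns", experiment_id="0", artifact_path="model"):
    """List logged model folders, most recently modified first.

    Assumes the layout
    <mlruns_dir>/<experiment_id>/<MODEL_ID>/artifacts/<artifact_path>/MLmodel,
    which is an assumption about the local file store, not a guarantee.
    """
    experiment_dir = Path(mlruns_dir) / experiment_id
    candidates = [
        marker.parent
        for marker in experiment_dir.glob(f"*/artifacts/{artifact_path}/MLmodel")
    ]
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)
```

When run from the folder that contains mlruns, the first entry of `find_model_dirs()` would be the most recently modified model folder, if one exists.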