Model-as-a-Service#

This page provides an overview of creating, customizing, and prototyping machine learning (ML) models as part of Squirro’s Model-as-a-Service (MaaS).

Overview#

Model-as-a-Service (MaaS) allows Squirro customers to import custom ML models and speed up the prototyping phase for ML projects in Squirro.

Installation#

To use MaaS, you must install the required packages from the Squirro mirror on your target Squirro server as follows:

yum install squirro-miniforge
yum install squirro-python38-mlflow

Prerequisites#

Before uploading a model, you must first create one.

For more information on MLFlow Models, see the official MLFlow Model site or an Example MLFlow Model.

Note: As an alternative to MaaS, you can create no-code models using the Squirro AI Studio. For a complete tutorial, see AI Studio.

Creating an MLFlow Model#

You can create an MLFlow Model in one of two ways:

  • train an MLFlow model on your local machine or on your exploration server

  • wrap an existing (pre-trained) model into the structure of an MLFlow Model and run it locally

Either way, once the run command has executed, MLFlow stores the (trained) model in the MLFlow base folder (mlruns/0/) under a unique hash (<HASH>).

The simplest structure of the MLFlow Model is as follows:

├── artifacts
│   └── model
│       ├── conda.yaml
│       ├── MLmodel
│       ├── python_model.pkl
│       └── requirements.txt
└── meta.yaml

MLFlow documentation#

See the (external) links below for information on creating MLFlow models.

Data Structure#

To use the MLFlow Model later in a Squirro ML Workflow, you need to stick to a specific data structure:

  • the input must be a pandas DataFrame with an id column and named feature fields as further columns.

  • the output must be a pandas DataFrame with an id column and result fields as further columns.

For more information about pandas, see the official pandas website.

Example:

  • input DataFrame

        id                         text
    0  id0  this is a example sentence.
    1  id1                 hello world.
    2  id2             random sentence.
    3  id3               test sentence.
    
  • output DataFrame

        id   class
    0  id0  class1
    1  id1  class0
    2  id2  class0
    3  id3  class1
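The structure above can be checked before uploading a model. The following is a minimal sketch that rebuilds the two example frames and validates them with a hypothetical helper, check_contract. The requirement that output ids match input ids row by row is an assumption made for this example, not something stated above.

```python
import pandas as pd


def check_contract(model_input: pd.DataFrame, model_output: pd.DataFrame) -> None:
    """Raise ValueError if a frame lacks an id column or any further field."""
    for name, frame in (("input", model_input), ("output", model_output)):
        if "id" not in frame.columns:
            raise ValueError(f"{name} frame has no 'id' column")
        if len(frame.columns) < 2:
            raise ValueError(f"{name} frame has no fields besides 'id'")
    # Assumption: the model returns exactly one result row per input row.
    if list(model_input["id"]) != list(model_output["id"]):
        raise ValueError("output ids do not match input ids")


# The example frames shown above.
example_input = pd.DataFrame(
    {
        "id": ["id0", "id1", "id2", "id3"],
        "text": [
            "this is a example sentence.",
            "hello world.",
            "random sentence.",
            "test sentence.",
        ],
    }
)
example_output = pd.DataFrame(
    {
        "id": ["id0", "id1", "id2", "id3"],
        "class": ["class1", "class0", "class0", "class1"],
    }
)

check_contract(example_input, example_output)  # raises nothing for valid frames
```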
    

Uploading a Model#

There are two ways to upload an MLFlow Model to Squirro:

  • Via squirro_asset (see CLI Reference). Large models (over roughly 500 MB; the exact limit is under revision) can cause nginx issues, in which case use scp instead:

    • go into the MLFlow base folder

    • send the (trained) model via squirro_asset

      squirro_asset -vvv mlflow_models upload -t $TOKEN -c $CLUSTER -f mlruns/0/<HASH>/
      
  • Via scp:

    • go into the MLFlow base folder

    • verify that the destination directory exists (on the Squirro server)

      # <BASE_DIR> defaults to /var/lib/squirro/topic/assets/mlflow_models
      mkdir -p <BASE_DIR>/mlruns/0
      
    • compress the directory with the (trained) model (wherever you have trained your model)

      cd mlruns/0/ && tar -czvf trained_model.tar.gz <HASH>/
      
    • send it to the MLFlow base folder on the Squirro server

      scp trained_model.tar.gz <SQUIRRO_SERVER_URL>:/tmp/
      
    • ssh into the Squirro server and extract the archive

      # create <BASE_DIR>/mlruns/0 first if it does not exist
      cd <BASE_DIR>/mlruns/0 &&
      mv /tmp/trained_model.tar.gz <BASE_DIR>/mlruns/0/ &&
      tar -xzvf trained_model.tar.gz
      
    • adjust artifact_uri in the meta.yaml with the new path of the MLFlow Model (file:///<BASE_DIR>/mlruns/0/<HASH>/artifacts)

      sed -i '/artifact_uri/c\artifact_uri: file:///<BASE_DIR>/mlruns/0/<HASH>/artifacts' <HASH>/meta.yaml
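The packaging and meta.yaml steps of the scp route can also be scripted. The sketch below uses only the Python standard library; the helper names pack_run and set_artifact_uri are hypothetical, and the copy to the server itself is still left to scp.

```python
import re
import tarfile
from pathlib import Path


def pack_run(run_dir: Path, archive: Path) -> None:
    """Create a gzipped tarball that contains the run directory (<HASH>/)."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps only the <HASH>/ folder name inside the archive.
        tar.add(run_dir, arcname=run_dir.name)


def set_artifact_uri(meta_file: Path, artifact_uri: str) -> None:
    """Replace every line mentioning artifact_uri, like the sed command above."""
    new_line = f"artifact_uri: {artifact_uri}"
    text = meta_file.read_text()
    # A function replacement keeps backslashes in the URI from being interpreted.
    updated = re.sub(
        r"^.*artifact_uri.*$", lambda _match: new_line, text, flags=re.MULTILINE
    )
    meta_file.write_text(updated)
```

pack_run would run where the model was trained, and set_artifact_uri on the Squirro server after extraction, with the URI built from <BASE_DIR> and <HASH> as described above.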
      

Starting the Service#

To start a Model-as-a-Service, follow the steps below:

  • make sure you are in the MLFlow base folder on the Squirro server

  • activate the Squirro environment

    squirro_activate3
    
  • serve the model identified by the <HASH> as a service listening to the chosen port <PORT>

    mlflow models serve -m runs:/<HASH>/model -p <PORT>
    
    • use nohup or screen when starting the service so the MaaS does not stop when you terminate your ssh session
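As an alternative to nohup or screen, the serve command can be launched from a small Python script that detaches the process from the terminal session. This is only a sketch under the assumption of a POSIX system with the Squirro environment already activated (so that mlflow is on the PATH); the helper names and the log file are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_serve_command(run_id: str, port: int) -> List[str]:
    """Return the serve command shown above as an argument list."""
    return ["mlflow", "models", "serve", "-m", f"runs:/{run_id}/model", "-p", str(port)]


def start_detached(command: List[str], log_file: Path) -> subprocess.Popen:
    """Start the command in its own session and append its output to log_file."""
    with open(log_file, "ab") as log:
        return subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            # New session (POSIX): the child is no longer tied to the ssh terminal.
            start_new_session=True,
        )
```

For example, start_detached(build_serve_command("<HASH>", 5001), Path("maas.log")) would serve the model on port 5001, an arbitrary choice for this illustration.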

Note

  • there is no service orchestration provided at this stage

  • keep an eye on memory and storage consumption. Among other things:

    • a started model service loads the model into memory and keeps it there

    • a new conda environment is created for every new model that has a different conda.yaml file

  • on-premise customers need to manually package their conda environment. This can be done as explained here.
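Once a service is listening on <PORT>, it can be useful to send it a test request before wiring it into a workflow. The sketch below posts rows in pandas "split" orientation to the /invocations endpoint using only the standard library. The exact payload layout is an assumption that depends on the installed MLflow version: 1.x servers are expected to accept the bare split object, while 2.x servers are expected to want it nested under a "dataframe_split" key. Verify this against the documentation of your version. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def build_payload(rows: List[Dict[str, str]], wrap_key: Optional[str] = None) -> dict:
    """Encode rows in pandas 'split' orientation, optionally nested under a key."""
    columns = ["id", "text"]
    split = {"columns": columns, "data": [[row[c] for c in columns] for row in rows]}
    return {wrap_key: split} if wrap_key else split


def invoke(port: int, payload: dict) -> object:
    """POST the payload to the local model service and return the decoded reply."""
    request = urllib.request.Request(
        f"http://localhost:{port}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as invoke(5001, build_payload([{"id": "id0", "text": "hello world."}])) would then exercise a service on port 5001; pass wrap_key="dataframe_split" if the server rejects the bare form.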

Using MaaS#

To use your model, you must first create an ML Workflow.

Your ML Workflow can then be used as an inference ML Job scheduled at an interval or as a published model in the enrich pipeline.

For more information on publishing models, see How To Publish ML Models Using the Squirro Client.

Example Workflows#

Below are document-level and sentence-level examples of ML Workflows.

  • Document Level

    {
        "dataset": {
            "infer": {
                "count": 10,
                "query_string": "language:en"
            }
        },
        "pipeline": [
            {
                "fields": [
                    "body"
                ],
                "step": "loader",
                "type": "squirro_query"
            },
            {
                "fields": [
                    "body"
                ],
                "step": "filter",
                "type": "empty"
            },
            {
                "input_mapping": {
                    "body":"text"
                },
                "output_mapping": {
                    "class":"keywords.prediction"
                },
                "process_endpoint": "http://localhost:<PORT>/invocations",
                "name": "mlflow_maas",
                "step": "mlflow_maas",
                "type": "mlflow_maas"
            },
            {
                "fields": [
                    "keywords.prediction"
                ],
                "step": "saver",
                "type": "squirro_item"
            }
        ]
    }

  • Sentence Level (With Entity Generation)

    {
        "dataset": {
            "infer": {
                "count": 10,
                "query_string": "language:en"
            }
        },
        "pipeline": [
            {
                "fields": [
                    "body"
                ],
                "step": "loader",
                "type": "squirro_query"
            },
            {
                "fields": [
                    "body"
                ],
                "step": "filter",
                "type": "empty"
            },
            {
                "input_fields": [
                    "body"
                ],
                "output_fields": [
                    "extract_sentences"
                ],
                "step": "tokenizer",
                "type": "sentences_nltk"
            },
            {
                "fields": [
                    "extract_sentences"
                ],
                "step": "filter",
                "type": "doc_split"
            },
            {
                "input_mapping": {
                    "extract_sentences":"text"
                },
                "output_mapping": {
                    "class":"prediction"
                },
                "process_endpoint": "http://localhost:<PORT>/invocations",
                "name": "mlflow_maas",
                "step": "mlflow_maas",
                "type": "mlflow_maas"
            },
            {
                "fields": [
                    "extract_sentences",
                    "prediction"
                ],
                "step": "filter",
                "type": "doc_join"
            },
            {
                "entity_name_field": "Catalyst",
                "entity_type": "Catalyst",
                "excluded_values": [],
                "extract_field": "extract_sentences",
                "format_values": false,
                "global_property_field_map": {},
                "modes": [
                    "process"
                ],
                "property_field_map": {
                    "Catalyst": [
                        "prediction"
                    ]
                },
                "required_properties": [
                    "Catalyst"
                ],
                "source_field": "body",
                "step": "filter",
                "type": "squirro_entity"
            },
            {
                "fields": [
                    "entities"
                ],
                "step": "saver",
                "type": "squirro_item"
            }
        ]
    }
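Because the workflows above differ from deployment to deployment mainly in the port and the query, they can be generated instead of edited by hand. The following sketch rebuilds the document-level workflow as a Python dictionary and prints it as JSON; the function name document_level_workflow and its parameters are hypothetical, and port 5001 is an arbitrary choice.

```python
import json


def document_level_workflow(
    port: int, query_string: str = "language:en", count: int = 10
) -> dict:
    """Build the document-level workflow for a model served on the given port."""
    endpoint = f"http://localhost:{port}/invocations"
    return {
        "dataset": {"infer": {"count": count, "query_string": query_string}},
        "pipeline": [
            {"fields": ["body"], "step": "loader", "type": "squirro_query"},
            {"fields": ["body"], "step": "filter", "type": "empty"},
            {
                # Map the item body to the model's "text" input column and
                # the model's "class" output column to a keyword.
                "input_mapping": {"body": "text"},
                "output_mapping": {"class": "keywords.prediction"},
                "process_endpoint": endpoint,
                "name": "mlflow_maas",
                "step": "mlflow_maas",
                "type": "mlflow_maas",
            },
            {"fields": ["keywords.prediction"], "step": "saver", "type": "squirro_item"},
        ],
    }


print(json.dumps(document_level_workflow(5001), indent=4))
```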