
This page provides an overview of creating, customizing, and prototyping machine learning (ML) models as part of Squirro’s Model-as-a-Service (MaaS).


Model-as-a-Service (MaaS) allows Squirro customers to import custom ML models and speed up the prototyping phase for ML projects in Squirro.


To use MaaS, you must install the required packages from the Squirro mirror on your target Squirro server as follows:

yum install squirro-miniforge
yum install squirro-python38-mlflow


Before uploading a model, you must first create one.

For more information on MLFlow Models, see the official MLFlow Model site or an Example MLFlow Model.

Note: As an alternative to MaaS, you can create no-code models using the Squirro AI Studio. For a complete tutorial, see AI Studio.

Creating an MLFlow Model#

You can create an MLFlow Model in one of two ways:

  • train an MLFlow model on your local machine or on your exploration server

  • wrap an existing (pre-trained) model into the structure of an MLFlow Model and run it locally

Either way, MLFlow stores the (trained) model in the MLFlow base folder (mlruns/0/) with a unique hash (<HASH>) after executing the run-command.

The simplest structure of the MLFlow Model is as follows:

├── artifacts
│   └── model
│       ├── conda.yaml
│       ├── MLmodel
│       ├── python_model.pkl
│       └── requirements.txt
└── meta.yaml

MLFlow documentation#

See the (external) links below for information on creating MLFlow models.

Data Structure#

To use the MLFlow Model later in a Squirro ML Workflow, the model must follow a specific data structure:

  • the input must be a pandas dataframe with an id and named feature fields as columns.

  • the output must be a pandas dataframe with an id and result fields as columns.

For more information about pandas, see the official pandas website.


  • input DataFrame

        id                          text
    0  id0  this is an example sentence.
    1  id1                  hello world.
    2  id2              random sentence.
    3  id3                test sentence.

  • output DataFrame

        id   class
    0  id0  class1
    1  id1  class0
    2  id2  class0
    3  id3  class1
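
To make the contract concrete, the sketch below rebuilds the two example frames with pandas and runs a small sanity check on them. The helper name `check_maas_frames` is made up for this illustration and is not part of Squirro or MLFlow; it only encodes the two rules stated above (an `id` column plus at least one further column).

```python
import pandas as pd


def check_maas_frames(model_input, model_output):
    """Raise ValueError if either frame lacks an id column or any other column."""
    for name, frame in (("input", model_input), ("output", model_output)):
        if "id" not in frame.columns:
            raise ValueError(f"{name} DataFrame has no 'id' column")
        if len(frame.columns) < 2:
            raise ValueError(f"{name} DataFrame needs at least one column besides 'id'")


# Input: an id column and one named feature column.
model_input = pd.DataFrame(
    {
        "id": ["id0", "id1", "id2", "id3"],
        "text": [
            "this is an example sentence.",
            "hello world.",
            "random sentence.",
            "test sentence.",
        ],
    }
)

# Output: the id column and one result column.
model_output = pd.DataFrame(
    {
        "id": ["id0", "id1", "id2", "id3"],
        "class": ["class1", "class0", "class0", "class1"],
    }
)

check_maas_frames(model_input, model_output)
print(model_output)
```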

Uploading a Model#

There are two ways to upload an MLFlow Model to Squirro:

  • Via the squirro_asset CLI (see the CLI Reference). Large models (above roughly 500 MB; the exact limit is under revision) can cause nginx issues, in which case use scp instead:

    • go into the MLFlow base folder

    • send the (trained) model via squirro_asset

      squirro_asset -vvv mlflow_models upload -t $TOKEN -c $CLUSTER -f mlruns/0/<HASH>/

  • Via scp:

    • go into the MLFlow base folder

    • verify that the destination directory exists (on the Squirro server)

      BASE_DIR=/var/lib/squirro/topic/assets/mlflow_models  # default path, referred to as <BASE_DIR> below
      mkdir -p "$BASE_DIR/mlruns/0"

    • compress the directory containing the (trained) model (on the machine where you trained it)

      cd mlruns/0/ && tar -czvf trained_model.tar.gz <HASH>/

    • copy the archive to the Squirro server

      scp trained_model.tar.gz <SQUIRRO_SERVER_URL>:/tmp/

    • ssh into the Squirro server and extract the archive

      cd <BASE_DIR>/mlruns/0   # create the directories first if they do not exist
      mv /tmp/trained_model.tar.gz .
      tar -xzvf trained_model.tar.gz

    • adjust artifact_uri in meta.yaml to the new path of the MLFlow Model (file:///<BASE_DIR>/mlruns/0/<HASH>/artifacts)

      sed -i '/artifact_uri/c\artifact_uri: file:///<BASE_DIR>/mlruns/0/<HASH>/artifacts' <HASH>/meta.yaml
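
The same edit can be scripted in Python, for instance when several models are moved at once. This is a minimal sketch using only the standard library. The function name `set_artifact_uri` is hypothetical, and the sketch assumes that `meta.yaml` holds `artifact_uri` as a top-level key on a single line; unlike the sed command, it only touches lines that start with that key.

```python
import re
from pathlib import Path


def set_artifact_uri(meta_path, new_uri):
    """Replace the artifact_uri line in a meta.yaml file and return the match count."""
    path = Path(meta_path)
    text = path.read_text(encoding="utf-8")
    # A function replacement keeps backslashes in new_uri from being read as escapes.
    updated, count = re.subn(
        r"^artifact_uri:.*$",
        lambda _match: f"artifact_uri: {new_uri}",
        text,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError(f"no artifact_uri entry found in {path}")
    path.write_text(updated, encoding="utf-8")
    return count


# Example call with the placeholders used on this page (fill them in first):
# set_artifact_uri("<HASH>/meta.yaml", "file:///<BASE_DIR>/mlruns/0/<HASH>/artifacts")
```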

Starting the Service#

To start a Model-as-a-Service, follow the steps below:

  • make sure you are in the MLFlow base folder on the Squirro server

  • activate the Squirro environment

  • serve the model identified by the <HASH> as a service listening to the chosen port <PORT>

    mlflow models serve -m runs:/<HASH>/model -p <PORT>

    • use nohup or screen when starting the service so that MaaS keeps running after you end your ssh session


  • no service orchestration is provided at this stage

  • keep an eye on memory and storage consumption, because, among other things:

    • a started model service loads the model into memory and keeps it there

    • a new conda environment is created for every model that has a different conda.yaml file

  • on-premise customers need to package their conda environment manually. This can be done as explained here.
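
Once the service is running, a quick smoke test is to post a small frame to its `/invocations` endpoint. The sketch below uses only the Python standard library, and its helper names are illustrative. The request format differs between MLFlow releases, so treat it as an assumption to verify against the installed version: 1.x releases accept a pandas split-oriented body with `Content-Type: application/json; format=pandas-split`, while 2.x releases expect the same structure wrapped in a `dataframe_split` key with `Content-Type: application/json`.

```python
import json
import urllib.request


def build_split_payload(columns, rows, wrap_for_mlflow2=False):
    """Return a pandas 'split'-oriented JSON body for the scoring server."""
    frame = {"columns": list(columns), "data": [list(row) for row in rows]}
    # MLFlow 2.x expects the frame nested under "dataframe_split".
    return json.dumps({"dataframe_split": frame} if wrap_for_mlflow2 else frame)


def invoke(port, body, content_type="application/json; format=pandas-split"):
    """POST the body to a locally served model and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"http://localhost:{port}/invocations",
        data=body.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_split_payload(["id", "text"], [["id0", "test sentence."]])
    print(body)
    # With a service listening on <PORT>, send the request like this:
    # print(invoke(<PORT>, body))
```

Keeping the payload builder separate from the HTTP call makes it easy to inspect the body before sending it to a running service.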

Using MaaS#

To use your model, you must first create an ML Workflow.

Your ML Workflow can then be used as an inference ML Job scheduled at an interval or as a published model in the enrich pipeline.

For more information on publishing models, see How To Publish ML Models Using the Squirro Client.

Example Workflows#

Below are document-level and sentence-level examples of ML Workflows.

  • Document Level

    {
        "dataset": {
            "infer": {
                "count": 10,
                "query_string": "language:en"
            }
        },
        "pipeline": [
            {
                "fields": [...],
                "step": "loader",
                "type": "squirro_query"
            },
            {
                "fields": [...],
                "step": "filter",
                "type": "empty"
            },
            {
                "input_mapping": {...},
                "output_mapping": {...},
                "process_endpoint": "http://localhost:<PORT>/invocations",
                "name": "mlflow_maas",
                "step": "mlflow_maas",
                "type": "mlflow_maas"
            },
            {
                "fields": [...],
                "step": "saver",
                "type": "squirro_item"
            }
        ]
    }

  • Sentence Level (With Entity Generation)

    {
        "dataset": {
            "infer": {
                "count": 10,
                "query_string": "language:en"
            }
        },
        "pipeline": [
            {
                "fields": [...],
                "step": "loader",
                "type": "squirro_query"
            },
            {
                "fields": [...],
                "step": "filter",
                "type": "empty"
            },
            {
                "input_fields": [...],
                "output_fields": [...],
                "step": "tokenizer",
                "type": "sentences_nltk"
            },
            {
                "fields": [...],
                "step": "filter",
                "type": "doc_split"
            },
            {
                "input_mapping": {...},
                "output_mapping": {...},
                "process_endpoint": "http://localhost:<PORT>/invocations",
                "name": "mlflow_maas",
                "step": "mlflow_maas",
                "type": "mlflow_maas"
            },
            {
                "fields": [...],
                "step": "filter",
                "type": "doc_join"
            },
            {
                "entity_name_field": "Catalyst",
                "entity_type": "Catalyst",
                "excluded_values": [],
                "extract_field": "extract_sentences",
                "format_values": false,
                "global_property_field_map": {},
                "modes": [...],
                "property_field_map": {
                    "Catalyst": [...]
                },
                "required_properties": [...],
                "source_field": "body",
                "step": "filter",
                "type": "squirro_entity"
            },
            {
                "fields": [...],
                "step": "saver",
                "type": "squirro_item"
            }
        ]
    }