ML Enrichments for Pipeline Workflows

Profiles: Data Scientist, Project Creator

Once your models are published, you can use them to enrich the data loaded from any source by adding the models to the pipeline workflow.

To do this, edit the pipeline workflow to which you want to add your published model.

Tip: Once you have opened the pipeline overview, you can edit any pipeline workflow by clicking the pencil icon in the top right.

image1

This takes you to the following screen, which shows all steps that can be added to an existing workflow.

Under the Classify section, all published models are shown. Here, you can choose the model you would like to use to enrich your data and add it to the pipeline via drag and drop.

image2

Once the model has been added to the pipeline and saved, it appears alongside the other steps. You can then use this workflow to load new and existing data sources and enrich them with the output of your machine-learning model.

If you have published multiple models in AI Studio, you can add several of them to your pipeline as separate steps.

Each of these models can serve a different purpose in enriching your data and can augment the insights for the end-users using Squirro’s dashboards.

image3

Optimization for Execution of Multiple ML Workflows

You can optimize the execution of multiple, similar ML workflows in the enrichment pipeline.

Machine Learning (ML) workflows, such as the AI Studio ML models, consist of multiple steps from libNLP.

The steps within a workflow typically include:

  • data loading

  • data cleaning

  • pre-processing

  • classification

  • generation of entities
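As a rough illustration of the step sequence listed above, a workflow can be thought of as an ordered chain of functions, each taking an item and returning an enriched copy. The sketch below is plain Python and does not use the libNLP API; the step functions, the `run_workflow` helper, and the item fields (`body`, `sentences`, `label`) are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

Item = Dict[str, object]
Step = Callable[[Item], Item]


def clean(item: Item) -> Item:
    """Data cleaning: collapse runs of whitespace in the body."""
    return {**item, "body": " ".join(str(item["body"]).split())}


def split_sentences(item: Item) -> Item:
    """Pre-processing: naive sentence split on full stops."""
    sentences = [s.strip() for s in str(item["body"]).split(".") if s.strip()]
    return {**item, "sentences": sentences}


def classify(item: Item) -> Item:
    """Classification: toy keyword rule standing in for a trained model."""
    label = "finance" if "bank" in str(item["body"]).lower() else "other"
    return {**item, "label": label}


def run_workflow(steps: List[Step], item: Item) -> Item:
    """Apply each step in order, passing the enriched item along."""
    for step in steps:
        item = step(item)
    return item


# Hypothetical workflow: cleaning, then pre-processing, then classification.
workflow = [clean, split_sentences, classify]
result = run_workflow(workflow, {"body": "The  bank raised rates.   Markets fell."})
```

Each step only reads fields written by earlier steps, so the order of the list matters.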

In a pipeline with similar ML workflows, the same step is likely to appear in more than one workflow.

For example, the identical sentence splitting or PDF sentence tokenization step can appear in all or multiple ML workflows present in the pipeline.

To avoid multiple executions of identical steps and increase the performance of the processing pipeline, you can enable the optimization of ML workflow execution.
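The idea behind this optimization can be sketched with a small cache. The example below is a conceptual illustration only, not Squirro's actual implementation: two hypothetical workflows share an identical sentence-splitting step, and a cache keyed by step name and input ensures the shared step runs once. All function and variable names are assumptions made for the example.

```python
from typing import Callable, Dict, Hashable, List, Tuple

# Counts how often each step really executes (for demonstration only).
call_counts: Dict[str, int] = {}


def split_sentences(text: str) -> Tuple[str, ...]:
    """Shared pre-processing step: naive split on full stops."""
    call_counts["split_sentences"] = call_counts.get("split_sentences", 0) + 1
    return tuple(s.strip() for s in text.split(".") if s.strip())


def classify_topic(sentences: Tuple[str, ...]) -> str:
    """Toy topic classifier standing in for a trained model."""
    return "finance" if any("bank" in s.lower() for s in sentences) else "other"


def classify_sentiment(sentences: Tuple[str, ...]) -> str:
    """Toy sentiment classifier standing in for a trained model."""
    return "positive" if any("happy" in s.lower() for s in sentences) else "neutral"


def run_workflows(
    workflows: Dict[str, List[Callable[[Hashable], Hashable]]], text: str
) -> Dict[str, Hashable]:
    """Run every workflow on the same input, reusing identical step results."""
    cache: Dict[Tuple[str, Hashable], Hashable] = {}
    results: Dict[str, Hashable] = {}
    for name, steps in workflows.items():
        value: Hashable = text
        for step in steps:
            key = (step.__name__, value)  # same step + same input => same output
            if key not in cache:
                cache[key] = step(value)
            value = cache[key]
        results[name] = value
    return results


workflows = {
    "topic": [split_sentences, classify_topic],
    "sentiment": [split_sentences, classify_sentiment],
}
results = run_workflows(workflows, "The bank raised rates. Investors were happy.")
```

After this run, `call_counts["split_sentences"]` is 1 even though both workflows contain the step. The sketch assumes steps are deterministic and that their inputs are hashable, which is why sentences are stored as tuples.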

Configuration

In the Server space under Configuration, set the machinelearning.optimize.workflows option to true:

image4