ML Enrichments for Pipeline Workflows
Profiles: Data Scientist, Project Creator
Once your models are published, you can use them to enrich data loaded from any source by adding them to a pipeline workflow.
To do this, edit the pipeline workflow to which you want to add your published model.
Tip: In the pipeline overview, you can edit any pipeline workflow by clicking the pencil icon in the top right.
This opens a screen that shows all steps that can be added to the workflow.
All published models are listed under the Classify section. Choose the model you want to use to enrich your data and drag and drop it into the pipeline.
After you add the model and save the workflow, the model appears alongside the other steps. You can now use this workflow to load new and existing data sources and enrich them with the output of your machine learning model.
If you have published multiple models in AI Studio, you can add several of them to your pipeline, each as a separate step.
Each model can serve a different purpose in enriching your data and can add to the insights that end users gain from Squirro dashboards.
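The idea of chaining several models as separate enrichment steps can be illustrated with a minimal sketch. It assumes a simplified item format in which each step is a Python function that receives an item dict and writes its prediction into its own facet under a `keywords` field. The functions `sentiment_model`, `topic_model`, and `run_pipeline` are hypothetical: they use simple rules in place of real published models and are not part of the Squirro API.

```python
from typing import Callable, Dict, List

# Simplified item: "keywords" maps a facet name to a list of values (assumption).
Item = Dict[str, object]
Step = Callable[[Item], Item]


def sentiment_model(item: Item) -> Item:
    """Hypothetical stand-in for a published model that labels sentiment."""
    text = str(item.get("body", "")).lower()
    label = "negative" if "delay" in text or "problem" in text else "positive"
    item.setdefault("keywords", {})["sentiment"] = [label]
    return item


def topic_model(item: Item) -> Item:
    """Hypothetical stand-in for a second published model that tags a topic."""
    text = str(item.get("body", "")).lower()
    topic = "logistics" if "shipment" in text else "other"
    item.setdefault("keywords", {})["topic"] = [topic]
    return item


def run_pipeline(item: Item, steps: List[Step]) -> Item:
    """Apply each step in order; each model adds its own facet to the item."""
    for step in steps:
        item = step(item)
    return item


pipeline = [sentiment_model, topic_model]
enriched = run_pipeline({"body": "The shipment had a delay."}, pipeline)
print(enriched["keywords"])
# {'sentiment': ['negative'], 'topic': ['logistics']}
```

Because each model writes to a different facet, the steps do not overwrite each other, and a dashboard can filter or aggregate on each facet independently.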
Optimization for Execution of Multiple ML Workflows
You can optimize the execution of multiple, similar ML workflows in the enrichment pipeline.
Machine learning (ML) workflows, such as AI Studio ML models, consist of multiple libNLP steps.
The steps within a workflow typically include:
data loading
data cleaning
pre-processing
classification
generation of entities
In a pipeline with similar ML workflows, several workflows are likely to contain identical steps.
For example, the same sentence splitting or PDF sentence tokenization step can appear in several or all of the ML workflows in the pipeline.
To avoid executing identical steps more than once and to improve the performance of the processing pipeline, you can enable the optimization of ML workflow execution.
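The effect of this optimization can be illustrated with a minimal sketch. It assumes that each workflow is an ordered list of step configurations and that every step is deterministic and does not modify its input, so a step with the same configuration and the same preceding steps always produces the same output. Under that assumption, caching the output of each step prefix lets a shared step, such as sentence splitting, run once instead of once per workflow. The step types, `REGISTRY`, and `run_workflows` are hypothetical and only illustrate the idea; they do not describe how Squirro or libNLP actually implements the setting.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

StepConfig = Dict[str, Any]
StepFn = Callable[[StepConfig, Any], Any]

calls: List[str] = []  # records which step types actually executed


def split_sentences(config: StepConfig, text: str) -> List[str]:
    calls.append("sentences")
    return [s.strip() for s in text.split(".") if s.strip()]


def count_sentences(config: StepConfig, sentences: List[str]) -> int:
    calls.append("count")
    return len(sentences)


def longest_sentence(config: StepConfig, sentences: List[str]) -> str:
    calls.append("longest")
    return max(sentences, key=len)


# Hypothetical registry mapping a step "type" to its implementation.
REGISTRY: Dict[str, StepFn] = {
    "sentences": split_sentences,
    "count": count_sentences,
    "longest": longest_sentence,
}


def run_workflows(text: str, workflows: List[List[StepConfig]], optimize: bool) -> List[Any]:
    """Run every workflow on the same input text.

    With optimize=True, the output of each step prefix is cached, so a step that is
    identical (same config, same preceding steps) in several workflows runs only once.
    """
    cache: Dict[Tuple[str, ...], Any] = {}
    results = []
    for workflow in workflows:
        data: Any = text
        prefix: Tuple[str, ...] = ()
        for config in workflow:
            # Canonical JSON (sorted keys) makes equal configs produce the same key.
            prefix = prefix + (json.dumps(config, sort_keys=True),)
            if optimize and prefix in cache:
                data = cache[prefix]
                continue
            data = REGISTRY[config["type"]](config, data)
            if optimize:
                cache[prefix] = data
        results.append(data)
    return results


workflows = [
    [{"type": "sentences"}, {"type": "count"}],
    [{"type": "sentences"}, {"type": "longest"}],
]
text = "Squirro loads data. Models enrich it. Dashboards show results."

optimized = run_workflows(text, workflows, optimize=True)
optimized_calls = list(calls)
calls.clear()
baseline = run_workflows(text, workflows, optimize=False)
baseline_calls = list(calls)

print(optimized)        # [3, 'Dashboards show results']
print(optimized_calls)  # ['sentences', 'count', 'longest']
print(baseline_calls)   # ['sentences', 'count', 'sentences', 'longest']
```

The cache key includes all preceding steps, not only the current one, because the same step configuration applied after different earlier steps receives different input and can produce different output.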
Configuration
In the Server space under Configuration, set machinelearning.optimize.workflows to true.