The data processing pipeline (referred to simply as the pipeline) and all of its processing steps can be configured in the user interface using the Pipeline Editor.
This page provides an overview of how to do so.
Accessing the Pipeline Editor
To access the Pipeline Editor, follow the steps below:
Click on the Squirro Spaces navigation menu.
By default, the pipeline view is displayed. This shows every configured pipeline workflow and each workflow’s various steps.
Clicking the pencil icon in the top right enables the edit view, which is structured similarly to the dashboard editor.
The screen is split up into four areas:
The top toolbar, where the entire pipeline configuration can be saved and the editor can be closed.
The steps library on the left side, which contains all available pipeline steps.
The configured pipeline workflows in the middle.
The properties of the selected pipeline workflow or step on the right.
The steps library contains all the steps that have been created on the Squirro instance. See Built-In Steps for a description of the built-in steps.
Additionally, developers can add custom steps using Pipelets.
The workflows are presented in the middle section. When the screen does not have enough space to show all workflows at the same time, the left/right arrow buttons can be used to scroll.
Clicking on the title of a workflow activates the pipeline properties, in which the name can be changed.
Next to the title, a menu provides the main actions for editing pipelines:
Set as default: makes this the default pipeline for all newly created data sources.
Clone: creates an exact duplicate of the pipeline workflow. This is useful when a pipeline needs to be replicated with only small changes.
Delete: deletes the pipeline workflow. This is only available if no data source has been configured on the workflow.
Unlike other changes, deleting a pipeline workflow takes effect immediately, even when the Save button is not used.
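The rules behind these three actions can be illustrated with a small in-memory model. This is only a sketch: `Workflow`, `PipelineConfig`, and their fields are hypothetical names chosen for illustration and are not part of any Squirro API. It also assumes that a clone copies the steps but not the data source assignments, which this page does not specify.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # compare workflows by identity, not by field values
class Workflow:
    """Hypothetical stand-in for a pipeline workflow (not the Squirro data model)."""
    name: str
    steps: List[str] = field(default_factory=list)
    data_sources: List[str] = field(default_factory=list)
    is_default: bool = False


class PipelineConfig:
    """Hypothetical container mirroring the actions in the workflow menu."""

    def __init__(self) -> None:
        self.workflows: List[Workflow] = []

    def set_default(self, target: Workflow) -> None:
        # Exactly one workflow is the default for newly created data sources.
        for workflow in self.workflows:
            workflow.is_default = workflow is target

    def clone(self, source: Workflow, new_name: str) -> Workflow:
        # Copy the steps so later edits to the clone leave the original untouched.
        # Assumption: data sources and the default flag are not carried over.
        duplicate = Workflow(name=new_name, steps=copy.deepcopy(source.steps))
        self.workflows.append(duplicate)
        return duplicate

    def delete(self, target: Workflow) -> None:
        # Deleting is only allowed when no data source uses the workflow.
        if target.data_sources:
            raise ValueError(
                f"Workflow {target.name!r} still has data sources configured"
            )
        self.workflows.remove(target)
```

In this model, calling `delete` on a workflow that still has data sources raises an error, which corresponds to the Delete action being unavailable in the menu.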
Hovering over any of the steps displays a toolbar with the relevant actions, including editing and deleting, as well as extended documentation for the step.
The properties of the selected pipeline workflow or step are displayed in the properties bar and can be edited there. The number and type of properties available depend on the step. See Built-In Steps for a reference of the built-in steps and their properties.
Creating New Pipelines
To create a new pipeline, click New Pipeline at the bottom of the steps library.
A number of presets are available that make it easy to get started with the right default steps:
Minimal: this is a good workflow for English-language content that can be uniquely deduplicated with an identifier (ID) from the source.
Standard: a safe default choice that also handles duplicates and deals correctly with content in different languages.
Binary Documents: extracts content and generally handles the indexing of binary documents, such as PDF or Office formats.
Adding Pipeline Steps
To add a pipeline step to a workflow, first make sure the workflow is in view by using the left/right arrow buttons.
Then drag the selected step to the workflow section. All steps except those from the Flow and Custom sections are restricted to their corresponding section. While a step is being dragged, a border surrounds each section where it can be placed.
When the step has been dragged to a valid location, a blue highlight box will appear. The step can now be placed by releasing the mouse button.
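The placement rule described above can be written as a simple check. This is an illustrative sketch, not the editor's actual implementation: `can_drop` is a hypothetical helper, and section names other than Flow and Custom (such as "Enrich" and "Index" in the usage note) are placeholders.

```python
# Sections whose steps may be placed anywhere in a workflow, per the rule above.
UNRESTRICTED_SECTIONS = {"Flow", "Custom"}


def can_drop(step_section: str, target_section: str) -> bool:
    """Return True if a step from step_section may be placed in target_section.

    Hypothetical helper: steps from the Flow and Custom sections can go
    anywhere, while every other step must stay in its own section.
    """
    if step_section in UNRESTRICTED_SECTIONS:
        return True
    return step_section == target_section
```

For example, `can_drop("Flow", "Index")` is true because Flow steps are unrestricted, while `can_drop("Enrich", "Index")` is false because the step would leave its own section.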
To customize the configuration of the step, use the pencil icon in the step toolbar.