Data Processing Pipeline

This section describes how data is enriched, related, classified, indexed, and otherwise processed by the data processing pipeline.

Throughout these docs, you’ll see the data processing pipeline referred to simply as the pipeline.

Overview

For an overview of the data processing pipeline architecture, sections, and processing, see the Pipeline Overview.

Pipeline Steps

The data processing pipeline consists of built-in steps, and you can also create custom steps.

For information on built-in steps, see Built-In Steps.

For information on custom steps, see Custom Steps.

Pipeline Editor

The Pipeline Editor allows you to configure pipeline steps directly within the Squirro user interface.

To learn more, see Pipeline Editor.

Pipeline Reruns

In some situations, the pipeline needs to be rerun. You can rerun pipeline workflows, individual pipeline steps, or specific enrichments.

See Pipeline Reruns for more information.

Pipeline Prioritization

Squirro supports prioritization for certain data sources, which can be configured within the user interface.

To learn more, see Pipeline Prioritization.

Content Streamers

In Squirro, a content streamer is a component that manages the inputstream.

Content streamers are used by any Squirro service (e.g., api.provider, ingester) that needs to interact with the inputstream.

For more information on this topic, see Content Streamers.

Indexing Common Document Formats

Squirro’s pipeline can process complex document types, including common office formats (such as Word and Excel files) and PDFs.

See Indexing Common Formats for instructions on setting up the pipeline to achieve this.

Trend Detection

Trend detection is used to find anomalies in time-series data.

For reference information about trend detection, see Trend Detection.

For step-by-step instructions on using trend detection, see the trend detection how-to guide.
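To make the idea of finding anomalies in time-series data more concrete, here is a minimal, generic sketch in Python. It is not Squirro's trend detection algorithm or API. It only illustrates one common approach, a trailing-window z-score, and the function name `find_anomalies`, the `window` and `threshold` parameters, and the sample data are all assumptions made for this example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return the indices of values that deviate strongly from recent history.

    Illustrative only: each value is compared with the mean and standard
    deviation of the `window` values that precede it (a trailing z-score).
    This is NOT Squirro's trend detection implementation.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as an anomaly.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily item counts with one obvious spike at index 6.
daily_counts = [10, 12, 11, 9, 10, 11, 48, 10, 12, 11]
print(find_anomalies(daily_counts))
```

A trailing window keeps the comparison local, so a slow drift in the series is not flagged, while a sudden spike is. For the method and configuration Squirro actually uses, rely on the Trend Detection reference.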

Reset Project

If you need to reset your project, you can do so from the user interface.

See Reset Project for more information.

Processing Errors

Some steps in the pipeline may abort with an error. In those cases, the item is tagged with a Processing Error keyword.

For information on these errors, see Processing Errors.
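The tagging behavior described above can be pictured with a small, self-contained Python sketch. It is a conceptual model only, not Squirro code. The item layout (a dictionary with a `keywords` mapping), the `run_pipeline` helper, and the two example steps are hypothetical, and the choice to continue with the remaining steps after a failure is an assumption made for this example.

```python
def run_pipeline(item, steps):
    """Run each step in order and tag the item if a step fails.

    Hypothetical model: an item is a dict, a step is a function that takes
    an item and returns it. Continuing after a failed step is an assumption
    of this sketch, not documented Squirro behavior.
    """
    for step in steps:
        try:
            item = step(item)
        except Exception as exc:
            # Record the failure under a "Processing Error" keyword.
            keywords = item.setdefault("keywords", {})
            errors = keywords.setdefault("Processing Error", [])
            errors.append(f"{step.__name__}: {exc}")
    return item


def extract_title(item):
    # Fails with KeyError when the item has no "body" field.
    item["title"] = item["body"].splitlines()[0]
    return item


def add_language(item):
    # Placeholder enrichment that always succeeds.
    item["language"] = "en"
    return item


result = run_pipeline({"id": "doc-1"}, [extract_title, add_language])
print(result["keywords"]["Processing Error"])
```

In this model, the failed step leaves a visible trace on the item instead of silently dropping it, which is what makes such items easy to find and rerun later.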