Pipelets#

This page provides an overview of pipelets, which are plugins to the Squirro pipeline used to customize data processing.

Overview#

When you set up a data source to ingest data into Squirro, you need to assign a pipeline workflow to that data source. The pipeline workflow defines a set of built-in pipeline steps that are applied to the data passing through the pipeline.

In addition to the built-in steps, you can add your own custom enrichment or processing steps, called pipelets.

Figure: Pipelet overview (https://s3.amazonaws.com/download.squirro.net/docs/technical/pipelets/pipelet_overview.png)

The Pipelets Tutorial covers creating, uploading, and using pipelets in your workflow. Note that uploading pipelets requires server-level administrator privileges.

Pipelets and the Data-Loading Process#

Pipelets modify items. Squirro calls data an item once it has been transformed into the consistent, predictable Squirro item format.

For this reason, you should always design your pipelets to work with data in the Squirro item format. You don’t have to worry about the transformation, however. The Squirro data loader tool automatically transforms all source data (data produced by data loader plugins) into the Squirro item format.
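To make the idea of working on items concrete, here is a minimal sketch that treats an item as a plain Python dictionary and enriches it. The field names (`title`, `body`, `keywords`) and the shape of `keywords` as a mapping from name to list of strings are assumptions made for illustration, and `add_word_count` is a hypothetical helper; check the Squirro item format documentation for the authoritative schema.

```python
# Hypothetical item for illustration; the field names are assumptions,
# not a complete or authoritative description of the Squirro item format.
item = {
    "title": "Quarterly report",
    "body": "Revenue grew in the third quarter.",
    "keywords": {"source": ["intranet"]},
}


def add_word_count(item):
    """Attach the number of words in the body as a keyword (hypothetical helper)."""
    words = item.get("body", "").split()
    # Create the keywords mapping if the item does not have one yet.
    keywords = item.setdefault("keywords", {})
    # Assumption: keyword values are stored as lists of strings.
    keywords["word_count"] = [str(len(words))]
    return item


enriched = add_word_count(item)
print(enriched["keywords"])
```

Because the function only reads and writes item fields, it does not need to know which data loader plugin produced the original data.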

Development Workflow#

To add a pipelet to the pipeline, first upload it to the Squirro server. Then use the pipeline editor (Pipeline tab) to add the pipelet to the pipeline and, if required, change its configuration.

See Development Workflow for an overview of the pipelets workflow, from getting started with the pipelet command line tool to development, validation, testing, and deployment.

Creating Pipelets#

Pipelets are written in Python.

For an overview of pipelet creation, see Writing Pipelets.
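As a rough sketch of what a pipelet can look like, the example below defines a class with a `consume` method that receives an item and returns it after modification. The `PipeletV1` import and the `consume` method reflect our understanding of the Squirro SDK and should be verified against Writing Pipelets. The class name `ReviewFlagPipelet`, the `term` configuration option, and the `needs_review` keyword are hypothetical. The fallback base class exists only so the sketch runs where the SDK is not installed.

```python
try:
    # Assumption: the Squirro SDK exposes the pipelet base class here.
    from squirro.sdk import PipeletV1
except ImportError:
    class PipeletV1:
        """Stand-in so the sketch runs without the SDK (not the real class)."""


class ReviewFlagPipelet(PipeletV1):
    """Hypothetical pipelet that flags items mentioning a configured term."""

    def __init__(self, config=None):
        self.config = config or {}
        # "term" is a hypothetical configuration option for this sketch.
        self.term = self.config.get("term", "urgent")

    def consume(self, item):
        # Search title and body case-insensitively for the term.
        text = " ".join([item.get("title", ""), item.get("body", "")]).lower()
        flag = "yes" if self.term.lower() in text else "no"
        # Assumption: keywords map a name to a list of string values.
        item.setdefault("keywords", {})["needs_review"] = [flag]
        return item


pipelet = ReviewFlagPipelet({"term": "urgent"})
result = pipelet.consume({"title": "Urgent: server down", "body": "Please check."})
print(result["keywords"]["needs_review"])
```

Calling `consume` directly on a hand-built item, as in the last lines, is a convenient way to try the logic locally before uploading the pipelet.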

Rerunning a Pipelet#

A pipelet enriches only items that are loaded after the pipelet has been configured. Items already in the project are not processed automatically.

To run a pipelet on items already loaded into your project, see Rerunning a Pipelet.

Reference#

See Pipelets Reference for reference information including classes, parameters, and returns.

Troubleshooting#

See Troubleshooting and FAQ for troubleshooting, including frequently asked questions.