Pipelets and the Dataloader#
Profile: Project Creator
This page describes how to use pipelets with the Data Loader CLI Tool.
Project creators working from the command line can use the --pipelets-file and --pipelets-error-behavior arguments to specify pipelets and their behaviour.
Overview#
The Data Loader CLI Tool can run multiple pipelets before or after the templating step.
Each pipelet is instantiated once per parallel process, but its consume method is executed once for each item.
Note: Avoid opening files in the pipelet’s consume method. Open them in __init__() instead, so that the work happens once per process rather than once per item.
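The note above can be illustrated with a minimal sketch. It uses a plain stand-in class with the same __init__/consume shape as a pipelet, so it runs without the Squirro SDK. The class name TitleLookup, the lookup_file config key, and the titles.json file are hypothetical and exist only for this illustration.

```python
import json
import os
import tempfile


class TitleLookup:
    """Stand-in with the same __init__/consume shape as a pipelet.

    It deliberately does not import squirro.sdk so it runs anywhere.
    """

    def __init__(self, config):
        self.config = config
        # Read the file once, when the pipelet is instantiated
        # (that is, once per parallel process).
        with open(config["lookup_file"], encoding="utf-8") as handle:
            self.titles = json.load(handle)

    def consume(self, item):
        # Called once per item: only an in-memory dictionary lookup here.
        new_title = self.titles.get(item.get("id"))
        if new_title is not None:
            item["title"] = new_title
        return item


def demo():
    # Write a small hypothetical lookup file, then process two items.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "titles.json")
        with open(path, "w", encoding="utf-8") as handle:
            json.dump({"a1": "Renamed item"}, handle)
        pipelet = TitleLookup({"lookup_file": path})
    # The file is already deleted here; consume() no longer needs it.
    return [
        pipelet.consume({"id": "a1", "title": "Old"}),
        pipelet.consume({"id": "b2", "title": "Unchanged"}),
    ]
```

Calling demo() processes both items after the file has been removed, which shows that consume relies only on what the constructor loaded.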
--pipelets-file#
A pipelet’s location, input arguments, and execution order are described in a JSON file, which is passed to the loader script with the --pipelets-file argument.
Reference: See Labels for more information.
--pipelets-error-behavior#
The --pipelets-error-behavior argument specifies the job’s behaviour when a pipelet raises an exception.
The default is error, which stops the load. The other option is warning, which only logs a warning and continues the load.
squirro_data_load -v ^
--token %token% ^
--cluster %cluster% ^
--project-id %project_id% ^
--source-name csv_interactions ^
--source-type csv ^
--map-title InteractionSubject ^
--source-file interaction.csv ^
--facets-file config/sample_facets.json ^
--body-template-file template/template_body_interaction.html ^
--pipelets-error-behavior error ^
--pipelets-file config/sample_pipelets.json
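Note: The lines are wrapped with a circumflex (^) at the end of each line, which is the line-continuation character of the Windows command prompt. On Mac and Linux, use a backslash (\) instead, and write the %token%-style variables in your shell’s own syntax (for example $token).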
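The difference between the error and warning settings of --pipelets-error-behavior can be pictured with a small sketch. This is not the Data Loader's implementation: run_pipelet and fragile_consume are hypothetical names, and what happens to the failing item in warning mode is not stated on this page, so the sketch keeps it unmodified as an assumption.

```python
import logging

log = logging.getLogger("pipelet_demo")


def fragile_consume(item):
    # Raises KeyError when the item has no title.
    item["title"] = item["title"] + " - Squirro"
    return item


def run_pipelet(consume, items, error_behavior="error"):
    """Apply consume() to each item, mimicking the two documented modes."""
    if error_behavior not in ("error", "warning"):
        raise ValueError(f"unknown error behavior: {error_behavior!r}")
    processed = []
    for item in items:
        try:
            processed.append(consume(item))
        except Exception as exc:
            if error_behavior == "error":
                # "error": stop the load by re-raising the exception.
                raise
            # "warning": log the problem and continue with the next item.
            log.warning("pipelet failed on %r: %s", item, exc)
            # Assumption: the failing item is passed on unmodified.
            processed.append(item)
    return processed
```

With warning, run_pipelet(fragile_consume, [{"title": "A"}, {}], "warning") returns both items and logs one warning. With error, the same call raises KeyError on the second item and nothing after it is processed.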
Configuration#
Use the following pipelets config file:
config/sample_pipelets.json
{
    "Sample": {
        "file_location": "sample_pipelet.py",
        "config": {}
    }
}
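Before passing a pipelets file to the loader, it can help to check that it parses and has the shape used in the sample. The sketch below is only a convenience check, not the loader's own validation. The function name check_pipelets_config is hypothetical, and treating both file_location and config as required is an assumption based solely on the sample above.

```python
import json

# Keys seen in the sample file; whether the loader requires both is an assumption.
REQUIRED_KEYS = ("file_location", "config")


def check_pipelets_config(text):
    """Parse pipelets JSON text and return (name, file_location) pairs."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    entries = []
    for name, spec in data.items():
        if not isinstance(spec, dict):
            raise ValueError(f"pipelet {name!r} must map to a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in spec]
        if missing:
            raise ValueError(f"pipelet {name!r} is missing {missing}")
        if not isinstance(spec["config"], dict):
            raise ValueError(f"pipelet {name!r}: 'config' must be an object")
        entries.append((name, spec["file_location"]))
    return entries


SAMPLE = '{"Sample": {"file_location": "sample_pipelet.py", "config": {}}}'
print(check_pipelets_config(SAMPLE))
# [('Sample', 'sample_pipelet.py')]
```

To check a file on disk, read it first, for example with open("config/sample_pipelets.json").read(), and pass the text to the function.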
In the Python pipelet script, the consume method appends the name Squirro to the title of each item:
sample_pipelet.py
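import squirro.sdk

VERSION = '0.0.1'


@squirro.sdk.require('log')
class Sample(squirro.sdk.PipeletV1):
    def __init__(self, config):
        # Called when the pipelet is instantiated; keep the configuration.
        self.config = config

    def consume(self, item):
        # Called once per item; append ' - Squirro' to the title.
        item['title'] = item.get('title', '') + ' - Squirro'
        return item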
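To see what consume does to a single item without running the loader, the same title logic can be tried on plain dictionaries. The sketch copies the one-line transformation into a stand-in function, append_squirro (a hypothetical name), so it does not need the Squirro SDK. The sample items are made up for illustration.

```python
def append_squirro(item):
    """Same transformation as Sample.consume, as a plain function."""
    item["title"] = item.get("title", "") + " - Squirro"
    return item


print(append_squirro({"title": "Quarterly review"}))
# {'title': 'Quarterly review - Squirro'}

# An item without a title gets one that consists of the suffix only.
print(append_squirro({"body": "no title yet"}))
# {'body': 'no title yet', 'title': ' - Squirro'}
```

Because item.get('title', '') falls back to an empty string, items that lack a title do not raise an error. They end up with the title ' - Squirro'.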
The result of the example pipelet is shown below: