Development Workflow

Profile: Project Creator, Python Engineer

For developing pipelets, Squirro provides the pipelet command line tool as part of the Squirro Toolbox.

Project creators and Python engineers working from the command line can use the pipelet tool to create, validate, test, and deploy pipelets.

Develop

The first step is to create the pipelet. In the following examples, the pipelet code is stored in a file called pipelet.py in the current directory.

pipelet.py

from squirro.sdk import PipeletV1

class ModifyTitlePipelet(PipeletV1):
    def consume(self, item):
        item['title'] = item.get('title', '') + ' - Hello, World!'
        return item

Validate

On the command line, run the pipelet validate command to verify that the pipelet code contains no errors. Validation also checks that the pipelet does not import any modules that are disallowed in pipelets. See the section on Dependencies for more information.

pipelet validate pipelet.py
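
Before running the validator, it can help to see which modules a pipelet imports. The following is a minimal sketch using Python's standard ast module. It is not how pipelet validate works internally, and the helper name list_imported_modules is hypothetical. Its output can be compared by hand against the modules described in the Dependencies section.

```python
import ast


def list_imported_modules(source):
    """Return the sorted top-level module names imported by `source`.

    Hypothetical helper for illustration only; it does not replicate
    the checks performed by `pipelet validate`.
    """
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. `import os.path` -> "os"
            for alias in node.names:
                modules.add(alias.name.split('.')[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # e.g. `from squirro.sdk import PipeletV1` -> "squirro"
            # (relative imports such as `from . import x` are skipped)
            modules.add(node.module.split('.')[0])
    return sorted(modules)


if __name__ == '__main__':
    example = "from squirro.sdk import PipeletV1\nimport json\n"
    print(list_imported_modules(example))
```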

Test

Use the pipelet consume command to simulate a pipelet run. For this purpose, test items must be available as JSON text files on disk. In the following example, a file called item.json in the current directory has this content:

item.json

{
    "title": "Sample",
    "id": "first_item"
}

To test the pipelet with this test file, use:

pipelet consume pipelet.py -i item.json

This command will output the items that have been returned by the pipelet:

Loading items...
Loading item.json ...
Loaded.
Consuming item first_item
yielded item
{u'id': u'first_item', u'title': u'Sample - Hello, World!'}
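
When several test items are needed, the JSON files can be generated with a short script. The sketch below assumes one item per file, as in item.json above. The helper name write_test_items and the `<id>.json` naming scheme are illustrative choices, not conventions of the pipelet tool.

```python
import json
import os
import tempfile


def write_test_items(items, directory):
    """Write each item dict to `<directory>/<id>.json` and return the paths."""
    paths = []
    for item in items:
        path = os.path.join(directory, item['id'] + '.json')
        with open(path, 'w') as f:
            json.dump(item, f, indent=4)
        paths.append(path)
    return paths


if __name__ == '__main__':
    items = [
        {'id': 'first_item', 'title': 'Sample'},
        {'id': 'second_item'},  # no title: exercises the item.get() default
    ]
    target = tempfile.mkdtemp()
    for path in write_test_items(items, target):
        print(path)
```

Each generated file can then be passed to pipelet consume with -i, as shown above.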

On top of these manual tests, you can implement automated tests using the usual Python tools such as Nose.
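
As one possible shape for such a test, the sketch below uses the standard library's unittest module rather than Nose. It repeats the pipelet class so that the example is self-contained; in a real project the test would import the class from pipelet.py. The fallback PipeletV1 stand-in and the explicit constructor are assumptions made so that the example runs without the Squirro Toolbox installed.

```python
import unittest

try:
    from squirro.sdk import PipeletV1
except ImportError:
    # Stand-in so the sketch runs without the Squirro Toolbox (assumption).
    class PipeletV1(object):
        pass


class ModifyTitlePipelet(PipeletV1):
    def __init__(self, config=None):
        # Explicit constructor (assumption) so the test does not depend on
        # the base class's constructor signature.
        self.config = config or {}

    def consume(self, item):
        item['title'] = item.get('title', '') + ' - Hello, World!'
        return item


class ModifyTitlePipeletTest(unittest.TestCase):
    def setUp(self):
        self.pipelet = ModifyTitlePipelet()

    def test_appends_suffix_to_existing_title(self):
        item = self.pipelet.consume({'id': 'first_item', 'title': 'Sample'})
        self.assertEqual(item['title'], 'Sample - Hello, World!')

    def test_missing_title_defaults_to_empty_string(self):
        item = self.pipelet.consume({'id': 'no_title'})
        self.assertEqual(item['title'], ' - Hello, World!')
```

Saved to a file, the tests can be run with python -m unittest followed by the module name.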

Deploy

Once the pipelet is ready, upload it to the Squirro server. The pipelet upload command achieves that:

pipelet upload --token <your_token> --cluster <cluster> pipelet.py "Hello World"

This will make the pipelet available with the name “Hello World”. To update the pipelet code on the server, you can re-execute this command at any time.
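
Because the upload is repeated whenever the code changes, it can be convenient to wrap the command in a small script. The following sketch only assembles the argument list for the pipelet upload command shown above, using the flags that appear in it. The function name build_upload_command is hypothetical, and the commented-out subprocess call assumes that the pipelet tool is on the PATH.

```python
import subprocess


def build_upload_command(token, cluster, pipelet_path, name):
    """Assemble the `pipelet upload` invocation as an argument list."""
    return [
        'pipelet', 'upload',
        '--token', token,
        '--cluster', cluster,
        pipelet_path,
        name,
    ]


if __name__ == '__main__':
    command = build_upload_command(
        token='<your_token>',
        cluster='<cluster>',
        pipelet_path='pipelet.py',
        name='Hello World',
    )
    print(' '.join(command))
    # To actually upload, uncomment (assumes `pipelet` is on the PATH):
    # subprocess.check_call(command)
```

Passing the arguments as a list, rather than as a single shell string, keeps a name containing spaces such as "Hello World" intact without extra quoting.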

To use the pipelet in the pipeline of a project, you can use the Pipeline Editor UI to add the pipelet as an enrichment step to the selected pipeline workflow.

Using Additional Files with Pipelets

In many cases, additional files like libraries and pre-trained models must be uploaded and used by a pipelet when it is run.

Using the following steps, you can upload files together with the pipelet and access them in the pipelet code:

  1. Specify the additional files in the pipelet upload command using the --data-file flag. For example:

    Pipelet Upload Command

    pipelet upload \
        --data-file 'resource.txt' \
        --cluster <cluster> \
        --token <your_token> \
        'pipelet.py' \
        'TestPipelet'
    

    In this example, we are uploading a file “resource.txt” along with the pipelet.

  2. Access the contents of the file(s) from within the pipelet:

    Pipelet File

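    from squirro.sdk import require, PipeletV1

    @require('files')
    class TestPipelet(PipeletV1):
        def consume(self, item):
            # Read the file that was uploaded with --data-file
            with self.files.get_file('resource.txt') as f:
                data = f.read()
            # Use data as needed, then return the item
            return item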
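
To exercise such a pipelet locally without a Squirro server, the files service can be replaced by a stand-in. The sketch below is hypothetical throughout: FakeFiles mimics only the get_file call used above, the fallback branch defines placeholder require and PipeletV1 objects for when the Squirro Toolbox is not installed, the resource item field is made up for illustration, and assigning pipelet.files by hand is an assumption about how the injected service can be overridden.

```python
import io

try:
    from squirro.sdk import require, PipeletV1
except ImportError:
    # Placeholders so the sketch runs without the Squirro Toolbox (assumption).
    class PipeletV1(object):
        pass

    def require(name):
        def decorator(cls):
            return cls
        return decorator


class FakeFiles(object):
    """Hypothetical stand-in exposing only `get_file`."""

    def __init__(self, contents):
        self._contents = contents  # maps file name -> text

    def get_file(self, name):
        # io.StringIO supports the `with` statement, like a real file object.
        return io.StringIO(self._contents[name])


@require('files')
class TestPipelet(PipeletV1):
    def __init__(self, config=None):
        self.config = config or {}

    def consume(self, item):
        with self.files.get_file('resource.txt') as f:
            data = f.read()
        # Illustrative use of the data: store it in a hypothetical field.
        item['resource'] = data.strip()
        return item


if __name__ == '__main__':
    pipelet = TestPipelet()
    pipelet.files = FakeFiles({'resource.txt': u'hello\n'})
    print(pipelet.consume({'id': 'first_item'}))
```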