Pipelets Reference

PipeletV1

class PipeletV1(config)

Base class for all pipelets.

__init__(config)

The constructor receives the config from the source.

Parameters

config (dict) – Pipelet configuration. Each option specified by getArguments() is included here as a key specified by the option’s name and the user-provided value.
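
As a minimal sketch of how a pipelet might read its configuration, the example below stores the config and looks up one option by name. The PipeletV1 class here is only a stand-in so the snippet runs without the SDK installed, and PrefixPipelet and its prefix option are hypothetical names chosen for illustration.

```python
class PipeletV1:
    """Stand-in for the SDK base class so this sketch runs on its own."""

    def __init__(self, config):
        pass


class PrefixPipelet(PipeletV1):
    """Hypothetical pipelet that reads a single option from its config."""

    def __init__(self, config):
        self.config = config
        # Keys in config match the "name" of each option from getArguments().
        # Falling back to a default here is a defensive assumption.
        self.prefix = config.get("prefix", "Demo - ")


pipelet = PrefixPipelet({"prefix": "News - "})
print(pipelet.prefix)
```

Reading options once in the constructor keeps consume() free of repeated dictionary lookups.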

consume(item)

Called once per item in the pipeline. It should return the modified item or yield as many items as necessary. None is also a valid return value, in which case the item is discarded.

Parameters

item (dict, as per item format) – The item to process.

Returns

Modified item, a generator of items (using yield) or None.

Return type

dict, generator of dict, or None
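
The three return styles described above can be sketched as follows. The PipeletV1 class is a stand-in (storing config on self is an assumption of the stand-in), both pipelet classes are hypothetical, and the title and body fields are used purely for illustration.

```python
class PipeletV1:
    """Stand-in for the SDK base class (assumption, for illustration only)."""

    def __init__(self, config):
        self.config = config


class TitlePrefixPipelet(PipeletV1):
    """Returns the modified item, or None to discard it."""

    def consume(self, item):
        if not item.get("title"):
            return None  # items without a title are discarded
        item["title"] = self.config["prefix"] + item["title"]
        return item


class SplitParagraphsPipelet(PipeletV1):
    """Yields one new item per non-empty paragraph of the body."""

    def consume(self, item):
        for paragraph in item.get("body", "").split("\n\n"):
            if paragraph.strip():
                new_item = dict(item)  # shallow copy, so the input stays unchanged
                new_item["body"] = paragraph.strip()
                yield new_item
```

A single consume() cannot mix the two styles: once the method body contains yield, Python treats it as a generator function, and a plain return value is no longer handed back to the caller directly.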

consume_multiple(items)

The consume_multiple() method differs from consume() by taking multiple items instead of a single item as input, allowing the pipelet to perform work in batches. This can be useful when a backend is invoked over the network, and can thus benefit from batching.

This works only for server-side pipelets that are included in the pipeline. It is not invoked when the pipelet is used by a client-side data loader (pipelets.json). If implemented and used server-side, consume_multiple() takes precedence over consume().

Parameters

items (list of dict, as per item format) – The items to process.

Return type

None

Returns

Nothing should be returned by this method

Example

def consume_multiple(self, items):
    for item in items:
        # Process each item in place here. There is no need to return it.
        pass

List manipulation

This method should modify the items list in place. A few patterns are useful for this:

  • Deleting an item: to delete a specific item use del. Example:

    del items[1]
    
  • Overwriting all items: it can be easier to create a new list, and then replace it. For this, you need to use the slice notation as follows:

    ret = [ my, new, item, list ]
    items[:] = ret
    

    Simply re-assigning items would not work, as that change would be local to the method only and not be returned to the Squirro pipeline.
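
The patterns above can be combined into a small batch-processing sketch. The score_batch() function is a hypothetical placeholder for a backend that handles a whole batch in one call, body_length is an invented field name, and BatchPipelet omits the PipeletV1 base class only so the snippet runs without the SDK.

```python
def score_batch(texts):
    """Hypothetical backend call: one request covers the whole batch."""
    return [len(text) for text in texts]


class BatchPipelet:
    """Sketch only; a real pipelet would subclass PipeletV1."""

    def consume_multiple(self, items):
        scores = score_batch([item.get("body", "") for item in items])
        kept = []
        for item, score in zip(items, scores):
            if score > 0:
                item["body_length"] = score  # invented field, for illustration
                kept.append(item)
        # Slice assignment changes the caller's list object in place.
        items[:] = kept


def rebind_only(items):
    """Counter-example: rebinding the local name leaves the caller's list as is."""
    items = [item for item in items if item.get("body")]


batch = [{"body": "hello"}, {"body": ""}]
BatchPipelet().consume_multiple(batch)
print(batch)
```

Calling rebind_only() on a similar list leaves both entries in place, which is the pitfall the slice notation avoids.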

static getArguments()

Define a custom set of configuration properties, which will be exposed in the UI as a form. The method is expected to return a list of dictionaries, one per configuration property.

See Custom Configuration for detailed information about how this is used.

These properties are used to form the config passed into __init__(). Each dictionary requires the following keys:

  • name - Identifier of the configuration.

  • type - Configuration type: bool, int, string, code or password.

  • display_label - Displayed label in UI.

Additional, optional keys include:

  • default - Default value for configuration.

  • syntax - Syntax for configuration of type code.

  • help - Additional help to display.

  • advanced - Place configuration under advanced section.

Return type

list of dict

Returns

A list of configuration properties

Example

@staticmethod
def getArguments():
    return [
        {
            "name": "prefix",
            "default": "Demo - ",
            "display_label": "Prefix",
            "help": "The title prefix for each item.",
            "type": "string"
        }
    ]
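
As a further sketch, the example below declares options of several types and checks them against the required keys and types listed above. The check_arguments() helper and ExamplePipelet are hypothetical and not part of the SDK. The "json" value for syntax and the boolean value for advanced are assumptions, since the accepted values are not listed here.

```python
REQUIRED_KEYS = {"name", "type", "display_label"}
ALLOWED_TYPES = {"bool", "int", "string", "code", "password"}


class ExamplePipelet:
    """Hypothetical pipelet; only getArguments() is shown."""

    @staticmethod
    def getArguments():
        return [
            {"name": "enabled", "type": "bool",
             "display_label": "Enabled", "default": True},
            {"name": "api_token", "type": "password",
             "display_label": "API token",
             "help": "Token used to call the backend."},
            {"name": "rules", "type": "code", "syntax": "json",  # assumed value
             "display_label": "Rules", "default": "{}",
             "advanced": True},  # assumed to be a boolean flag
        ]


def check_arguments(arguments):
    """Hypothetical helper: returns a list of problems, empty if none."""
    problems = []
    for arg in arguments:
        missing = REQUIRED_KEYS - arg.keys()
        if missing:
            problems.append(f"{arg.get('name', '?')}: missing {sorted(missing)}")
        elif arg["type"] not in ALLOWED_TYPES:
            problems.append(f"{arg['name']}: unknown type {arg['type']!r}")
    return problems


print(check_arguments(ExamplePipelet.getArguments()))
```

A check like this could run in a unit test so that a misspelled key is caught before the pipelet is uploaded.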

require

require(thing, **kwargs)

Decorator to manage dependency injection.

Parameters

thing (str) –

The dependency to inject. Valid values are 'cache', 'requests', 'files', and 'log'.

'cache'

Sets self.cache on the pipelet instance. This is currently simply a dict but will provide persistent caching across cluster nodes in the future.

'requests'

Makes the requests library available to the pipelet instance as self.requests.

'files'

Provides access to the pipelet’s data files by making a FileReader instance available to the pipelet instance as self.files.

'log'

Makes a logging.Logger instance from Python’s standard logging framework available to the pipelet instance as self.log.
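
The sketch below illustrates how injected dependencies might be used from a pipelet. So that it runs without the SDK, it defines simplified stand-ins for PipeletV1 and require that cover only 'cache' and 'log'. With the SDK installed, the real decorator from squirro.sdk would presumably be applied the same way, as a class decorator. DedupPipelet and its use of the link field are hypothetical.

```python
import logging


class PipeletV1:
    """Stand-in for the SDK base class (assumption, for illustration only)."""

    def __init__(self, config):
        self.config = config


def require(thing, **kwargs):
    """Simplified stand-in for the decorator; covers 'cache' and 'log' only."""
    factories = {
        "cache": dict,  # the text above describes the cache as a plain dict
        "log": lambda: logging.getLogger("pipelet"),
    }

    def decorate(cls):
        original_init = cls.__init__

        def patched_init(self, *args, **init_kwargs):
            # Attach the dependency before the pipelet's own constructor runs.
            setattr(self, thing, factories[thing]())
            original_init(self, *args, **init_kwargs)

        cls.__init__ = patched_init
        return cls

    return decorate


@require("cache")
@require("log")
class DedupPipelet(PipeletV1):
    """Hypothetical pipelet that drops items whose link was already seen."""

    def consume(self, item):
        link = item.get("link")
        if link in self.cache:
            self.log.debug("Skipping duplicate %s", link)
            return None
        self.cache[link] = True
        return item


pipelet = DedupPipelet({})
first = pipelet.consume({"link": "https://example.com/a"})
second = pipelet.consume({"link": "https://example.com/a"})
```

Because the cache is described as a plain dict for now, a deduplication like this would only hold within a single pipelet instance.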

FileReader

class FileReader(root_folder)

Injected component that provides access to data files.

This class is used by squirro.sdk.require() when a files dependency is requested. It provides access to files and will work independently of which storage backend is used.

get_file(fname, mode='r')

Returns a file object.

Parameters

  • fname – Path to file.

  • mode – File access mode. Allowed modes are r and rb.

exists(fname)

Checks if provided file path exists.

Parameters

fname – Path to file.
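
A typical use of these two methods is to check for a data file before reading it. The sketch below uses a local-disk stand-in for FileReader so that it runs without the SDK. The real component may be backed by a different storage backend, and rejecting other modes with ValueError is an assumption. The load_stopwords() helper and the stopwords.txt file name are hypothetical.

```python
import os
import tempfile


class FileReader:
    """Local-disk stand-in for the injected component (assumption)."""

    def __init__(self, root_folder):
        self.root_folder = root_folder

    def exists(self, fname):
        return os.path.exists(os.path.join(self.root_folder, fname))

    def get_file(self, fname, mode="r"):
        if mode not in ("r", "rb"):
            raise ValueError("Allowed modes are 'r' and 'rb'")
        return open(os.path.join(self.root_folder, fname), mode)


def load_stopwords(files, fname="stopwords.txt"):
    """Hypothetical helper: one word per line, empty set if the file is missing."""
    if not files.exists(fname):
        return set()
    with files.get_file(fname) as handle:
        return {line.strip() for line in handle if line.strip()}


# Create a temporary data folder to stand in for the pipelet's data files.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "stopwords.txt"), "w") as handle:
        handle.write("the\nand\n")
    files = FileReader(root)
    stopwords = load_stopwords(files)
    missing = load_stopwords(files, "missing.txt")
```

Inside a pipelet that requests the 'files' dependency, the same helper would be called with self.files.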