Pipelets Reference

PipeletV1

class PipeletV1(config)

Base class for all pipelets.

__init__(config)

The constructor receives the config from the source.

Parameters:

config (dict) – Pipelet configuration. Each option declared by getArguments() appears here as a key named after the option, mapped to the user-provided value.
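As a minimal sketch of how the config might be read, the pipelet below stores one option in its constructor. The PipeletV1 class defined here is a local stand-in so that the snippet runs without the SDK, and PrefixPipelet and its fallback default are illustrative assumptions.

```python
class PipeletV1:
    """Local stand-in for the SDK base class (assumption, for a runnable sketch)."""

    def __init__(self, config):
        self.config = config


class PrefixPipelet(PipeletV1):
    """Hypothetical pipelet that reads one option from its config."""

    def __init__(self, config):
        super().__init__(config)
        # Keys in config are the option names declared by getArguments().
        self.prefix = config.get("prefix", "Demo - ")


pipelet = PrefixPipelet({"prefix": "News - "})
print(pipelet.prefix)
```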

consume(item)

Called once per item in the pipeline. It should return the modified item, or yield as many items as necessary. None is also a valid return value, in which case the item is discarded.

Parameters:

item (dict, as per item format) – The item to process.

Returns:

Modified item, a generator of items (using yield) or None.

Return type:

dict, generator of dict, or None
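The three documented outcomes (a modified item, several yielded items, or None) could look roughly like the sketch below. The PipeletV1 class is a local stand-in so the code runs without the SDK, and both pipelet classes and the title and body field names are assumptions chosen for illustration.

```python
class PipeletV1:
    """Local stand-in for the SDK base class (assumption, for a runnable sketch)."""

    def __init__(self, config):
        self.config = config


class DropUntitledPipelet(PipeletV1):
    """Hypothetical pipelet: returns the modified item, or None to discard it."""

    def consume(self, item):
        title = item.get("title", "").strip()
        if not title:
            return None  # the item is discarded
        item["title"] = title
        return item


class SplitLinesPipelet(PipeletV1):
    """Hypothetical pipelet: yields one item per non-empty line of the body."""

    def consume(self, item):
        for line in item.get("body", "").splitlines():
            if line.strip():
                # Copy the item so each yielded item is independent.
                yield dict(item, body=line.strip())


kept = DropUntitledPipelet({}).consume({"title": "  Hello  "})
dropped = DropUntitledPipelet({}).consume({"title": "   "})
parts = list(SplitLinesPipelet({}).consume({"title": "T", "body": "a\n\nb"}))
```

Note that a single consume() cannot mix the two styles: once the method body contains yield, it is a generator, so discarding an item means yielding nothing.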

consume_multiple(items)

The consume_multiple() method differs from consume() by taking a list of items instead of a single item as input, which allows the pipelet to work in batches. This is useful when a backend is invoked over the network and can benefit from batching.

This works only for server-side pipelets included in the pipeline. It is not invoked when the pipelet is used by a client-side data loader (pipelets.json). If implemented and used server-side, consume_multiple() takes precedence over consume().

Parameters:

items (list of dict, as per item format) – List of items to process.

Return type:

None

Returns:

Nothing should be returned by this method

Example:

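def consume_multiple(self, items):
    for item in items:
        # Process each item in place. No need to return it.
        # Illustrative change only: trim whitespace from the title.
        item["title"] = item.get("title", "").strip()

List manipulation: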

This method should modify the items directly. For this a few patterns may be useful:

  • Deleting an item: to delete a specific item use del. Example:

    del items[1]
    
  • Overwriting all items: it can be easier to create a new list, and then replace it. For this, you need to use the slice notation as follows:

    ret = [ my, new, item, list ]
    items[:] = ret
    

    Simply re-assigning items (for example, items = ret) would not work: it only rebinds the local name inside the method, so the change is never seen by the Squirro pipeline.
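Putting batching and in-place list replacement together, a consume_multiple() implementation might look like the following sketch. PipeletV1 is a local stand-in so the code runs without the SDK; score_batch, TitleLengthPipelet and the title_length key are hypothetical names, with score_batch standing in for a backend that would be called once per batch over the network.

```python
def score_batch(texts):
    """Hypothetical backend call; a real one might send a single network request."""
    return [len(text) for text in texts]


class PipeletV1:
    """Local stand-in for the SDK base class (assumption, for a runnable sketch)."""

    def __init__(self, config):
        self.config = config


class TitleLengthPipelet(PipeletV1):
    """Hypothetical pipelet that annotates and filters a batch in place."""

    def consume_multiple(self, items):
        # One backend call for the whole batch instead of one call per item.
        scores = score_batch([item.get("title", "") for item in items])
        for item, score in zip(items, scores):
            item["title_length"] = score
        # Slice assignment mutates the caller's list. A plain
        # "items = [...]" would only rebind the local name.
        items[:] = [item for item in items if item["title_length"] > 0]


batch = [{"title": "abc"}, {"title": ""}]
result = TitleLengthPipelet({}).consume_multiple(batch)
print(batch)
```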

static getArguments()

Define a custom set of configuration properties, which will be exposed in the UI as a form. The method is expected to return a list of dictionaries, each defining one configuration property.

See Custom Configuration for detailed information about how this is used.

These properties are used to form the config passed into __init__(). Each dictionary must define the following required properties:

  • name - Identifier of the configuration.

  • type - Configuration type: bool, int, string, code or password.

  • display_label - Displayed label in UI.

Additional, optional properties include:

  • default - Default value for configuration.

  • syntax - Syntax for configuration of type code.

  • help - Additional help to display.

  • advanced - Place configuration under advanced section.

Return type:

list of dict

Returns:

A list of configuration properties

Example:

@staticmethod
def getArguments():
    return [
        {
            "name": "prefix",
            "default": "Demo - ",
            "display_label": "Prefix",
            "help": "The title prefix for each item.",
            "type": "string"
        }
    ]
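To catch mistakes in a getArguments() definition early, the required keys and allowed types listed above can be checked with a small helper. This is only a sketch: check_arguments is a hypothetical function, not part of the SDK, and the sample definitions (including the deliberately broken third entry) are made up for illustration.

```python
REQUIRED_KEYS = {"name", "type", "display_label"}
ALLOWED_TYPES = {"bool", "int", "string", "code", "password"}


def check_arguments(arguments):
    """Hypothetical helper: return a list of problems found in the definitions."""
    problems = []
    for index, argument in enumerate(arguments):
        missing = REQUIRED_KEYS - argument.keys()
        if missing:
            problems.append(f"argument {index}: missing {sorted(missing)}")
        if "type" in argument and argument["type"] not in ALLOWED_TYPES:
            problems.append(f"argument {index}: unknown type {argument['type']!r}")
    return problems


arguments = [
    {"name": "prefix", "type": "string", "display_label": "Prefix", "default": "Demo - "},
    {"name": "api_key", "type": "password", "display_label": "API key"},
    {"name": "limit", "type": "number"},  # broken on purpose
]
problems = check_arguments(arguments)
for problem in problems:
    print(problem)
```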

require

require(thing, **kwargs)

Decorator to manage dependency injection.

Parameters:

thing (str) – Name of the dependency to inject. Valid values are 'cache', 'requests', 'files', and 'log'.

'cache'

Sets self.cache on the pipelet instance. This is currently simply a dict but will provide persistent caching across cluster nodes in the future.

'requests'

Makes the requests library available to the pipelet instance as self.requests.

'files'

Provides access to the pipelet’s data files by making an instance of FileReader available to the pipelet instance as self.files.

'log'

Makes a logging.Logger instance from Python’s standard logging framework available to the pipelet instance as self.log. It also creates a structured logger from the structlog library as self.slog, and a dual logger as self.dlog, which emits each message both as an unstructured and as a structured log.
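The sketch below illustrates how injected dependencies might be used from consume(). To keep it runnable without the SDK, it defines a simplified stand-in for require() that only handles 'cache' and 'log' and sets the documented attributes before the constructor runs; the real decorator's internals are not shown in this reference and may differ. PipeletV1 is likewise a stand-in, and CachedLengthPipelet is a hypothetical example class.

```python
import logging


def require(thing, **kwargs):
    """Simplified stand-in for the SDK decorator (assumption, not the SDK code)."""

    def decorator(cls):
        original_init = cls.__init__

        def __init__(self, *args, **init_kwargs):
            if thing == "cache":
                self.cache = {}  # documented as currently a plain dict
            elif thing == "log":
                self.log = logging.getLogger(cls.__name__)
            original_init(self, *args, **init_kwargs)

        cls.__init__ = __init__
        return cls

    return decorator


class PipeletV1:
    """Local stand-in for the SDK base class (assumption)."""

    def __init__(self, config):
        self.config = config


@require("cache")
@require("log")
class CachedLengthPipelet(PipeletV1):
    """Hypothetical pipelet that caches a per-title computation."""

    def consume(self, item):
        title = item.get("title", "")
        if title not in self.cache:
            self.log.debug("Cache miss for %r", title)
            self.cache[title] = len(title)
        item["title_length"] = self.cache[title]
        return item


pipelet = CachedLengthPipelet({})
first = pipelet.consume({"title": "abc"})
second = pipelet.consume({"title": "abc"})
```

In a real pipelet, the decorator and base class would come from the SDK rather than being defined locally.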

FileReader

class FileReader(root_folder)

Injected component that provides access to data files.

This class is used by squirro.sdk.require() when a files dependency is requested. It provides access to files and will work independently of which storage backend is used.

get_file(fname, mode='r')

Returns a file object.

Parameters:
  • fname – Path to file.

  • mode – File access mode. Allowed modes are r and rb.

exists(fname)

Checks whether the provided file path exists.

Parameters:

fname – Path to file.
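A pipelet that requests the files dependency could combine exists() and get_file() as sketched below. The FileReader class here is a local-disk stand-in that mirrors only the documented signatures so the example runs on its own; the real class is storage-backend independent. The load_stopwords helper and the stopwords.txt file name are hypothetical, and the sketch assumes the returned file object can be used as a context manager.

```python
import os
import tempfile


class FileReader:
    """Local-disk stand-in mirroring the documented interface (assumption)."""

    def __init__(self, root_folder):
        self.root_folder = root_folder

    def get_file(self, fname, mode="r"):
        if mode not in ("r", "rb"):
            raise ValueError("Allowed modes are 'r' and 'rb'")
        return open(os.path.join(self.root_folder, fname), mode)

    def exists(self, fname):
        return os.path.exists(os.path.join(self.root_folder, fname))


def load_stopwords(files, fname="stopwords.txt"):
    """Hypothetical helper that a pipelet could call with self.files."""
    if not files.exists(fname):
        return set()
    with files.get_file(fname) as handle:
        return {line.strip() for line in handle if line.strip()}


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "stopwords.txt"), "w") as handle:
        handle.write("the\nand\n\n")
    files = FileReader(root)
    stopwords = load_stopwords(files)
    missing = load_stopwords(files, "absent.txt")
```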