Pipelets Reference
PipeletV1
- class PipeletV1(config)
Base class for all Pipelets.
- __init__(config)
The constructor receives the config from the source.
- Parameters:
config (dict) – Pipelet configuration. Each option declared by getArguments() is included here as a key named after the option, mapped to the user-provided value.
- consume(item)
Called once per item in the pipeline. It should return the modified item, or yield as many items as necessary. None is also a valid return value, in which case the item is discarded.
- Parameters:
item (dict, as per item format) – The item to process.
- Returns:
Modified item, a generator of items (using yield) or None.
- Return type:
dict, generator of dict, or None
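The three return styles described above can be sketched as follows. This is a minimal illustration rather than SDK code: the PipeletV1 class below is a stand-in so the snippet runs without Squirro installed, the pipelet class names and the run() helper are hypothetical, and the "title" key is assumed to exist in the item format.

```python
import inspect


class PipeletV1:
    """Stand-in base class (assumption) so the sketch runs without the SDK."""

    def __init__(self, config):
        self.config = config


class UppercaseTitle(PipeletV1):
    def consume(self, item):
        # Style 1: return the modified item.
        item["title"] = item.get("title", "").upper()
        return item


class DropUntitled(PipeletV1):
    def consume(self, item):
        # Style 2: returning None discards the item.
        if not item.get("title"):
            return None
        return item


class SplitOnSemicolon(PipeletV1):
    def consume(self, item):
        # Style 3: yield as many items as necessary.
        for part in item.get("title", "").split(";"):
            yield dict(item, title=part.strip())


def run(pipelet, items):
    """Hypothetical driver that normalises the three return styles to a list."""
    out = []
    for item in items:
        result = pipelet.consume(item)
        if result is None:
            continue
        if inspect.isgenerator(result):
            out.extend(result)
        else:
            out.append(result)
    return out
```

The run() helper only exists to make the behaviour observable in the sketch; in a real deployment the pipeline itself calls consume().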
- consume_multiple(items)
The consume_multiple() method differs from consume() by taking multiple items instead of a single item as input, allowing the pipelet to perform work in batches. This is useful when a backend is invoked over the network and can therefore benefit from batching.
This works only for server-side pipelets included in the pipeline. It is not invoked when the pipelet is used by a client-side data loader (pipelets.json). If implemented and used server-side, consume_multiple() takes precedence over consume().
- Parameters:
items (list of item format dict) – List of items.
- Return type:
None
- Returns:
Nothing should be returned by this method.
- Example:
    def consume_multiple(self, items):
        for item in items:
            # Process item. No need to return item.
            pass
- List manipulation:
This method should modify items directly. A few patterns may be useful for this:
Deleting an item: to delete a specific item, use del. Example:
    del items[1]
Overwriting all items: it can be easier to create a new list and then replace the existing contents with it. To do this, use slice notation as follows:
    ret = [ my, new, item, list ]
    items[:] = ret
Simply re-assigning items would not work, as that change would be local to the method and would not be returned to the Squirro pipeline.
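The batching and in-place patterns above can be combined as in the following sketch. It makes several assumptions: fake_backend_lookup() is a hypothetical placeholder for a networked backend, the "title_length" key is invented for illustration, and the class omits the PipeletV1 base class so the snippet runs without the SDK.

```python
def fake_backend_lookup(titles):
    """Hypothetical batched backend call: one request for many titles."""
    return [len(title) for title in titles]


class BatchPipelet:
    """Sketch only; a real pipelet would subclass PipeletV1."""

    def consume_multiple(self, items):
        # One backend round trip for the whole batch.
        lengths = fake_backend_lookup([i.get("title", "") for i in items])
        for item, length in zip(items, lengths):
            item["title_length"] = length  # mutate in place, nothing returned
        # Slice assignment replaces the contents of the caller's list,
        # here dropping items whose title is empty.
        items[:] = [i for i in items if i["title_length"] > 0]


def broken_filter(items):
    # Rebinds the local name only; the caller's list is left unchanged.
    items = [i for i in items if i.get("title")]
```

Calling broken_filter() on a list leaves it untouched, which is the pitfall the slice notation avoids.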
- static getArguments()
Define a custom set of configuration properties, which are exposed in the UI as a form. The method is expected to return a list of dictionaries, each defining one configuration option.
See Custom Configuration for detailed information about how this is used.
These properties are used to form the config passed into __init__().
The required properties are:
name - Identifier of the configuration.
type - Configuration type: bool, int, string, code or password.
display_label - Displayed label in UI.
Additional, non-required properties include:
default - Default value for configuration.
syntax - Syntax for configuration of type code.
help - Additional help to display.
advanced - Place configuration under advanced section.
- Return type:
list of dict
- Returns:
A list of configuration properties.
- Example:
    @staticmethod
    def getArguments():
        return [
            {
                "name": "prefix",
                "default": "Demo - ",
                "display_label": "Prefix",
                "help": "The title prefix for each item.",
                "type": "string"
            }
        ]
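To illustrate how the declared options relate to the config received by __init__(), here is a hedged sketch. The build_config() helper is hypothetical: this reference does not specify how Squirro applies defaults, so the merge order shown (defaults first, then user values) is an assumption. The class also omits the PipeletV1 base class so the snippet runs on its own.

```python
class PrefixPipelet:
    """Sketch only; a real pipelet would subclass PipeletV1."""

    def __init__(self, config):
        # The key matches the "name" declared in getArguments().
        self.prefix = config.get("prefix", "")

    @staticmethod
    def getArguments():
        return [
            {
                "name": "prefix",
                "default": "Demo - ",
                "display_label": "Prefix",
                "help": "The title prefix for each item.",
                "type": "string",
            }
        ]

    def consume(self, item):
        item["title"] = self.prefix + item.get("title", "")
        return item


def build_config(arguments, user_values):
    """Hypothetical helper: start from declared defaults, then apply user values."""
    config = {a["name"]: a["default"] for a in arguments if "default" in a}
    config.update(user_values)
    return config
```

Reading options with config.get() and a fallback keeps the pipelet usable even if an option is absent from the config.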
require
- require(thing, **kwargs)
Decorator to manage dependency injection.
- Parameters:
thing (str) – Dependency to inject. Valid values are 'cache', 'requests', 'files', 'log'.
'cache'
Sets self.cache on the pipelet instance. This is currently simply a dict but will provide persistent caching across cluster nodes in the future.
'requests'
Makes the requests library available as self.requests to the pipelet instance.
'files'
Provides access to the pipelet’s data files by instantiating a FileReader and making it available to the pipelet instance as self.files.
'log'
Makes a logging.Logger instance from Python’s standard logging framework available to the pipelet instance as self.log. Also creates a structured logger from the structlog library as self.slog, and a dual logger as self.dlog, which emits each message as both an unstructured and a structured log.
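The following sketch shows what using injected dependencies might look like. Because the SDK cannot be assumed to be installed, the require() function below is a simplified stand-in written for this example, not the implementation in squirro.sdk; it only covers 'cache' and 'log', and its use as a stackable class decorator is an assumption. The CachedLength class and the "title_length" key are hypothetical.

```python
import logging


def require(thing, **kwargs):
    """Simplified stand-in for squirro.sdk.require (assumption, not SDK code)."""

    def decorator(cls):
        original_init = cls.__init__

        def __init__(self, *args, **init_kwargs):
            # Attach the requested dependency before the pipelet's own __init__.
            if thing == "cache":
                self.cache = {}
            elif thing == "log":
                self.log = logging.getLogger(cls.__name__)
            original_init(self, *args, **init_kwargs)

        cls.__init__ = __init__
        return cls

    return decorator


@require("cache")
@require("log")
class CachedLength:
    """Sketch only; a real pipelet would subclass PipeletV1."""

    def __init__(self, config):
        self.config = config

    def consume(self, item):
        title = item.get("title", "")
        if title not in self.cache:
            self.log.debug("cache miss for %r", title)
            self.cache[title] = len(title)
        item["title_length"] = self.cache[title]
        return item
```

With the real SDK, the stand-in would be replaced by importing require from squirro.sdk, and the pipelet code would use self.cache and self.log in the same way.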
FileReader
- class FileReader(root_folder)
Injected component that provides access to data files.
This class is used by squirro.sdk.require() when a files dependency is requested. It provides access to files and works independently of which storage backend is used.
- get_file(fname, mode='r')
Returns a file object.
- Parameters:
fname – Path to file.
mode – File access mode. Allowed modes are r and rb.
- exists(fname)
Checks whether the provided file path exists.
- Parameters:
fname – Path to file.
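A typical use of the two methods is to check for a data file before reading it. The sketch below assumes a local-filesystem stand-in for FileReader so that it runs without the SDK; the real class is storage-agnostic, and whether it rejects other modes with an exception is not stated in this reference. The load_stopwords() helper and the stopwords.txt file name are hypothetical.

```python
import os
import tempfile


class FileReader:
    """Local-filesystem stand-in (assumption); the real class hides the backend."""

    def __init__(self, root_folder):
        self.root_folder = root_folder

    def get_file(self, fname, mode="r"):
        # Only the two documented read modes are accepted in this sketch.
        if mode not in ("r", "rb"):
            raise ValueError("Allowed modes are 'r' and 'rb'")
        return open(os.path.join(self.root_folder, fname), mode)

    def exists(self, fname):
        return os.path.exists(os.path.join(self.root_folder, fname))


def load_stopwords(files, fname="stopwords.txt"):
    """Read one word per line, or return an empty set if the file is missing."""
    if not files.exists(fname):
        return set()
    with files.get_file(fname) as handle:
        return {line.strip() for line in handle if line.strip()}
```

Inside a pipelet that requests the files dependency, the same helper would be called with self.files as its first argument.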