Filter
class Filter(config)
Bases: BatchedStep
The Filter steps take the stream of Document objects and either filter out specific entries based on the criteria of the step, or perform reduction/enlargement operations on fields and Document objects.
Parameters:
step (str) – filter
mark_as_skipped (bool, False) – Keep track of rejected items by marking them as skipped. By default, documents that are filtered out are discarded completely. When this property is set to True, the document stays in the pipeline but is skipped (ignored) by most steps. Only steps that have a handle_skipped setting can be configured to process skipped documents.
Methods Summary
reject_doc(doc) – Helper to reject documents.
Methods Documentation
reject_doc(doc)
Helper to reject documents.
If mark_as_skipped is True, then the document is returned with skipped set to True. Otherwise nothing is returned.
Implementations call this helper from inside process_doc for any document that is to be filtered out, using the following line:
return self.reject_doc(doc)
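The contract described above can be illustrated with a minimal, self-contained sketch. It does not use the real library: Document and SketchFilter are hypothetical stand-ins that only mimic the documented behaviour of mark_as_skipped and reject_doc, and MinLengthFilter, its min_length setting, the "body" field and the run helper are invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    """Hypothetical stand-in for the pipeline's Document type."""
    fields: dict = field(default_factory=dict)
    skipped: bool = False


class SketchFilter:
    """Hypothetical stand-in mimicking the documented reject_doc contract."""

    def __init__(self, config: dict):
        # Documented default: rejected documents are discarded (False).
        self.mark_as_skipped = config.get("mark_as_skipped", False)

    def reject_doc(self, doc: Document) -> Optional[Document]:
        if self.mark_as_skipped:
            # Keep the document in the stream, but flag it as skipped.
            doc.skipped = True
            return doc
        # Otherwise nothing is returned and the document is dropped.
        return None


class MinLengthFilter(SketchFilter):
    """Illustrative filter: rejects documents whose body is too short."""

    def __init__(self, config: dict):
        super().__init__(config)
        # min_length is an assumed setting, not part of the real API.
        self.min_length = config.get("min_length", 10)

    def process_doc(self, doc: Document) -> Optional[Document]:
        body = doc.fields.get("body", "")
        if len(body) < self.min_length:
            return self.reject_doc(doc)
        return doc


def run(step, docs):
    """Apply a step to each document, dropping those it returns nothing for."""
    results = (step.process_doc(d) for d in docs)
    return [d for d in results if d is not None]


def sample_docs():
    return [
        Document({"body": "short"}),
        Document({"body": "a long enough body"}),
    ]


discarded = run(MinLengthFilter({"step": "filter"}), sample_docs())
marked = run(
    MinLengthFilter({"step": "filter", "mark_as_skipped": True}),
    sample_docs(),
)
```

With the default settings the short document disappears from the stream, so discarded holds one document. With mark_as_skipped set to True both documents remain in marked, and only the short one has skipped set to True.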