WhitelistFilter

class WhitelistFilter(config)

Bases: Filter

The whitelist Filter removes a Document, or marks it to be skipped, based on a whitelist.

Input - all input fields must be of type str or list[str].

Output - No specific output format, because the whole Document is either removed or marked to be skipped.

Parameters:
  • type (str) – whitelist

  • replacements (dict, {}) – Dictionary of replacement values for non-whitelisted field values

  • whitelists (dict) – Dictionary of fields to which the whitelisting is applied, each mapped to a list of whitelisted values

  • is_unique (bool, True) – Whether values are unique or not

Example

{
    "step": "filter",
    "type": "whitelist",
    "whitelists":{"body":["the","body","whitelists"]}
}
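
The check that this configuration describes can be illustrated with a minimal Python sketch. This is not the library's implementation: the function name passes_whitelist, the plain-dict representation of a Document, and the decision to ignore fields missing from the document are all assumptions made for illustration. It also does not model the replacements or is_unique parameters.

```python
from typing import Dict, List, Union

# Input fields are either a single string or a list of strings.
FieldValue = Union[str, List[str]]


def passes_whitelist(
    doc: Dict[str, FieldValue],
    whitelists: Dict[str, List[str]],
) -> bool:
    """Return True if every whitelisted field in doc holds only allowed values.

    Hypothetical helper: the real filter may remove or mark documents
    under different rules.
    """
    for field, allowed in whitelists.items():
        if field not in doc:
            # Assumption: a field absent from the document does not fail the check.
            continue
        value = doc[field]
        # Normalise str and list[str] inputs to a list of strings.
        values = [value] if isinstance(value, str) else list(value)
        allowed_set = set(allowed)
        if any(v not in allowed_set for v in values):
            return False
    return True


whitelists = {"body": ["the", "body", "whitelists"]}
kept = passes_whitelist({"body": ["the", "body"]}, whitelists)
dropped = passes_whitelist({"body": "other"}, whitelists)
```

In this sketch, a document for which the function returns False would be the one removed or marked to be skipped.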

Methods Summary

process_doc(doc)

Process a document

Methods Documentation

process_doc(doc)

Process a document

Parameters:

doc (Document) – Document

Returns:

Processed document

Return type:

Document