BlacklistFilter

class BlacklistFilter(config)

Bases: Filter

The blacklist filter removes a Document, or marks it to be skipped, based on a blacklist.

Input - all input fields must be of type str or list[str].

Output - no specific output format, because the whole Document is either removed or marked to be skipped.

Parameters:
  • type (str) – blacklist

  • blacklists (dict) – Dictionary that maps each field to be checked to its list of blacklisted values

Example

{
    "step": "filter",
    "type": "blacklist",
    "blacklists":{"body":["the","body","blacklist"]}
}
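
The configuration above can be read as "drop any Document whose body field holds one of the listed values". The following is a minimal sketch of that idea, not the library's implementation. It assumes a document is a plain dict, that a value matches only when it equals a blacklisted value exactly, and that removal is represented by returning None. The names is_blacklisted and apply_blacklist are hypothetical, and the real BlacklistFilter may match or mark documents differently.

```python
from typing import Dict, List, Optional, Union

# Input fields are either a single string or a list of strings.
FieldValue = Union[str, List[str]]


def is_blacklisted(doc: Dict[str, FieldValue],
                   blacklists: Dict[str, List[str]]) -> bool:
    """Return True if any configured field holds a blacklisted value.

    Assumption: exact string equality decides a match.
    """
    for field, banned in blacklists.items():
        value = doc.get(field)
        if value is None:
            continue  # field absent from this document: nothing to check
        # Normalise str and list[str] inputs to a list of strings.
        values = [value] if isinstance(value, str) else list(value)
        if any(v in banned for v in values):
            return True
    return False


def apply_blacklist(doc: Dict[str, FieldValue],
                    blacklists: Dict[str, List[str]]) -> Optional[dict]:
    """Hypothetical stand-in for process_doc: None means 'removed'."""
    return None if is_blacklisted(doc, blacklists) else doc


config = {
    "step": "filter",
    "type": "blacklist",
    "blacklists": {"body": ["the", "body", "blacklist"]},
}

kept = apply_blacklist({"body": "an ordinary sentence"}, config["blacklists"])
dropped = apply_blacklist({"body": ["fine", "blacklist"]}, config["blacklists"])
```

Under these assumptions, kept is the unchanged document and dropped is None. A variant that marks documents instead of removing them could set a flag on the document in place of returning None.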

Methods Summary

process_doc(doc)

Process a document

Methods Documentation

process_doc(doc)

Process a document

Parameters:

doc (Document) – Document to process

Returns:

Processed document

Return type:

Document