RegexFilter
- class RegexFilter(config)
Bases: Filter
The regex Filter filters documents based on a supplied list of blacklist and whitelist regexes.
Input - all input fields need to be of type str.
Output - (optional) the output field is filled with data of type str.
- Parameters
type (str) – regex
blacklist_regexes (list, []) – List of blacklist regexes to apply
fields (list) – Fields to apply regexes
output_field (str, None) – Field to record if regex matches
matching_label (str, 'match') – Label given if regex matches
non_matching_label (str, 'no_match') – Label given if regex does not match
whitelist_regexes (list, []) – List of whitelist regexes to apply
rule_field (str, None) – Field to record the rule which triggered the match (mainly used in the context of proximity filters)
no_rule_matched_label (str, 'NO_RULE_MATCHED') – Rule given if regex does not match (mainly used in the context of proximity filters)
default_language (str, 'en') – Default language if language_field is not present.
language_field (str, 'language') – Document field that gives the language.
Example
{
    "step": "filter",
    "type": "regex",
    "fields": ["body"],
    "mark_as_skipped": true,
    "whitelist_regexes": ["^.{20,}$"]
}
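The following is a minimal sketch of how a configuration like the one above might be evaluated against a document. It is not the library's implementation: the function name `label_doc` is hypothetical, and the matching rule (a document matches when some whitelist regex hits and no blacklist regex hits, with an empty whitelist accepting everything) is an assumption made for illustration. The parameter names and default labels are taken from the parameter list.

```python
import re


def label_doc(doc, fields, whitelist_regexes=(), blacklist_regexes=(),
              output_field=None, matching_label="match",
              non_matching_label="no_match", flags=0):
    """Hypothetical stand-in for the filter decision; not the library's code."""
    # Only string-valued fields are inspected (the docs require type str).
    texts = [doc[f] for f in fields if isinstance(doc.get(f), str)]

    whitelisted = any(re.search(p, t, flags)
                      for p in whitelist_regexes for t in texts)
    blacklisted = any(re.search(p, t, flags)
                      for p in blacklist_regexes for t in texts)

    # Assumption: a blacklist hit overrides a whitelist hit, and an empty
    # whitelist accepts every document.
    matched = (whitelisted or not whitelist_regexes) and not blacklisted

    # Record the label only when an output field is configured.
    if output_field is not None:
        doc[output_field] = matching_label if matched else non_matching_label
    return matched


short_doc = {"body": "too short"}
long_doc = {"body": "This body is longer than twenty characters."}

short_result = label_doc(short_doc, ["body"],
                         whitelist_regexes=[r"^.{20,}$"],
                         output_field="regex_result")
long_result = label_doc(long_doc, ["body"],
                        whitelist_regexes=[r"^.{20,}$"],
                        output_field="regex_result")
```

Under these assumptions, `short_doc` is labelled `no_match` and `long_doc` is labelled `match`. The field name `regex_result` is only an example value for `output_field`.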
Attributes Summary
Methods Summary
process_doc(doc) – Process a document
Attributes Documentation
- REG_FLAGS = 0
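If REG_FLAGS is passed to Python's `re` module as the compile flags (an assumption; the page does not say how the attribute is used), a value of 0 means no `re.DOTALL`, `re.MULTILINE`, or `re.IGNORECASE`. The sketch below shows what that implies for the whitelist pattern `^.{20,}$` from the example: `.` does not match a newline, so the pattern only matches text whose first line is at least 20 characters long and which has no further lines.

```python
import re

REG_FLAGS = 0  # value documented above; its use as re flags is an assumption
pattern = re.compile(r"^.{20,}$", REG_FLAGS)

single_line = "A body with more than twenty characters"
multi_line = "short line\nanother short line"

# With flags=0, '^' anchors only at the start of the string and '.' stops
# at a newline, so the two short lines cannot be combined into one match.
matches_single = pattern.search(single_line) is not None
matches_multi = pattern.search(multi_line) is not None

# For comparison: with re.DOTALL, '.' also matches '\n', so the whole
# multi-line string counts toward the 20-character minimum.
matches_multi_dotall = re.search(r"^.{20,}$", multi_line, re.DOTALL) is not None

# flags=0 is also case-sensitive.
matches_case = re.search(r"spam", "SPAM", REG_FLAGS) is not None
```

A subclass that needs different behaviour could presumably override REG_FLAGS, but that is not confirmed by this page.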
Methods Documentation