ProximityFilter

class ProximityFilter(config)

Bases: RegexFilter

The proximity RegexFilter filters documents by checking whether the given terms occur within a specified distance of each other.

Note
  • The format for proximity rules is based on the Squirro Phrase Search syntax (e.g. "issue shares~6"). To make the search one-directional, append "|" (e.g. "issue shares~6|"). A rule may contain more than two terms, in which case the distance applies between consecutive terms, or it may consist of a single exact-match term without a proximity distance.

  • Expressions are case-insensitive.

  • There is a maximum of 20 words per rule, to limit the complexity of the generated regex.
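
To make the rule format above more concrete, the following is a minimal sketch of how such a rule could be turned into a regular expression. It is not the library's implementation: the function name rule_to_regex, the constant MAX_WORDS, and the reading of the distance as "at most N intervening words" are all assumptions made for illustration, as is the treatment of a rule without "|" as matching the terms in either order.

```python
import re

MAX_WORDS = 20  # limit mentioned in the note; the constant name is hypothetical


def rule_to_regex(rule: str) -> re.Pattern:
    """Hypothetical translation of a proximity rule into a compiled regex."""
    # A trailing "|" is assumed to mean "match in the written order only".
    one_directional = rule.endswith("|")
    if one_directional:
        rule = rule[:-1]

    # Split "issue shares~6" into the phrase and the optional distance.
    phrase, sep, distance = rule.partition("~")
    terms = [re.escape(t) for t in phrase.split()]
    if not terms or len(terms) > MAX_WORDS:
        raise ValueError(f"rule must contain 1 to {MAX_WORDS} words")

    if not sep:
        # No distance given: treat the rule as an exact match.
        body = r"\s+".join(terms)
        return re.compile(rf"\b{body}\b", re.IGNORECASE)

    # Assumption: the distance is the maximum number of words that may
    # appear between two consecutive terms.
    gap = rf"\W+(?:\w+\W+){{0,{int(distance)}}}?"

    def build(ordered_terms):
        return r"\b" + gap.join(ordered_terms) + r"\b"

    forward = build(terms)
    if one_directional:
        return re.compile(forward, re.IGNORECASE)
    backward = build(list(reversed(terms)))
    return re.compile(f"(?:{forward})|(?:{backward})", re.IGNORECASE)


both_ways = rule_to_regex("issue shares~6")
forward_only = rule_to_regex("issue shares~6|")
text = "Shares the board may issue soon"
print(bool(both_ways.search(text)), bool(forward_only.search(text)))
```

Under these assumptions the bidirectional rule matches the sample text while the one-directional rule does not, because "issue" follows "shares" in the sentence.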

Input - all input fields need to be of type str.

Output - the output field is filled with data of type str.

Parameters:
  • type (str) – proximity

  • blacklist_terms (list, []) – List of blacklist proximity terms

  • whitelist_terms (list, []) – List of whitelist proximity terms

Example

{
    "step": "filter",
    "type": "proximity",
    "fields": ["body"],
    "matching_label": "m&a",
    "output_field": "prediction",
    "whitelist_terms": ["appoint CEO~3"]
}

Attributes Summary

Attributes Documentation

REG_FLAGS = 2
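
In Python's re module the integer 2 is the value of re.IGNORECASE, which would be consistent with the note that expressions are case-insensitive. The short check below illustrates this; that the filter passes REG_FLAGS directly to re.compile is an assumption, and the pattern is only an example.

```python
import re

# re.IGNORECASE is an IntFlag whose integer value is 2.
print(int(re.IGNORECASE))

# Assumption: the attribute is used as the flags argument when compiling rules.
REG_FLAGS = 2
pattern = re.compile(r"appoint\W+ceo", REG_FLAGS)
match = pattern.search("The board voted to APPOINT CEO Jane Doe")
print(match is not None)
```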