ThresholdFilter

class squirro.lib.nlp.steps.filters.ThresholdFilter(config)

Bases: squirro.lib.nlp.steps.filters.Filter

The threshold filter removes weighted facets whose values fall below the threshold.

Input - all input fields must be of type list [dict] or dict containing {str : int or float}.

Output - the data format of the output fields depends on the step config:
  • skip:True: no specific output format, because the whole Document is removed or marked to be skipped if the threshold isn’t met.

  • flatten:False: all output fields are filled with data of type list [dict] or dict containing {str : int or float}.

  • flatten:True: all output fields are filled with data of type list [str] or str.
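
The difference between the two flatten settings can be sketched in plain Python. This is an illustrative sketch, not the Squirro implementation: the helper name threshold_field is hypothetical, it only handles the list [dict] input form, and it assumes that flattening keeps the dict keys and that a weight exactly equal to the threshold is kept.

```python
from typing import Dict, List, Union


def threshold_field(
    items: List[Dict[str, float]],
    threshold: float = 0.5,
    flatten: bool = False,
) -> Union[List[Dict[str, float]], List[str]]:
    """Hypothetical helper, not part of squirro.lib.nlp."""
    # Keep only the entries whose weight reaches the threshold
    # (assumption: a weight equal to the threshold is kept).
    kept = [
        {name: weight for name, weight in item.items() if weight >= threshold}
        for item in items
    ]
    # Drop dicts that lost all of their entries.
    kept = [item for item in kept if item]
    if flatten:
        # Assumption: flattening keeps the keys and drops the weights.
        return [name for item in kept for name in item]
    return kept


prediction = [{"sports": 0.9}, {"politics": 0.2}]
print(threshold_field(prediction))                # [{'sports': 0.9}]
print(threshold_field(prediction, flatten=True))  # ['sports']
```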

Parameters
  • type (str) – threshold

  • field (str, None) – Field to threshold

  • fields (list, None) – List of fields to threshold

  • input_fields (list, None) – List of fields to read the values to threshold from

  • output_fields (list, None) – List of fields to write the thresholded values to

  • flatten (bool, False) – Whether or not to flatten the thresholded fields, that is, to remove the values of the dict in each field so that only strings remain.

  • excluded_values (list, []) – Values that will not count towards the threshold.

  • threshold (float, 0.5) – Value under which the field will be removed.

  • skip (bool, True) – Whether or not to skip a document that falls below the threshold.
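
One possible reading of how threshold, excluded_values and skip interact is sketched below. The function should_skip is hypothetical and not part of the library. It assumes that excluded_values names dict keys that are ignored when checking the threshold, and that a document is skipped when no remaining entry reaches the threshold. The actual step may behave differently.

```python
from typing import Dict, Iterable


def should_skip(
    weights: Dict[str, float],
    threshold: float = 0.5,
    excluded_values: Iterable[str] = (),
) -> bool:
    """Hypothetical check: True when no counted entry reaches the threshold."""
    # Assumption: excluded values are keys that do not count towards the threshold.
    excluded = set(excluded_values)
    return not any(
        weight >= threshold
        for name, weight in weights.items()
        if name not in excluded
    )


# "not_relevant" is excluded, so only "relevant" (0.1) is checked.
print(should_skip({"not_relevant": 0.9, "relevant": 0.1},
                  excluded_values=["not_relevant"]))  # True
print(should_skip({"relevant": 0.7}))                 # False
```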

Example

{
    "step": "filter",
    "type": "threshold",
    "fields": ["prediction"],
    "flatten": true,
    "threshold": 0.5
}
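
A variant of the step config that keeps the weights and writes the result to a separate field might look like the sketch below, built as a Python dict and serialized to JSON. The output field name prediction_filtered is a made-up example, and using input_fields and output_fields without fields is an assumption based on the parameter list above.

```python
import json

# Sketch of an alternative step config; the field names are illustrative only.
step = {
    "step": "filter",
    "type": "threshold",
    "input_fields": ["prediction"],
    "output_fields": ["prediction_filtered"],  # hypothetical field name
    "flatten": False,  # keep the {str: float} entries in the output
    "threshold": 0.8,
    "skip": False,  # keep documents even if nothing passes the threshold
}

step_json = json.dumps(step, indent=4)
print(step_json)
```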

Methods Summary

process_doc(doc)

Process a document

Methods Documentation

process_doc(doc)

Process a document

Parameters

doc (Document) – Document

Returns

Processed document

Return type

Document