SquirroEntityFilter
- class SquirroEntityFilter(config)
Bases: Filter
The Squirro entity Filter takes a set of fields and creates the Squirro entities structure.
Note - The squirro.lib.nlp.steps.tokenizers.PdfSentencesTokenizer step needs to be run before this step for PDF support.
Input - There is no specific input field for this step.
Output - The output field is formatted as follows (page_to_rects is only produced for PDF files):
[
  {
    "type": "ENTITY_TYPE",
    "name": "ENTITY_NAME",
    "extracts": [
      {
        "text": "TEXT_FRAGMENT",
        "offset": "OFFSET",
        "length": "LENGTH",
        "page_to_rects": {
          "PAGE_NUM": [{"x": "X", "y": "Y", "height": "H", "width": "W"}]
        }
      }
    ],
    "properties": {"PROPERTY_KEY": ["PROPERTY_VALUE"]}
  }
]
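The output structure above can be illustrated with a small standalone Python sketch. It does not use the Squirro library: the helper names make_extract and make_entity, the sample text, and the choice of integers for offset and length are all assumptions made for illustration, not part of the documented API.

```python
from typing import Any, Dict, List, Optional


def make_extract(
    text: str,
    offset: int,
    length: int,
    page_to_rects: Optional[Dict[str, List[Dict[str, float]]]] = None,
) -> Dict[str, Any]:
    """Build one entry of the "extracts" list (hypothetical helper)."""
    extract: Dict[str, Any] = {"text": text, "offset": offset, "length": length}
    # page_to_rects is only produced for PDF files, so it is added conditionally.
    if page_to_rects is not None:
        extract["page_to_rects"] = page_to_rects
    return extract


def make_entity(
    entity_type: str,
    name: str,
    extracts: List[Dict[str, Any]],
    properties: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, Any]:
    """Build one entity in the documented shape (hypothetical helper)."""
    return {
        "type": entity_type,
        "name": name,
        "extracts": extracts,
        "properties": properties or {},
    }


# Sample data, invented for this sketch.
body = "Acme Corp reported strong growth."
fragment = "Acme Corp"
offset = body.index(fragment)

entities = [
    make_entity(
        "organization",
        fragment,
        [make_extract(fragment, offset, len(fragment))],
        {"sector": ["industrial"]},
    )
]
```

Because the sample text is not a PDF, the resulting extract carries no page_to_rects key.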
- Parameters:
entity_name (str, None) – Name of entity (defaults to entity_type)
entity_name_field (str, None) – Field with entity name (defaults to entity_name if None)
entity_type (str) – Type of the squirro entity, value of the type field in the entity data structure
excluded_values (list, []) – Values that will not be added as entity properties
extract_field (str) – Field with list of text extracts
format_values (bool, False) – Whether or not to format string values as titles
global_property_field_map (dict, {}) – Map for fields with properties that are copied in from the item
output_field (str, 'entities') – Field to write resulting entities
property_field_map (dict, {}) – Map for fields with properties that match the number of extracts
property_value_map (dict, {}) – Map for renaming values of fields with properties that match the number of extracts
static_properties (dict, {}) – Map of static property values to attach to entities
required_properties (list, []) – Properties that must exist (after exclusion) for the entity to be added
source_field (str, '') – Field where extracted text originated
source_fields (list, []) – List of fields where extracted text originated
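The interaction between excluded_values and required_properties ("must exist (after exclusion)") can be sketched in plain Python. This is an illustration of the documented semantics only, not the library's implementation. The function name apply_property_rules is hypothetical, and treating a property whose values are all excluded as missing is an assumption.

```python
from typing import Dict, List, Optional


def apply_property_rules(
    properties: Dict[str, List[str]],
    excluded_values: List[str],
    required_properties: List[str],
) -> Optional[Dict[str, List[str]]]:
    """Return the filtered properties, or None if the entity should be skipped."""
    kept: Dict[str, List[str]] = {}
    for key, values in properties.items():
        # Drop every value listed in excluded_values.
        remaining = [value for value in values if value not in excluded_values]
        # Assumption: a property with no remaining values counts as missing.
        if remaining:
            kept[key] = remaining
    # Required properties are checked after exclusion has been applied.
    if any(key not in kept for key in required_properties):
        return None
    return kept


kept = apply_property_rules(
    {"label": ["positive", "unknown"]}, ["unknown"], ["label"]
)
skipped = apply_property_rules({"label": ["unknown"]}, ["unknown"], ["label"])
```

In this sketch the first call keeps the entity with only the non-excluded value, while the second call returns None because the required property has no values left.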
Example
{
  "step": "filter",
  "type": "squirro_entity",
  "entity_name_field": "prediction",
  "entity_type": "ENTITY_TYPE",
  "excluded_values": [],
  "extract_field": "sentences",
  "format_values": false,
  "global_property_field_map": {},
  "modes": ["process"],
  "property_field_map": {
    "PROPERTY_KEY": ["prediction"]
  },
  "required_properties": ["PROPERTY_KEY"],
  "source_field": "body"
}
Methods Summary
process_doc(doc) – Process a document
Methods Documentation