SemanticTermExpander

class squirro.lib.nlp.apps.query_processing.SemanticTermExpander(config)

Bases: squirro.lib.nlp.steps.batched_step.BatchedStep

Find and add synonyms dynamically via (domain-specific) language models.

Provide pretrained models per language via the model_mapping parameter.

Example usage:

  • Input query: why is austria again the virus center

  • Output term expansion mutations:

{
    "austria": "( austria OR germany^0.86 OR switzerland^0.86 )",
    "virus": "( virus OR h5n1^0.88 OR flu^0.88 )",
    "center": "( center OR centers^0.8 OR centre^0.78 )"
}

The annotated mutation dictionary can be used in a subsequent step.
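As a rough illustration of how a subsequent step might consume the mutation dictionary, the sketch below substitutes each expanded term back into the original query string. The helper name apply_mutations is hypothetical and not part of the Squirro library; the actual downstream step may apply the mutations differently.

```python
import re
from typing import Dict


def apply_mutations(query: str, mutations: Dict[str, str]) -> str:
    """Replace every whole-word occurrence of a mutated term with its expansion.

    Hypothetical helper for illustration only; not a Squirro API.
    """
    if not mutations:
        return query
    # Longest terms first so that a longer term wins over a shorter prefix.
    alternatives = "|".join(
        re.escape(term) for term in sorted(mutations, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    # A single substitution pass never re-expands text inside an expansion.
    return pattern.sub(lambda match: mutations[match.group(0)], query)


mutations = {
    "austria": "( austria OR germany^0.86 OR switzerland^0.86 )",
    "virus": "( virus OR h5n1^0.88 OR flu^0.88 )",
    "center": "( center OR centers^0.8 OR centre^0.78 )",
}
expanded = apply_mutations("why is austria again the virus center", mutations)
print(expanded)
```

Doing the replacement in one regular-expression pass keeps terms that appear inside an expansion (for example "centers") from being expanded a second time.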

Parameters
  • step (str, "app") – app

  • type (str, "query_processing") – query_processing

  • name (str, "semantic_term_expander") – semantic_term_expander

  • analyzed_input_field (str, "nlp") – field holding the analyzed spaCy Doc

  • output_field (str, "term_expansion_mutations") – field that receives the term expansion mutations

  • model_mapping (dict) – mapping from language code to the path of the semantic model to load

  • expand_by_pos (list, ["NOUN","PROPN"]) – POS tags whose query terms are eligible for expansion

  • max_expansions (int, 3) – maximum number of single tokens in the query to expand

  • expansions_per_term (int, 2) – number of similar terms added to each token selected for expansion

  • similarity_threshold (float, 0.5) – minimum similarity of user-token to model-token to be used for expansion

  • path (str, ".") – path
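To make the interaction of expansions_per_term and similarity_threshold concrete, here is a minimal sketch that builds one expansion string in the format of the example output above. The function build_expansion and the neighbor scores for "middle" and "hub" are invented for illustration. Sorting by score, rounding to two decimals, and returning the bare term when nothing qualifies are all assumptions, not confirmed behavior of the step.

```python
from typing import List, Tuple


def build_expansion(
    term: str,
    neighbors: List[Tuple[str, float]],
    expansions_per_term: int = 2,
    similarity_threshold: float = 0.5,
) -> str:
    """Build "( term OR neighbor^score ... )" from scored neighbor terms.

    Hypothetical helper; the real step obtains neighbors from a language model.
    """
    # Most similar neighbors first.
    ranked = sorted(neighbors, key=lambda pair: pair[1], reverse=True)
    # Drop neighbors below the threshold and the term itself.
    kept = [
        (word, score)
        for word, score in ranked
        if score >= similarity_threshold and word != term
    ]
    # Keep at most expansions_per_term neighbors.
    kept = kept[:expansions_per_term]
    if not kept:
        # Assumption: a term without qualifying neighbors stays unchanged.
        return term
    parts = [term] + [f"{word}^{round(score, 2)}" for word, score in kept]
    return "( " + " OR ".join(parts) + " )"


neighbors = [("centers", 0.8), ("centre", 0.78), ("middle", 0.61), ("hub", 0.42)]
print(build_expansion("center", neighbors))
```

With the defaults, "hub" falls below the threshold and "middle" is cut by the per-term limit, so only the two closest neighbors remain in the expansion.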

Methods Summary

get_expand_candidates(analyzed_query)

Currently applies semantic term expansion only to single-term tokens of the selected POS types.

process_doc(doc)

Process a document

Methods Documentation

get_expand_candidates(analyzed_query)

Currently applies semantic term expansion only to single-term tokens of the selected POS types.
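The selection described here can be sketched without spaCy by representing each analyzed token as a (text, POS tag) pair. The function select_candidates is a hypothetical stand-in, not the library's implementation. The POS tags below are assigned by hand, and taking the first max_expansions matches in query order is an assumption.

```python
from typing import List, Sequence, Tuple


def select_candidates(
    tokens: Sequence[Tuple[str, str]],
    expand_by_pos: Sequence[str] = ("NOUN", "PROPN"),
    max_expansions: int = 3,
) -> List[str]:
    """Pick single-term tokens with an allowed POS tag, up to max_expansions.

    Hypothetical stand-in for get_expand_candidates; tokens are (text, pos) pairs.
    """
    candidates: List[str] = []
    for text, pos in tokens:
        if len(candidates) >= max_expansions:
            break
        # Only single-term tokens (no whitespace) with a selected POS tag qualify.
        if pos in expand_by_pos and " " not in text:
            candidates.append(text)
    return candidates


# Hand-assigned POS tags for "why is austria again the virus center".
tokens = [
    ("why", "SCONJ"),
    ("is", "AUX"),
    ("austria", "PROPN"),
    ("again", "ADV"),
    ("the", "DET"),
    ("virus", "NOUN"),
    ("center", "NOUN"),
]
print(select_candidates(tokens))
```

With the default settings this selects the three terms that appear as keys in the example mutation dictionary.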

process_doc(doc)

Process a document

Parameters

doc (Document) – Document

Returns

Processed document

Return type

Document