SemanticTermExpander

class SemanticTermExpander(config)

Bases: BatchedStep

Find and add synonyms dynamically via (domain-specific) language models.

Provide pretrained models per language, either as:

a path on the file system, or
the name of a pretrained model available from gensim-data (RaRe-Technologies/gensim-data).
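The two ways of supplying a model could be distinguished roughly as follows. This is a minimal sketch, not the step's actual loading code: `resolve_model_source` and `load_model` are hypothetical helpers, and the choice of `KeyedVectors.load` for local files is an assumption (a model stored in word2vec text format would need `KeyedVectors.load_word2vec_format` instead).

```python
import os


def resolve_model_source(spec: str) -> str:
    """Classify a model spec as a local path or a gensim-data model name."""
    return "path" if os.path.exists(spec) else "gensim-data"


def load_model(spec: str):
    """Load word vectors from a local file or from gensim-data.

    gensim is imported lazily so that this module can be imported
    without gensim being installed.
    """
    from gensim.models import KeyedVectors
    import gensim.downloader as api

    if resolve_model_source(spec) == "path":
        # Assumption: the file was saved in gensim's native format.
        return KeyedVectors.load(spec)
    # Downloads the model on first use and caches it locally.
    return api.load(spec)
```

Calling `load_model("glove-wiki-gigaword-50")` would fetch one of the models published in gensim-data, provided no local file of that name exists.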

Example usage:

Input query: why is austria again the virus center
Output term expansion mutations:

{
    "austria": "( austria OR germany^0.86 OR switzerland^0.86 )",
    "virus": "( virus OR h5n1^0.88 OR flu^0.88 )",
    "center": "( center OR centers^0.8 OR centre^0.78 )"
}

The annotated mutation dictionary can be used in a subsequent step.
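As an illustration of how a subsequent step might consume the mutations, the sketch below substitutes each expanded term back into the query string. `apply_mutations` is a hypothetical helper written for this example and is not part of the class; the real downstream step may use the dictionary differently.

```python
import re
from typing import Dict


def apply_mutations(query: str, mutations: Dict[str, str]) -> str:
    """Replace each whole-word occurrence of a key with its expansion."""
    if not mutations:
        return query
    # One combined pattern, so already-inserted expansions are not re-scanned.
    alternatives = "|".join(re.escape(term) for term in mutations)
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda match: mutations[match.group(1)], query)


mutations = {
    "austria": "( austria OR germany^0.86 OR switzerland^0.86 )",
    "virus": "( virus OR h5n1^0.88 OR flu^0.88 )",
    "center": "( center OR centers^0.8 OR centre^0.78 )",
}

expanded = apply_mutations("why is austria again the virus center", mutations)
print(expanded)
```

The untouched tokens (`why`, `is`, `again`, `the`) pass through unchanged, and each expanded term is replaced by its boosted `OR` group.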

Parameters:

step (str, "app") – app
type (str, "query_processing") – query_processing
name (str, "semantic_term_expander") – semantic_term_expander
analyzed_input_field (str, "nlp") – analyzed spacy Doc
output_field (str, "term_expansion_mutations") – field in which the term expansion mutations are stored
model_mapping (dict) – mapping from language code to semantic-model-path to load
expand_by_pos (list, ["NOUN","PROPN"]) – POS tags of the query tokens that are eligible for expansion
max_expansions (int, 3) – maximum number of single tokens that are selected for expansion
expansions_per_term (int, 2) – how many similar terms are added to a selected token (for expansion)
similarity_threshold (float, 0.5) – minimum similarity of user-token to model-token to be used for expansion
path (str, ".") – path
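For orientation, the parameters above could be collected into a configuration like the one below, using the documented defaults. How the configuration reaches the class (a Python dict, a pipeline file, or something else) is not stated here, so the plain dict is an assumption. The model names are placeholders: `glove-wiki-gigaword-50` is a model published in gensim-data, and `/models/de_domain.kv` is a hypothetical local path.

```python
config = {
    "step": "app",
    "type": "query_processing",
    "name": "semantic_term_expander",
    "analyzed_input_field": "nlp",
    "output_field": "term_expansion_mutations",
    "model_mapping": {
        "en": "glove-wiki-gigaword-50",  # gensim-data model name
        "de": "/models/de_domain.kv",    # hypothetical file-system path
    },
    "expand_by_pos": ["NOUN", "PROPN"],
    "max_expansions": 3,
    "expansions_per_term": 2,
    "similarity_threshold": 0.5,
    "path": ".",
}

# Upper bound on the number of terms added to a single query:
# at most max_expansions tokens, each with expansions_per_term similar terms.
max_added_terms = config["max_expansions"] * config["expansions_per_term"]
```

With the defaults, a query gains at most six additional terms, which matches the example above (three expanded tokens with two similar terms each).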

Methods Summary

`get_expand_candidates`(analyzed_query)	Currently applies semantic term expansion only to single-term tokens of the selected POS types.
`process_doc`(doc)	Process a document

Methods Documentation

get_expand_candidates(analyzed_query)

Currently applies semantic term expansion only to single-term tokens of the selected POS types.
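The selection and expansion logic can be approximated as below. This is a sketch under stated assumptions, not the library's implementation: the `Token` dataclass stands in for a spaCy token (only `text` and `pos_` are used), the functions and their keyword arguments are hypothetical, taking the first `max_expansions` candidates in query order is a guess, and `neighbours` mimics the list of `(word, similarity)` pairs that gensim's `KeyedVectors.most_similar` returns.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Token:
    text: str
    pos_: str  # mirrors the attribute name of spaCy's Token.pos_


def get_expand_candidates(
    tokens: Sequence[Token],
    expand_by_pos: Sequence[str] = ("NOUN", "PROPN"),
    max_expansions: int = 3,
) -> List[str]:
    """Select single-term tokens whose POS tag is eligible for expansion."""
    candidates = [
        token.text
        for token in tokens
        if token.pos_ in expand_by_pos and " " not in token.text
    ]
    # Assumption: keep the first max_expansions candidates in query order.
    return candidates[:max_expansions]


def build_mutation(
    term: str,
    neighbours: Sequence[Tuple[str, float]],
    expansions_per_term: int = 2,
    similarity_threshold: float = 0.5,
) -> Optional[str]:
    """Format a term and its most similar neighbours as a boosted OR group."""
    kept = [(w, s) for w, s in neighbours if s >= similarity_threshold]
    kept = kept[:expansions_per_term]
    if not kept:
        return None
    parts = [term] + [f"{word}^{round(score, 2)}" for word, score in kept]
    return "( " + " OR ".join(parts) + " )"


query = [
    Token("why", "SCONJ"), Token("is", "AUX"), Token("austria", "PROPN"),
    Token("again", "ADV"), Token("the", "DET"), Token("virus", "NOUN"),
    Token("center", "NOUN"),
]
candidates = get_expand_candidates(query)
mutation = build_mutation(
    "center", [("centers", 0.8), ("centre", 0.78), ("hub", 0.4)]
)
```

Here `candidates` contains `austria`, `virus` and `center`, and `hub` is dropped from the mutation twice over: it falls below the similarity threshold and would also exceed `expansions_per_term`.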

process_doc(doc)

Process a document

Parameters: doc (Document) – Document
Returns: Processed document
Return type: Document
