SemanticTermExpander
- class SemanticTermExpander(config)
Bases: BatchedStep
Find and add synonyms dynamically via (domain-specific) language models.
Provide a pretrained model per language, either as:
- a path on the file system, or
- a pretrained model available from gensim-data (RaRe-Technologies/gensim-data).
Example usage:
Input query: why is austria again the virus center
Output term expansion mutations:
{ "austria": "( austria OR germany^0.86 OR switzerland^0.86 )", "virus": "( virus OR h5n1^0.88 OR flu^0.88 )", "center": "( center OR centers^0.8 OR centre^0.78 )" }
The annotated mutation dictionary can be used in a subsequent step.
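The mutation strings in the example follow a simple pattern: the original term, then each similar term boosted by its similarity score, joined with OR. The following is a minimal sketch of that formatting, not the class's actual implementation. The function name `build_mutation` and the hand-written neighbour list are hypothetical; in practice the neighbours would come from the loaded language model.

```python
from typing import Iterable, Optional, Tuple


def build_mutation(
    term: str,
    neighbours: Iterable[Tuple[str, float]],
    expansions_per_term: int = 2,
    similarity_threshold: float = 0.5,
) -> Optional[str]:
    """Return an OR-expression for `term`, or None if no neighbour qualifies."""
    # Keep only neighbours that are similar enough, most similar first.
    kept = sorted(
        (pair for pair in neighbours if pair[1] >= similarity_threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )[:expansions_per_term]
    if not kept:
        return None
    # Boost each added term by its (rounded) similarity score.
    boosted = [f"{word}^{round(score, 2)}" for word, score in kept]
    return "( " + " OR ".join([term] + boosted) + " )"


# Hypothetical neighbour list; a real one would come from the language model.
neighbours = [("germany", 0.86), ("switzerland", 0.86), ("moon", 0.31)]
mutation = build_mutation("austria", neighbours)
print(mutation)
```

Neighbours below the threshold ("moon" here) are dropped before the per-term limit is applied.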
- Parameters
step (str, "app") – app
type (str, "query_processing") – query_processing
name (str, "semantic_term_expander") – semantic_term_expander
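analyzed_input_field (str, "nlp") – field holding the analyzed spaCy Doc
output_field (str, "term_expansion_mutations") – field that receives the term expansion mutations (a dictionary mapping each expanded term to its expansion expression)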
model_mapping (dict) – mapping from language code to semantic-model-path to load
expand_by_pos (list, ["NOUN","PROPN"]) – POS tags of the query tokens that are eligible for expansion
max_expansions (int, 3) – maximum number of single tokens in a query that are expanded
expansions_per_term (int, 2) – number of similar terms added to each token selected for expansion
similarity_threshold (float, 0.5) – minimum similarity between a query token and a model token for the model token to be used as an expansion
path (str, ".") – path
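For orientation, the parameters above could be collected in a configuration such as the one below. This is a sketch only: the documentation does not say what type `config` has, so the plain dictionary is an assumption, as are the language codes and the local model path. `glove-wiki-gigaword-100` is a model published in gensim-data. All other values are the documented defaults.

```python
# Illustrative configuration using the documented defaults.
# Assumption: the step accepts its settings as a plain dictionary.
config = {
    "step": "app",
    "type": "query_processing",
    "name": "semantic_term_expander",
    "analyzed_input_field": "nlp",
    "output_field": "term_expansion_mutations",
    "model_mapping": {
        "en": "glove-wiki-gigaword-100",  # model available from gensim-data
        "de": "/models/de/word2vec.kv",   # hypothetical file-system path
    },
    "expand_by_pos": ["NOUN", "PROPN"],
    "max_expansions": 3,
    "expansions_per_term": 2,
    "similarity_threshold": 0.5,
    "path": ".",
}
```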
Methods Summary
get_expand_candidates(analyzed_query) – Currently applies semantic term expansion only to single-term tokens with a selected POS type.
process_doc(doc) – Process a document.
Methods Documentation
- get_expand_candidates(analyzed_query)
Currently applies semantic term expansion only to single-term tokens with a selected POS type.
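The selection described here can be illustrated as follows. This is a rough sketch under stated assumptions, not the method's real code: `Token` is a minimal stand-in for a spaCy token, the POS tags are assigned by hand rather than by a spaCy pipeline, and applying the `max_expansions` cap at this point is a guess. The name `select_candidates` is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Token:
    """Minimal stand-in for a spaCy token (text plus coarse POS tag)."""
    text: str
    pos_: str


def select_candidates(
    tokens: Sequence[Token],
    expand_by_pos: Sequence[str] = ("NOUN", "PROPN"),
    max_expansions: int = 3,
) -> List[str]:
    """Pick single-term tokens whose POS tag is eligible for expansion."""
    allowed = set(expand_by_pos)
    candidates = [
        token.text
        for token in tokens
        # Single-term only: skip anything containing whitespace.
        if token.pos_ in allowed and " " not in token.text
    ]
    # Assumption: the cap on expanded tokens is applied here.
    return candidates[:max_expansions]


# Hand-tagged version of the example query from the class description.
query = [
    Token("why", "SCONJ"), Token("is", "AUX"), Token("austria", "PROPN"),
    Token("again", "ADV"), Token("the", "DET"), Token("virus", "NOUN"),
    Token("center", "NOUN"),
]
candidates = select_candidates(query)
print(candidates)
```

With the default settings, the selected tokens match the keys of the mutation dictionary in the example usage.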