How To Handle User Query Terms Correctly#
Added in version 3.5.3.
The Query Term Matching Strategy specifies how Search Terms are interpreted and translated into Elasticsearch query clauses.
Note
See how Query Processing helps to handle Natural Language Queries properly by performing additional NLP tasks before querying Elasticsearch.
Tunable Parameters
Searchable Field and Label boosting: Tune impact of terms that match in searchable fields
How to match terms:
Phrases: Apply Exact or more lenient matching? Additional boosting compared to normal term matches?
Sequence of Terms (Natural Language Query): How many terms have to match? Apply additional rescoring on exact matches?
Default Field Boosts#
Project Configuration topic.search.field-boosts
Define the impact that searchable fields and labels have on the overall document relevance score.
"default_value": {
"title": 2,
"body": 1,
"nlp_tag__phrases": 1
}
Here, title and body are the default searchable fields, and nlp_tag__phrases refers to the label that contains key phrases added by the NLP Keyphrase Tagger.
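As an illustration only, the boosts above could be rendered into the field^boost notation that an Elasticsearch multi_match query accepts. This is a minimal sketch: the helper to_boosted_fields is hypothetical, and how the product actually builds its query clauses is an assumption.

```python
def to_boosted_fields(field_boosts):
    """Render a {field: boost} mapping as Elasticsearch 'field^boost' strings."""
    return [f"{field}^{boost}" for field, boost in field_boosts.items()]


# Default values of topic.search.field-boosts as shown above.
field_boosts = {"title": 2, "body": 1, "nlp_tag__phrases": 1}
boosted_fields = to_boosted_fields(field_boosts)

# Hypothetical query body: a match in title counts twice as much as one in body.
query = {
    "multi_match": {
        "query": "capital of Switzerland",
        "fields": boosted_fields,
    }
}
```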
Default Term Matching Configuration#
Project Configuration topic.search.query-strategy
Impact of Matching Original vs. Language-Specific Analyzed Text#
When performing full-text search across multilingual content, it is important to consider how query terms match either the original (lightly normalized) text or the language-specific analyzed (stemmed and decomposed) version of the content. You can control the influence of these matches using the language_match_weight setting.
Note
The language_match_weight configuration only applies to the title and body fields of a document, as only these fields are indexed in both unstemmed and stemmed variants.
Example Configuration#
"language_match_weight": {
"unstemmed": 1.0,
"stemmed": 0.1
}
unstemmed
Boosting factor for matches on the original form of the text, after basic normalization. This includes processes such as lowercasing, replacing special characters, and folding characters like ö → o.
stemmed
Boosting factor for matches on the analyzed form of the text, where language-specific processing is applied, including stemming, stop-word removal, and compound decomposition.
With the values "unstemmed": 1.0 and "stemmed": 0.1, the impact on search results is as follows:
Prioritize exact matches: Documents containing the exact query terms rank significantly higher than those with only stemmed matches.
Maintain precision: The 10x weight difference (1.0 vs 0.1) ensures that exact word matches dominate the relevance score.
Preserve recall: Stemmed matches still contribute to the score, allowing documents with related terms to be found, but at a much lower relevance.
For example, a search for running strongly favors documents containing running exactly, while documents with only run, runs, or runner have much lower scores but still appear in the results.
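The effect of the 10x weight difference can be sketched with simple arithmetic. The snippet assumes that each weight is applied as a plain multiplier on the raw match score of its variant; the function name and the raw scores are hypothetical, and the actual scoring inside Elasticsearch involves more factors.

```python
def weighted_contributions(raw_scores, language_match_weight):
    """Multiply each variant's raw score by its configured weight (assumed model)."""
    return {
        variant: raw_scores.get(variant, 0.0) * weight
        for variant, weight in language_match_weight.items()
    }


language_match_weight = {"unstemmed": 1.0, "stemmed": 0.1}

# Hypothetical raw scores: document A contains "running",
# document B only contains "runs" and matches through the stemmed variant.
doc_a = weighted_contributions({"unstemmed": 3.0, "stemmed": 3.0}, language_match_weight)
doc_b = weighted_contributions({"stemmed": 3.0}, language_match_weight)

total_a = sum(doc_a.values())
total_b = sum(doc_b.values())
```

Under this assumption document B is still found, but its total stays well below that of document A.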
German Query Example#
Suppose a user searches for the German word Apfelbaum (apple tree).
Document Text#
Ein Apfelbaum wächst im Garten.
Tokenization Results#
Field Type | Tokenized Terms |
---|---|
Unstemmed | |
Stemmed | |
Interpretation#
The unstemmed field matches the full term Apfelbaum directly.
The stemmed field allows the query to also match decomposed parts like apfel and baum, increasing recall, especially if the user only searches for baum or apfel.
Recommendation#
Use a higher weight for unstemmed to prioritize exact term matches and phrase integrity, and a lower weight for stemmed to capture additional semantically related matches without overwhelming relevance.
Term Sequence Match#
Example query: capital of Switzerland
"term_sequence": {
"operator": "OR",
"minimum_should_match": "3<75% 7<5",
"tie_breaker": 0.5
}
operator
Default boolean logic used to interpret a term sequence in the query string if no operators are specified.
Valid values are:
OR (Default): The example query is interpreted as capital OR of OR Switzerland. This is the preferred setting because it allows the use of minimum_should_match conditions.
AND: The example query is interpreted as capital AND of AND Switzerland. This performs strict keyword matching and bypasses any configured minimum_should_match conditions.
Type: string
minimum_should_match
Specifies how many terms of the query have to match in the content.
The official syntax allows a combination of multiple rules separated by whitespace. For example, the setting 3<75% 7<5 means:
1 to 3 tokens: all tokens have to match
4 to 7 tokens: 75% of the tokens have to match (rounded down)
more than 7 tokens: at least 5 tokens have to match
Note: Requires operator set to OR
Type: number (int) or string (multiple conditions, percentages)
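tie_breaker
Specifies how the scores of matches on multiple fields are combined into the total score. The field with the highest score counts in full; the scores of the other matching fields are multiplied by tie_breaker before being added.
Type: number (float)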
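The two numeric settings above can be checked with a small sketch. required_matches mirrors the documented Elasticsearch rules for minimum_should_match (conditional rules in ascending order, percentages rounded down, negative values meaning "may be missing"), and dis_max_score follows the documented tie_breaker formula. Both function names are hypothetical helpers, not part of the product or of Elasticsearch.

```python
def required_matches(term_count, spec):
    """Number of terms that must match for a minimum_should_match spec."""
    spec = str(spec).strip()
    result = term_count
    if "<" in spec:
        # Conditional rules such as "3<75% 7<5", evaluated left to right.
        for rule in spec.split():
            bound, value = rule.split("<", 1)
            if term_count <= int(bound):
                return result
            result = required_matches(term_count, value)
        return result
    if spec.endswith("%"):
        calc = int(term_count * int(spec[:-1]) / 100)  # truncates toward zero
    else:
        calc = int(spec)
    # Negative values state how many terms may be missing.
    result = term_count + calc if calc < 0 else calc
    return max(0, min(result, term_count))


def dis_max_score(field_scores, tie_breaker):
    """Best field score plus tie_breaker times the scores of all other fields."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


# "capital of Switzerland" has 3 terms, so all 3 must match.
needed_for_example = required_matches(3, "3<75% 7<5")
# Hypothetical per-field scores for title, body and a label.
combined = dis_max_score([2.0, 1.0, 0.5], tie_breaker=0.5)
```

With tie_breaker set to 0 only the best field would count; with 1.0 all field scores would simply be summed.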
Phrase Match#
Example query: “global warming”
"phrase": {
"phrase_slop": 0,
"boost": 2,
"handle_synonyms": false,
"include_stemmed_fields": false
}
phrase_slop
Configures how strictly the phrase query is matched (exact vs. lenient).
Example settings:
0: The exact phrase has to match in order. Matches "global warming" only.
2: Allows one transposition. Also matches "warming global".
10: Allows other, non-matching terms to intervene between the phrase terms, in any order, within a slop of 10 (proximity search). Matches "the warming climate has a negative effect on the global economy".
Note: Exact matches have higher relevance than lenient matches (applies to settings with phrase_slop > 0).
Type: number (int)
boost
Boosts phrase matches higher than normal search terms. Natural language queries can also automatically create phrases during the query-rewriting stage (Query Processing).
Type: number (float)
handle_synonyms
Use the synonyms configured on the project for phrase matches. Defaults to false
Type: boolean
include_stemmed_fields
Whether the phrase match should also search within the stemmed version of body and title. Defaults to false to return exact matches only. For example, the query "BP" should not match bps.
Type: boolean
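To make the settings above more concrete, the following sketch maps phrase_slop and boost onto an Elasticsearch multi_match query of type phrase, which accepts slop and boost. The helper build_phrase_query and the .stemmed subfield names are hypothetical; the real field names and the generated query may differ, and handle_synonyms is not modelled here.

```python
def build_phrase_query(phrase, phrase_config, base_fields=("title", "body")):
    """Build a phrase query body from the 'phrase' configuration (assumed mapping)."""
    fields = list(base_fields)
    if phrase_config.get("include_stemmed_fields", False):
        # Hypothetical subfield naming for the stemmed variants.
        fields += [f"{field}.stemmed" for field in base_fields]
    return {
        "multi_match": {
            "query": phrase,
            "type": "phrase",
            "fields": fields,
            "slop": phrase_config["phrase_slop"],
            "boost": phrase_config["boost"],
        }
    }


phrase_config = {
    "phrase_slop": 0,
    "boost": 2,
    "handle_synonyms": False,
    "include_stemmed_fields": False,
}
phrase_query = build_phrase_query("global warming", phrase_config)
```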
Multi Word Rescoring#
The general query applies the configured term_sequence configuration.
But documents that contain the query terms within a single sentence or paragraph are not boosted by default. To improve that, rescoring functions can be used.
Example query: global warming effects
Rescoring applies additional, potentially more expensive scoring functions to the top N documents (as ranked by the more general query) and re-sorts them afterwards.
"rescore": {
"on_term_sequences": {
"enabled": true,
"score_word_sequence_slop": 2,
"score_word_sequence_items": 100,
"score_word_query_weight": 0.7,
"score_word_rescore_query_weight": 1.2,
"score_word_score_mode": "total"
}
}
enabled
Use rescoring on term sequences. Defaults to true
Type: boolean
score_word_sequence_slop
How close to each other the words in sequence scoring have to be for them to influence the scoring. A slop of 0 means the words have to be next to each other in the same order. Transposed terms have a slop of 2.
Type: number (int)
score_word_sequence_items
On how many items the word sequence scoring is applied. Defaults to 100
Type: number (int)
score_word_query_weight
Weight of the original query when rescoring items for score_word_sequence. Defaults to 0.7
Type: number (float)
score_word_rescore_query_weight
Weight of the rescoring query when rescoring items for score_word_sequence. Defaults to 1.2
Type: number (float)
score_word_score_mode
How to combine the original score and the rescore score for score_word_sequence. Possible values are: total, multiply, avg, max, min. Defaults to total
Type: string
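The rescoring parameters resemble the options of the Elasticsearch query rescorer (window_size, query_weight, rescore_query_weight, score_mode). The sketch below shows one plausible mapping and how the two weighted scores combine for a document that matches the rescore query. The helper names, the body field and the use of match_phrase as rescore query are assumptions, not the confirmed implementation.

```python
def build_rescore_body(query_text, cfg, field="body"):
    """Map the on_term_sequences settings onto an Elasticsearch rescore body."""
    return {
        "rescore": {
            "window_size": cfg["score_word_sequence_items"],
            "query": {
                "rescore_query": {
                    "match_phrase": {
                        field: {
                            "query": query_text,
                            "slop": cfg["score_word_sequence_slop"],
                        }
                    }
                },
                "query_weight": cfg["score_word_query_weight"],
                "rescore_query_weight": cfg["score_word_rescore_query_weight"],
                "score_mode": cfg["score_word_score_mode"],
            },
        }
    }


def combined_score(original, rescore, cfg):
    """Combine the weighted original and rescore scores according to the score mode."""
    a = original * cfg["score_word_query_weight"]
    b = rescore * cfg["score_word_rescore_query_weight"]
    modes = {
        "total": a + b,
        "multiply": a * b,
        "avg": (a + b) / 2,
        "max": max(a, b),
        "min": min(a, b),
    }
    return modes[cfg["score_word_score_mode"]]


cfg = {
    "enabled": True,
    "score_word_sequence_slop": 2,
    "score_word_sequence_items": 100,
    "score_word_query_weight": 0.7,
    "score_word_rescore_query_weight": 1.2,
    "score_word_score_mode": "total",
}
body = build_rescore_body("global warming effects", cfg)
# Hypothetical scores: 10.0 from the general query, 5.0 from the sequence match.
final_score = combined_score(10.0, 5.0, cfg)
```

With the default total mode, the example document ends up with 0.7 * 10.0 + 1.2 * 5.0 = 13.0.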