Scoring Profiles and Roles#
Profiles: Project Creator, Search Engineer
This page presents an overview of Scoring Profiles and Scoring Roles within Squirro Cognitive Search.
Scoring profiles and roles are configured by search engineers, then used by project creators to fine-tune the search experience for end users.
Note
Optimizing each stage of the retrieval pipeline is crucial for achieving highly accurate search results.
Background: Document Relevancy#
Squirro Cognitive Search uses a default scoring algorithm (BM25) to compute an initial relevancy score that indicates how relevant a document is to a given full-text search query. Documents are then ranked by this score, from highly to partially relevant, so that ideally the top results contain the right information.
However, relevancy is not a static concept. Within a specific project, relevancy may depend on the overall business objectives (use cases), user preferences, or other metrics.
For example, given two projects, in the first you might want to prioritize popular documents, whereas in the second you want to promote documents that have been recently modified.
In Squirro Cognitive Search, Scoring Profiles and Scoring Roles can be used to fine-tune relevancy in the following ways:
Scoring Profiles define how additional ranking query clauses are built.
Scoring Roles define which profiles are applied based on the current user.
Reference: Learn more about Document Relevancy.
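For readers unfamiliar with the BM25 baseline mentioned in this section, the following is a minimal sketch of the textbook BM25 formula in Python. It is not Squirro's or Elasticsearch's implementation; the function name, the whitespace tokenization, and the parameter defaults (k1=1.2, b=0.75) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents need more hits to score high.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical three-document corpus, tokenized on whitespace.
docs = [
    "annual report apple 2021".split(),
    "apple pie recipe".split(),
    "quarterly earnings call".split(),
]
scores = bm25_scores(["annual", "report"], docs)
```

Only the first document contains the query terms, so it is the only one with a non-zero score. The score depends purely on the query text, which is why additional signals such as popularity or recency have to come from scoring profiles.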
Scoring Profiles#
Project Configuration topic.search.document-scoring-profiles
Scoring profiles use document metadata as additional filtering criteria to return the most relevant documents according to the selected scoring profile.
Scoring profiles can either reference a configured profile from the project configuration by name or leverage a plugin without any project configuration required.
Reference: Learn more about How to Use Scoring Profiles to Customize Document Relevancy Scoring.
Out-of-the-Box Scoring Profiles
Scoring profile plugins shipped out of the box include the following:
QueryProfile
ScriptProfile
PluginProfile
QueryProfile#
This plugin formulates additional queries based on Query Syntax.
It’s useful for promoting documents that meet certain criteria.
For boolean conditions, all matching documents are equally boosted (see the ScriptProfile for more sophisticated scoring approaches).
Note
The QueryProfile uses a Syntax Parser that combines statements using the OR operator (in contrast to the AND operator used to parse user queries).
This profile supports all features that the Squirro Query Syntax offers, for example boosting fine-grained signals extracted from paragraphs or sentences during ingestion.
Reference: To learn more about using scoring profiles within Squirro query syntax, see Scoring Profiles and Queries.
Example Usage#
The following example equally promotes all items that have been modified within the last three months:
{
"recently_modified__boost_equal": {
"query": "$modified_at > now/d-3M/M"
}
}
The following example promotes items that were loaded from a specific source (faq) OR that were classified during data ingestion (as being a tutorial, for example):
{
"from_knowledge_base": {
"query": "source:faq is_tutorial:True^100"
}
}
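To make the OR semantics concrete, the following toy model treats each statement as an optional clause whose boost is added when the document matches it. This is neither Squirro's parser nor Elasticsearch's scoring: the function names and the additive scoring are assumptions made for illustration, and only the query string is taken from the example above.

```python
def parse_clauses(query):
    """Split a profile query into (field, value, boost) clauses."""
    clauses = []
    for token in query.split():
        expression, _, boost = token.partition("^")
        field, _, value = expression.partition(":")
        # Clauses without an explicit boost default to 1.0 in this toy model.
        clauses.append((field, value, float(boost) if boost else 1.0))
    return clauses


def extra_score(doc, query):
    """Sum the boosts of every clause the document matches (OR semantics)."""
    return sum(
        boost
        for field, value, boost in parse_clauses(query)
        if str(doc.get(field)) == value
    )


profile_query = "source:faq is_tutorial:True^100"
faq_tutorial = extra_score({"source": "faq", "is_tutorial": True}, profile_query)
web_tutorial = extra_score({"source": "web", "is_tutorial": True}, profile_query)
web_article = extra_score({"source": "web", "is_tutorial": False}, profile_query)
```

A document that matches either clause receives a boost, and a document that matches neither receives none. It still appears in the results if it matches the user query.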
Advanced Usage: Personalization#
Personalized query clause generation leverages user information through Jinja Templating.
Note: The templated information must be available via the User Service.
The following example boosts documents where the current user is one of the authors:
{
"author_is_contributor": {
"query": "author:{{user}}^100"
}
}
The following example boosts documents that align with the user’s interests:
{
"user_interest_aligns": {
"query": "{% for interest in interests%} tag:{{interest}}^10 {% endfor %}"
}
}
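The expansion performed by the loop template above can be mimicked in plain Python to see what the resulting query clause looks like. This sketch does not use Jinja itself and normalizes whitespace between clauses; the function name and the sample interests are hypothetical.

```python
def render_interest_query(interests, boost=10):
    """Build one boosted tag clause per interest, separated by spaces."""
    return " ".join(f"tag:{interest}^{boost}" for interest in interests)


# Hypothetical interests, as they might be stored for the current user.
query = render_interest_query(["finance", "sustainability"])
# query == "tag:finance^10 tag:sustainability^10"
```

Because the QueryProfile combines statements with OR, a document tagged with any one of the user's interests is boosted.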
ScriptProfile#
Using this plugin, you can formulate an Elasticsearch Script Score Query to implement your own custom scoring algorithm on top of the default search score.
This is useful for incorporating “static” signals that are independent of the query but highly correlated to relevance.
Example: You can promote recently modified items by applying a Gaussian Decay Function on the modified_at field.
The ScriptProfile plugin allows the highest flexibility to modify relevancy scoring, but comes with performance implications and should be used with caution.
The generated clause gets applied directly on the top-level Squirro-Item and can only access common Fields and dynamically created Labels. There is no support for paragraph-level Signals/Entities.
Example Usage#
The following example boosts documents that are more important in your domain, based on pre-computed centrality scores such as PageRank:
{
"important_documents": {
"script": {
"source": "saturation(doc['kw_float']['pagerank'].value, 10)"
},
"debug": true
}
}
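Elasticsearch documents the saturation function as value / (k + value). A minimal Python equivalent, with hypothetical PageRank values, shows how the second argument acts as the pivot at which the score reaches 0.5:

```python
def saturation(value, pivot):
    """Map a non-negative value into [0, 1); the result is 0.5 at value == pivot."""
    return value / (pivot + value)


low = saturation(0, 10)    # no PageRank signal
mid = saturation(10, 10)   # value equal to the pivot
high = saturation(90, 10)  # large values flatten out below 1.0
```

The flattening keeps a single very high PageRank value from dominating the final score.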
The following example boosts recently modified documents with a decaying weight, so that the more recent the change, the larger the boost:
{
"recently_modified__boost_decay": {
"script": {
"source": "decayDateGauss(params.origin, params.scale, params.offset, params.decay, doc['modified_at'].value)",
"params": {
"origin": "now",
"scale": "30d",
"offset": "0",
"decay": 0.3
}
},
"boost": 10,
"debug": true
}
}
Note: The origin parameter supports date strings like 2022-08-01, 2022-08-01T12:00:00Z, or now (current day).
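The effect of the scale, offset, and decay parameters can be sketched in Python using the Gaussian decay formula from the Elasticsearch documentation. This is an illustrative reimplementation, not the Painless function itself: it takes datetime and timedelta objects rather than strings like "30d", and the function name and sample dates are assumptions.

```python
import math
from datetime import datetime, timedelta, timezone


def decay_date_gauss(origin, scale, offset, decay, value):
    """Return 1.0 at the origin and `decay` at a distance of offset + scale."""
    distance = abs((value - origin).total_seconds()) - offset.total_seconds()
    distance = max(0.0, distance)  # inside the offset there is no decay
    sigma_sq = -(scale.total_seconds() ** 2) / (2 * math.log(decay))
    return math.exp(-(distance ** 2) / (2 * sigma_sq))


origin = datetime(2022, 8, 1, tzinfo=timezone.utc)
scale = timedelta(days=30)
no_offset = timedelta(0)

fresh = decay_date_gauss(origin, scale, no_offset, 0.3, origin)
month_old = decay_date_gauss(origin, scale, no_offset, 0.3, origin - timedelta(days=30))
two_months_old = decay_date_gauss(origin, scale, no_offset, 0.3, origin - timedelta(days=60))
```

With scale set to 30 days and decay set to 0.3, a document modified 30 days before the origin scores 0.3, and one modified 60 days before scores about 0.008, so the boost fades quickly beyond one scale length.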
PluginProfile#
This plugin uses custom Python extensions that leverage any kind of metadata from third-party systems to achieve higher document relevancy.
PluginProfile can be used with the built-in PopularItem plugin, or with custom Python extensions that implement the RankClauseBuilder interface (this extensibility feature is currently under development).
Plugin Profiles introduce the same flexibility to the generation of a Search Query (DSL) as Pipelets do for the Data Ingestion Pipeline.
Rank on Popular Items Example#
This built-in plugin keeps track of the popularity of Items and adds additional boosting queries ad-hoc without relying on pre-computed popularity scores attached to the items.
The following example applies an additional boost on popular items:
{
"popular_items": {
"asset_name": "popular_item",
"config": {
"last_months": 3,
# boost applies only if item was read at least 5 times
"min_popularity": 5,
# scope can be `project` or `user`
"scope": "project"
}
}
}
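The plugin's internal logic is not described here. As a rough mental model only, the last_months and min_popularity options can be read as a time window plus a threshold over read events. The following sketch is a hypothetical illustration of that reading: the function name, the event format, and the 30-day month approximation are all assumptions, and the scope option is not modeled.

```python
from collections import Counter
from datetime import datetime, timedelta


def popular_item_ids(read_events, now, last_months=3, min_popularity=5):
    """Return ids of items read at least `min_popularity` times in the window."""
    cutoff = now - timedelta(days=30 * last_months)  # months approximated as 30 days
    counts = Counter(item_id for item_id, read_at in read_events if read_at >= cutoff)
    return {item_id for item_id, reads in counts.items() if reads >= min_popularity}


now = datetime(2022, 8, 1)
events = (
    [("a", now - timedelta(days=d)) for d in range(5)]        # 5 recent reads
    + [("b", now - timedelta(days=d)) for d in range(4)]      # only 4 recent reads
    + [("c", now - timedelta(days=365)) for _ in range(6)]    # 6 reads, but too old
)
popular = popular_item_ids(events, now)
```

Under this reading, only item "a" would qualify for the boost: "b" falls below the threshold and the reads of "c" fall outside the three-month window.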
Plugin Reference#
To learn more about the available Scoring Plugins and their configuration options, see Plugin Reference.
Scoring Roles#
Project Configuration topic.search.document-scoring-roles
Scoring Roles define which Scoring Profiles are actually executed.
Some profiles make sense for all users, whereas others should only apply to a certain group of people.
Role configuration offers a versatile way to map project users to scoring profiles.
Reference: Learn more about How to Use Scoring Profiles to Customize Document Relevancy Scoring.
Anatomy of a Search#
A search query gets piped through the configured Query Processing Workflow to preprocess the query before forwarding it to the Search Engine.
Common preprocessing tasks involve:
Removal of unwanted terms like stopwords.
Query classification: Language, Query Type (keyword vs. natural language question).
Use-case-specific processing.
A processed example query might look like the following:
Original User Input
-------------------
query: "what are the annual reports of APPLE? $item_created_at > 2020"
Processed
---------
user_terms: "annual reports APPLE"
user_filters: "$item_created_at > 2020"
query_type: "question"
query_language: "en"
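The preprocessing shown above can be approximated with a toy Python function. It is not the actual Query Processing Workflow: the stopword list, the filter pattern, and the question heuristic are assumptions chosen so that the example input produces the fields listed above, and language detection is left out.

```python
import re

STOPWORDS = {"what", "are", "the", "of"}  # illustrative list only
QUESTION_WORDS = {"what", "who", "when", "where", "why", "how", "which"}
# Matches filters of the form "$field <operator> value".
FILTER_PATTERN = re.compile(r"\$\w+\s*(?:>=|<=|>|<|=)\s*\S+")


def preprocess(query):
    """Split a raw query into search terms, filters, and a query type."""
    filters = FILTER_PATTERN.findall(query)
    text = FILTER_PATTERN.sub(" ", query)
    tokens = re.findall(r"\w+", text)
    is_question = "?" in text or (
        bool(tokens) and tokens[0].lower() in QUESTION_WORDS
    )
    terms = [t for t in tokens if t.lower() not in STOPWORDS]
    return {
        "user_terms": " ".join(terms),
        "user_filters": " ".join(filters),
        "query_type": "question" if is_question else "keyword",
    }


result = preprocess("what are the annual reports of APPLE? $item_created_at > 2020")
```

Separating terms from filters matters for scoring: filters only restrict the result set, while the remaining terms drive the relevancy score that profiles then adjust.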
The processed query information can then be combined with the configured Scoring Profiles to retrieve the most relevant documents for the user.
Figure 2: Combination of Query Term and Relevancy Signal Matching
Profile Execution Stage: Re-Scoring#
Not every profile is suitable for all documents during the initial query phase, because each additional ranking clause adds latency.
Profile Execution Stages therefore let you apply a ranking profile to either of the following:
All relevant documents that match the overall user query (stage: query).
Only the most relevant subset of the top N ranked documents that match the search query (stage: rescore).
Figure 3: Applying Scoring Profiles on different stages.
Re-scoring: Precision meets Performance#
Performance#
Re-scoring is especially useful for improving precision: it reorders just the top documents returned by the query phase using a secondary (more expensive) algorithm, instead of applying that algorithm to all matching documents.
This is important to consider when using the ScriptProfile.
Precision#
Furthermore, Re-scoring helps to combine global relevance information with query-centric relevance signals in a more meaningful way.
An example of this is adding PageRank scores to the final ranking only. PageRank is a measure of the importance or informativeness of a document within a hyperlinked corpus of documents. Since the PageRank score is independent of a user query, it is a global feature of the corpus. The document relevance score (BM25), on the other hand, is dependent on a user query. Blindly combining the two scores (e.g., by multiplication) can easily result in one score overshadowing the other.
A more robust strategy is to use BM25 for coarse-grained selection of relevant documents in relation to the user query (recall), and then to re-evaluate the top-scoring documents in relation to their PageRank score (improving precision).
{
"important_documents": {
"script": {
"source": "saturation(doc['kw_float']['pagerank'].value, 10)"
},
"debug": true,
"stage": "rescore",
"config": {
"rescore": {
"query_weight": 0.5,
"rescore_query_weight": 5,
"score_mode": "total",
"window_size": 50
}
}
}
}
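How the rescore settings interact can be sketched in a few lines of Python. This is a simplified single-shard model, not the Elasticsearch implementation: it assumes that score_mode "total" adds the two weighted scores and that documents outside the window keep their original score and rank below the window. The function names and sample documents are hypothetical.

```python
def saturation(value, pivot):
    """Same shape as the script above: value / (pivot + value)."""
    return value / (pivot + value)


def rescore(hits, rescore_fn, window_size=50, query_weight=1.0,
            rescore_query_weight=1.0):
    """Re-rank the top `window_size` hits; each hit is (doc_id, score, doc)."""
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    window, rest = ranked[:window_size], ranked[window_size:]
    rescored = [
        # score_mode "total": weighted original score plus weighted rescore score
        (doc_id, query_weight * score + rescore_query_weight * rescore_fn(doc), doc)
        for doc_id, score, doc in window
    ]
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    return rescored + rest


hits = [
    ("a", 3.0, {"pagerank": 0}),
    ("b", 2.5, {"pagerank": 90}),
    ("c", 1.0, {"pagerank": 990}),
]
result = rescore(
    hits,
    rescore_fn=lambda doc: saturation(doc["pagerank"], 10),
    window_size=2,
    query_weight=0.5,
    rescore_query_weight=5,
)
order = [doc_id for doc_id, _, _ in result]
```

With a window of two, document "b" overtakes "a" because of its PageRank, while "c" is never re-evaluated despite having the highest PageRank. The expensive scoring function runs only on the window, which is the performance benefit described above.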
Rescore Configuration Reference
Reference: Rescore Configuration
- window_size
  Type: int
  Required: False
  Default: 50
  Controls the number of top-ranked documents that are examined per shard.
- query_weight
  Type: float
  Required: False
  Default: 1.0
  Controls the relative importance of the original query.
- rescore_query_weight
  Type: float
  Required: False
  Default: 1.0
  Controls the relative importance of the applied rescore profile.
- score_mode
  Type: string
  Required: False
  Default: "total"
  Controls how the two scores (original and rescore) are combined.
Changelog#
Squirro 3.6.1: Initial Release of Scoring Profiles.
Squirro 3.6.2: Added support for native Elasticsearch Scripts using the ScriptProfile.
Squirro 3.6.3: Introduced the concept of Profile Execution Stages (rescore vs. query).