How to Use Scoring Profiles to Customize Document Relevancy Scoring#
Warning
This feature is in technical preview and may change in a future release.
Introduction#
Squirro Cognitive Search uses a default scoring algorithm (BM25) to compute an initial relevancy score that indicates how relevant a document is to the given full-text search query. The relevancy score is then used to rank documents from highly to partially relevant, so that ideally the top results contain the right information.
But relevancy is not a static concept and depends highly on the overall business objectives (use case), user preferences or on other metrics. For example, you might want to promote documents which are very popular (have been read many times), promote documents that have been recently modified, or perhaps boost documents which are tagged with the label official.
In Squirro Cognitive Search, Scoring Profiles and Scoring Roles can be used to drive relevancy:
Scoring Profiles define how additional ranking query clauses are built.
Scoring Roles define which profiles are applied, based on the current user.
When should you add Scoring Profiles#
Scoring Profiles should be added whenever the default scoring algorithm based solely on text matching does not meet your expectations and you want to consider additional document features for document relevancy ranking.
Figure 1: Introducing Relevancy Signals into Document Retrieval

Anatomy of a Search#
A search query gets piped through the configured Query Processing Workflow in order to preprocess the query before forwarding it to the Search Engine. Common preprocessing tasks involve:
Removal of unwanted terms like stopwords
Query classification: Language, Query Type (keyword vs. natural language question)
Use case specific processing
A processed example query might look like:
Original User Input
-------------------
query: "what are the annual reports of APPLE? $item_created_at > 2020"
Processed
---------
user_terms: "annual reports APPLE"
user_filters: "$item_created_at > 2020"
query_type: "question"
query_language: "en"
The processed query information can then be combined with the configured Scoring Profiles to retrieve the most relevant documents for the user.
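The preprocessing step above can be illustrated with a small Python sketch. This is not Squirro's Query Processing Workflow: the stopword list, the filter pattern, and the `preprocess` function are all hypothetical, and language detection is left out. It only shows how free-text terms and `$field` filters might be separated.

```python
import re

# Hypothetical stopword list; a real workflow would use language-specific lists.
STOPWORDS = {"what", "are", "the", "of", "a", "an", "is"}

# Assumed filter shape: "$field <operator> value", e.g. "$item_created_at > 2020".
FILTER_RE = re.compile(r"\$\w+\s*(?:>=|<=|>|<|=)\s*\S+")


def preprocess(query: str) -> dict:
    """Split a raw query into free-text terms and filter expressions."""
    filters = FILTER_RE.findall(query)
    text = FILTER_RE.sub(" ", query)
    # Naive query classification: a question mark marks a natural language question.
    query_type = "question" if "?" in text else "keyword"
    tokens = re.findall(r"\w+", text)
    terms = [token for token in tokens if token.lower() not in STOPWORDS]
    return {
        "user_terms": " ".join(terms),
        "user_filters": " ".join(filters),
        "query_type": query_type,
    }


if __name__ == "__main__":
    print(preprocess("what are the annual reports of APPLE? $item_created_at > 2020"))
```

Run on the example query, this sketch yields the same `user_terms`, `user_filters`, and `query_type` values as the processed query shown above.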
Figure 2: Combination of Query Term and Relevancy Signal Matching

Supported Profiles#
QueryProfile: Formulates additional queries based on the Squirro Query Syntax. Useful to promote documents that meet certain criteria. For boolean conditions, all matching documents are boosted equally (see ScriptProfile for more sophisticated scoring approaches). Multiple query clauses are combined with the OR operator.
ScriptProfile: Formulates an Elasticsearch Script Score Query to implement your own custom scoring algorithm on top of the default search score (supported since version 3.6.2). This is especially useful to incorporate “static” signals that are independent of the query but highly correlated to relevance. For example, you might want to promote previously worked-on items by applying a Gaussian decay function to the modified_at field.
PluginProfile: Custom Python extensions that can leverage any kind of metadata from 3rd party systems in order to achieve higher document relevancy. Currently, only the built-in PopularItem plugin can be used.
How to add Scoring Profiles#
Scoring Profiles are maintained within the Configuration Service and can be configured for each project individually.
Project Configuration topic.search.document-scoring-profiles
{
{{profile_name}}: {
{{profile_type}}: {{profile_value}}
stage: string
boost: float
debug: bool
config: dict
}
}
Reference: Scoring Profile Schema
- {{profile_name}}
  Type: string
  Required: True
  The profile name ({{}} is substituted with the actual name). Scoring Roles are linked to profiles by name.
- {{profile_type}}
  Type: string
  Required: True
  The profile type. {{profile_type}} is one of query, script, or asset_name.
- stage
  Type: string
  Required: False
  Default: "query"
  Controls whether the profile is applied to all matching documents or only to the most relevant subset. Can be either query or rescore.
- boost
  Type: float
  Default: 1.0
  Multiplies the profile's document matching score by the provided boosting factor to increase the profile's overall impact.
- debug
  Type: bool
  Required: False
  Default: False
  Controls whether the generated profile query clauses are logged. Logs are located at topic.log.jsonl or machinelearning.log.jsonl, respectively.
- config
  Type: dict
  Required: False
  Default: {}
  Profile-specific additional configuration, useful to configure custom plugins or the built-in rescore stage.
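The schema above can be sketched as a small Python helper that validates a profile and fills in the documented defaults. The `normalize_profile` function is hypothetical and not part of Squirro. It also assumes that each profile carries exactly one of the three type keys, which the schema suggests but does not state outright.

```python
PROFILE_TYPES = ("query", "script", "asset_name")
STAGES = ("query", "rescore")


def normalize_profile(name: str, profile: dict) -> dict:
    """Validate one profile and fill in the defaults from the reference above."""
    types = [key for key in PROFILE_TYPES if key in profile]
    # Assumption: exactly one profile type key per profile.
    if len(types) != 1:
        raise ValueError(f"{name}: expected exactly one of {PROFILE_TYPES}, got {types}")

    # Documented defaults: stage "query", boost 1.0, debug False, config {}.
    normalized = {"stage": "query", "boost": 1.0, "debug": False, "config": {}}
    normalized.update(profile)

    if normalized["stage"] not in STAGES:
        raise ValueError(f"{name}: stage must be one of {STAGES}")
    normalized["boost"] = float(normalized["boost"])
    return normalized


if __name__ == "__main__":
    print(normalize_profile("from_knowledge_base", {"query": "source:faq"}))
```

A check like this could run before a configuration is saved, so that a typo in `stage` or a missing profile type is caught early.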
Query Profile#
Note
The QueryProfile uses a Syntax Parser that combines statements using the OR operator (in contrast to the AND operator used to parse user queries).
This profile supports all features that the Squirro Query Syntax offers, for example boosting on fine-grained Signals extracted from paragraphs or sentences during ingestion.
Basic Usage#
{
"recently_modified__boost_equal": {
"query": "$modified_at > now/d-3M/M"
}
}
{
"from_knowledge_base": {
"query": "source:faq is_tutorial:True^100"
}
}
Advanced Usage: Personalization#
Query clauses can be personalized with user information through Jinja templating.
Note: The templated information has to be available via the User Service.
{
"author_is_contributor": {
"query": "author:{{user}}^100"
}
}
{
"user_interest_aligns": {
"query": "{% for interest in interests%} tag:{{interest}}^10 {% endfor %}"
}
}
Script Profile#
The ScriptProfile allows the highest flexibility to modify relevancy scoring, but comes with performance implications and should be used with caution.
The generated clause gets applied directly on the top-level Squirro-Item and can only access common Fields and dynamically created Labels (no support for paragraph level Signals/Entities).
"important_documents": {
"script": {
"source": "saturation(doc['kw_float']['pagerank'].value, 10)"
},
"debug": true
}
"recently_modified__boost_decay": {
"script": {
"source": "decayDateGauss(params.origin, params.scale, params.offset, params.decay, doc['modified_at'].value)",
"params": {
"origin": "now",
"scale": "30d",
"offset" : "0",
"decay" : 0.3
}
},
"boost": 10,
"debug": true
}
Note: The origin parameter supports date strings like 2022-08-01, 2022-08-01T12:00:00Z or now (current day).
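To get a feeling for the two script functions used above, the following Python sketch reproduces the formulas that Elasticsearch documents for `saturation` and for Gaussian decay. It is only an illustration of the arithmetic: distances are given in days rather than as date values, and the function names `saturation` and `gauss_decay` are local to this sketch.

```python
import math


def saturation(value: float, pivot: float) -> float:
    """Saturation as documented by Elasticsearch: value / (value + pivot)."""
    return value / (value + pivot)


def gauss_decay(distance: float, scale: float, offset: float, decay: float) -> float:
    """Gaussian decay: 1.0 within the offset, `decay` at offset + scale."""
    sigma_squared = -(scale ** 2) / (2.0 * math.log(decay))
    effective = max(0.0, abs(distance) - offset)
    return math.exp(-(effective ** 2) / (2.0 * sigma_squared))


if __name__ == "__main__":
    # Same parameters as recently_modified__boost_decay: scale 30d, offset 0, decay 0.3.
    for days in (0, 15, 30, 60):
        score = gauss_decay(days, scale=30, offset=0, decay=0.3)
        print(f"modified {days:>2} days ago -> decay score {score:.4f}")
    print(f"pagerank 10 with pivot 10 -> {saturation(10, 10):.2f}")
```

With these parameters, a document modified today scores 1.0 and a document modified 30 days ago scores 0.3, before the profile's `boost` is applied. The saturation function maps any non-negative value into the range 0 to 1 and reaches 0.5 when the value equals the pivot.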
Plugin Profiles#
Custom Python extensions that implement the RankClauseBuilder interface (extensibility feature is currently under development). They can leverage any kind of metadata from 3rd party systems in order to achieve higher document relevancy.
Plugin Profiles introduce the same flexibility to the generation of a Search Query (DSL) as Pipelets do for the Data Ingestion Pipeline.
Rank on Popular Items#
This built-in plugin keeps track of the popularity of Items and adds additional boosting queries ad-hoc without relying on pre-computed popularity-scores attached to the items.
{
"popular_items": {
"asset_name": "popular_item",
"config": {
"last_months": 3,
# boost applies only if item was read at least 5 times
"min_popularity": 5,
# scope can be `project` or `user`
"scope": "project"
}
}
}
How to apply Profiles using Scoring Roles#
Project Configuration topic.search.document-scoring-roles
Scoring Roles define which Scoring Profiles actually get executed.
Some Profiles make sense for all users, whereas others should apply only to a certain group of people.
The role configuration offers a versatile way to map users to Scoring Profiles. Two options for role-profile mapping are supported:
Groups: Squirro maintains group relationships for all users. This information can be used to enable Profiles only for users who are members of a certain Squirro Group.
User Data: Squirro can also load and store key-value data from a 3rd party system, for example the department a user belongs to in the source system. This data can also be extended with custom user preferences.
{
{{role_name}}: {
enabled: bool
applies_to: {
groups: []
"${{user_value}}": []
}
profiles: []
}
}
Reference: Scoring Role Schema
- {{role_name}}
  Type: string
  Required: True
  The role name ({{}} is substituted with the actual name).
- enabled
  Type: bool
  Required: False
  Default: True
  Controls whether the role is evaluated and used.
- applies_to
  Type: dict
  Required: False
  Default: {}
  Controls for whom the role is applied. Works with Squirro Group mapping (key groups) and with dynamically evaluated user information coming from 3rd party authentication providers (key ${{user_value}}, like $department). Dynamically mapped keys have to start with $ to be evaluated. By default, the role is applied to all users.
- profiles
  Type: List[str]
  Required: True
  List of associated Scoring Profiles that should be executed.
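Since roles are linked to profiles by name, a typo in either configuration is easy to make. The following Python sketch checks a roles dictionary against a profiles dictionary using only the rules from the two schema references. The `validate_roles` helper is hypothetical and not part of Squirro.

```python
def validate_roles(roles: dict, profiles: dict) -> list:
    """Return a list of readable problems; an empty list means no issue was found."""
    problems = []
    for role_name, role in roles.items():
        names = role.get("profiles")
        # The schema declares profiles as a required List[str].
        if not isinstance(names, list):
            problems.append(f"{role_name}: 'profiles' must be a list of profile names")
            continue
        for profile_name in names:
            if profile_name not in profiles:
                problems.append(f"{role_name}: unknown profile '{profile_name}'")
        # Only 'groups' and keys starting with '$' are evaluated in applies_to.
        for key in role.get("applies_to", {}):
            if key != "groups" and not key.startswith("$"):
                problems.append(f"{role_name}: applies_to key '{key}' is not evaluated")
    return problems


if __name__ == "__main__":
    profiles = {"popular_items": {"asset_name": "popular_item"}}
    roles = {
        "boost_for_engineering": {
            "applies_to": {"department": ["engineering"]},
            "profiles": ["popular_item"],
        }
    }
    for problem in validate_roles(roles, profiles):
        print(problem)
```

In the usage example, the sketch reports both the misspelled profile name and the `department` key that is missing its `$` prefix.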
Role for all users#
{
"boost_recently_modified": {
"enabled": true,
"applies_to": {},
"profiles": ["recently_modified__boost_decay"]
}
}
Note: A missing or empty applies_to key enables the role for everyone.
Role for specific users / groups#
Arbitrary user metadata can be referenced using the $ prefix, like applies_to.$department.
Squirro Groups are referenced like applies_to.groups.
{
"boost_popular_items_for_IT": {
"enabled": true,
"applies_to": {
"$department": "engineering"
},
"profiles": ["popular_items"]
}
}
{
"another_example": {
"enabled": true,
"applies_to": {
"groups": ["internal"],
"$department": ["engineering", "sales"]
},
"profiles": ["popular_items"]
}
}
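The role examples above can be read as a matching procedure, sketched here in Python. This is not Squirro's implementation. In particular, it assumes that all keys inside `applies_to` must match (AND), that a list of values matches if any value matches, and that scalar values are accepted in place of single-element lists. The names `role_applies` and `active_profiles` are hypothetical.

```python
def as_list(value):
    """Accept both a single value and a list of values."""
    return value if isinstance(value, list) else [value]


def role_applies(role: dict, user_groups: list, user_data: dict) -> bool:
    """Decide whether a role applies to a user (assumed AND semantics across keys)."""
    if not role.get("enabled", True):
        return False
    # A missing or empty applies_to enables the role for everyone.
    applies_to = role.get("applies_to") or {}
    for key, expected in applies_to.items():
        expected = as_list(expected)
        if key == "groups":
            if not set(expected) & set(user_groups):
                return False
        elif key.startswith("$"):
            # "$department" is looked up as "department" in the user data.
            if user_data.get(key[1:]) not in expected:
                return False
    return True


def active_profiles(roles: dict, user_groups: list, user_data: dict) -> list:
    """Collect the profile names of all roles that apply, without duplicates."""
    names = []
    for role in roles.values():
        if role_applies(role, user_groups, user_data):
            for name in as_list(role["profiles"]):
                if name not in names:
                    names.append(name)
    return names


if __name__ == "__main__":
    roles = {
        "boost_recently_modified": {"profiles": ["recently_modified__boost_decay"]},
        "another_example": {
            "applies_to": {"groups": ["internal"], "$department": ["engineering", "sales"]},
            "profiles": ["popular_items"],
        },
    }
    print(active_profiles(roles, ["internal"], {"department": "sales"}))
    print(active_profiles(roles, ["external"], {"department": "sales"}))
```

Under these assumptions, the internal sales user receives both profiles, while the external user only receives the profile from the role that applies to everyone.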
Profile Execution Stage: Re-Scoring#
Not all profiles are suitable for application to all documents during the initial query phase (each additional ranking clause has an impact on latency). Profile Execution Stages therefore allow ranking profiles to be applied to either:
All relevant documents that match the overall user query (stage: query)
Only the most relevant subset of the top N ranked documents that match the search query (stage: rescore)
Figure 3: Applying Scoring Profiles on different stages.

Re-scoring: Precision meets Performance#
Performance#
Re-scoring is especially useful to improve precision by reordering just the top documents returned by the query phase, using a secondary (more expensive) algorithm, instead of applying the expensive algorithm to all matching documents.
This is important to consider when using the Script Profile.
Precision#
Furthermore, Re-scoring helps to combine global relevance information with query-centric relevance signals in a more meaningful way.
An example of this is adding PageRank scores to the final ranking only. PageRank is a measure of the importance or informativeness of a document within a hyperlinked corpus of documents. Since the PageRank score is independent of a user query, it is a global feature of the corpus. The document relevance score (BM25), on the other hand, is dependent on a user query. Blindly combining the two scores (e.g., by multiplication) can easily result in one score overshadowing the other.
A more robust strategy is to use BM25 for coarse-grained selection of relevant documents in relation to the user query (recall), with subsequent re-evaluation of the top-scoring documents in relation to their PageRank score (improving precision).
"important_documents": {
"script": {
"source": "saturation(doc['kw_float']['pagerank'].value, 10)"
},
"debug": true,
"stage": "rescore",
"config": {
"rescore": {
"query_weight": 0.5,
"rescore_query_weight": 5,
"score_mode": "total",
"window_size": 50
}
}
}
Reference: Rescore Configuration
- window_size
  Type: int
  Required: False
  Default: 50
  Controls the number of top-ranked documents that are examined per shard.
- query_weight
  Type: float
  Required: False
  Default: 1.0
  Controls the relative importance of the original query.
- rescore_query_weight
  Type: float
  Required: False
  Default: 1.0
  Controls the relative importance of the applied rescore profile.
- score_mode
  Type: string
  Required: False
  Default: "total"
  Controls how the two scores (original and rescore) are combined.
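How these parameters interact can be sketched in a few lines of Python. The sketch follows the Elasticsearch rescore semantics for `score_mode: total`, where the final score is `query_weight * original + rescore_query_weight * rescore`. It is simplified to a single shard, and documents outside the window simply keep their first-pass order. The `rescore` function, the document IDs, and the PageRank values are made up for illustration.

```python
def rescore(hits, rescore_fn, window_size=50, query_weight=1.0, rescore_query_weight=1.0):
    """Re-rank the top `window_size` hits; `hits` is a list of (doc_id, query_score)."""
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    window, rest = ranked[:window_size], ranked[window_size:]
    # score_mode "total": weighted sum of the original and the rescore score.
    rescored = [
        (doc_id, query_weight * score + rescore_query_weight * rescore_fn(doc_id))
        for doc_id, score in window
    ]
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    # Simplification: documents outside the window stay behind the rescored ones.
    return rescored + rest


if __name__ == "__main__":
    pagerank = {"a": 1.0, "b": 90.0, "c": 500.0}  # made-up values

    def pagerank_saturation(doc_id):
        value = pagerank[doc_id]
        return value / (value + 10)  # same shape as saturation(value, 10)

    hits = [("a", 12.0), ("b", 11.5), ("c", 3.0)]  # BM25 scores from the query stage
    for doc_id, score in rescore(hits, pagerank_saturation, window_size=2,
                                 query_weight=0.5, rescore_query_weight=5):
        print(doc_id, round(score, 3))
```

In this example, document `b` overtakes `a` because its PageRank signal outweighs the small BM25 difference, while `c` is never rescored despite its high PageRank because it falls outside the window of two documents.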
Changelog#
New in version 3.6.1: Initial Release of Scoring Profiles
New in version 3.6.2: Added support for native Elasticsearch Scripts using Script Profile
New in version 3.6.3: Introduced concept of Profile Execution-Stages (rescore vs. query)