Scoring Profiles and Roles#

Profiles: Project Creator, Search Engineer

This page presents an overview of Scoring Profiles and Scoring Roles within Squirro Cognitive Search.

Scoring profiles and roles are configured by search engineers, then used by project creators to fine-tune the search experience for end users.

Background: Document Relevancy#

Squirro Cognitive Search uses a default scoring algorithm (BM25) to compute an initial relevancy score that indicates how relevant a document is to a given full-text search query. The relevancy score is then used to rank documents from most to least relevant, so that, ideally, the top results contain the right information.

However, relevancy is not a static concept. Within a specific project, relevancy may depend on the overall business objectives (use cases), user preferences, or other metrics.

For example, in one project you might want to prioritize popular documents, whereas in another you might want to promote documents that have been recently modified.

In Squirro Cognitive Search, Scoring Profiles and Scoring Roles can be used to fine-tune relevancy in the following ways:

  • Scoring Profiles define how additional ranking query clauses are built.

  • Scoring Roles define which profiles are applied based on the current user.

Reference: Learn more about Document Relevancy.

Scoring Profiles#

Scoring profiles use document metadata as additional ranking criteria, so that the documents most relevant under the selected scoring profile are returned first.

Scoring profiles can either reference a configured profile from the project configuration by name or leverage a plugin without any project configuration required.

Out-of-the-Box Scoring Profiles

Scoring profile plugins shipped out of the box include the following:

  • QueryProfile

  • ScriptProfile

  • PluginProfile

QueryProfile#

This plugin formulates additional queries based on Query Syntax.

It’s useful for promoting documents that meet certain criteria.

For boolean conditions, all matching documents are equally boosted (see the ScriptProfile for more sophisticated scoring approaches).

Note

The QueryProfile uses a Syntax Parser that combines statements using the OR operator (in contrast to the AND operator used to parse user queries).

This profile supports all features that the Squirro Query Syntax offers, for example boosting fine-grained signals extracted from paragraphs or sentences during ingestion.

Reference: To learn more about using scoring profiles within Squirro query syntax, see Scoring Profiles and Queries.

Example Usage#

The following example equally promotes all items that have been modified within the last three months:

{
    "recently_modified__boost_equal": {
        "query": "$modified_at > now/d-3M/M"
    }
}

The following example promotes items that were loaded from a specific source (faq) OR that were classified at data ingestion time (as being a tutorial, for example):

{
    "from_knowledge_base": {
        "query": "source:faq is_tutorial:True^100"
    }
}

Advanced Usage: Personalization#

Personalized query clause generation leverages user information through Jinja Templating.

Note: The templated information must be available via the User Service.

The following example boosts documents where the current user is one of the authors:

{
    "author_is_contributor": {
        "query": "author:{{user}}^100"
    }
}

The following example boosts documents that align with the user’s interests:

{
    "user_interest_aligns": {
        "query": "{% for interest in interests %} tag:{{interest}}^10 {% endfor %}"
    }
}
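To make the effect of the loop concrete, the following minimal sketch mimics the expansion in plain Python instead of calling Jinja. The interest values and the helper name `expand_interest_clauses` are made up for illustration, and the string actually rendered by the template may differ in whitespace.

```python
def expand_interest_clauses(interests, boost=10):
    """Build one boosted tag clause per interest, separated by spaces.

    Plain-Python stand-in for the Jinja loop above (illustration only,
    not part of any Squirro API).
    """
    return " ".join(f"tag:{interest}^{boost}" for interest in interests)


# Hypothetical interests for the current user.
query = expand_interest_clauses(["finance", "risk"])
print(query)  # tag:finance^10 tag:risk^10
```

Because the QueryProfile combines statements with OR, each generated clause acts as an independent alternative: a document tagged with any one of the interests can be boosted.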

ScriptProfile#

Using this plugin, you can formulate an Elasticsearch Script Score Query to implement your own custom scoring algorithm on top of the default search score.

This is useful for incorporating “static” signals that are independent of the query but highly correlated to relevance.

Example: You can promote recently modified items by applying a Gaussian Decay Function on the modified_at field.

The ScriptProfile plugin offers the most flexibility for modifying relevancy scoring, but it has performance implications and should be used with caution.

The generated clause is applied directly to the top-level Squirro-Item and can access only common Fields and dynamically created Labels. Paragraph-level Signals/Entities are not supported.

Example Usage#

The following example boosts documents that are more important in your domain based on pre-computed centrality scores like PageRank:

"important_documents": {
    "script": {
        "source": "saturation(doc['kw_float']['pagerank'].value, 10)"
    },
    "debug": true
}
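As a rough illustration of what the script computes, the Elasticsearch saturation function maps a non-negative value to value / (pivot + value). The sketch below re-implements that formula in Python with made-up PageRank values; it is not how Squirro evaluates the script.

```python
def saturation(value: float, pivot: float) -> float:
    """Re-implementation of value / (pivot + value), for illustration only."""
    return value / (pivot + value)


# Made-up PageRank values; the pivot of 10 matches the example above.
for pagerank in (0.0, 1.0, 10.0, 100.0):
    print(pagerank, round(saturation(pagerank, 10), 3))
```

The result stays below 1 and equals 0.5 when the value equals the pivot, so very large PageRank values cannot grow the boost without bound.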

The following example boosts documents according to how recently they were modified, so that recent changes count more than older ones:

"recently_modified__boost_decay": {
    "script": {
        "source": "decayDateGauss(params.origin, params.scale, params.offset, params.decay, doc['modified_at'].value)",
        "params": {
            "origin": "now",
            "scale": "30d",
            "offset" : "0",
            "decay" : 0.3
        }
    },
    "boost": 10,
    "debug": true
}

Note: The origin parameter supports date strings like 2022-08-01, 2022-08-01T12:00:00Z or now (current day).
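To get a feel for the parameters, the sketch below evaluates the Gaussian decay formula documented for Elasticsearch decay functions in plain Python. It works on a document age in days rather than on parsed dates, which is a simplification, and it does not model how the profile's boost is combined with the result.

```python
import math


def gauss_decay(age_days, scale_days=30.0, offset_days=0.0, decay=0.3):
    """Gaussian decay: 1.0 at the origin, `decay` at a distance of `scale_days`.

    Illustrative re-implementation; the age-in-days input is an assumption
    made to keep the example free of date parsing.
    """
    # sigma^2 is chosen so that the score equals `decay` at distance == scale.
    sigma_sq = -(scale_days ** 2) / (2.0 * math.log(decay))
    distance = max(0.0, abs(age_days) - offset_days)
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))


for age in (0, 15, 30, 60):
    print(age, round(gauss_decay(age), 4))
```

With the values from the example (scale 30d, decay 0.3), a document modified today scores 1.0, one modified 30 days ago scores 0.3, and the score falls off quickly beyond that.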

PluginProfile#

This plugin uses custom Python extensions that can leverage any kind of metadata from third-party systems to achieve higher document relevancy.

The built-in PopularItem plugin can be used with PluginProfile.

Custom Python extensions implement the RankClauseBuilder interface (this extensibility feature is currently under development).

Plugin Profiles introduce the same flexibility to the generation of a Search Query (DSL) as Pipelets do for the Data Ingestion Pipeline.

Plugin Reference:#

To learn more about available Scoring Plugins, see Plugin Reference.

This lists available plugins and their configuration options.

Scoring Roles#

Scoring Roles define which Scoring Profiles are actually executed.

Some profiles make sense for all users, whereas others should apply only to a certain group of people.

Role configuration provides a flexible way to map project users to scoring profiles.

Reference: To learn more about how to configure Scoring Roles, see How to Use Scoring Profiles to Customize Document Relevancy Scoring.
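Conceptually, a role acts as a lookup from a user's group to a list of profile names. The sketch below illustrates only that idea: the function, the dictionary layout, and the group names are hypothetical and do not reflect Squirro's actual role configuration format, which is described in the how-to guide referenced above.

```python
def profiles_for_user(user_groups, role_to_profiles, default_profiles=()):
    """Collect the profiles for a user: defaults first, then per-group extras.

    Hypothetical helper for illustration; not part of any Squirro API.
    """
    selected = list(default_profiles)
    for group in user_groups:
        for profile in role_to_profiles.get(group, []):
            if profile not in selected:  # avoid applying a profile twice
                selected.append(profile)
    return selected


# Hypothetical mapping that reuses profile names from the examples on this page.
ROLE_TO_PROFILES = {
    "support": ["from_knowledge_base"],
    "authors": ["author_is_contributor"],
}

print(profiles_for_user(
    ["support"],
    ROLE_TO_PROFILES,
    default_profiles=["recently_modified__boost_equal"],
))
```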

Profile Execution Stage: Re-Scoring#

Not all profiles are suitable for being applied to all documents during the initial query phase, because each additional ranking clause adds latency.

Profile Execution Stages therefore let you apply ranking profiles on either of the following:

  • All relevant documents that match the overall user query (stage: query)

  • Only on the most relevant subset of top N ranked documents that match the search query (stage: rescore).

Figure 3: Applying Scoring Profiles on different stages.

../../../_images/search-pipeline_scoring_profile_staging_query_example.drawio.png

Re-scoring: Precision meets Performance#

Performance#

Re-scoring is especially useful to improve precision by reordering just the top documents returned by the query phase, using a secondary (more expensive) algorithm, instead of applying the expensive algorithm to all matching documents.

This is important to consider when using the ScriptProfile, which comes with performance implications.

Precision#

Furthermore, Re-scoring helps to combine global relevance information with query-centric relevance signals in a more meaningful way.

An example of this is adding PageRank scores to the final ranking only. PageRank is a measure of the importance or informativeness of a document within a hyperlinked corpus of documents. Since the PageRank score is independent of a user query, it is a global feature of the corpus. The document relevance score (BM25), on the other hand, is dependent on a user query. Blindly combining the two scores (e.g., by multiplication) can easily result in one score overshadowing the other.

A more robust strategy is to use BM25 for coarse-grained selection of relevant documents in relation to the user query (recall), with subsequent re-evaluation of the top-scoring documents in relation to their PageRank score (improving precision).

The following example applies rescoring with the pre-computed PageRank score to the top-ranked items only:

"important_documents": {
    "script": {
        "source": "saturation(doc['kw_float']['pagerank'].value, 10)"
    },
    "debug": true,
    "stage": "rescore",
    "config": {
        "rescore": {
            "query_weight": 0.5,
            "rescore_query_weight": 5,
            "score_mode": "total",
            "window_size": 50
        }
    }
}
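The interplay of window_size, query_weight, and rescore_query_weight can be sketched in a few lines of Python. This is a simplified single-shard model of the "total" score mode with made-up scores. It assumes that documents outside the window simply keep their first-pass score and order, which may differ from what the search backend does in detail.

```python
def rescore(hits, secondary_scores, window_size=50,
            query_weight=1.0, rescore_query_weight=1.0):
    """Re-rank the top `window_size` hits with a weighted sum of both scores.

    hits: list of (doc_id, first_pass_score) tuples.
    secondary_scores: dict of doc_id -> rescore profile score.
    Simplified illustration only (single shard, score_mode "total").
    """
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    window, rest = ranked[:window_size], ranked[window_size:]
    combined = [
        (doc_id,
         query_weight * score
         + rescore_query_weight * secondary_scores.get(doc_id, 0.0))
        for doc_id, score in window
    ]
    combined.sort(key=lambda hit: hit[1], reverse=True)
    # Assumption: hits outside the window are appended unchanged.
    return combined + rest


# Made-up BM25 scores and saturated PageRank signals.
hits = [("a", 12.0), ("b", 11.0), ("c", 3.0)]
pagerank_signal = {"a": 0.1, "b": 0.9, "c": 0.99}

reranked = rescore(hits, pagerank_signal, window_size=2,
                   query_weight=0.5, rescore_query_weight=5)
print([doc_id for doc_id, _ in reranked])  # order: b, a, c
```

In this toy run, document "b" overtakes "a" because of its stronger PageRank signal, while "c" is never rescored despite having the highest PageRank: it did not make it into the window, so the expensive signal only reorders documents that BM25 already considers relevant.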

Rescore Configuration Reference

Reference: Rescore Configuration

window_size
Type: int
Required: False
Default: 50

Controls the number of top-ranked documents that are examined per shard.

query_weight
Type: float
Required: False
Default: 1.0

Controls the relative importance of the original query.

rescore_query_weight
Type: float
Required: False
Default: 1.0

Controls the relative importance of the applied rescore profile.

score_mode
Type: string
Required: False
Default: "total"

Controls how the two scores (original and rescore) are combined.

Changelog#

  • Squirro 3.6.1: Initial Release of Scoring Profiles.

  • Squirro 3.6.2: Added support for native Elasticsearch Scripts using Script Profile.

  • Squirro 3.6.3: Introduced concept of Profile Execution-Stages (rescore vs. query).