Scoring Profiles and Roles#

Profiles: Project Creator, Search Engineer

This page presents an overview of Scoring Profiles and Scoring Roles within Squirro Cognitive Search.

Scoring profiles and roles are configured by search engineers, then used by project creators to fine-tune the search experience for end users.

Note

Optimizing each stage of the retrieval pipeline is crucial for achieving highly accurate search results.

Background: Document Relevancy#

Squirro Cognitive Search uses a default scoring algorithm (BM25) to compute an initial relevancy score that indicates how relevant a document is to the given full-text search query. The relevancy score is then used to rank documents from highly to partially relevant so that, ideally, the top results contain the right information.

However, relevancy is not a static concept. Within a specific project, relevancy may depend on the overall business objectives (use cases), user preferences, or other metrics.

For example, given two projects, in the first you might want to prioritize popular documents, whereas in the second you want to promote documents that have been recently modified.

In Squirro Cognitive Search, Scoring Profiles and Scoring Roles can be used to fine-tune relevancy in the following ways:

Scoring Profiles define how additional ranking query clauses are built.
Scoring Roles define which profiles are applied based on the current user.

Reference: Learn more about Document Relevancy.

Scoring Profiles#

Project Configuration topic.search.document-scoring-profiles

Scoring profiles use document metadata as additional filtering criteria to return the most relevant documents according to the selected scoring profile.

Scoring profiles can either reference a configured profile from the project configuration by name or leverage a plugin without any project configuration required.

Reference: Learn more about How to Use Scoring Profiles to Customize Document Relevancy Scoring.

Out-of-the-Box Scoring Profiles

Scoring profile plugins shipped out of the box include the following:

QueryProfile
ScriptProfile
PluginProfile

Note

All named Scoring Profiles can be run manually from the search bar using the profile:{name:<PROFILE_NAME>} query syntax. The name: prefix is required to explicitly reference a named scoring profile (as defined in the project settings). Without the prefix, a clause such as profile:{last_read} only looks up installed scoring plugins by their name.
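The difference between the two forms can be illustrated with a small helper that only builds the clause strings described in the note. The helper name `profile_clause` and its `named` flag are hypothetical and not part of any Squirro API.

```python
def profile_clause(name: str, named: bool = True) -> str:
    """Build a profile clause for the search bar (illustrative helper only).

    named=True  -> references a scoring profile defined in the project settings
    named=False -> references an installed scoring plugin by its name
    """
    if named:
        return f"profile:{{name:{name}}}"
    return f"profile:{{{name}}}"


# A named profile from the project configuration:
print(profile_clause("recently_modified__boost_equal"))
# An installed scoring plugin, looked up by name:
print(profile_clause("last_read", named=False))
```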

QueryProfile#

This plugin formulates additional queries based on Query Syntax.

It’s useful for promoting documents that meet certain criteria.

For boolean conditions, all matching documents are equally boosted (see the ScriptProfile for more sophisticated scoring approaches).

Note

The QueryProfile uses a Syntax Parser that combines statements using the OR operator (in contrast to the AND operator used to parse user queries).

This profile supports all features that the Squirro Query Syntax offers, for example boosting fine-grained signals extracted from paragraphs or sentences during ingestion.

Reference: To learn more about using scoring profiles within Squirro query syntax, see Scoring Profiles and Queries.

Example Usage#

The following example promotes items equally that have been modified within the last three months:

{
    "recently_modified__boost_equal": {
        "query": "$modified_at > now/d-3M/M"
    },
}

The following example promotes items that were loaded from a specific source (faq) OR that were classified during data ingestion (as being a tutorial, for example):

{
    "from_knowledge_base": {
        "query": "source:faq is_tutorial:True^100"
    }
}

Advanced Usage: Personalization#

Personalized query clause generation leverages user information through Jinja Templating.

Note: The templated information must be available via the User Service.

The following example boosts documents where the current user is one of the authors:

{
    "author_is_contributor": {
        "query": "author:{{user}}^100"
    }
}

The following example boosts documents that align with the user’s interests:

{
    "user_interest_aligns": {
        "query": "{% for interest in interests %} tag:{{interest}}^10 {% endfor %}"
    }
}
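To see what such a loop template expands to, the sketch below reproduces the expansion in plain Python for a hypothetical user record. It does not call the Jinja engine, and the `interests` values are made up for illustration. In practice the fields come from the User Service.

```python
def expand_interest_template(interests: list[str], boost: int = 10) -> str:
    """Emulate the loop template: one boosted tag clause per interest."""
    # Each loop iteration of the template emits " tag:<interest>^<boost> ".
    rendered = "".join(f" tag:{interest}^{boost} " for interest in interests)
    # Collapse the surrounding whitespace for readability.
    return " ".join(rendered.split())


# Hypothetical user record (assumption, not real User Service output).
user = {"interests": ["finance", "sustainability"]}
print(expand_interest_template(user["interests"]))
# tag:finance^10 tag:sustainability^10
```

Because the QueryProfile combines statements with OR, a document that matches either tag would be boosted.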

ScriptProfile#

Using this plugin, you can formulate an Elasticsearch Script Score Query to implement your own custom scoring algorithm on top of the default search score.

This is useful for incorporating “static” signals that are independent of the query but highly correlated to relevance.

Example: You can promote recently modified items by applying a Gaussian decay function to the modified_at field.

The ScriptProfile plugin offers the most flexibility for modifying relevancy scoring, but it has performance implications and should be used with caution.

The generated clause gets applied directly on the top-level Squirro-Item and can only access common Fields and dynamically created Labels. There is no support for paragraph-level Signals/Entities.

Example Usage#

The following example boosts documents that are more important in your domain based on pre-computed centrality scores like PageRank:

"important_documents": {
    "script": {
        "source": "saturation(doc['kw_float']['pagerank'].value, 10)"
    },
    "debug": true
}
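Assuming the script relies on Elasticsearch's built-in saturation function, which computes value / (k + value), the following sketch shows how a pivot of 10 maps raw PageRank values into the range from 0 up to (but not including) 1. The sample PageRank values are illustrative only.

```python
def saturation(value: float, pivot: float) -> float:
    """value / (pivot + value): grows with value but always stays below 1."""
    return value / (pivot + value)


# Illustrative PageRank values; a value equal to the pivot yields exactly 0.5.
for pagerank in (0, 1, 10, 100, 1000):
    print(pagerank, round(saturation(pagerank, 10), 3))
```

The pivot controls where the curve flattens: values far above it contribute almost the same score, which keeps a few extremely high PageRank values from dominating the ranking.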

The following example boosts recently modified documents using a decay function, so that recent changes count more and older ones count less:

"recently_modified__boost_decay": {
    "script": {
        "source": "decayDateGauss(params.origin, params.scale, params.offset, params.decay, doc['modified_at'].value)",
        "params": {
            "origin": "now",
            "scale": "30d",
            "offset" : "0",
            "decay" : 0.3
        }
    },
    "boost": 10,
    "debug": true
}

Note: The origin parameter supports date strings like 2022-08-01, 2022-08-01T12:00:00Z or now (current day).
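The effect of the scale, offset, and decay parameters can be checked with a small worked example. The sketch below assumes the standard Gaussian decay formula used by Elasticsearch decay functions and measures document age in days. It is an illustration of the curve, not the code that runs inside the search engine.

```python
import math


def decay_gauss(age_days: float, scale_days: float = 30.0,
                offset_days: float = 0.0, decay: float = 0.3) -> float:
    """Gaussian decay: 1.0 at the origin, exactly `decay` at offset + scale."""
    # Variance chosen so that the score equals `decay` at distance `scale`.
    sigma_sq = -(scale_days ** 2) / (2.0 * math.log(decay))
    # Documents within `offset` of the origin are not penalized at all.
    effective = max(0.0, abs(age_days) - offset_days)
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))


# Score for documents modified 0, 15, 30 and 60 days before the origin.
for age in (0, 15, 30, 60):
    print(age, round(decay_gauss(age), 3))
```

With the parameters from the example profile, a document modified 30 days ago receives a decay score of 0.3, and older documents approach 0.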

PluginProfile#

This profile uses a custom ScoringPlugin that can leverage any kind of metadata from third-party systems to achieve higher document relevancy. The built-in PopularItem plugin can be used with PluginProfile.

Scoring plugins are custom Python extensions that implement the RankClauseBuilder interface (this extensibility feature is currently under development).

Plugin Profiles introduce the same flexibility to the generation of a Search Query (DSL) as Pipelets do for the Data Ingestion Pipeline. Semantic search, for example, is implemented using a plugin that uses vector embeddings to find similar documents.

Example: Predefine how semantic search should be performed#

Semantic search is usually executed by explicitly invoking the semantic ScoringPlugin with profile:{ semantic }. Custom arguments can be set to override the default settings, for example profile:{semantic worker:snowflake}. Scoring Profiles can also act as a shorthand that predefines custom settings, so that project admins and developers can simply use profile:{ name:my_semantic } and manage how semantic search is performed from a central place.

{
    "my_semantic": {
        "asset_name": "semantic",
        "config": {
            "perform_only_knn": true,
            "vector_field": "embeddings.custom_snowflake_embedding_field",
            "worker": "snowflake-v2"
        }
    }
}

Rank on Popular Items Example#

This built-in plugin keeps track of the popularity of Items and adds additional boosting queries ad-hoc without relying on pre-computed popularity scores attached to the items.

The following example applies an additional boost to popular items. The boost applies only if an item was read at least min_popularity times (here, 5), and scope can be either project or user:

{
    "popular_items": {
        "asset_name": "popular_item",
        "config": {
            "last_months": 3,
            "min_popularity": 5,
            "scope": "project"
        }
    }
}
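The selection logic implied by last_months and min_popularity can be pictured with a toy example. This is only a sketch under stated assumptions, not the plugin's implementation: the read-event log is hypothetical, last_months is assumed to limit which reads are counted, a month is approximated as 30 days, and the scope setting is not modeled.

```python
from collections import Counter
from datetime import datetime, timedelta


def popular_items(read_events, now, last_months=3, min_popularity=5):
    """Return ids of items read at least `min_popularity` times recently.

    `read_events` is a hypothetical list of (item_id, read_at) tuples.
    """
    cutoff = now - timedelta(days=30 * last_months)
    counts = Counter(
        item_id for item_id, read_at in read_events if read_at >= cutoff
    )
    return {item_id for item_id, n in counts.items() if n >= min_popularity}


now = datetime(2024, 6, 1)
events = [("doc-a", now - timedelta(days=d)) for d in range(6)]   # 6 recent reads
events += [("doc-b", now - timedelta(days=d)) for d in range(3)]  # 3 recent reads
events += [("doc-c", now - timedelta(days=200))] * 10             # 10 old reads
print(popular_items(events, now))  # only doc-a qualifies
```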

Plugin Reference#

To learn more about the available Scoring Plugins and their configuration options, see the Scoring Plugins reference.

Scoring Roles#

Project Configuration topic.search.document-scoring-roles

Scoring Roles define which Scoring Profiles are actually executed.

Some profiles make sense for all users, whereas others only need to apply to a certain group of people.

Role configuration provides a versatile way to map project users to scoring profiles.

Reference: Learn more about How to Use Scoring Profiles to Customize Document Relevancy Scoring.

Anatomy of a Search#

A search query gets piped through the configured Query Processing Workflow to preprocess the query before forwarding it to the Search Engine.

Common preprocessing tasks involve:

Removal of unwanted terms like stopwords.
Query classification: Language, Query Type (keyword vs. natural language question).
Use-case-specific processing.

A processed example query might look like the following:

Original User Input
-------------------

    query:          "what are the annual reports of APPLE? $item_created_at > 2020"

Processed
---------

    user_terms:     "annual reports APPLE"
    user_filters:   "$item_created_at > 2020"
    query_type:     "question"
    query_language: "en"

The processed query information can then be combined with the configured Scoring Profiles to retrieve the most relevant documents for the user.
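As a rough illustration of the preprocessing steps listed above, the following toy function splits filters from free-text terms, drops stopwords, and classifies the query type. The stopword list, the filter pattern, and the question heuristic are all assumptions made for this sketch. The actual Query Processing Workflow is configurable and also detects the query language, which is omitted here.

```python
import re

STOPWORDS = {"what", "are", "the", "of"}  # illustrative list only
QUESTION_WORDS = {"what", "who", "when", "where", "why", "how", "which"}
# Matches simple comparisons such as "$item_created_at > 2020".
FILTER_PATTERN = re.compile(r"\$\w+\s*[<>]=?\s*\S+")


def preprocess(query: str) -> dict:
    """Split a raw query into terms, filters, and a coarse query type."""
    filters = FILTER_PATTERN.findall(query)
    text = FILTER_PATTERN.sub(" ", query)
    tokens = re.findall(r"\w+", text)
    is_question = "?" in text or (
        bool(tokens) and tokens[0].lower() in QUESTION_WORDS
    )
    terms = [t for t in tokens if t.lower() not in STOPWORDS]
    return {
        "user_terms": " ".join(terms),
        "user_filters": " ".join(filters),
        "query_type": "question" if is_question else "keyword",
    }


print(preprocess("what are the annual reports of APPLE? $item_created_at > 2020"))
```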

Figure 2: Combination of Query Term and Relevancy Signal Matching

https://s3.amazonaws.com/download.squirro.net/docs/technical/search/relevancy/search-pipeline_scoring_profile_overview_query_example.drawio.png

Profile Execution Stage: Re-Scoring#

Not all profiles are suitable for being applied to all documents during the initial query phase, because each additional ranking clause adds latency.

Profile Execution Stages therefore allow you to apply ranking profiles to either of the following:

All relevant documents that match the overall user query (stage: query).
Only the most relevant subset, that is, the top N ranked documents that match the search query (stage: rescore).

Figure 3: Applying Scoring Profiles at different stages.

Re-scoring: Precision meets Performance#

Performance#

Re-scoring is especially useful to improve precision by reordering just the top documents returned by the query phase, using a secondary (more expensive) algorithm, instead of applying the expensive algorithm to all matching documents.

This is important to consider when using the ScriptProfile, because script-based scoring has performance implications.

Precision#

Furthermore, Re-scoring helps to combine global relevance information with query-centric relevance signals in a more meaningful way.

An example of this is adding PageRank scores to the final ranking only. PageRank is a measure of the importance or informativeness of a document within a hyperlinked corpus of documents. Since the PageRank score is independent of a user query, it is a global feature of the corpus. The document relevance score (BM25), on the other hand, is dependent on a user query. Blindly combining the two scores (e.g., by multiplication) can easily result in one score overshadowing the other.

A more robust strategy is to use BM25 for coarse-grained selection of relevant documents in relation to the user query (recall), with subsequent re-evaluation of the top-scoring documents in relation to their PageRank score (improving precision).

Apply rescoring on the pre-computed PageRank score on top ranked items only#

"important_documents": {
    "script": {
        "source": "saturation(doc['kw_float']['pagerank'].value, 10)"
    },
    "debug": true,
    "stage": "rescore",
    "config": {
        "rescore": {
            "query_weight": 0.5,
            "rescore_query_weight": 5,
            "score_mode": "total",
            "window_size": 50
        }
    }
}
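To make the interaction of window_size, query_weight, and rescore_query_weight concrete, here is a minimal sketch of rescoring with score_mode "total", where the final score is query_weight * query score + rescore_query_weight * rescore score. It assumes a single shard and simply appends hits outside the window unchanged. The document ids and scores are invented for illustration.

```python
def rescore(hits, rescore_fn, window_size=50, query_weight=1.0,
            rescore_query_weight=1.0):
    """Re-rank the top `window_size` hits using score_mode "total".

    `hits` is a list of (doc_id, query_score) tuples, sorted descending.
    `rescore_fn` maps a doc_id to its secondary score.
    """
    window, rest = hits[:window_size], hits[window_size:]
    rescored = [
        (doc_id, query_weight * score + rescore_query_weight * rescore_fn(doc_id))
        for doc_id, score in window
    ]
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    # Simplification: hits outside the window are appended unchanged.
    return rescored + rest


# Hypothetical saturated PageRank scores and BM25 query scores.
pagerank = {"a": 0.1, "b": 0.9, "c": 0.5}
hits = [("a", 12.0), ("b", 11.0), ("c", 4.0)]
print(rescore(hits, pagerank.get, window_size=2,
              query_weight=0.5, rescore_query_weight=5))
```

In this example, document b overtakes document a inside the window because its higher PageRank outweighs the small BM25 difference, while document c is never evaluated by the more expensive rescore function.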

Rescore Configuration Reference

Reference: Rescore Configuration

window_size (int, optional, default: 50): Controls the number of top-ranked documents that are examined per shard.

query_weight (float, optional, default: 1.0): Controls the relative importance of the original query.

rescore_query_weight (float, optional, default: 1.0): Controls the relative importance of the applied rescore profile.

score_mode (string, optional, default: "total"): Controls how the two scores (original and rescore) are combined.

Changelog#

Squirro 3.6.1: Initial Release of Scoring Profiles.
Squirro 3.6.2: Added support for native Elasticsearch Scripts using Script Profile.
Squirro 3.6.3: Introduced concept of Profile Execution-Stages (rescore vs. query).
