Typeahead Suggestions#

Profiles: Project Creator, Search User

The search bar within Squirro Cognitive Search offers a rich set of typeahead functionalities out of the box.

Search users receive suggestions based on typeahead configurations that the project creator can modify.

../../../_images/searchbar_suggestions.png

Based on Document Features#

Typeahead suggestions can come from three different sources:

  • Content (Key Phrases and Title), as described on this page.

  • Labels and their values.

  • Communities.

Content-based Typeahead#

Squirro provides search completion based on document features. This is in addition to the typeahead based on label values.

image31

Features#

  • Matching on key-phrases and document title

  • Fuzzy phrase-term matching

    • Optional enforcing of in-order term matching

    • Support for phrase slop: maximum number of intervening unmatched positions

      image32

Setup#

To enable content-based typeahead, add the NLP Keyphrase Tagger pipelet (found in the Discover section, see Discover (NLP Tagger)) to the corresponding pipeline workflow:

image33

This enrichment will create a hidden label nlp_tag__phrases, which contains the extracted key phrases:

image34

Based on User Interaction#

  • User’s Saved Searches

  • User’s Search History

  • Popular queries (see Popular Query Suggestions) across the project (or for a single user).

Using SquirroClient#

All suggestions can be accessed programmatically via the SquirroClient (Python SDK).

For this, see TopicApiBaseMixin, specifically get_typeahead_suggestions().
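As a minimal sketch of such a call, the helper below wraps get_typeahead_suggestions() for a single query. The helper name fetch_suggestions is hypothetical, and the keyword arguments (project_id, searchbar_id, query, cursor_pos, max_suggestions) are assumed from the SDK reference; verify them against the TopicApiBaseMixin documentation for your SDK version.

```python
from typing import Any


def fetch_suggestions(
    client: Any,
    project_id: str,
    searchbar_id: str,
    query: str,
    max_suggestions: int = 5,
) -> Any:
    """Request typeahead suggestions for `query` (hypothetical helper).

    `client` is expected to be an authenticated SquirroClient. The keyword
    names below are assumptions; check them against the SDK reference.
    """
    return client.get_typeahead_suggestions(
        project_id=project_id,
        searchbar_id=searchbar_id,
        query=query,
        # Place the cursor at the end of the text typed so far.
        cursor_pos=len(query),
        max_suggestions=max_suggestions,
    )


# Usage (placeholders, not real identifiers):
# from squirro_client import SquirroClient
# client = SquirroClient(None, None, cluster="https://your-cluster.example.com")
# client.authenticate(refresh_token="<your refresh token>")
# suggestions = fetch_suggestions(client, "<project id>", "<searchbar id>", "clim")
```

Passing the client in as an argument keeps the helper independent of how authentication is set up.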

Configuration#

Typeahead Suggestions can be configured in Setup > Settings > Project Configuration.

Suggestions Based on Document Content#

Project Configuration topic.typeahead.content.configuration

Reference: Configuration Schema
pydantic model ContentTypeaheadConfig#

Content-based Typeahead configuration.

For more information about the Typeahead Suggestions feature in Squirro see: https://docs.squirro.com/en/latest/technical/search/features/typeahead.html

Fields
field fuzziness: str = 'AUTO'#

Specifies the maximum edit distance allowed for matching. The fuzziness parameter can be specified as:

  • 0, 1, 2: The maximum allowed Levenshtein edit distance (number of edits).

  • AUTO: Generates an edit distance based on the length of the term.

field fuzzy_max_expansions: int = 1#

Maximum number of variations created.

field fuzzy_prefix_length: int = 2#

Number of beginning characters left unchanged when creating expansions.
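As a rough illustration of the AUTO setting of fuzziness, the sketch below maps a term length to an edit distance. It assumes the length thresholds that Elasticsearch uses for its default AUTO fuzziness (terms shorter than 3 characters must match exactly, terms of 3 to 5 characters allow one edit, longer terms allow two). Whether Squirro applies exactly these thresholds is an assumption, and auto_fuzziness is a hypothetical helper, not part of the Squirro API.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Return the edit distance allowed for `term` under AUTO fuzziness.

    Assumes Elasticsearch-style thresholds: fewer than `low` characters
    means exact match, fewer than `high` means one edit, otherwise two.
    """
    length = len(term)
    if length < low:
        return 0
    if length < high:
        return 1
    return 2


for term in ("ai", "bank", "squirro"):
    print(term, auto_fuzziness(term))
```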

field highlighter: squirro.common.schema.typeahead.ContentTypeaheadHighlighter = ContentTypeaheadHighlighter.UNIFIED#

Highlighter type.

field in_order: bool = False#

Require matched spans to be in-order.

field slop: int = 6#

Maximum number of intervening unmatched positions.
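To make in_order and slop more concrete, here is a simplified sketch of the check they describe. It uses exact token matching and only the first occurrence of each term, so it is an illustration of the idea rather than the matching logic Squirro actually runs; matches_within_slop is a hypothetical helper.

```python
from typing import List


def matches_within_slop(
    query_terms: List[str],
    phrase_tokens: List[str],
    slop: int = 6,
    in_order: bool = False,
) -> bool:
    """Check whether all query terms occur in the phrase within `slop`.

    Simplified: exact token comparison, first occurrence of each term only.
    """
    positions = []
    for term in query_terms:
        if term not in phrase_tokens:
            return False
        positions.append(phrase_tokens.index(term))
    # With in_order, terms must appear in the phrase in query order.
    if in_order and positions != sorted(positions):
        return False
    # Count the positions inside the matched span that no query term covers.
    span_length = max(positions) - min(positions) + 1
    unmatched = span_length - len(set(positions))
    return unmatched <= slop


phrase = "machine learning for natural language processing".split()
print(matches_within_slop(["machine", "processing"], phrase, slop=6))
print(matches_within_slop(["machine", "processing"], phrase, slop=3))
print(matches_within_slop(["processing", "machine"], phrase, in_order=True))
```

In this example four tokens sit between "machine" and "processing", so the first call matches with the default slop of 6, the second fails with a slop of 3, and the third fails because the terms are reversed while in_order is set.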

field terminate_after: int = 100000#

Tradeoff between accurate matching and speed. The typeahead search request is terminated after N matching documents (per shard) are found, so higher-scoring results may be missed.
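The fields above can be combined into a single configuration value. The sketch below builds one from the documented defaults, with in_order and slop overridden as an example, and prints it as JSON. It assumes that topic.typeahead.content.configuration accepts a JSON object keyed by the field names; the highlighter field is left out because its serialized value is not given here.

```python
import json

# Documented defaults of ContentTypeaheadConfig, with two example overrides.
content_typeahead_config = {
    "fuzziness": "AUTO",
    "fuzzy_max_expansions": 1,
    "fuzzy_prefix_length": 2,
    "in_order": True,  # example override, default is False
    "slop": 2,  # example override, default is 6
    "terminate_after": 100000,
}

# JSON text intended for topic.typeahead.content.configuration (assumed format).
config_json = json.dumps(content_typeahead_config, indent=4)
print(config_json)
```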

Suggestions Based on Label Values#

Project Configuration topic.typeahead.facetvalue.configuration

{
    "terminate_after": int,
    "sample_shard_size": int,
    "aggregation_method": "significant_terms",
    "community_drill_down": {
        "enabled": true,
        "for_top_n": 1
    }
}
Reference: Configuration Schema
terminate_after
Type: int
Required: False
Default: 15000

Tradeoff between accurate matching and speed. The typeahead search request is terminated after N matching documents (per shard) are found, so higher-scoring results may be missed.

sample_shard_size
Type: int
Required: False
Default: 100

Facet value suggestions are based on the terms aggregation over the matching subset of items. The sample_shard_size parameter limits how many top-scoring documents are collected in the sample processed on each shard. Lower values tend to return label-value suggestions faster (but in a less accurate order) and tighten the focus to high-relevance matches rather than the potentially very long tail of low-quality matches.

aggregation_method
Type: string
Required: False
Default: significant_terms

Choose between the significant_terms and terms aggregations to find the best-matching label values.

community_drill_down
Type: json
Required: False

To drill down into top-ranked communities, set community_drill_down.enabled to true.
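As a sketch of a filled-in value for topic.typeahead.facetvalue.configuration, the helper below assembles the settings described above and rejects an unknown aggregation method. build_facetvalue_config is a hypothetical helper; the defaults for terminate_after, sample_shard_size, and aggregation_method come from this reference, while the community_drill_down values simply mirror the template shown earlier and are not documented defaults.

```python
import json

ALLOWED_AGGREGATIONS = {"significant_terms", "terms"}


def build_facetvalue_config(
    terminate_after: int = 15000,
    sample_shard_size: int = 100,
    aggregation_method: str = "significant_terms",
    drill_down_enabled: bool = True,  # mirrors the template, not a documented default
    for_top_n: int = 1,  # mirrors the template, not a documented default
) -> dict:
    """Assemble a label-value typeahead configuration (hypothetical helper)."""
    if aggregation_method not in ALLOWED_AGGREGATIONS:
        raise ValueError(f"Unsupported aggregation_method: {aggregation_method!r}")
    return {
        "terminate_after": terminate_after,
        "sample_shard_size": sample_shard_size,
        "aggregation_method": aggregation_method,
        "community_drill_down": {
            "enabled": drill_down_enabled,
            "for_top_n": for_top_n,
        },
    }


# A smaller sample per shard trades ordering accuracy for speed.
print(json.dumps(build_facetvalue_config(sample_shard_size=50), indent=4))
```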