SemanticSearch

class SemanticSearch

semantic | Perform kNN search on paragraph embeddings.

Uses the registered embeddings service to encode the query. Note: the index embeddings must be compatible with the query embeddings.

pydantic model PluginConfig

Fields
  • embedding_type (Optional[squirro.common.clients.transformers.EmbeddingDataType])
  • k (int)
  • knn_boost (float)
  • normalize_embeddings (bool)
  • num_candidates (int)
  • similarity_threshold (Optional[float])
  • text (str)
  • worker (str)

PluginConfig.plugin_name: ClassVar[str] = 'semantic'

Used to register and reference the plugin within a query.

field PluginConfig.text: str = ''

Text snippet to which the kNN search is applied.

field PluginConfig.num_candidates: int = 100

Number of candidates to consider per shard.

field PluginConfig.k: int = 20

Number of nearest neighbours (k) to retrieve.

field PluginConfig.knn_boost: float = 100

Boost factor applied to the kNN score.

field PluginConfig.worker: str = 'query-fast'

Which deployed sentence-embeddings worker (@transformer-service) to use.

field PluginConfig.normalize_embeddings: bool = False

If set to true, embeddings are normalized to unit length (length 1). In that case, the faster dot product is used instead of cosine similarity. Note that normalization works well only with the float embedding type.
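
The reason the dot product can stand in for cosine similarity is that both give the same value once the vectors have unit length. The following minimal sketch checks this with plain Python; the helper names normalize, dot, and cosine and the sample vectors are hypothetical and do not reflect how the embeddings service is implemented.

```python
import math
from typing import List


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot product divided by both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical query and document embeddings.
query = [3.0, 4.0]
doc = [1.0, 2.0]

q_unit, d_unit = normalize(query), normalize(doc)

# On unit vectors the cheaper dot product matches the cosine similarity
# of the original vectors.
print(math.isclose(dot(q_unit, d_unit), cosine(query, doc)))
```

Because the division by the vector lengths is done once at encoding time, the similarity computation at query time no longer needs it.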

field PluginConfig.similarity_threshold: Optional[float] = None

The minimum similarity required for a vector to be considered a match (optional). The scale of the threshold depends on the similarity metric in use. It refers to the true similarity, before that value is transformed into _score and the boost is applied. To derive a threshold from a _score, use the corresponding inverted score function.
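
As a rough illustration of an inverted score function, the sketch below assumes the Elasticsearch convention for float vectors, in which cosine and dot-product similarities are mapped to a score via (1 + similarity) / 2. This mapping is an assumption about the search backend and is not stated by the plugin documentation. The function names are hypothetical, and the boost is left out because the threshold applies before boosting.

```python
import math


def score_from_similarity(similarity: float) -> float:
    """Assumed forward mapping for cosine / dot product on float vectors."""
    return (1.0 + similarity) / 2.0


def similarity_from_score(score: float) -> float:
    """Inverse of the assumed mapping: recover the raw similarity."""
    return 2.0 * score - 1.0


# Hypothetical goal: only keep hits whose unboosted score is at least 0.9.
desired_score = 0.9
similarity_threshold = similarity_from_score(desired_score)
print(similarity_threshold)
```

Under that assumption, the resulting value is what would be passed as similarity_threshold, since the field expects the raw similarity rather than the transformed score.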

field PluginConfig.embedding_type: Optional[squirro.common.clients.transformers.EmbeddingDataType] = None

The data type used to encode embeddings: either float or byte. If set to byte, embeddings are quantized. If not set, the default type is read from the project configuration via the topic.search.default-embedding-settings setting.
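
To give an intuition for what quantizing to byte means, the following sketch maps float components linearly onto signed 8-bit integers. The scheme, the assumed value range of [-1.0, 1.0], and the function name quantize_to_byte are all assumptions chosen for illustration; the actual quantization used by the embeddings service may differ.

```python
from typing import List


def quantize_to_byte(
    vec: List[float], lower: float = -1.0, upper: float = 1.0
) -> List[int]:
    """Hypothetical linear quantization of floats to integers in [-128, 127]."""
    scale = 255.0 / (upper - lower)
    quantized: List[int] = []
    for x in vec:
        # Clip values outside the assumed range before scaling.
        clipped = min(max(x, lower), upper)
        # Map [lower, upper] onto [0, 255], then shift to the signed range.
        quantized.append(int(round((clipped - lower) * scale)) - 128)
    return quantized


# Hypothetical embedding with one out-of-range component (2.0 is clipped).
embedding = [-1.0, -0.2, 0.5, 1.0, 2.0]
quantized = quantize_to_byte(embedding)
print(quantized)
```

Each component then occupies one byte instead of a multi-byte float, at the cost of some precision.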