SemanticRescore


class SemanticRescore

semantic_rescore | Rescore the top N ranked paragraphs based on the similarity between the query embedding and pre-computed paragraph embeddings. This is performed within the rescore phase inside Elasticsearch.
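The idea of rescoring only the top N hits can be sketched in plain Python. This is an illustration only: the actual rescoring runs inside Elasticsearch, and how the similarity is combined with the original relevance score is not specified here. The names `rescore_top_n`, `window_size`, and the `embedding` key are hypothetical, and cosine similarity is assumed as the similarity measure.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rescore_top_n(hits, query_embedding, window_size):
    """Re-order the first `window_size` hits by similarity; keep the rest as ranked."""
    window = hits[:window_size]
    rest = hits[window_size:]
    rescored = sorted(
        window,
        key=lambda hit: cosine(query_embedding, hit["embedding"]),
        reverse=True,
    )
    return rescored + rest


# Hypothetical first-pass ranking with pre-computed paragraph embeddings.
hits = [
    {"id": "p1", "embedding": [1.0, 0.0]},
    {"id": "p2", "embedding": [0.0, 1.0]},
    {"id": "p3", "embedding": [0.6, 0.8]},
    {"id": "p4", "embedding": [0.0, 1.0]},
]
reranked = rescore_top_n(hits, query_embedding=[0.0, 1.0], window_size=3)
print([hit["id"] for hit in reranked])  # ['p2', 'p3', 'p1', 'p4']
```

Note that `p4` stays in last place even though it matches the query exactly, because it falls outside the rescore window.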

Requires a dedicated sentence-embeddings worker to be registered.

pydantic model PluginConfig

Fields:

apply_phrases_to_encoder (bool)
embedding_type (squirro.common.clients.transformers.EmbeddingDataType | None)
text (str | None)
truncate_dimensions (int | None)
vector_field (str | None)
worker (str)
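As a rough orientation, the documented defaults of these fields can be collected in a plain dictionary. This is a stand-in sketch, not the real pydantic model: `DEFAULTS` and `build_config` are hypothetical helpers, and passing `embedding_type` as the string `"byte"` is an assumption (the real field expects an `EmbeddingDataType`).

```python
# Documented defaults of PluginConfig, mirrored as a plain dict (illustrative only).
DEFAULTS = {
    "text": "",
    "apply_phrases_to_encoder": True,
    "worker": "query-fast",
    "embedding_type": None,
    "truncate_dimensions": 0,
    "vector_field": None,
}


def build_config(**overrides):
    """Merge overrides into the defaults and reject option names that are not documented."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown semantic_rescore option(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


# Hypothetical override: truncated, byte-quantized embeddings.
config = build_config(truncate_dimensions=256, embedding_type="byte")
print(config["worker"], config["truncate_dimensions"])
```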

PluginConfig.plugin_name: ClassVar[str] = 'semantic_rescore'

Used to register and reference the plugin within a query.

field PluginConfig.text: Optional[str] = ''

Explicitly sets the text used as input for the embedding. If left empty, the terms of the user query are injected and embedded instead.

field PluginConfig.apply_phrases_to_encoder: bool = True

Query processing might rewrite the user query to match detected entities exactly as a phrase. If this option is enabled, the additional phrase is appended to the user terms and included in the query embedding call.
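A minimal sketch of what appending a detected phrase to the user terms could look like. The function `build_encoder_input` is hypothetical, and the plain space-separated join is an assumption; the exact string sent to the embedding worker is not documented here.

```python
def build_encoder_input(user_terms, phrases, apply_phrases_to_encoder=True):
    """Return the text to embed: the user terms, optionally followed by detected phrases."""
    parts = [user_terms]
    if apply_phrases_to_encoder:
        parts.extend(phrases)
    return " ".join(part for part in parts if part)


with_phrases = build_encoder_input("report from acme corp", ["acme corp"])
without_phrases = build_encoder_input(
    "report from acme corp", ["acme corp"], apply_phrases_to_encoder=False
)
print(with_phrases)     # report from acme corp acme corp
print(without_phrases)  # report from acme corp
```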

field PluginConfig.worker: str = 'query-fast'

Which deployed sentence-embeddings worker (@transformer-service) should be used.

field PluginConfig.embedding_type: Optional[EmbeddingDataType] = None

The data type used to encode embeddings: either float or byte. If set to byte, embeddings are quantized. If not set, the default type is read from the project configuration via the topic.search.default-embedding-settings setting.
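To illustrate what byte quantization means in general, the sketch below maps float components to signed 8-bit integers. The scheme shown (scale by 127, round, clamp) is an assumption chosen for illustration; the quantization actually applied by the embedding worker is not described here, and `quantize_to_int8` is a hypothetical helper.

```python
def quantize_to_int8(vector):
    """Map float components (assumed to lie roughly in [-1, 1]) to integers in [-128, 127]."""
    quantized = []
    for value in vector:
        scaled = round(value * 127)
        # Clamp values that fall outside the signed 8-bit range.
        quantized.append(max(-128, min(127, scaled)))
    return quantized


print(quantize_to_int8([0.1, -1.0, 0.25, 1.2]))  # [13, -127, 32, 127]
```

Each component then needs one byte instead of four, at the cost of some precision in the similarity computation.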

field PluginConfig.truncate_dimensions: Optional[int] = 0

The number of dimensions to truncate sentence embeddings to. A value of 0 disables truncation. Only applicable to models trained with Matryoshka Representation Learning (MRL).
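Truncation for MRL-trained models typically means keeping the leading components of the vector. A minimal sketch follows, assuming the truncated vector is re-normalized to unit length afterwards (a common practice, but not confirmed for this plugin). `truncate_embedding` is a hypothetical helper.

```python
import math


def truncate_embedding(vector, truncate_dimensions=0):
    """Keep the first `truncate_dimensions` components; 0 or None returns the vector unchanged."""
    if not truncate_dimensions:
        return list(vector)
    head = list(vector[:truncate_dimensions])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    # Assumption: re-normalize so cosine and dot-product similarity stay comparable.
    return [x / norm for x in head]


full = [3.0, 4.0, 12.0]
print(truncate_embedding(full))     # [3.0, 4.0, 12.0]
print(truncate_embedding(full, 2))  # [0.6, 0.8]
```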

field PluginConfig.vector_field: Optional[str] = None

The name of the vector field in Elasticsearch used for querying, e.g. 384-byte-intfloat/multilingual-e5-small.
