TransformerEmbedder

class TransformerEmbedder(config)

Bases: Embedder

The TransformerEmbedder encodes text using a pre-trained transformer model.

Note: The following models are currently available out of the box:

transformer: tensorflow, model_name: https://tfhub.dev/google/universal-sentence-encoder/4
transformer: huggingface, model_name: distilbert
transformer: sentence_transformer, model_name: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

Input - the input field needs to be of type str

Output - the output field is filled with data of type numpy.ndarray

Parameters:

type (str) – transformers
transformer (str) – Transformer type (tensorflow/huggingface/sentence_transformer)
model_name (str, None) – pre-trained transformer name/location
custom_model_path (str, None) – custom path of local model storage (/var/lib/squirro/machinelearning/pretrained_models/..)
norm (str, None) – if specified and using sentence_transformer, embeddings are normalized (valid options: “l1”, “l2”, “max”)
dimensions (int, None) – specifies the dimension of the embedding
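
To make the norm options more concrete, the following is a minimal sketch in plain Python of what "l1", "l2" and "max" normalization conventionally mean. It assumes the usual definitions (divide by the sum of absolute values, the Euclidean length, or the largest absolute value); it is not the library's actual code, and the helper name normalize_embedding is hypothetical.

```python
import math


def normalize_embedding(vector, norm="l2"):
    """Scale a vector so that its chosen norm equals 1 (hypothetical helper)."""
    if norm == "l1":
        # Sum of absolute values
        scale = sum(abs(x) for x in vector)
    elif norm == "l2":
        # Euclidean length
        scale = math.sqrt(sum(x * x for x in vector))
    elif norm == "max":
        # Largest absolute component
        scale = max(abs(x) for x in vector)
    else:
        raise ValueError(f"unsupported norm: {norm!r}")
    if scale == 0:
        # An all-zero vector cannot be scaled; return it unchanged
        return list(vector)
    return [x / scale for x in vector]


embedding = [3.0, -4.0]
l1 = normalize_embedding(embedding, "l1")
l2 = normalize_embedding(embedding, "l2")
mx = normalize_embedding(embedding, "max")
```

With "l2", the resulting vectors have unit length, which is a common choice when embeddings are later compared by dot product.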

Example

{
    "step": "embedder",
    "type": "transformers",
    "transformer": "tensorflow",
    "model_name": "https://tfhub.dev/google/universal-sentence-encoder/4",
    "input_field": "extract",
    "output_field": "embedded_extract"
}
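
As a further illustration, the sketch below builds a step configuration for the sentence_transformer model listed in the note and adds the norm parameter. It only combines values documented on this page; whether this exact combination suits a given installation is an assumption, and the variable names are hypothetical.

```python
import json

# Step configuration assembled from the documented parameters.
# The combination shown here is an illustrative assumption.
step_config = {
    "step": "embedder",
    "type": "transformers",
    "transformer": "sentence_transformer",
    "model_name": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "norm": "l2",  # one of the documented options: "l1", "l2", "max"
    "input_field": "extract",
    "output_field": "embedded_extract",
}

# Serialize to JSON, the format used by the example above
step_json = json.dumps(step_config, indent=4)
print(step_json)
```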

Methods Summary

process_batch(batch)

Process a batch of documents.

Methods Documentation

process_batch(batch)

Process a batch of documents. If not defined, it defaults to calling self.process_doc for each document in the batch.

Parameters: batch (list(Document)) – List of documents
Returns: List of processed documents
Return type: list(Document)
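
The fallback behavior described above can be sketched as follows. This is a simplified illustration, not the library's implementation: Document, SketchStep and UppercaseStep are hypothetical stand-ins, and only the names process_batch and process_doc come from this page.

```python
class Document(dict):
    """Hypothetical stand-in for the library's Document type."""


class SketchStep:
    """Hypothetical base step showing the per-document fallback."""

    def process_doc(self, doc):
        raise NotImplementedError

    def process_batch(self, batch):
        # Default: apply process_doc to every document in the batch
        return [self.process_doc(doc) for doc in batch]


class UppercaseStep(SketchStep):
    """Toy step that only defines process_doc and inherits process_batch."""

    def process_doc(self, doc):
        doc["extract"] = doc["extract"].upper()
        return doc


docs = [Document(extract="hello"), Document(extract="world")]
processed = UppercaseStep().process_batch(docs)
```

A step that can handle many documents at once, such as an embedder that encodes texts in bulk, would typically override process_batch instead of relying on this per-document loop.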
