TransformerEmbedder
- class TransformerEmbedder(config)
Bases: Embedder
The transformer Embedder encodes text using a pre-trained transformer model.
- Note: The following models are currently available out of the box:
transformer: tensorflow, model_name: https://tfhub.dev/google/universal-sentence-encoder/4
transformer: huggingface, model_name: distilbert
transformer: sentence_transformer, model_name: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
Input - the input field must be of type str
Output - the output field is filled with data of type numpy.ndarray
- Parameters:
type (str) – transformers
transformer (str) – Transformer type (tensorflow/huggingface/sentence_transformer)
model_name (str, None) – pre-trained transformer name/location
custom_model_path (str, None) – custom path of local model storage (/var/lib/squirro/machinelearning/pretrained_models/..)
norm (str, None) – if specified and using sentence_transformer, embeddings are normalized (valid options: “l1”, “l2”, “max”)
dimensions (int, None) – specifies the dimension of the embedding
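The norm options listed above can be illustrated with a small sketch. It assumes that "l1", "l2" and "max" carry their conventional meanings (sum of absolute values, Euclidean length, largest absolute value); the source does not confirm how the embedder computes them. The function name normalize is hypothetical, and plain lists are used instead of numpy.ndarray to keep the sketch dependency-free.

```python
import math
from typing import List


def normalize(vector: List[float], norm: str = "l2") -> List[float]:
    """Scale a vector so that its chosen norm equals 1.

    Illustrative only: this is not the embedder's own code, and the
    meaning of each option is an assumption based on common usage.
    """
    if norm == "l1":
        # Sum of absolute values.
        scale = sum(abs(x) for x in vector)
    elif norm == "l2":
        # Euclidean length.
        scale = math.sqrt(sum(x * x for x in vector))
    elif norm == "max":
        # Largest absolute component.
        scale = max((abs(x) for x in vector), default=0.0)
    else:
        raise ValueError(f"unsupported norm: {norm!r}")
    if scale == 0:
        # An all-zero (or empty) vector cannot be scaled; return a copy.
        return list(vector)
    return [x / scale for x in vector]


example = [3.0, -4.0]
l1_result = normalize(example, "l1")
l2_result = normalize(example, "l2")
max_result = normalize(example, "max")
```

With "l2", the resulting vectors have unit length, so a dot product between two of them equals their cosine similarity. This is a common reason to normalize embeddings before similarity search.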
Example
{
    "step": "embedder",
    "type": "transformers",
    "transformer": "tensorflow",
    "model_name": "https://tfhub.dev/google/universal-sentence-encoder/4",
    "input_field": "extract",
    "output_field": "embedded_extract"
}
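The example above uses the tensorflow transformer. As a hedged sketch, a step for the sentence_transformer variant might be assembled in Python as shown here. The model name comes from the note on available models and the field names from the example above. Combining them with "norm": "l2" follows the parameter list, but that exact combination is an assumption and has not been verified against the library.

```python
import json

# Hypothetical step configuration for the sentence_transformer variant.
# Keys follow the documented parameters; the chosen values are illustrative.
step = {
    "step": "embedder",
    "type": "transformers",
    "transformer": "sentence_transformer",
    "model_name": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "norm": "l2",  # documented options: "l1", "l2", "max"
    "input_field": "extract",
    "output_field": "embedded_extract",
}

# Serialize to the JSON form used in the example above.
step_json = json.dumps(step, indent=4)
print(step_json)
```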
Methods Summary
process_batch(batch) – Process a batch of documents.
Methods Documentation