TransformerEmbedder

class squirro.lib.nlp.steps.embedders.TransformerEmbedder(config)

Bases: squirro.lib.nlp.steps.embedders.Embedder

Transformer embedder: encodes text using pre-trained transformer models.

Currently available out of the box:

  • transformer: tensorflow, model_name: https://tfhub.dev/google/universal-sentence-encoder/4

  • transformer: huggingface, model_name: distilbert

  • transformer: sentence_transformer, model_name: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
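The three out-of-the-box setups above can be written as step configurations. This is a minimal sketch, assuming the step is configured with a plain dictionary whose keys mirror the documented parameters (type, transformer, model_name, norm). The names EXAMPLE_CONFIGS and check_config are hypothetical, and check_config is only a local sanity check, not the library's own validation.

```python
from typing import Dict, List

# Backends listed in this reference as available out of the box.
SUPPORTED_TRANSFORMERS = {"tensorflow", "huggingface", "sentence_transformer"}

# Documented values for the "norm" parameter.
VALID_NORMS = {"l1", "l2", "max"}

# Hypothetical configurations; keys mirror the documented parameters.
EXAMPLE_CONFIGS: List[Dict[str, str]] = [
    {
        "type": "transformers",
        "transformer": "tensorflow",
        "model_name": "https://tfhub.dev/google/universal-sentence-encoder/4",
    },
    {
        "type": "transformers",
        "transformer": "huggingface",
        "model_name": "distilbert",
    },
    {
        "type": "transformers",
        "transformer": "sentence_transformer",
        "model_name": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
        "norm": "l2",  # only applied when the backend is sentence_transformer
    },
]


def check_config(config: Dict[str, str]) -> bool:
    """Hypothetical sanity check of a config dict against the documented values."""
    if config.get("type") != "transformers":
        raise ValueError("type must be 'transformers'")
    if config.get("transformer") not in SUPPORTED_TRANSFORMERS:
        raise ValueError(f"unknown transformer: {config.get('transformer')!r}")
    norm = config.get("norm")
    if norm is not None and norm not in VALID_NORMS:
        raise ValueError(f"unsupported norm: {norm!r}")
    return True


if __name__ == "__main__":
    for cfg in EXAMPLE_CONFIGS:
        print(cfg["transformer"], check_config(cfg))
```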

Parameters
  • type (str) – Step type: transformers

  • transformer (str) – Transformer backend (tensorflow, huggingface, or sentence_transformer)

  • model_name (str, None) – Name or location of the pre-trained model

  • custom_model_path (str, None) – Custom path to a locally stored model (/var/lib/squirro/machinelearning/pretrained_models/..)

  • norm (str, None) – If specified and the transformer is sentence_transformer, the embeddings are normalized (valid options: “l1”, “l2”, “max”)

  • dimensions (int, None) – specifies the dimension of the embedding
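The norm options can be illustrated with a small pure-Python sketch. It assumes the options follow their usual definitions (as in scikit-learn's normalize): l1 divides by the sum of absolute values, l2 by the Euclidean length, and max by the largest absolute value, with zero vectors returned unchanged. The function normalize is hypothetical and is not part of the Squirro API.

```python
import math
from typing import List


def normalize(vec: List[float], norm: str) -> List[float]:
    """Hypothetical per-vector normalization (assumed l1/l2/max semantics)."""
    if norm == "l1":
        scale = sum(abs(x) for x in vec)            # sum of absolute values
    elif norm == "l2":
        scale = math.sqrt(sum(x * x for x in vec))  # Euclidean length
    elif norm == "max":
        scale = max((abs(x) for x in vec), default=0.0)  # largest magnitude
    else:
        raise ValueError(f"unsupported norm: {norm!r}")
    if scale == 0:
        return list(vec)  # leave all-zero vectors unchanged
    return [x / scale for x in vec]


if __name__ == "__main__":
    v = [3.0, -4.0]
    print(normalize(v, "l2"))   # [0.6, -0.8]
    print(normalize(v, "max"))  # [0.75, -1.0]
```

With l2 the result has unit length, which makes cosine similarity equal to a plain dot product between embeddings.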

Methods Summary

process_batch(batch)

Process a batch of documents.

Methods Documentation

process_batch(batch)

Process a batch of documents. If not overridden, this defaults to calling self.process_doc on each document in the batch.

Parameters

batch (list(Document)) – List of documents

Returns

List of processed documents

Return type

list(Document)
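The fallback behaviour of process_batch can be sketched as follows. This is a minimal illustration, not Squirro's implementation: Document, StepSketch and LengthEmbedderSketch are hypothetical stand-ins, and the "fields" dictionary and the toy length-based "embedding" are assumptions made only to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for a Squirro NLP document."""
    fields: Dict[str, Any] = field(default_factory=dict)


class StepSketch:
    """Hypothetical base step showing the documented default."""

    def process_doc(self, doc: Document) -> Document:
        raise NotImplementedError

    def process_batch(self, batch: List[Document]) -> List[Document]:
        # Default: apply process_doc to every document in the batch.
        return [self.process_doc(doc) for doc in batch]


class LengthEmbedderSketch(StepSketch):
    """Toy step storing a 1-dimensional 'embedding' (the text length)."""

    def process_doc(self, doc: Document) -> Document:
        doc.fields["embedding"] = [float(len(doc.fields.get("text", "")))]
        return doc


if __name__ == "__main__":
    batch = [Document({"text": "ab"}), Document({"text": "abcd"})]
    for doc in LengthEmbedderSketch().process_batch(batch):
        print(doc.fields["embedding"])
```

A real embedder would typically override process_batch instead, so that the model encodes all texts of the batch in a single call rather than one document at a time.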