SklearnTFIDFEmbedder

class SklearnTFIDFEmbedder(config)

Bases: Embedder

The TF-IDF embedder encodes the provided text using the scikit-learn TfidfVectorizer.

Input - the input field must be of type str.

Output - the output field is filled with data of type numpy.ndarray.

Parameters:

type (str) – sklearn_tfidf
model_kwargs (dict, {}) – Keyword arguments to pass on to sklearn TfidfVectorizer

Example

{
    "step": "embedder",
    "type": "sklearn_tfidf",
    "name": "sklearn_tfidf",
    "input_field": "body",
    "model_kwargs": {
        "min_df": 5,
        "ngram_range": "1, 3"
    },
    "output_field": "embedded_body"
}
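
The following is a minimal sketch of what the underlying scikit-learn vectorizer does with such keyword arguments. It calls TfidfVectorizer directly rather than going through the embedder. The corpus is made up for illustration, and min_df is lowered to 1 because the toy corpus has only three documents. scikit-learn itself expects ngram_range as a (min_n, max_n) tuple; how the embedder converts the string form used in the configuration is not covered here.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Assumed stand-in for the "model_kwargs" of the configuration above.
# min_df is 1 here only because the toy corpus is tiny.
model_kwargs = {"min_df": 1, "ngram_range": (1, 3)}

# Hypothetical training texts (the content of the input field).
corpus = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog jumps",
]

vectorizer = TfidfVectorizer(**model_kwargs)
vectorizer.fit(corpus)

# transform() returns a SciPy sparse matrix; take the single row
# and convert it to a dense numpy.ndarray.
embedding = vectorizer.transform(["the quick fox"]).toarray()[0]

print(type(embedding), embedding.shape)
```

The resulting vector has one entry per term or n-gram in the fitted vocabulary, so its length depends on the training corpus and on ngram_range.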

Methods Summary

`load`()	Load a step.
`process_batch`(batch)	Process a batch of documents.
`save`()	Save a step.
`train`(docs)	Train a step on a set of documents.

Methods Documentation

load()

Load a step.

process_batch(batch)

Process a batch of documents. If not defined, this defaults to calling self.process_doc for each document in the batch.

Parameters: batch (list(Document)) – List of documents
Returns: List of processed documents
Return type: list(Document)
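
The fallback from batch processing to per-document processing can be sketched as follows. The Step and LengthStep classes and the use of plain dicts as documents are hypothetical stand-ins for illustration, not the library's actual classes.

```python
class Step:
    """Hypothetical base class, not the library's implementation."""

    def process_doc(self, doc):
        raise NotImplementedError

    def process_batch(self, batch):
        # Default behaviour: process each document individually.
        return [self.process_doc(doc) for doc in batch]


class LengthStep(Step):
    """Toy step that only defines process_doc."""

    def process_doc(self, doc):
        doc["body_length"] = len(doc["body"])
        return doc


batch = [{"body": "hello"}, {"body": "tf-idf"}]
result = LengthStep().process_batch(batch)
print(result)
```

A step that can handle many documents at once more efficiently, such as a vectorizer transforming a whole list of texts, would override process_batch instead of relying on this default.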

save()

Save a step.

train(docs)

Train a step on a set of documents.

Parameters: docs (generator(Document)) – Generator of documents
Returns: Generator of processed documents
Return type: generator(Document)
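
One way to satisfy a generator-in, generator-out contract like this is sketched below. ToyTrainableStep, its doc_freq attribute, and the dict-based documents are hypothetical; the sketch counts document frequencies with the standard library instead of fitting a real vectorizer, and it is not the library's implementation.

```python
from collections import Counter


class ToyTrainableStep:
    """Hypothetical step that learns document frequencies."""

    def __init__(self, input_field):
        self.input_field = input_field
        self.doc_freq = Counter()

    def train(self, docs):
        # A generator can only be consumed once, so buffer it first.
        buffered = list(docs)
        for doc in buffered:
            terms = set(doc[self.input_field].lower().split())
            self.doc_freq.update(terms)
        # Hand the documents back so a later step can consume them.
        yield from buffered


step = ToyTrainableStep("body")
docs = ({"body": text} for text in ["the cat", "the dog"])

# train() is itself a generator: nothing runs until it is iterated.
processed = list(step.train(docs))
print(step.doc_freq["the"], len(processed))
```

Buffering keeps the sketch simple but holds every document in memory; a real step would need a different strategy for large corpora.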
