BowEmbedder

class BowEmbedder(config)

Bases: Embedder

The bag-of-words Embedder encodes the provided text using gensim's doc2bow.

Input - the input field must be of type list[str]

Output - the output field is filled with data of type numpy.ndarray
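As a rough illustration of what a bag-of-words encoding produces, here is a minimal pure-Python sketch. It is not the BowEmbedder implementation: the helper names `build_vocabulary` and `bow_vector` are hypothetical, and the sketch returns a dense list of counts, whereas gensim's `doc2bow` returns sparse `(token_id, count)` pairs that the embedder presumably converts to a numpy.ndarray.

```python
from collections import Counter


def build_vocabulary(docs):
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab = {}
    for tokens in docs:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bow_vector(tokens, vocab):
    """Return a dense count vector; tokens missing from the vocabulary are ignored."""
    counts = Counter(t for t in tokens if t in vocab)
    vector = [0] * len(vocab)
    for token, count in counts.items():
        vector[vocab[token]] = count
    return vector


# Each document is a list of string tokens, matching the list[str] input type.
docs = [["the", "cat", "sat"], ["the", "dog", "sat", "sat"]]
vocab = build_vocabulary(docs)
vectors = [bow_vector(tokens, vocab) for tokens in docs]
```

Word order is discarded: each position in the vector only records how often the corresponding vocabulary term occurs in the document.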

Parameters:

type (str) – bow
min_doc_frequency (int, default 2) – Minimum number of documents a term must appear in
max_doc_fraction (float, default 0.75) – Maximum fraction of documents a term can appear in
max_n_words (int, default None) – Maximum dictionary size
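The three filtering parameters can be pictured with the following sketch, which prunes a vocabulary by document frequency. It is an assumption about the behaviour, not the library's code: the function `filter_vocabulary` is hypothetical, and the parameters appear to correspond to `no_below`, `no_above` and `keep_n` of gensim's `Dictionary.filter_extremes`, although that mapping is not confirmed by this page.

```python
from collections import Counter


def filter_vocabulary(docs, min_doc_frequency=2, max_doc_fraction=0.75, max_n_words=None):
    """Return the terms that survive document-frequency filtering (illustrative only)."""
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    n_docs = len(docs)
    kept = [
        term
        for term, df in doc_freq.items()
        if df >= min_doc_frequency and df <= max_doc_fraction * n_docs
    ]

    # Most frequent terms first; ties broken alphabetically for determinism.
    kept.sort(key=lambda term: (-doc_freq[term], term))
    if max_n_words is not None:
        kept = kept[:max_n_words]
    return kept


docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "bird", "flew"],
    ["the", "cat", "flew"],
]
kept = filter_vocabulary(docs)
```

With the defaults, "the" is dropped because it appears in every document (more than 75% of them), and "dog" and "bird" are dropped because each appears in only one document.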

Example

{
    "step": "embedder",
    "type": "bow",
    "input_field": "text",
    "output_field": "embedded_text"
}

Methods Summary

`load`()	Load a step
`process_batch`(batch)	Process a batch of documents.
`save`()	Save a step
`train`(docs)	Train the step on a set of documents

Methods Documentation

process_batch(batch)

Process a batch of documents. If a step does not define this method, it defaults to calling self.process_doc on each document in the batch.
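The fallback described above can be sketched as follows. The classes `Step`, `UppercaseStep` and `LengthStep` are hypothetical stand-ins, not the library's base classes, and documents are assumed to be plain dicts.

```python
class Step:
    """Hypothetical base class illustrating the per-document fallback."""

    def process_doc(self, doc):
        raise NotImplementedError

    def process_batch(self, batch):
        # Default behaviour: apply process_doc to every document in the batch.
        return [self.process_doc(doc) for doc in batch]


class UppercaseStep(Step):
    """Defines only process_doc, so the inherited process_batch is used."""

    def process_doc(self, doc):
        return {**doc, "text": doc["text"].upper()}


class LengthStep(Step):
    """Overrides process_batch directly and handles the whole batch at once."""

    def process_batch(self, batch):
        return [{**doc, "length": len(doc["text"])} for doc in batch]


batch = [{"text": "hello"}, {"text": "bag of words"}]
upper = UppercaseStep().process_batch(batch)
lengths = LengthStep().process_batch(batch)
```

Overriding process_batch is useful when a step can work on many documents at once more efficiently than one at a time, which is plausibly why an embedder documents it explicitly.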

train(docs)

Train the step on a set of documents