SklearnProjector

class SklearnProjector(config)

Bases: Projector

The generic scikit-learn Projector step projects vectors from one vector space to another. For more information, see Decomposition.

Note: So far, only svd is supported, which maps to scikit-learn's TruncatedSVD.

Input: all input fields must be of type list[float or int] or numpy.ndarray.

Output: all output fields are filled with data of type numpy.ndarray with shape (n_components,).

Parameters:

type (str) – sklearn
model_type (str, 'svd') – Type of scikit-learn projection
model_kwargs (dict, {}) – Keyword arguments for the scikit-learn model
n_components (int) – Number of vector components after projection
normalize_output (bool, True) – Whether or not to normalize the output

Example

{
    "step": "projector",
    "type": "sklearn",
    "model_type": "svd",
    "n_components": 100,
    "input_field": "embedded_text",
    "output_field": "embedded_text"
}
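
To make the configuration above more concrete, the following is a minimal sketch of roughly what such a step might do internally, written directly against scikit-learn. The helper name `project`, the random input data, and the choice of L2 normalization for `normalize_output` are assumptions made for illustration only; they are not taken from the library's source.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD


def project(vectors, n_components, model_kwargs=None, normalize_output=True):
    """Hypothetical helper: fit TruncatedSVD and project each input vector."""
    # Inputs may be lists of numbers or numpy arrays; stack them into a matrix.
    X = np.asarray(vectors, dtype=float)
    model = TruncatedSVD(n_components=n_components, **(model_kwargs or {}))
    projected = model.fit_transform(X)
    if normalize_output:
        # Assumption: "normalize" means scaling each vector to unit L2 norm.
        norms = np.linalg.norm(projected, axis=1, keepdims=True)
        projected = projected / np.where(norms == 0, 1.0, norms)
    # One numpy.ndarray of shape (n_components,) per input vector.
    return [row for row in projected]


# Illustrative data: 20 "documents", each with a 16-dimensional embedding.
rng = np.random.default_rng(0)
docs = rng.normal(size=(20, 16)).tolist()
outputs = project(docs, n_components=4, model_kwargs={"random_state": 0})
print(outputs[0].shape)  # (4,)
```

Note that `n_components` should be smaller than the dimensionality of the input vectors, since the projection reduces the number of components.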

Methods Summary

`load`()	Load a step
`process_batch`(batch)	Process a batch of documents.
`save`()	Save a step
`train`(docs)	Train the step on a set of documents

Methods Documentation

load(): Load a step

process_batch(batch)

Process a batch of documents. If not defined, it defaults to calling self.process_doc for each document in the batch.

Parameters: batch (list(Document)) – List of documents
Returns: List of processed documents
Return type: list(Document)
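
The fallback described for process_batch can be pictured with a small sketch. The classes `Step` and `UppercaseStep` and the use of plain dictionaries as documents are hypothetical stand-ins, not the library's actual base class or Document type; only the "call process_doc once per document" behavior comes from the description above.

```python
class Step:
    """Hypothetical stand-in for a pipeline step base class."""

    def process_doc(self, doc):
        raise NotImplementedError

    def process_batch(self, batch):
        # Default behavior: process the documents one at a time.
        return [self.process_doc(doc) for doc in batch]


class UppercaseStep(Step):
    """Hypothetical step that only defines per-document processing."""

    def process_doc(self, doc):
        # Return a copy of the document with its text upper-cased.
        return {**doc, "text": doc["text"].upper()}


batch = [{"text": "a"}, {"text": "b"}]
processed = UppercaseStep().process_batch(batch)
```

A step that can handle a whole batch more efficiently, such as a single matrix projection, would override process_batch instead of relying on this per-document loop.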

save(): Save a step

train(docs)

Train the step on a set of documents

Parameters: docs (generator(Document)) – Generator of documents
Returns: Generator of processed documents
Return type: generator(Document)
