# SklearnProjector

class squirro.lib.nlp.steps.projectors.SklearnProjector(config)

The generic scikit-learn projector step projects vectors from one vector space into another. For more information, see the scikit-learn decomposition module (sklearn.decomposition).

Note: currently only svd is supported, which uses scikit-learn's TruncatedSVD.

Input: all input fields must hold either a list of floats or ints, or a numpy.ndarray.

Output: all output fields are filled with values of type numpy.ndarray with shape (n_components,).
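The input/output contract above can be illustrated with scikit-learn directly. This is a minimal sketch, not Squirro's implementation: it assumes the step stacks the input vectors into one matrix, applies TruncatedSVD, and, when normalize_output is True, applies L2 normalization to each projected vector (the choice of L2 is an assumption). The sample vectors are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import normalize

# Made-up input: one vector per document, given either as a plain
# list of numbers or as a numpy array (both forms are accepted).
vectors = [
    [0.1, 0.3, 0.5, 0.7, 0.9, 1.1],
    np.array([1.0, 0.0, 2.0, 0.0, 3.0, 0.0]),
    [2, 4, 6, 8, 10, 12],
    np.array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5]),
]

n_components = 3

# Stack the mixed inputs into a single float matrix of shape (n_docs, n_features).
X = np.vstack([np.asarray(v, dtype=float) for v in vectors])

svd = TruncatedSVD(n_components=n_components, random_state=0)
projected = svd.fit_transform(X)  # shape (n_docs, n_components)

# Assumption: normalize_output=True means L2-normalizing each row.
projected = normalize(projected, norm="l2")

# Each document's output field would receive one row of shape (n_components,).
outputs = [row for row in projected]
print(outputs[0].shape)  # (3,)
```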

Parameters
• type (str) – sklearn

• model_type (str, 'svd') – Type of scikit-learn projection

• model_kwargs (dict, {}) – Keyword arguments for the scikit-learn model

• n_components (int) – Number of vector components after projection

• normalize_output (bool, True) – Whether or not to normalize the output
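One way to read these parameters is as a mapping onto a scikit-learn constructor: model_type selects the model class, model_kwargs are passed through, and n_components sets the output dimensionality. The helper build_projection_model and the MODEL_TYPES table below are hypothetical, written only to illustrate that reading; how the step actually combines n_components with model_kwargs is an assumption.

```python
from sklearn.decomposition import TruncatedSVD

# Hypothetical mapping from model_type values to scikit-learn classes.
# Only "svd" is listed because it is the only documented type.
MODEL_TYPES = {"svd": TruncatedSVD}


def build_projection_model(config: dict) -> TruncatedSVD:
    """Build the scikit-learn model described by a projector config (sketch)."""
    model_type = config.get("model_type", "svd")
    if model_type not in MODEL_TYPES:
        raise ValueError(f"Unsupported model_type: {model_type!r}")
    # Copy so the caller's model_kwargs dict is not modified.
    model_kwargs = dict(config.get("model_kwargs", {}))
    # Assumption: the top-level n_components wins over any value in model_kwargs.
    model_kwargs["n_components"] = config["n_components"]
    return MODEL_TYPES[model_type](**model_kwargs)


model = build_projection_model(
    {"model_type": "svd", "n_components": 100, "model_kwargs": {"random_state": 42}}
)
print(model.n_components, model.random_state)  # 100 42
```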

Example

```json
{
    "step": "projector",
    "type": "sklearn",
    "model_type": "svd",
    "n_components": 100,
    "input_field": "embedded_text",
    "output_field": "embedded_text"
}
```


Methods Summary

• load() – Load a step

• process_batch(batch) – Process a batch of documents

• save() – Save a step

• train(docs) – Train the step on a set of documents

Methods Documentation

process_batch(batch)

Process a batch of documents. If not overridden, this defaults to calling self.process_doc on each document in the batch.

Parameters

batch (list(Document)) – List of documents

Returns

List of processed documents

Return type

list(Document)
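The documented default for process_batch, falling back to process_doc for each document, follows a common base-class pattern. In this sketch SketchStep, UppercaseStep, and the dict-based Document alias are hypothetical stand-ins, not Squirro classes.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a Squirro Document; the real class differs.
Document = Dict[str, Any]


class SketchStep:
    """Minimal base step illustrating the documented process_batch default."""

    def process_doc(self, doc: Document) -> Document:
        raise NotImplementedError

    def process_batch(self, batch: List[Document]) -> List[Document]:
        # Default behaviour: apply process_doc to each document in turn.
        return [self.process_doc(doc) for doc in batch]


class UppercaseStep(SketchStep):
    """Toy subclass that overrides only process_doc."""

    def process_doc(self, doc: Document) -> Document:
        doc["text"] = doc["text"].upper()
        return doc


result = UppercaseStep().process_batch([{"text": "a"}, {"text": "b"}])
print(result)  # [{'text': 'A'}, {'text': 'B'}]
```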
save()

Save a step

train(docs)

Train the step on a set of documents.

Parameters

docs (generator(Document)) – Generator of documents

Returns

Generator of processed documents

Return type

generator(Document)
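Because train receives a generator but fitting TruncatedSVD needs all vectors at once, a training pass plausibly materializes the documents, fits the model, and then yields each document with its projection. The sketch below assumes exactly that; train_projector is a hypothetical function rather than the step's actual method, the documents are plain dicts, and output normalization is omitted for brevity.

```python
from typing import Any, Dict, Iterator, List

import numpy as np
from sklearn.decomposition import TruncatedSVD

Document = Dict[str, Any]  # hypothetical stand-in for a Squirro Document


def train_projector(
    docs: Iterator[Document],
    input_field: str,
    output_field: str,
    n_components: int,
) -> Iterator[Document]:
    # Fitting needs every vector at once, so the generator is consumed first.
    materialized: List[Document] = list(docs)
    X = np.vstack([np.asarray(d[input_field], dtype=float) for d in materialized])
    model = TruncatedSVD(n_components=n_components, random_state=0)
    projected = model.fit_transform(X)  # shape (n_docs, n_components)
    # Yield documents one at a time, each carrying its projected vector.
    for doc, row in zip(materialized, projected):
        doc[output_field] = row
        yield doc


rng = np.random.default_rng(0)
docs = ({"embedded_text": rng.random(8)} for _ in range(10))
trained = list(train_projector(docs, "embedded_text", "embedded_text", n_components=4))
print(trained[0]["embedded_text"].shape)  # (4,)
```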