SklearnClassifier

class SklearnClassifier(config)

Bases: Classifier

The sklearn Classifier is a generic scikit-learn classifier.

Note - currently the following scikit-learn classifiers are supported:

Input - the input field needs to be of type list [ float or int ] or numpy.ndarray. The types str, float, int, list [ str ] and scipy.sparse matrices are also supported.

Output - the output field is filled with data of type str, or of type dict { str: float } if the model has a “predict_proba” method. Each key of the dict is a class name and each value is the probability/confidence returned by the model for that class.
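The two output shapes described above can be illustrated with plain scikit-learn. This is a minimal sketch, not the library's own code: `format_output` is a hypothetical helper, and the toy training data is made up for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy training data: two well-separated classes (made up for illustration).
X_train = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
y_train = ["cat", "cat", "dog", "dog"]

model = GaussianNB().fit(X_train, y_train)


def format_output(model, features):
    """Hypothetical helper: return {class name: probability} when the model
    offers predict_proba, otherwise just the predicted label as a str."""
    row = np.asarray(features, dtype=float).reshape(1, -1)
    if hasattr(model, "predict_proba"):
        probabilities = model.predict_proba(row)[0]
        return {str(name): float(p) for name, p in zip(model.classes_, probabilities)}
    return str(model.predict(row)[0])


prediction = format_output(model, [0.1, 0.1])
```

Here `prediction` is a dict with one entry per class seen during training, and its values sum to 1. A model without `predict_proba` would yield a single class name as a str.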

Parameters:

type (str) – sklearn
model_type (str) – Type of scikit-learn model
model_kwargs (dict, {}) – Keyword arguments for the scikit-learn model
use_sparse (bool, False) – Whether or not to use a sparse input matrix

Example

{
    "step": "classifier",
    "type": "sklearn",
    "model_type": "GaussianNB",
    "model_kwargs": {},
    "label_field": "label",
    "input_fields": ["embedded_extract"],
    "output_field": "prediction"
}
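One plausible way a configuration like this could map onto a scikit-learn estimator is sketched below. `MODEL_REGISTRY` and `build_model` are hypothetical names, and the two registry entries are illustrative assumptions, not the actual list of supported classifiers.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Hypothetical registry mapping "model_type" strings to estimator classes.
# The real set of supported classifiers is not specified here.
MODEL_REGISTRY = {
    "GaussianNB": GaussianNB,
    "LogisticRegression": LogisticRegression,
}


def build_model(config):
    """Instantiate the estimator named by model_type with model_kwargs."""
    model_type = config["model_type"]
    if model_type not in MODEL_REGISTRY:
        raise ValueError(f"Unsupported model_type: {model_type!r}")
    return MODEL_REGISTRY[model_type](**config.get("model_kwargs", {}))


config = {
    "step": "classifier",
    "type": "sklearn",
    "model_type": "GaussianNB",
    "model_kwargs": {},
    "label_field": "label",
    "input_fields": ["embedded_extract"],
    "output_field": "prediction",
}

model = build_model(config)
```

Entries in `model_kwargs` are passed straight to the estimator's constructor, so `{"model_type": "LogisticRegression", "model_kwargs": {"C": 0.5}}` would produce `LogisticRegression(C=0.5)` under this sketch.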

Methods Summary

load() – Load a step
process_batch(batch) – Process a batch of documents.
save() – Save a step
train(docs) – Train a step on a set of documents

Methods Documentation

load()

Load a step

process_batch(batch)

Process a batch of documents. If not overridden, this defaults to calling self.process_doc on each document in the batch.

Parameters: batch (list(Document)) – List of documents
Returns: List of processed documents
Return type: list(Document)
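The fallback from batch processing to per-document processing can be pictured roughly as follows. This is a sketch under assumptions: `Step` and `UppercaseStep` are hypothetical classes, and documents are modelled as plain dicts instead of the library's Document type.

```python
class Step:
    """Hypothetical base class illustrating the default batch behaviour."""

    def process_doc(self, doc):
        raise NotImplementedError

    def process_batch(self, batch):
        # Default: apply process_doc to each document, preserving order.
        return [self.process_doc(doc) for doc in batch]


class UppercaseStep(Step):
    """Toy step that only defines process_doc and inherits process_batch."""

    def process_doc(self, doc):
        return {**doc, "text": doc["text"].upper()}


processed = UppercaseStep().process_batch([{"text": "a"}, {"text": "b"}])
```

A step that can handle many documents at once (for example a classifier that predicts on a whole feature matrix) would typically override `process_batch` instead of relying on this default.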

save()

Save a step

train(docs)

Train a step on a set of documents

Parameters: docs (generator(Document)) – Generator of documents
Returns: Generator of processed documents
Return type: generator(Document)
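A generator-in, generator-out training method might look roughly like the sketch below. It is an illustration only: this standalone `train` function, the dict-based documents, and the field names taken from the example configuration are assumptions, not the library's implementation.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


def train(model, docs, input_field="embedded_extract", label_field="label"):
    """Fit the model on every document, then yield the documents back."""
    seen = list(docs)  # a generator can only be consumed once
    features = np.array([doc[input_field] for doc in seen], dtype=float)
    labels = [doc[label_field] for doc in seen]
    model.fit(features, labels)
    yield from seen


def doc_stream():
    """Made-up training documents with a feature vector and a label."""
    yield {"embedded_extract": [0.0, 0.1], "label": "cat"}
    yield {"embedded_extract": [0.2, 0.0], "label": "cat"}
    yield {"embedded_extract": [5.0, 5.1], "label": "dog"}
    yield {"embedded_extract": [5.2, 4.9], "label": "dog"}


model = GaussianNB()
# train() is itself a generator, so fitting only happens once it is consumed.
trained_docs = list(train(model, doc_stream()))
```

Because the function is lazy, calling `train(...)` without iterating over the result would leave the model unfitted in this sketch; wrapping the call in `list(...)` forces the fit.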
