# CosineSimilarityClassifier

class squirro.lib.nlp.steps.classifiers.CosineSimilarityClassifier(config)

The cosine similarity classifier assigns each text fragment to the class whose reference data point is closest to it by cosine similarity.

Input - the input field needs to be of type list [ float or int ] or numpy.ndarray.

Output - the output field is filled with data of type dict { str : float }. The key of the dict is the predicted class name and the value is the cosine distance to the closest reference data point.
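The input and output contract described above can be illustrated with a minimal sketch. This is not the step's actual implementation: the helper names `cosine_similarity` and `classify`, the reference vectors, and the labels are all hypothetical. It also assumes the reported value is 1 minus the cosine similarity, following the "cosine distance" wording above; the real step may compute or store the value differently.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors (0.0 if either has zero norm)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)


def classify(vector, references):
    """Return {label: distance} for the closest reference data point.

    `references` is a list of (label, vector) pairs; this layout is an
    assumption made for the sketch, not the step's internal format.
    """
    best_label, best_sim = None, -np.inf
    for label, ref in references:
        sim = cosine_similarity(vector, ref)
        if sim > best_sim:
            best_label, best_sim = label, sim
    # Assumption: cosine distance is taken as 1 - cosine similarity.
    return {best_label: 1.0 - best_sim}


# Hypothetical reference data: one embedding per class.
references = [("sports", [1.0, 0.0]), ("politics", [0.0, 1.0])]
prediction = classify([0.9, 0.1], references)
```

Here `prediction` is a dict with a single key, the label of the closest reference vector, which matches the dict { str : float } shape of the output field.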

Parameters

type (str) – cosine_similarity

Example

{
"step": "classifier",
"type": "cosine_similarity",
"label_field": "label",
"input_field": "embedded_extract",
"output_field": "prediction"
}


Methods Summary

load() – Load a step

process(docs) – Process a set of documents

process_batch(batch) – Process a batch of documents.

save() – Save a step

train(docs) – Train a step on a set of documents

Methods Documentation

process(docs)

Process a set of documents

Parameters

docs (generator(Document)) – Generator of documents

Returns

Generator of processed documents

Return type

generator(Document)
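Because process takes a generator and returns a generator, no document is handled until the result is consumed. The following sketch shows that pattern in isolation. The functions `read_docs` and `process` are hypothetical stand-ins, and documents are modelled as plain dicts rather than Document objects.

```python
def read_docs():
    """Hypothetical source that yields three documents as plain dicts."""
    for i in range(3):
        yield {"id": i}


def process(docs):
    """Hypothetical generator-style step: handle each document, then yield it."""
    for doc in docs:
        doc["seen"] = True  # placeholder for the real per-document work
        yield doc


processed = process(read_docs())  # nothing has been processed yet
first = next(processed)           # processes only the first document
rest = list(processed)            # processes the remaining documents
```

A caller therefore has to iterate over the returned generator (for example with a for loop or list()) for the step to do any work.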

process_batch(batch)

Process a batch of documents. If not defined, this defaults to calling self.process_doc for each document in the batch.
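The fallback behaviour can be pictured with a small sketch. `SketchStep` is a hypothetical stand-in class, not Squirro's base class, and documents are modelled as plain dicts for illustration.

```python
class SketchStep:
    """Hypothetical step showing a per-document fallback for batches."""

    def process_doc(self, doc):
        # Placeholder for the real per-document logic.
        doc["processed"] = True
        return doc

    def process_batch(self, batch):
        # Default behaviour: apply process_doc to each document in turn.
        return [self.process_doc(doc) for doc in batch]


step = SketchStep()
result = step.process_batch([{"id": 1}, {"id": 2}])
```

A subclass that can handle several documents at once, for example with a vectorised computation, would override process_batch instead of relying on this loop.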

Parameters

batch (list(Document)) – List of documents

Returns

List of processed documents

Return type

list(Document)

save()

Save a step

train(docs)

Train a step on a set of documents

Parameters

docs (generator(Document)) – Generator of documents

Returns

Generator of processed documents

Return type

generator(Document)