FastTextClassifier

class FastTextClassifier(config)

Bases: Classifier

The fastText Classifier classifies text with the fastText library, a text embedding and classification library developed by Facebook's AI Research lab.

Input: all input fields must be of type str.

Output: the output field is filled with a dict of type { str: float }. Each key is a predicted class name and each value is the probability (confidence) the model returned for that class.
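To illustrate the shape of the output dict: fastText's Python predict() returns a tuple of labels (prefixed with "__label__" by default) and their probabilities. The sketch below assumes the step strips that prefix to get the class name; the helper predictions_to_dict is hypothetical and not taken from the step's source.

```python
from typing import Dict, Sequence

LABEL_PREFIX = "__label__"  # fastText's default label prefix

def predictions_to_dict(labels: Sequence[str], probs: Sequence[float]) -> Dict[str, float]:
    """Map fastText-style (labels, probs) output to {class_name: probability}."""
    result: Dict[str, float] = {}
    for label, prob in zip(labels, probs):
        # Strip the "__label__" prefix so the key is the bare class name.
        name = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        result[name] = float(prob)
    return result

# Values shaped like the (labels, probs) tuple returned by fastText's predict()
labels = ("__label__sports", "__label__politics")
probs = (0.87, 0.12)
print(predictions_to_dict(labels, probs))  # {'sports': 0.87, 'politics': 0.12}
```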

Parameters:

type (str) – Must be "fasttext"
cutoff (int, 100000) – Cutoff used when quantizing the model (see quantize)
learning_rate (float, 1.0) – Learning rate
min_count (int, 1) – Minimum number of times a word must appear to be included in the dictionary
min_prob (float, 0.0) – Minimum probability a prediction must have to be returned
n_epochs (int, 25) – Number of training epochs
n_grams (int, 2) – Length of word n-grams (the N in N-grams)
n_predictions (int, None) – Number of label predictions to return. Defaults to the number of unique labels.
quantize (bool, False) – Whether to quantize the model
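One way to read these parameters is as a thin layer over fastText's own train_supervised(), predict() and quantize() arguments. The sketch below assumes that mapping (learning_rate to lr, n_epochs to epoch, n_grams to wordNgrams, min_count to minCount, min_prob to threshold, n_predictions to k, cutoff to quantize's cutoff); the FastTextParams class and its methods are hypothetical and only mirror the documented defaults.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class FastTextParams:
    """Hypothetical container mirroring the documented parameters and defaults."""
    cutoff: int = 100000
    learning_rate: float = 1.0
    min_count: int = 1
    min_prob: float = 0.0
    n_epochs: int = 25
    n_grams: int = 2
    n_predictions: Optional[int] = None
    quantize: bool = False

    def train_kwargs(self) -> Dict[str, Any]:
        # Assumed mapping onto fastText's train_supervised() argument names.
        return {
            "lr": self.learning_rate,
            "epoch": self.n_epochs,
            "wordNgrams": self.n_grams,
            "minCount": self.min_count,
        }

    def predict_kwargs(self, n_unique_labels: int) -> Dict[str, Any]:
        # n_predictions falls back to the number of unique labels, as documented.
        k = self.n_predictions if self.n_predictions is not None else n_unique_labels
        # Assumed mapping onto fastText's predict(text, k, threshold).
        return {"k": k, "threshold": self.min_prob}

    def quantize_kwargs(self) -> Optional[Dict[str, Any]]:
        # cutoff only matters when quantization is enabled; assumed to map to quantize(cutoff=...).
        return {"cutoff": self.cutoff} if self.quantize else None

params = FastTextParams(n_epochs=10)
print(params.train_kwargs())  # {'lr': 1.0, 'epoch': 10, 'wordNgrams': 2, 'minCount': 1}
print(params.predict_kwargs(n_unique_labels=3))  # {'k': 3, 'threshold': 0.0}
```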

Example

{
    "step": "classifier",
    "type": "fasttext",
    "input_fields": ["extract"],
    "output_field": "prediction",
    "label_field": "label"
}
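The example sets only the required fields, so every fastText parameter takes its default. A minimal sketch of how such a config could be parsed and completed with the documented defaults follows; load_step_config is a hypothetical helper, not the library's actual loader, and the extra "n_epochs" entry is added for illustration.

```python
import json

# Documented defaults for the optional fastText parameters.
DEFAULTS = {
    "cutoff": 100000,
    "learning_rate": 1.0,
    "min_count": 1,
    "min_prob": 0.0,
    "n_epochs": 25,
    "n_grams": 2,
    "n_predictions": None,
    "quantize": False,
}

def load_step_config(text: str) -> dict:
    """Parse a step config and fill in documented defaults (hypothetical helper)."""
    config = json.loads(text)
    if config.get("type") != "fasttext":
        raise ValueError("expected type 'fasttext'")
    # Explicit values in the config override the defaults.
    return {**DEFAULTS, **config}

example = """
{
    "step": "classifier",
    "type": "fasttext",
    "input_fields": ["extract"],
    "output_field": "prediction",
    "label_field": "label",
    "n_epochs": 10
}
"""
config = load_step_config(example)
print(config["n_epochs"], config["n_grams"])  # 10 2
```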

Methods Summary

clean() – Clean the step
load() – Load the step
process_doc(doc) – Process a document
save() – Save the step
train(docs) – Train the step on a set of documents

Methods Documentation

clean()

Clean the step

load()

Load the step

process_doc(doc)

Process a document

Parameters: doc (Document) – Document to process
Returns: Processed document
Return type: Document
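As a rough picture of what processing a single document involves, the sketch below reads the input fields, checks they are str, asks a predictor for labels and probabilities, and writes the { str: float } dict to the output field. It assumes a plain dict in place of the library's Document, assumes multiple input fields are joined with spaces, and uses a fake predictor shaped like fastText's predict(text, k, threshold); all names here are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-ins: a document is a plain dict, and the predictor has the
# shape of fastText's model.predict(text, k, threshold) -> (labels, probs).
Document = Dict[str, object]
Predictor = Callable[[str, int, float], Tuple[Sequence[str], Sequence[float]]]

PREFIX = "__label__"

def process_doc(doc: Document, predict: Predictor, input_fields: List[str],
                output_field: str, k: int, min_prob: float) -> Document:
    """Classify one document and store {class_name: probability} in output_field."""
    parts = []
    for field in input_fields:
        value = doc[field]
        if not isinstance(value, str):
            raise TypeError(f"input field {field!r} must be of type str")
        parts.append(value)
    # Assumption: several input fields are joined with spaces. Newlines are replaced
    # because fastText's predict() works on a single line of text.
    text = " ".join(parts).replace("\n", " ")
    labels, probs = predict(text, k, min_prob)
    doc[output_field] = {
        (label[len(PREFIX):] if label.startswith(PREFIX) else label): float(prob)
        for label, prob in zip(labels, probs)
    }
    return doc

def fake_predict(text: str, k: int, threshold: float):
    """Fixed scores standing in for a trained model."""
    scored = [("__label__pos", 0.9), ("__label__neg", 0.1)]
    kept = [(label, p) for label, p in scored if p >= threshold][:k]
    return tuple(label for label, _ in kept), tuple(p for _, p in kept)

doc = {"extract": "great product"}
print(process_doc(doc, fake_predict, ["extract"], "prediction", k=2, min_prob=0.0)["prediction"])
# {'pos': 0.9, 'neg': 0.1}
```

Passing the predictor in as an argument keeps the sketch runnable without the fasttext package; with a trained model, its predict method has the same call shape.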

save()

Save the step

train(docs)

Train the step on a set of documents

Parameters: docs (generator(Document)) – Generator of documents
Returns: Generator of processed documents
Return type: generator(Document)
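fastText's supervised training reads a text file with one example per line, each starting with a "__label__"-prefixed label followed by the text. The sketch below assumes the step builds such a file from the documents' input and label fields and then yields the documents back, matching the documented generator-in, generator-out signature. The actual training call is only indicated in a comment, and the function and dict-based Document are hypothetical.

```python
import os
import tempfile
from typing import Dict, Iterable, Iterator, List

Document = Dict[str, object]  # hypothetical stand-in for the library's Document

def train(docs: Iterable[Document], input_fields: List[str], label_field: str,
          path: str) -> Iterator[Document]:
    """Write docs in fastText's supervised format, then yield them back."""
    seen: List[Document] = []
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            text = " ".join(str(doc[field]) for field in input_fields).replace("\n", " ")
            # fastText labels are whitespace-delimited tokens, so spaces are replaced.
            label = str(doc[label_field]).replace(" ", "_")
            f.write(f"__label__{label} {text}\n")
            seen.append(doc)
    # A real step would train here, e.g. fasttext.train_supervised(input=path, ...);
    # omitted so the sketch runs without the fasttext package.
    yield from seen

docs = [{"extract": "great product", "label": "pos"},
        {"extract": "broke after a day", "label": "neg"}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.txt")
    processed = list(train(docs, ["extract"], "label", path))
    with open(path, encoding="utf-8") as f:
        print(f.read(), end="")
# __label__pos great product
# __label__neg broke after a day
```

Because train is a generator function, nothing is written until the caller starts iterating; here list() drives it to completion before the file is read.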
