FastTextClassifier

class squirro.lib.nlp.steps.classifiers.FastTextClassifier(config)

Bases: squirro.lib.nlp.steps.classifiers.Classifier

The fastText classifier uses fastText, the text embedding and classification library from Facebook’s AI Research lab.

Input - all input fields need to be of type str.

Output - the output field is filled with data of type dict { str: float }. The key of the dict is the predicted class name and the value is the probability/confidence returned by the model.
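
As a minimal sketch of consuming this output, the snippet below picks the most likely class from a prediction dict. The sample values and the helper name `top_label` are illustrative assumptions and not part of the Squirro API.

```python
from typing import Dict, Optional, Tuple


def top_label(prediction: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return the (label, probability) pair with the highest probability."""
    if not prediction:
        # An empty dict means no prediction was returned.
        return None
    label = max(prediction, key=prediction.get)
    return label, prediction[label]


# Hypothetical content of the output field for one document.
prediction = {"sports": 0.91, "politics": 0.07, "finance": 0.02}
best = top_label(prediction)
print(best)
```

Returning `None` for an empty dict keeps the caller from failing when a document has no prediction, which can plausibly happen when `min_prob` is set above every returned probability.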

Parameters
  • type (str) – fasttext

  • cutoff (int, 100000) – Cutoff for quantization

  • learning_rate (float, 1.0) – Learning rate

  • min_count (int, 1) – Minimum number of times a word must appear to be included in the dictionary

  • min_prob (float, 0.0) – Minimum prediction probability to return

  • n_epochs (int, 25) – Number of training epochs

  • n_grams (int, 2) – N of N-grams

  • n_predictions (int, None) – Number of label predictions to return. By default this will be the number of unique labels.

  • quantize (bool, False) – Whether or not to quantize the model
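
To illustrate how `min_prob` and `n_predictions` could shape the output, here is a sketch that applies both limits to a raw score dict. It assumes that predictions are ranked by probability, that `n_predictions=None` keeps every label, and that `min_prob` is an inclusive lower bound. The helper `limit_predictions` is hypothetical, and the step's internal behavior may differ.

```python
from typing import Dict, Optional


def limit_predictions(
    scores: Dict[str, float],
    min_prob: float = 0.0,
    n_predictions: Optional[int] = None,
) -> Dict[str, float]:
    """Keep the top-ranked labels whose probability is at least min_prob."""
    # Rank labels from most to least probable.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Assumption: None means "return all labels".
    if n_predictions is not None:
        ranked = ranked[:n_predictions]
    # Assumption: min_prob is an inclusive threshold.
    return {label: prob for label, prob in ranked if prob >= min_prob}


# Hypothetical raw scores for one document.
scores = {"sports": 0.6, "politics": 0.3, "finance": 0.1}
print(limit_predictions(scores, min_prob=0.2))
print(limit_predictions(scores, n_predictions=1))
```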

Example

{
    "step": "classifier",
    "type": "fasttext",
    "input_fields": ["extract"],
    "output_field": "prediction",
    "label_field": "label"
}
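
A fuller configuration can be sketched by building the step as a Python dict and serializing it to JSON. This assumes the documented parameters sit at the same level as `type`. The values shown are the documented defaults, except `min_prob` and `n_predictions`, which are arbitrary choices for illustration.

```python
import json

# Step configuration with the tunable parameters spelled out explicitly.
step_config = {
    "step": "classifier",
    "type": "fasttext",
    "input_fields": ["extract"],
    "output_field": "prediction",
    "label_field": "label",
    "learning_rate": 1.0,  # documented default
    "n_epochs": 25,        # documented default
    "n_grams": 2,          # documented default
    "min_count": 1,        # documented default
    "min_prob": 0.1,       # illustrative value, default is 0.0
    "n_predictions": 3,    # illustrative value, default is None
    "quantize": False,     # documented default
}

# json.dumps emits strict JSON (no trailing commas, lowercase booleans).
step_json = json.dumps(step_config, indent=4)
print(step_json)
```

Generating the JSON from a dict avoids hand-written syntax slips such as trailing commas, which strict JSON parsers reject.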

Methods Summary

clean() – Clean step

load() – Load a step

process_doc(doc) – Process a document

save() – Save a step

train(docs) – Train the step on a set of documents

Methods Documentation

clean()

Clean step

load()

Load a step

process_doc(doc)

Process a document

Parameters

doc (Document) – Document

Returns

Processed document

Return type

Document

save()

Save a step

train(docs)

Train the step on a set of documents

Parameters

docs (generator(Document)) – Generator of documents

Returns

Generator of processed documents

Return type

generator(Document)