AutoMLClassifier

class AutoMLClassifier(config)

Bases: Classifier

The AutoML classifier uses the AutoGluon tabular library (TabularPredictor) for automated machine learning.

Input - all input fields must be of type str.

Output - the prediction for the extract, as a string.
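Because every input field must be a string, non-string values need converting before they reach the classifier. The following is a minimal sketch, assuming a document can be treated as a plain dict (the real Document type is not shown on this page); the helper name `coerce_inputs_to_str` is hypothetical and not part of the library.

```python
def coerce_inputs_to_str(doc, input_fields):
    """Return a copy of `doc` with every listed input field cast to str.

    `doc` is a plain dict standing in for a Document (assumption).
    """
    out = dict(doc)
    for field in input_fields:
        if field not in out:
            raise KeyError(f"missing input field: {field!r}")
        if not isinstance(out[field], str):
            out[field] = str(out[field])
    return out


doc = {"extract": 12345, "label": "invoice"}
clean = coerce_inputs_to_str(doc, ["extract"])
```

The original dict is left untouched, so the conversion can be applied just before classification without affecting earlier steps.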

Parameters:
  • type (str) – automl

  • preset (str) – One of the presets “best_quality”, “high_quality”, “good_quality”, or “medium_quality”

  • hyperparameters (dict, None) – Hyperparameters for additional models. If None, the default hyperparameters are used.

  • fit_parameters (dict, None) – Parameters for the TabularPredictor.fit() function. If None, the default parameters are used.
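The parameter constraints above can be checked before a step is built. This is only a sketch: `validate_automl_config` is a hypothetical helper, and treating `preset` as optional is an assumption based on the example configuration, which omits it.

```python
# Preset names as listed in the parameter description above.
ALLOWED_PRESETS = {"best_quality", "high_quality", "good_quality", "medium_quality"}


def validate_automl_config(config):
    """Raise if `config` violates the documented parameter types; return it otherwise."""
    if config.get("type") != "automl":
        raise ValueError('type must be "automl"')

    preset = config.get("preset")  # assumed optional
    if preset is not None and preset not in ALLOWED_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {sorted(ALLOWED_PRESETS)}")

    # Both of these are documented as (dict, None).
    for key in ("hyperparameters", "fit_parameters"):
        value = config.get(key)
        if value is not None and not isinstance(value, dict):
            raise TypeError(f"{key} must be a dict or None")
    return config


checked = validate_automl_config({"type": "automl", "preset": "good_quality"})
```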

Example

{
    "step": "classifier",
    "type": "automl",
    "input_fields": ["extract"],
    "output_field": "prediction",
    "label_field": "label",
    "hyperparameters": {"FASTTEXT": []},
    "fit_parameters": {"time_limit": 30, "num_bag_folds": 5}
}
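To illustrate how such a configuration might reach AutoGluon, the sketch below collects keyword arguments for TabularPredictor.fit(). How the step actually forwards these values is not documented here, so the mapping (in particular `preset` to AutoGluon's `presets` argument) is an assumption, and `build_fit_kwargs` is a hypothetical helper.

```python
def build_fit_kwargs(config):
    """Collect keyword arguments for TabularPredictor.fit() from a step config.

    Assumed mapping: `fit_parameters` is passed through unchanged, `preset`
    becomes `presets`, and `hyperparameters` keeps its name.
    """
    kwargs = dict(config.get("fit_parameters") or {})
    if config.get("preset") is not None:
        kwargs["presets"] = config["preset"]
    if config.get("hyperparameters") is not None:
        kwargs["hyperparameters"] = config["hyperparameters"]
    return kwargs


config = {
    "step": "classifier",
    "type": "automl",
    "input_fields": ["extract"],
    "output_field": "prediction",
    "label_field": "label",
    "hyperparameters": {"FASTTEXT": []},
    "fit_parameters": {"time_limit": 30, "num_bag_folds": 5},
}
fit_kwargs = build_fit_kwargs(config)

# With AutoGluon installed and a training DataFrame `train_df` (not defined here),
# the call might then look like:
#   from autogluon.tabular import TabularPredictor
#   predictor = TabularPredictor(label=config["label_field"]).fit(train_df, **fit_kwargs)
```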

Methods Summary

clean()

Clean step

evaluate(tabular_docs)

load()

Load a step

process_batch(batch)

Process a batch of documents.

train(docs)

Train a step on a set of documents

Methods Documentation

clean()

Clean step

evaluate(tabular_docs)

load()

Load a step

process_batch(batch)

Process a batch of documents. If not defined, it defaults to calling self.process_doc on each document in the batch.

Parameters:

batch (list(Document)) – List of documents

Returns:

List of processed documents

Return type:

list(Document)
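The fallback described for process_batch can be sketched as follows. The `Step` base class and `UppercaseStep` subclass are illustrative stand-ins, not the library's classes, and documents are modelled as plain dicts.

```python
class Step:
    """Illustrative base class (assumption, not the library's own)."""

    def process_doc(self, doc):
        raise NotImplementedError

    def process_batch(self, batch):
        # Default behaviour: apply process_doc to each document in turn.
        return [self.process_doc(doc) for doc in batch]


class UppercaseStep(Step):
    """Defines only process_doc, so it inherits the default process_batch."""

    def process_doc(self, doc):
        return {**doc, "extract": doc["extract"].upper()}


result = UppercaseStep().process_batch([{"extract": "a"}, {"extract": "b"}])
```

A step whose model can predict many rows at once would typically override process_batch instead, to avoid one model call per document.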

train(docs)

Train a step on a set of documents

Parameters:

docs (generator(Document)) – Generator of documents

Returns:

Generator of processed documents

Return type:

generator(Document)
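The generator-in, generator-out contract of train can be illustrated with a toy step. This sketch swaps the AutoML model for a simple majority-class rule and models documents as plain dicts; `MajorityClassStep` is hypothetical, and whether the real step attaches predictions to the documents it yields during training is an assumption.

```python
from collections import Counter


class MajorityClassStep:
    """Toy stand-in for a trainable step; it is not the AutoML model."""

    def __init__(self, label_field="label", output_field="prediction"):
        self.label_field = label_field
        self.output_field = output_field
        self.majority = None

    def train(self, docs):
        # Fitting needs the full training set, so the input generator is consumed here.
        docs = list(docs)
        counts = Counter(doc[self.label_field] for doc in docs)
        self.majority = counts.most_common(1)[0][0]
        # Yield the processed documents one by one.
        for doc in docs:
            yield {**doc, self.output_field: self.majority}


training_docs = [
    {"extract": "total due", "label": "invoice"},
    {"extract": "amount payable", "label": "invoice"},
    {"extract": "dear sir", "label": "letter"},
]
step = MajorityClassStep()
processed = list(step.train(iter(training_docs)))
```

Because train is written as a generator function in this sketch, no fitting happens until the result is iterated, which is why the call is wrapped in list().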