StopwordsNormalizer

class StopwordsNormalizer(config)

Bases: Normalizer

The stopwords normalizer removes stopwords from the text of each input field.

Input - all input fields need to be of type str.

Output - all output fields are filled with data of type str.

Parameters
  • type (str) – Normalizer type; set to "stopwords" to select this normalizer

  • stopwords (list, []) – List of stopwords

  • stopwords_file (str, None) – Path to a file containing the list of stopwords
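
The two stopword parameters could be resolved roughly as in the sketch below. It is illustrative only, not the library's implementation. The helper names read_stopwords and resolve_stopwords are hypothetical, the file is assumed to hold one stopword per line, and merging the inline list with the file contents is an assumption (the real step may instead let one parameter override the other).

```python
from pathlib import Path


def read_stopwords(path):
    """Read stopwords from a text file.

    Assumed format: one stopword per line; blank lines are skipped.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def resolve_stopwords(config):
    """Build the effective stopword list from a step config dict.

    Assumption: the inline list and the file contents are concatenated.
    """
    words = list(config.get("stopwords") or [])
    path = config.get("stopwords_file")
    if path is not None:
        words.extend(read_stopwords(path))
    return words
```

With this sketch, a config that sets neither parameter yields an empty list, which matches the documented defaults of [] and None.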

Example

{
    "step": "normalizer",
    "type": "stopwords",
    "input_fields": ["body"],
    "output_fields": ["normalized_body"],
    "stopwords": ["the","stopwords"]
}
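
To make the effect of this configuration concrete, here is a minimal sketch of what such a step might do to a document. It is not the library's code. The document is modeled as a plain dict rather than the library's Document class, the helper names remove_stopwords and apply_step are hypothetical, and whitespace tokenization with case-insensitive matching is an assumption.

```python
def remove_stopwords(text, stopwords):
    """Drop whitespace-separated tokens that match a stopword.

    Assumptions: tokens are split on whitespace and compared
    case-insensitively; punctuation attached to a token is not stripped.
    """
    blocked = {word.lower() for word in stopwords}
    kept = [token for token in text.split() if token.lower() not in blocked]
    return " ".join(kept)


def apply_step(config, doc):
    """Write the filtered text of each input field to its paired output field."""
    result = dict(doc)  # leave the caller's document untouched
    for src, dst in zip(config["input_fields"], config["output_fields"]):
        result[dst] = remove_stopwords(doc[src], config["stopwords"])
    return result


config = {
    "step": "normalizer",
    "type": "stopwords",
    "input_fields": ["body"],
    "output_fields": ["normalized_body"],
    "stopwords": ["the", "stopwords"],
}
doc = {"body": "Remove the stopwords from the text"}
processed = apply_step(config, doc)
print(processed["normalized_body"])  # prints: Remove from text
```

Input fields are paired with output fields by position, so the original "body" value stays in the document alongside the new "normalized_body" value.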

Methods Summary

load()

Load a step

process_doc(doc)

Process a document

save()

Save a step

Methods Documentation

load()

Load a step

process_doc(doc)

Process a document

Parameters

doc (Document) – Document

Returns

Processed document

Return type

Document

save()

Save a step