SpacesTokenizer

class SpacesTokenizer(config)

Bases: Tokenizer

Tokenizer that splits each input field on spaces.

Input: all input fields must be of type str.

Output: each output field is filled with data of type list[str].

Parameters

type (str) – Tokenizer type; set to "spaces" for this tokenizer

Example

{
    "step": "tokenizer",
    "type": "spaces",
    "input_fields": ["body"],
    "output_fields": ["words"]
}
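To make the effect of this configuration concrete, here is a minimal sketch that emulates the step on a plain dictionary. It is not the library's implementation: the helper `apply_spaces_step` is hypothetical, the positional pairing of `input_fields` with `output_fields` is an assumption, and so is the use of `str.split(" ")` (the source does not say how consecutive spaces are handled).

```python
import json

# The step configuration from the example above.
step_config = json.loads("""
{
    "step": "tokenizer",
    "type": "spaces",
    "input_fields": ["body"],
    "output_fields": ["words"]
}
""")


def apply_spaces_step(record, config):
    """Hypothetical stand-in for the step: split each input field on spaces."""
    result = dict(record)  # leave the caller's record untouched
    # Assumption: the i-th input field feeds the i-th output field.
    for src, dst in zip(config["input_fields"], config["output_fields"]):
        value = record[src]
        if not isinstance(value, str):
            # Mirrors the documented requirement that inputs are of type str.
            raise TypeError(f"field {src!r} must be str, got {type(value).__name__}")
        # Assumption: a literal single-space split; the real step may differ.
        result[dst] = value.split(" ")
    return result


record = {"body": "the quick brown fox"}
print(apply_spaces_step(record, step_config))
# {'body': 'the quick brown fox', 'words': ['the', 'quick', 'brown', 'fox']}
```

The input field is left in place and the token list is written to the output field, which matches the documented input type (str) and output type (list[str]).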

Methods Summary

process_doc(doc)

Process a document by splitting its input fields on spaces

Methods Documentation

process_doc(doc)

Process a document by splitting its input fields on spaces

Parameters

doc (Document) – Document to process

Returns

Processed document

Return type

Document
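The contract above (a Document goes in, a processed Document comes out) can be illustrated with a small self-contained sketch. Everything here is hypothetical: `Document` is modeled as a dict subclass, `SketchSpacesTokenizer` is an illustrative class rather than the library's `SpacesTokenizer`, and both the in-place update of the document and the single-space split are assumptions not confirmed by the source.

```python
class Document(dict):
    """Hypothetical stand-in for the library's Document type."""


class SketchSpacesTokenizer:
    """Illustrative only; not the library's implementation."""

    def __init__(self, config):
        # Field names come from the step configuration.
        self.input_fields = config["input_fields"]
        self.output_fields = config["output_fields"]

    def process_doc(self, doc):
        # Assumption: fields are paired by position and the document
        # is updated in place, then returned.
        for src, dst in zip(self.input_fields, self.output_fields):
            doc[dst] = doc[src].split(" ")
        return doc


tokenizer = SketchSpacesTokenizer(
    {"type": "spaces", "input_fields": ["body"], "output_fields": ["words"]}
)
processed = tokenizer.process_doc(Document(body="hello tokenizer world"))
print(processed["words"])
# ['hello', 'tokenizer', 'world']
```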