SpacesTokenizer
class SpacesTokenizer(config)
Bases: Tokenizer
Spaces Tokenizer that splits the input fields on spaces.
Input - all input fields need to be of type str
Output - all output fields are filled with data of type list[str]
Parameters:
type (str) – spaces
Example
{ "step": "tokenizer", "type": "spaces", "input_fields": ["body"], "output_fields": ["words"] }
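The behaviour this configuration describes can be approximated in plain Python. The sketch below is not the library's implementation: the function name split_fields_on_spaces is hypothetical, the document is assumed to be a plain dict, and input and output fields are assumed to be paired by position. It also assumes that runs of whitespace count as a single separator, which the description above does not specify.

```python
def split_fields_on_spaces(doc, input_fields, output_fields):
    """Hypothetical stand-in for a spaces tokenizer step (not the library API).

    Each input field must hold a str; the matching output field
    receives a list[str] with the space-separated tokens.
    """
    if len(input_fields) != len(output_fields):
        raise ValueError("input_fields and output_fields must have the same length")

    result = dict(doc)  # shallow copy so the caller's document is not mutated
    for src, dst in zip(input_fields, output_fields):
        value = doc[src]
        if not isinstance(value, str):
            raise TypeError(f"field {src!r} must be of type str")
        # Assumption: str.split() with no argument splits on any whitespace
        # and drops empty tokens produced by repeated spaces.
        result[dst] = value.split()
    return result


doc = {"body": "the quick  brown fox"}
out = split_fields_on_spaces(doc, ["body"], ["words"])
print(out["words"])  # ['the', 'quick', 'brown', 'fox']
```

If the tokenizer splits strictly on the single space character, value.split(" ") would be the closer analogue; that variant yields empty strings wherever two spaces are adjacent.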
Methods Summary
process_doc(doc) – Process a document
Methods Documentation