SpacesTokenizer
class SpacesTokenizer(config)
Bases: Tokenizer
Spaces tokenizer that splits the input fields on spaces.
Input: all input fields must be of type str.
Output: all output fields are filled with data of type list[str].
Parameters:
type (str) – the tokenizer type; set to "spaces" to select this tokenizer.
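The following sketch illustrates the input/output contract described above. It assumes the tokenizer behaves like Python's `str.split(" ")`, which splits on single space characters. Under that assumption, consecutive spaces produce empty-string tokens, but the real tokenizer may collapse runs of whitespace instead. The helper name `split_on_spaces` is hypothetical and is not part of the library.

```python
def split_on_spaces(value: object) -> list[str]:
    """Hypothetical helper mirroring the tokenizer's documented contract.

    Assumption: splitting uses str.split(" "), so "a  b" yields
    ["a", "", "b"]; the actual SpacesTokenizer may differ here.
    """
    # Input contract: every input field must hold a str.
    if not isinstance(value, str):
        raise TypeError(f"expected str, got {type(value).__name__}")
    # Output contract: the result is a list[str].
    return value.split(" ")


print(split_on_spaces("the quick brown fox"))
```

If the library instead splits on any whitespace, `value.split()` (no argument) would be the closer match.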
Example
{
    "step": "tokenizer",
    "type": "spaces",
    "input_fields": ["body"],
    "output_fields": ["words"]
}
Methods Summary
process_doc(doc) – Process a document.
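The sketch below shows what `process_doc` might do with the example configuration. It makes the following assumptions:

- A document is a plain `dict` that maps field names to values.
- The i-th entry of `input_fields` is tokenized into the i-th entry of `output_fields`.
- Tokenization uses `str.split(" ")`.

The class `SpacesTokenizerSketch`, the dict-based document, and the length check are hypothetical stand-ins, not the library's actual types or validation.

```python
from typing import Any


class SpacesTokenizerSketch:
    """Hypothetical stand-in for SpacesTokenizer; not the library class."""

    def __init__(self, config: dict[str, Any]) -> None:
        self.input_fields: list[str] = config["input_fields"]
        self.output_fields: list[str] = config["output_fields"]
        # Assumption: input and output fields are paired by position.
        if len(self.input_fields) != len(self.output_fields):
            raise ValueError("input_fields and output_fields must have equal length")

    def process_doc(self, doc: dict[str, Any]) -> dict[str, Any]:
        """Tokenize each input field on spaces into its paired output field."""
        for src, dst in zip(self.input_fields, self.output_fields):
            value = doc[src]
            if not isinstance(value, str):
                raise TypeError(f"field {src!r} must be of type str")
            doc[dst] = value.split(" ")  # assumption: split on single spaces
        return doc


config = {"step": "tokenizer", "type": "spaces",
          "input_fields": ["body"], "output_fields": ["words"]}
doc = SpacesTokenizerSketch(config).process_doc({"body": "hello spaced world"})
print(doc["words"])
```

This sketch mutates the document in place and also returns it. Whether the real `process_doc` does the same, or returns a new document, is not stated here.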
Methods Documentation