DocSplitFilter
- class DocSplitFilter(config)
Bases: Filter
The doc_split Filter splits a single Document into many via iterable fields of the same length.
Input - All input fields need to be of type list[object].
Output - The output fields are going to be of type object.
Parameters
type (str) – doc_split
fields (list) – Fields to split on (must all be same length)
flatten (bool, False) – Whether or not to flatten the split fields
copy_fields (bool, False) – Whether to copy the other fields of the document into the newly generated documents rather than refer to them. If you expect to mutate the other fields of the generated documents in subsequent steps, set this parameter to true. Note that copying may carry a performance penalty when the split fields contain many elements.
Example
{ "step": "filter", "type": "doc_split", "fields":["sentences"] }
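The splitting behaviour described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: a document is modelled as a plain dict, and split_document is a hypothetical helper name. The flatten option is not modelled, since its exact semantics are not spelled out here.

```python
import copy


def split_document(doc, fields, copy_fields=False):
    """Hypothetical sketch: split one dict-like document into many.

    Assumes a document is a plain dict; this is not the library's code.
    """
    # All split fields must be lists of the same length.
    lengths = {len(doc[f]) for f in fields}
    if len(lengths) != 1:
        raise ValueError("all split fields must have the same length")
    n = lengths.pop()

    # Fields that are not split are carried over to every new document.
    others = {k: v for k, v in doc.items() if k not in fields}

    result = []
    for i in range(n):
        # copy_fields=True gives each new document independent copies;
        # otherwise the new documents share references to the same values.
        new_doc = copy.deepcopy(others) if copy_fields else dict(others)
        for f in fields:
            # Each list[object] input field becomes a single object.
            new_doc[f] = doc[f][i]
        result.append(new_doc)
    return result


doc = {"id": 1, "tags": ["a"], "sentences": ["First.", "Second."]}
shared = split_document(doc, ["sentences"])
copied = split_document(doc, ["sentences"], copy_fields=True)
```

In this sketch, the documents in shared all point at the same tags list, so mutating it through one document is visible in the others, while the documents in copied each hold their own copy. That is the trade-off the copy_fields parameter describes.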
Methods Summary
process(docs) – Process a set of documents
train(docs) – Train on a step of a set of documents
Methods Documentation
- process(docs)
Process a set of documents