DocSplitFilter

class squirro.lib.nlp.steps.filters.DocSplitFilter(config)

Bases: squirro.lib.nlp.steps.filters.Filter

Splits a single Document into many Documents via iterable fields of the same length; each generated Document receives one element from each of those fields.

Parameters
  • type (str) – doc_split

  • fields (list) – Fields to split on (must all be the same length)

  • flatten (bool, False) – Whether or not to flatten the split fields

  • copy_fields (bool, False) – Whether to reference or copy the document's other (non-split) fields into the newly generated documents. If subsequent steps will mutate those fields in the generated documents, set this parameter to true. Note that copying can carry a performance penalty when the split fields contain many elements, because every generated document then gets its own copy.
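The splitting behaviour described above can be illustrated with a minimal plain-Python sketch. It is not Squirro's implementation: it assumes a document can be modelled as a dict of fields (the real step works on squirro Document objects), and the helper name split_doc is hypothetical. It shows the same-length requirement and the difference between referencing and copying the non-split fields; the flatten option is not modelled.

```python
import copy
from typing import Any, Dict, Iterator, List

# Hypothetical stand-in for a squirro Document: a plain dict of fields.
Doc = Dict[str, Any]


def split_doc(doc: Doc, fields: List[str], copy_fields: bool = False) -> Iterator[Doc]:
    """Yield one new doc per element of the split fields (sketch only)."""
    lengths = {len(doc[f]) for f in fields}
    if len(lengths) > 1:
        # All split fields must have the same length.
        raise ValueError(f"fields {fields} have different lengths: {sorted(lengths)}")
    n = lengths.pop() if lengths else 0

    # Fields that are not split are carried over to every generated doc.
    others = {k: v for k, v in doc.items() if k not in fields}
    for i in range(n):
        # copy_fields=False: new dict, but values are shared references.
        # copy_fields=True: deep copy, so later mutation is isolated (slower).
        new_doc = copy.deepcopy(others) if copy_fields else dict(others)
        for f in fields:
            new_doc[f] = doc[f][i]
        yield new_doc


if __name__ == "__main__":
    doc = {"id": "a", "meta": {"lang": "en"},
           "sentences": ["s1", "s2"], "labels": ["x", "y"]}
    for part in split_doc(doc, ["sentences", "labels"]):
        print(part)
```

With copy_fields=False, mutating part["meta"] in a later step would also change the original document's "meta" and every sibling, which is why the parameter exists.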

Methods Summary

process(docs)

Process a set of documents

train(docs)

Train the step on a set of documents

Methods Documentation

process(docs)

Process a set of documents

Parameters

docs (generator(Document)) – Generator of documents

Returns

Generator of processed documents

Return type

generator(Document)
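Because process takes and returns a generator of documents, a one-to-many step such as this one fits naturally as a lazy flat-map. The sketch below assumes dict-based documents and uses a hypothetical helper name, flat_map_docs, with a caller-supplied split callable; it illustrates the generator contract, not Squirro's internal code.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Doc = Dict[str, Any]  # hypothetical dict-based stand-in for Document


def flat_map_docs(docs: Iterable[Doc],
                  split: Callable[[Doc], Iterable[Doc]]) -> Iterator[Doc]:
    """Lazily expand each input doc into zero or more output docs."""
    for doc in docs:
        # Input docs are pulled only when the consumer asks for more output.
        yield from split(doc)


if __name__ == "__main__":
    source = ({"id": i, "tokens": ["a", "b"]} for i in range(3))
    out = flat_map_docs(source, lambda d: ({"id": d["id"], "tokens": t} for t in d["tokens"]))
    print(list(out))
```

Keeping the step lazy means a large input stream never has to be materialised in memory; documents are split as downstream steps consume them.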

train(docs)

Train the step on a set of documents

Parameters

docs (generator(Document)) – Generator of documents

Returns

Generator of processed documents

Return type

generator(Document)