DocJoinFilter

class DocJoinFilter(config)

Bases: Filter

The doc_join Filter joins multiple Documents into one Document, matching them by id or by a specified field.

Note - This step assumes that Documents arrive pre-sorted by the join_field ('id' by default). Whenever the value of the join_field changes, a new Document is created; the fields of subsequent Documents with the same join_field value are appended to it until the value changes again.

Input - No specific input fields type required.

Output - Each joined field is output as a list of its input field type: list[input field type].
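The joining behaviour described in the note can be illustrated with a minimal sketch. It is not the library's implementation: plain dicts stand in for Document objects, join_docs is a hypothetical helper, and dropping fields that are not listed in fields is an assumption, since the documentation does not say what happens to them.

```python
from itertools import groupby
from operator import itemgetter


def join_docs(docs, fields, join_field="id"):
    """Merge consecutive dicts that share the same join_field value.

    Hypothetical stand-in for the doc_join step; plain dicts replace
    the library's Document class.
    """
    # groupby only groups *adjacent* items, which mirrors the
    # pre-sorted assumption stated in the note.
    for key, group in groupby(docs, key=itemgetter(join_field)):
        joined = {join_field: key}
        for name in fields:
            joined[name] = []  # each joined field becomes a list
        for doc in group:
            for name in fields:
                if name in doc:
                    joined[name].append(doc[name])
        yield joined


docs = [
    {"id": "a", "sentences": "First sentence."},
    {"id": "a", "sentences": "Second sentence."},
    {"id": "b", "sentences": "Another document."},
    {"id": "a", "sentences": "Out of order."},  # not adjacent to the other "a" docs
]

result = list(join_docs(docs, fields=["sentences"]))
for doc in result:
    print(doc)
```

Because the last input is not adjacent to the other documents with id "a", the sketch yields it as a separate joined document, which is why the input needs to be sorted by the join field beforehand.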

Parameters:
  • type (str) – doc_join

  • fields (list, []) – Fields to join

  • join_field (str, 'id') – Field to join on

Example

{
    "step": "filter",
    "type": "doc_join",
    "fields": ["sentences"]
}
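The example omits join_field, so according to the parameter list it falls back to 'id'. The sketch below makes that fallback explicit by merging the example config over the documented defaults. The DEFAULTS dict and the merge itself are illustrative assumptions; how the library resolves defaults internally is not described here.

```python
import json

# The example step configuration, verbatim.
config_text = """
{
    "step": "filter",
    "type": "doc_join",
    "fields": ["sentences"]
}
"""

# Defaults taken from the parameter list above (assumed resolution order:
# explicit config values override defaults).
DEFAULTS = {"fields": [], "join_field": "id"}

config = json.loads(config_text)
resolved = {**DEFAULTS, **config}

print(resolved["fields"])
print(resolved["join_field"])
```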

Methods Summary

process(docs)

Process a set of documents

train(docs)

Train the step on a set of documents

Methods Documentation

process(docs)

Process a set of documents

Parameters:

docs (generator(Document)) – Generator of documents

Returns:

Generator of processed documents

Return type:

generator(Document)

train(docs)

Train the step on a set of documents

Parameters:

docs (generator(Document)) – Generator of documents

Returns:

Generator of processed documents

Return type:

generator(Document)