# DocJoinFilter

class squirro.lib.nlp.steps.filters.DocJoinFilter(config)

The doc_join filter joins multiple Documents into a single Document, grouping them by id or by a specified field.

Note - This step assumes that Documents arrive pre-sorted by the join_field (id unless another field is specified). A new Document is created each time the join_field value changes. Fields from subsequent Documents with the same join_field value are appended to that Document until the value changes again.

Input - No specific input field type is required.

Output - Each joined output field is a list of the corresponding input field type: list[input field type].
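The joining behaviour described in the note can be illustrated with a small stand-alone sketch. It is not the Squirro implementation: Documents are modelled as plain dicts, and `join_docs` is a hypothetical helper written only for this illustration. It relies on `itertools.groupby`, which merges consecutive items only, to mirror the pre-sorted assumption.

```python
from itertools import groupby


def join_docs(docs, fields, join_field="id"):
    """Hypothetical stand-in for the doc_join behaviour (not Squirro code)."""
    # groupby starts a new group whenever the key changes, so only
    # *consecutive* documents with the same join_field value are merged.
    for key, group in groupby(docs, key=lambda d: d[join_field]):
        joined = {join_field: key}
        for name in fields:
            joined[name] = []
        for doc in group:
            for name in fields:
                # Each joined field becomes a list of the input field values.
                joined[name].append(doc[name])
        yield joined


# Assumed sample data: the last document repeats id "a" out of order.
docs = [
    {"id": "a", "sentences": "First."},
    {"id": "a", "sentences": "Second."},
    {"id": "b", "sentences": "Third."},
    {"id": "a", "sentences": "Fourth."},
]
result = list(join_docs(docs, fields=["sentences"]))
```

In this sketch the final document with id "a" is not merged into the first group, because the id changed in between. That is the consequence of the pre-sorted assumption.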

Parameters
• type (str) – doc_join

• fields (list, []) – Fields to join

• join_field (str, 'id') – Field to join on

Example

{
    "step": "filter",
    "type": "doc_join",
    "fields": ["sentences"]
}
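The example joins on the default id field, so Documents that share an id need to be adjacent when they reach this step. If the documents are assembled in your own code beforehand, one possible way to satisfy that is a stable sort on the join field. The sketch below assumes Documents are represented as plain dicts with made-up sample values; it does not use any Squirro API.

```python
from operator import itemgetter

# Assumed sample data: documents with id "a" are not adjacent.
docs = [
    {"id": "a", "sentences": "First."},
    {"id": "b", "sentences": "Third."},
    {"id": "a", "sentences": "Second."},
]

# sorted() is stable, so documents sharing an id keep their relative order.
presorted = sorted(docs, key=itemgetter("id"))
```

After sorting, both documents with id "a" are contiguous and in their original relative order, so they would fall into the same joined Document.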


Methods Summary

process(docs) – Process a set of documents
train(docs) – Train on a step of a set of documents

Methods Documentation

process(docs)

Process a set of documents

Parameters

docs (generator(Document)) – Generator of documents

Returns

Generator of processed documents

Return type

generator(Document)

train(docs)

Train on a step of a set of documents

Parameters

docs (generator(Document)) – Generator of documents

Returns

Generator of processed documents

Return type

generator(Document)