DocJoinFilter

class squirro.lib.nlp.steps.filters.DocJoinFilter(config)

Bases: squirro.lib.nlp.steps.filters.Filter

Joins multiple Documents into one, keyed on the document id or on another field. This step currently assumes that Documents arrive pre-sorted by join_field: each time the join_field value changes, a new Document is created. The listed fields from subsequent Documents with the same join_field value are appended to that Document until the value changes again.
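The join behaviour described above can be sketched in plain Python. This is an illustrative approximation, not the library's implementation: Documents are modelled as plain dicts, the helper name join_docs is hypothetical, and the choice to collect scalar values into lists and to extend with list values is an assumption about what "appended" means.

```python
from itertools import groupby
from typing import Any, Dict, Iterable, Iterator, List


def join_docs(
    docs: Iterable[Dict[str, Any]],
    fields: List[str],
    join_field: str = "id",
) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for the doc_join step, using dicts as documents.

    groupby only merges *consecutive* items with the same key, which mirrors
    the pre-sorted assumption: a changed join_field value starts a new output.
    """
    for key, group in groupby(docs, key=lambda d: d[join_field]):
        # Start a new joined document with an empty list per joined field.
        joined: Dict[str, Any] = {join_field: key, **{f: [] for f in fields}}
        for doc in group:
            for f in fields:
                value = doc.get(f)
                if value is None:
                    continue  # field missing on this document
                if isinstance(value, list):
                    joined[f].extend(value)
                else:
                    joined[f].append(value)
        yield joined


docs = [
    {"id": "a", "text": "one"},
    {"id": "a", "text": "two"},
    {"id": "b", "text": "three"},
    {"id": "a", "text": "four"},  # out of order: starts a second "a" document
]
result = list(join_docs(docs, fields=["text"]))
```

Because the input is not sorted by id, the last item is not merged into the first "a" group; the sketch yields three joined documents rather than two. Sorting the input by join_field beforehand avoids this.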

Parameters
  • type (str) – doc_join

  • fields (list, []) – Fields to join

  • join_field (str, 'id') – Field to join on
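
As a rough illustration of how these parameters fit together, a step configuration might look like the following Python dict. The "step" key and the field names "body" and "parent_id" are assumptions made for the example; only type, fields, and join_field come from the parameter list above.

```python
# Hypothetical configuration for a doc_join step.
doc_join_step = {
    "step": "filter",           # assumption: step category key used by the pipeline config
    "type": "doc_join",         # from the parameter list above
    "fields": ["body"],         # hypothetical field whose values are joined
    "join_field": "parent_id",  # hypothetical field; defaults to 'id' when omitted
}
```

With such a configuration, consecutive Documents sharing the same parent_id would be merged, with their body values appended to the joined Document.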

Methods Summary

process(docs)

Process a set of documents

train(docs)

Train the step on a set of documents

Methods Documentation

process(docs)

Process a set of documents

Parameters

docs (generator(Document)) – Generator of documents

Returns

Generator of processed documents

Return type

generator(Document)

train(docs)

Train the step on a set of documents

Parameters

docs (generator(Document)) – Generator of documents

Returns

Generator of processed documents

Return type

generator(Document)