SquirroEntityCleaner

class squirro.lib.nlp.steps.savers.SquirroEntityCleaner(config)

Bases: squirro.lib.nlp.steps.savers.Saver

Remove entities that match the configured properties.

This is used to ensure that no duplicate entities show up when a pipeline is rerun on the same items.

Parameters
  • type (str) – squirro_entity_cleaner

  • client_id (str, None) – Squirro client ID

  • client_secret (str, None) – Squirro client secret

  • cluster (str) – Squirro cluster URL

  • project_id (str) – ID of the Squirro project

  • token (str) – Squirro token

  • properties (dict, {}) – Properties that need to match for entities to be removed. If left empty, all entities for the items are removed.

  • delete_batch_size (int, 500) – How many entities to delete per deletion request.
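
The parameters above can be pictured with a minimal sketch. The dictionary uses only the documented parameter names; every value is a placeholder, and the "source" property key is made up for illustration. The helpers `matches` and `batches` are hypothetical and not part of the library: they show one plausible reading of the matching rule (every configured key/value pair must be present on the entity, and an empty dict matches everything) and of how `delete_batch_size` splits deletions into requests.

```python
# Hypothetical step configuration using only the parameters documented above.
# All values are placeholders; the "source" property key is invented.
cleaner_config = {
    "type": "squirro_entity_cleaner",
    "cluster": "https://squirro.example.com",
    "project_id": "PROJECT_ID",
    "token": "TOKEN",
    "properties": {"source": "my_pipeline"},
    "delete_batch_size": 500,
}


def matches(entity_properties, wanted):
    """Return True if every wanted key/value pair is present on the entity.

    An empty `wanted` dict matches every entity (assumed reading of the docs).
    """
    return all(
        key in entity_properties and entity_properties[key] == value
        for key, value in wanted.items()
    )


def batches(entity_ids, size):
    """Yield consecutive slices of at most `size` ids, one per deletion request."""
    for start in range(0, len(entity_ids), size):
        yield entity_ids[start:start + size]


# Made-up entities attached to a single item.
entities = [
    {"id": "e1", "properties": {"source": "my_pipeline"}},
    {"id": "e2", "properties": {"source": "other"}},
    {"id": "e3", "properties": {"source": "my_pipeline"}},
]

# Entities selected by the configured properties.
to_delete = [
    e["id"] for e in entities
    if matches(e["properties"], cleaner_config["properties"])
]

# With empty properties, every entity of the item is selected.
all_ids = [e["id"] for e in entities if matches(e["properties"], {})]
```

With the default `delete_batch_size` of 500, a rerun that selects 1,200 entities would, under this reading, issue three deletion requests (500, 500, and 200 entities).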

Methods Summary

process_batch(docs)

Process a batch of documents.

Methods Documentation

process_batch(docs)

Process a batch of documents. If a step does not define this method, it defaults to calling self.process_doc for each document in the batch.

Parameters

docs (list(Document)) – List of documents

Returns

List of processed documents

Return type

list(Document)
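
The fallback from process_batch to process_doc described above might look roughly like the following. This is a sketch with stand-in classes, not the library's implementation: `Document`, `Step`, and `TaggingStep` here are hypothetical and only mirror the documented contract (a list of documents in, a list of processed documents out).

```python
class Document:
    """Stand-in for the library's Document: an id plus a dict of fields."""

    def __init__(self, doc_id, fields=None):
        self.id = doc_id
        self.fields = fields if fields is not None else {}


class Step:
    """Stand-in base step showing the per-document fallback."""

    def process_doc(self, doc):
        # Default: return the document unchanged.
        return doc

    def process_batch(self, docs):
        # Default: apply process_doc to each document, preserving order.
        return [self.process_doc(doc) for doc in docs]


class TaggingStep(Step):
    """Hypothetical step that overrides only process_doc."""

    def process_doc(self, doc):
        doc.fields["seen"] = True
        return doc


# process_batch is inherited, so it calls TaggingStep.process_doc per document.
processed = TaggingStep().process_batch([Document("a"), Document("b")])
```

A step that needs to act on the whole batch at once, for example to group deletions into fewer requests, would override process_batch itself instead of relying on this fallback.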