SquirroEntityCleaner

class squirro.lib.nlp.steps.savers.SquirroEntityCleaner(config)

Bases: squirro.lib.nlp.steps.savers.Saver

Removes Squirro entities that match the given properties. This ensures that no duplicate entities show up when an ML workflow is rerun on the same items.

Note - The fields client_id, client_secret, cluster, token and project_id do not need to be set if the step is used inside the Squirro machine learning service.

Parameters
  • type (str) – squirro_entity_cleaner

  • client_id (str, None) – Squirro client id

  • client_secret (str, None) – Squirro client secret

  • cluster (str) – Squirro cluster URL

  • token (str) – Squirro token

  • project_id (str) – id of Squirro project

  • properties (dict, {}) – Properties that need to match for entities to be removed. If left empty, all entities for the items are removed.

  • delete_batch_size (int, 500) – How many entities to delete per deletion request.
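The matching and batching behaviour described by properties and delete_batch_size can be illustrated with a minimal sketch. It is not the library's implementation: the helpers `matches` and `batches` are hypothetical, and the entity layout (a dict with an `id` and a `properties` dict) is assumed purely for illustration.

```python
from typing import Any, Dict, Iterator, List


def matches(entity: Dict[str, Any], properties: Dict[str, Any]) -> bool:
    """Hypothetical helper: True if every requested property equals the entity's value.

    An empty `properties` dict matches every entity, which mirrors the
    documented "if left empty, all entities are removed" behaviour.
    """
    entity_props = entity.get("properties", {})
    return all(entity_props.get(key) == value for key, value in properties.items())


def batches(items: List[Any], size: int = 500) -> Iterator[List[Any]]:
    """Hypothetical helper: yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Assumed entity layout, for illustration only.
entities = [
    {"id": "e1", "properties": {"model_id": "A"}},
    {"id": "e2", "properties": {"model_id": "B"}},
    {"id": "e3", "properties": {"model_id": "A"}},
]

# Only entities whose model_id is "A" are selected for deletion.
to_delete = [e["id"] for e in entities if matches(e, {"model_id": "A"})]

# With empty properties, every entity is selected.
all_ids = [e["id"] for e in entities if matches(e, {})]

# Each group would correspond to one deletion request.
groups = list(batches(to_delete, size=1))
```

In this sketch a smaller batch size means more requests with fewer entities each.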

Example

{
    "step": "saver",
    "type": "squirro_entity_cleaner",
    "cluster": "CLUSTER",
    "token": "TOKEN",
    "project_id": "PROJECT_ID",
    "properties": {"model_id":"MODEL_ID"}
}
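
The same step configuration can also be assembled programmatically, for example when a workflow is generated from code. The sketch below uses only the parameter names documented above; the helper `entity_cleaner_step` is hypothetical and not part of the library. Connection fields are omitted when not supplied, which corresponds to the case where the step runs inside the Squirro machine learning service.

```python
import json
from typing import Any, Dict, Optional


def entity_cleaner_step(
    model_id: str,
    cluster: Optional[str] = None,
    token: Optional[str] = None,
    project_id: Optional[str] = None,
    delete_batch_size: int = 500,
) -> Dict[str, Any]:
    """Hypothetical helper that builds a squirro_entity_cleaner step config."""
    step: Dict[str, Any] = {
        "step": "saver",
        "type": "squirro_entity_cleaner",
        "properties": {"model_id": model_id},
        "delete_batch_size": delete_batch_size,
    }
    # Only include connection settings that were explicitly provided.
    for key, value in (("cluster", cluster), ("token", token), ("project_id", project_id)):
        if value is not None:
            step[key] = value
    return step


# Placeholder values, as in the example above.
external_step = entity_cleaner_step(
    "MODEL_ID", cluster="CLUSTER", token="TOKEN", project_id="PROJECT_ID"
)

# Variant without connection settings.
internal_step = entity_cleaner_step("MODEL_ID")

# Serialize to the JSON form used in workflow configurations.
external_json = json.dumps(external_step, indent=4)
```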

Methods Summary

process_batch(docs)

Process a batch of documents.

Methods Documentation

process_batch(docs)

Process a batch of documents. If not defined, it defaults to calling self.process_doc for each document in the batch.

Parameters

docs (list(Document)) – List of documents

Returns

List of processed documents

Return type

list(Document)
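
The fallback from process_batch to process_doc can be pictured with a small standalone sketch. The classes `Document`, `SketchStep` and `TagStep` below are hypothetical stand-ins, not the library's classes; they only illustrate the per-document default described above.

```python
from typing import List


class Document(dict):
    """Hypothetical stand-in for the library's Document type."""


class SketchStep:
    """Hypothetical base step showing the documented default behaviour."""

    def process_doc(self, doc: Document) -> Document:
        # Default: return the document unchanged.
        return doc

    def process_batch(self, docs: List[Document]) -> List[Document]:
        # Default: apply process_doc to each document in turn.
        return [self.process_doc(doc) for doc in docs]


class TagStep(SketchStep):
    """Overrides only process_doc; process_batch is inherited."""

    def process_doc(self, doc: Document) -> Document:
        doc["seen"] = True
        return doc


processed = TagStep().process_batch([Document(id="a"), Document(id="b")])
```

A step that can handle many documents in one request, as a batched deletion would, has a reason to override process_batch directly instead of relying on this per-document default.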