SquirroEntityCleaner
- class SquirroEntityCleaner(config)
Bases: Saver
Removes Squirro entities that match the configured properties. This ensures that no duplicate entities show up when an ML workflow is rerun on the same items.
Note: The fields client_id, client_secret, cluster, token, and project_id do not need to be set if this step is used inside the Squirro machinelearning service.
- Parameters:
type (str) – squirro_entity_cleaner
client_id (str, None) – Squirro client ID
client_secret (str, None) – Squirro client secret
cluster (str) – Squirro cluster URL
token (str) – Squirro token
project_id (str) – ID of the Squirro project
properties (dict, {}) – Properties that need to match for entities to be removed. If left empty, all entities for the items are removed.
delete_batch_size (int, 500) – How many entities to delete per deletion request.
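To illustrate how properties and delete_batch_size might interact, here is a minimal Python sketch. It is not the Squirro implementation. The entity layout (a dict with a "properties" key) and the helper names matches and batches are assumptions made for this example, and exact key/value equality is only one plausible reading of "properties that need to match".

```python
from typing import Any, Dict, Iterator, List

Entity = Dict[str, Any]  # hypothetical entity layout, for illustration only


def matches(entity: Entity, properties: Dict[str, Any]) -> bool:
    """Return True if every key/value pair in `properties` is set on the entity.

    An empty `properties` dict matches every entity, mirroring the documented
    behaviour that all entities are removed when `properties` is left empty.
    """
    entity_props = entity.get("properties", {})
    return all(
        key in entity_props and entity_props[key] == value
        for key, value in properties.items()
    )


def batches(
    entities: List[Entity], delete_batch_size: int = 500
) -> Iterator[List[Entity]]:
    """Yield successive chunks of at most `delete_batch_size` entities."""
    if delete_batch_size < 1:
        raise ValueError("delete_batch_size must be a positive integer")
    for start in range(0, len(entities), delete_batch_size):
        yield entities[start:start + delete_batch_size]


# Made-up data: 1200 entities, alternating between two model IDs.
entities = [
    {"id": i, "properties": {"model_id": "A" if i % 2 == 0 else "B"}}
    for i in range(1200)
]

# Select the entities produced by model "A", then group them per request.
to_delete = [e for e in entities if matches(e, {"model_id": "A"})]
batch_sizes = [len(b) for b in batches(to_delete, delete_batch_size=500)]
```

In this sketch, each chunk yielded by batches would correspond to one deletion request, so a smaller delete_batch_size means more, smaller requests.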
Example
{
    "step": "saver",
    "type": "squirro_entity_cleaner",
    "cluster": "CLUSTER",
    "token": "TOKEN",
    "project_id": "PROJECT_ID",
    "properties": {"model_id": "MODEL_ID"}
}
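The step configuration above can also be generated programmatically. The following Python sketch builds the same dictionary and leaves out the connection fields when they are not supplied, as would be the case inside the Squirro machinelearning service. The helper build_cleaner_step is hypothetical and not part of any Squirro library. Only the keys documented in the parameter list are used, and the values are placeholders.

```python
import json
from typing import Any, Dict, Optional


def build_cleaner_step(
    properties: Optional[Dict[str, Any]] = None,
    delete_batch_size: int = 500,
    cluster: Optional[str] = None,
    token: Optional[str] = None,
    project_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a squirro_entity_cleaner step config (hypothetical helper)."""
    step: Dict[str, Any] = {
        "step": "saver",
        "type": "squirro_entity_cleaner",
        # An empty dict means all entities for the items are removed.
        "properties": properties or {},
        "delete_batch_size": delete_batch_size,
    }
    # Connection fields are only added when given; they can be omitted
    # when the step runs inside the Squirro machinelearning service.
    connection = {"cluster": cluster, "token": token, "project_id": project_id}
    step.update({key: value for key, value in connection.items() if value is not None})
    return step


# Inside the service: no connection fields are needed.
inside_service = build_cleaner_step(properties={"model_id": "MODEL_ID"})

# Outside the service: pass the connection details (placeholders here).
standalone = build_cleaner_step(
    properties={"model_id": "MODEL_ID"},
    cluster="CLUSTER",
    token="TOKEN",
    project_id="PROJECT_ID",
)

print(json.dumps(standalone, indent=4))
```

Setting properties to a specific model_id, as in both calls, limits the cleanup to entities produced by that model rather than removing every entity on the items.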
Methods Summary
process_batch
Process a batch of documents.
Methods Documentation