MLGroundTruthMixin

class MLGroundTruthMixin

Bases: object

Methods Summary

delete_groundtruth(project_id, groundtruth_id)

Delete Ground Truth

delete_groundtruth_label(project_id, ...)

Delete labeled extract

delete_groundtruth_rule(project_id, ...)

Delete rule

get_groundtruth(project_id, groundtruth_id)

Get a single Ground Truth.

get_groundtruth_item(project_id, ...[, ...])

Returns an item of the provided project enriched with Ground Truth data.

get_groundtruth_items(project_id, groundtruth_id)

Returns items for the provided project enriched with Ground Truth data.

get_groundtruth_label(project_id, ...)

Get a single labeled extract from a Ground Truth.

get_groundtruth_labels(project_id, ...[, ...])

Return the labeled extracts of a Ground Truth for a project as a list.

get_groundtruth_labels_batched(project_id, ...)

Returns a generator that goes through all the valid labeled extracts from the provided groundtruth.

get_groundtruth_rule(project_id, ...)

Get a single rule of the Ground Truth.

get_groundtruth_rules(project_id, groundtruth_id)

Get all rules for the Ground Truth.

get_groundtruths(project_id)

Return all Ground Truths for a project as a list.

modify_groundtruth(project_id, groundtruth_id)

Modify an existing Ground Truth.

modify_groundtruth_label(project_id, ...[, ...])

Modify an existing labeled extract.

modify_groundtruth_rule(project_id, ...)

Modify an existing rule.

new_groundtruth(project_id, name, config)

Create a new Ground Truth.

new_groundtruth_label(project_id, ...)

Create a new labeled extract.

new_groundtruth_labels(project_id, ...)

Create multiple labeled extracts.

new_groundtruth_rule(project_id, ...)

Create a new rule in Ground Truth.

Methods Documentation

delete_groundtruth(project_id, groundtruth_id)

Delete Ground Truth

Parameters:
  • project_id – Id of the Squirro project.

  • groundtruth_id – Id of the Ground Truth.

delete_groundtruth_label(project_id, groundtruth_id, label_id)

Delete labeled extract

Parameters:
  • project_id – Id of the Squirro project.

  • groundtruth_id – Id of the Ground Truth.

  • label_id – Id of the labeled extract

delete_groundtruth_rule(project_id, groundtruth_id, rule_id)

Delete rule

Parameters:
  • project_id – Id of the Squirro project.

  • groundtruth_id – Id of the Ground Truth.

  • rule_id – Id of the rule

get_groundtruth(project_id, groundtruth_id)

Get a single Ground Truth.

Parameters:
  • project_id – Id of the Squirro project.

  • groundtruth_id – Id of the GroundTruth

Examples

Get a single groundtruth:

>>> client.get_groundtruth(
            project_id="DSuNrcnlSc6x5SJZh02IyQ",
            groundtruth_id="n57sJlpcTLy4_XxdvHvx5g")
{'bulk_labeling_status': 'no bulk labelings',
'config': {'candidatesets': [], 'labels': ['yes', 'no']},
'created_at': '1996-03-13T00:00:00',
'id': 'n57sJlpcTLy4_XxdvHvx5g',
'modified_at': '2015-06-09T00:00:00',
'name': 'Commercial final fly share white focus voice.',
'project_id': 'DSuNrcnlSc6x5SJZh02IyQ',
'rules': {}}

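
The returned value can be inspected like any other Python dictionary. As a small sketch, using the sample response above as literal data (so no server is needed), the configured labels can be read from the `config` entry:

```python
# Sample response copied from the example above, trimmed to the keys used here.
groundtruth = {
    "id": "n57sJlpcTLy4_XxdvHvx5g",
    "name": "Commercial final fly share white focus voice.",
    "config": {"candidatesets": [], "labels": ["yes", "no"]},
    "rules": {},
}

# The labels of this Ground Truth are listed under config["labels"].
labels = groundtruth["config"]["labels"]

# An empty "rules" entry means no rules have been defined yet.
has_rules = bool(groundtruth["rules"])

print(labels)
print(has_rules)
```
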
get_groundtruth_item(project_id, groundtruth_id, item_id, highlight_query='', user_id=None, temporal_version='2024-12-21T06:11:17.885151', label=None, include_sentences=False)

Returns an item of the provided project enriched with Ground Truth data.

Parameters:
  • project_id – Id of the Squirro project

  • groundtruth_id – Id of the GroundTruth

  • item_id – Id of the item

  • highlight_query – query containing highlight information

  • user_id – Id of the user to filter Ground Truth by

  • temporal_version – Temporal version of the Ground Truth

  • label – Label tag to filter Ground Truth by

  • include_sentences – Flag to return documents split in sentences

get_groundtruth_items(project_id, groundtruth_id, user_id=None, temporal_version=None, label=None, labelled_filter=None, highlight_filter=False, **kwargs)

Returns items for the provided project enriched with Ground Truth data.

Parameters:
  • project_id – Id of the Squirro project

  • groundtruth_id – Id of the GroundTruth

  • user_id – Id of the user to filter Ground Truth by

  • temporal_version – temporal version of the Ground Truth

  • label – label to filter Ground Truth by

  • labelled_filter – filter if all items, only the already labelled or only the unlabelled items should get returned (accepted values: 'all', 'labelled' and 'not_labelled')

  • highlight_filter – filter if only the highlighted items should get returned

  • kwargs – Additional query parameters. All keyword arguments are passed on verbatim to the API.
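
Because `labelled_filter` only accepts three values, it can help to check the value before calling the API. The following is a minimal sketch, not part of the client: `fetch_items` is a hypothetical wrapper, and `StubClient` is a stand-in for an authenticated client so that the example runs without a server.

```python
# Values accepted by labelled_filter, as listed in the parameter description.
ACCEPTED_LABELLED_FILTERS = ("all", "labelled", "not_labelled")


def fetch_items(client, project_id, groundtruth_id, labelled_filter="all", **kwargs):
    """Hypothetical wrapper: validate labelled_filter, then delegate to the client."""
    if labelled_filter not in ACCEPTED_LABELLED_FILTERS:
        raise ValueError(
            f"labelled_filter must be one of {ACCEPTED_LABELLED_FILTERS}, "
            f"got {labelled_filter!r}"
        )
    return client.get_groundtruth_items(
        project_id, groundtruth_id, labelled_filter=labelled_filter, **kwargs
    )


class StubClient:
    """Stand-in for a real client (assumption); it only echoes the call back."""

    def get_groundtruth_items(self, project_id, groundtruth_id, **params):
        return {
            "project_id": project_id,
            "groundtruth_id": groundtruth_id,
            "params": params,
        }


result = fetch_items(
    StubClient(), "my_project", "my_groundtruth", labelled_filter="not_labelled"
)
print(result["params"])
```

With a real client, the same wrapper would pass any extra keyword arguments through unchanged.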

get_groundtruth_label(project_id, groundtruth_id, label_id)

Get a single labeled extract from a Ground Truth.

Parameters:
  • project_id – Id of the Squirro project.

  • groundtruth_id – Id of the GroundTruth

  • label_id – Id of the labeled extract

get_groundtruth_labels(project_id, groundtruth_id, user_id=None, temporal_version=None, label=None, extract_query=None, item_ids=[], count=None, start=None)

Return the labeled extracts of a Ground Truth for a project as a list. Note: to avoid timeout issues with large Ground Truths, use get_groundtruth_labels_batched instead.

Parameters:
  • project_id – Id of the Squirro project.

  • groundtruth_id – Id of the GroundTruth

  • user_id – Id of the user to filter Ground Truth by

  • temporal_version – temporal version of the Ground Truth

  • label – label to filter Ground Truth by

  • item_ids – item_ids to filter Ground Truth by

  • count – number of elements to retrieve from the Ground Truth

  • start – pagination offset for the retrieval of the Ground Truth
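
The `count` and `start` parameters can be combined to page through the labeled extracts. The sketch below rests on two assumptions that this page does not confirm: that the method returns a plain list, and that `start` is an offset counted in labeled extracts. `iter_label_pages` is a hypothetical helper and `StubClient` is a stand-in so the example runs without a server.

```python
def iter_label_pages(client, project_id, groundtruth_id, page_size=100):
    """Hypothetical helper: yield labeled extracts page by page."""
    start = 0
    while True:
        page = client.get_groundtruth_labels(
            project_id, groundtruth_id, count=page_size, start=start
        )
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        start += page_size


class StubClient:
    """Stand-in for a real client (assumption); it slices an in-memory list."""

    def __init__(self, labels):
        self._labels = labels
        self.calls = 0

    def get_groundtruth_labels(self, project_id, groundtruth_id, count=None, start=None):
        self.calls += 1
        begin = start or 0
        end = None if count is None else begin + count
        return self._labels[begin:end]


stub = StubClient([{"n": i} for i in range(5)])
all_labels = list(iter_label_pages(stub, "my_project", "my_groundtruth", page_size=2))
print(len(all_labels), stub.calls)
```
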

get_groundtruth_labels_batched(project_id, groundtruth_id, batch_size=1000, temporal_version=None)

Returns a generator that goes through all the valid labeled extracts from the provided groundtruth. The generator is only valid for a short period of time, as it uses Elasticsearch’s PIT API under the hood. It is preferable to use this method instead of get_groundtruth_labels when the number of labels is large, as it avoids timeout issues.

Parameters:
  • project_id (str) – The ID of the project.

  • groundtruth_id (str) – The ID of the groundtruth.

  • batch_size (int, optional) – The size of the batch requested from the API. Defaults to 1000. If timeout issues are encountered, try reducing this value.

  • temporal_version (str, optional) – The datetime string corresponding to the Ground Truth temporal version. Defaults to None.

Yields:

dict – A dictionary containing information about a labeled extract, including the extract text, keywords, item ID, etc.
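
Since the generator is only valid for a short period, one cautious pattern is to drain it immediately and do any slow processing afterwards on the resulting list. The following is a minimal sketch of that idea: `snapshot_labels` is a hypothetical helper, `StubClient` stands in for an authenticated client, and the `id` key in the stub data is made up for illustration.

```python
def snapshot_labels(client, project_id, groundtruth_id, batch_size=1000):
    """Hypothetical helper: drain the generator right away into a list."""
    generator = client.get_groundtruth_labels_batched(
        project_id, groundtruth_id, batch_size=batch_size
    )
    return list(generator)


class StubClient:
    """Stand-in for a real client (assumption); it yields three fake extracts."""

    def get_groundtruth_labels_batched(
        self, project_id, groundtruth_id, batch_size=1000, temporal_version=None
    ):
        for i in range(3):
            yield {"id": f"label-{i}"}  # illustrative key, not the real schema


labels = snapshot_labels(StubClient(), "my_project", "my_groundtruth", batch_size=500)

# Slow work (exports, model training, ...) can now happen on the list,
# without keeping the short-lived generator open.
print(len(labels))
```

The trade-off is memory: the whole set of labeled extracts is held in a list at once.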

get_groundtruth_rule(project_id, groundtruth_id, rule_id)

Get a single rule of the Ground Truth.

Parameters:
  • project_id – Id of the Squirro project.

  • groundtruth_id – Id of the GroundTruth

  • rule_id – Id of the rule

get_groundtruth_rules(project_id, groundtruth_id)

Get all rules for the Ground Truth.

Parameters:
  • project_id – Id of the Squirro project

  • groundtruth_id – Id of the GroundTruth

get_groundtruths(project_id)

Return all Ground Truths for a project as a list.

Parameters:
  • project_id – Id of the Squirro project.

modify_groundtruth(project_id, groundtruth_id, name=None, config=None)

Modify an existing Ground Truth.

Parameters:
  • project_id – Id of the Squirro project.

  • groundtruth_id – Id of the Ground Truth.

  • name – Name of the Ground Truth.

  • config – Dictionary of Ground Truth config.

modify_groundtruth_label(project_id, groundtruth_id, label_id, validity, label=None)

Modify an existing labeled extract.

Parameters:
  • project_id – Id of the Squirro project

  • groundtruth_id – Id of the Ground Truth

  • label_id – Id of the labeled extract

  • validity – validity of the labeled extract

  • label – label of the labeled extract

modify_groundtruth_rule(project_id, groundtruth_id, rule_id, rule)

Modify an existing rule.

Parameters:
  • project_id – Id of the Squirro project.

  • groundtruth_id – Id of the Ground Truth.

  • rule_id – Id of the rule

  • rule – information of the rule.

new_groundtruth(project_id, name, config)

Create a new Ground Truth.

Parameters:
  • project_id – Id of the Squirro project.

  • name – Name of the Ground Truth.

  • config – Ground Truth Config.
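
The exact schema of `config` is not described here. The sketch below assumes it has the same shape as the `config` entry shown in the get_groundtruth example output (a `labels` list and a `candidatesets` list). `build_groundtruth_config` is a hypothetical helper, and its validation rules are illustrative choices, not documented API constraints.

```python
def build_groundtruth_config(labels, candidatesets=None):
    """Hypothetical helper: assemble a config dict for new_groundtruth."""
    cleaned = [label.strip() for label in labels]
    # These checks are illustrative, not requirements stated by the API.
    if not cleaned or any(not label for label in cleaned):
        raise ValueError("at least one non-empty label is required")
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("labels must be unique")
    return {"candidatesets": list(candidatesets or []), "labels": cleaned}


config = build_groundtruth_config(["yes", "no"])
print(config)

# With an authenticated client, the result could then be passed on, e.g.:
# client.new_groundtruth(project_id, "My Ground Truth", config)
```
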

new_groundtruth_label(project_id, groundtruth_id, label)

Create a new labeled extract.

Parameters:
  • project_id – Id of the Squirro project.

  • groundtruth_id – Id of the Ground Truth

  • label – information of the labeled extract.

new_groundtruth_labels(project_id, groundtruth_id, labels)

Create multiple labeled extracts.

Parameters:
  • project_id – Id of the Squirro project.

  • groundtruth_id – Id of the Ground Truth.

  • labels – list of dicts, where each dict contains information of a labeled extract.
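
When many labeled extracts need to be created, sending them in smaller chunks keeps each request modest in size. Whether the API limits the number of labels per request is not stated here, so this is only a cautious pattern. `upload_labels_in_chunks` is a hypothetical helper, `StubClient` is a stand-in for an authenticated client, and the dictionaries used as labels are placeholders rather than the real label schema.

```python
def upload_labels_in_chunks(client, project_id, groundtruth_id, labels, chunk_size=500):
    """Hypothetical helper: call new_groundtruth_labels once per chunk."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    responses = []
    for offset in range(0, len(labels), chunk_size):
        chunk = labels[offset:offset + chunk_size]
        responses.append(
            client.new_groundtruth_labels(project_id, groundtruth_id, chunk)
        )
    return responses


class StubClient:
    """Stand-in for a real client (assumption); it records chunk sizes."""

    def __init__(self):
        self.chunk_sizes = []

    def new_groundtruth_labels(self, project_id, groundtruth_id, labels):
        self.chunk_sizes.append(len(labels))
        return {"created": len(labels)}  # illustrative response, not the real one


stub = StubClient()
placeholder_labels = [{"n": i} for i in range(5)]  # placeholders, not the real schema
responses = upload_labels_in_chunks(
    stub, "my_project", "my_groundtruth", placeholder_labels, chunk_size=2
)
print(stub.chunk_sizes)
```
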

new_groundtruth_rule(project_id, groundtruth_id, rule)

Create a new rule in Ground Truth.

Parameters:
  • project_id – Id of the Squirro project.

  • groundtruth_id – Id of the Ground Truth

  • rule – information of the rule.