Squirro Client Reference#
This page provides the AI Studio reference for the SquirroClient (Python SDK).
Ground Truth#
# `client` is an authenticated SquirroClient instance.
# Create a new ground truth.
config = {
    "type": "text",
    "tagging_level": "sentences",
    "label": ["dog", "no dog"],
    "description": "In this ground truth we select whether sentences are dog related or not.",
    "candidateset_ids": [CANDIDATE_SET_ID]
}
client.new_groundtruth(PROJECT_ID, 'Dog Ground Truth', config)
# Modify an existing ground truth.
config = {
    "type": "text",
    # the tagging_level cannot be changed
    "label": ["dog", "no dog"],
    "description": "In this ground truth we select whether sentences are dog related or not.",
    "candidateset_ids": [CANDIDATE_SET_ID]
}
client.modify_groundtruth(PROJECT_ID, GROUNDTRUTH_ID, name='Dog Ground Truth (modified name)', config=config)
# Delete a ground truth.
client.delete_groundtruth(PROJECT_ID, GROUNDTRUTH_ID)
# Get all ground truths of a project.
client.get_groundtruths(PROJECT_ID)
# Get a single ground truth.
client.get_groundtruth(PROJECT_ID, GROUNDTRUTH_ID)
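The modify example above omits tagging_level because it cannot be changed. A minimal sketch of deriving the update configuration from the creation configuration, using plain dictionaries only. The helper name config_for_update and the candidate set ID value are hypothetical and not part of the SquirroClient API:

```python
def config_for_update(create_config):
    """Return a copy of the creation config without the immutable tagging_level.

    Hypothetical helper for illustration; not part of the SquirroClient API.
    """
    return {
        key: value
        for key, value in create_config.items()
        if key != "tagging_level"
    }


create_config = {
    "type": "text",
    "tagging_level": "sentences",
    "label": ["dog", "no dog"],
    "description": "Example ground truth.",
    # Made-up placeholder, not a real candidate set ID.
    "candidateset_ids": ["hypothetical-candidate-set-id"],
}
update_config = config_for_update(create_config)
print(sorted(update_config))
```

The resulting update_config could then be passed as the config argument of client.modify_groundtruth.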
Labeled Extract#
# Create a new labeled extract.
label = {
    "item_id": SQUIRRO_ITEM_ID,
    "extract": "The dog (Canis familiaris when considered a distinct species or Canis lupus familiaris when considered a subspecies of the wolf) is a domesticated carnivore of the family Canidae.",
    "label": "dog",
    "language": "en",
    "keywords": {},
    "candidateset_id": CANDIDATE_SET_ID,
}
client.new_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, label)
# Modify or delete an existing labeled extract.
client.modify_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID, 'positive')
client.delete_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID)
# Get all labeled extracts of a ground truth, or a single one.
client.get_groundtruth_labels(PROJECT_ID, GROUNDTRUTH_ID)
client.get_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID)
Rule#
# Create a new rule.
rule = {
    "query": "dog sitter",
    "proximity": 6,
    "is_sequence": True,
    "type": "inclusive",
    "labeled_item_id": LABELED_EXTRACT_ID,
}
client.new_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, rule)
# Modify, delete, or get an existing rule.
client.modify_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, RULE_ID, rule)
client.delete_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, RULE_ID)
client.get_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, RULE_ID)
Labels Balancer#
The balancer step evens out the number of elements per class in a data set. Balancing is needed so that the ML algorithm learns more generally instead of overfitting to the most populated class.
Note
If the batch size is smaller than the data set size, the balancer only balances within each batch.
Parameters#
class_field: Key name under which the classes are stored.
classes: List of all classes used in the classification.
not_class: Boolean stating whether a "not" class should be instantiated.
output_label_field: Field in which the labels are stored (only relevant if not_class is True).
deviation (optional): Maximum deviation from the smallest class bucket to the largest bucket (1.0 = 100%, 0.0 = 0%).
seed (optional): Seed for the randomization process.
Example#
{
    "step": "balancer",
    "type": "balancer",
    "name": "balancer",
    "classes": ["A", "B", "C", "D"],
    "class_field": "label",
    "not_class": false,
    "output_label_field": "balanced_label"
}
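To make the effect of balancing and of the deviation parameter more concrete, here is a minimal Python sketch of class balancing by downsampling. It is not the implementation used by the balancer step. The function name balance_by_class is hypothetical, and the sketch assumes that each class may keep at most smallest_count * (1 + deviation) documents:

```python
import random
from collections import defaultdict


def balance_by_class(docs, class_field, deviation=0.0, seed=None):
    """Downsample so that no class exceeds the smallest class by more than `deviation`.

    Hypothetical helper: assumes each class may keep at most
    smallest_count * (1 + deviation) documents.
    """
    rng = random.Random(seed)

    # Group documents by the value stored under `class_field`.
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[class_field]].append(doc)
    if not buckets:
        return []

    smallest = min(len(bucket) for bucket in buckets.values())
    cap = int(smallest * (1 + deviation))

    balanced = []
    for bucket in buckets.values():
        # Keep the whole bucket if it fits under the cap, otherwise sample from it.
        kept = bucket if len(bucket) <= cap else rng.sample(bucket, cap)
        balanced.extend(kept)
    return balanced


docs = [{"label": "A"}] * 10 + [{"label": "B"}] * 4 + [{"label": "C"}] * 5
balanced = balance_by_class(docs, "label", deviation=0.25, seed=42)
counts = {c: sum(d["label"] == c for d in balanced) for c in "ABC"}
print(counts)  # {'A': 5, 'B': 4, 'C': 5}
```

With a smallest class of 4 documents and a deviation of 0.25, the cap is 5 documents per class, so only class A is downsampled.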
Data Randomizer#
The randomizer step shuffles the order of the documents. Randomizing the data set helps the ML algorithm find a more generally applicable solution.
Note
If the batch size is smaller than the data set size, documents are only shuffled within each batch.
Parameters#
seed (optional): Seed for the randomization process.
Example#
{
    "step": "randomizer",
    "type": "randomizer"
}
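The note about batches can be illustrated with a short Python sketch of seeded shuffling within fixed-size batches. This is an illustration under assumptions, not the randomizer step's actual code. The function name shuffle_in_batches and the batch_size argument are hypothetical:

```python
import random


def shuffle_in_batches(docs, batch_size, seed=None):
    """Shuffle documents within each batch; documents never cross batch borders."""
    rng = random.Random(seed)
    shuffled = []
    for start in range(0, len(docs), batch_size):
        batch = list(docs[start:start + batch_size])
        rng.shuffle(batch)  # in-place shuffle of this batch only
        shuffled.extend(batch)
    return shuffled


docs = list(range(10))
result = shuffle_in_batches(docs, batch_size=5, seed=7)

# The first five documents stay in the first batch, only their order changes.
print(sorted(result[:5]))  # [0, 1, 2, 3, 4]
```

Passing the same seed again reproduces the same order, which is the purpose of the optional seed parameter.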
Batch Randomizer#
This step creates a checkpoint and shuffles the order of the batches before the execution of the next step.
Parameters#
checkpoint_processing (optional): Boolean indicating whether a checkpoint is created in a non-training execution.
Example#
{
    "step": "batch_randomizer",
    "type": "batch_randomizer",
    "checkpoint_processing": true
}
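In contrast to shuffling documents, shuffling batches changes only the order in which whole batches are processed. A minimal Python sketch of that idea follows. It leaves out the checkpoint, the function name shuffle_batches is hypothetical, and the seed argument is an assumption added for reproducibility (the step itself documents no such parameter):

```python
import random


def shuffle_batches(batches, seed=None):
    """Return the batches in a new random order; each batch's content is untouched."""
    rng = random.Random(seed)
    reordered = list(batches)  # copy so the caller's list is not modified
    rng.shuffle(reordered)
    return reordered


batches = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
reordered = shuffle_batches(batches, seed=3)
print(reordered)
```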
Squirro Ground Truth Loader#
The squirro_groundtruth step loads your ground truth and transforms it into the Document structure so that the data points can be used in the pipeline for training and validating a model.
Parameters#
temporal_version: Date that determines which ground truth version is selected.
groundtruth_id: ID of the Squirro ground truth.
project_id: ID of the Squirro project.
cluster: URL of the cluster.
token: Squirro token.
Example#
{
    "step": "loader",
    "type": "squirro_groundtruth",
    "fields": [],
    "temporal_version": "2020-10-07T16:24:01.36052",
    "groundtruth_id": GROUNDTRUTH_ID,
    "project_id": PROJECT_ID,
    "cluster": CLUSTER,
    "token": TOKEN
}
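The placeholders in the example above (GROUNDTRUTH_ID, PROJECT_ID, CLUSTER, TOKEN) have to be replaced with real values before the step is valid JSON. One way to do that is to assemble the step in Python and serialize it, as in the following sketch. The helper name build_groundtruth_loader_step is hypothetical, and all identifier, URL, and token values are made-up placeholders:

```python
import json


def build_groundtruth_loader_step(groundtruth_id, project_id, cluster, token,
                                  temporal_version, fields=None):
    """Assemble the squirro_groundtruth loader step as a plain dictionary."""
    return {
        "step": "loader",
        "type": "squirro_groundtruth",
        "fields": list(fields or []),
        "temporal_version": temporal_version,
        "groundtruth_id": groundtruth_id,
        "project_id": project_id,
        "cluster": cluster,
        "token": token,
    }


# All values below are made-up placeholders, not real identifiers or credentials.
step = build_groundtruth_loader_step(
    groundtruth_id="hypothetical-groundtruth-id",
    project_id="hypothetical-project-id",
    cluster="https://squirro.example.com",
    token="hypothetical-token",
    temporal_version="2020-10-07T16:24:01.36052",
)
step_json = json.dumps(step, indent=2)
print(step_json)
```

Building the step from variables keeps the token out of a hand-edited configuration file, for example by reading it from an environment variable before calling the helper.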