AI Studio Step 2: Ground Truth#
The Ground Truth is a set of text extracts with associated labels.
It is the basis of machine learning in AI Studio Step 3: Models, where a model learns its behavior from the text extracts and labels in a ground truth set.
Structure#
A ground truth consists of up to three components, each with several properties:
Metadata
Labeled extracts
Rules
Metadata#
id: Unique identifier of the ground truth
name: Name of the ground truth
description: Description of the ground truth
type: Type of the ground truth (accepted values: text, text+proximity)
tagging_level: Level at which the extracts are tagged (accepted values: sentence, document)
label: List of labels that can be used for tagging
candidateset_ids: List of Candidate Set ids that are used to generate the ground truth
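The accepted values listed above can be checked on the client side before a configuration is sent. The following is a minimal sketch, not part of the Squirro client: the helper name `validate_groundtruth_config` is hypothetical, and it assumes the accepted values listed above are exhaustive and that `label` must be a non-empty list.

```python
# Accepted values as listed in the metadata description above.
ACCEPTED_TYPES = {"text", "text+proximity"}
ACCEPTED_TAGGING_LEVELS = {"sentence", "document"}


def validate_groundtruth_config(config):
    """Return a list of problems found in a ground truth config dict.

    Hypothetical client-side helper; the server remains the authority
    on what is actually accepted.
    """
    problems = []
    if config.get("type") not in ACCEPTED_TYPES:
        problems.append(f"type must be one of {sorted(ACCEPTED_TYPES)}")
    if config.get("tagging_level") not in ACCEPTED_TAGGING_LEVELS:
        problems.append(
            f"tagging_level must be one of {sorted(ACCEPTED_TAGGING_LEVELS)}"
        )
    labels = config.get("label")
    # Assumption: at least one label is needed for tagging.
    if not isinstance(labels, list) or not labels:
        problems.append("label must be a non-empty list")
    if not isinstance(config.get("candidateset_ids", []), list):
        problems.append("candidateset_ids must be a list")
    return problems
```

An empty list means no problems were found under these assumptions.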
Labeled Extracts#
id: Unique identifier of the labeled extract
extract: A text section (e.g. a sentence if tagging_level is set to 'sentence')
label: Label that classifies the extract
language: Language of the Squirro item
keywords: Additional keywords from the Squirro item
item_id: Id of the Squirro item in which the extract was found
candidateset_id: Id of the Candidate Set that helped to find the extract
Additionally, there is a temporal versioning component in place with the fields:
user: The user who made the change
validity: Indicates whether a label is positive (considered true) or negative (considered false)
created_at: Time of creation
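To illustrate how these versioning fields could be read, the sketch below models each change as a record and treats the most recent record as the current state. This is an assumption for illustration only: the source does not describe how versions are stored or resolved, and the names `LabelVersion` and `current_validity` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LabelVersion:
    """One change to a labeled extract (hypothetical model)."""
    user: str             # who made the change
    validity: str         # "positive" or "negative"
    created_at: datetime  # when the change was made


def current_validity(versions):
    """Return the validity of the most recent version, or None if empty.

    Assumption: the latest entry by created_at wins.
    """
    if not versions:
        return None
    return max(versions, key=lambda v: v.created_at).validity
```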
Rules#
Note
Rules apply in cases where the ground truth is of the type text+proximity.
id: Unique identifier of the rule
query: Query text of the rule
proximity: Allowed distance between the words within the query
is_sequence: Boolean
type: Type of the rule (inclusive, exclusive)
labeled_item_id: Id of the labeled item that is connected to the rule
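The source does not specify how proximity is measured or what is_sequence controls. As one plausible reading, the sketch below treats proximity as the maximum number of token positions between the first and last matched query word, and is_sequence as a requirement that the words appear in query order. Both interpretations, and the helpers `tokenize` and `matches_rule`, are assumptions for illustration and not Squirro's implementation.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def matches_rule(text, query, proximity, is_sequence):
    """Check whether all query words occur close together in text.

    Assumed semantics: the matched words must fit into a window of
    proximity + 1 tokens; with is_sequence they must also appear in
    the same order as in the query.
    """
    tokens = tokenize(text)
    words = tokenize(query)
    if not words:
        return False
    needed = Counter(words)
    for start, token in enumerate(tokens):
        if token not in needed:
            continue
        window = tokens[start:start + proximity + 1]
        if is_sequence:
            # Subsequence test: each `w in remaining` consumes the
            # iterator up to the match, which enforces query order.
            remaining = iter(window)
            if token == words[0] and all(w in remaining for w in words):
                return True
        else:
            have = Counter(window)
            if all(have[w] >= n for w, n in needed.items()):
                return True
    return False
```

Under these assumptions, a rule with the query "dog sitter", a proximity of 6, and is_sequence set to True would match "a dog and cat sitter" but not "the sitter walked the dog".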
Ground Truth in AI Studio Overview#
All ground truths of a project are listed in the Ground Truths overview screen as shown below:
Tip: A new Ground Truth can be created by clicking the plus button in the top-right corner of the screen.
Selecting the name of a Ground Truth continues labeling. The following actions can also be performed on ground truths:
Edit: Edit the configuration of a selected ground truth.
Delete: Delete an unused ground truth.
Build Model: Build a model based on the selected ground truth.
To create a ground truth, certain metadata is required, including the following:
Title: Title of the ground truth.
Description: Description of the ground truth.
Tagging Level: Level on which the extracts get tagged in the ground truth (only sentence level available by default).
Labels: List of labels that can be used for tagging in the ground truth.
Define Proximity Search Rules: Option to allow creating proximity rules in addition to tagging of the ground truth.
There are four different screens available after selecting Start Labeling:
List View
Focus View
Labeling Overview
Rules Overview
In the List view, all documents of a candidate set (see AI Studio Step 1: Candidate Sets) are listed. As an overview, you see all titles, highlights, and already-tagged extracts of the documents from the selected candidate set.
The document can be tagged and studied in further detail on the right side.
In the left panel, you can switch between different candidate sets.
If the Define Proximity Search Rules option is activated in the configuration of the ground truth, a proximity rule is defined after tagging a text extract.
In the Focus view screen, you are guided to the highlighted section in the documents of the candidate set.
The Labeling overview screen allows you to get an overview of the already-tagged text extracts.
The Rules overview screen lists all defined proximity rules if the Define Proximity Search Rules option is activated in the configuration of the ground truth.
Squirro Client Reference#
Ground Truth#
# Assumes `client` is an authenticated Squirro client instance and that
# PROJECT_ID, CANDIDATE_SET_ID and GROUNDTRUTH_ID are defined.

# Create a ground truth
config = {
    "type": "text",
    "tagging_level": "sentence",
    "label": ["dog", "no dog"],
    "description": "In this ground truth we select sentences that are dog or not dog related.",
    "candidateset_ids": [CANDIDATE_SET_ID],
}
client.new_groundtruth(PROJECT_ID, 'Dog Ground Truth', config)

# Modify a ground truth
config = {
    "type": "text",
    # the tagging_level cannot be changed
    "label": ["dog", "no dog"],
    "description": "In this ground truth we select sentences that are dog or not dog related.",
    "candidateset_ids": [CANDIDATE_SET_ID],
}
client.modify_groundtruth(PROJECT_ID, GROUNDTRUTH_ID, name='Dog Ground Truth (modified name)', config=config)

# Delete a ground truth
client.delete_groundtruth(PROJECT_ID, GROUNDTRUTH_ID)

# List all ground truths of a project
client.get_groundtruths(PROJECT_ID)

# Get a single ground truth
client.get_groundtruth(PROJECT_ID, GROUNDTRUTH_ID)
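Because the tagging_level cannot be changed, a configuration that was used to create a ground truth may need that key removed before it is reused for a modification. The sketch below shows one way to do this. The helpers `config_for_modify` and `rename_groundtruth` are hypothetical; only the `modify_groundtruth` call, with the arguments shown above, comes from this page.

```python
def config_for_modify(config):
    """Return a copy of config without the immutable tagging_level key."""
    return {key: value for key, value in config.items() if key != "tagging_level"}


def rename_groundtruth(client, project_id, groundtruth_id, new_name, config):
    """Rename a ground truth, reusing a creation config (hypothetical helper).

    `client` is any object exposing modify_groundtruth with the call
    shape shown in the reference above.
    """
    client.modify_groundtruth(
        project_id,
        groundtruth_id,
        name=new_name,
        config=config_for_modify(config),
    )
```

The original config dict is left untouched, so it can still be used to create further ground truths.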
Labeled Extract#
# Create a labeled extract
label = {
    "item_id": SQUIRRO_ITEM_ID,
    "extract": "The dog (Canis familiaris when considered a distinct species or Canis lupus familiaris when considered a subspecies of the wolf) is a domesticated carnivore of the family Canidae.",
    "label": "dog",
    "language": "en",
    "keywords": {},
    "candidateset_id": CANDIDATE_SET_ID,
}
client.new_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, label)

# Change the validity of a labeled extract ('positive' or 'negative')
client.modify_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID, 'positive')

# Delete a labeled extract
client.delete_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID)

# List all labeled extracts of a ground truth
client.get_groundtruth_labels(PROJECT_ID, GROUNDTRUTH_ID)

# Get a single labeled extract
client.get_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID)
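When many extracts are labeled programmatically, it can help to build the payloads in one place and to skip labels that the ground truth does not define. The sketch below assumes nothing about return values of the client. The helpers `make_label_payload` and `upload_labels` are hypothetical; only the `new_groundtruth_label` call and the payload keys come from this page.

```python
def make_label_payload(item_id, extract, label, candidateset_id,
                       language="en", keywords=None):
    """Build a labeled-extract payload with the keys shown above."""
    return {
        "item_id": item_id,
        "extract": extract,
        "label": label,
        "language": language,
        "keywords": keywords if keywords is not None else {},
        "candidateset_id": candidateset_id,
    }


def upload_labels(client, project_id, groundtruth_id, payloads, allowed_labels):
    """Send each payload whose label is allowed; return the skipped ones.

    `allowed_labels` is expected to mirror the `label` list in the
    ground truth configuration.
    """
    skipped = []
    for payload in payloads:
        if payload["label"] not in allowed_labels:
            skipped.append(payload)
            continue
        client.new_groundtruth_label(project_id, groundtruth_id, payload)
    return skipped
```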
Rule#
# Create a rule (only for ground truths of the type text+proximity)
rule = {
    "query": "dog sitter",
    "proximity": 6,
    "is_sequence": True,
    "type": "inclusive",
    "labeled_item_id": LABELED_EXTRACT_ID,
}
client.new_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, rule)

# Modify a rule
client.modify_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, RULE_ID, rule)

# Delete a rule
client.delete_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, RULE_ID)

# Get a single rule
client.get_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, RULE_ID)