AI Studio Step 2: Ground Truth#

The Ground Truth is a set of text extracts with associated labels.

It is the basis for machine learning in AI Studio Step 3: Models, where a model learns its behavior from the text extracts and labels in a ground truth set.

Structure#

A ground truth consists of up to three components, each with several properties:

  • Metadata

  • Labeled extracts

  • Rules

Metadata#

  • id: Unique identifier of the ground truth

  • name: Name of the ground truth

  • description: Description of the ground truth

  • type: Type of the ground truth (accepted values: text, text+proximity)

  • tagging_level: Level at which the extracts are tagged (accepted values: sentence, document)

  • label: List of labels which can be used for tagging

  • candidateset_ids: List of Candidate Set ids used to generate the ground truth
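As a rough illustration of these properties, the following sketch checks a configuration dictionary locally before it is sent to the server. The helper name check_groundtruth_config and the requirement of a non-empty label list are assumptions made for this example and are not part of the Squirro API. The accepted type values are taken from the list above. The tagging_level is deliberately left unchecked here, since the exact spelling the server expects should be confirmed against your installation.

```python
# Hypothetical helper, not part of the Squirro client: a local sanity
# check for some of the metadata properties listed above.
ACCEPTED_TYPES = {"text", "text+proximity"}


def check_groundtruth_config(config):
    """Return a list of human-readable problems found in `config`."""
    problems = []
    if config.get("type") not in ACCEPTED_TYPES:
        problems.append("type must be one of: " + ", ".join(sorted(ACCEPTED_TYPES)))
    labels = config.get("label")
    # Assumption: a ground truth without labels is not useful for tagging.
    if not isinstance(labels, list) or not labels:
        problems.append("label must be a non-empty list")
    # candidateset_ids is treated as optional, but must be a list if present.
    if not isinstance(config.get("candidateset_ids", []), list):
        problems.append("candidateset_ids must be a list")
    return problems


good = {"type": "text", "label": ["dog", "no dog"], "candidateset_ids": ["cs-1"]}
bad = {"type": "image", "label": []}
print(check_groundtruth_config(good))
print(check_groundtruth_config(bad))
```

A check like this only catches obvious mistakes early. The server remains the authority on what it accepts.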

Labeled Extracts#

  • id: Unique identifier of the labeled extract

  • extract: The text section that was tagged (e.g. a sentence if tagging_level is set to ‘sentence’)

  • label: Label which classifies the extract

  • language: Language of the Squirro Item

  • keywords: Additional keywords from the Squirro Item

  • item_id: Squirro item id in which the extract was found

  • candidateset_id: Id of the Candidate Set that helped to find the extract

Additionally, a temporal versioning component is in place, with the following fields:

  • user: The user who made the change

  • validity: Indicates whether a label is positive (considered true) or negative (considered false)

  • created_at: Time of creation
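The versioning fields can be read as a change history per labeled extract. The sketch below assumes that the most recent entry by created_at decides the current validity. That interpretation, the LabelVersion class, and the representation of validity as the strings "positive" and "negative" are assumptions for illustration only and may not match how the server stores versions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LabelVersion:
    """Hypothetical record mirroring the versioning fields above."""
    user: str
    validity: str         # assumed: "positive" (true) or "negative" (false)
    created_at: datetime


def current_version(history):
    """Pick the newest entry of a change history (assumed semantics)."""
    return max(history, key=lambda version: version.created_at)


history = [
    LabelVersion("alice", "positive", datetime(2023, 1, 10, 9, 0)),
    LabelVersion("bob", "negative", datetime(2023, 1, 12, 14, 30)),
]
latest = current_version(history)
print(latest.user, latest.validity)
```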

Rules#

Note

Rules apply in cases where the ground truth is of the type text+proximity.

  • id: Unique identifier of the rule

  • query: Query text of the rule

  • proximity: Allowed distance between the words within the query

  • is_sequence: Boolean

  • type: Type of the rule (inclusive, exclusive)

  • labeled_item_id: Id of the labeled extract that is connected to the rule
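To make the proximity and is_sequence properties more concrete, here is a minimal sketch of one possible reading of a rule. It assumes that proximity is the maximum number of token positions between the first and last matched query word, and that is_sequence requires the words to appear in query order. Both are assumptions for illustration. The function matches_rule is hypothetical and does not reproduce how Squirro evaluates rules.

```python
import itertools
import re


def matches_rule(text, query, proximity, is_sequence):
    """Check whether all query words occur close together in `text`.

    Assumed semantics: the span between the first and last matched word
    must not exceed `proximity` token positions; with `is_sequence` the
    words must also appear in the order given in the query.
    """
    tokens = re.findall(r"\w+", text.lower())
    words = query.lower().split()
    # For every query word, collect the positions where it occurs.
    positions = [[i for i, tok in enumerate(tokens) if tok == w] for w in words]
    if any(not found for found in positions):
        return False
    # Try every combination of one position per query word.
    for combo in itertools.product(*positions):
        if len(set(combo)) != len(combo):
            continue  # the same token cannot satisfy two query words
        if is_sequence and list(combo) != sorted(combo):
            continue  # words are out of order
        if max(combo) - min(combo) <= proximity:
            return True
    return False


print(matches_rule("The dog needs a sitter today", "dog sitter", 6, True))
print(matches_rule("The sitter walked the dog", "dog sitter", 6, True))
```

The brute-force search over position combinations is fine for short queries such as "dog sitter". A production implementation would rely on the search engine's own proximity query support instead.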

Ground Truth in AI Studio Overview#

All ground truths of a project are listed in the Ground Truths overview screen as shown below:

image1

Tip: A new ground truth can be created by clicking the plus button in the top-right corner of the screen.

Selecting the name of a ground truth lets you continue labeling it. Other useful actions can also be performed on ground truths, including the following:

  • Edit: Edit the configuration of a selected ground truth.

  • Delete: Delete an unused ground truth.

  • Build Model: Build a model based on the selected ground truth.

image2

To create a ground truth, certain metadata is required, including the following:

  • Title: Title of the ground truth.

  • Description: Description of the ground truth.

  • Tagging Level: Level at which the extracts are tagged in the ground truth (only sentence level available by default).

  • Labels: List of labels that can be used for tagging in the ground truth.

  • Define Proximity Search Rules: Option to allow creating proximity rules in addition to tagging of the ground truth.

There are four different screens available after selecting Start Labeling:

  • List View

  • Focus View

  • Labeling Overview

  • Rules Overview

In the List view, all documents of a candidate set (see AI Studio Step 1: Candidate Sets) are listed. As an overview, you see all titles, highlights, and already-tagged extracts of the documents from the selected candidate set.

The document can be tagged and studied in further detail on the right side.

In the left panel, you can switch between different candidate sets.

image3

If the Define Proximity Search Rules option is activated in the configuration of the ground truth, a proximity rule is defined after tagging a text extract.

image4

In the Focus view screen, you are guided to the highlighted section in the documents of the candidate set.

image5

The Labeling overview screen allows you to get an overview of the already-tagged text extracts.

image6

The Rules overview screen lists all defined proximity rules if the Define Proximity Search Rules option is activated in the configuration of the ground truth.

image7

Share Labeling#

Ground truth labeling is a shared activity that should include as many business domain experts as possible. This can clash with the Squirro permissions model, which by default only allows project administrators to access the AI Studio.

To solve this, a ground truth labeling view can be shared with all users.

Setup#

If you have item-level access control on your project (usually via Query Templates), then do not enable this feature yet. The shared view retrieves items in administrator mode and could thus return items the user should not have access to.

To share a ground truth labeling view with users, follow this process:

  1. Enable the feature in the Configuration Service: Go to the Server space, navigate to Configuration, then find and enable the option frontend.userapp.share-gt-labeling-non-admin.

Note: This step only needs to be done once on the server. The change takes up to a minute to take effect and requires a page reload to apply.

  2. Navigate to the AI Studio and open the training view of the ground truth you want to share.

    image11

  3. Click Share Labeling. In the dialog that opens, choose which sentences you would like to present to your users:

    • Show highlighted sentences only: present the user with sentences that have been highlighted from the candidate set query.

    • Show all sentences: show all sentences from items that match the candidate set query.

    image12

  4. Copy the resulting share link to the clipboard and share it with your users.

Usage#

When following the shared link, your users will be able to access the separate labeling view. They will still need at least Reader access to the Squirro project.

Reference: For more information on Squirro permissions, see Squirro Roles & Permissions.

image13

On the labeling view, users are presented with a reduced ground truth view showing the Focus View.

By default, all selected sentences are presented to the user (as per the selection made in the share dialog). The user can limit the view to show only the labeled or only the unlabeled sentences by using the filter button next to the search bar.

The user can also navigate between the different candidate sets in the menu on the left side and can use the search bar to reduce the presented sentences.

Squirro Client Reference#

Ground Truth#

# `client` is an authenticated SquirroClient; PROJECT_ID, CANDIDATE_SET_ID,
# and GROUNDTRUTH_ID are placeholders for your own ids.

# Create a ground truth
config = {
    "type": "text",
    "tagging_level": "sentences",
    "label": ["dog", "no dog"],
    "description": "In this ground truth we select sentences that are dog or not dog related.",
    "candidateset_ids": [CANDIDATE_SET_ID],
}

client.new_groundtruth(PROJECT_ID, "Dog Ground Truth", config)

# Modify a ground truth
config = {
    "type": "text",
    # the tagging_level cannot be changed
    "label": ["dog", "no dog"],
    "description": "In this ground truth we select sentences that are dog or not dog related.",
    "candidateset_ids": [CANDIDATE_SET_ID],
}

client.modify_groundtruth(PROJECT_ID, GROUNDTRUTH_ID, name="Dog Ground Truth (modified name)", config=config)

# Delete a ground truth
client.delete_groundtruth(PROJECT_ID, GROUNDTRUTH_ID)

# List all ground truths of a project
client.get_groundtruths(PROJECT_ID)

# Get a single ground truth
client.get_groundtruth(PROJECT_ID, GROUNDTRUTH_ID)

Labeled Extract#

# Create a labeled extract
label = {
    "item_id": SQUIRRO_ITEM_ID,
    "extract": "The dog (Canis familiaris when considered a distinct species or Canis lupus familiaris when considered a subspecies of the wolf) is a domesticated carnivore of the family Canidae.",
    "label": "dog",
    "language": "en",
    "keywords": {},
    "candidateset_id": CANDIDATE_SET_ID,
}

client.new_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, label)

# Modify a labeled extract (here: mark it as positive)
client.modify_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID, "positive")

# Delete a labeled extract
client.delete_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID)

# List all labeled extracts of a ground truth
client.get_groundtruth_labels(PROJECT_ID, GROUNDTRUTH_ID)

# Get a single labeled extract
client.get_groundtruth_label(PROJECT_ID, GROUNDTRUTH_ID, LABELED_EXTRACT_ID)
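When several sentences from the same item need the same label, the calls above can be wrapped in a small loop. The following is a sketch only: label_sentences and RecordingClient are hypothetical names introduced here, and the only client method used is new_groundtruth_label with the argument order shown above. The stand-in client lets the example run without a Squirro server; in practice you would pass your authenticated client instead.

```python
def label_sentences(client, project_id, groundtruth_id, item_id,
                    candidateset_id, sentences, label, language="en"):
    """Create one labeled extract per sentence and return the payloads sent."""
    sent = []
    for sentence in sentences:
        payload = {
            "item_id": item_id,
            "extract": sentence,
            "label": label,
            "language": language,
            "keywords": {},
            "candidateset_id": candidateset_id,
        }
        client.new_groundtruth_label(project_id, groundtruth_id, payload)
        sent.append(payload)
    return sent


class RecordingClient:
    """Stand-in for a real client; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def new_groundtruth_label(self, project_id, groundtruth_id, label):
        self.calls.append((project_id, groundtruth_id, label))


stub = RecordingClient()
payloads = label_sentences(stub, "project-1", "gt-1", "item-1", "cs-1",
                           ["Dogs bark.", "Dogs fetch."], "dog")
print(len(stub.calls))
```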

Rule#

# Create a rule
rule = {
    "query": "dog sitter",
    "proximity": 6,
    "is_sequence": True,
    "type": "inclusive",
    "labeled_item_id": LABELED_EXTRACT_ID,
}

client.new_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, rule)

# Modify a rule
client.modify_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, RULE_ID, rule)

# Delete a rule
client.delete_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, RULE_ID)

# Get a single rule
client.get_groundtruth_rule(PROJECT_ID, GROUNDTRUTH_ID, RULE_ID)