How to Create a Sentence-Level Classification Model in AI Studio#
Profiles: Data Scientist, Model Creator, Search Engineer
This page describes how to create a sentence-level classification model in AI Studio.
It is geared towards data scientists, model creators, and search engineers working within a project that has appropriate data sources connected.
Reference: To learn how to create a document-level classification model, see How to Create a Document-Level Classification Model in AI Studio.
Prerequisites#
To create a sentence-level classification model, you must have the following:
An existing Squirro project you are logged into.
Either a Member or Administrator role in the project. (See Squirro Roles & Permissions for more information.)
Relevant data connected to the project. (See How to Load Data Using the UI for more information.)
Once you have these prerequisites, you are ready to get started with AI Studio.
Example Project#
This example sentence-level classification model is built using the Cognitive Search: Food Safety application available from start.squirro.com.
Using this application, you can follow this guide to build a binary sentence-level classification model.
Reference: To learn how to create a copy of this application, see How to Install A Squirro Application.
Note
This example uses sentence-level, binary classification.
Step 1: Create a New Ground Truth#
To create a new ground truth in AI Studio, follow the steps below:
1: Create and Configure a New Ground Truth#
Note
The first time you launch AI Studio within a project, a welcome screen prompts you to create a new ground truth.
To create a new ground truth, follow the steps below:
Open your Squirro project.
Navigate to the Setup space.
Click the AI Studio tab.
Click Launch AI Studio.
On the Ground Truths screen, click Create a New Ground Truth.
The last step above launches a modal window to configure the ground truth with the following settings:
Title: Title of the ground truth as it will appear in AI Studio.
Description: A description of the ground truth.
Tagging Level: Level on which the extracts get tagged in the ground truth. Select Sentence Level.
Sentence Splitting: The sentence splitting method to use for the ground truth. As the project data is predominantly composed of news coming from RSS feeds, select Machine Readable News (MRN).
Standard - The Standard option provides a general-purpose sentence splitter suitable for a wide range of text types. It offers balanced performance for common text processing tasks, making it a versatile choice for use cases that fall outside the specialized categories.
SFDC - Optimized for Salesforce data processing, the SFDC option ensures alignment with the structure and formatting of the CRM entries. It helps more accurately parse and analyze customer interactions, sales records, and other business-related text.
MRN - Designed to handle multiple news formats and maintain the contextual integrity of the information, the MRN option ensures precise sentence segmentation for accurate analysis and interpretation by machine learning models.
Labels: Create at least two labels to start. In this example, enter poultry and non-poultry to classify sentences by whether they contain references to poultry or not.
Caution
You cannot change your labels after creating the Ground Truth.
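To make the Sentence Level and Sentence Splitting settings more concrete, the following minimal sketch shows what splitting a document into sentence-sized extracts looks like. It is an illustration only: the `split_sentences` function and its regular expression are hypothetical and are not the Standard, SFDC, or MRN splitters used by AI Studio, whose implementations are not described on this page.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter (hypothetical, not a Squirro splitter).

    Breaks after '.', '!' or '?' when the next non-space character
    is an uppercase ASCII letter.
    """
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [part for part in parts if part]


doc = (
    "Inspectors recalled frozen chicken on Monday. "
    "No illnesses were reported. Is turkey affected?"
)

# Each resulting sentence is one unit that would receive exactly one label.
for index, sentence in enumerate(split_sentences(doc)):
    print(index, sentence)
```

A rule this simple breaks on abbreviations and unusual formatting, which is one reason a splitter matched to the data type matters for sentence-level tagging.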
When you are finished, click Create Ground Truth, as shown in the example screenshot below:
2: Create a Candidate Set#
A candidate set is a set of text extracts that you use to generate your ground truth. Candidate sets help you identify quality text extracts for your ground truth in a large data universe.
All candidate sets within a project are listed in the Candidate Sets section of the Ground Truths page.
Note
By default, Squirro will create a candidate set for you using the name of your ground truth. In this example model build, you can simply edit the default candidate set to start (skip to Step 3 below).
To create a candidate set, follow the steps below:
On the Ground Truths page, click Create a New Candidate Set, as shown in the screenshot below:
Choose between creating a new candidate set, or copying from an existing set, and give it a name.
Click Edit Query.
Enter a search query to define the candidate set. You can use Query Syntax to create a more complex query. For this example, to identify poultry-related documents, use the query poultry OR chicken OR turkey OR geese OR quail OR "game bird" OR hen OR rooster OR fowl, as shown in the example screenshot below:
For a binary classification model like this example, you can create an anti set candidate set, but it is not required when performing sentence-level classification. With sentence-level classifications, you can automatically generate your anti set using proximity rules.
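The candidate set query in this example is a plain OR over keywords, with the multi-word phrase wrapped in double quotes. If you maintain the keyword list elsewhere, a small helper can assemble the same string to paste into the Edit Query field. This is a sketch; `build_or_query` and `POULTRY_TERMS` are hypothetical names, not part of any Squirro tooling.

```python
def build_or_query(terms):
    """Join keywords with OR, quoting any term that contains a space."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return " OR ".join(quoted)


# Keywords taken from the example query on this page.
POULTRY_TERMS = [
    "poultry", "chicken", "turkey", "geese", "quail",
    "game bird", "hen", "rooster", "fowl",
]

query = build_or_query(POULTRY_TERMS)
print(query)
```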
Click Proximity Rules.
Click Create a Proximity Rule.
Enter the query poultry, as an Include rule, for the label poultry, then click Save, as shown in the example screenshot below:
Create two additional rules for chicken and turkey.
Click Bulk Label to open the Bulk Label modal window.
Select the label poultry and the anti set non-poultry, then click Label Data, as shown in the example screenshot below:
Wait for bulk labeling to finish, then click Build Model.
Caution
If your project is still in the midst of performing bulk labeling and you click Build Model, the model will build with whatever labels have been processed at the time you clicked the button. For best results, allow bulk labeling to finish.
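Conceptually, the Include rules and the anti set divide the candidate sentences into two groups: sentences that match a rule receive the chosen label, and the rest receive the anti set label. The sketch below mimics that outcome with a simple whole-word keyword match. It is a simplified assumption, not AI Studio's actual proximity-rule logic, and `bulk_label` is a hypothetical function.

```python
import re

# Terms from the three Include rules created in this example.
INCLUDE_TERMS = ["poultry", "chicken", "turkey"]


def bulk_label(sentences, include_terms, label, anti_label):
    """Assign `label` to sentences containing any include term as a
    whole word (case-insensitive), and `anti_label` to all others."""
    alternatives = "|".join(re.escape(term) for term in include_terms)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return [
        (sentence, label if pattern.search(sentence) else anti_label)
        for sentence in sentences
    ]


sentences = [
    "The recall covers frozen chicken nuggets.",
    "Officials inspected the dairy plant.",
    "Turkey products were also tested.",
]

labeled = bulk_label(sentences, INCLUDE_TERMS, "poultry", "non-poultry")
for sentence, tag in labeled:
    print(f"{tag:12} {sentence}")
```

Keyword rules of this kind are coarse. For instance, "Turkey" the country would also match, so it is worth reviewing a sample of the bulk-labeled sentences before building the model.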
Step 2: Build Model#
With your documents labeled, you are now ready to build your model.
To build your model, follow the steps below:
Click the Build Model button.
Enter a name and description for the model.
Select a template from the list. In this example, select AutoML - Fast, as shown in the example screenshot below:
Remove any labels you do not want to be classified or shown in the model. (Leave as is for the example.)
Click Create Model.
Your model will now begin building. This process can take several minutes to complete.
Note: While the model is building, the Accuracy column of the Models Overview tab displays Training.
Once your model has finished building, you can view its validation metrics by clicking its accuracy score on the Models Overview tab.
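This page does not list which validation metrics AI Studio reports beyond the accuracy score. As general background for reading a binary classifier's validation results, the sketch below computes accuracy, precision, and recall from true and predicted labels. The `binary_metrics` function and the sample labels are hypothetical and do not come from the example project.

```python
def binary_metrics(y_true, y_pred, positive="poultry"):
    """Compute accuracy, precision, and recall for one positive class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when a class is never predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for five validation sentences.
y_true = ["poultry", "poultry", "non-poultry", "non-poultry", "poultry"]
y_pred = ["poultry", "non-poultry", "non-poultry", "non-poultry", "poultry"]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can be misleading when one label is much more common than the other, so precision and recall for the poultry label are useful to check alongside it.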
For this example, you should see something that looks like the following:
Step 3: Publish Model#
Once you are satisfied with your model, you can publish it to your project by clicking the Publish icon next to the three-dot menu in the Models Overview tab or clicking the Publish button on the validation screen for the model.
Reference: Learn more about the final AI Studio Published step.