KEE Studio Plugin Tutorial

This tutorial walks you through the steps required to create a Known Entity Extraction (KEE) with the studio plugin in the UI. It assumes that you are familiar with loading data into a Squirro project.

The company names in the NASDAQ company list are the structured data (known entities) that we want to match (extract) in unstructured news data. You can use the provided sample items from the New York Times Business RSS feed or load any other (news) items to which you want to apply the KEE.
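At its core, known entity extraction looks up the values of a structured list inside free text and tags the text with the IDs of the entities it finds. The following Python sketch illustrates only that basic idea with a plain exact, case-insensitive match. It is not Squirro's KEE implementation, and the two rows are illustrative examples shaped like the company list (Symbol as the ID, Name as the text to match), not data read from the actual file.

```python
import re

# Illustrative rows shaped like the NASDAQ company list (not loaded from the real file).
KNOWN_ENTITIES = [
    {"Symbol": "AAPL", "Name": "Apple Inc."},
    {"Symbol": "MSFT", "Name": "Microsoft Corporation"},
]


def find_known_entities(text, entities, id_field="Symbol", match_field="Name"):
    """Return the IDs of entities whose match_field value appears verbatim (ignoring case) in text."""
    found = []
    for row in entities:
        # Lookarounds instead of \b so that names ending in punctuation ("Inc.") still match.
        pattern = r"(?<!\w)" + re.escape(row[match_field]) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(row[id_field])
    return found


headline = "Shares of Apple Inc. rose while Microsoft Corporation fell."
print(find_known_entities(headline, KNOWN_ENTITIES))  # ['AAPL', 'MSFT']
print(find_known_entities("Apple unveils a new phone.", KNOWN_ENTITIES))  # []
```

The second headline mentions Apple without its legal suffix and is therefore missed by this naive matcher. Real news text varies in ways like this, which, as we understand them, is what settings such as fuzzy matching and the company suffix list in the KEE configuration are meant to address.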

Download:

Loading Data

If you have not set up a Squirro project yet, now is a good time to do so and to load the data. You can, for example, use the JSON data loader plugin to load the provided sample data: navigate to the DATA tab in the Squirro Setup space, click the plus icon, and select the JSON plugin under Data Import. Then upload the file and follow the steps in the data loader plugin.

Define KEE

In the Squirro Setup space, select the AI STUDIO tab and click Known Entity Extraction. Add a new KEE by clicking the plus icon in the top right corner.

../../_images/kee-selection_kee_studio_plugin-tutorial1.png

A form for specifying the configuration appears:

../../_images/definition_kee_studio_plugin-tutorial.png

Enter the following data as a new Known Entity Extraction configuration:

Configuration                 Value
Name                          NASDAQ KEE
KEE data                      Upload the file nasdaq_companylist.xlsx
ID field                      Symbol
Matching fields               Name
Keywords to assign            industry
                              Name -> company
                              Sector -> sector
Minimum score for matches     0.8
Enable fuzzy matching         No
Enable company suffix list    Yes
Enable ngram database         Yes
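The Keywords to assign setting lists columns of the KEE data. A line such as Name -> company assigns the value of the Name column under the keyword company, and a bare line such as industry presumably keeps the column name as the keyword name. The Python sketch below mimics that reading for a single matched row so you can see which keywords to expect. It is our interpretation of the notation, not Squirro code, and the row values are illustrative.

```python
def parse_keyword_spec(lines):
    """Turn lines such as 'industry' or 'Name -> company' into (column, keyword) pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if "->" in line:
            column, keyword = (part.strip() for part in line.split("->", 1))
        else:
            # Assumption: without an arrow, the keyword keeps the column name.
            column = keyword = line
        pairs.append((column, keyword))
    return pairs


def keywords_for_match(row, pairs):
    """Collect the keywords an item would receive for one matched entity row."""
    keywords = {}
    for column, keyword in pairs:
        value = row.get(column)
        if value:  # skip empty or missing cells
            keywords.setdefault(keyword, []).append(value)
    return keywords


SPEC = ["industry", "Name -> company", "Sector -> sector"]
# Illustrative row: column names follow the configuration above, values are examples.
row = {
    "Symbol": "AAPL",
    "Name": "Apple Inc.",
    "Sector": "Technology",
    "industry": "Computer Manufacturing",
}
print(keywords_for_match(row, parse_keyword_spec(SPEC)))
# {'industry': ['Computer Manufacturing'], 'company': ['Apple Inc.'], 'sector': ['Technology']}
```

This renaming is why the search at the end of this tutorial uses the keyword name company rather than the column name Name.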

Click the SAVE button. This creates a new enrichment (pipelet) named NASDAQ KEE, which you can now add to the pipeline.

Add KEE to the Pipeline

Navigate to the PIPELINE tab and edit the pipeline to which you want to add the KEE step. To find the NASDAQ KEE step quickly, search for it in the pipeline editor. Drag the NASDAQ KEE step to the Relate field of the pipeline:

../../_images/pipeline_kee_studio_plugin-tutorial.png

Save the pipeline. The KEE step is now part of the pipeline and is applied to every newly indexed item that runs through it. Items that are already in the index are not processed automatically; to apply the KEE step to them, you have to rerun the step for those items.

Rerun

In the pipeline editor, edit the pipeline containing the KEE step. Hover over the KEE step, click the menu icon (three dots) and select Rerun from the dropdown menu. Check the Run linked steps option and click Rerun. See Pipeline Reruns for information on rerunning an individual pipeline step.

../../_images/rerun_kee_studio_plugin-tutorial.png

Inspect the Keywords

To check the keywords assigned by the KEE, open the EXPLORE tab and search for company:*, which returns the items tagged with the company keyword:

../../_images/check-items_kee_studio_plugin-tutorial.png
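If you also process the tagged items outside the UI, a quick tally shows which companies were tagged most often. The sketch below assumes that each item is a Python dict whose keywords field maps keyword names to lists of values, which is our understanding of Squirro's item format. How you obtain the items (for example through the Squirro API) is left out, and the sample items are hypothetical.

```python
from collections import Counter


def count_keyword_values(items, keyword="company"):
    """Count how many items carry each value of the given keyword."""
    counts = Counter()
    for item in items:
        values = item.get("keywords", {}).get(keyword, [])
        # set() so an item listing the same value twice is counted once.
        counts.update(set(values))
    return counts


# Hypothetical items; only the "keywords" field matters for this sketch.
items = [
    {"title": "Item 1", "keywords": {"company": ["Apple Inc."], "sector": ["Technology"]}},
    {"title": "Item 2", "keywords": {"company": ["Apple Inc.", "Microsoft Corporation"]}},
    {"title": "Item 3", "keywords": {}},
]
for company, n in count_keyword_values(items).most_common():
    print(company, n)
# Apple Inc. 2
# Microsoft Corporation 1
```

Passing a different keyword name, such as sector or industry, gives the same overview for the other keywords configured in the KEE.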