Labels#

This page explains labels: fields within an item that hold structured metadata describing the item. Labels can be populated directly from the data source (such as a database record or an Excel spreadsheet) or added by an enrichment.

Terminology Note#

In the Squirro API, the term used for labels varies with context. Labels are also known as keywords or tags, and in certain areas of Squirro, particularly the codebase, they are referred to as facets. For all practical purposes, facets and labels are synonymous and can be treated equivalently.

A keyword is defined as a key/value pair assigned to an individual item. In this context, it may be referred to as a keyword within the API or as a tag in the user interface.

Using Labels#

This section shows how the different data types of labels affect their usage in queries and dashboards. Labels are typically set up within a project using a facets.json file used by the data loader.

Overview#

Once data is added to a project, labels appear on the left side of the search screen. Selecting a specific label value drills down to display only the items associated with that particular label. Additionally, labels can be visualized using a dashboard widget.

Searching Label Values#

General information on basic search syntax, as well as the different types of searches supported by Squirro, can be found on the Query Processing page.

String Labels#

Searchable String Labels

String labels which have the Searchable property set to True will have all of the values stored within that label available for full text search.

For example, if a document contains the value “United States of America” within a searchable string label, the document will be returned by a search for the term “America”, or a search for “States”.

In addition to full text search, the values in searchable string labels are also available for exact value searches (see below).

Non-Searchable String Labels

While values in non-searchable string labels are not available for full text search, searches for exact label values can be run against values in non-searchable string labels.

Searching for an exact label value is done by specifying the name of the label, then a colon, then the exact label value to search for, with no spaces in between. If the label name or the value includes spaces, it must be enclosed in quotes.

Company:Enron
Person:"John Smith"
"First Name":John

In this example, documents which include other variations of Enron, such as “Enron, Inc.”, “Enron Corp”, or any other variation will not be returned as results.
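The quoting rule above can be sketched in a few lines of Python. The helper names `quote_if_needed` and `label_query` are hypothetical and not part of any Squirro library; the sketch also assumes that names and values contain no double quotes of their own.

```python
def quote_if_needed(text: str) -> str:
    """Wrap text in double quotes when it contains whitespace."""
    return f'"{text}"' if any(ch.isspace() for ch in text) else text


def label_query(name: str, value: str) -> str:
    """Build an exact-match query: name, colon, value, with no spaces between."""
    return f"{quote_if_needed(name)}:{quote_if_needed(value)}"


print(label_query("Company", "Enron"))
print(label_query("Person", "John Smith"))
print(label_query("First Name", "John"))
```

The three calls reproduce the three example queries shown above.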

Another consequence of not being searchable is that non-searchable string labels cannot be used for sorting alphabetically based on label values.

Datetime Labels#

Similarly to string labels, you can search for a document which has a specific value within a datetime label.

When using a date string in a search, the date string must be quoted:

2016-04-04T00:00:00 --> "2016-04-04T00:00:00"
date_facet:"2016-04-04T00:00:00"

Searches for ranges of values within datetime labels can also be done by using comparison operators such as “>”, “>=”, “<”, or “<=”.

date_facet >= "2016-02-01T00:00:00"
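As a rough illustration, a query string for a datetime label could be assembled from a Python `datetime` as follows. The function name `datetime_label_query` is hypothetical, and the set of accepted operators is an assumption based on the examples on this page.

```python
from datetime import datetime

# Operators seen in the examples on this page (assumed to be the full set).
ALLOWED_OPERATORS = {":", ">", ">=", "<", "<="}


def datetime_label_query(label: str, operator: str, moment: datetime) -> str:
    """Return a query that compares a datetime label with a quoted date string."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    stamp = moment.strftime("%Y-%m-%dT%H:%M:%S")
    if operator == ":":
        # Exact match uses the colon syntax without surrounding spaces.
        return f'{label}:"{stamp}"'
    return f'{label} {operator} "{stamp}"'


print(datetime_label_query("date_facet", ":", datetime(2016, 4, 4)))
print(datetime_label_query("date_facet", ">=", datetime(2016, 2, 1)))
```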

Numeric Labels#

Exact numeric values can be searched for with the usual colon syntax:

int_facet:3
float_facet:12.73

Just like datetime labels, searches for ranges of values within numeric labels can also be done by using comparison operators such as “>”, “>=”, “<”, or “<=”.

int_facet > 2
float_facet <= 20

Using Labels in Dashboards#

Labels are the source for rich dashboard charts and filters. Each label type opens up different capabilities in the dashboards.

String Labels#

When using a string label in a dashboard, the selected visualization will most often be used to show how many documents include each possible value of that label.

Datetime Labels#

For widgets that show changes in information over time, such as line charts, datetime labels can be used in place of the creation date for the document to determine the X-axis position of a result on the line chart.

Numeric Labels#

Numeric labels can be used in dashboard widgets to either visualize the data in the numeric label directly, or visualize an aggregation of the numeric data based on common values within another label.

Visualizing Label Values Directly#

When numeric label values are visualized directly, the visualization behaves similarly to a string label, and the different values within the numeric label are shown by the widget.

In many cases, a better option is to use a pipelet that breaks the numeric label data out into ranges (<1, 1-10, 10-50, 50+, etc.).

These ranges can be stored in a separate string label, then visualized more effectively.
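A minimal sketch of such a pipelet step is shown below. The label names `Employees` and `Employees Range` are hypothetical, and the choice to place a boundary value such as 10 in the upper range (“10-50”) is an assumption, since the ranges listed above overlap at their edges.

```python
def to_range(value: float) -> str:
    """Map a numeric value to one of the string ranges <1, 1-10, 10-50, 50+."""
    if value < 1:
        return "<1"
    if value < 10:
        return "1-10"
    if value < 50:
        return "10-50"
    return "50+"


def consume(item):
    # Protect against missing keywords
    kw = item.setdefault("keywords", {})

    values = kw.get("Employees", [])
    if values:
        # Store each range once, keeping first-seen order; labels are lists.
        kw["Employees Range"] = list(dict.fromkeys(to_range(v) for v in values))
    return item


item = consume({"title": "Example", "keywords": {"Employees": [0.5, 10, 75]}})
print(item["keywords"]["Employees Range"])
```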

Aggregating Label Values#

When numeric label values are aggregated, the widget shows an aggregation (sum, average, …) of the values in the numeric label, across all of the documents which have a common value in another label.

The second label used for aggregation is most often a string label.

Both the Table widget and Line chart widget support aggregations of numeric labels based on second labels.
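To make the idea concrete, the following sketch computes the same kind of aggregation over a plain list of items in Python. It only illustrates the concept and is not how the widgets are implemented; the function name `aggregate` and the label names `Office` and `Revenue` are hypothetical.

```python
from collections import defaultdict


def aggregate(items, group_label, value_label):
    """Sum and average a numeric label, grouped by the values of a second label."""
    buckets = defaultdict(list)
    for item in items:
        kw = item.get("keywords", {})
        for group in kw.get(group_label, []):
            buckets[group].extend(kw.get(value_label, []))
    return {
        group: {"sum": sum(values), "average": sum(values) / len(values)}
        for group, values in buckets.items()
        if values
    }


items = [
    {"keywords": {"Office": ["Zurich"], "Revenue": [10, 20]}},
    {"keywords": {"Office": ["Zurich"], "Revenue": [30]}},
    {"keywords": {"Office": ["London"], "Revenue": [5]}},
]
print(aggregate(items, "Office", "Revenue"))
```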

Sorting#

Within a table widget in a dashboard, documents can be sorted based on the values that they contain in one or more labels.

To enable sorting within a given label, click on the table header row for that label.

Managing Labels#

Labels offer a range of customizations when setting them up. This page describes how to configure and manage labels using the Squirro user interface.

If labels are managed by the Squirro data loader, consult the data loader configuration documentation.

Data Modeling#

A big part of a Squirro integration is deciding how to model the data in Squirro. This is covered in the separate section Data Modeling.

Creating Labels#

There are a number of ways to create labels in a project:

  • From data: the easiest way is to simply specify a label value when uploading data.

  • Enrichments: similar to loading from data, enrichments can just set a label value where needed.

  • Manually: a label can be configured up-front, before loading data. This is mandatory for more complex labels.

The following names are reserved names and cannot be used for labels:

  • author: reserved for the item author field

  • is: reserved for state fields (read/starred)

  • language: reserved for the item language field

  • provider: reserved for the item provider field

  • sort: reserved by query language for sorting

  • source: reserved for the item source field

  • time_increment: reserved by query language for histogram time bucketing

Label names cannot contain the $-sign.
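The naming restrictions above can be checked before loading data. The sketch below is a client-side check only; the function name `label_name_problem` is hypothetical, and the exact, case-sensitive comparison against the reserved names is an assumption.

```python
RESERVED_LABEL_NAMES = frozenset(
    {"author", "is", "language", "provider", "sort", "source", "time_increment"}
)


def label_name_problem(name: str):
    """Return a description of the problem with a label name, or None if it is fine."""
    if name in RESERVED_LABEL_NAMES:
        return f"'{name}' is a reserved name"
    if "$" in name:
        return "label names cannot contain the $-sign"
    return None


for candidate in ("Company", "source", "price_$"):
    print(candidate, "->", label_name_problem(candidate))
```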

Labels cannot be removed from a project once created. Additionally, the name and type of a label cannot be changed.

Indexing Labels#

Labels can be created from any additional information about an item that is included in the source. In a variety of sources, such as database records, Excel spreadsheets, and email messages, there may be useful information in addition to the title and body of the item.

The example below shows a database record which stores an email, along with which fields can be added as labels within the item.

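| Field       | Value                                                                              | Used in the item as |
|-------------|------------------------------------------------------------------------------------|---------------------|
| title       | Congratulations!                                                                   | Title               |
| sender      | John Smith                                                                         | Added as label      |
| recipient   | David Green                                                                        | Added as label      |
| timestamp   | 2016-08-12T09:15:44                                                                | Added as label      |
| body        | Hey David, Just heard the good news! Congratulations on the promotion. Best, John | Body                |
| attachments | none                                                                               | Added as label      |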

In general, the reason for adding these additional fields as labels is to allow a user to search and find other documents which share similar attributes, such as (in this example):

  • The same sender or recipient

  • Messages sent around the same time

  • Messages which included the same attachment

Labels within an item#

Within a Squirro item, labels are stored within the field ‘keywords’. Within the keywords field, each label is represented by a key-value pair. The name of the label used in the Squirro index serves as the key, and the value is represented by a list of values which that label has.

Note: Label values are always stored as a list, even if only a single value is present.

A simple example:

{
    "title": "Squirro",
    "body": "The Insights Company",
    "keywords": {
        "Office": ["Zurich", "London", "Munich", "Barcelona", "New York"]
    }
}

In the example above, we have created an item with a single label called “Office”. This label has five values: Zurich, London, Munich, Barcelona, and New York.

Any labels that do not exist yet are automatically created. However, this only works for string labels.
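Applied to the email record from the Indexing Labels section, the same structure could be produced as sketched below. The function name `record_to_item` is hypothetical, and every label is kept as a string label here; storing the timestamp as a datetime label would require configuring that label up front.

```python
def record_to_item(record: dict) -> dict:
    """Turn a database record into an item with title, body and keywords."""
    item = {"title": record["title"], "body": record["body"], "keywords": {}}
    for field in ("sender", "recipient", "timestamp", "attachments"):
        value = record.get(field)
        if value is not None:
            # Label values are always lists, even for a single value.
            item["keywords"][field] = [value]
    return item


record = {
    "title": "Congratulations!",
    "sender": "John Smith",
    "recipient": "David Green",
    "timestamp": "2016-08-12T09:15:44",
    "body": "Hey David, Just heard the good news! "
            "Congratulations on the promotion. Best, John",
    "attachments": "none",
}
print(record_to_item(record))
```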

Creating Labels from an Enrichment#

In addition to being added from a data source, labels can be added to an item during the enrichment process by one or more Pipelets. Pipelets add or change labels by directly modifying the data within the Squirro item. For example, shown below is a code snippet from a pipelet that adds a new label “Multiple Offices” if an item lists more than one office location in the “Offices” label.

...
def consume(item):
    # Protect against missing keywords
    kw = item.setdefault('keywords', {})

    if len(kw.get('Offices', [])) > 1:
        kw['Multiple Offices'] = ['yes']
    else:
        kw['Multiple Offices'] = ['no']

    return item

When modifying labels from within a pipelet, it is important to remember that labels are stored as lists. Setting a label equal to an individual value (string, number, datetime, etc.) will not produce a valid pipelet.

For example, the following will result in errors:

# INVALID example
item['keywords']['Multiple Offices'] = 'no'
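One way to avoid this mistake is to route all label assignments through a small helper that always stores a list. The helper `set_label` below is a hypothetical sketch and not part of the pipelet API.

```python
def set_label(item: dict, name: str, value) -> dict:
    """Assign a label, wrapping a single value in a list if necessary."""
    kw = item.setdefault("keywords", {})
    if isinstance(value, (list, tuple, set)):
        kw[name] = list(value)
    else:
        # A bare string, number or datetime becomes a one-element list.
        kw[name] = [value]
    return item


item = set_label({"title": "Squirro"}, "Multiple Offices", "no")
print(item["keywords"])
```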

Configuring Labels Manually#

Manually creating a label is mandatory for any non-string data types. For some more advanced settings you should also create the labels up front.

Within a Squirro project, labels can be created and configured from the Labels page of the project. To get to the Labels page, go to the Setup space, click on Data in the top bar, then select Labels in the column at the left.

Figure: Labels overview in the Squirro interface

Deleting Labels#

In an existing project, individual labels cannot be deleted. This is due to the underlying index format, which cannot remove index fields, or change their data type, once they are allocated.

When you need to delete a label or change the data type, there are two ways of addressing this:

  1. Create a new label and hide the old one. In this case re-use the display name (which does not have to be unique) and simply hide the old label from users by using the Visible property.

  2. Reset the project. This is feasible if you can easily recover the label definitions and data, e.g. by rerunning a data loader import job.

Label Types#

A given label can store data in any one of the following formats:

| Type      | Example             | Notes |
|-----------|---------------------|-------|
| string    | “Squirro”           | The default data type for new labels. Can be used to store any sequence of characters. |
| int       | 37                  | Used for storing integer numeric values. Both int and float labels can be used for performing comparison-based searches ( value >= 10 ). Labels with int or float data types are often treated differently by widgets, and can enable new functionality such as aggregations. |
| float     | 12.955              | Used for storing floating point numeric values. |
| datetime  | 2016-08-12T11:31:50 | Date/time values. Follows Squirro’s standard Date Format. Squirro assumes this to be in UTC time zone. |
| geo_point | “47.37,8.54”        | Used to store geographic coordinates, format of field is “latitude,longitude”. |
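As an illustration of the geo_point format, the sketch below converts between a pair of coordinates and the “latitude,longitude” string. The function names are hypothetical, and the range checks are an added safeguard rather than documented Squirro behavior.

```python
def format_geo_point(latitude: float, longitude: float) -> str:
    """Format coordinates as the string "latitude,longitude"."""
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    return f"{latitude},{longitude}"


def parse_geo_point(value: str) -> tuple:
    """Split a "latitude,longitude" string back into two floats."""
    latitude, longitude = value.split(",")
    return float(latitude), float(longitude)


point = format_geo_point(47.37, 8.54)
print(point)
print(parse_geo_point(point))
```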

Changing the Label Type#

The data type of a label cannot be changed after the label is created, for the same reason that labels cannot be deleted. See Deleting Labels above for how to work around this.

Label Properties#

Visible#

Controls whether the label appears in the search screen. If unchecked, the label is also hidden from all the widget options.

Display Name#

A label can have both a technical name, used in the index, and a display name, which often appears as a more user-friendly version of the technical name. The system uses the display name on the search screen and throughout the user interface.

Technical names must not include spaces or dollar signs ($). For labels with multiple words, replace spaces with underscores.

For example, a label with the technical name phone_number can have the display name Phone Number.

Searchable#

Making a label searchable enables full text search of the label values.

For example, if an item has the United States of America value in a searchable label, the item would be returned by a search for America, even if the term America is not present in the body of the Squirro item.

Typeahead#

The typeahead setting for a label determines whether or not the values within that label are available for typeahead completion within the Squirro search bar. If the typeahead setting is enabled, both the name of the label itself and the values stored within the label are shown as options within the search bar.

Currently, typeahead is only supported for string labels. Enabling typeahead requires that the label is also searchable.

Analyzed#

A label that is not analyzed:

  • can be used for a match.

  • can be used for aggregations.

  • cannot be used for sorting.

  • can be used for keyword count.

Label Value Formatting#

It is possible to format the values of numerical and date labels for display in dashboard widgets. To do so, define the format on the label formatting screen in the Labels management section of Squirro.

Numerical Label formatting (int, float)#

The string defined in the ‘Format’ section is used to display the label values in dashboards. Any string can be used for that purpose, and a preview of the formatting is visible to the right of the format input field.

Examples of the formatting strings for a float number of 123456.789:

  • PRE%.dSUFF resolves to PRE123.456,79SUFF

  • PRE%.2dSUFF resolves to PRE123.456,79SUFF

  • PRE%,2dSUFF resolves to PRE123,456.79SUFF

  • PRE%’dSUFF resolves to PRE123’456.789SUFF

  • PRE%’,dSUFF resolves to PRE123’456,789SUFF

  • PRE%’,2dSUFF resolves to PRE123’456,79SUFF

For an integer number of 123456:

  • PRE%dSUFF resolves to PRE123456SUFF

  • PRE%.dSUFF resolves to PRE123.456SUFF

  • PRE%,dSUFF resolves to PRE123,456SUFF

  • PRE%’dSUFF resolves to PRE123’456SUFF

“Date” Type Label formatting#

  • To format label values containing dates (and times), Squirro offers three formatting options, defined in the Labels configuration screen.

  • Each option is accompanied by a preview of the resulting formatted value.

  • Details on the format syntax can be found in the MomentJS library documentation.

Labels and the Dataloader Command Line Tool#

Label functionality#

Label creation on Squirro Server#

When creating or updating labels in Squirro, the following attributes are taken into account (and subsequently added/updated):

  • name

  • group_name

  • display_name

  • data_type

  • visible

  • searchable

  • format_instr

The attributes used for formatting of the item:

  • default_value

  • data_type

  • input_format_string

  • delimiter

Attribute used for skipping columns:

  • import

Attributes used for access control:

  • auth

  • auth_value_mandatory

  • delimiter

Pivotal attributes:

  • pivotal_group

  • delimiter

Example#

Below is a basic example showing the labels’ configuration file and associated mapping between the source field and the label names.

In this scenario, attributes are added to the file as we describe more and more functionality.

The name attribute is what identifies a label in Squirro, not the key of each label dictionary. Only if the name attribute is missing is the label identified by the dictionary key.

Note: The key of each label dictionary is the field name from the data source.

config/sample_facets.json

{
    "InteractionSubject": {
        "name": "Interaction Subject",
        "display_name": "Subject",
        "group_name": "Interactions",
        "visible": true,
        "searchable": true
    },
    "InteractionType": {
        "name": "Type of Interaction",
        "display_name": "Type",
        "group_name": "Interactions",
        "visible": true,
        "searchable": true
    },
    "Date": {
        "name": "Date of creation",
        "data_type": "datetime",
        "input_format_string": "%Y-%m-%dT%H:%M:%S",
        "visible": true
    },
    "Notes": {
        "name": "Notes",
        "visible": false,
        "searchable": true
    },
    "InternalAttendees": {
        "name": "Attendees",
        "group_name": "Interactions",
        "visible": true
    },
    "NoAtendees": {
        "name": "Number of Attendees",
        "data_type": "int",
        "visible": true,
        "searchable": false
    }
}
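Before running the data loader, it can help to check that the labels file parses as JSON and to see which label name each source field resolves to. The sketch below applies the naming rule described above (the name attribute wins, otherwise the dictionary key is used); the function name `label_names_by_source_field` is hypothetical and the inline JSON is a shortened stand-in for the real file.

```python
import json

# Shortened stand-in for config/sample_facets.json.
FACETS_JSON = """
{
    "InteractionSubject": {"name": "Interaction Subject", "display_name": "Subject"},
    "Notes": {"visible": false, "searchable": true}
}
"""


def label_names_by_source_field(facets: dict) -> dict:
    """Map each source field to its label name (name attribute, else the key)."""
    return {field: config.get("name", field) for field, config in facets.items()}


# json.loads raises json.JSONDecodeError on syntax problems such as a
# missing or trailing comma, so a successful parse is a basic sanity check.
facets = json.loads(FACETS_JSON)
names = label_names_by_source_field(facets)
print(names)
```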

And the command line call:

squirro_data_load -v ^
    --token %token% ^
    --cluster %cluster% ^
    --project-id %project_id% ^
    --source-name csv_interactions ^
    --source-type csv ^
    --map-title InteractionSubject ^
    --source-file interaction.csv ^
    --facets-file config/sample_facets.json

Note: The lines have been wrapped with the circumflex ^ at the end of each line. On Mac and Linux you will need to use backslash \ instead.
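If the call is launched from a script, building the argument list in Python sidesteps the platform-specific line continuation characters altogether. The sketch below only assembles and prints the command, using the flags from the call above; the function name `build_data_load_command` and the placeholder values are hypothetical.

```python
import shlex


def build_data_load_command(token: str, cluster: str, project_id: str) -> list:
    """Assemble the squirro_data_load call shown above as an argument list."""
    return [
        "squirro_data_load", "-v",
        "--token", token,
        "--cluster", cluster,
        "--project-id", project_id,
        "--source-name", "csv_interactions",
        "--source-type", "csv",
        "--map-title", "InteractionSubject",
        "--source-file", "interaction.csv",
        "--facets-file", "config/sample_facets.json",
    ]


command = build_data_load_command("<token>", "<cluster>", "<project_id>")
# shlex.join renders the list as a single POSIX shell command line.
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run` once the placeholders are replaced with real values.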

You can find the loaded labels in the Data >> Labels tab.

You can also see the new labels in the Search tab. Notice that the selected facet is shown as Subject, which is the display_name attribute of the facet. It comes from the InteractionSubject field in the data source, which is the key of the facet dictionary.