Labels

This page explains labels: fields within an item that hold structured metadata describing the item.

Labels can be populated directly from the source of data (like a database record or an Excel spreadsheet), or can be added by an enrichment.

Note

Labels were previously referred to as facets in the Squirro UI. You will still see references to facets in the code, and in some places within the Squirro UI. All facets can be treated as labels.

Terminology Note

Throughout the Squirro user interface and API, labels may have different names depending on the context. They are also referred to as keywords or tags. As noted above, they are also called facets in some places, particularly within Squirro code.

When a label is assigned to an individual item, it takes the form of a key/value pair. In that context it is called a keyword in the API and a tag in the user interface.

Using Labels

This section shows how the different data types of labels affect their usage in queries and dashboards.

Labels are typically set up within a project using a facets.json file used by the data loader.

Overview

Once data is added to a project, labels can be found on the left side of the search screen. Hovering over one of the dropdowns will show the different values for that label, and the number of items which have each value. Selecting one of the label values shown performs a drilldown to only show the items which include that specific label value.

image1

Labels can also be visualized by a dashboard widget.

image2

Searching Label Values

General information on basic search syntax, as well as the different types of searches supported by Squirro, can be found on the Query Processing page.

String Labels

Searchable String Labels

String labels which have the Searchable property set to True will have all of the values stored within that label available for full text search.

For example, if a document contains the value “United States of America” within a searchable string label, the document will be returned by a search for the term “America”, or a search for “States”.

In addition to full text search, the values in searchable string labels are also available for exact value searches (see below).

Non-Searchable String Labels

Values in non-searchable string labels are not available for full text search, but they can still be matched with exact value searches.

Searching for an exact label value is done by specifying the name of the label, then a colon, then the exact label value to search for, with no spaces in between. If the label name or the value contains spaces, it must be enclosed in double quotes.

Company:Enron
Person:"John Smith"
"First Name":John

In this example, the first query only returns documents whose Company label contains exactly the value Enron. Documents that only include other variations, such as “Enron, Inc.” or “Enron Corp”, are not returned.
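The quoting rules above can be captured in a small helper that builds exact-value queries. This is a minimal sketch: exact_label_query is a hypothetical name, not part of any Squirro library, and it only implements the rules described here (no spaces around the colon, double quotes around a label name or value that contains spaces). It does not escape double quotes inside values.

```python
def _quote_if_needed(text: str) -> str:
    """Wrap text in double quotes when it contains whitespace."""
    return f'"{text}"' if any(ch.isspace() for ch in text) else text


def exact_label_query(label: str, value: str) -> str:
    """Build an exact-value query such as Company:Enron (hypothetical helper)."""
    # No spaces around the colon; names and values with spaces are quoted.
    return f"{_quote_if_needed(label)}:{_quote_if_needed(value)}"


print(exact_label_query("Company", "Enron"))      # Company:Enron
print(exact_label_query("Person", "John Smith"))  # Person:"John Smith"
print(exact_label_query("First Name", "John"))    # "First Name":John
```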

Another consequence of not being searchable is that non-searchable string labels cannot be used for sorting alphabetically based on label values.

Datetime Labels

As with string labels, you can search for documents that have a specific value in a datetime label.

When using a date string in a search, the date string must be enclosed in double quotes:

2016-04-04T00:00:00 --> "2016-04-04T00:00:00"
date_facet:"2016-04-04T00:00:00"

Ranges of values within datetime labels can be searched with the comparison operators >, >=, < and <=.

date_facet >= "2016-02-01T00:00:00"
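Because date strings must be quoted and, per the Label Types table below, Squirro assumes datetime values are in UTC, it can help to generate these clauses from Python datetime objects. The sketch below assumes the format of the examples above (2016-04-04T00:00:00); datetime_clause is a hypothetical helper, and the operator list is taken from the examples on this page.

```python
from datetime import datetime, timedelta, timezone

# Format taken from the examples on this page, e.g. "2016-04-04T00:00:00".
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"


def datetime_clause(label: str, op: str, when: datetime) -> str:
    """Build an exact (":") or range (">", ">=", "<", "<=") datetime query clause."""
    if op not in (":", ">", ">=", "<", "<="):
        raise ValueError(f"unsupported operator: {op!r}")
    if when.tzinfo is not None:
        # Aware datetimes are normalized to UTC; naive ones are taken as UTC.
        when = when.astimezone(timezone.utc)
    quoted = f'"{when.strftime(DATE_FORMAT)}"'  # date strings must be quoted
    if op == ":":
        return f"{label}:{quoted}"
    return f"{label} {op} {quoted}"


print(datetime_clause("date_facet", ":", datetime(2016, 4, 4)))
# date_facet:"2016-04-04T00:00:00"
print(datetime_clause("date_facet", ">=", datetime(2016, 2, 1)))
# date_facet >= "2016-02-01T00:00:00"
cet = timezone(timedelta(hours=1))
print(datetime_clause("date_facet", "<", datetime(2016, 3, 1, 1, 0, tzinfo=cet)))
# date_facet < "2016-03-01T00:00:00"
```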

Numeric Labels

Exact numeric values can be searched for with the usual colon syntax:

int_facet:3
float_facet:12.73

As with datetime labels, ranges of values within numeric labels can be searched with the comparison operators >, >=, < and <=.

int_facet > 2
float_facet <= 20

Using Labels in Dashboards

Labels are the source for rich dashboard charts and filters. Each label type opens up different capabilities in the dashboards.

String Labels

When using a string label in a dashboard, the selected visualization will most often be used to show how many documents include each possible value of that label.

Datetime Labels

For widgets that show changes in information over time, such as line charts, datetime labels can be used in place of the creation date for the document to determine the X-axis position of a result on the line chart.

Numeric Labels

Numeric labels can be used in dashboard widgets to either visualize the data in the numeric label directly, or visualize an aggregation of the numeric data based on common values within another label.

Visualizing Label Values Directly

When numeric label values are visualized directly, the visualization behaves similarly to a string label, and the different values within the numeric label are shown by the widget.

In many cases, a better option is to use a pipelet that breaks the numeric label data out into ranges (<1, 1-10, 10-50, 50+, etc.).

These ranges can be stored in a separate string label, then visualized more effectively.
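A pipelet along those lines might look like the following sketch. It follows the consume(item) shape of the pipelet example under Creating Labels from an Enrichment. The label names amount and amount_range are hypothetical, and the bucket boundaries are an assumption: each lower bound is inclusive, so 10 falls into 10-50.

```python
# (exclusive upper bound, range label); anything at or above 50 is "50+".
RANGES = [(1, "<1"), (10, "1-10"), (50, "10-50")]
OVERFLOW = "50+"


def bucket(value: float) -> str:
    """Map a numeric value to one of the range strings <1, 1-10, 10-50, 50+."""
    for upper, label in RANGES:
        if value < upper:
            return label
    return OVERFLOW


def consume(item):
    kw = item.setdefault("keywords", {})
    values = kw.get("amount", [])  # hypothetical numeric source label
    if values:
        # Labels are lists, so every value is bucketed; duplicate ranges are
        # dropped while keeping their first-seen order.
        kw["amount_range"] = list(dict.fromkeys(bucket(float(v)) for v in values))
    return item


item = consume({"keywords": {"amount": [0.5, 12, 75, 60]}})
print(item["keywords"]["amount_range"])  # ['<1', '10-50', '50+']
```

The resulting amount_range values are plain strings, so they can be visualized like any other string label.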

image3

image4

Aggregating Label Values

When numeric label values are aggregated, the widget shows an aggregation (sum, average, …) of the values in the numeric label across all documents that share a common value in another label.

The second label used for aggregation is most often a string label.

Both the Table widget and the Line chart widget support aggregating numeric labels by a second label.

The example below is a table widget which shows an aggregation of a float label (the $ amount of each investment), aggregated by a string label (the investment type).

image5

image6
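To make the idea of aggregating by a second label concrete, here is a purely local illustration of what such a table computes: the sum and average of a float label per value of a string label. It is not how Squirro computes aggregations internally. The label names investment_type and amount and the sample values are hypothetical, and an item with several values is assumed to contribute every value to every group it belongs to.

```python
from collections import defaultdict


def aggregate(items, group_label, value_label):
    """Return {group value: {"sum": ..., "avg": ...}} for a numeric label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for item in items:
        keywords = item.get("keywords", {})
        # Labels are lists, so an item may fall into several groups.
        for group in keywords.get(group_label, []):
            for value in keywords.get(value_label, []):
                totals[group] += float(value)
                counts[group] += 1
    return {g: {"sum": totals[g], "avg": totals[g] / counts[g]} for g in totals}


items = [
    {"keywords": {"investment_type": ["Equity"], "amount": [100.0]}},
    {"keywords": {"investment_type": ["Equity"], "amount": [50.0]}},
    {"keywords": {"investment_type": ["Bond"], "amount": [200.0]}},
]
print(aggregate(items, "investment_type", "amount"))
# {'Equity': {'sum': 150.0, 'avg': 75.0}, 'Bond': {'sum': 200.0, 'avg': 200.0}}
```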

Sorting

Within a table widget in a dashboard, documents can be sorted based on the values that they contain in one or more labels.

To enable sorting within a given label, click on the table header row for that label. A colored number will appear indicating that the label is being used for sorting.

image7

Managing Labels

Labels offer a range of customizations when setting them up. This section describes how to configure and manage labels using the Squirro user interface.

If labels are managed by the Squirro data loader, consult the data loader configuration documentation.

Data Modelling

A big part of a Squirro integration is to think about how to model the data in Squirro. This is covered in the separate section Data Modeling.

Creating Labels

There are a number of ways to create labels in a project:

  • From data: the easiest way is to simply specify a label value when uploading data.

  • Enrichments: similar to loading from data, enrichments can just set a label value where needed.

  • Manually: a label can be configured up-front, before loading data. This is mandatory for more complex labels.

The following names are reserved names and cannot be used for labels:

  • author: reserved for the item author field

  • is: reserved for state fields (read/starred)

  • language: reserved for the item language field

  • provider: reserved for the item provider field

  • smartfilter: reserved by query language for smartfilters

  • sort: reserved by query language for sorting

  • source: reserved for the item source field

  • time_increment: reserved by query language for histogram time bucketing

Label names cannot contain the $-sign.
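Before creating labels programmatically, a quick local check against these two rules can catch mistakes early. This is a sketch: label_name_problems is a hypothetical helper, and matching reserved names exactly (case-sensitive) is an assumption.

```python
# Reserved names listed above; matching is assumed to be exact (case-sensitive).
RESERVED_LABEL_NAMES = frozenset({
    "author", "is", "language", "provider",
    "smartfilter", "sort", "source", "time_increment",
})


def label_name_problems(name: str) -> list:
    """Return a list of reasons why name cannot be used as a label name."""
    problems = []
    if name in RESERVED_LABEL_NAMES:
        problems.append(f"'{name}' is a reserved name")
    if "$" in name:
        problems.append("label names cannot contain the $ sign")
    return problems


for candidate in ("sender", "source", "price$"):
    print(candidate, label_name_problems(candidate))
```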

Labels cannot be removed from a project once created. Additionally, the name and type of a label cannot be changed.

Indexing Labels

Labels can be created from any additional information about an item that is included in its source. Many sources, such as database records, Excel spreadsheets and email messages, contain useful information in addition to the title and body of the item.

The example below shows a database record which stores an email, along with which fields can be added as labels within the item.

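| Field | Example value | Stored in the item as |
|---|---|---|
| title | Congratulations! | Title |
| sender | John Smith | Label |
| recipient | David Green | Label |
| timestamp | 2016-08-12T09:15:44 | Label |
| body | Hey David, Just heard the good news! Congratulations on the promotion. Best, John | Body |
| attachments | none | Label |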

Adding these additional fields as labels allows users to search for and find other documents that share similar attributes, in this example:

  • The same sender or recipient

  • Messages sent around the same time

  • Messages which included the same attachment

Labels within an item

Within a Squirro item, labels are stored in the field ‘keywords’. Within the keywords field, each label is represented by a key-value pair: the name of the label used in the Squirro index is the key, and the value is the list of values for that label.

Note: Label values are always stored as a list, even if only a single value is present.

A simple example:

{
    "title": "Squirro",
    "body": "The Insights Company",
    "keywords": {
        "Office": ["Zurich", "London", "Munich", "Barcelona", "New York"]
    }
}

In the example above, we have created an item with a single label called “Office”. This label has five values: Zurich, London, Munich, Barcelona, and New York.

Labels that do not exist yet are created automatically. However, this only works for string labels; labels of any other data type must be created manually before loading data.
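To connect this structure to the email record shown earlier on this page, the sketch below builds such an item dictionary from that record. record_to_item is a hypothetical helper, not part of any Squirro SDK; the field names come from the example record, and treating the attachments value "none" as empty is an assumption. Because labels that do not exist yet are created as string labels, timestamp would become a string label unless a datetime label with that name is created manually first.

```python
def record_to_item(record: dict) -> dict:
    """Build a Squirro-style item dict from the example email record (hypothetical)."""
    label_fields = ("sender", "recipient", "timestamp", "attachments")
    keywords = {}
    for field in label_fields:
        value = record.get(field)
        # Assumption: missing values and the placeholder "none" mean "no value".
        if value is None or value == "" or value == "none":
            continue
        # Label values are always lists, even when there is a single value.
        keywords[field] = value if isinstance(value, list) else [value]
    return {"title": record["title"], "body": record["body"], "keywords": keywords}


email = {
    "title": "Congratulations!",
    "sender": "John Smith",
    "recipient": "David Green",
    "timestamp": "2016-08-12T09:15:44",
    "body": "Hey David, Just heard the good news! Congratulations on the promotion. Best, John",
    "attachments": "none",
}
print(record_to_item(email)["keywords"])
# {'sender': ['John Smith'], 'recipient': ['David Green'], 'timestamp': ['2016-08-12T09:15:44']}
```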

Creating Labels from an Enrichment

In addition to being added from a data source, labels can be added to an item during the enrichment process by one or more pipelets. Pipelets add or change labels by directly modifying the data within the Squirro item. For example, shown below is a code snippet from a pipelet that adds a new label “Multiple Offices”, set to yes if an item lists more than one office location in the “Offices” label and to no otherwise.

...
def consume(item):
    # Protect against missing keywords
    kw = item.setdefault('keywords', {})

    # Flag items that list more than one office location
    if len(kw.get('Offices', [])) > 1:
        kw['Multiple Offices'] = ['yes']
    else:
        kw['Multiple Offices'] = ['no']

    return item

When modifying labels from within a pipelet, remember that label values are stored as lists. Assigning a single value (string, number, datetime, etc.) directly to a label produces an invalid item.

For example, the following results in errors:

# INVALID example
item['keywords']['Multiple Offices'] = 'no'
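One way to avoid this mistake is to route label assignments through a small helper that always stores a list. set_label is a hypothetical name used for illustration, not a Squirro SDK function.

```python
def set_label(item: dict, name: str, value) -> dict:
    """Store value under item['keywords'][name], always as a list."""
    # Lists and tuples are treated as multiple values; anything else,
    # including strings, is treated as a single value and wrapped.
    values = list(value) if isinstance(value, (list, tuple)) else [value]
    item.setdefault("keywords", {})[name] = values
    return item


item = {"title": "Squirro"}
set_label(item, "Multiple Offices", "no")
set_label(item, "Office", ("Zurich", "London"))
print(item["keywords"])
# {'Multiple Offices': ['no'], 'Office': ['Zurich', 'London']}
```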

Configuring Labels Manually

Manually creating a label is mandatory for any non-string data type. Labels that use some of the more advanced settings should also be created up front.

Within a Squirro project, labels can be created and configured from the Labels page of the project. To get to the Labels page, click on “Data” in the top bar, then select “Labels” in the column on the left.

image11

From this page, labels can be both created and configured manually. To create a new label, click the blue “Add Label” button in the top right of the Labels page. There, the main properties of the new label can be set. These properties include:

  • Title - The display name of the label.

  • Name - The name of the label within the Elasticsearch index and on the API level (Permanent once created).

  • Type - The data type of the label {string, int, float, datetime} (Permanent once created).

  • Group - The group of which the label is a member.

See “Label Properties” below for a full list of configurable label properties.

Additionally, existing labels can be modified by clicking the blue “Edit” button that appears at the right when hovering over a label. As mentioned above, the Type and Name fields appear greyed out when editing a label, because these values cannot be changed for existing labels. For example, a string label cannot be turned into a datetime label; instead, create a new label or reset the project, as described in Deleting Labels below.

These capabilities are exposed in two additional ways:

  • Data Loader labels to declare the labels in the context of loading the data.

  • Labels API to programmatically manage these settings.

Deleting Labels

In an existing project, individual labels cannot be deleted. This is due to the underlying index format, which cannot remove index fields or change their data type once they are allocated.

When you need to delete a label or change its data type, there are two ways of addressing this:

  1. Create a new label and hide the old one. In this case, re-use the display name (which does not have to be unique) and hide the old label from users by using the Visible property.

  2. Reset the project. This is feasible if you can easily recover the label definitions and data, e.g. by rerunning a data loader import job.

Label Types

A given label can store data in any one of the following formats:

| Data Type | Example | Notes |
|---|---|---|
| string | “Squirro” | The default data type for new labels. Can be used to store any sequence of characters. |
| int | 37 | Used for storing integer values. |
| float | 12.955 | Used for storing floating point values. |
| datetime | 2016-08-12T11:31:50 | Date/time values. Follows Squirro’s standard Date Format. Squirro assumes this to be in UTC time zone. |
| geo_point | “47.37,8.54” | Used to store geographic coordinates, in the format “latitude,longitude”. |

Both int and float labels can be used for performing comparison-based searches (value >= 10). Labels with int or float data types are often treated differently by widgets, and can enable new functionality such as aggregations.
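When label values come from text sources such as CSV files, each raw string has to match the label's data type. The sketch below is a hypothetical helper, not the data loader's own logic, that parses a raw string according to the data types above. The datetime pattern mirrors the example 2016-08-12T11:31:50, and the geo_point range check (latitude within ±90, longitude within ±180) is an added assumption based on how coordinates work.

```python
from datetime import datetime


def coerce_label_value(data_type: str, raw: str):
    """Convert a raw source string into a value matching the label data type."""
    if data_type == "string":
        return raw
    if data_type == "int":
        return int(raw)
    if data_type == "float":
        return float(raw)
    if data_type == "datetime":
        # Parsed as a naive datetime; Squirro treats datetime values as UTC.
        return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S")
    if data_type == "geo_point":
        lat_text, lon_text = raw.split(",")  # expects "latitude,longitude"
        lat, lon = float(lat_text), float(lon_text)
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"coordinates out of range: {raw}")
        return f"{lat},{lon}"  # normalized, without spaces
    raise ValueError(f"unknown data type: {data_type}")


print(coerce_label_value("geo_point", "47.37, 8.54"))  # 47.37,8.54
```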

Changing the Label Type

The data type of a label cannot be changed after the label is created, for the same reason that labels cannot be deleted. See Deleting Labels above for how to work around this.

Label Properties

Visible

Whether a label is visible can be toggled in the UI from the Data → Labels screen.

image12

Unchecking the “Visible” box removes the label from the search screen and makes it unavailable for use in widgets. Additional properties can be modified by clicking the Edit button at the right side of each listed label.

Display Name

In addition to the true label name used in the index, a label can have a display name, which is often a nicely formatted version of the label name. The display name is used both on the search screen, and whenever the label is used in a widget.

As a best practice, label names should not include spaces; the words of multi-word names should be separated by underscores. The nicely formatted name of the label (with spaces in place of the underscores) can then be used as the display name.

For example, a label with the name “phone_number” can have the display name “Phone Number”.
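Following this convention, display names and label names can be derived from each other mechanically. The two helpers below are a hypothetical sketch of that naming rule. They are deliberately naive: capitalize() lowercases the remaining letters, so acronyms such as ceo_name become “Ceo Name” and may need a manually chosen display name.

```python
def display_name_for(label_name: str) -> str:
    """Turn an underscore-separated label name into a title-cased display name."""
    return " ".join(word.capitalize() for word in label_name.split("_") if word)


def label_name_for(display_name: str) -> str:
    """Turn a display name back into a lowercase, underscore-separated label name."""
    return "_".join(display_name.lower().split())


print(display_name_for("phone_number"))  # Phone Number
print(label_name_for("Phone Number"))    # phone_number
```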

The screenshots below show the difference between labels with no display names set, and the same labels with display names set.

image13

image14

Group

Labels can be placed into groups for organizational purposes. On the search screen, each label within a group appears nested under the group name.

image15

Searchable

Making a label searchable enables full text search of the label values.

For example, if an item has the value “United States of America” in a searchable label, the item would be returned by a search for “America”, even if the term America is not present in the body of the item.

Typeahead

The typeahead setting for a label determines whether or not the values within that label are available for typeahead completion within the Squirro search bar. If the typeahead setting is enabled, both the name of the label itself and the values stored within the label will be shown as options within the Squirro search bar.

NOTE: Currently typeahead is only supported for string labels.

The screenshots below show the difference between having the typeahead setting enabled and disabled for a label “Companies”.

image16

image17

Enabling typeahead requires that the label also be searchable.
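The two typeahead constraints (string labels only, and searchable must be enabled) can be checked on a label definition before it is applied. In this sketch, data_type and searchable mirror the attributes used in the data loader label configuration later on this page, while the typeahead key is a hypothetical name chosen for illustration. A missing data_type is treated as string, the default for new labels.

```python
def typeahead_problems(label: dict) -> list:
    """Check the typeahead constraints on a label definition dict (illustrative)."""
    problems = []
    if not label.get("typeahead"):  # hypothetical key
        return problems
    if label.get("data_type", "string") != "string":
        problems.append("typeahead is only supported for string labels")
    if not label.get("searchable"):
        problems.append("typeahead requires the label to be searchable")
    return problems


print(typeahead_problems({"name": "Companies", "searchable": True, "typeahead": True}))  # []
```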

Analyzed

A label that is not analyzed:

  • can be used for a match

  • can be used for aggregations

  • cannot be used for sorting

  • can be used for keyword count

Label Value Formatting

  • It is possible to format how the values of numerical and date labels are displayed in dashboard widgets.

  • To do so, define the formatting in the label configuration screen of the Labels management section of Squirro.

Numerical Label formatting (int, float)

image18

The string defined in the Format field is used to display the label values in dashboards. Any string can be used for that purpose, and a preview of the formatting is shown to the right of the format input field.

“Date” Type Label formatting

image19

  • To format label values containing dates (and times), Squirro offers three formatting options, defined in the Labels configuration screen.

  • Each option is accompanied by a preview of the resulting formatted value.

Labels and the Dataloader Command Line Tool

Label functionality

Label creation on Squirro Server

When creating or updating labels in Squirro, the following attributes are taken into account (and subsequently added/updated):

  • name

  • group_name

  • display_name

  • data_type

  • visible

  • searchable

  • format_instr

Attributes used for formatting the item:

  • default_value

  • data_type

  • input_format_string

  • delimiter

Attribute used for skipping columns:

  • import

Attributes used for access control:

  • auth

  • auth_value_mandatory

  • delimiter

Pivotal attributes:

  • pivotal_group

  • delimiter

Example

Below is a basic example showing the labels configuration file and the associated mapping between the source fields and the label names.

In this scenario, attributes are added to the file step by step as more functionality is described.

The name attribute, not the key of the label dictionary, identifies a label in Squirro. Only if the name attribute is missing is the label identified by the dictionary key.

Note: The key of each label dictionary is the field name from the data source.

config/sample_facets.json

{
    "InteractionSubject": {
        "name": "Interaction Subject",
        "display_name": "Subject",
        "group_name": "Interactions",
        "visible": true,
        "searchable": true
    },
    "InteractionType": {
        "name": "Type of Interaction",
        "display_name": "Type",
        "group_name": "Interactions",
        "visible": true,
        "searchable": true
    },
    "Date": {
        "name": "Date of creation",
        "data_type": "datetime",
        "input_format_string": "%Y-%m-%dT%H:%M:%S",
        "visible": true
    },
    "Notes": {
        "name": "Notes",
        "visible": false,
        "searchable": true
    },
    "InternalAttendees": {
        "name": "Attendees",
        "group_name": "Interactions",
        "visible": true
    },
    "NoAtendees": {
        "name": "Number of Attendees",
        "data_type": "int",
        "visible": true,
        "searchable": false
    }
}

And the command line call:

squirro_data_load -v ^
    --token %token% ^
    --cluster %cluster% ^
    --project-id %project_id% ^
    --source-name csv_interactions ^
    --source-type csv ^
    --map-title InteractionSubject ^
    --source-file interaction.csv ^
    --facets-file config/sample_facets.json

Note: The lines have been wrapped with the caret ^ at the end of each line, which is the Windows command prompt convention. On macOS and Linux, use a backslash \ instead, and reference environment variables as $token rather than %token%.

You can find the loaded labels in the Data → Labels tab.

You can also see the new labels in the Search tab. Notice that the selected label is shown as Subject, which is its display_name attribute. Its values come from the InteractionSubject field in the data source, which is the key of the label dictionary.

The two distinct values of the subject field are shown: Tesla and PayPal.

image29
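To check which Squirro label name each source field in a configuration like config/sample_facets.json resolves to, the rule described above (the name attribute wins, the dictionary key is the fallback) can be applied locally. resolve_label_names is a hypothetical helper, and the excerpt below is trimmed from the example, with name deliberately omitted for Notes to show the fallback.

```python
import json


def resolve_label_names(facets_config: dict) -> dict:
    """Map each source field (dictionary key) to the label name used in Squirro."""
    # The "name" attribute wins; the key is only used when "name" is missing.
    return {field: spec.get("name", field) for field, spec in facets_config.items()}


# Trimmed excerpt of the example; "name" is deliberately omitted for "Notes".
example = json.loads("""
{
    "InteractionSubject": {"name": "Interaction Subject", "display_name": "Subject"},
    "Notes": {"visible": false, "searchable": true}
}
""")
print(resolve_label_names(example))
# {'InteractionSubject': 'Interaction Subject', 'Notes': 'Notes'}
```

To inspect the real file, open it and pass json.load(f) to the same function. json.load raises json.JSONDecodeError on syntax errors such as a missing comma, so this also catches malformed files before the data loader runs.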