Significant Terms Widget

The Significant Terms widget provides visual information about terms that are significant in a dataset, compared to another dataset.

The widget allows you to reveal the uncommonly common. That is, it shows which terms have a significantly different value distribution in a foreground dataset than in a background dataset.
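To make the idea of "uncommonly common" concrete, here is a minimal sketch that scores terms by how over-represented they are in a foreground list compared to a background list. The scoring formula (difference of shares multiplied by ratio of shares) is an assumption chosen for illustration only, and the function name `significance_scores` is hypothetical. It is not taken from the widget's actual implementation.

```python
from collections import Counter


def significance_scores(foreground, background):
    """Score terms that are over-represented in the foreground.

    Both arguments are lists of facet values, one entry per document.
    The heuristic used here is illustrative, not the widget's own formula.
    """
    fg_counts = Counter(foreground)
    bg_counts = Counter(background)
    fg_total = len(foreground)
    bg_total = len(background)

    scores = {}
    for term, fg_count in fg_counts.items():
        fg_share = fg_count / fg_total
        bg_share = bg_counts[term] / bg_total
        # Skip terms that are absent from the background or that are
        # not more common in the foreground than in the background.
        if bg_share == 0 or fg_share <= bg_share:
            continue
        scores[term] = (fg_share - bg_share) * (fg_share / bg_share)
    return scores


# Background: 100 documents. Foreground: a 10-document selection from it.
background = ["bank"] * 50 + ["tech"] * 40 + ["pharma"] * 10
foreground = ["pharma"] * 8 + ["bank"] * 2
scores = significance_scores(foreground, background)
```

In this example only "pharma" receives a score: it makes up 80% of the selection but only 10% of the background, whereas "bank" is less common in the selection than overall and is dropped.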

The comparison requires a certain minimum number of documents to work. The more terms there are in a label, the more documents are needed to get a meaningful answer.

In Squirro, the background dataset is whatever an unmodified dashboard defines. The foreground dataset is derived from the background dataset by applying the current selection. If there is no selection, the foreground and background datasets are identical, and the widget shows the term frequency instead (except when the facet is body, title, or summary, where this operation is too costly).
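The decision described in this paragraph can be summarized in a few lines. This is only a sketch of the documented behavior: the function name `widget_mode` and the returned strings are hypothetical, and the source does not state what the widget displays in the excluded case, so the sketch returns `None` there.

```python
# Content fields for which the term-frequency fallback is too costly.
COSTLY_FIELDS = {"body", "title", "summary"}


def widget_mode(facet, has_selection):
    """Return which computation applies for a facet (illustrative only)."""
    if has_selection:
        # The foreground differs from the background: compare the two.
        return "significant_terms"
    if facet in COSTLY_FIELDS:
        # No selection on a costly content field: no term frequency.
        # What is shown instead is not specified in the documentation.
        return None
    # No selection: foreground equals background, so show term frequency.
    return "term_frequency"
```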

Significant terms work very well on facets with few values. When computed on a body, title, or summary field, many more documents are needed before a significant term shows up. One workaround for this restriction is to use phrase or term detection and index the detected phrases or terms in a separate facet field. This has been shown to improve the results vastly without requiring a large number of documents.
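As a rough sketch of this workaround, the snippet below detects known phrases in an item's body and stores them under a separate facet before indexing. Everything here is an assumption made for illustration: the item is modeled as a plain dictionary with a `keywords` mapping for facet values, the facet name `detected_terms` is hypothetical, and the detection is a naive case-insensitive substring match rather than a real phrase-detection step.

```python
def extract_terms(text, vocabulary):
    """Return the vocabulary entries found in the text, sorted.

    Naive substring matching: short terms may also match inside longer
    words, so a real pipeline would tokenize the text first.
    """
    lowered = text.lower()
    return sorted(term for term in vocabulary if term.lower() in lowered)


def enrich_item(item, vocabulary, facet="detected_terms"):
    """Return a copy of the item with detected terms added as a facet."""
    enriched = dict(item)
    keywords = dict(item.get("keywords", {}))
    keywords[facet] = extract_terms(item.get("body", ""), vocabulary)
    enriched["keywords"] = keywords
    return enriched


vocabulary = ["interest rate", "central bank", "merger"]
item = {"title": "Q3 update", "body": "The Central Bank raised the interest rate."}
enriched = enrich_item(item, vocabulary)
```

The widget would then be configured on the new facet instead of the body field, so that it compares a small set of detected terms rather than every word in the text.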

When configuring the widget on a special content field (body, title, or summary), you are also asked to provide the language and the maximum number of results returned.

The maximum number of results field affects performance, so exercise caution when increasing the limit.