Data Modeling#

When data from other sources is indexed into Squirro, it is transformed into the Squirro Item Format. In this process, keywords are used to add structured and semi-structured information to the items.

This section covers what to consider when choosing the keywords, referred to as labels in the rest of this page, to use on Squirro items.
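As a rough illustration, turning a source record into an item with keywords might look like the sketch below. The source field names (`headline`, `text`, `author`, `department`) and the helper `to_item` are hypothetical, and the `keywords` mapping of names to lists of values is an assumption about the item layout rather than a complete description of the Squirro Item Format.

```python
def to_item(record):
    """Convert a hypothetical source record into an item-like dict."""
    return {
        "title": record["headline"],
        "body": record["text"],
        # Assumption: keywords map a label name to a list of values.
        "keywords": {
            "author": [record["author"]],
            "department": [record["department"]],
        },
    }


record = {
    "headline": "Quarterly report",
    "text": "Revenue grew in the third quarter.",
    "author": "Jane Doe",
    "department": "Finance",
}
item = to_item(record)
```

Only the fields that carry structured meaning end up under `keywords`; the free text stays in `title` and `body`.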

Note

None of the guidelines on this page are binding. They are simply best practices that the Squirro team uses when creating projects. Outside of performance considerations, the Squirro product doesn’t enforce any specific way of working with labels.

Considerations#

Performance#

Performance considerations come into play when working with labels. Every label that’s maintained adds some overhead, especially in memory consumption. To return the label selection list in the search screen, Squirro and the underlying Elasticsearch server need to look at every result and count the occurrences of each label value.
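To give a feel for the counting work described above, here is a simplified sketch in plain Python. It is not how Squirro or Elasticsearch implement this internally; the function `count_label_values` and the sample data are hypothetical and only illustrate that every label on every result has to be visited.

```python
from collections import Counter


def count_label_values(items):
    """Count how often each value occurs per label across a result set."""
    counts = {}
    for item in items:
        for label, values in item.get("keywords", {}).items():
            # One Counter per label; each extra label adds another one.
            counts.setdefault(label, Counter()).update(values)
    return counts


results = [
    {"keywords": {"author": ["Jane Doe"], "department": ["Finance"]}},
    {"keywords": {"author": ["John Roe"], "department": ["Finance"]}},
    {"keywords": {"author": ["Jane Doe"]}},
]
facets = count_label_values(results)
```

The work grows with both the number of results and the number of labels per item, which is why trimming unneeded labels helps.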

To improve performance, the following changes can be made:

  • Remove the “analyzed” option from labels that don’t need it. Without “analyzed”, a label can only be used for filtering; searching and typeahead are not possible.

  • Only add labels to Squirro that are really needed.

  • Hide any label that doesn’t need to be used in filtering. Dashboards which have been configured with a label that’s now hidden will continue to work.

  • Uncheck the typeahead option for labels that don’t need it.

  • Make only those labels searchable that really need to be.

Completeness#

It’s tempting to initially import every field from the source data and add each one as a label. While this is often a sensible approach in a PoC or exploratory phase, it should be avoided in production.

Import only the labels that are actually used in dashboards, filtering and search. If a label turns out to be needed at a later stage, it can always be added then.
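One way to apply this during import is to keep an explicit list of the labels in use and drop everything else. The sketch below assumes a hypothetical set `USED_LABELS` and a hypothetical helper `select_labels`; neither is part of Squirro.

```python
# Hypothetical: the labels actually used in dashboards, filtering and search.
USED_LABELS = {"author", "department"}


def select_labels(record, used_labels=USED_LABELS):
    """Keep only the source fields that are actually used as labels."""
    return {
        name: [value]
        for name, value in record.items()
        if name in used_labels and value is not None
    }


source_record = {
    "author": "Jane Doe",
    "department": "Finance",
    "internal_revision": 17,
    "legacy_code": None,
}
keywords = select_labels(source_record)
```

Adding a label later then only means extending the set and re-importing the affected data.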

Usability#

Generally, it’s best to use internal label names that are lowercase and contain no spaces. The display name then gives each label a user-friendly name.
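A minimal sketch of this naming convention follows. The helper `internal_label_name` is hypothetical, and using underscores in place of spaces is an assumption made for illustration, not a Squirro requirement.

```python
import re


def internal_label_name(display_name):
    """Derive a lowercase, space-free internal name from a display name."""
    name = display_name.strip().lower()
    # Replace every run of non-alphanumeric characters with one underscore.
    name = re.sub(r"[^a-z0-9]+", "_", name)
    return name.strip("_")


display_names = ["Document Type", "Created At", "Author"]
# Map each internal name to the user-friendly display name.
labels = {internal_label_name(d): d for d in display_names}
```

Here `labels` maps `document_type` to "Document Type", so the internal name stays stable while the display name can be changed freely.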

Tip: Squirro recommends grouping labels into user-friendly groups and making use of the provided formatting options, such as date format.