3.5.8 LTS Release Notes#

Warning

End of Life Release

This LTS release reached end of life in 2024 and is no longer supported. This release no longer receives security updates or bug fixes.

For the current supported LTS release and upgrade instructions, see Release Notes.

Squirro 3.5.8 long-term support (LTS) was released on 08 Jul 2022. It highlights the progress since the previous LTS release 3.4.7.

Most of these additions were already introduced in the intermediate releases of the 3.5.x series and have been documented in those release notes as well.

As a long-term support (LTS) release, Squirro 3.5.8 will receive updates for security issues and important bug fixes for the next two years. See Squirro Release Process for details on Squirro’s versioning.

New Features#

Cognitive Search Revamp#

Over the last few years, Squirro’s Cognitive Search solution has been deployed at many small and large organizations, and those projects have taught us a lot about users’ expectations of search and knowledge management. Given this, we went back to the drawing board and reflected on what a modern Cognitive Search experience should consist of.

The result is a true back-to-front rethink that is strikingly visible in this release. Many usability improvements have been implemented throughout, and new dashboard functionality makes it much easier to access search results and community information from anywhere. Question answering and updated query processing workflows improve the quality of the search results for users. And, just as important, the result looks beautiful and is enjoyable to work with every single day.

To access more screenshots and discover the revamped search experience, please check the following documentation: 3.5.8 LTS - Cognitive Search Improvements.

In the background these improvements were driven by the following platform additions:

New Items widget
New Search Bar widget including its Advanced Search dialog used in Global Search
Contextual Concept search
Question Answering
Similar Searches
Improved query processing workflow
Community 360 Dashboard
Native Display of Office Documents

Some of these are outlined in more detail below. To find out about the others, please see the linked documentation pages above.

Microsoft Outlook#

With the Microsoft Outlook add-in, Squirro can be integrated directly into the Microsoft Outlook mailbox. This works for Microsoft 365 installations, as well as on-premises Microsoft Exchange setups.

With this add-in any Squirro dashboard can be exposed to your users in Microsoft Outlook. A typical use case for this add-in is to enable searches that are linked to current messages.

For information on how to enable and deploy this functionality please see: Microsoft Outlook.

Native Display of Office Documents#

The Cognitive Search application is often connected to document management systems where all types of documents are stored. This most prominently includes office documents, such as: Microsoft Word, Excel, or PowerPoint.

With this release, these document types are displayed natively and look exactly as they would if opened in their original application.

For information on how this is achieved, and how this can be enabled in a Squirro project very quickly, please see: Indexing of Office documents.

Question Answering#

The best search experience does not expect users to scroll through a large number of search results. Instead, what users really want is a solution that provides them with the right answers to their questions. This has been Squirro’s vision since the first product version.

This release includes new functionality that marks further progress towards this vision: Question Answering.

By providing short summarized answers on top of the search results, users are able to get their answers more quickly.

Concept Search#

Concept Search allows users to retrieve search results without having to formulate a complex query. This is done by giving the system some input from which it can learn a “concept”. While Squirro has always supported this using Smart Filters, this version introduces the same functionality in an easy-to-use form for end users. This new functionality is called Contextual Concept Search and is made available through a search icon that appears whenever users highlight some text.

When this search is run the result is a concept which is displayed in the search bar.

Community 360 Dashboard#

Communities are a central component of any Squirro Cognitive Search project. They make it very easy to consume content on specific topics. Common community types are a company’s products, clients, or research topics.

With this release, we are introducing the Community 360 Dashboard, a feature that makes it even easier for users to find the data associated with a given community.

Combined with the Dashboarding capabilities this lets Squirro users design compact but powerful dashboards that present all essential information centrally. The new Communities Banner widget was also introduced for this purpose.

Please see to learn how to start using this functionality in Squirro projects.

New Widgets & Moving to React#

The Items widget has been introduced. The Items widget unifies their functionalities in one widget and provides a few additional features (such as the ability to show starred items or last read items).

The Search Bar widget has also been completely overhauled and provides a much more enriched type-ahead experience.

This is part of a technical platform change going on in the background. The user interface is moving to a new technology called React (previously Squirro was fully built on Backbone.js). With this change, the development of Custom widgets becomes much easier for Frontend engineers and is now based on a modern framework.

For more information on this transition and how to use the new React-based widgets, please see the separate document page: React Custom Widgets.

Priorities#

Not all data is equal. Some data sources should be shown in Squirro with a higher priority than others. Additionally, some pipeline processing steps are less urgent and it’s more relevant to quickly show the initial processing result to the users.

To satisfy these use cases, Squirro now provides more control over the priority of data as it flows through the Squirro pipeline. The priority can be set at ingestion time or during processing in the pipeline.
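
As a rough illustration of the idea only (this is not Squirro's pipeline implementation), the following sketch uses a priority queue so that more urgent work is processed first. The class `PriorityPipeline`, its methods, and the convention that a lower number means a higher priority are all hypothetical.

```python
import heapq
import itertools


class PriorityPipeline:
    """Toy queue: a lower number means a higher priority (assumed convention)."""

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps FIFO order among items of equal priority.
        self._counter = itertools.count()

    def submit(self, item, priority=1):
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def next_item(self):
        """Return the most urgent item, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, item = heapq.heappop(self._heap)
        return item


pipeline = PriorityPipeline()
pipeline.submit("bulk archive import", priority=2)
pipeline.submit("breaking news feed", priority=0)
pipeline.submit("daily report", priority=1)

# Items come out ordered by priority, not by submission time.
processing_order = [pipeline.next_item() for _ in range(3)]
```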

For information on how to use this new functionality, please refer to Pipeline Prioritization.

Shared Ground Truth Labeling#

Labeling of Ground Truths for AI Studio model training should be shared with domain experts who know most about the concepts being trained. To facilitate this, we have introduced the ability to share a Ground Truth labeling view with end users. This allows Squirro Model Creators and Data Scientists working with AI Studio to quickly get high quality labels from their users.

MaaS & MLflow#

Model-as-a-Service (MaaS) based on microservices opens the Squirro platform to custom machine learning (ML) models and also accelerates the prototyping phase for ML projects in Squirro through standardization and decoupling. The process is based on the MLflow model format, a standard format for packaging machine learning models that can be used in a variety of downstream tools.

To make use of that feature, self-trained models or already existing (pre-trained) models need to be converted into the structure of an MLflow model, uploaded via squirro assets or via scp to a Squirro instance, deployed and added to a workflow to be used in the Squirro platform.

The framework created by MaaS thus unifies the deployment process of external ML models, significantly reducing the complexity of current and future ML projects. Furthermore, by incorporating a variety of new ML models, MaaS is an integral part of taking the AI Studio beyond its current capabilities.

Improvements#

Next to the bigger new additions listed above, we have also worked on a lot of smaller improvements.

Gather#

The 1-click Google Drive Connector now accepts a folder ID to limit the data that is retrieved.
The 1-click Microsoft OneDrive Connector and the 1-click Microsoft SharePoint Connector support handling of file deletions in incremental loading.
The data loading user interface allows mapping of the thumbnail URL from a connector field.
Processing in the data processing pipeline was made more resilient by retrying the initial retrieval of batches. This protects against some of the internal services not running. The maximum number of those retries is controlled by the server config option ingester.stream.max-dequeue-retries (default value is 3).
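
The bounded retry described in the last item can be pictured with the minimal sketch below. It is not the ingester's code: the function `dequeue_with_retries`, the `ConnectionError` failure mode, and the delay parameter are assumptions made for illustration. Only the default of 3 retries comes from the documented ingester.stream.max-dequeue-retries option.

```python
import time


def dequeue_with_retries(dequeue, max_retries=3, delay_seconds=0.0):
    """Call `dequeue` until it succeeds or the retries are used up.

    One initial attempt is made, followed by at most `max_retries` retries.
    """
    attempt = 0
    while True:
        try:
            return dequeue()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # Give up and surface the last error.
            attempt += 1
            time.sleep(delay_seconds)


# Hypothetical dependency that is unavailable for the first two calls.
calls = {"count": 0}


def flaky_dequeue():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("service not running yet")
    return ["batch-1"]


batch = dequeue_with_retries(flaky_dequeue)
```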

Understand#

A new option Force OCR was added to the PDF OCR pipeline step. This runs the text extraction even if text was already found in the PDF file. This is useful for documents that mix machine-readable content with scanned pages. It also allows the extraction of additional text from embedded charts.
The PDF OCR now supports password-less encrypted PDFs.
Known Entity Extraction that has been set up from Communities (see How to Set Up Communities Using KEE) is now regularly refreshed to maintain consistency between the updated communities and the KEE configuration.

AI Studio#

Overall usability improvements on the AI Studio screens.
When giving feedback to an entity, the label select menu now shows all labels with the model name for the same tagging level.
In the AI Studio model templates, the fastText templates have been changed to use model compression by default. This reduces the storage footprint by an order of magnitude but comes at the cost of a longer training duration. More information is available in the Model compression section of the fastText web site.

libNLP / Machine Learning Workflows#

For proximity filter rules, it is now possible to define a rule_field field which will contain the rule that triggered the match. If no rule matches, it will contain the value of the no_rule_matched_label field, which defaults to “NO_RULE_MATCHED”.
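
To make the behavior concrete, here is a simplified sketch, not libNLP code. It assumes a rule is a list of words that must all occur within a given number of tokens of each other. The function names, the rule format, and the item layout are hypothetical. Only rule_field, no_rule_matched_label, and the “NO_RULE_MATCHED” default are taken from the description above.

```python
def words_within(tokens, words, max_distance):
    """True if all `words` occur in some window of max_distance + 1 tokens."""
    positions = [i for i, token in enumerate(tokens) if token in words]
    for start in positions:
        window = tokens[start:start + max_distance + 1]
        if all(word in window for word in words):
            return True
    return False


def apply_rules(text, rules, rule_field="matched_rule",
                no_rule_matched_label="NO_RULE_MATCHED"):
    """Store the name of the first matching rule in `rule_field`."""
    tokens = text.lower().split()
    item = {"body": text}
    for name, (words, max_distance) in rules.items():
        if words_within(tokens, words, max_distance):
            item[rule_field] = name
            return item
    # No rule matched: fall back to the configured label.
    item[rule_field] = no_rule_matched_label
    return item


# Hypothetical rule: both words within 5 tokens of each other.
rules = {"merger_rule": (["merger", "globex"], 5)}

hit = apply_rules("The merger of Acme and Globex was announced", rules)
miss = apply_rules("Quarterly results were published", rules)
```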

Search#

The Query Processing Workflow functionality underwent many improvements that aid the search quality:
- The query strategy can be changed per project using a project configuration setting (topic.search.query-strategy). The default query pipeline can no longer be edited or deleted. Instead, it has to be cloned to customize it. In turn, the default query pipeline is automatically updated with new Squirro releases, thus always reflecting the current state of the art.
- The default search behavior no longer requires that all query terms are found in a document. Instead, a minimum-should-match strategy has been implemented that requires a minimum number of words to match based on the query length.
- Improved search precision for large documents by merging relevant chunks of terms into loose phrases. For example, in the user’s query “will EU extend Brexit deadline”, this detects that Brexit and deadline often occur together and reformulates the query as eu extend ("brexit deadline"~15 OR (brexit deadline)). This makes sure that results where “brexit” and “deadline” are found within 15 words of each other are scored better than results where they simply appear anywhere in the document.
- Significantly faster search speed by disabling some functionalities that do not add any value in modern Squirro projects. This includes the deactivation of near duplicate merging, searching on summary, and PDF searching functionalities in non-PDF projects. Handling of synonym matching was also improved.
- The query processing workflow is now aware of term-sequences and term-phrases to allow fine-grained tuning of the user’s query text.
Popular Query Suggestions default to suggesting popular queries only from the current project. The scope can be changed with the server:topic.typeahead.popular.enabled configuration option.
As a performance improvement, the scan API (used to navigate large query result sets) defaults to returning results unsorted. This can be changed using the preserve_scroll_order argument to return results in sorted order.
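
The loose-phrase reformulation from the list above can be sketched as follows. This is an illustration rather than the query processing workflow itself: the function `to_loose_phrases` is hypothetical, the terms are assumed to be already lowercased and filtered (the example in the text drops “will”), and the set of term pairs that belong together is passed in by hand instead of being detected.

```python
def to_loose_phrases(terms, collocations, slop=15):
    """Rewrite adjacent term pairs listed in `collocations` as loose phrases."""
    parts = []
    i = 0
    while i < len(terms):
        pair = tuple(terms[i:i + 2])
        if len(pair) == 2 and pair in collocations:
            phrase = " ".join(pair)
            # Proximity match scores higher; the plain terms keep recall.
            parts.append(f'("{phrase}"~{slop} OR ({phrase}))')
            i += 2
        else:
            parts.append(terms[i])
            i += 1
    return " ".join(parts)


terms = ["eu", "extend", "brexit", "deadline"]
rewritten = to_loose_phrases(terms, collocations={("brexit", "deadline")})
# rewritten == 'eu extend ("brexit deadline"~15 OR (brexit deadline))'
```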

Dashboards & Widgets#

Searching for widgets in the dashboard editor also respects additional widget keywords. For example, searching for “link” will show the “Actions” widget. This was introduced to make the transition to some of the new widget names easier for experienced project creators.
Added option to hide dashboard layers when the widgets inside are empty.
Added “View all” buttons to many widgets. Where available, this can be linked to a separate dashboard that lets users explore the same data set in more detail. For example, a horizontal community list on the homepage can use the View all option to link to the communities dashboard.
The Bar Chart widget has a new mode for stacked bar charts.
The Communities widget can be changed to show relevant communities for the current result set; in that mode it also has a new horizontal visualization mode.
The Favorites widget horizontal view mode was extended to a carousel view like other horizontal modes. The widget also has a new option that lets you decide if you want to show the number of filters, or not.
The Tabs widget stores its selected tab in the page’s URL and also retrieves it from there on load. This is very helpful in combination with the View all functionality to deep link to specific parts of a dashboard. This also allows retaining of the tab when the page is reloaded or when a link to the dashboard is shared.
Added search bar to the Reference Screen documents selector.

Miscellaneous#

Atlassian Jira issue collectors can be configured for the Send Feedback option in the help menu.
Detailed information for individual communities is exposed in the API. This new endpoint includes information such as the number of followers and the number of items belonging to the community.

Admin and Operations#

The Python version used by Squirro has been upgraded from Python 3.6 to Python 3.8.
The Monitoring Plugin now supports project and source statistics.
Single Sign-On integrations can now return session data as well as user data. This is used in SAML SSO for Squirro to store additional user profile data. One current use case is the storage of Microsoft Exchange IDs for the Microsoft Outlook integration.
PostgreSQL is now supported as a storage backend for Squirro’s metadata database. Setups can now choose MariaDB, MySQL, or PostgreSQL as the database backend. Setup and upgrading of Squirro systems with a PostgreSQL database is currently manual. If this is something you need for your environment please reach out to Squirro Support.
Extended the maximum size of packets that can be sent to the MariaDB server. This resolves problems where large configuration data would break the connection. Please see MySQL Big Data Management for more information.
Simplified synonym handling to support synonyms on managed ElasticSearch. For this, the configuration was moved from being file-based to ElasticSearch’s inline settings.
Uploaded files are de-duplicated using Unix hardlinks. This reduces the disk space consumed when processing documents.
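
The hardlink-based de-duplication in the last item can be illustrated with the general technique below. It is a sketch under assumptions, not Squirro's storage code: the function `store_deduplicated`, the SHA-256 content hash, and the `.by-hash` directory layout are all hypothetical.

```python
import hashlib
import os
import tempfile


def store_deduplicated(data, name, store_dir):
    """Write `data` as `name`; hard-link to an existing copy if one matches."""
    digest = hashlib.sha256(data).hexdigest()
    canonical = os.path.join(store_dir, ".by-hash", digest)
    target = os.path.join(store_dir, name)
    os.makedirs(os.path.dirname(canonical), exist_ok=True)
    if not os.path.exists(canonical):
        with open(canonical, "wb") as handle:
            handle.write(data)
    # Both names now point at the same inode, so the bytes are stored once.
    os.link(canonical, target)
    return target


with tempfile.TemporaryDirectory() as store:
    first = store_deduplicated(b"same report", "a.pdf", store)
    second = store_deduplicated(b"same report", "b.pdf", store)
    same_inode = os.stat(first).st_ino == os.stat(second).st_ino
    link_count = os.stat(first).st_nlink
```

Because hard links share one inode, deleting any single name frees no space until the last link is removed, which keeps cleanup safe.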

Breaking Changes#

Python Version#

The biggest potentially breaking change comes from the upgrade to Python 3.8. This will affect all Python-based plugins, such as Pipelets or Data Loader Plugins. Custom steps need to be taken, especially if custom dependencies have been installed into Squirro’s Python environment. For plugins that have their dependencies declared using requirements.txt, a re-install of the dependencies is automatically attempted.

Be aware that you should carefully verify whether all of the plugins still work after the upgrade.
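
One possible first check after the upgrade is to confirm that the modules a plugin imports can still be found in the Python environment. The sketch below uses only the standard library. The list of required modules is a hypothetical placeholder and should be replaced with the top-level modules your own plugins depend on. It does not replace testing the plugins themselves.

```python
import importlib.util
import sys


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Hypothetical dependency list of a custom pipelet (top-level names only).
required = ["json", "hashlib", "surely_not_installed_module_xyz"]

running_38_or_newer = sys.version_info >= (3, 8)
missing = missing_modules(required)

print("Python 3.8 or newer:", running_38_or_newer)
print("Missing modules:", missing)
```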

Query Handling#

A few changes were made to Squirro’s default Query Syntax. These changes may cause issues for users who are used to the previous syntax or if they have been configured as default dashboard or widget queries.

The relevant changes are:

The lowercase terms “and”, “or”, and “not” no longer have any special significance. To use the boolean operators the terms need to be written in full uppercase: “AND”, “OR”, “NOT”.
Phrase searches, such as "squirro product" no longer require those words to be in the exact order, nor directly next to each other. Instead they are converted into proximity searches, and the words merely have to appear closely together. This change was introduced as it results in better recall for most users. Users can revert to the old behavior by manually entering a proximity search ("squirro product"~1). Project creators can change the behavior in the project’s Query Strategy configuration (see Query Processing).
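
The case sensitivity of the boolean operators can be illustrated with the following simplified sketch. It is not Squirro's query parser: the function `classify_tokens` is hypothetical and ignores quoting, parentheses, and field syntax.

```python
OPERATORS = {"AND", "OR", "NOT"}


def classify_tokens(query):
    """Label each whitespace-separated token as 'operator' or 'term'."""
    return [(token, "operator" if token in OPERATORS else "term")
            for token in query.split()]


# The lowercase "and" is an ordinary search term; only "AND" is an operator.
classified = classify_tokens("squirro and search AND insights")
```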

Widgets & User Experience#

The Calendar widget was removed. Existing dashboards containing this widget should be edited and the existing “Calendar” widgets deleted.

Labels in item detail views are now displayed before the actual document, not at the end. This is in preparation for a larger upcoming change where document-level AI Studio feedback will be implemented by interacting with those labels.

The term Labels is now used throughout Squirro to refer to any of the concepts otherwise known as: keywords, entities, facets, etc. The documentation has not yet been fully updated to reflect this, so the terms are sometimes going to be used interchangeably. This also changes the names of a few widgets in the dashboard editor.

Activity Tracking#

Activity tracking is used more actively throughout Squirro, especially for some of the new search features (such as popular queries and similar queries). This required a few changes to the activity logging:

The /activity API endpoint now enforces authorization.
Delete all indexed items in the projects where activity sources with a non-analyzed query facet exist.
Activity files are no longer stored in the same folder as the frontend log files but in a sub-folder which includes the server name. This allows better handling of multi-node setups where the activity log files may be stored on a shared file system.

Miscellaneous#

Other changes have been implemented that should have no effect on most uses:

Removed the --facet-delimiter parameter from the data loader command line. This may cause existing command line load scripts to fail.
In the AI Studio, a proximity filter rule cannot contain more than 20 words. This was introduced to avoid performance problems with very large proximity filters.
The pipeline steps Content Augmentation and Content Extraction now only process the first file found in the files list of any processed item. This does not affect any standard setup, as no built-in Squirro data loader or pipeline step would ever result in items with multiple files attached. Custom data loader plugins or pipelets could however have resulted in such a scenario.

Bug Fixes#

See the intermediate release notes for a list of all the bugs fixed since the last LTS release:

Installation and Upgrade#

For new installations, please follow the Installing Squirro on Linux instructions.

To upgrade an existing installation, please consult Upgrading Squirro.
