3.8.6 LTS Release Notes#

Squirro 3.8.6 LTS was released October 16, 2023.

Note

On November 03, 2023 a patch was released for Squirro 3.8.6 LTS. For details, see the 3.8.7 LTS Release Notes.

Reference: Learn more about the Squirro Release Process.

Caution

This release includes several breaking changes. See the Breaking Changes section at the end of this page to learn more.

What’s New#

Squirro 3.8.6 LTS includes significant platform improvements, including the following:

  • A redesigned and improved AI Studio.

  • New search features, including new scoring profiles, query syntax features, and the introduction of semantic Hybrid search.

  • Several Cognitive Search user interface improvements.

  • A new Documents connector for uploading local files to Squirro, replacing the ZIP connector.

  • Built-in forms functionality for Project Settings configuration options.

  • Many more general platform improvements!

SquirroGPT#

Although not part of this specific release, Squirro also introduced several major new SquirroGPT features and improvements over the past few months.

Reference: To learn more about Squirro’s generative AI offering, see Chat.

Redesigned and Improved AI Studio#

AI Studio has been completely reworked, with a redesigned UI, new features, and improved performance. Creating no-code machine learning models has never been easier. You can now create, train, and deploy models in three steps.

AI Studio Three Steps

New or improved features include:

  • The introduction of AutoML templates (Fast, Balanced, and Accurate), which train multiple models with automated hyperparameter tuning and model ensembling, then automatically deliver the best-performing model for the dataset.

  • AutoML-supported libraries, models, and algorithms include LightGBM, CatBoost, XGBoost, FastText, RandomForest, ExtraTrees, KNeighbour, and NeuralNetTorch.

  • Proximity-rule-based bulk labeling for sentence-level classification models.

  • Query-based bulk labeling for document-level classification models. This enables importing training data into the ground truth using Squirro Item labels. Additionally, for more advanced users, this feature enables the creation of training data utilizing the full potential of Squirro’s Query Syntax.

  • Unpublish models from the publish screen, which will delete the published workflow from a Squirro project.

  • Added label selection when deploying a model within AI Studio.

  • Added support for custom templates in AI Studio and improved template handling generally.

  • Deploy or undeploy a published model directly into/from a pipeline from the AI Studio, without having to open the Pipeline Editor. This will automatically place (or remove) the model as a pipelet within the classification step of a selected Squirro pipeline.

  • A completely redesigned and improved interface, as shown in the example screenshot below:

AI Studio new interface

Technical improvements to AI Studio in this release include the following:

  • Labeling focus mode has been reworked and now delivers faster results.

  • Added a help dropdown for AI Studio pages.

  • Added an option to limit bulk labeling items.

  • Implemented document-level Bulk Labeling.

  • Added a more informative error message when trying to delete a ground truth that is being used by a model.

  • Reworked the sentence highlighting logic used when fetching all groundtruth items so that it matches the logic used for an individual item (this fixed discrepancies for sentences).

  • Query labeling templates now accept a query_count parameter to limit the number of items retrieved for bulk labeling.

  • Added relationships from groundtruth to model and from model to publish in the machinelearning database. ground_truth_name in model and publish now updates automatically with changes to the ground_truth, and model_name now updates automatically with changes to the model. Also migrated all machinelearning service migrations to use the new Redis functionality to remember whether they have already been run.

  • Added the include_pipeline_workflows parameter to get requests for published AI Studio models, which tells the ML service to retrieve the datasource pipeline workflows where the published models are being used.

  • GET requests to endpoints for getting data about published models now enrich each model with the pipeline workflows they are used in.

  • There is a new integration endpoint for query-based bulk labeling.

  • Disabled balancing dataset on AutoML templates to avoid downsampling.

Reference: To learn more, see AI Studio.

A New Editable Explore Dashboard in the Setup Space#

The Explore dashboard in the Squirro Setup space is now editable thanks to the creation of a new Explore dashboard designation included with all new projects.

New Explore Designation

As part of this update, general Explore screen performance was improved, including the Items widget performance within the dashboard.

Reference: To learn more about this new dashboard designation, see the Explore Dashboard documentation.

Cognitive Search User Interface Improvements#

Cognitive Search has undergone a number of UI and UX improvements in this release, including a fresh look and feel when searching and the ability to choose the document highlighting color.

Choose document highlighting color in Settings

Also, document labels can now be viewed in a popover by clicking the label icon in the Items widget, allowing you to browse labels without leaving the search results page, as shown in the example image below:

Document label popover

Other new Cognitive Search features and improvements include:

  • Global search now breaks queries into individual chips, making it easier to edit queries after the initial search. This feature can be enabled or disabled through frontend.userapp.query-chips-enabled in Project Settings. Learn more about Search Bar Chips.

  • You can now configure the text highlight color via the Setup space App & Nav Bar Settings.

  • Upgrades to the Google Drive, SharePoint, and Dropbox 1-click connectors, including the input label, placeholder, and helper text.

  • Data sources now have an affiliated picture. They are initialized with the picture from the Dataloader plugin they are created from.

  • There is now support for loading images (TIFF, PNG, JPEG) into Squirro as PDFs.

  • Squirro now supports deploying a model to multiple pipelines at once.

  • Large PDFs are auto-scaled down to the viewer size.

  • Enabled typeahead for PDF inline search inside the Item Detail modal.

  • PDF inline search, which appears within the item detail screen, now uses the same query processing as global search. This significantly improves search quality inside the item detail screen and allows for future improvements on the backend.

  • Squirro now supports Speech to Text for the global search bar within Chrome browsers.

  • Added a Similar Items tab to the Items Detail widget.

  • The Items widget now has a configuration to specify which labels are shown when creating a new note using the Note Taking feature.

  • Added the Community Info Card widget to the platform.

A New Documents Connector#

A new Documents connector replaces the ZIP connector. In addition to handling ZIP files, this connector supports uploading individual, non-zipped documents (a single PDF file, for example). The range of supported documents was also expanded, including ODT, ODP, ODS, CSV, and RTF file formats.

Document Import

Built-In Forms for Configuration Options#

Under Project Settings, intuitive built-in forms are now available for editing some configuration settings instead of plain text JSON. See the example topic.typeahead.content.configuration settings screenshot below:

JSON Forms config editor

Other New Platform Features and Major Improvements#

  • A new sessions mode with two default timeouts: session timeout and maximum session duration. Learn more at Reducing Session Lifetime.

  • A new and improved Squirro Monitoring dashboard. To learn more about manually upgrading an existing project dashboard, see Upgrading Dashboards Manually. (New Squirro releases update widgets automatically. Dashboards must be manually updated.)

  • Studio plugins using project-scoped endpoints now support token-based authentication. Token-based authentication is not supported for non-project-scoped endpoints for security reasons.

  • Added a new endpoint to expose SharePoint metadata in the SAML SSO Plugin.

  • Exposed priority levels in squirro_data_load.

  • Added a new libNLP step that truncates the fields of the given documents down to the first X words.

  • Storybook was updated to the latest version. Full widgets and community list widget tests were added to Storybook.

  • The plumber service can now emit structured logs. Structured logs generated during data ingestion (e.g. from the execution of a pipelet in a pipeline workflow) can now be included in the Data Ingestion Logs dashboard of the Squirro Monitoring project. Authors of pipelets can use the new attribute self.slog to emit structured logs from within their pipelets.

  • Added the indexmanager service, a new index manager service for Elasticsearch maintenance.

  • Admins can now access query-processing results in dashboard query templates using the queryContext variable.

  • There is a new tag for highlighting answer context.

  • Introduced a new pipeline step called Archive Extraction that extracts archive files. At the moment only ZIP files are supported. Each file in the archive becomes an individual item, and the archive itself is removed.

  • Introduced a new pipeline step called MIME Type Detection that identifies the MIME type of file-based items. This step can serve as a central way to detect MIME types in the pipeline, instead of replicating this logic in various dataloader plugins in cases where the MIME type is either not provided or is not trusted because it may be incorrect.

  • Created the option to display only unique items within the Timeline and Metric widgets.

  • Included XlsxWriter as an optional package on the Squirro mirror.

  • Added GlobalSearch query to EntitiesListWidget query.

  • Users can now use searchbar.empty and searchbar.value in dashboard query templates.

  • Added a new type of content streamer called QueueFileSystemStreamer. This streamer extends the existing FileSystemStreamer so that each data source has its own distinct queue. To enable it, edit the /etc/squirro/common.ini file and change the stream_type option in the content section from filesystem to filesystem-queue.

  • Added documentation for the content streamers of the data processing pipeline. See Content Streamers.

  • Added an autogluon classifier to the libNLP classification steps.

  • You can now store metadata information on sub-items (pages) in the same way you can for top-level items. This metadata information can be used for internal operations and is not exposed via the Facet Aggregation API, which only considers top-level items.
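
The QueueFileSystemStreamer switch described in the list above amounts to a one-option change in the content section of /etc/squirro/common.ini. The following is a minimal Python sketch of that change, assuming the file already contains a [content] section with stream_type set to filesystem. The sample text and the helper name enable_queue_streamer are illustrative only, and the sketch works on an in-memory copy rather than the real file.

```python
import configparser
import io

# Illustrative stand-in for the relevant part of /etc/squirro/common.ini.
SAMPLE_COMMON_INI = """\
[content]
stream_type = filesystem
"""


def enable_queue_streamer(ini_text: str) -> str:
    """Return ini_text with [content] stream_type set to filesystem-queue."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Create the section if it is missing (an assumption for robustness).
    if not parser.has_section("content"):
        parser.add_section("content")
    parser.set("content", "stream_type", "filesystem-queue")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


updated = enable_queue_streamer(SAMPLE_COMMON_INI)
print(updated)
```

Note that configparser discards comments when it rewrites a file, so editing the option by hand may be preferable on a real installation.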

Improvements#

Note

Bug fixes are included with individual biweekly non-LTS releases. If you are looking for bug fix information, see specific Squirro 3.8.x Release Notes.

The following general platform improvements are included with this LTS release:

Dashboards and Widgets#

  • The Items Table widget was migrated to React.

  • The Items widget now shows the result count as more than X when approximation is used on the backend.

  • General Items card improvements.

  • The Items card in ItemsWidget now has a larger font, and answer context is highlighted in the abstract.

  • Global Search and Community 360 dashboards are no longer loaded on top of other dashboards.

  • Changed the default font size for the Chip component to 13px.

  • Highlighting inside the item preview, item detail sidebar, and HTML-based items now uses bold font instead of yellow. Yellow highlighting is still used for PDFs.

  • Improved Tabs performance.

  • Updated default colors and themes based on the latest Cognitive Search design.

  • After a user clicks on a QA widget answer, the view now scrolls to the first highlighted section within the document.

  • Added translate and compare tooltips to the ItemsWidget card.

Data Loading and Processing#

  • The three cloud connectors (Dropbox, Google Drive, OneDrive) now support zip files.

  • Added support for zip files in the Microsoft Exchange plugin.

  • The data sources in the Setup space now have a tooltip on their names in case their display name is clipped.

  • Ensured all connections have a 20-second timeout, including the /_internal/status endpoint.

  • Enabled placeholders in dataloader inputs.

  • Added an integration test for the synchronous batch processing feature.

  • Improvements to the Data Ingestion Logs dashboard in the Squirro Monitoring project. Specifically, batch-level start/end logs have been added for the pipeline steps operating in parallel execution mode. All batch-level start/end logs are now logged at the info logging level, while all item-level start/end logs (which provide more detail, supported for steps with parallel execution mode) are logged at the debug logging level. Therefore, from now on, start/end logs will be displayed by default in the dashboard without the need to modify the logging level to debug. Finally, structured logs have been added in a few places to assist with understanding.

  • The pdfconversion service can now emit structured logs, which are linked to the rest of the data ingestion logs. Therefore, structured logs from the pdfconversion service will now be included in the Data Ingestion Logs dashboard of the Squirro Monitoring project.

  • Increased and synchronized the timeout options related to pipelets involved in data ingestion.

  • The default pipeline workflows for Dropbox, Google Drive, and Microsoft OneDrive connectors now include the pdf-conversion and webshot steps. This means that MS Office documents found in these services will be converted to PDFs and a thumbnail will be created for them.

  • Improved the reliability of PDF conversions when multiple conversions take place in parallel.

Communities#

  • The CommunityItems widget was ported to React and a custom widget API was implemented.

  • Added feedback to the Communities List widget.

  • Community augmentation is now easier to implement because all returned fields are optional. Name, photo, and properties are used if present but don’t have to be returned.

  • Photos returned by community augmentation can now be absolute URLs as well.

  • Squirro now includes community types in the response after subscribing to a community.

  • Community subscriptions now reflect the state of applied filters and the search bar query.

  • Horizontal mode is now available in the Communities List widget even without relevant communities.

  • Added a widget query to the Communities List widget.

  • Added a query to the community subscriptions collection, allowing community subscriptions to be filtered via query. This is currently available in the Items Table widget.

Other Improvements#

  • squirro_status now works even if a service is not installed.

  • Suppressed the Instantiate SkipTimeoutConnectionClass message in squirro_status output.

  • Reduced noisy logging in frontend.log by the monitoring plugin.

  • Read item IDs are now stored on the frontend so that the correct status is shown to the user immediately.

  • Enabled http2 for all resources.

  • Various extensions and improvements to the structured logging system.

  • File-based Squirro items (i.e. items that include the files attribute) returned by the API will now include an original: true flag to indicate the file from which the item is derived.

  • Updates to 1-Click Connector configuration forms.

  • Moved the nlp service config to the configuration service so that the config is displayed in a user-friendly way.

  • Added a configuration option to enable or disable dataset balancing inside the balancer step.

  • Activity Insights now has a higher scheduling frequency and separate backoff times for long-lived sessions (dashboard and browsing sessions, 15 minutes) and short-lived query sessions (3 minutes).

  • Added validation to the configuration service. Dictionary settings defined in the configuration service may now include a schema definition that is used for validation. This catches errors such as text provided instead of a number or a value outside the specified range.

  • Upgraded Filebeat to version 8.8.1.

  • Limited max_features on the SKLearn transformer to avoid excessive memory consumption when a dataset is very large.
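
The configuration service validation mentioned in the list above can be pictured as a schema check applied to each setting. The sketch below is a hand-rolled illustration and not Squirro's implementation: the schema layout (type, minimum, and maximum keys), the validate_setting helper, and the sample timeout setting are all hypothetical.

```python
from numbers import Number


def validate_setting(value, schema: dict) -> list:
    """Return a list of validation errors; an empty list means the value is valid."""
    errors = []
    if schema.get("type") == "number":
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Number):
            return [f"expected a number, got {type(value).__name__}"]
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{value} is below the minimum {schema['minimum']}")
        if "maximum" in schema and value > schema["maximum"]:
            errors.append(f"{value} is above the maximum {schema['maximum']}")
    return errors


# Hypothetical schema for a numeric timeout setting.
timeout_schema = {"type": "number", "minimum": 1, "maximum": 60}

print(validate_setting(20, timeout_schema))    # valid value
print(validate_setting("20", timeout_schema))  # text instead of a number
print(validate_setting(120, timeout_schema))   # value out of range
```

The two failing calls correspond to the error cases named in the release note: text provided instead of a number, and a value outside the specified range.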

Breaking Changes#

See the following three categories for breaking changes included with Squirro 3.8.6 LTS:

General#

  • The Items widget was redesigned. While the API remains the same, verify that any customizations you’ve made within your projects still work correctly.

  • Result List and Cards widgets have been removed from the UI. Both are still supported if they already exist within a project, though they are slated for full deprecation during the next LTS release.

  • The public methods enqueue and enqueue_data of the FileSystemStreamer will now return the absolute path to the enqueued data file.

  • The plumber service can no longer be forked as forking was causing degraded service performance.

  • In the App and Nav Bar Settings, project colors now support only the HEX color schema. Admins should update project colors accordingly.

AI Studio#

  • Dropped the ground_truth_name column in the models table.

  • Deprecated the ability to change the id and name of the groundtruth in the models table.

  • Dropped the gt_id, gt_name, and model_name columns in the publish table.

  • Adding a document-level classification model to a pipeline now requires the document_label argument that defines the label category used for the model to output its predictions.

  • If you are using custom templates within AI Studio, they must be handled using the new structure.

Search#

  • The Smart Filters feature is now fully deprecated.

  • The perform_on_languages option in the QA widget configuration no longer supports the None value. To find an answer regardless of the language of the question, set this option to *.

  • The terminate_after option in the content typeahead configuration no longer supports boolean values. To stop query execution from terminating early, set this option to 0.

  • Changed the default value of track_total_hits from true to 500,000 to optimize query performance. This means that Elasticsearch now stops counting documents once the number of matches reaches this threshold. To change this behavior, see the track_total_hits option in the topic.search.search-settings configuration service option.
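
With a numeric track_total_hits, Elasticsearch reports totals at the threshold as a lower bound: the hits.total object in a search response carries a value plus a relation of eq (exact count) or gte (the true count is at least the value). The following is a minimal sketch of how a client might render such a total. The function name and the sample response dictionaries are illustrative and not part of Squirro's API.

```python
def format_total_hits(response: dict) -> str:
    """Render hits.total from an Elasticsearch search response as display text."""
    total = response["hits"]["total"]
    value = total["value"]
    # "gte" means counting stopped at the track_total_hits threshold,
    # so the true number of matches is at least `value`.
    if total.get("relation") == "gte":
        return f"at least {value:,}"
    return f"{value:,}"


# Illustrative response fragments, reduced to the fields used here.
exact = {"hits": {"total": {"value": 1234, "relation": "eq"}}}
capped = {"hits": {"total": {"value": 500000, "relation": "gte"}}}

print(format_total_hits(exact))   # 1,234
print(format_total_hits(capped))  # at least 500,000
```

Custom code that treats the total as exact, for example to compute page counts, may need a similar check after upgrading.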

Note

For help with breaking changes, contact Squirro Support.

Installation and Upgrade#

For new installations, find step-by-step instructions in Install and Manage Squirro with Ansible (recommended) or Installing Squirro on Linux.

To upgrade an existing installation, see Upgrading Squirro.