3.8.4 - Release Notes

Squirro 3.8.4 was released September 18, 2023.

Reference: Learn more about the Squirro Release Process.

What’s New#

AI Studio#

AI Studio has been completely reworked, with a redesigned UI, new features, and improved performance. Creating no-code machine learning models has never been easier. You can now create and train models in three steps.

New or improved features include:

The introduction of AutoML templates (Fast, Balanced, and Accurate), which train multiple models with automated hyperparameter tuning and model ensembling, then automatically deliver the best-performing model for the dataset.
AutoML-supported libraries, models, and algorithms include LightGBM, CatBoost, XGBoost, FastText, RandomForest, ExtraTrees, KNeighbour, and NeuralNetTorch.
Proximity-rule-based bulk labeling for sentence-level classification models.
Query-based bulk labeling for document-level classification models. This enables importing training data into the ground truth using Squirro Item labels. Additionally, for more advanced users, this feature enables the creation of training data utilizing the full potential of Squirro’s Query Syntax.
Unpublish models from the publish screen, which will delete the published workflow from a Squirro project.
Deploy or undeploy a published model directly into/from a pipeline from the AI Studio, without having to open the Pipeline Editor. This will automatically place (or remove) the model as a pipelet within the classification step of a selected Squirro pipeline.

Technical improvements achieved during this specific release include the following:

Labeling focus mode has been reworked and now returns results significantly faster.
Added a help dropdown to AI Studio pages.
Added an option to limit bulk labeling items.
Added a more informative error message when trying to delete a ground truth that is being used by a model.
Reworked the sentence highlighting logic used when fetching all groundtruth items so that it matches the logic used for an individual item (this fixed discrepancies for sentences).
Query labeling template now accepts a query_count parameter to limit the number of items retrieved for bulk labeling.
Added relationships from groundtruth to model and from model to publish in the machinelearning database. ground_truth_name in model and publish now updates automatically with changes to the ground_truth, and model_name now updates automatically with changes to the model. Also migrated all machinelearning service migrations to use the new Redis functionality to remember whether they have been run before.
Added the include_pipeline_workflows parameter to GET requests for published AI Studio models, which tells the ML service to retrieve the datasource pipeline workflows where the published models are being used.
GET requests to endpoints for getting data about published models now enrich each model with the pipeline workflows it is used in.

Reference: To learn more, see AI Studio.

Search#

Added new Squirro query syntax that allows proximity search using Elasticsearch Span Near queries. For example, "auction of gas"~(3) searches for documents where the terms inside the quotes occur within 3 words of each other in any order, while "auction of gas"[3] is similar but only matches documents where the terms occur in the order provided.
Contextualized PDF Inline Search: The additional widget query template can now evaluate the parsed query context (to activate semantic search versus keyword search).
Added new scoring plugins for recommendations: one based on user item history (recommend_on_user_items) and one based on item IDs (recommend_on_item).
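
The two proximity forms can be pictured as Elasticsearch span_near queries. The following is a minimal sketch, not Squirro's actual query translation: the function name proximity_to_span_near, the field name body, the whitespace tokenization, and the direct use of the number as slop are assumptions made for illustration. Only the span_near keys (clauses, slop, in_order) come from the Elasticsearch query DSL.

```python
import re

# Hypothetical pattern for the two proximity forms described above:
#   "some terms"~(N)  -> terms may appear in any order
#   "some terms"[N]   -> terms must appear in the order written
_PROXIMITY = re.compile(
    r'^"(?P<phrase>[^"]+)"(?:~\((?P<any>\d+)\)|\[(?P<ordered>\d+)\])$'
)


def proximity_to_span_near(query: str, field: str = "body") -> dict:
    """Translate a proximity expression into a span_near query body (sketch)."""
    match = _PROXIMITY.match(query.strip())
    if match is None:
        raise ValueError(f"not a proximity expression: {query!r}")

    in_order = match.group("ordered") is not None
    distance = int(match.group("ordered") if in_order else match.group("any"))

    # Assumption: terms are split on whitespace and lowercased; a real
    # implementation would run them through the field's analyzer.
    terms = match.group("phrase").lower().split()

    return {
        "span_near": {
            "clauses": [{"span_term": {field: term}} for term in terms],
            "slop": distance,
            "in_order": in_order,
        }
    }


unordered = proximity_to_span_near('"auction of gas"~(3)')
ordered = proximity_to_span_near('"auction of gas"[3]')
print(unordered["span_near"]["in_order"], ordered["span_near"]["in_order"])
```

In this sketch the only difference between the two forms is the in_order flag, which mirrors the "any order" versus "order provided" distinction in the note above.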

Other New Additions to Squirro#

You can now configure the text highlight color via the Setup space App & Nav Bar Settings.
There is a new tag for highlighting answer context.
Data sources now have an affiliated picture. They are initialized with the picture from the Dataloader plugin they are created from.
There is now support for loading images (TIFF, PNG, JPEG) into Squirro as PDFs.
Large PDFs are auto-scaled down to the viewer size.
Studio plugins using project-scoped endpoints now support token-based authentication. Token-based authentication is not supported for non-project-scoped endpoints for security reasons.
Added a new endpoint to expose SharePoint metadata in the SAML SSO Plugin.
An Explore Dashboard is now created for every new project and has been added to existing projects. Squirro now supports the new dashboard role explore.
Exposed priority levels in squirro_data_load.
Added a new libNLP step that truncates the fields of the given documents down to the first X words.
Added the Community Info Card widget to the platform.
Squirro now supports deploying a model to multiple pipelines at once.
CommunityItems widget was ported to React and a custom widget API was implemented.
The Items widget now has a configuration to specify which labels are shown when creating a new note using the Note Taking feature.
Global search now breaks queries into individual chips, making it easier to edit queries after the initial search. This feature can be enabled or disabled through frontend.userapp.query-chips-enabled in Project Configuration.
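
The truncation step listed above (cutting document fields down to the first X words) can be pictured with a short sketch. This is not the libNLP implementation: the function name truncate_fields, its parameters, and the dictionary-based document shape are hypothetical, and words are assumed to be whitespace-separated.

```python
from typing import Any, Dict, Iterable


def truncate_fields(
    document: Dict[str, Any], fields: Iterable[str], max_words: int
) -> Dict[str, Any]:
    """Return a copy of `document` with the given text fields cut to `max_words` words."""
    truncated = dict(document)
    for name in fields:
        value = truncated.get(name)
        if isinstance(value, str):
            # Split on whitespace and keep only the first `max_words` tokens.
            truncated[name] = " ".join(value.split()[:max_words])
    return truncated


doc = {"title": "Quarterly report", "body": "one two three four five six"}
short = truncate_fields(doc, fields=["body"], max_words=4)
print(short["body"])
```

Note that this sketch also collapses runs of whitespace in the truncated field, and it leaves fields that are missing or not strings untouched.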

Improvements#

Improvements to the Google Drive and Dropbox 1-click connectors, including the input label, placeholder, and helper text.
Enabled http2 for all resources.
Squirro now uses Orjson for Elasticsearch serialization and deserialization. This improves performance as Orjson is much faster than the simple JSON used by the Elasticsearch client.
The SquirroClient now also uses Orjson, which improves the performance of serializing and deserializing data.
General Items card improvements.
Items card in ItemsWidget now has a larger font and there is answer context highlighting in the abstract.
Various extensions and improvements to the structured logging system.
Improved the Explore screen performance.
Added an option to specify a sampler for search aggregations. It can be used on instances with a large number of documents to drastically improve performance.
Added an option to quantize embeddings. This allows for storing embeddings as bytes in the Elasticsearch index.
File-based Squirro items (i.e. items that include the files attribute) returned by the API will now include an original: true flag to indicate the file from which the item is derived.
The three cloud connectors (Dropbox, Google Drive, OneDrive) now support zip files.
Updates to 1-Click Connector configuration forms.
Improved Tabs performance.
Now, when a phrase or proximity filter is applied to the query, the whole phrase is highlighted, not individual terms.
Community augmentation is now easier to implement because all returned fields are optional. Name, photo, and properties are now used if present but don’t have to be returned.
Squirro now returns the number of documents sampled by Elasticsearch. This allows the UI to display how many documents were sampled when using the random or aggregation sampler.
Increased and synchronized the timeout options related to pipelets involved in data ingestion.
Photos returned by community augmentation can now be absolute URLs as well.
Squirro now includes community types in the response after subscribing to a community.
The pdfconversion service can now emit structured logs, which are linked to the rest of the data ingestion logs. Therefore, structured logs from the pdfconversion service will now be included in the Data Ingestion Logs dashboard of the Squirro Monitoring project.
Moved the nlp service config to the configuration service so that the config is displayed in a user-friendly way.
The data sources in the Setup space now have a tooltip on their names in case their display name is clipped.
Added support for zip files in the Microsoft Exchange plugin.
Community subscriptions now reflect the state of applied filters and the search bar query.
Changed yellow highlighting to bold font for highlighting inside item preview, item detail sidebar and HTML-based items. Yellow highlighting will still be used in the PDF highlighting.
Within the typeahead API, suggestions may contain the new field chips if requested (options.with_tokenized_suggestion:bool). These additional chip tokens are used to render the chips in the global search bar.
Paragraphs are now stored in a separate Elasticsearch index.
Introduced a new highlighting tag for quick answers, with key answer terms bolded.
Stopwords are no longer highlighted in abstracts. Highlights are stripped from Elasticsearch and the abstract is highlighted by Squirro based on “cleaned” terms.
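
As an illustration of the sampler option for search aggregations and the sampled-document count listed above, the sketch below builds an Elasticsearch request body that wraps a terms aggregation in a sampler aggregation and reads the number of sampled documents from a response. The sampler aggregation, its shard_size setting, and the doc_count in its response are part of Elasticsearch. The helper names, the field name keywords, and the sample response are made up for illustration and do not show how Squirro exposes the option.

```python
from typing import Any, Dict


def sampled_terms_request(field: str, shard_size: int = 200) -> Dict[str, Any]:
    """Build a search body that aggregates over a per-shard sample of top-scoring hits."""
    return {
        "size": 0,  # only aggregation results are needed, no hits
        "aggs": {
            "sample": {
                "sampler": {"shard_size": shard_size},
                "aggs": {"top_values": {"terms": {"field": field}}},
            }
        },
    }


def sampled_doc_count(response: Dict[str, Any]) -> int:
    """Read how many documents the sampler aggregation actually considered."""
    return response["aggregations"]["sample"]["doc_count"]


body = sampled_terms_request("keywords", shard_size=100)

# Hand-written response in the shape Elasticsearch returns for this request.
fake_response = {
    "aggregations": {
        "sample": {
            "doc_count": 100,
            "top_values": {"buckets": [{"key": "gas", "doc_count": 42}]},
        }
    }
}
print(sampled_doc_count(fake_response))
```

Because the nested aggregation only sees the sampled documents, its bucket counts are relative to the sample rather than to the full result set, which is why reporting the sample size alongside the results is useful.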

Bug Fixes#

Fixed an issue with the Monitoring plugin not working with certain elastic paths.
Fixed a bug where the Transform Input step (which performs the item transformation in the pipeline) fails if the schema retrieved by a successfully connected plugin is empty. This was seen on a multi-node installation with the Squirro Activity Log source, where the activity logs could not be reached by the plugin during ingestion because the logs were stored in a different node (and the logs were not mounted in shared storage accessible by the processing node).
Fixed an issue with 1-click connector buttons and tip-box display and improved alert box styling.
The default values of the config options of type=code and syntax=json for built-in steps are now rendered correctly in the Pipeline Editor.
Fixed the signature of useQueryEvaluator.
Fixed an issue where, when a parse request failed, some firewall rules could block all query requests.
Fixed missing entity properties and aggregations when querying non-string keywords.
Fixed the mishandling of a boolean value in the machinelearning service endpoint.
Fixed an issue with the debug log breaking when a query is not passed (the query should be optional).
Fixed an issue where the explanation field was not returned if entities was added to the fields parameter of the query endpoint.
Fixed a scroll-to-page issue in the PDF viewer.
Various document labels popover fixes.
Fixed an issue with communities not unsubscribing correctly.
Fixed an issue with sort order missing in dashboard filters.
Fixed an issue with the Items widget not re-evaluating its additional query value on search change.
Fixed an issue with Items widget entity highlighting not working correctly.
Multiple fixes to chips with dropdown and improvements to the Items widget.
Fixed an issue with the QA widget not adding additionalQuery to requests.
Migrated the KeywordsWidget to FacetTableWidget to fix an issue with broken dashboards that were using the old widget type.
Fixed an issue with FacetListWidget not clearing from widgets of the same type.
Applied a fix that ensures only one Item Detail opens at a time when reloading Squirro with an Item Detail modal window open.
Fixed an issue with long tab titles in the vertical mode of the Tabs widget overlapping each other.
Fixed an issue with facet filters being broken by a query applied by the Tabs widget surrounded with double quotation marks.
Fixed an issue where the results of a query in the AI Studio Candidate Set Creation Screen did not match the results of a “regular” query when custom keywords/facets were set as searchable.
Fixed the number of labels in the AI Studio groundtruth table being capped at 10,000.
Changing the ground_truth_name will no longer lead to discrepancies between the model and publish steps in AI Studio.
Fixed a bug where the backend throws an exception when an AI Studio model does not have a groundtruth_id associated with it.
Fixed an issue in AI Studio where updating a candidate set with an empty query ("") was throwing an exception due to a typing error.

Breaking Changes#

Dropped the ground_truth_name column in the models table.
Deprecated the ability to change the id and name of the groundtruth in the models table.
Dropped the gt_id, gt_name, and model_name columns in the publish table.
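
One plausible reading of these dropped columns is that names are no longer copied between tables but resolved through the relationships between ground truth, model, and publish records, so a rename in one place is visible everywhere. The sketch below illustrates that idea with SQLite. The table and column names are simplified assumptions, not the actual machinelearning schema, and the publish table is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ground_truth (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- No ground_truth_name column here: the name is looked up via the foreign key.
    CREATE TABLE model (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        ground_truth_id INTEGER NOT NULL REFERENCES ground_truth(id)
    );
    INSERT INTO ground_truth (id, name) VALUES (1, 'contracts-v1');
    INSERT INTO model (id, name, ground_truth_id) VALUES (10, 'risk-classifier', 1);
    """
)

LOOKUP = """
    SELECT model.name, ground_truth.name
    FROM model JOIN ground_truth ON ground_truth.id = model.ground_truth_id
    WHERE model.id = ?
"""

before = conn.execute(LOOKUP, (10,)).fetchone()

# Renaming the ground truth touches a single row ...
conn.execute("UPDATE ground_truth SET name = 'contracts-v2' WHERE id = 1")

# ... and every model that references it sees the new name immediately.
after = conn.execute(LOOKUP, (10,)).fetchone()
print(before, after)
```

Under this assumption, any integration that read or wrote the dropped name columns directly would need to follow the relationship to the referenced record instead.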

Installation and Upgrade#

For new installations, find step-by-step instructions in Install and Manage Squirro with Ansible (recommended) or Installing Squirro on Linux.

To upgrade an existing installation, see Upgrading Squirro.