3.8.4 - Release Notes
Squirro 3.8.4 was released September 18, 2023. It is the current biweekly release.
Reference: Learn more about the Squirro Release Process.
AI Studio has been completely reworked, with a redesigned UI, new features, and improved performance. Creating no-code machine learning models has never been easier. You can now create and train models in three steps.
New or improved features include:
The introduction of AutoML templates (Fast, Balanced, and Accurate), which train multiple models with automated hyperparameter tuning and model ensembling, then automatically deliver the best-performing model for the dataset.
AutoML-supported libraries, models, and algorithms include LightGBM, CatBoost, XGBoost, RandomForest, ExtraTrees, KNeighbour, NeuralNetTorch, and FastText.
Proximity-rule-based bulk labeling for sentence-level classification models.
Query-based bulk labeling for document-level classification models. This enables importing training data into the ground truth using Squirro Item labels. Additionally, for more advanced users, this feature enables the creation of training data utilizing the full potential of Squirro’s Query Syntax.
Unpublish models from the publish screen, which will delete the published workflow from a Squirro project.
Deploy or undeploy a published model directly into/from a pipeline from the AI Studio, without having to open the Pipeline Editor. This will automatically place (or remove) the model as a pipelet within the classification step of a selected Squirro pipeline.
Technical improvements achieved during this specific release include the following:
Labeling focus mode has been reworked and now delivers blazing fast results.
Added a help dropdown to AI Studio pages.
Added an option to limit bulk labeling items.
Added a more informative error message when trying to delete a ground truth that is being used by a model.
Reworked the sentence highlighting logic used when fetching all groundtruth items so that it matches the logic used for an individual item (this fixed discrepancies for sentences).
Query labeling template now accepts a query_count parameter to limit the number of items retrieved for bulk labeling.
Added relationships from groundtruth to model and from model to publish in the machinelearning database.
ground_truth_name in model and publish will now automatically update with changes to the
model_name will now automatically update with changes to the model. Also migrated all machinelearning service migrations to use the new Redis functionality to remember if it has been run before.
get requests for published AI Studio models, which tells the ML service to retrieve the datasource pipeline workflows where the published models are being used.
GET requests to endpoints for getting data about published models now enrich each model with the pipeline workflows they are used in.
Reference: To learn more, see AI Studio.
Added new Squirro Syntax, which allows proximity search using Span Near ES queries. An example of such syntax is "auction of gas"~(3), which searches for documents where the terms inside quotes occur within 3 words in any order, and "auction of gas", which is similar but only considers documents where the terms occur in the order provided.
Contextualized PDF Inline Search: Additional widget query template can now evaluate the parsed query-context (to activate semantic search vs keyword search).
Added new scoring plugins for recommendations: on user-item-history (recommend_on_user_items) & item id’s (
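The proximity syntax described above is said to rely on Elasticsearch Span Near queries. As a rough illustration only, the sketch below builds a `span_near` query body for the two behaviours (any order versus the order provided). The helper name `span_near_query`, the field name `body`, the whitespace tokenization, and the way the distance maps onto `slop` are all assumptions made for this example; the release notes do not describe how Squirro performs the translation.

```python
def span_near_query(phrase: str, slop: int, in_order: bool, field: str = "body") -> dict:
    """Build an Elasticsearch span_near query body for the terms in `phrase`.

    Hypothetical helper: `field` and the tokenization are assumptions,
    not Squirro's actual implementation.
    """
    # Naive lowercasing and whitespace split; a real analyzer may behave differently.
    terms = phrase.lower().split()
    if not terms:
        raise ValueError("phrase must contain at least one term")
    return {
        "span_near": {
            # One span_term clause per term in the quoted phrase.
            "clauses": [{"span_term": {field: term}} for term in terms],
            # Maximum number of intervening unmatched positions.
            "slop": slop,
            # False: terms may appear in any order; True: order must be preserved.
            "in_order": in_order,
        }
    }


# Terms within 3 positions of each other, in any order.
any_order = span_near_query("auction of gas", slop=3, in_order=False)

# Same distance, but the terms must appear in the order given.
ordered = span_near_query("auction of gas", slop=3, in_order=True)
```

The only difference between the two bodies is the `in_order` flag, which is what distinguishes an order-insensitive proximity match from an order-preserving one in a Span Near query.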
Other New Additions to Squirro
You can now configure the text highlight color via the Setup space App & Nav Bar Settings.
There is a new tag for highlighting answer context.
Data sources now have an affiliated picture. They are initialized with the picture from the Dataloader plugin they are created from.
There is now support for loading images (TIFF, PNG, JPEG) into Squirro as PDFs.
Large PDFs are auto-scaled down to the viewer size.
Studio plugins using project-scoped endpoints now support token-based authentication. Token-based authentication is not supported for non-project-scoped endpoints for security reasons.
Added a new endpoint to expose SharePoint metadata in the SAML SSO Plugin.
Created an Explore Dashboard for every new project and added to existing projects. Squirro now supports the new dashboard role
Exposed priority levels in
Added a new libNLP step that truncates the fields of the given documents down to the first
Added the Community Info Card widget to the platform.
Squirro now supports deploying a model to multiple pipelines at once.
CommunityItems widget was ported to React and a custom widget API was implemented.
The Items widget now has a configuration to specify which labels are shown when creating a new note using the Note Taking feature.
Global search now breaks queries into individual chips, making it easier to edit queries after the initial search. This feature can be enabled or disabled through frontend.userapp.query-chips-enabled in Project Configuration.
Improvements to the Google Drive and Dropbox 1-click connectors, including the input label, placeholder, and helper text.
http2 for all resources.
Squirro now uses Orjson for Elasticsearch serialization and deserialization. This improves performance as Orjson is much faster than the simple JSON used by the Elasticsearch client.
The SquirroClient now also uses Orjson, which improves the performance of serializing and deserializing data.
General Items card improvements.
Items card in ItemsWidget now has a larger font and there is answer context highlighting in the abstract.
Various extensions and improvements to the structured logging system.
Improved the Explore screen performance.
Added an option to specify sampler for search aggregations. It can be used within instances with a large number of documents to drastically improve performance.
Added an option to quantize embeddings. This allows for storing embeddings as bytes in the Elasticsearch index.
File-based Squirro items (i.e. items that include the files attribute) returned by the API will now include an original: true flag to indicate the file from which the item is derived.
The three cloud connectors (Dropbox, Google Drive, OneDrive) now support zip files.
Updates to 1-Click Connector configuration forms.
Improved Tabs performance.
Now, when a phrase or proximity filter is applied to the query, the whole phrase is highlighted, not individual terms.
Community augmentation is now easier to implement by making all the returned fields optional. Name, photo, and properties are now used if present but don’t have to be returned.
Squirro now returns the number of sampled documents by Elasticsearch. This allows the UI to display how many documents were sampled when using the random or aggregation sampler.
Increased and synchronized the timeout options related to pipelets involved in data ingestion.
Photos returned by community augmentation can now be absolute URLs as well.
Squirro now includes community types in the response after subscribing to a community.
pdfconversion service can now emit structured logs, which are linked to the rest of the data ingestion logs. Therefore, structured logs from the pdfconversion service will now be included in the Data Ingestion Logs dashboard of the Squirro Monitoring project.
nlp service config to the configuration service so that the config is displayed in a user-friendly way.
The data sources in the Setup space now have a tooltip on their names in case their display name is clipped.
Added support for zip files in the Microsoft Exchange plugin.
Community subscriptions now reflect the state of applied filters and the search bar query.
Changed yellow highlighting to bold font for highlighting inside item preview, item detail sidebar and HTML-based items. Yellow highlighting will still be used in the PDF highlighting.
Within the typeahead API, suggestions may contain the new field chips if requested (options.with_tokenized_suggestion:bool). These additional chip tokens are used to render the chips in the global search bar.
Paragraphs are now stored in a separate Elasticsearch index.
Introduced a new highlighting tag for quick answers, with key answer terms bolded.
Stopwords are no longer highlighted in abstracts. Highlights are stripped from Elasticsearch and the abstract is highlighted by Squirro based on “cleaned” terms.
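One of the improvements listed above is the option to quantize embeddings so that they can be stored as bytes in the Elasticsearch index. The release notes do not say which quantization scheme is used, so the following is only a generic sketch of symmetric 8-bit scalar quantization, with hypothetical function names `quantize` and `dequantize`, to show how a float vector can be reduced to one byte per dimension.

```python
import struct


def quantize(vector: list[float]) -> tuple[bytes, float]:
    """Pack a float vector into signed 8-bit integers plus a scale factor.

    Illustrative only: this symmetric int8 scheme is an assumption,
    not necessarily what Squirro does.
    """
    max_abs = max((abs(x) for x in vector), default=0.0)
    # Map the largest magnitude onto 127; fall back to 1.0 for all-zero or empty input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp into the signed 8-bit range used here.
    ints = [max(-127, min(127, round(x / scale))) for x in vector]
    return struct.pack(f"{len(ints)}b", *ints), scale


def dequantize(data: bytes, scale: float) -> list[float]:
    """Recover approximate float values from the packed bytes."""
    return [i * scale for i in struct.unpack(f"{len(data)}b", data)]


embedding = [0.5, -1.0, 0.25, 0.0]
packed, scale = quantize(embedding)
restored = dequantize(packed, scale)
```

Each dimension occupies one byte instead of the four bytes of a 32-bit float, at the cost of a rounding error of at most half the scale factor per dimension.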
Fixed an issue with the Monitoring plugin not working with certain elastic paths.
Fixed a bug where the Transform Input step (which performs the item transformation in the pipeline) failed if the schema retrieved by a successfully connected plugin was empty. This was seen on a multi-node installation with the Squirro Activity Log source, where the activity logs could not be reached by the plugin during ingestion because the logs were stored on a different node (and the logs were not mounted in shared storage accessible by the processing node).
Fixed an issue with 1-click connector buttons and tip-box display and improved alert box styling.
The default values of the config options of syntax=json for built-in steps are now rendered correctly in the Pipeline Editor.
Fixed the signature of
Fixed an issue where, when a parse request fails, some firewall rules may block all query requests.
Fixed missing entity properties and aggregations when querying non-string keywords.
Fixed the mishandling of boolean value in the
Fixed an issue with the debug log breaking when a query is not passed (the query should be optional).
entities is added to the fields parameters of the query endpoint, the explanation field is not returned.
Fixed a scroll-to-page issue in the PDF viewer.
Various document labels popover fixes.
Fixed an issue with communities not unsubscribing correctly.
Fixed an issue with sort order missing in dashboard filters.
Fixed an issue with the Items widget not re-evaluating its additional query value on search change.
Fixed an issue with Items widget entity highlighting not working correctly.
Multiple fixes to chips with dropdown and improvements to the Items widget.
Fixed an issue with the QA widget not adding
FacetTableWidget to fix an issue with broken dashboards that were using the old widget type.
Fixed an issue with FacetListWidget not clearing from widgets of the same type.
Applied a fix that ensures only one Item Detail opens at a time when reloading Squirro with an Item Detail modal window open.
Fixed an issue with long tab titles in the vertical mode of the Tabs widget overlapping each other.
Fixed an issue with facet filters being broken by a query applied by the Tabs widget surrounded with double quotation marks.
Fixed an issue where the results of a query in the AI Studio Candidate Set Creation Screen did not match the results of a “regular” query when custom keywords/facets were set as searchable.
Fixed the number of labels in the AI Studio groundtruth table being capped at 10,000.
ground_truth_name will no longer lead to discrepancies between the model and publish steps in AI Studio.
Fixed a bug where the backend throws an exception when an AI Studio model does not have a groundtruth_id associated with it.
Fixed an issue in AI Studio where updating a candidate set with an empty query ("") was throwing an exception due to a typing error.
ground_truth_name column in the
Deprecated the ability to change the name of the groundtruth in the
model_name columns in the
Installation and Upgrade
To upgrade an existing installation, see Upgrading Squirro.