3.9.1 Release Notes

Squirro 3.9.1 was released on November 10, 2023.

Reference: Learn more about the Squirro Release Process.

What’s New

  • Monitoring projects are no longer shown in the project list accessed by clicking the current project title. They are accessible via Monitoring in the Squirro Spaces menu.

Accessing Monitoring via Squirro Spaces menu
  • A new Semantic Search default pipeline was created, available as a preset pipeline workflow within the Pipeline Editor.

New default semantic search preset pipeline
  • The dashboard Search Bar widget now also has query chips enabled. Learn more about Search Bar Chips.

  • Dashboard facets and facets applied by URL are now converted to search chips and shown in the global search and dashboard search widget with the ability to remove them.

  • Added a new Topic API endpoint that returns ground truth (GT) labels in batches, so that clients can retrieve large numbers of labels without timing out.

  • Studio Plugins can now access a database (the studio database) and define and manage their own tables via SQLAlchemy.

  • Added the ability to move all failed batches of a given source back to processing via the SquirroClient (Python SDK). This requires specifying the project and source identifiers. If a batch identifier is specified, only that batch is moved. If a batch priority level is specified, only batches with that priority level are moved.

  • The PDF Conversion step now passes the MIME type of the file to be converted to the pdfconversion service. If no MIME type is provided, the pdfconversion service tries to detect it from the file content. Filename-based detection can be enabled as the first step via the server config option pdfconversion.detect-mime-type-only-by-content.

  • The MIME Type Detection step now includes a configuration option to detect MIME types solely based on content, disregarding the filename. This is useful when ingested data may have misleading extensions, for example when a file named test.pdf is in fact a TIFF file and not a PDF.

  • Added new API overrides to the item detail: getSidepanelItem and getSidepanelWidth.
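
The difference between filename-based and content-based MIME type detection, as described for the PDF Conversion and MIME Type Detection steps above, can be illustrated with a minimal Python sketch. It is not the Squirro implementation: the function names, the only_by_content parameter, and the small signature table are assumptions made for illustration only.

```python
import mimetypes

# Leading "magic" bytes for a few formats.
# Illustrative subset only, not the detection table used by Squirro.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def detect_by_content(data):
    """Return a MIME type based on the leading bytes, or None if unknown."""
    for signature, mime_type in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime_type
    return None


def detect_by_filename(filename):
    """Return a MIME type guessed from the file extension, or None."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


def detect_mime_type(filename, data, only_by_content=False):
    """Hypothetical helper: try the filename first unless only_by_content is set."""
    if not only_by_content:
        guessed = detect_by_filename(filename)
        if guessed is not None:
            return guessed
    return detect_by_content(data)


# A TIFF payload stored under a misleading .pdf filename.
tiff_bytes = b"II*\x00" + b"\x00" * 16

print(detect_mime_type("test.pdf", tiff_bytes))                        # application/pdf
print(detect_mime_type("test.pdf", tiff_bytes, only_by_content=True))  # image/tiff
```

With filename-first detection the misleading extension wins, whereas content-only detection identifies the file as a TIFF image.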

Improvements

  • Added the option to do keyword search on paragraphs using the query endpoint by specifying the search_scope: paragraph option.

  • Added fullscreen and export options for the heatmap chart in the validation screen of AI Studio.

  • Added the new method get_groundtruth_labels_batched that returns a ground truth label generator, which uses the new batched endpoint from Topic API.

  • The Semantic Search documentation now includes a new section on :ref:``.

  • After successfully importing a project, you will now be taken directly to that project’s welcome page.

  • JPEG 2000 files are now supported in the filesystem (Documents) plugin.

  • Implemented a unified welcome endpoint under /welcome/project_id.

  • Made small improvements to image-to-PDF conversion: the pdfconversion service now attempts to convert any image/* file to PDF. Images in a color mode that has not been tested for saving to PDF are converted to RGB mode first.

  • Moved the embedding pipelet to a native pipeline step.

  • Adjusted the squirro_groundtruth_loader step to use the newly added client method, which retrieves GT labels in batches under the hood. This should enable ML workflows to load large numbers of GT labels without the request to the backend timing out.

  • Created the project start endpoint app/project_start/<project_id>.

  • Thumbnails within Items are now shrunk when the abstract is hidden.

  • Added the option to retrieve paragraphs from the query endpoint using the response_format: paragraph option.

Bug Fixes

  • Fixed an issue with datetime facet formatting options for Items and FacetList.

  • Fixed item fields being lowercase in the Items Table.

  • Made automatic PDF rescaling more stable. In some cases, minor issues with scaling were causing an infinite loop.

  • Fixed an issue with unnecessary arrows showing up on certain dashboards.

  • Restored the Show labels on top and Show sort options configuration options within the Items widget.

  • Fixed input size and padding inconsistencies.

  • Fixed an issue with Items widget cards not having the same height in horizontal mode.

  • The filebeat scriptlet now uses the Squirro configuration for filebeat after an upgrade.

Breaking Changes

The Paragraph Embedder pipelet is no longer supported. All semantic search pipelines that use it should replace it with the new Paragraph Embedding step.

Installation and Upgrade

For new installations, find step-by-step instructions in Install and Manage Squirro with Ansible (recommended) or Installing Squirro on Linux.

To upgrade an existing installation, see Upgrading Squirro.