3.11.0 Release Notes

Squirro 3.11.0 was released on August 19, 2024.

Learn more about the Squirro Release Process.

CentOS 7 end of support

As of June 30, 2024, CentOS 7 has officially reached its end of life and is no longer supported by Squirro. To ensure the security and stability of your systems, Squirro strongly recommends migrating to a supported operating system as soon as possible. For further details on the end of support and guidance on migration options, contact Squirro Support.

Note for administrators

This release introduces important changes that may require adjustments to your existing setup.

What’s New

  • The configuration required by the genai service can now be injected via environment variables.

  • When the queue-based filesystem streamer is used, the time required to retrieve the number of queued items of the data sources has been significantly reduced. This improvement is most evident on instances with many queued batches.

  • Added a project date formatting configuration that is reused across the whole project.

  • The Label Lookup NLP step is now automatically enabled during query processing and applies potential label filters during Search Intent detection, improving search results.

  • The Logs option of a data source has been renamed to Dataloader Logs to more accurately reflect its content. Additionally, a new option called Pipeline Logs has been added which allows users to view the data ingestion logs specific to this data source.

Improvements

  • Reworked the Chat with Document feature so that users can engage in conversations with individual Squirro items more effectively.

  • Properly purge dangling paragraphs when an item is re-indexed with fewer paragraphs.

  • The API now returns projects sorted by project-type, ensuring the default project is not the monitoring project.

  • When rerunning from the index using the UI, the system returns a link to directly access the Data Ingestion Logs dashboard of the Squirro Monitoring project, to check the progress of the triggered rerun job. When rerunning from the index using the SquirroClient (Python SDK), a rerun_job_id is returned in the response.

  • Support UFN in the data source next run.

  • Apply the hardcoded limit for the chat only to the crawler data source.

  • The item update and delete endpoints now wait for the corresponding paragraphs to be updated or deleted instead of doing this in the background.

  • For data loading, default to the data schema determined from the file and use the plugin schema as a fallback option.

  • Add the opentelemetry sqlalchemy extension, adding tracing for the sqlalchemy operations in the timing reports.

  • Avoid showing 400 errors (already paused or resumed) on data source pause or resume all.

  • Auto LabelLookup for query-intent-detection. The overall label popularity is now used to rescore label-value candidates. For example, how often a specific label is tagged on documents within a project contributes to the suggested label-auto-filter.

  • Clear the global search when the advanced search is cleared.

  • Update the data schema to include the nested fields (for example, for the JSON data plugin).

  • When QFSS is configured, the ingester builds the index of queues using a background thread to avoid delaying the service bootup.
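
The schema precedence described above for data loading (the schema determined from the file first, the plugin schema as a fallback) can be pictured as follows. This is a minimal sketch under stated assumptions, not Squirro code: resolve_schema and both argument names are hypothetical, and schemas are modeled as plain lists of field names, which may differ from how the data loader actually represents them.

```python
from typing import List, Optional, Sequence


def resolve_schema(
    file_schema: Optional[Sequence[str]],
    plugin_schema: Optional[Sequence[str]],
) -> List[str]:
    """Prefer the schema determined from the file; fall back to the plugin's."""
    # Assumption: a missing or empty file schema counts as "not determined".
    if file_schema:
        return list(file_schema)
    return list(plugin_schema or [])


# The file schema wins whenever it is available.
print(resolve_schema(["title", "body"], ["id"]))
# The plugin schema is used only when the file yields no schema.
print(resolve_schema(None, ["id"]))
```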

Bug Fixes

  • Removed fallback data schema when retrieving schema using the data plugin method to prevent duplicate validation.

  • Item detail no longer opens on every dashboard selection change.

  • Fixed the max width issue with the keywords table popup in the Items widget.

  • Custom datetime facets are formatted correctly in the ItemsTable widget.

  • The items table columns are resizable.

  • Correctly handle an invalid query_context.parse value.

  • The pdfconversion service no longer fails to convert files with long filenames of more than 150 characters.

  • Item query construction now accounts for query parts that are functions.

  • The Explore button on data sources works as expected.

Breaking Changes

  • The default value for retriever in the SquirroGPT configuration is now squirro_retriever, as vectorsearch_retriever is no longer supported. Squirro instances that use vectorsearch_retriever must change this value in the project-level or server-level configuration.

  • The FileSystemStreamer.get_number_of_batches_per_project and FileSystemStreamer.get_number_of_batches_per_source methods have been removed. The FileSystemStreamer.get_number_of_batches method, in combination with the entity_type and entity_id parameters, now serves the same purpose. The entity_type parameter specifies whether the entity_id should be treated as a project or source ID.

  • API changes for the FileSystemStreamer.get_number_of_batches public method: the data_dirname_pattern parameter was removed, as the method now defaults to counting the number of batches for data directories. To count the number of failed batches, the newly added only_failed parameter must be used. Finally, the method now takes an entity_id parameter and constructs a pattern based on it.

  • The Paragraph Embedding pipeline step now requires a position associated with each text chunk to compute embeddings. This is a breaking change only if a custom step is used to perform text chunking. If the built-in Text Chunking step is used, no action is required.
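
The two FileSystemStreamer changes above could translate into a migration along the following lines. This is a minimal sketch, not the actual Squirro API: the notes name the entity_type, entity_id, and only_failed parameters, but the accepted entity_type values ("project" and "source"), the use of keyword arguments, the default of only_failed, and the integer return value are assumptions. StandInStreamer is a hypothetical stand-in used only so the example runs without a Squirro installation.

```python
class StandInStreamer:
    """Hypothetical stand-in for FileSystemStreamer, for illustration only."""

    def __init__(self, batches):
        # Each batch is modeled as a (project_id, source_id, failed) tuple.
        self._batches = batches

    def get_number_of_batches(self, entity_type, entity_id, only_failed=False):
        # Assumed semantics: entity_type decides whether entity_id is matched
        # against the project ID or the source ID; only_failed switches from
        # counting data batches to counting failed batches.
        if entity_type not in ("project", "source"):
            raise ValueError(f"unsupported entity_type: {entity_type!r}")
        index = 0 if entity_type == "project" else 1
        return sum(
            1
            for batch in self._batches
            if batch[index] == entity_id and batch[2] == only_failed
        )


def count_project_batches(streamer, project_id):
    # Replaces a call to the removed get_number_of_batches_per_project.
    return streamer.get_number_of_batches(
        entity_type="project", entity_id=project_id
    )


def count_source_batches(streamer, source_id):
    # Replaces a call to the removed get_number_of_batches_per_source.
    return streamer.get_number_of_batches(
        entity_type="source", entity_id=source_id
    )


def count_failed_source_batches(streamer, source_id):
    # Replaces a call that passed data_dirname_pattern to count failed batches.
    return streamer.get_number_of_batches(
        entity_type="source", entity_id=source_id, only_failed=True
    )


streamer = StandInStreamer(
    [("p1", "s1", False), ("p1", "s2", False), ("p1", "s2", True)]
)
print(count_project_batches(streamer, "p1"))
print(count_failed_source_batches(streamer, "s2"))
```

Wrapping the new call in small helpers like these keeps the change in one place, so existing call sites only need to swap the removed method name for the helper.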

Installation and Upgrade

For new installations, find step-by-step instructions on the Install and Manage Squirro with Ansible page (recommended) or the Installing Squirro on Linux page.

To upgrade an existing installation, see the Upgrading Squirro page.