3.10.6 (Latest LTS Release)

Squirro 3.10.6 was released on July 26, 2024.

Learn more about the Squirro Release Process.

Caution

This release includes breaking changes. See the Breaking Changes section at the end of this page to learn more.

What’s New

  • New agent framework for agent-driven LLM endpoints.

    Agent Framework
  • Redesigned Community List Widget for improved usability.

    Community list widget redesign
  • SquirroGPT project templates for new projects with OpenAI and Mixtral LLMs are accessible from the sidebar.

    New default template for Mixtral LLM
  • New filter widget for My Communities added to enhance user navigation.

  • New special chip for the favorite filter in GlobalSearch that allows editing the chip and expanding the saved query.

  • The pipeline step Thumbnail Extraction has a new config option for setting the format of the generated thumbnail image. It supports JPEG and PNG formats. The default format has changed from PNG to JPEG.

  • Improved MUI theme customization.

  • SquirroGPT suggests relevant documents when it cannot find a proper answer.

  • Added the possibility to inject extra information during query processing. This allows custom query processing steps to add data to the query context, which the Studio plugins can then consume.

  • Upgrade to Python 3.11. All Python-based microservices now run on Python 3.11, and all Squirro-related Python dependencies are located in the /opt/squirro/virtualenv311/ virtual environment (can be activated using squirro_activate as before). Please note that Python 3.11 support applies to RHEL 8 (and derivatives) deployments.

  • Introduced a special chip feature for users to easily identify their subscribed communities.

  • Added a configuration option to use the --no-dependencies flag with pip when installing packages from a requirements.txt file for Squirro Assets.

  • Added a new compute_statistics parameter to the Labels API, allowing specification of which label should be used for Search Intent. If set to True, the label is included in the lookup index.

  • Introduced a new Time Selection Widget in React.

  • New baseline intent detector that supports automatically drilling down on a query by filtering on known facets that match a detected entity, and date-time filtering based on natural language dates.

  • Added paragraph-level tagging, allowing keywords to be extended from the paragraph level.

  • (beta) Aggregations on facets.

  • Pipeline steps executed in batch execution mode now feature a timeout configuration option. It enforces a limit (in seconds), after which a timeout error appears.

  • Add an endpoint for acquiring user session information with a caching mechanism.

  • Restrict access to sessions flagged as deleted.

  • Add a response checker chain to flag if the generated answer from the GenAI service is satisfactory.

  • Add a new chain for keyword-based query formulation.

  • Add a new toggle to switch between editing a model with JsonForms or with the ACE editor in the server and project configuration menus (only available for configurations with a defined schema).

  • The plumber service can now be deployed as a Docker container. See the Deploying the Plumber service in a Docker container page.

  • Added an option to the Communities List widget to hide communities without matching items.

  • Replaced Jaeger tooling with OpenTelemetry.

  • Added a redo OCR configuration option to the PDF OCR pipeline step.

  • Scoring Plugins can now inject Elastic DSL rescore query clauses.
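
The last item above refers to Elasticsearch rescoring. As a rough illustration of what such a clause looks like, the following sketch builds one as a plain Python dictionary that follows the Elasticsearch query DSL. The helper name, the body field, and the weights are assumptions chosen for illustration. How a scoring plugin hands the clause over to Squirro is not shown here and should be taken from the Scoring Plugins documentation.

```python
def build_rescore_clause(phrase: str, field: str = "body", window_size: int = 50) -> dict:
    """Build an Elasticsearch rescore clause that boosts near-exact phrase matches.

    Hypothetical helper: the name, the default field, and the weights are
    illustrative assumptions, not part of the Squirro API.
    """
    return {
        # Number of top hits per shard that are rescored.
        "window_size": window_size,
        "query": {
            # Second-pass query that is only run on the top hits.
            "rescore_query": {
                "match_phrase": {field: {"query": phrase, "slop": 2}}
            },
            # Relative weights of the original score and the rescore score.
            "query_weight": 0.7,
            "rescore_query_weight": 1.2,
        },
    }


clause = build_rescore_clause("interest rate outlook")
```

Rescoring only touches the top window_size hits of each shard, so a relatively expensive second-pass query stays cheap compared to running it over the full result set.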

Improvements

AI

  • Improved SquirroGPT retrieval, reducing the occurrences of irrelevant document sources.

  • Added the context length to the sqgpt configuration. Users can manually set the context length sent to the LLM. This overwrites the context length estimation and may lead to issues if wrongly set.

  • Improved context length guessing by adding the newest GPT 4 models and updating legacy models.

  • The embedding models process switches to a lazy-loading technique. This change is part of the GenAI embedding models deprecation process, but it still ensures functionality for older projects using the genai service retriever.

  • Introduced the SQGPTConfig schema to the configuration service for easier setting and validation. The service can now interpolate values from .ini files in sections or keys with dots (for example, ${sqgpt\.settings.api_key} for api_key under the sqgpt.settings section).

  • The answer is initially highlighted and scrolled to when a user opens the item detail.

  • The answer matched in the item detail sidebar is marked as answer in the title.

  • Added the automatic context length determination to the summarization endpoint.

  • GenAI Squirro Retriever now only uses query_context to run queries, for better copilot results. It deprecates the use of the query parameter.

  • Add caching to fetch user grants for SquirroGPT, which can lead to a 30% increase in performance if a project is affected by grant leakage.

  • Exclude the semantic_chunks field from the payload sent to the ML service.

  • Reworked the SquirroGPT project and server-level configurations. The genai.sqgpt.default-settings configuration was introduced at the server level to set defaults for all the existing SquirroGPT projects. These values override the values of the genai.sqgpt.settings configuration of each project. The values at the project-level configuration that are set to null or to Default to Server Setting fall back to the server-level configuration.

  • Included a temporal anchor in the system prompt (UTC now).

  • Passed the search_mode parameter to the underlying LangChain component.

  • Removed citing parent_document_id for inline citations.

  • Enforced strict contextual following in user prompts instead of system messages.

  • The Wikipedia tool is disabled by default.

  • Add the client kwargs to squirro_retriever. This enables passing information such as user_information, auth_service, and auth_user to the Squirro client used by the squirro_retriever in the genai service.

  • Batch Q&A improvements: a new feedback mechanism, improved question editing, the ability to delete the first question, filtering capabilities, an adjusted regenerate summary tooltip, clearing of the summary when the clear all option is selected, a new export modal when sources are enabled, and export of source links.

  • Gracefully handle empty retriever resultsets in the genai service by returning a hint message.

  • Automatic mapping of /var/lib/squirro/plugins and /etc/squirro/genai.d/deployments into the genai container.
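
The interpolation syntax mentioned above, ${sqgpt\.settings.api_key}, uses a backslash to mark a dot that belongs to the section name rather than separating section and key. The following sketch shows one way to read such a reference. It is an illustration of the escaping rule only, not the configuration service's implementation, and the split_reference helper is hypothetical.

```python
import re


def split_reference(reference: str) -> tuple[str, str]:
    """Split an interpolation reference into (section, key).

    Dots preceded by a backslash are treated as part of the name.
    Hypothetical helper for illustration only.
    """
    inner = reference.strip()
    if inner.startswith("${") and inner.endswith("}"):
        inner = inner[2:-1]
    # Split only on dots that are not preceded by a backslash.
    parts = re.split(r"(?<!\\)\.", inner)
    if len(parts) != 2:
        raise ValueError(f"expected exactly one unescaped dot in {reference!r}")
    # Drop the escape character from the resulting names.
    section, key = (part.replace("\\.", ".") for part in parts)
    return section, key


section, key = split_reference(r"${sqgpt\.settings.api_key}")
# section is "sqgpt.settings" and key is "api_key"
```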

Platform

  • Multiple improvements to the Tabs widget.

  • Improved security.

  • Mixtral project templates updated to include the Cognitive Search dashboards as well as the Chat with Results and Chat with Document widgets.

  • Support the Retry Failed Items and Rerun from Raw Data features for the QFSS system.

  • Item Detail widget rework.

  • Optimized loading of the entities and PDFs.

  • Upgraded several Python dependencies to the newer versions.

  • Updated the step to offer options for compressing converted PDF documents containing images, reducing file sizes and ensuring linearization of converted PDFs.

  • End of support for the Gluster shared file system.

  • Various improvements to Squirro Ansible, including support for build_mode, control for cache settings in common.ini, and support for hourly and daily snapshots.

  • Easily track the pipeline steps using the rerun_job_id label in the Data Ingestion Logs dashboard of the Squirro Monitoring project. Find the ID of the triggered job in the datasource service logs at the INFO level (for example, using the following command: tail -f /var/log/squirro/datasource.log | grep 'rerun_from_index').

  • The Index pipeline step now offers options to index Squirro items or associated paragraphs. Rerun from index now includes sub-items by default, controlled by the datasource.rerun.index.include-sub-items setting. Override this in the Python SDK with the include_sub_items parameter in the rerun_pipeline_workflow method.

  • Squirro Ansible ensures consistent tooling by installing Python virtualenv, Elasticsearch client, requests, and some JSON packages on storage-only nodes.

  • Improved Squirro Ansible documentation.

  • Enhanced Redis integration as the caching backend for the @cached decorator, now supporting prefixes and namespaces, auto-expiring keys, and asynchronous cache insertion. Enhanced efficiency of individual backend requests by enabling multiple caches by default, complete with a proper cache invalidation strategy. These caches pertain to authentication handling, project metadata retrieval (including facets data), and user information (including project permissions).

  • Optimized memory usage and increased speed during project export that includes items.

  • Added overrides and documentation for the CommunityItem widget.

  • Query context and paragraph id/field mapping have been added to the genai service.

  • Upgraded the redis-py client from version 4.3.4 to 5.0.2.

  • Created the RPM packages for the NLP services required by the semantic search projects, allowing the NLP services to run on the Squirro node server for fast semantic search proof of value (PoV).

  • The configuration options for the rerun_from_index queue were split from the queues_local section into their own section called queues_rerun_from_index.

  • Added support for all the tab tooltips in the tabs widget.

  • Added the query_context parameter to the /starter-questions endpoint of the GenAI Studio plugin, which allows filtering and limiting the set of Squirro Items used to generate the questions.

  • Improved editable chip targetability.

  • The squirro_lxml_html_extractor option in the common.ini file no longer has any effect and has been removed from the packaged common.ini file. The lxml parser is now used by default.

  • Re-enable project translations for the widget title in the React widgets.

  • The ingester service can be configured without any processors and solely serve its web API. Previously, it would exit immediately with a vague message when no processors were configured.

  • In addition to the timeout option, the Text Chunking step now exposes an option to specify the max item body size. This prevents the step from processing items with a very large body.

  • Removed the ingester.priorities.enabled server configuration option. Priority-based processing is now the only mode of operation for the ingester. The number of processors for each priority can still be adjusted with the ingester.priorities.pool-*-processors server configuration options.

  • The configuration option processors in the ingester.ini file is now deprecated and is planned for removal in an upcoming release.

  • An active dashboard store is not cleared anymore when the global search is cleared.

  • The item detail side panel icon now respects the open state of the panel.

  • Added an action button to open the list of all favorites from the create favorite modal success snackbar.

  • Added support for Unicode filenames in the X-Filename HTTP header. The filenames are now properly encoded using the percent-encoding standard to ensure compatibility with various web browsers and HTTP clients.

  • Pass SSO information from the proxy to the genai service.

  • Do not pass None values as a query to the query transformers, but instead pass an empty string.

  • When building the sparse query, the active query processor workflow is used if available. If not, it defaults to the query_default query processor workflow.

  • Added a configuration option to the Labels widget to disable the dropdown menu when it is empty.

  • The new expiration handling of user sessions is now enabled by default (security.expiring-sessions).

  • Expose an option to specify what item fields to export when exporting a Squirro project.

  • Remove card padding on the Community List widget when the card styling is turned off in the configuration.

  • Added clamping to the Item Detail title.

  • Expose the GET endpoint of /sessions/<session_id> in SquirroClient via get_session().

  • Added the Grid mode to the Items widget.

  • Added the search button control to the Item Detail toolbar.

  • The split view icon in the Item Detail toolbar is now marked as active when the split view is open.

  • Overlay the side panel in the Item Detail on small screens (900px wide).

  • Make tooltips consistent with the item cards in the Item Detail.

  • Improve the performance for retrieving communities.

  • Updated the icon fonts to the latest version and changed all icons to the rounded type.

  • GenAI now persists code changes in the container across restarts. However, these changes are overwritten once a new image is used. The volume is mounted on /var/lib/squirro/docker/volumes.

  • Improved performance when closing a modal.

  • Changed the MySQL driver from the PyMySQL to the MariaDB connector for improved performance.

  • Include nltk punkt and spacy model dependencies in the docker image.

  • Disable previous and next buttons in Item Detail for the first and last items.

  • Remove entity_query from the item model for improved performance.

  • Improved the grid spacing for the Items widget for improved layout.

  • NLTK stopwords and punkt packages are now baked into the GenAI and Plumber Docker images, helping with on-premise deployments without internet access. The genai service image now uses amazonlinux as a base image for improved security.

  • Studio plugins now run in a separate process, avoiding blocking the frontend service with long-running plugin tasks.

  • Changed the default value for boosting popular items from the top 100 to the top 30.

  • Enabled the lazy loading technique for the Show all labels mode for the LabelsWidget.

  • Added an option to hide empty placeholder images in the non-grid modes for ItemWidget.

  • Limited the crawler plugin to load 1,000 pages from a website, preventing overloading while indexing large sites. This limit can be adjusted in the data source configuration.

  • Added a new response_format option to the API for running ML workflows. This option allows changing the response format. The supported formats include standard (current format with values wrapped in a list) and plain (response without wrapping).

  • Enhanced the Activity data loader plugin to support an Activity file prefix. By default, it only loads activities from files matching the pattern activity.<date>.jsonl. Setting a prefix allows loading files matching the <prefix>activity.<date>.jsonl pattern.

  • Added health status for the studio service when doing squirro_status.

  • Added shortcuts to reference sections on cards. Clicking goes directly to the subsection inside the split view.

  • Improved items widget card sizing in grid layout to fill the free space.

  • Reintroduced document-level label feedback in the new Item Detail design.

  • When Items widget keywords are manually selected, the system keeps the order from the widget config and does not sort them alphabetically.

  • Removed the overwrite_files option from all the default buckets in the /etc/squirro/storage.ini file. The option is still available but disabled by default.

  • Added an option for the Confluence connector to only load pages tagged with specific labels. Set default Body format to view to properly display code blocks and tags.

  • Upgraded xlrd to the latest version (2.0.1).
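
The X-Filename item above relies on percent-encoding. As a minimal sketch of what this means for a client, the example below encodes a Unicode filename into an ASCII-only header value and decodes it again with the Python standard library. The helper names are hypothetical, and whether Squirro reserves exactly the same set of characters is an assumption.

```python
from urllib.parse import quote, unquote


def encode_filename(filename: str) -> str:
    """Percent-encode a filename (as UTF-8) so it only contains ASCII characters."""
    # safe="" also encodes "/" so the value cannot be mistaken for a path.
    return quote(filename, safe="")


def decode_filename(header_value: str) -> str:
    """Reverse the percent-encoding of a header value."""
    return unquote(header_value)


original = "\u00dcbersicht 2024.pdf"  # "Übersicht 2024.pdf"
encoded = encode_filename(original)   # "%C3%9Cbersicht%202024.pdf"
decoded = decode_filename(encoded)
```

Clients that read the header therefore need to decode the value before using it as a filename.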

Breaking Changes

  • AutoML models trained in Squirro versions prior to 3.10.4 cannot be loaded in versions 3.10.4 and later.

  • Removed the task that previously deleted duplicate items each time project data was fetched. Consequently, the following parameters in the topic.ini configuration file are now obsolete: enable_duplicate_item_deletion, duplicate_item_deletion_shard_size, and duplicate_item_deletion_size.

  • A service no longer starts if its configuration file is inaccessible to the service user. This change surfaces incorrect permissions that would otherwise lead to inconsistent or malfunctioning service behavior. The error is logged in /var/log/squirro/{service_name}/stderr.log.

  • The SquirroGPT custom widget is no longer supported by the GenAI Studio Plugin. To get the full functionality, please adjust your dashboards to use the built-in Chat widget.

  • The Nginx configuration for SquirroGPT Studio plugin has been moved from /etc/nginx/conf.d/frontend.conf to /etc/nginx/conf.d/studio.conf. Any other custom changes made for the Studio plugins need to be applied to the studio.conf file.

  • The topic.search.communities.exclude-results-without-matched-items project configuration has been removed. Use the widget itself to filter out communities without matched items.

  • Removed the cache_document and document_cache_expiration configurations from the NLP steps. They no longer have any effect.

  • Changed the response shape of the /parse endpoint so that it no longer wraps values in a list. This affects the usage of Query Contexts in dashboard queries. A migration script is provided to update the most common pattern used in Hybrid Search dashboards, queryContext?.parse?.type?.length > 0 && queryContext?.parse?.type[0], to the new format, queryContext?.parse?.type. Other uses of the Query Context in dashboard and widget queries must be manually migrated.

  • Improved performance for large synonym lists. The default synonyms list for the body and the title is no longer automatically applied to searchable labels. The searchable labels expand on their explicitly configured synonym lists.

  • Removed the Result List and Cards widgets from the UI. They are no longer supported. Custom Results List and custom Cards widgets are also no longer supported.
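
To make the /parse response change above more concrete, the following sketch contrasts the two shapes in Python. It only illustrates the difference between list-wrapped and plain values. The payload content is hypothetical, the unwrap_parse_values helper is not part of Squirro, and the handling of empty and multi-element lists is an assumption.

```python
def unwrap_parse_values(parse_result: dict) -> dict:
    """Convert list-wrapped values into plain values.

    Assumptions for this sketch: a single-element list becomes its element,
    an empty list becomes None, and any other value is left unchanged.
    """
    unwrapped = {}
    for name, value in parse_result.items():
        if isinstance(value, list) and len(value) == 1:
            unwrapped[name] = value[0]
        elif isinstance(value, list) and not value:
            unwrapped[name] = None
        else:
            unwrapped[name] = value
    return unwrapped


old_shape = {"type": ["question"]}          # hypothetical payload before the change
new_shape = unwrap_parse_values(old_shape)  # {"type": "question"}
```

Code that indexed into the list, as in type[0], reads the value directly in the new shape.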

Installation and Upgrade

For new installations, find step-by-step instructions in the install-ansible (recommended) and Installing Squirro on Linux pages.

To upgrade an existing installation, see the Upgrading Squirro page.