3.10.6 LTS
Squirro 3.10.6 was released on July 26, 2024.
Learn more about the Squirro Release Process.
Caution
This release includes breaking changes. See the Breaking Changes section at the end of this page to learn more.
What’s New
New agent framework for agent-driven LLM endpoints.
Redesigned Community List Widget for improved usability.
SquirroGPT project templates for new projects with OpenAI and Mixtral LLMs are accessible from the sidebar.
New filter widget for My Communities added to enhance user navigation.
New special chip for the favorite filter in GlobalSearch that allows editing the chip and expanding the saved query.
The pipeline step Thumbnail Extraction has a new config option for setting the format of the generated thumbnail image. It supports JPEG and PNG formats. The default format has changed from PNG to JPEG.
Improved MUI theme customization.
SquirroGPT suggests relevant documents when it cannot find a proper answer.
Added the possibility to inject extra information during query processing. This allows custom query processing steps to add data to the query context, which the Studio plugins then consume.
Upgrade to Python 3.11. All Python-based microservices now run on Python 3.11, and all Squirro-related Python dependencies are located in the /opt/squirro/virtualenv311/ virtual environment (can be activated using squirro_activate as before). Please note that Python 3.11 support applies to RHEL 8 (and derivatives) deployments.
Introduced a special chip feature for users to easily identify their subscribed communities.
Added a configuration option to use the --no-dependencies flag with pip when installing packages from a requirements.txt file for Squirro Assets.
Added a new compute_statistics parameter to the Labels API, allowing specification of which label should be used for Search Intent. If set to True, the label is included in the lookup index.
Introduced a new Time Selection Widget in React.
New baseline intent detector that supports automatically drilling down into the query by filtering on known facets that match a detected entity, and performing date-time filtering based on natural language dates.
Added paragraph-level tagging, allowing keywords to be extended from the paragraph level.
(beta) Aggregations on facets.
Pipeline steps executed in batch execution mode now feature a timeout configuration option. It enforces a limit (in seconds), after which a timeout error appears.
Add an endpoint for acquiring user session information with a caching mechanism.
Restrict access to sessions flagged as deleted.
Add a response checker chain to flag if the generated answer from the GenAI service is satisfactory.
Add a new chain for keyword-based query formulation.
Add a new toggle to switch between editing a model with JsonForms or with the ACE editor in the server and project configuration menus (only available for configurations with a defined schema).
The plumber service can now be deployed as a Docker container, see the Deploying the Plumber service in a Docker container page.
Added option to the Communities List widget to hide communities without matching items.
Replaced Jaeger tooling with OpenTelemetry.
Added a redo OCR configuration option to the PDF OCR pipeline step.
Scoring Plugins can now inject Elastic DSL rescore query clauses.
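To illustrate the rescore support mentioned in the last item above, here is a minimal sketch of what an Elasticsearch rescore clause looks like when assembled in Python. The keys follow the Elasticsearch query DSL. The helper name build_rescore_clause, the body field, and the weights are illustrative assumptions, and how a Squirro scoring plugin actually hands such a clause to the platform is not covered by these notes.

```python
def build_rescore_clause(phrase, window_size=50):
    """Return an Elasticsearch rescore clause (hypothetical helper, not a Squirro API)."""
    return {
        # Number of top hits per shard that are re-scored.
        "window_size": window_size,
        "query": {
            # Secondary query applied only to the top hits.
            "rescore_query": {
                "match_phrase": {"body": {"query": phrase, "slop": 2}}
            },
            # Relative weights of the original and the rescore query (example values).
            "query_weight": 0.7,
            "rescore_query_weight": 1.2,
        },
    }


# Example search body combining a regular query with the rescore clause.
search_body = {
    "query": {"match": {"body": "quarterly report"}},
    "rescore": build_rescore_clause("quarterly report"),
}
```

The rescore clause only affects the ordering of the top window_size hits per shard, which is why it is usually cheaper than running the secondary query against all matches.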
Improvements
AI
Improved SquirroGPT retrieval, reducing the occurrences of irrelevant document sources.
Added the context length to the sqgpt configuration. Users can manually set the context length sent to the LLM. This overwrites the context length estimation and may lead to issues if wrongly set.
Improved context length guessing by adding the newest GPT 4 models and updating legacy models.
The embedding models process switches to a lazy-loading technique. This change is part of the GenAI embedding models deprecation process, but it still ensures functionality for older projects using the genai service retriever.
Introduced the SQGPTConfig schema to the configuration service for easier setting and validation. The service can now interpolate values from .ini files in sections or keys with dots (for example, ${sqgpt\.settings.api_key} for api_key under the sqgpt.settings section).
The answer is initially highlighted and scrolled to when a user opens the item detail.
The answer matched in the item detail sidebar is marked as answer in the title.
Added the automatic context length determination to the summarization endpoint.
GenAI Squirro Retriever now only uses query_context to run queries, for better copilot results. It deprecates the use of the query parameter.
Add caching to fetch user grants for SquirroGPT, which can lead to a 30% increase in performance if a project is affected by grant leakage.
Exclude the semantic_chunks field from the payload sent to the ML service.
Reworked the SquirroGPT project and server-level configurations. The genai.sqgpt.default-settings configuration was introduced at the server level to set defaults for all the existing SquirroGPT projects. These values override the values of the genai.sqgpt.settings configuration of each project. The values at the project-level configuration that are set to null or to Default to Server Setting fall back to the server-level configuration.
Included a temporal anchor in the system prompt (UTC now).
Passed the search_mode parameter to the underlying LangChain component.
Removed citing parent_document_id for inline citations.
Enforced strict contextual following in user prompts instead of system messages.
The Wikipedia tool is disabled by default.
Add the client kwargs to squirro_retriever. It enables passing information such as user_information, auth_service, and auth_user to the Squirro client in the squirro_retriever in the genai service.
Batch Q&A: new feedback mechanism, improved question editing, ability to delete the first question, filtering capabilities, adjusted regenerate summary tooltip, summary cleared when the clear all option is selected, new export modal when sources are enabled, and export of source links.
Gracefully handle empty retriever result sets in the genai service by returning a hint message.
Automatic mapping of /var/lib/squirro/plugins and /etc/squirro/genai.d/deployments into the genai container.
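The escaped-dot reference syntax mentioned above for the SQGPTConfig schema (${sqgpt\.settings.api_key}) can be pictured with a small sketch. This is not the configuration service's implementation. It only assumes that an escaped dot belongs to the section name and that the first unescaped dot separates section from key. The helper name resolve_reference and the example-key value are hypothetical.

```python
import configparser
import re

# Example .ini content with a dotted section name (values are placeholders).
INI_TEXT = """
[sqgpt.settings]
api_key = example-key
"""


def resolve_reference(reference, parser):
    """Resolve a ${section.key} reference, treating '\\.' as a literal dot (hypothetical helper)."""
    inner = reference[2:-1]  # strip the leading "${" and the trailing "}"
    # Split only on dots that are not preceded by a backslash.
    parts = re.split(r"(?<!\\)\.", inner)
    # Turn the escaped dots back into plain dots; exactly two parts are expected.
    section, key = (part.replace("\\.", ".") for part in parts)
    return parser[section][key]


parser = configparser.ConfigParser(interpolation=None)
parser.read_string(INI_TEXT)
value = resolve_reference(r"${sqgpt\.settings.api_key}", parser)
```

Without the backslash, the reference would be ambiguous: sqgpt.settings.api_key could mean section sqgpt with key settings.api_key, or section sqgpt.settings with key api_key.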
Search
Facets are highlighted in ItemCard and LabelsWidget.
Typeahead suggestions redesign.
Add autocomplete for facet values.
Enabled query auto-submit after chip deletion for Global Search and the search widget.
Add item title typeahead suggestions, which open ItemDetail with the item directly from global search.
Ability to parse the new editable chips into separated chips.
Auto-focus the modal search input in the item detail.
Use the byte embeddings type by default for both the Paragraph Embedding pipeline step and the Semantic Search Scoring plugin. The change is backward compatible: projects already using semantic search keep the float type by default.
Indicate the active search in ItemDetail.
Add information about the server used for the GPU vs CPU experiment described in the Semantic and Hybrid Search page.
Reset the search query when changing items in the item detail.
Convert the whole query to a favorite chip after creating it from the global search.
The semantic search can now be applied only to a subset of documents that match specific phrases, improving precision.
The behavior of the favorite query chips has been improved.
The community names and pictures are now displayed in the search and global search widgets when the community is selected in the My Communities widget.
Migrated query_context to filter_query.
Added a custom SpacyNormalizer NER rule to create a PRODUCT entity tag based on a NOUN followed by a NUM.
Enhanced QueryContext to support providing IntentLookup within ParsedQuery (response of query-processing). The backend applies a list of active intents as defined by the calling function, which can be modified by the user via the dashboard, allowing intents to be enabled or the application type changed via apply_as: boost / filter.
Added the exact_knn plugin to perform exact embedding-based paragraph search. This plugin can be set within the query syntax, or enabled as the default scoring profile at the project level.
The prefix_match plugin now has a min_token_length option to avoid using too short tokens for prefix matching.
Added the semantic_rescore plugin. For example, to apply embedding-based rescoring on the top 200 matching paragraphs (per shard), use the plugin as follows: profile:{semantic_rescore window_size:200}.
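The PRODUCT rule described above (a NOUN followed by a NUM) can be pictured with a plain-Python sketch. This is not the SpacyNormalizer implementation. It assumes the tokens have already been tagged with part-of-speech labels, which are supplied by hand here, and the function name find_product_spans is hypothetical.

```python
def find_product_spans(tagged_tokens):
    """Return the text of every NOUN that is immediately followed by a NUM."""
    spans = []
    # Walk over adjacent token pairs and keep those matching NOUN + NUM.
    for (text, pos), (next_text, next_pos) in zip(tagged_tokens, tagged_tokens[1:]):
        if pos == "NOUN" and next_pos == "NUM":
            spans.append(f"{text} {next_text}")
    return spans


# Hand-tagged example tokens (in practice the tags would come from the NLP model).
tokens = [("the", "DET"), ("printer", "NOUN"), ("3000", "NUM"), ("failed", "VERB")]
products = find_product_spans(tokens)
```

In this example, products contains the single span "printer 3000", which would be the candidate for the PRODUCT tag.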
Platform
Multiple improvements to the Tabs widget.
Improved security.
Mixtral project templates updated to include the Cognitive Search dashboards as well as the Chat with Results and Chat with Document widgets.
Support the Retry Failed Items and Rerun from Raw Data features for the QFSS system.
Item Detail widget rework.
Optimized loading of the entities and PDFs.
Upgraded several Python dependencies to the newer versions.
Updated the step to offer options for compressing converted PDF documents containing images, reducing file sizes and ensuring linearization of converted PDFs.
End of support for the Gluster shared file system.
Various improvements to Squirro Ansible, including support for build_mode, control over cache settings in common.ini, and support for hourly and daily snapshots.
Easily track the pipeline steps using the rerun_job_id label in the Data Ingestion Logs dashboard of the Squirro Monitoring project. Find the ID of the job triggered in the datasource service logs at the INFO level (for example, using the following command: tail -f /var/log/squirro/datasource.log | grep 'rerun_from_index').
The Index pipeline step now offers options to index Squirro items or associated paragraphs. Rerun from index now includes sub-items by default, controlled by the datasource.rerun.index.include-sub-items setting. Override this in the Python SDK with the include_sub_items parameter in the rerun_pipeline_workflow method.
Squirro Ansible ensures consistent tooling by installing Python virtualenv, Elasticsearch client, requests, and some JSON packages on storage-only nodes.
Improved Squirro Ansible documentation.
Enhanced Redis integration as the caching backend for the @cached decorator, now supporting prefixes and namespaces, auto-expiring keys, and asynchronous cache insertion. Enhanced efficiency of individual backend requests by enabling multiple caches by default, complete with a proper cache invalidation strategy. These caches pertain to authentication handling, project metadata retrieval (including facets data), and user information (including project permissions).
Optimized memory usage and increased speed during project export that includes items.
Added overrides and documentation for the CommunityItem widget.
Query context and paragraph id/field mapping have been added to the genai service.
Upgraded the redis-py client from version 4.3.4 to 5.0.2.
Created the RPM packages for the NLP services required by the semantic search projects, allowing the NLP services to run on the Squirro node server for fast semantic search proof of value (PoV).
The configuration options for the rerun_from_index queue were split from the queues_local section into their own section called queues_rerun_from_index.
Added support for all the tab tooltips in the tabs widget.
Added the query_context parameter to the /starter-questions endpoint of the GenAI Studio plugin, which allows filtering and limiting the set of Squirro Items used to generate the questions.
Improved editable chip targetability.
The squirro_lxml_html_extractor option in the common.ini file no longer has any effect and has been removed from the packaged common.ini file. The lxml parser is now used by default.
Re-enable project translations for the widget title in the React widgets.
The ingester service can be configured without any processors and solely serve its web API. Previously, it would exit immediately with a vague message when no processors were configured.
In addition to the timeout option, the Text Chunking step now exposes an option to specify the max item body size. This prevents the step from processing items with a very large body.
Removed the ingester.priorities.enabled server configuration option and made it the only way of operation for ingester. The number of processors for each priority can still be adjusted by the ingester.priorities.pool-*-processors server configuration options.
The configuration option processors in the ingester.ini file is now deprecated and is planned for removal in an upcoming release.
An active dashboard store is not cleared anymore when the global search is cleared.
Respected open state in the item detail side panel icon.
Added an action button to open the list of all favorites from the create favorite modal success snackbar.
Added support for Unicode filenames in the X-Filename HTTP header. The filenames are now properly encoded using the percent-encoding standard to ensure compatibility with various web browsers and HTTP clients.
Pass SSO information from the proxy to the genai service.
Do not pass None values as a query to the query transformers, but instead pass an empty string.
When building the sparse query, the active query processor workflow is used if available. If not, it defaults to the query_default query processor workflow.
Added a configuration option to the Labels widget to disable the dropdown menu when empty.
The new expiration handling of user sessions is now enabled by default (security.expiring-sessions).
Expose an option to specify what item fields to export when exporting a Squirro project.
Remove card padding on the Community List widget when the card styling is turned off in the configuration.
Added clamping to the Item Detail title.
Expose the GET endpoint of /sessions/<session_id> in SquirroClient via get_session().
Added the Grid mode to the Items widget.
Added the search button control to the Item Detail toolbar.
Marked the split view icon in the Item Detail toolbar as active when the split view is open.
Overlay the side panel in the Item Detail on small screens (900px wide).
Make tooltips consistent with the item cards in the Item Detail.
Improve the performance for retrieving communities.
Updated the icon fonts to the latest version and changed all icons to the rounded type.
GenAI now persists code changes in the container across restarts. However, these changes are overwritten once a new image is used. The volume is mounted on /var/lib/squirro/docker/volumes.
Improved performance when closing a modal.
Changed the MySQL driver from the PyMySQL to the MariaDB connector for improved performance.
Include nltk punkt and spacy model dependencies in the docker image.
Disable previous and next buttons in Item Detail for the first and last items.
Remove entity_query from the item model for improved performance.
Improved the grid spacing of the Items widget for a better layout.
NLTK stopwords and punkt packages are now baked into the GenAI and Plumber Docker images, helping with on-premise deployments without internet access. The genai service image now uses amazonlinux as a base image for improved security.
Studio plugins now run in a separate process, avoiding blocking the frontend service with long-running plugin tasks.
Changed the default value for boosting popular items from the top 100 to the top 30.
Enabled the lazy loading technique for the Show all labels mode for the LabelsWidget.
Added an option to hide empty placeholder images in the non-grid modes for ItemWidget.
Limited the crawler plugin to load 1,000 pages from a website, preventing overloading while indexing large sites. This limit can be adjusted in the data source configuration.
Added a new response_format option to the API for running ML workflows. This option allows changing the response format. The supported formats include standard (current format with values wrapped in a list) and plain (response without wrapping).
Enhanced the Activity data loader plugin to support an Activity file prefix. By default, it only loads activities from files matching the pattern activity.<date>.jsonl. Setting a prefix allows loading files matching the <prefix>activity.<date>.jsonl pattern.
Added health status for the studio service when doing squirro_status.
Added shortcuts to reference sections on cards. Clicking goes directly to the subsection inside the split view.
Improved items widget card sizing in grid layout to fill the free space.
Reintroduced document-level label feedback in the new Item Detail design.
When Items widget keywords are manually selected, the system keeps the order from the widget config and does not sort them alphabetically.
Removed the overwrite_files option from all the default buckets in the /etc/squirro/storage.ini file. The option is still available but disabled by default.
Added an option for the Confluence connector to only load pages tagged with specific labels. Set default Body format to view to properly display code blocks and tags.
Upgraded xlrd to the latest version (2.0.1).
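As a short illustration of the percent-encoding mentioned above for the X-Filename header, the following sketch encodes and decodes a Unicode filename with the Python standard library. It only demonstrates the encoding itself. Whether Squirro uses these exact functions internally is an assumption, and the filename is made up.

```python
from urllib.parse import quote, unquote

# A filename with non-ASCII characters and a space (made-up example).
filename = "résumé 2024.pdf"

# Percent-encode the UTF-8 bytes so the value is safe in an HTTP header.
encoded = quote(filename)
headers = {"X-Filename": encoded}

# A client reading the header can restore the original filename.
decoded = unquote(headers["X-Filename"])
```

Here encoded is r%C3%A9sum%C3%A9%202024.pdf, an ASCII-only value, and decoded equals the original filename.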
Breaking Changes
AutoML models trained in Squirro versions prior to 3.10.4 cannot be loaded in versions 3.10.4 and later.
Removed the task that previously deleted duplicate items each time project data was fetched. Consequently, the following parameters in the topic.ini configuration file are now obsolete: enable_duplicate_item_deletion, duplicate_item_deletion_shard_size, and duplicate_item_deletion_size.
A service no longer starts if its configuration file is inaccessible to the service user. This change is designed to highlight issues when incorrect permissions lead to inconsistent or malfunctioning service behavior. The error is logged in /var/log/squirro/{service_name}/stderr.log.
The SquirroGPT custom widget is no longer supported by the GenAI Studio Plugin. To get the full functionality, please adjust your dashboards to use the built-in Chat widget.
The Nginx configuration for SquirroGPT Studio plugin has been moved from /etc/nginx/conf.d/frontend.conf to /etc/nginx/conf.d/studio.conf. Any other custom changes made for the Studio plugins need to be applied to the studio.conf file.
The topic.search.communities.exclude-results-without-matched-items project configuration has been removed. Use the widget itself to filter out communities without matched items.
Removed the cache_document and document_cache_expiration configurations from the NLP steps. They no longer have any effect.
Changed the response shape of the /parse endpoint so that it no longer wraps values in a list. It affects the usage of Query Contexts in dashboard queries. A migration script is provided to update the most common pattern used in Hybrid Search dashboards, queryContext?.parse?.type?.length > 0 && queryContext?.parse?.type[0], to the new format queryContext?.parse?.type. Other uses of the Query Context in dashboard and widget queries must be manually migrated.
Improved performance for large synonym lists. The default synonyms list for the body and the title is no longer automatically applied to searchable labels. The searchable labels expand on their explicitly configured synonym lists.
Removed Result List and Cards widget from the UI. They are no longer supported. Custom Results List and custom Cards widgets are also no longer supported.
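For code that has to handle /parse responses from both before and after the shape change described above, a defensive accessor along the following lines may help during migration. This is a sketch only. The payloads are simplified, hypothetical examples rather than real responses, and parse_type is not part of the Squirro SDK.

```python
def parse_type(query_context):
    """Read parse.type from either the old (list-wrapped) or the new (plain) shape."""
    value = (query_context.get("parse") or {}).get("type")
    if isinstance(value, list):
        # Old shape: the value is wrapped in a list, so take the first entry if any.
        return value[0] if value else None
    # New shape: the value is returned as-is (or None when missing).
    return value


# Simplified, hypothetical payloads illustrating the two shapes.
old_shape = {"parse": {"type": ["question"]}}
new_shape = {"parse": {"type": "question"}}
```

Both parse_type(old_shape) and parse_type(new_shape) return "question", so calling code does not need to know which server version produced the response.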
Installation and Upgrade
For new installations, find step-by-step instructions in the install-ansible (recommended) and Installing Squirro on Linux pages.
To upgrade an existing installation, see the Upgrading Squirro page.