3.10.6 (Latest LTS Release)#
Squirro 3.10.6 was released on July 26, 2024.
Learn more about the Squirro Release Process.
Caution
This release includes breaking changes. See the Breaking Changes section at the end of this page to learn more.
What’s New#
New agent framework for agent-driven LLM endpoints.
Redesigned Community List Widget for improved usability.
SquirroGPT project templates for new projects with OpenAI and Mixtral LLMs are accessible from the sidebar.
New filter widget for My Communities added to enhance user navigation.
New special chip for the favorite filter in GlobalSearch that allows editing the chip and expanding the saved query.
The pipeline step Thumbnail Extraction has a new config option for setting the format of the generated thumbnail image. It supports JPEG and PNG formats. The default format has changed from PNG to JPEG.
Improved MUI theme customization.
SquirroGPT suggests relevant documents when it cannot find a proper answer.
Added the possibility to inject extra information during query processing. This allows implementing custom query processing steps that add data to the query context, which is then consumed by the studio plugins.
Upgrade to Python 3.11. All Python-based microservices now run on Python 3.11, and all Squirro-related Python dependencies are located in the /opt/squirro/virtualenv311/ virtual environment (can be activated using squirro_activate as before). Please note that Python 3.11 support applies to RHEL 8 (and derivatives) deployments.
Introduced a special chip feature for users to easily identify their subscribed communities.
Added a configuration option to use the --no-dependencies flag with pip when installing packages from a requirements.txt file for Squirro Assets.
Added a new compute_statistics parameter to the Labels API, allowing specification of which label should be used for Search Intent. If set to True, the label is included in the lookup index.
Introduced a new Time Selection Widget in React.
New baseline intent detector that supports automatic drilling down the query by filtering on known facets that match a detected entity, and performing date-time filtering based on natural language dates.
Added paragraph-level tagging, allowing keywords to be extended from the paragraph level.
(beta) Aggregations on facets.
Pipeline steps executed in batch execution mode now feature a timeout configuration option. It enforces a limit (in seconds), after which a timeout error appears.
Add an endpoint for acquiring user session information with a caching mechanism.
Restrict access to sessions flagged as deleted.
Add a response checker chain to flag if the generated answer from the GenAI service is satisfactory.
Add a new chain for keyword-based query formulation.
Add a new toggle to switch between editing a model with JsonForms or with the ACE editor in the server and project configuration menus (only available for configurations with a defined schema).
The plumber service can now be deployed as a Docker container, see the Deploying the Plumber service in a Docker container page.
Added option to the Communities List widget to hide communities without matching items.
Replaced Jaeger tooling with OpenTelemetry.
Added a redo OCR configuration option to the PDF OCR pipeline step.
Scoring Plugins can now inject Elastic DSL rescore query clauses.
Improvements#
AI#
Improved SquirroGPT retrieval, reducing the occurrences of irrelevant document sources.
Added the context length to the sqgpt configuration. Users can manually set the context length sent to the LLM. This overwrites the context length estimation and may lead to issues if wrongly set.
Improved context length guessing by adding the newest GPT 4 models and updating legacy models.
The embedding models process switches to a lazy-loading technique. This change is part of the GenAI embedding models deprecation process, but it still ensures functionality for older projects using the genai service retriever.
Introduced the SQGPTConfig schema to the configuration service for easier setting and validation. The service can now interpolate values from .ini files in sections or keys with dots (for example, ${sqgpt\.settings.api_key} for api_key under the sqgpt.settings section).
The answer is initially highlighted and scrolled to when a user opens the item detail.
The answer matched in the item detail sidebar is marked as answer in the title.
Added the automatic context length determination to the summarization endpoint.
GenAI Squirro Retriever now only uses query_context to run queries, for better copilot results. It deprecates the use of the query parameter.
Add caching to fetch user grants for SquirroGPT, which can lead to a 30% increase in performance if a project is affected by grant leakage.
Exclude the semantic_chunks field from the payload sent to the ML service.
Reworked the SquirroGPT project and server-level configurations. The genai.sqgpt.default-settings configuration was introduced at the server level to set defaults for all the existing SquirroGPT projects. These values override the values of the genai.sqgpt.settings configuration of each project. The values at the project-level configuration that are set to null or to Default to Server Setting fall back to the server-level configuration.
Included a temporal anchor in the system prompt (UTC now).
Passed the search_mode parameter to the underlying LangChain component.
Removed citing parent_document_id for inline citations.
Enforced strict contextual following in user prompts instead of system messages.
The Wikipedia tool is disabled by default.
Add the client kwargs to squirro_retriever. It enables passing information such as user_information, auth_service, and auth_user to the Squirro client in the squirro_retriever in the genai service.
Batch Q&A: new feedback mechanism, improved question editing, ability to delete the first question, filtering capabilities, adjusted the regenerate summary tooltip, summary cleared when the clear all option is selected, new export modal when sources are enabled, added the export of source links.
Gracefully handle empty retriever resultsets in the genai service by returning a hint message.
Automatic mapping of /var/lib/squirro/plugins and /etc/squirro/genai.d/deployments into the genai container.
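The dotted-section interpolation mentioned above (${sqgpt\.settings.api_key}) can be illustrated with a minimal sketch. This is not the configuration service's implementation: the helper names split_reference and interpolate are hypothetical, and the sketch assumes that an escaped dot (\.) belongs to the section name while the single unescaped dot separates the section from the key.

```python
import configparser
import re

# Matches a ${...} reference inside a configuration value.
REFERENCE = re.compile(r"\$\{([^}]+)\}")


def split_reference(reference: str) -> tuple[str, str]:
    """Split 'section.key' on the one dot that is not escaped with a backslash.

    Hypothetical helper: assumes exactly one unescaped dot per reference.
    """
    parts = re.split(r"(?<!\\)\.", reference)
    if len(parts) != 2:
        raise ValueError(f"expected exactly one unescaped dot in {reference!r}")
    # Turn the escaped dots back into literal dots.
    section, key = (part.replace("\\.", ".") for part in parts)
    return section, key


def interpolate(value: str, config: configparser.ConfigParser) -> str:
    """Replace every ${section.key} reference in value with the .ini value."""

    def lookup(match: re.Match) -> str:
        section, key = split_reference(match.group(1))
        return config[section][key]

    return REFERENCE.sub(lookup, value)


# Example .ini content with a dotted section name (values are made up).
config = configparser.ConfigParser(interpolation=None)
config.read_string("[sqgpt.settings]\napi_key = example-key\n")

print(interpolate(r"${sqgpt\.settings.api_key}", config))  # prints "example-key"
```

Without the escape, the reference would be ambiguous: sqgpt.settings.api_key could mean key settings.api_key in section sqgpt, or key api_key in section sqgpt.settings.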
Search#
Facets are highlighted in ItemCard and LabelsWidget.
Typeahead suggestions redesign.
Add autocomplete for facet values.
Enabled the query auto-submit after the chips deletion for Global Search and the search widget.
Add item title typeahead suggestions, which open ItemDetail with the item directly from global search.
Ability to parse the new editable chips into separated chips.
Auto-focus the modal search input in the item detail.
Use the byte embeddings type by default for both the Paragraph Embedding pipeline step and the Semantic Search Scoring plugin. The change is backward compatible, so the projects already using semantic search use the float type by default.
Indicate the active search in ItemDetail.
Add information about the server used for the GPU vs CPU experiment described in the Semantic and Hybrid Search page.
Reset the search query when changing items in the item detail.
Convert the whole query to a favorite chip after creating it from the global search.
The semantic search can now be applied only to a subset of documents that match specific phrases, improving precision.
The behavior of the favorite query chips has been improved.
The community names and pictures are now displayed in the search and global search widgets when the community is selected in the My Communities widget.
Migrated query_context to filter_query.
Added a custom SpacyNormalizer NER rule to create a PRODUCT entity tag based on a NOUN followed by a NUM.
Enhanced QueryContext to support providing IntentLookup within ParsedQuery (response of query-processing). The backend applies a list of active intents as defined by the calling function, which can be modified by the user via the dashboard, allowing intents to be enabled or the application type changed via apply_as: boost / filter.
Added the exact_knn plugin to perform exact embedding-based paragraph search. This plugin can be set within the query-syntax, or enabled as the default scoring profile at the project level.
The prefix_match plugin now has a min_token_length option to avoid using too short tokens for prefix matching.
Added the semantic_rescore plugin. Use it, for example, to apply embedding-based rescoring on the top 200 matching paragraphs (per shard) with profile:{semantic_rescore window_size:200}.
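The PRODUCT rule mentioned above (a NOUN followed by a NUM) can be sketched in plain Python. This is only an illustration of the matching idea, not the SpacyNormalizer code: the function name find_product_spans is hypothetical, and the part-of-speech tags are hardcoded here, whereas in practice a tagger such as spaCy would supply them.

```python
def find_product_spans(tagged_tokens: list[tuple[str, str]]) -> list[str]:
    """Return every 'NOUN NUM' token pair as a candidate PRODUCT entity.

    tagged_tokens is a list of (text, part-of-speech) pairs. The tags are
    assumed to come from an upstream tagger; this sketch does no tagging.
    """
    spans = []
    # Walk over adjacent token pairs and keep those matching NOUN + NUM.
    for (text, pos), (next_text, next_pos) in zip(tagged_tokens, tagged_tokens[1:]):
        if pos == "NOUN" and next_pos == "NUM":
            spans.append(f"{text} {next_text}")
    return spans


# Hand-tagged example query (tags are illustrative, not tagger output).
query = [("manual", "NOUN"), ("for", "ADP"), ("printer", "NOUN"), ("3000", "NUM")]
print(find_product_spans(query))  # prints ['printer 3000']
```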
Platform#
Multiple improvements to the Tabs widget.
Improved security.
Mixtral project templates updated to include the Cognitive Search dashboards as well as the Chat with Results and Chat with Document widgets.
Support the Retry Failed Items and Rerun from Raw Data features for the QFSS system.
Item Detail widget rework.
Optimized loading of the entities and PDFs.
Upgraded several Python dependencies to the newer versions.
Updated the step to offer options for compressing converted PDF documents containing images, reducing file sizes and ensuring linearization of converted PDFs.
End of support for the Gluster shared file system.
Various improvements to Squirro Ansible, including support for build_mode, control for cache settings in common.ini, and support for hourly and daily snapshots.
Easily track the pipeline steps using the rerun_job_id label in the Data Ingestion Logs dashboard of the Squirro Monitoring project. Find the ID of the job triggered in the datasource service logs at the INFO level (for example, using the following command: tail -f /var/log/squirro/datasource.log | grep 'rerun_from_index').
The Index pipeline step now offers options to index Squirro items or associated paragraphs. Rerun from index now includes sub-items by default, controlled by the datasource.rerun.index.include-sub-items setting. Override this in the Python SDK with the include_sub_items parameter in the rerun_pipeline_workflow method.
Squirro Ansible ensures consistent tooling by installing Python virtualenv, Elasticsearch client, requests, and some JSON packages on storage-only nodes.
Improved Squirro Ansible documentation.
Enhanced Redis integration as the caching backend for the @cached decorator, now supporting prefixes and namespaces, auto-expiring keys, and asynchronous cache insertion. Enhanced efficiency of individual backend requests by enabling multiple caches by default, complete with a proper cache invalidation strategy. These caches pertain to authentication handling, project metadata retrieval (including facets data), and user information (including project permissions).
Optimized memory usage and increased speed during project export that includes items.
Added overrides and documentation for the CommunityItem widget.
Query context and paragraph id/field mapping have been added to the genai service.
Upgraded the redis-py client from version 4.3.4 to 5.0.2.
Created the RPM packages for the NLP services required by the semantic search projects, allowing the NLP services to run on the Squirro node server for fast semantic search proof of value (PoV).
The configuration options for the rerun_from_index queue were split from the queues_local section into their own section called queues_rerun_from_index.
Added support for all the tab tooltips in the tabs widget.
Added the query_context parameter to the /starter-questions endpoint of the GenAI Studio plugin, which allows filtering and limiting the set of Squirro Items used to generate the questions.
Improved editable chip targetability.
The squirro_lxml_html_extractor option in the common.ini file no longer has any effect and has been removed from the packaged common.ini file. The lxml parser is now used by default.
Re-enable project translations for the widget title in the React widgets.
The ingester service can be configured without any processors and solely serve its web API. Previously, it would exit immediately with a vague message when no processors were configured.
In addition to the timeout option, the Text Chunking step now exposes an option to specify the max item body size. This prevents the step from processing items with a very large body.
Removed the ingester.priorities.enabled server configuration option and made it the only way of operation for ingester. The number of processors for each priority can still be adjusted by the ingester.priorities.pool-*-processors server configuration options.
The configuration option processors in the ingester.ini file is now deprecated and is planned for removal in an upcoming release.
An active dashboard store is not cleared anymore when the global search is cleared.
Respected open state in the item detail side panel icon.
Added an action button to open the list of all favorites from the create favorite modal success snackbar.
Added support for Unicode filenames in the X-Filename HTTP header. The filenames are now properly encoded using the percent-encoding standard to ensure compatibility with various web browsers and HTTP clients.
Pass SSO information from the proxy to the genai service.
Do not pass None values as a query to the query transformers, but instead pass an empty string.
When building the sparse query, the active query processor workflow is used if available. If not, it defaults to the query_default query processor workflow.
Added a configuration option to the Labels widget to disable the dropdown menu when empty.
The new expiration handling of user sessions is now enabled by default (security.expiring-sessions).
Expose an option to specify what item fields to export when exporting a Squirro project.
Remove card padding on the Community List widget when the card styling is turned off in the configuration.
Added clamping to the Item Detail title.
Expose the GET endpoint of /sessions/<session_id> in SquirroClient via get_session().
Added the Grid mode to the Items widget.
Added the search button control to the Item Detail toolbar.
Marked split view icon in the Item Detail toolbar active if it is open.
Overlay the side panel in the Item Detail on small screens (900px wide).
Make tooltips consistent with the item cards in the Item Detail.
Improve the performance for retrieving communities.
Updated the icon fonts to the latest version and changed all icons to the rounded type.
GenAI now persists code changes in the container when restarted. However, these changes are overwritten once a new image is used. The volume is mounted on /var/lib/squirro/docker/volumes.
Improved performance when closing a modal.
Changed the MySQL driver from the PyMySQL to the MariaDB connector for improved performance.
Include nltk punkt and spacy model dependencies in the docker image.
Disable previous and next buttons in Item Detail for the first and last items.
Remove entity_query from the item model for improved performance.
Improved the grid spacing for the Items widget for improved layout.
NLTK stopwords and punkt packages are now baked into the GenAI and Plumber Docker images, helping with on-premise deployments without internet access. The genai service image now uses amazonlinux as a base image for improved security.
Studio plugins now run in a separate process, avoiding blocking the frontend service with long-running plugin tasks.
Changed the default value for boosting popular items from the top 100 to the top 30.
Enabled the lazy loading technique for the Show all labels mode for the LabelsWidget.
Added an option to hide empty placeholder images in the non-grid modes for ItemWidget.
Limited the crawler plugin to load 1,000 pages from a website, preventing overloading while indexing large sites. This limit can be adjusted in the data source configuration.
Added a new response_format option to the API for running ML workflows. This option allows changing the response format. The supported formats include standard (current format with values wrapped in a list) and plain (response without wrapping).
Enhanced the Activity data loader plugin to support an Activity file prefix. By default, it only loads activities from files matching the pattern activity.<date>.jsonl. Setting a prefix allows loading files matching the <prefix>activity.<date>.jsonl pattern.
Added health status for the studio service when doing squirro_status.
Added shortcuts to reference sections on cards. Clicking goes directly to the subsection inside the split view.
Improved items widget card sizing in grid layout to fill the free space.
Reintroduced document-level label feedback in the new Item Detail design.
When Items widget keywords are manually selected, the system keeps the order from the widget config and does not sort them alphabetically.
Removed the overwrite_files option from all the default buckets in the /etc/squirro/storage.ini file. The option is still available but disabled by default.
Added an option for the Confluence connector to only load pages tagged with specific labels. Set default Body format to view to properly display code blocks and tags.
Upgraded xlrd to the latest version (2.0.1).
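The percent-encoding of Unicode filenames in the X-Filename header, mentioned above, can be illustrated with the Python standard library. This is a minimal sketch rather than Squirro's implementation: the function name encode_filename_header and the sample filename are hypothetical, and the sketch assumes UTF-8 as the byte encoding (the default of urllib.parse.quote).

```python
from urllib.parse import quote, unquote


def encode_filename_header(filename: str) -> str:
    """Percent-encode a filename so it fits in an ASCII-only HTTP header value."""
    # safe="" also encodes "/" so the result is a single opaque token.
    return quote(filename, safe="")


# Sample filename with a non-ASCII character and a space.
encoded = encode_filename_header("Übersicht 2024.pdf")
print(encoded)           # prints "%C3%9Cbersicht%202024.pdf"
print(unquote(encoded))  # prints "Übersicht 2024.pdf"
```

A client reading the header can recover the original name with unquote, as the second print shows.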
Breaking Changes#
AutoML models trained in Squirro versions prior to 3.10.4 cannot be loaded in versions 3.10.4 and later.
Removal of the task that previously deleted duplicate items each time project data was fetched. Consequently, the following parameters in the topic.ini configuration file are now obsolete: enable_duplicate_item_deletion, duplicate_item_deletion_shard_size, and duplicate_item_deletion_size.
Implementation of a change to halt the initiation of a service if its configuration file is inaccessible to the service user. This modification is designed to highlight issues when incorrect permissions lead to inconsistent or malfunctioning service behavior. The error is logged in /var/log/squirro/{service_name}/stderr.log.
The SquirroGPT custom widget is no longer supported by the GenAI Studio Plugin. To get the full functionality, please adjust your dashboards to use the built-in Chat widget.
The Nginx configuration for SquirroGPT Studio plugin has been moved from /etc/nginx/conf.d/frontend.conf to /etc/nginx/conf.d/studio.conf. Any other custom changes made for the Studio plugins need to be applied to the studio.conf file.
The topic.search.communities.exclude-results-without-matched-items project configuration has been removed. Use the widget itself to filter out communities without matched items.
Removed the cache_document and document_cache_expiration configurations from the NLP steps. They no longer have any effect.
Changed the response shape of the /parse endpoint so that it no longer wraps values in a list. It affects the usage of Query Contexts in dashboard queries. A migration script is provided to update the most common pattern used in Hybrid Search dashboards, queryContext?.parse?.type?.length > 0 && queryContext?.parse?.type[0], to the new format, queryContext?.parse?.type. Other uses of the Query Context in dashboard and widget queries must be manually migrated.
Improved performance for large synonym lists. The default synonyms list for the body and the title is no longer automatically applied to searchable labels. The searchable labels expand on their explicitly configured synonym lists.
Removed Result List and Cards widget from the UI. They are no longer supported. Custom Results List and custom Cards widgets are also no longer supported.
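For dashboard or widget queries that the provided migration script does not cover, the /parse rewrite described above can be approximated with a regular expression. The sketch below is not the shipped migration script: the function name migrate_query_expression is hypothetical, and it only handles expressions written exactly in the "length > 0 && value[0]" form quoted in these notes, generalized to any field name.

```python
import re

# Old form: queryContext?.parse?.<field>?.length > 0 && queryContext?.parse?.<field>[0]
# The backreference \1 requires the same field name on both sides of "&&".
OLD_PARSE_PATTERN = re.compile(
    r"queryContext\?\.parse\?\.(\w+)\?\.length\s*>\s*0\s*&&\s*"
    r"queryContext\?\.parse\?\.\1\[0\]"
)


def migrate_query_expression(expression: str) -> str:
    """Rewrite list-unwrapping /parse lookups to the new unwrapped form."""
    return OLD_PARSE_PATTERN.sub(r"queryContext?.parse?.\1", expression)


old = "queryContext?.parse?.type?.length > 0 && queryContext?.parse?.type[0]"
print(migrate_query_expression(old))  # prints "queryContext?.parse?.type"
```

Expressions that index the list in any other way are left unchanged by this pattern and still need manual review.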
Installation and Upgrade#
For new installations, find step-by-step instructions in the install-ansible (recommended) and Installing Squirro on Linux pages.
To upgrade an existing installation, see the Upgrading Squirro page.