Squirro Cognitive Search#
Introduction#
Accessing the correct information quickly is increasingly difficult in a world filled with siloed data. End users can’t search all of their connected data sources easily, and their search results aren’t always presented in intuitive, relevant ways.
Squirro Cognitive Search addresses these problems by overcoming the shortcomings of traditional enterprise search (sometimes called workplace search) using AI and Machine Learning (ML) technologies.
What Is Cognitive Search?#
Cognitive Search is a smarter, better, more personalized version of traditional enterprise search.
Its goal is to help people find the best, most relevant information across all of their connected data sources, and to present that information as quickly and intuitively as possible.
The table below outlines some of the important differences between traditional enterprise search and Squirro Cognitive Search:
| Traditional Enterprise Search | Squirro Cognitive Search |
|---|---|
| Limited to keyword only search. No ability to search based upon context. | Uses semantic (vector) search combined with keyword search to provide more relevant results. |
| Doesn’t seek to understand user intent. Relies solely on keyword-matching strategies. | Seeks to understand and interpret user queries, providing more accurate and relevant results. |
| Limited ability to analyze or interpret unstructured data. | Analyzes and interprets all forms of unstructured data, including images and video transcriptions. |
| Has static, limited functionalities, with no ability to self-learn. | Learns and adapts over time, improving its accuracy and relevance the more it is used. |
| Cannot be fine-tuned or heavily customized by end users. | Highly customizable and configurable, both in terms of underlying query processing and visualizations. |
Connectivity and Scalability#
It’s important to remember that the quality of your search results is directly linked to the quality of your underlying data: the better and more comprehensive the underlying data, the better and more comprehensive the search.
Squirro Cognitive Search is built on top of the Squirro Insight Engine and its underlying technologies. At the heart of these technologies are two foundational features: connectivity and scalability.
Squirro makes it easy to connect to almost any data source and to scale quickly to meet the needs of organizations with large volumes of connected data (for example, millions of documents).
| Traditional Enterprise Search | Squirro Cognitive Search |
|---|---|
| Limited ability to connect to multiple data sources. Often requires command line tools to do so. | Built to connect to almost any type of structured or unstructured data source using an intuitive browser UI. |
| Doesn’t scale easily, especially when ingesting large volumes of data. | Scales quickly and easily using Elasticsearch. |
Security#
Squirro also provides access-control list security by way of query templates, which are appended to incoming Squirro queries to implement access-control logic. To learn more, see Query Templates.
Reference: Squirro also takes GDPR compliance seriously. See Squirro’s Privacy Policy.
Features#
Cognitive Search is not a single feature; it’s a collection of many features that Squirro Engineers, Developers, and Data Scientists are continually adding to and improving.
It’s also highly customizable and scalable, meaning that you can select and optimize the features that fit you and your organization’s needs most effectively.
The following sections provide an overview of Squirro’s basic and advanced search features.
Basic Features#
Keyword Search#
Keyword search is the first building block of Cognitive Search. When a user enters a search query in the global search bar, Squirro looks for matching terms within project documents and items.
Reference: For a detailed technical look at how Squirro’s default term matching configuration works, see How To Handle User Query Terms Correctly.
Note: Keyword search on its own has many limitations. The value of Cognitive Search is in the numerous ways Squirro enhances simple keyword search using AI and ML techniques.
Note
By default, Squirro uses a combination of keyword and semantic search when users enter search queries in the global search bar. See the Semantic Search section later on this page for more information about semantic search.
Federated Search#
Also known as “Multi-Source Search”, this is a core feature that differentiates Squirro from other Enterprise Search solutions.
Squirro connects to almost any structured or unstructured data source through its UI, with many one-click data connectors available out of the box.
When you run a search within Squirro Cognitive Search, you search all connected data sources and all item types (unless you’ve chosen to exclude certain groups). This allows you to simultaneously search PDFs, Excel sheets, recorded calls, videos, images, and any other connected data.
Reference: For more information, see Data Loading.
In-Document Search#
Squirro helps unlock your data by making all parts of your connected documents searchable, no matter the format.
You’ll also have the option to highlight relevant text within documents in many situations, making it easy to quickly locate relevant information. This applies not just to search queries, but also to ML-generated insights.
In the example below, a document triggered project insights relating to Bankruptcy and Equity Raised signals. The relevant text sections are highlighted within the document text.
Synonym Search#
Synonym lists can be powerful search tools, especially within projects containing many industry-specific terms or jargon.
Your project creator can upload synonym lists to define relationships that would otherwise not be obvious or easily discoverable through ML techniques.
Example: For a project that indexes less formal financial news and blog sites, the project creator may want to set synonyms such as `ankle biter => small cap investment`, `scrip issue, free share issue => bonus issue`, and `Cable => USD/GBP`.
You can also instruct Squirro to selectively apply synonym lists to specific groups of documents.
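To make the synonym rules above concrete, the following is a minimal Python sketch of how rules written in the `a, b => c` form could be parsed and used to expand a query. The function names `parse_synonym_rules` and `expand_query` are hypothetical, the substring matching is deliberately naive, and none of this is Squirro's actual implementation or upload format.

```python
def parse_synonym_rules(lines):
    """Parse rules of the form 'a, b => c' into a {source term: target} mapping."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        left, separator, right = line.partition("=>")
        if not separator:
            continue  # skip lines that are not 'source => target' rules
        target = right.strip()
        for term in left.split(","):
            mapping[term.strip().lower()] = target
    return mapping


def expand_query(query, mapping):
    """Return the original query followed by the targets of any matching synonyms."""
    lowered = query.lower()
    # Naive substring match; a real system would match on token boundaries.
    expansions = [target for term, target in mapping.items() if term in lowered]
    # dict.fromkeys drops duplicates while preserving order.
    return [query] + list(dict.fromkeys(expansions))


# Rules taken from the example above.
rules = [
    "ankle biter => small cap investment",
    "scrip issue, free share issue => bonus issue",
    "Cable => USD/GBP",
]
synonyms = parse_synonym_rules(rules)
print(expand_query("outlook for cable this week", synonyms))
```

In this sketch the original query is always kept, so a document that uses the informal term still matches.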
Labeled Search#
Squirro applies labels to documents to help categorize them and allow for advanced searching and visualization.
Labels can be any one of the following types:
String
Integer or float
Datetime
Geo_point
Labels allow you to group, sort, and visualize documents by a predefined category such as language; aggregate document data using tables; sort and filter documents by the date and time they were created or read; and visualize documents based upon geographic location.
Reference: For more information, see Labels.
When viewing search results, you can click the label icon in the bottom left corner of the search result to view all labels associated with that document, as shown in the example screenshot below:
Note
Labels specifically related to the applied search query will be highlighted in blue.
Combined Keyword and Labeled Search#
Combining keyword and labeled search is a powerful means of improving search precision.
You can use labels to narrow your search down to a specific subset of documents. In the example below, the project has different label categories, including Company, Country, Language, Sector, and Sources.
Example: By selecting the Company label Twitter, an end user can filter their search to only return results that contain references to the company Twitter, as shown in the following screenshot:
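Conceptually, a combined search keeps only the documents that match the keyword and also carry the selected label value. The Python sketch below illustrates that logic on an in-memory list. The `Document` class, the `search` function, and the sample data are hypothetical and do not reflect Squirro's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    body: str
    labels: dict = field(default_factory=dict)  # label name -> list of values


def search(documents, keyword, label_filters=None):
    """Return documents that contain the keyword and match every label filter."""
    label_filters = label_filters or {}
    results = []
    for doc in documents:
        text = f"{doc.title} {doc.body}".lower()
        if keyword.lower() not in text:
            continue  # keyword part of the query failed
        if all(value in doc.labels.get(name, []) for name, value in label_filters.items()):
            results.append(doc)  # keyword and all label filters matched
    return results


docs = [
    Document("Q3 earnings", "Advertising revenue grew.", {"Company": ["Twitter"], "Country": ["US"]}),
    Document("Q3 earnings", "Cloud revenue grew.", {"Company": ["Example Corp"], "Country": ["CH"]}),
    Document("Hiring update", "New engineering roles.", {"Company": ["Twitter"]}),
]

# Keyword "revenue" narrowed to documents labeled Company = Twitter.
hits = search(docs, "revenue", {"Company": "Twitter"})
print([d.body for d in hits])
```

Without the label filter, the same keyword returns two documents; adding the filter narrows the result to one.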
PDF Text Recognition#
During document ingestion from your connected data sources, Squirro can run optical character recognition (OCR) on all ingested PDF document images to make them easily searchable. This is particularly relevant for scanned PDF documents that would otherwise not be searchable.
You also have the option to force OCR even if some preexisting text is detected. The example below illustrates a situation where this becomes relevant.
Example: Some PDF documents containing text also contain tables that are not in text format. Forcing OCR will convert those tables to searchable text as well, meaning that users can search information from the global search bar. Without OCR, the tables would need to be manually reviewed, likely resulting in that data being overlooked.
Project creators can add this as a step within the default pipeline workflow in the UI Pipeline Editor as shown in the example screenshot below.
Communities#
Communities is a feature that allows end users to personalize their Squirro experience by following topics of interest to them.
Each community belongs to a Community Type, which the project creator can create, edit, and delete.
After initial configuration by the project creator, project documents associated with communities are then displayed in a feed to users.
Once users have followed a community, they’ll see options to dive into a community page for relevant results and see personalized document feeds with new community content as new documents are added to a project.
Example: In the image shown below, Food Products is a community type containing communities like Alcoholic Beverages and Bakery Products that users can follow to receive updates and prioritized search results.
Reference: For more information, see Communities.
Boolean Operators#
You can refine search results by combining and excluding terms with simple boolean operators in the search bar.
The primary boolean terms are AND, NOT, and OR.
Example: A search query in a typical project for the keyword `Switzerland` is likely to return multiple results relating to Switzerland’s largest city Zurich. If you want to exclude results relating to Zurich, you can search for `Switzerland NOT Zurich` instead.
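The effect of the three operators can be sketched in a few lines of Python. This is a simplified illustration that evaluates a flat boolean query against whitespace-separated words. The `matches` function is hypothetical and is not how Squirro parses or executes queries.

```python
def matches(text, must=(), must_not=(), should=()):
    """Evaluate a flat boolean query against one document's text.

    must     -> terms joined by AND (all must be present)
    must_not -> terms preceded by NOT (none may be present)
    should   -> terms joined by OR (at least one must be present, if any are given)
    """
    words = set(text.lower().split())
    if any(term.lower() not in words for term in must):
        return False
    if any(term.lower() in words for term in must_not):
        return False
    if should and not any(term.lower() in words for term in should):
        return False
    return True


docs = [
    "Switzerland approves new banking rules",
    "Zurich remains the largest city in Switzerland",
]

# Equivalent of the query: Switzerland NOT Zurich
results = [d for d in docs if matches(d, must=["Switzerland"], must_not=["Zurich"])]
print(results)
```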
Advanced Query Syntax#
In addition to boolean search, you can use advanced query techniques to further refine searches within the search bar.
There are several techniques available, including:
Grouping
Phrase Search
Wildcard Search
Boosting
Sorting
See Query Syntax for further details, or download the Advanced Query Syntax Cheat Sheet shown in the image below:
Spell Checking#
Squirro provides a spell checking widget out of the box that identifies misspelled search queries. The correct spellings of those queries are then suggested as new (clickable) search terms in the spell check widget.
Clicking the suggested term will re-run the search with the correct spelling.
Typeahead#
This feature presents search suggestions to you in real time as you type within the search bar.
Depending on your project setup and specific search query, two types of Typeahead suggestions may appear:
Document-generated suggestions based on document content, document labels, or associated communities.
Activity-generated suggestions based on the specific user’s saved searches and search history, or other popular queries across their project.
Reference: For more information, see Typeahead Suggestions.
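As a rough illustration of the two suggestion types, the Python sketch below merges activity-generated and document-generated candidates using simple prefix matching. The `typeahead` function, the ordering (activity first), and the sample data are assumptions made for this example. Squirro's actual suggestion logic is described in Typeahead Suggestions.

```python
def typeahead(prefix, document_terms, user_history, limit=5):
    """Return up to `limit` (suggestion, source) pairs that start with the prefix."""
    prefix = prefix.lower()
    suggestions = []
    # Assumption for this sketch: activity-generated suggestions are listed first.
    for source, candidates in (("activity", user_history), ("document", document_terms)):
        for candidate in candidates:
            if candidate.lower().startswith(prefix):
                suggestions.append((candidate, source))
    # Drop case-insensitive duplicates, keeping the first occurrence.
    seen = set()
    unique = []
    for text, source in suggestions:
        key = text.lower()
        if key not in seen:
            seen.add(key)
            unique.append((text, source))
    return unique[:limit]


# Hypothetical labels/communities and search history.
document_terms = ["Bakery Products", "Bankruptcy", "Banking regulation"]
user_history = ["bankruptcy filings 2023", "Bakery Products"]

print(typeahead("ban", document_terms, user_history))
```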
Search Chips#
To help users understand how Squirro processes their search queries, Squirro converts search queries into search chips.
In the following examples, boolean operators are used to search food allergy or allergen-related content that excludes peanuts or nuts from the search results.
In the example image below, the query is ready to be submitted and no chips are visible:
After submitting the search query, the query is broken into chips, as shown in the example below:
Reference: To learn more about how to use search chips, see Search Bar Chips.
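The idea of breaking a query into chips can be sketched as a small tokenizer that separates operators, quoted phrases, and plain terms. The `to_chips` function, the chip types, and the sample query are hypothetical; the chips Squirro actually displays may be structured differently.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}
# A token is either a double-quoted phrase or a run of non-whitespace characters.
TOKEN_PATTERN = re.compile(r'"[^"]*"|\S+')


def to_chips(query):
    """Split a query string into a list of chips (operator, phrase, or term)."""
    chips = []
    for token in TOKEN_PATTERN.findall(query):
        if token in OPERATORS:
            chips.append({"type": "operator", "value": token})
        elif len(token) >= 2 and token.startswith('"') and token.endswith('"'):
            chips.append({"type": "phrase", "value": token[1:-1]})
        else:
            chips.append({"type": "term", "value": token})
    return chips


# Hypothetical query in the spirit of the allergy example above.
for chip in to_chips('"food allergy" OR allergen NOT peanuts NOT nuts'):
    print(chip)
```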
Timeline#
You can visualize project search results using the out-of-the-box Timeline widget. In most projects, this is accessed by clicking the chart icon that appears in the upper-right corner of your search results.
The feature allows you to visualize results over time and narrow your search results to specific periods.
Advanced Features#
Semantic Search#
Semantic search is a powerful feature that allows you to search for documents based on their meaning, rather than just keywords. It’s also known as vector search and is the default search method in Squirro Cognitive Search.
To illustrate the power of semantic search, see the screenshot below that contrasts the search results for the question `what is the film Public Housing about` using semantic search and traditional keyword search.
Squirro’s semantic search understands the natural language meaning behind the query, and when the query is phrased in the form of a question, it highlights the answer to the question (if found within the project’s connected documents).
Internally, Squirro uses a technique called hybrid search to deliver semantic search results using Scoring Profile technology.
Reference: To learn more, see Semantic and Hybrid Search.
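In general terms, hybrid search blends a keyword-based score with a vector-similarity score. The Python sketch below shows one common way to do that, a weighted sum, using hand-made toy vectors in place of real embeddings. The function names, the `alpha` weight, the sample documents, and the vectors are all assumptions for illustration. This is not Squirro's scoring implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_score(query, text):
    """Fraction of query terms that appear in the text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms)


def hybrid_score(query, query_vector, doc, alpha=0.5):
    """Weighted sum of keyword and semantic scores (alpha weights the keyword part)."""
    kw = keyword_score(query, doc["text"])
    sem = cosine_similarity(query_vector, doc["vector"])
    return alpha * kw + (1 - alpha) * sem


# Toy documents with made-up 3-dimensional "embeddings".
docs = [
    {"text": "annual budget report on public housing maintenance", "vector": [0.1, 0.9, 0.2]},
    {"text": "a documentary film that follows residents of public housing", "vector": [0.9, 0.1, 0.0]},
]
query = "what is the film Public Housing about"
query_vector = [1.0, 0.0, 0.0]  # stand-in for an embedding of the query

ranked = sorted(docs, key=lambda d: hybrid_score(query, query_vector, d), reverse=True)
print(ranked[0]["text"])
```

Both documents contain the words "public housing", so the keyword scores are close; in this toy setup the vector similarity is what separates them.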
Concept Search#
This feature allows you to search using a text block, rather than keywords, as your input. The text block can be very large (for example, an entire document’s worth of text).
The concept search feature evaluates an identified text block and searches for contextually similar results. You can perform a concept search either by pasting a block of text from your clipboard into the search bar or by selecting text in a document within the Squirro UI and clicking the magnifying glass icon that appears.
The screenshot below illustrates how the concept search icon appears when a block of document text is selected.
Other People Ask#
Previously referred to as People Also Searched For, this feature suggests relevant search terms based on other people’s search queries within your project.
Other People Ask goes beyond suggesting superficially similar terms, however. It looks at the meaning behind the original search term and provides semantically-related suggestions.
The feature is accessed by clicking the person icon after performing a search.
Example: In the screenshot below, the search query `arsenic water` triggered three Other People Ask suggestions. It is a useful tool for understanding what others on your project are searching for and for building project context around a search query.
Reference: For more information, see Other People Ask.
Question Answering#
This feature presents a direct answer to a user’s search question in its own widget, separate from the search results list.
Not all questions have direct answers, however. This feature will not always apply, but it will provide a direct answer to a question where possible.
The feature is accessed by clicking the dual message icon after performing a search.
Example: In the screenshot below, querying `what does FSVP stand for` and clicking the Question Answering icon triggers the widget to provide an answer highlighted in the appropriate project document. Clicking the answer inside the widget will take you into the document and highlight the associated text in-document.
Reference: For more information, see Question Answering.
Relevance Tuning#
Out of the box, Squirro provides a quality baseline document relevancy scoring model (BM25). However, Squirro also lets you fine-tune relevancy scoring to your specific project and user needs.
Relevancy tuning can be done on the document level, or as a query enrichment. The graphic below shows the relationships between data sources, the data ingestion pipeline, the data index (Elasticsearch), and the query parsing pipeline.
Reference: For more information, see Document Relevancy.
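For readers unfamiliar with BM25, the Python sketch below computes a standard BM25 score for each document in a small in-memory collection. It uses the widely used parameter values `k1=1.2` and `b=0.75` and a smoothed inverse document frequency; whether a given Squirro project uses these exact values is not stated here and should be treated as an assumption. The `bm25_scores` function and sample documents are illustrative only.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Return one BM25 score per document for a whitespace-tokenized query."""
    tokenized = [doc.lower().split() for doc in documents]
    if not tokenized:
        return []
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            tf = term_freq[term]
            denominator = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / denominator
        scores.append(score)
    return scores


docs = [
    "arsenic levels in drinking water",
    "water treatment plant opens",
    "quarterly earnings report",
]
scores = bm25_scores("arsenic water", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

The first document ranks highest because it contains both query terms, and "arsenic" is rarer in the collection than "water".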
Query Processing#
Query processing improves a user’s search experience by providing more relevant search results.
Squirro achieves this by running each user query through a customizable query-processing workflow that parses, filters, enriches, and expands the query before the search is performed and the results are presented to the user. The workflow includes the following default steps:
Language detection
Language-specific spaCy analysis
Part-of-speech booster and filter
Query modifier (enrichment)
Reference: For more information, see Query Processing.
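The workflow idea, a chain of steps that each transform the query, can be sketched in plain Python. Everything in this sketch is a simplified stand-in: language detection uses a tiny hint-word set, the part-of-speech step is replaced by a filler-word filter and a capitalization-based boost instead of spaCy analysis, and the `term^2` boost notation is an assumed Lucene-style syntax. The names `QueryContext`, `process`, and the step functions are hypothetical.

```python
from dataclasses import dataclass, field

GERMAN_HINTS = {"der", "die", "das", "und", "ist"}  # toy language hints
FILLER_WORDS = {"what", "is", "the", "a", "an", "of", "about", "does", "for"}


@dataclass
class QueryContext:
    raw: str
    language: str = "unknown"
    terms: list = field(default_factory=list)
    boosts: dict = field(default_factory=dict)
    final_query: str = ""


def detect_language(ctx):
    """Step 1: naive language detection based on a few hint words."""
    words = set(ctx.raw.lower().split())
    ctx.language = "de" if words & GERMAN_HINTS else "en"
    return ctx


def tokenize(ctx):
    """Step 2: stand-in for language-specific analysis (whitespace split only)."""
    ctx.terms = ctx.raw.split()
    return ctx


def boost_and_filter(ctx):
    """Step 3: drop filler words and boost capitalized terms (proxy for proper nouns)."""
    ctx.terms = [t for t in ctx.terms if t.lower() not in FILLER_WORDS]
    ctx.boosts = {t: 2.0 for t in ctx.terms if t[0].isupper()}
    return ctx


def modify_query(ctx):
    """Step 4: build the enriched query string from the terms and boosts."""
    parts = []
    for term in ctx.terms:
        boost = ctx.boosts.get(term)
        parts.append(f"{term}^{boost:g}" if boost else term)
    ctx.final_query = " ".join(parts)
    return ctx


def process(raw_query, steps=(detect_language, tokenize, boost_and_filter, modify_query)):
    """Run the query through each step in order and return the final context."""
    ctx = QueryContext(raw=raw_query)
    for step in steps:
        ctx = step(ctx)
    return ctx


result = process("what is the film Public Housing about")
print(result.language, "|", result.final_query)
```

Because each step takes and returns the same context object, steps can be reordered, removed, or replaced without changing the others, which is the property that makes such a workflow customizable.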
Scoring Profiles#
In Squirro Cognitive Search, relevancy can be improved using a feature with two parts: Scoring Profiles and Scoring Roles.
Scoring Profiles define how additional ranking query clauses are built.
Scoring Roles define which profiles are applied based on the current user.
You can add scoring profiles and roles whenever the default scoring algorithm based solely on text matching doesn’t meet your project needs.
Reference: For more information, see How to Use Scoring Profiles to Customize Document Relevancy Scoring.
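The relationship between profiles and roles can be illustrated with a short Python sketch in which each profile contributes an extra ranking bonus and each role lists the profiles applied to its users. The profile names, role names, bonus values, and document fields are all hypothetical, and the sketch re-ranks an in-memory list instead of building query clauses as Squirro does.

```python
# Hypothetical profiles: each returns an extra ranking bonus for a document.
SCORING_PROFILES = {
    "recent_first": lambda doc: 1.0 if doc["year"] >= 2023 else 0.0,
    "prefer_research": lambda doc: 0.5 if doc["source"] == "research" else 0.0,
}

# Hypothetical roles: which profiles apply to which kind of user.
SCORING_ROLES = {
    "analyst": ["recent_first", "prefer_research"],
    "default": [],
}


def rank(documents, user_role):
    """Sort documents by text score plus the bonuses of the role's profiles."""
    profiles = SCORING_ROLES.get(user_role, SCORING_ROLES["default"])

    def total_score(doc):
        bonus = sum(SCORING_PROFILES[name](doc) for name in profiles)
        return doc["text_score"] + bonus

    return sorted(documents, key=total_score, reverse=True)


docs = [
    {"id": "A", "text_score": 2.0, "year": 2020, "source": "news"},
    {"id": "B", "text_score": 1.2, "year": 2024, "source": "research"},
]

print([d["id"] for d in rank(docs, "default")])
print([d["id"] for d in rank(docs, "analyst")])
```

With the default role, ranking follows the text score alone; with the analyst role, the two profiles lift the newer research document to the top.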
Troubleshooting#
If you encounter any issues using Squirro Cognitive Search, you can browse these docs or contact the friendly folks at Squirro Support.