Contents

  • Introduction
  • What Is Cognitive Search?
  • Connectivity and Scalability
  • Security
  • Features
    • Basic Features
      • Keyword Search
      • Federated Search
      • In-Document Search
      • Synonym Search
      • Labeled Search
      • Combined Keyword and Labeled Search
      • PDF Text Recognition
      • Communities
      • Boolean Operators
      • Advanced Query Syntax
      • Spell Checking
      • Typeahead
      • Search Chips
      • Timeline
    • Advanced Features
      • Semantic Search
      • Concept Search
      • Other People Ask
      • Question Answering
      • Relevance Tuning
        • Query Processing
        • Scoring Profiles
  • Troubleshooting

Search

Introduction

Accessing the correct information quickly is increasingly difficult in a world filled with siloed data. End users can’t search all of their connected data sources easily, and their search results aren’t always presented in intuitive, relevant ways.

Squirro Cognitive Search addresses these problems by using AI and machine learning (ML) technologies to overcome the shortcomings of traditional enterprise search (sometimes called workplace search).

What Is Cognitive Search?

Cognitive Search is a smarter, better, more personalized version of traditional enterprise search.

Its goal is to help people find the best, most relevant information across all of their connected data sources, and to present that information as quickly and intuitively as possible.

The table below outlines some of the important differences between traditional enterprise search and Squirro Cognitive Search:

| Traditional Enterprise Search | Squirro Cognitive Search |
| --- | --- |
| Limited to keyword only search. No ability to search based upon context. | Uses semantic (vector) search combined with keyword search to provide more relevant results. |
| Doesn’t seek to understand user intent. Relies solely on keyword-matching strategies. | Seeks to understand and interpret user queries, providing more accurate and relevant results. |
| Limited ability to analyze or interpret unstructured data. | Analyzes and interprets all forms of unstructured data, including images and video transcriptions. |
| Has static, limited functionalities, with no ability to self-learn. | Learns and adapts over time, improving its accuracy and relevance the more it is used. |
| Cannot be fine-tuned or heavily customized by end users. | Highly customizable and configurable, both in terms of underlying query processing and visualizations. |

Connectivity and Scalability

It’s important to remember that the quality of your search results is directly linked to the quality of your underlying data: the better and more comprehensive the underlying data, the better and more comprehensive the search.

Squirro Cognitive Search is built on top of the Squirro Insight Engine and its underlying technologies. At the heart of these technologies are two foundational features: connectivity and scalability.

Squirro makes it easy to connect to almost any data source and to scale quickly to meet the needs of organizations with large volumes of connected data (for example, millions of documents).

| Traditional Enterprise Search | Squirro Cognitive Search |
| --- | --- |
| Limited ability to connect to multiple data sources. Often requires command line tools to do so. | Built to connect to almost any type of structured or unstructured data source using an intuitive browser UI. |
| Doesn’t scale easily, especially when ingesting large volumes of data. | Scales quickly and easily using Elasticsearch. |

Security

Squirro is fully ISO 27001 compliant and certified. The ISO 27001 standard ensures your information assets are stored and handled securely.

The Squirro team trains regularly on security and safety issues. In addition, a third-party provider, Cobalt.io, performs application penetration tests to help keep Squirro safe and secure.

For more information, see the ISO/IEC 27001 Information Security Management Standard.

Squirro’s official certificate is available upon request.

Squirro also provides access-control list security by way of query templates, which are appended to incoming Squirro queries to implement access-control logic. To learn more, see Query Templates.
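The idea of appending access-control logic to an incoming query can be sketched as follows. This is only an illustration of the principle, not Squirro's query template mechanism: the function name `apply_access_filter`, the label name `acl_group`, and the `label:"value"` filter notation are all assumptions made for the example.

```python
def apply_access_filter(user_query, allowed_groups, label="acl_group"):
    """Wrap the user's query and AND it with a group restriction.

    `label` is a hypothetical label name that marks which group may see an item.
    """
    if not allowed_groups:
        # Refuse to build a query that would silently be unrestricted.
        raise ValueError("user has no groups; cannot build a restricted query")
    clause = " OR ".join(f'{label}:"{group}"' for group in allowed_groups)
    query = user_query.strip()
    # An empty user query becomes the restriction on its own.
    return f"({query}) AND ({clause})" if query else clause


restricted = apply_access_filter("merger news", ["sales", "emea"])
print(restricted)
# (merger news) AND (acl_group:"sales" OR acl_group:"emea")
```

Wrapping the user's query in parentheses keeps any operators the user typed from changing the scope of the appended restriction.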

Reference: Squirro also takes GDPR compliance seriously. See Squirro’s Privacy Policy.

Features

Cognitive Search is not a single feature; it’s a collection of many features that Squirro engineers, developers, and data scientists are continually adding to and improving.

It’s also highly customizable and scalable, meaning that you can select and optimize the features that best fit your organization’s needs.

The following sections provide an overview of Squirro’s basic and advanced search features.

Basic Features

Keyword Search

Keyword search is the first building block of Cognitive Search. When a user enters a search query in the global search bar, Squirro looks for matching terms within project documents and items.

Keyword search example

Reference: For a detailed technical look at how Squirro’s default term matching configuration works, see How To Handle User Query Terms Correctly.

Note: Keyword search on its own has many limitations. The value of Cognitive Search is in the numerous ways Squirro enhances simple keyword search using AI and ML techniques.

Note: By default, Squirro uses a combination of keyword and semantic search when users enter search queries in the global search bar. See the Semantic Search section later on this page for more information about semantic search.

Federated Search

Also known as “Multi-Source Search”, this is a core feature that differentiates Squirro from other Enterprise Search solutions.

Squirro connects to almost any structured or unstructured data source through its UI, with many one-click data connectors available out of the box.

Keyword search example

When you run a search in Squirro Cognitive Search, you search all connected data sources and all item types (unless you’ve chosen to exclude certain groups). This allows you to simultaneously search PDFs, Excel sheets, recorded calls, videos, images, and any other connected data.

Reference: For more information, see Data Loading.

In-Document Search

Squirro helps unlock your data by making all parts of your connected documents searchable, no matter the format.

You’ll also have the option to highlight relevant text within documents in many situations, making it easy to quickly locate relevant information. This applies not just to search queries, but also to ML-generated insights.

In the example below, a document triggered project insights relating to Bankruptcy and Equity Raised signals. The relevant text sections are highlighted within the document text.

Insight highlighting example

Synonym Search

Synonym lists can be powerful search tools, especially within projects containing many industry-specific terms or jargon.

Your project creator can upload synonym lists to define relationships that would otherwise not be obvious or easily discoverable through ML techniques.

Example: For a project that indexes less formal financial news and blog sites, the project creator may want to define synonyms such as ankle biter => small cap investment; scrip issue, free share issue => bonus issue; and Cable => USD/GBP.

You can also instruct Squirro to selectively apply synonym lists to specific groups of documents.
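To make the effect of a synonym list concrete, here is a minimal sketch of how rules written in the arrow notation from the example above could be parsed and applied to a query. It is not Squirro's implementation or upload format: the function names `parse_synonym_rules` and `expand_query` are hypothetical, and the parsing rules are assumptions.

```python
import re


def parse_synonym_rules(lines):
    """Parse 'a, b => c' rules into a dict mapping each left-hand term to c."""
    mapping = {}
    for line in lines:
        left, arrow, right = line.partition("=>")
        if not arrow:
            continue  # skip lines that are not rules
        target = right.strip()
        for term in left.split(","):
            term = term.strip().lower()
            if term:
                mapping[term] = target
    return mapping


def expand_query(query, mapping):
    """Return the original query followed by the synonym targets it triggers."""
    lowered = query.lower()
    hits = [
        target
        for term, target in mapping.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]
    # dict.fromkeys removes duplicates while keeping the first-seen order.
    return [query] + list(dict.fromkeys(hits))


rules = [
    "ankle biter => small cap investment",
    "scrip issue, free share issue => bonus issue",
    "Cable => USD/GBP",
]
mapping = parse_synonym_rules(rules)
print(expand_query("Cable outlook after scrip issue", mapping))
# ['Cable outlook after scrip issue', 'bonus issue', 'USD/GBP']
```

Matching on word boundaries avoids triggering a rule when a term only appears inside a longer word.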

Labeled Search

Squirro applies labels to documents to help categorize them and allow for advanced searching and visualization.

Labels can be any one of the following types:

  • String

  • Integer or float

  • Datetime

  • Geo_point

Labels allow you to group, sort, and visualize documents by a predefined category such as language; aggregate document data in tables; sort and filter documents by the date and time they were created or read; and visualize documents by geographic location.

Reference: For more information, see Labels.

When viewing search results, you can click the label icon in the bottom left corner of the search result to view all labels associated with that document, as shown in the example screenshot below:

Document label popover

Note: Labels specifically related to the applied search query are highlighted in blue.

Combined Keyword and Labeled Search

Combining keyword and labeled search is a powerful means of improving search precision.

You can use labels to narrow your search down to a specific subset of documents. In the example below, the project has different label categories, including Company, Country, Language, Sector, and Sources.

Example: By selecting the Company label Twitter, an end user can filter their search to only return results that contain references to the company Twitter, as shown in the following screenshot:

https://s3.amazonaws.com/download.squirro.net/docs/technical/products/label-search.png
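The narrowing effect described above can be illustrated with a small, self-contained sketch that filters an in-memory list of items first by keyword and then by label. The sample items, the `keywords` dictionary used to hold labels, and the `search` function are assumptions made for illustration; in Squirro the filtering happens in the search backend.

```python
# Hypothetical sample items; labels are stored as name -> list of values.
items = [
    {"title": "Twitter reports quarterly earnings",
     "keywords": {"Company": ["Twitter"], "Country": ["US"]}},
    {"title": "Earnings season preview",
     "keywords": {"Company": ["Meta"], "Country": ["US"]}},
    {"title": "Twitter launches new feature",
     "keywords": {"Company": ["Twitter"]}},
]


def search(items, keyword, label_filters):
    """Return items whose title contains the keyword and that carry every label."""
    keyword = keyword.lower()
    results = []
    for item in items:
        if keyword not in item["title"].lower():
            continue
        labels = item.get("keywords", {})
        if all(value in labels.get(name, [])
               for name, value in label_filters.items()):
            results.append(item)
    return results


keyword_only = search(items, "earnings", {})
combined = search(items, "earnings", {"Company": "Twitter"})
print(len(keyword_only), len(combined))
# 2 1
```

The keyword alone returns two items; adding the Company label narrows the result to the single item that mentions Twitter.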

PDF Text Recognition

During document ingestion from your connected data sources, Squirro can run optical character recognition (OCR) on all ingested PDF document images to make them easily searchable. This is particularly relevant for scanned PDF documents that would otherwise not be searchable.

You also have the option to force OCR even if some preexisting text is detected. The example below illustrates a situation where this becomes relevant.

Example: Some PDF documents containing text also contain tables that are not in text format. Forcing OCR converts those tables to searchable text as well, meaning that users can search that information from the global search bar. Without OCR, the tables would need to be manually reviewed, likely resulting in that data being overlooked.

Project creators can add this as a step within the default pipeline workflow in the UI Pipeline Editor as shown in the example screenshot below.

Adding PDF OCR to the Squirro default pipeline workflow

Communities

Communities is a feature that allows end users to personalize their Squirro experience by following topics of interest to them.

Each community belongs to a Community Type, which the project creator can create, edit, and delete.

After initial configuration by the project creator, project documents associated with communities are then displayed in a feed to users.

Once users have followed a community, they’ll see options to dive into a community page for relevant results and see personalized document feeds with new community content as new documents are added to a project.

Example: In the image shown below, Food Products is a community type containing communities like Alcoholic Beverages and Bakery Products that users can follow to receive updates and prioritized search results.

https://s3.amazonaws.com/download.squirro.net/docs/technical/communities/communities.png

Reference: For more information, see Communities.

Boolean Operators

You can refine search results by combining and filtering search terms with boolean operators in the search bar.

The primary boolean operators are AND, NOT, and OR.

Example: A search query in a typical project for the keyword Switzerland is likely to return multiple results relating to Switzerland’s largest city Zurich. If you want to exclude results relating to Zurich, you can search for Switzerland NOT Zurich instead.
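The Switzerland example can be reproduced with a toy evaluator. This is a deliberately simplified sketch: it handles only flat queries evaluated left to right, with no grouping or operator precedence, and it is not how Squirro parses queries. The `matches` function and the sample documents are hypothetical.

```python
import re


def matches(text, query):
    """Evaluate a flat boolean query (terms joined by AND, OR, NOT) left to right."""
    words = set(re.findall(r"\w+", text.lower()))
    tokens = query.split()
    if not tokens:
        return True  # an empty query matches everything
    result = tokens[0].lower() in words
    # Walk the remaining tokens as (operator, term) pairs.
    for op, term in zip(tokens[1::2], tokens[2::2]):
        present = term.lower() in words
        if op == "AND":
            result = result and present
        elif op == "OR":
            result = result or present
        elif op == "NOT":
            result = result and not present
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


docs = [
    "Switzerland's largest city is Zurich",
    "Switzerland is known for the Alps",
    "Austria borders Switzerland",
]
filtered = [d for d in docs if matches(d, "Switzerland NOT Zurich")]
print(filtered)
# ['Switzerland is known for the Alps', 'Austria borders Switzerland']
```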

Advanced Query Syntax

In addition to boolean search, you can use advanced query techniques to further refine searches within the search bar.

There are several techniques available, including:

  • Grouping

  • Phrase Search

  • Wildcard Search

  • Boosting

  • Sorting

See Query Syntax for further details, or download the Advanced Query Syntax Cheat Sheet shown in the image below:

Image of the Advanced Query Syntax Cheat Sheet

Spell Checking

Squirro provides a spell checking widget out of the box that identifies misspelled search queries. The correct spellings of those queries are then suggested as new (clickable) search terms in the spell check widget.

Clicking the suggested term will re-run the search with the correct spelling.

Example spell check return of the proper spelling of the word salmonella.
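The suggestion behavior described above can be approximated with Python's standard `difflib` module. This sketch only shows the general idea of proposing a corrected query from a known vocabulary; the vocabulary, the `suggest_spelling` function, and the similarity cutoff are assumptions, and Squirro's spell checking may work differently.

```python
import difflib


def suggest_spelling(query, vocabulary, cutoff=0.8):
    """Return a corrected query, or None when no word needs correcting."""
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Pick the single closest vocabulary entry above the cutoff, if any.
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if close:
            corrected.append(close[0])
            changed = True
        else:
            corrected.append(word)
    return " ".join(corrected) if changed else None


# Hypothetical vocabulary, for example terms found in indexed documents.
vocabulary = ["salmonella", "listeria", "allergen", "recall"]
suggestion = suggest_spelling("salmonela recall", vocabulary)
print(suggestion)
# salmonella recall
```

Returning `None` when nothing changed lets a caller decide whether to show a suggestion at all.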

Typeahead

This feature presents search suggestions to you in real time as you type within the search bar.

Example search-as-you-type examples showing multiple different types of suggestions.

Depending on your project setup and specific search query, two types of Typeahead suggestions may appear:

  • Document-generated suggestions based on document content, document labels, or associated communities.

  • Activity-generated suggestions based on the specific user’s saved searches and search history, or other popular queries across their project.

Reference: For more information, see Typeahead Suggestions.
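The two suggestion sources listed above can be sketched as a simple prefix match over two candidate lists. The function `typeahead`, the sample data, and the choice to rank activity-generated suggestions ahead of document-generated ones are all assumptions made for this illustration.

```python
def typeahead(prefix, document_terms, user_history, limit=5):
    """Return (suggestion, source) pairs whose text starts with the prefix."""
    prefix = prefix.lower()
    seen = set()
    suggestions = []
    sources = (("activity", user_history), ("document", document_terms))
    for source, candidates in sources:
        for candidate in candidates:
            key = candidate.lower()
            # Skip non-matching candidates and duplicates across sources.
            if key.startswith(prefix) and key not in seen:
                seen.add(key)
                suggestions.append((candidate, source))
    return suggestions[:limit]


document_terms = ["allergen labeling", "alcoholic beverages", "allergy"]
user_history = ["allergy recall 2023", "allergy"]
for text, source in typeahead("all", document_terms, user_history):
    print(f"{text} ({source})")
# allergy recall 2023 (activity)
# allergy (activity)
# allergen labeling (document)
```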

Search Chips

To help users understand how Squirro processes their search queries, Squirro converts search queries into search chips.

In the following examples, boolean operators are used to search food allergy or allergen-related content that excludes peanuts or nuts from the search results.

In the example image below, the query is ready to be submitted and no chips are visible:

Search query no chips

After submitting the search query, the query is broken into chips, as shown in the example below:

Search query with chips

Reference: To learn more about how to use search chips, see Search Bar Chips.

Timeline#

You can visualize project search results using the out-of-the-box Timeline widget. In most projects, this is accessed by clicking the chart icon that appears in the upper-right corner of your search results.

The feature allows you to visualize results over time and narrow your search results to specific periods.

Viewing results by time using the Timeline widget

Advanced Features#

Semantic Search#

Semantic search is a powerful feature that allows you to search for documents based on their meaning, rather than just keywords. It’s also known as vector search and is the default search method in Squirro Cognitive Search.

To illustrate the power of semantic search, the screenshot below contrasts the results for the question "what is the film Public Housing about" when using semantic search and traditional keyword search.

Example Query

Squirro’s semantic search understands the natural language meaning behind the query, and when the query is phrased in the form of a question, it highlights the answer to the question (if found within the project’s connected documents).

Internally, Squirro uses a technique called hybrid search to deliver semantic search results using Scoring Profile technology.
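
As a rough illustration of the general idea behind hybrid search, the sketch below blends a keyword relevance score with a vector (cosine) similarity. This page does not describe how Squirro combines the two signals, so the linear blend, the `alpha` weight, and the function names are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(keyword_score, query_vec, doc_vec, alpha=0.5):
    """Blend a keyword score (assumed normalized to [0, 1]) with vector similarity.

    `alpha` is a hypothetical weight: 1.0 means keyword only, 0.0 means vector only.
    """
    semantic = cosine_similarity(query_vec, doc_vec)
    return alpha * keyword_score + (1 - alpha) * semantic


# A document with no keyword overlap can still rank if its embedding is close.
print(hybrid_score(0.0, [0.6, 0.8], [0.6, 0.8], alpha=0.5))
```

With this kind of blend, a document that shares no keywords with the query can still receive a non-zero score when its embedding is close to the query's embedding.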

Reference: To learn more, see Semantic and Hybrid Search.

Concept Search#

This feature allows you to search using a text block, rather than keywords, as your input. The text block can be very large (for example, an entire document’s worth of text).

The concept search feature evaluates an identified text block and searches for contextually similar results. You can perform a concept search either by pasting a block of text from your clipboard into the search bar or by selecting text in a document within the Squirro UI and clicking the magnifying glass icon that appears.

The screenshot below illustrates how the concept search icon appears when a block of document text is selected.

Concept search icon as it appears when a text block is selected.

Other People Ask#

Previously referred to as People Also Searched For, this feature suggests relevant search terms based on other people’s search queries within your project.

Other People Ask goes beyond suggesting superficially similar terms, however. It looks at the meaning behind the original search term and provides semantically-related suggestions.

The feature is accessed by clicking the person icon after performing a search.

Example: In the screenshot below, the search query "arsenic water" triggered three Other People Ask suggestions. The feature is useful for understanding what others on your project are searching for and for building project context around a search query.

Other People Ask example query

Reference: For more information, see Other People Ask.

Question Answering#

This feature presents a direct answer to a user’s search question in its own widget, separate from the search results list.

Not all questions have direct answers, so the feature does not always apply. Where possible, it provides a direct answer to the question.

The feature is accessed by clicking the dual message icon after performing a search.

Example: In the screenshot below, querying "what does FSVP stand for" and clicking the Question Answering icon triggers the widget to provide an answer highlighted in the appropriate project document. Clicking the answer inside the widget takes you into the document and highlights the associated text there.

Question Answering example query

Reference: For more information, see Question Answering.

Relevance Tuning#

Out of the box, Squirro provides a quality baseline document relevancy scoring model (BM25). However, Squirro also lets you fine-tune relevancy scoring to your specific project and user needs.
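
For readers unfamiliar with BM25, the following minimal Python sketch scores a few documents against a query using the standard Okapi BM25 formula with the Lucene-style IDF. It is a teaching example, not Squirro's or Elasticsearch's code. The whitespace tokenizer and sample documents are simplifications, and `k1=1.2` and `b=0.75` are the commonly used defaults, assumed here rather than taken from this page.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Return one BM25 score per document for a whitespace-tokenized query."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1; b normalizes for document length.
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


# Hypothetical sample documents for illustration only.
docs = [
    "arsenic in drinking water",
    "water quality report",
    "salmonella outbreak",
]
scores = bm25_scores("arsenic water", docs)
print(scores)
```

The first document matches both query terms and scores highest, the second matches only the more common term, and the third matches nothing and scores zero.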

Relevancy tuning can be done on the document level, or as a query enrichment. The graphic below shows the relationships between data sources, the data ingestion pipeline, the data index (Elasticsearch), and the query parsing pipeline.

Data sources, data ingestion pipeline, and query parsing pipeline

Reference: For more information, see Document Relevancy.

Query Processing#

Query processing improves a user’s search experience by providing more relevant search results.

Squirro achieves this improvement by running each user query through a customizable query-processing workflow. The workflow parses, filters, enriches, and expands the query before the actual search is performed and the results are presented to the user. It includes the following default steps:

  • Language detection

  • Language-specific spaCy analysis

  • Part-of-speech booster and filter

  • Query modifier (enrichment)
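
The default steps listed above can be pictured as a chain of functions that each read and update a shared query state. The Python sketch below is a simplified stand-in, not Squirro's workflow: every function name, the fixed `"en"` language, the whitespace tokenizer used in place of spaCy, and the stopword list used in place of part-of-speech filtering are assumptions made for illustration.

```python
# Hypothetical stopword list standing in for part-of-speech filtering.
STOPWORDS = {"what", "is", "the", "a", "an", "of", "about"}


def detect_language(state):
    # Stand-in: a real step would run a language-detection model.
    state["language"] = "en"
    return state


def analyze(state):
    # Stand-in for language-specific spaCy analysis: plain whitespace tokens.
    state["tokens"] = state["query"].split()
    return state


def filter_tokens(state):
    # Stand-in for the part-of-speech booster and filter: drop stopwords.
    state["tokens"] = [t for t in state["tokens"] if t.lower() not in STOPWORDS]
    return state


def modify_query(state):
    # Stand-in for query enrichment: rebuild the query from the kept tokens.
    state["final_query"] = " ".join(state["tokens"])
    return state


def run_pipeline(query, steps):
    """Pass the query state through each step in order."""
    state = {"query": query}
    for step in steps:
        state = step(state)
    return state


DEFAULT_STEPS = [detect_language, analyze, filter_tokens, modify_query]
result = run_pipeline("what is the film Public Housing about", DEFAULT_STEPS)
print(result["final_query"])
```

Keeping each step as an independent function that takes and returns the state mirrors the idea of a customizable workflow: steps can be reordered, removed, or replaced without touching the others.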

Reference: For more information, see Query Processing.

Scoring Profiles#

In Squirro Cognitive Search, relevancy can be improved using a feature with two parts: Scoring Profiles and Scoring Roles.

  • Scoring Profiles define how additional ranking query clauses are built.

  • Scoring Roles define what profiles are applied based on the current user.

You can add scoring profiles and roles whenever the default scoring algorithm based solely on text matching doesn’t meet your project needs.

Reference: For more information, see How to Use Scoring Profiles to Customize Document Relevancy Scoring.

Troubleshooting#

If you encounter any issues using Squirro Cognitive Search, you can browse these docs or contact the friendly folks at Squirro Support.

By Squirro AG

© Copyright 2025, Squirro AG.

Last updated on Jan 15, 2025.