System Requirements#

This page lists the requirements your environment must meet before installing Squirro. The requirements apply to all deployment types: on-premises and cloud.

The recommended installation method is Ansible, which automates the setup of most components. For installation instructions, see the Install and Manage Squirro with Ansible page.

Hardware#

System sizing requirements for disk space, CPU, and disk I/O are determined by the projected data volume and the number of concurrent users.

For detailed sizing guidance tailored to your deployment, contact Squirro Support and submit a technical support request.

Minimum requirements#

The following minimum requirements apply to proof-of-concept (PoC) and development installations:

  • Processor: Intel Core i5 with 4 cores (x86_64) or higher, with AVX2 extensions.

  • Memory: 16 GB or higher.

  • Storage: 100 GB or higher, with a minimum read/write speed of 100 MB/s.
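The minimums above can be checked on a candidate host before installation. The following is a minimal sketch, not a Squirro tool: the function names are hypothetical, it assumes a Linux host that exposes /proc/cpuinfo, and it does not measure disk read/write speed.

```python
import os
import shutil

# Minimums copied from the list above (PoC and development installations).
MIN_CORES = 4
MIN_MEMORY_GB = 16
MIN_DISK_GB = 100


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line of /proc/cpuinfo content lists avx2."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = line.partition(":")[2].split()
            if "avx2" in flags:
                return True
    return False


def check_minimums(cores: int, memory_gb: float, disk_gb: float, avx2: bool) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"CPU: {cores} cores found, {MIN_CORES} required")
    if not avx2:
        problems.append("CPU: AVX2 extensions not reported")
    if memory_gb < MIN_MEMORY_GB:
        problems.append(f"Memory: {memory_gb:.1f} GB found, {MIN_MEMORY_GB} GB required")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Storage: {disk_gb:.1f} GB found, {MIN_DISK_GB} GB required")
    return problems


def check_local_host(data_path: str = "/") -> list:
    """Collect values from the local Linux host and run the checks."""
    with open("/proc/cpuinfo") as handle:
        avx2 = has_avx2(handle.read())
    # os.cpu_count() counts logical CPUs, which can exceed physical cores.
    cores = os.cpu_count() or 0
    memory_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    disk_gb = shutil.disk_usage(data_path).total / 1024**3
    return check_minimums(cores, memory_gb, disk_gb, avx2)
```

Calling `check_local_host()` returns an empty list when all checks pass. A host with 16 GB of installed memory usually reports slightly less to the operating system, so treat a small memory shortfall as a prompt to look closer rather than a hard failure.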

Shared Storage#

In a multi-node cluster deployment, Squirro requires shared storage that is mounted on all nodes. The default mount path is /var/lib/squirro/storage. Single-node deployments use local storage and do not require shared storage.

The volume of data stored depends on your workload:

  • Binary document ingestion

    Office and PDF files indexed by the platform are stored in full, along with associated assets. Storage requirements grow in proportion to the volume of binary documents ingested.

  • Pipeline plugins (pipelets)

    Custom pipeline plugins deployed to the cluster are stored and replicated across nodes. Storage requirements for pipelets are typically low.

  • Web interface file uploads

    Excel and CSV files uploaded via the data loader are temporarily stored during ingestion. Storage requirements remain low as the files are removed after processing.

  • Data ingestion

    Batch processing queues are written to the shared storage during all data ingestion, including structured data. Storage requirements remain low when no binary documents are ingested.

  • NLP pipeline workloads

    When NLP pipeline steps are active, machine learning model data is cached on the shared storage. Storage requirements depend on the models deployed.

For guidance on technology selection and performance sizing for your environment, visit the Squirro Support website and submit a technical support request.

Elasticsearch Storage#

Use locally attached SSD or NVMe storage for the Elasticsearch data directory. Network-attached storage with high latency (for example, software-defined storage or NAS solutions) causes significant performance degradation for Elasticsearch. Cloud block storage such as AWS EBS (gp3) or Azure Managed Disks is acceptable.

Operating System#

Squirro runs on Linux. RPM packages are provided for Red Hat Enterprise Linux (RHEL) and its open-source derivative Rocky Linux.

| Distribution | Versions | Architecture |
| --- | --- | --- |
| Red Hat Enterprise Linux | 8 and 9 | x86_64 |
| Rocky Linux | 8 and 9 | x86_64 |

RHEL installations require an active Red Hat subscription so that YUM can install packages from the default distribution repositories. Installation also requires credentials for the Squirro YUM repository at mirror.squirro.net. Contact Squirro Support to obtain your credentials before starting the installation.
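The supported distributions, versions, and architecture listed above can be verified from the standard /etc/os-release file. This is an illustrative sketch with hypothetical function names. It assumes the `ID` values `rhel` and `rocky`, which those distributions report in that file.

```python
# Supported distribution IDs and major versions, taken from the table above.
SUPPORTED = {"rhel": {"8", "9"}, "rocky": {"8", "9"}}


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines from /etc/os-release content into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key] = value.strip().strip('"').strip("'")
    return values


def is_supported(os_release: dict, machine: str) -> bool:
    """Return True for a supported distribution, major version, and CPU architecture."""
    major = os_release.get("VERSION_ID", "").split(".")[0]
    supported_versions = SUPPORTED.get(os_release.get("ID", ""), set())
    return machine == "x86_64" and major in supported_versions
```

On a target host, pass the file content and the value of `platform.machine()`, for example `is_supported(parse_os_release(open("/etc/os-release").read()), platform.machine())`.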

Java#

Squirro requires Java 21. The search engine (Elasticsearch or OpenSearch) and ZooKeeper both run on the JVM, so the java-21-openjdk package must be installed on every node before the Squirro packages are installed. When using the Ansible role, Java is installed automatically.
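For manual installations, the Java requirement can be confirmed by inspecting the output of `java -version`. The sketch below is an assumption-labeled helper, not part of Squirro. The function names are hypothetical, and it relies on the JVM printing its version string in the usual `version "21.0.2"` form.

```python
import re
import subprocess
from typing import Optional

REQUIRED_JAVA_MAJOR = 21


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major version from `java -version` output, or None if absent."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_392).
    if major == 1 and match.group(2):
        return int(match.group(2))
    return major


def installed_java_major() -> Optional[int]:
    """Run `java -version` on this host and return the major version, if any."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    # The JVM writes its version banner to stderr.
    return java_major_version(result.stderr + result.stdout)
```

A node meets the requirement when `installed_java_major() == REQUIRED_JAVA_MAJOR`.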

Network Requirements#

Firewall ports#

Open the following TCP ports before installation. For each port, the Open for entry lists the hosts that need access.

Storage nodes#

| TCP port | Usage | Open for |
| --- | --- | --- |
| 9200 | Search engine access (Elasticsearch or OpenSearch). | All storage and Squirro nodes. Restrict to cluster nodes only. |
| 9300 | Search engine cluster communication (Elasticsearch or OpenSearch). | All storage nodes. Restrict to cluster nodes only. |

Squirro nodes#

| TCP port | Usage | Open for |
| --- | --- | --- |
| 80 | Web access (HTTP, redirects to HTTPS). | All Squirro nodes. Do not expose to clients in production. Nginx redirects HTTP to HTTPS. |
| 111 | Distributed file system. | All Squirro nodes. |
| 443 | Web access (HTTPS). | All clients. |
| 2181 | ZooKeeper. | All Squirro nodes (multi-node setups only). |
| 2409 | Distributed file system. | All Squirro nodes (multi-node setups only). |
| 2888 | ZooKeeper node discovery. | All Squirro nodes (multi-node setups only). |
| 3306 | MySQL/MariaDB. | All Squirro nodes. |
| 5432 | PostgreSQL (when used instead of MariaDB). | All Squirro nodes. |
| 3888 | ZooKeeper node discovery. | All Squirro nodes (multi-node setups only). |
| 6379 | Redis (storage). | All Squirro nodes. Restrict to cluster nodes only. |
| 6380 | Redis (cache). | All Squirro nodes. Restrict to cluster nodes only. |
| 24007 | Network file system. | All Squirro nodes (multi-node setups only). |
| 25 | SMTP (email notifications, outbound). | SMTP server. |
| 587 | SMTP with STARTTLS (email notifications, outbound). | SMTP server. |

Port numbers may vary depending on the cloud provider or additional software components. Contact Squirro Support for guidance on custom deployments.

Elasticsearch (ports 9200 and 9300) and Redis (ports 6379 and 6380) must be reachable only from within the Squirro cluster. Restrict these ports at the network or security-group level and do not expose them to client or public networks. Squirro configures password authentication for both Redis instances, but network isolation is the primary defense layer.
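One way to validate the firewall rules is to attempt TCP connections from different network locations. The sketch below uses hypothetical helper names and assumes plain TCP reachability is a sufficient signal. Run from a cluster node, the internal ports should be reachable. Run from a client network, they should not.

```python
import socket

# Ports that must stay inside the cluster, per the paragraph above.
INTERNAL_ONLY_PORTS = (9200, 9300, 6379, 6380)


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_ports(host: str, ports, timeout: float = 2.0) -> list:
    """Return the subset of ports that could not be reached on host."""
    return [port for port in ports if not is_port_open(host, port, timeout)]
```

From a client machine, `unreachable_ports("storage-node.example.com", INTERNAL_ONLY_PORTS)` should return all four ports, where the hostname is a placeholder for one of your nodes. A closed port and a port filtered by a firewall both count as unreachable here.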

Outbound internet access#

The server requires outbound internet access on TCP ports 80 and 443 to reach the following domains during installation and upgrade:

# Operating system packages (Rocky Linux)
https://*.rockylinux.org

# Operating system packages (Red Hat Enterprise Linux)
https://*.redhat.com

# Squirro RPM packages
https://mirror.squirro.net

# Squirro container images
https://registry.squirro.com

# Ansible roles and modules
https://galaxy.ansible.com
https://ansible-galaxy-ng.s3.dualstack.us-east-1.amazonaws.com

# Python packages
https://pypi.org
https://bootstrap.pypa.io
https://files.pythonhosted.org

# Source code and dependencies
https://github.com
https://*.github.com

For air-gapped environments that cannot reach the internet, offline installation packages are available from the Squirro mirror. For more information, see the Install and Manage Squirro with Ansible page.
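Before an online installation, outbound access to the domains listed above can be probed from the server. The following sketch is illustrative only and its function names are hypothetical. Wildcard entries such as `*.redhat.com` cannot be probed directly, so the sketch separates them out for review against your proxy or firewall allow-list.

```python
import socket
from urllib.parse import urlsplit

# URLs copied from the list above.
REQUIRED_URLS = [
    "https://*.rockylinux.org",
    "https://*.redhat.com",
    "https://mirror.squirro.net",
    "https://registry.squirro.com",
    "https://galaxy.ansible.com",
    "https://ansible-galaxy-ng.s3.dualstack.us-east-1.amazonaws.com",
    "https://pypi.org",
    "https://bootstrap.pypa.io",
    "https://files.pythonhosted.org",
    "https://github.com",
    "https://*.github.com",
]


def split_targets(urls):
    """Split URLs into concrete hostnames and wildcard patterns."""
    hosts, wildcards = [], []
    for url in urls:
        host = urlsplit(url).hostname or ""
        (wildcards if "*" in host else hosts).append(host)
    return hosts, wildcards


def failed_hosts(hosts, port=443, timeout=5.0, connect=socket.create_connection):
    """Return the hosts for which a TCP connection on the given port fails."""
    failed = []
    for host in hosts:
        try:
            connect((host, port), timeout=timeout).close()
        except OSError:
            failed.append(host)
    return failed
```

Typical usage is `hosts, wildcards = split_targets(REQUIRED_URLS)` followed by `failed_hosts(hosts)`. The `connect` parameter exists only so the probe can be replaced in tests.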

Database#

Squirro uses MariaDB (MySQL-compatible) as its default relational database. The Ansible installation role installs and configures MariaDB automatically.

PostgreSQL is supported as an alternative to MariaDB. For setup instructions, see the Configure PostgreSQL as the Database Backend page. For version-specific compatibility questions, visit the Squirro Support website and submit a technical support request.

Search Engine#

Squirro uses Elasticsearch 9.x as its default search and indexing engine. The Ansible installation role installs and configures Elasticsearch automatically.

OpenSearch is supported as a technical preview alternative. Technical preview means the feature is functional but not yet recommended for production deployments. Support coverage is limited, and behavior may change in future releases.

Elasticsearch requires AVX2 CPU extensions. Verify that your hardware meets this requirement before installation. The minimum processor specification in the Hardware section already includes AVX2.

Elasticsearch allocates its JVM heap from system memory. For optimal performance, plan for the heap to take 50% of RAM, up to a maximum heap of 32 GB. On a 64 GB server, for example, a 32 GB heap leaves the other 32 GB for the operating system file cache and the remaining Squirro services.
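The sizing rule above reduces to simple arithmetic. The sketch below restates it with a hypothetical helper and does not read or write any Squirro or Elasticsearch configuration.

```python
# Heap cap stated in the paragraph above.
MAX_HEAP_GB = 32


def planned_heap_gb(total_ram_gb: float) -> float:
    """Return half of the server RAM, capped at the 32 GB maximum heap."""
    return min(total_ram_gb * 0.5, MAX_HEAP_GB)


def remaining_memory_gb(total_ram_gb: float) -> float:
    """Return the memory left for the OS and other services after the heap."""
    return total_ram_gb - planned_heap_gb(total_ram_gb)
```

For a 64 GB server, `planned_heap_gb(64)` gives 32 and `remaining_memory_gb(64)` gives 32. For a 128 GB server the heap stays at 32 GB because of the cap, which leaves 96 GB for everything else.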

Container Runtime#

Some Squirro services run inside containers. The following services are provided as containers only:

  • GenAI.

The following services can optionally run as containers:

  • Plumber.

  • Transformers.

The squirro-python311-squirro.service.genai RPM package does not declare an explicit dependency on a container runtime, allowing you to choose any OCI-compatible runtime such as Docker or Podman. If you install Squirro using the Ansible role, Docker is installed and configured automatically. For manual installations, install a supported OCI runtime before installing the GenAI package.
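Because the GenAI package does not pull in a runtime itself, a manual installation can check for one first. This is a minimal sketch with hypothetical names. It only looks for the `docker` and `podman` executables on the PATH and does not verify that the runtime is configured or running.

```python
import shutil
from typing import Callable, Optional

# The two OCI-compatible runtimes named in the paragraph above.
CANDIDATE_RUNTIMES = ("docker", "podman")


def find_container_runtime(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path of the first runtime found on PATH, or None."""
    for name in CANDIDATE_RUNTIMES:
        path = which(name)
        if path:
            return path
    return None
```

If `find_container_runtime()` returns `None`, install a runtime before installing the GenAI package.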

GenAI and LLM Access#

GenAI features require access to a large language model (LLM). The server must be able to reach the chosen LLM API endpoint over HTTPS (port 443).

For the full list of supported providers, compatibility requirements, and model recommendations, see the Large Language Model Support page.

Identity Provider#

Single sign-on (SSO) is recommended for all production deployments. Squirro supports SAML 2.0 for identity federation. Plan the SSO integration with your IT team before deployment. The Squirro implementation team can assist with identity provider configuration.

For deployments where SSO is not available, Squirro provides local username and password authentication as a fallback.

DNS#

Squirro requires a DNS hostname for production deployments. Plan with your IT team to create a DNS record pointing to the server IP address or load balancer before deployment. That hostname is also used for the SSL/TLS certificate. DNS provisioning can take time depending on your organization’s internal processes. Initiate the request at least two to three weeks before your target installation date.
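Once the DNS record exists, resolution can be confirmed from the machines that will access Squirro. The sketch below uses a hypothetical helper and the Python standard library resolver. Any hostname you pass, such as `squirro.example.com`, is a placeholder for your own record.

```python
import socket


def resolve_addresses(hostname: str) -> list:
    """Return the sorted IP addresses a hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry ends with a socket address tuple whose first item is the IP.
    return sorted({info[4][0] for info in infos})
```

Compare the result with the server or load balancer IP address that the record is expected to point to.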

SSL/TLS Certificates#

Squirro uses Nginx as its reverse proxy and requires an SSL/TLS certificate for HTTPS access. Obtain a certificate signed by a certificate authority (CA) trusted by your organization before deployment. A self-signed certificate is installed by default and is suitable for initial testing only. Certificate issuance through an internal CA or a public CA can take time. Initiate the request at least two to three weeks before your target installation date.
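After the CA-signed certificate is installed, its trust chain and expiry date can be checked from a client. This sketch uses hypothetical function names and the Python `ssl` module with default verification. A handshake against the default self-signed certificate is therefore expected to fail with `ssl.SSLCertVerificationError`.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return days between now and a certificate notAfter timestamp."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect with certificate verification and return the notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `days_until_expiry(fetch_not_after("squirro.example.com"))` reports the remaining validity, where the hostname is a placeholder. If your organization uses an internal CA, the client running the check must trust that CA for the handshake to succeed.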

Squirro communicates with Elasticsearch over port 9200. TLS for that traffic, as well as inter-node TLS in multi-node setups, requires additional certificates. For configuration details, see the Secure Elasticsearch with SSL/TLS page.

Shared Filesystem#

Multi-node deployments require a shared filesystem accessible from all nodes. Squirro uses it to store assets that must be consistent across the cluster. Supported options include NFS, Amazon EFS, and SMB.

Important

The shared filesystem must be mounted and accessible before the first Ansible run. That configuration cannot be changed after installation without a full reinstall.

Single-node deployments do not require a shared filesystem.
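Because the mount must be in place before the first Ansible run, a quick check on each node can catch a missing or read-only mount early. The sketch below is an illustration with a hypothetical function name. It uses the default mount path /var/lib/squirro/storage given in the Shared Storage section and should be run as the user that will access the mount.

```python
import os

DEFAULT_MOUNT = "/var/lib/squirro/storage"


def shared_storage_problems(path: str = DEFAULT_MOUNT, require_mount: bool = True) -> list:
    """Return a list of problems with the shared storage path; empty means OK."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    # On a multi-node cluster the path should be a mount point, not a local folder.
    if require_mount and not os.path.ismount(path):
        problems.append(f"{path} is not a mount point")
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        problems.append(f"{path} is not readable and writable by the current user")
    return problems
```

Run `shared_storage_problems()` on every node of a multi-node deployment. The `require_mount=False` option is only meant for trying the check against an ordinary directory.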

Email#

Squirro sends email notifications for digest reports and alerts. An SMTP server accessible from the Squirro node is required. Before deployment, confirm with your IT team that the Squirro node can reach the SMTP server on the required port (typically port 25 or 587). The default configuration expects a local SMTP relay on port 25.

Deployment Topology#

A single-node deployment is suitable for uptime SLAs up to 99.9%. For higher SLA targets, a multi-node topology with managed database and cache services is required.
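To put the 99.9% figure in context, an uptime target can be converted into the downtime it permits per year. The helper below is a hypothetical illustration of that arithmetic and assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def allowed_downtime_hours(sla_percent: float) -> float:
    """Return the yearly downtime, in hours, permitted by an uptime SLA."""
    return (1 - sla_percent / 100) * HOURS_PER_YEAR
```

A 99.9% target allows roughly 8.76 hours of downtime per year, while 99.99% allows under one hour. A budget that small leaves little room for a single server's maintenance windows and unplanned outages.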

Squirro supports two node roles:

  • Storage node

    Stores indexed data using Elasticsearch. Add storage nodes to increase capacity.

  • Cluster node

    Handles user requests and data processing. The firewall port lists on this page refer to cluster nodes as Squirro nodes. Add cluster nodes to increase throughput.

In a single-node setup, both roles run on the same server. In a multi-node setup, Squirro recommends running each role on dedicated servers for more flexible scaling.

For cloud deployments, Squirro recommends a minimum of three instances for production use: one dedicated storage node running Elasticsearch, and two cluster nodes handling user requests and data processing. That configuration provides a baseline for redundancy at the application layer.

Client Browsers#

Squirro runs in a web browser. The following browsers are supported:

| Browser | Version |
| --- | --- |
| Microsoft Edge | Latest version |
| Google Chrome | Latest version |
| Mozilla Firefox | Latest version |
| Apple Safari | Latest version |

Squirro does not support older browser versions.