Installing Squirro on Linux#

Warning

We don’t recommend installing Squirro manually. Instead, use our Ansible role. See Install and Manage Squirro with Ansible for more information.

Architecture Overview#

A Squirro cluster can contain anywhere from just one to dozens of servers. Adding servers to a Squirro cluster serves two different needs: performance and capacity. Because these needs do not necessarily scale in the same way, Squirro differentiates between two types of servers:

  • Storage Nodes: store the data that is indexed into Squirro. Add more storage nodes to scale capacity.

  • Cluster Nodes: answer requests from users and handle the processing of incoming items. Add more cluster nodes to improve performance.

Storage and cluster nodes can be installed on the same server, and for a single-node setup that is the recommended approach. But if you intend to scale to more than just one server, it is recommended to only install one of the two node types on a server. That makes better use of the available performance and allows for more flexible scaling.

Prerequisites#

This page assumes that a few prerequisites have been met. This includes a system with internet connectivity and the relevant ports opened in the firewall.

If some of these conditions cannot be met, please contact Squirro Support.

Linux Setup#

The installation of the base system is not covered on this page. A fully functional RHEL or CentOS installation is assumed to be set up. The supported versions of these Linux distributions are documented on the System Requirements page.

Networking#

The Linux system needs internet access to download the Squirro packages. Additionally, the system should be accessible from your internal network, so that the Squirro interface can be accessed by administrators and users.

Information on how to set up networking is provided by Red Hat in their Official Networking Guide.

Firewall#

Incoming traffic to Squirro servers must be allowed on several TCP ports. The individual ports and the required access are documented below:

Storage Nodes#

TCP Port   Usage                       Open For
9200       Elasticsearch access        All storage and Squirro nodes
9300       Elasticsearch replication   All storage nodes

Squirro Nodes#

TCP Port   Usage                        Open For
80         Web access                   All Squirro nodes and, optionally, all clients if HTTPS access (port 443) is not desired
111        Distributed file system      All Squirro nodes
443        Web access (SSL-protected)   All clients
2181       Zookeeper                    All Squirro nodes
2409       Distributed file system      All Squirro nodes (in multi-node setups)
2888       Zookeeper node discovery     All Squirro nodes (in multi-node setups)
3306       MySQL                        All Squirro nodes
3888       Zookeeper node discovery     All Squirro nodes (in multi-node setups)
6379       Redis (storage)              All Squirro nodes
6380       Redis (cache)                All Squirro nodes
24007      Network file system          All Squirro nodes (in multi-node setups)

Note

The list and the port numbers reported above may vary depending on the cloud provider or the software components running on top of the core layers. For example, you have the flexibility to implement a distributed file system according to your specific requirements and preferences. For assistance, contact Squirro Support and open a technical support request to receive guidance from experts.
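For example, on a system that uses firewalld (the default on RHEL/CentOS 7 and 8), the web-facing ports of a Squirro node could be opened as shown below. This is a minimal sketch; adjust the port list to the node type and restrict source addresses according to the tables above.

sudo firewall-cmd --permanent --add-port=80/tcp
sudo firewall-cmd --permanent --add-port=443/tcp
sudo firewall-cmd --reload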

Users#

Squirro provides packages that set up all the Linux users required on a system. Those packages are used in the instructions below.

However, in some environments, users must not be created by packages. For these cases, users have to be manually created up-front. See Linux Users for Squirro for a detailed list of users that need to be set up.
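If the users must be created manually, a minimal sketch for one service account is shown below; the account names and primary groups are listed in the Linux Users for Squirro section at the end of this page, and the exact UID, GID, and shell requirements should be confirmed with Squirro Support before use.

# create the shared primary group used by most Squirro service accounts
groupadd --system squirro
# create one service account as an example; repeat for each required user
useradd --system --gid squirro --shell /sbin/nologin sqptopic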

YUM Repositories#

Squirro packages are provided through a Yum repository. Yum is a utility, shipped with RHEL / CentOS, that downloads and installs packages from central repositories. To configure a new repository, create a file in the folder /etc/yum.repos.d.

Getting the Right Version of Squirro#

For production deployments, Squirro recommends using the latest LTS release. To get the latest LTS release of the 3.2 family, use 3.2-lts in the baseurl of the Yum repository configuration (see below). This ensures that a simple yum update on the server will not update it to the latest bi-weekly release.

Also note that once Squirro releases the next LTS family (3.3-lts), switching to it remains a conscious choice: you must change the baseurl to point to the 3.3 LTS link. The existing 3.2-lts link will not be updated to point to the LTS release of the 3.3 family.

Squirro#

Create the file /etc/yum.repos.d/squirro.repo. The exact content of the file will be provided to you by Squirro Support when delivering your license.

Use the following examples, but note that the username and password are not filled in:

CentOS 7#

/etc/yum.repos.d/squirro.repo#
[squirro-stable]
name=Squirro - CentOS 7 - Stable
baseurl=https://<user name>:<password>@mirror.squirro.net/centos/7/stable/$basearch/<specific version or 'latest' or '3.2-lts'>/
enabled=1
gpgkey=https://mirror.squirro.net/centos/RPM-GPG-KEY-squirro-stable
gpgcheck=1
sslverify=1
metadata_expire=300

CentOS 8#

/etc/yum.repos.d/squirro.repo#
[squirro-stable]
name=Squirro - CentOS 8 - Stable
baseurl=https://<user name>:<password>@mirror.squirro.net/centos/8/stable/$basearch/<specific version or 'latest' or '3.2-lts'>/
enabled=1
gpgkey=https://mirror.squirro.net/centos/RPM-GPG-KEY-squirro-stable
gpgcheck=1
sslverify=1
metadata_expire=300

RHEL 7#

/etc/yum.repos.d/squirro.repo#
[squirro-stable]
name=Squirro - Red Hat Enterprise Linux 7 - Stable
baseurl=https://<user name>:<password>@mirror.squirro.net/centos/7/stable/$basearch/<specific version or 'latest' or '3.2-lts'>/
enabled=1
gpgkey=https://mirror.squirro.net/centos/RPM-GPG-KEY-squirro-stable
gpgcheck=1
sslverify=1
metadata_expire=300

RHEL 8#

/etc/yum.repos.d/squirro.repo#
[squirro-stable]
name=Squirro - Red Hat Enterprise Linux 8 - Stable
baseurl=https://<user name>:<password>@mirror.squirro.net/centos/8/stable/$basearch/<specific version or 'latest' or '3.2-lts'>/
enabled=1
gpgkey=https://mirror.squirro.net/centos/RPM-GPG-KEY-squirro-stable
gpgcheck=1
sslverify=1
metadata_expire=300
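After creating the repository file, you can optionally verify that yum can reach the repository; the repository ID squirro-stable matches the examples above:

yum clean metadata
yum repolist squirro-stable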

Storage Node Installation#

Installation of the storage node happens with two separate packages: the first installs the required Linux users and the second installs the services. In addition, a few dependencies need to be installed explicitly. Use the following commands for the installation:

Java#

sudo su
yum install java-1.8.0-openjdk

Squirro Storage Node#

sudo su
yum install squirro-storage-node-users
yum install elasticsearch
yum install squirro-storage-node

Network Connectivity#

If you are setting up a dedicated storage node, instead of mixing cluster and storage node on the same server, then you need to change the Elasticsearch configuration so it listens on a network IP address.

To do this, edit /etc/elasticsearch/elasticsearch.yml to add the server’s IP address to the network.host setting.

When doing this, you also need to set discovery.seed_hosts. To set up a multi-node cluster, see Cluster Expansion; the following value works only for the single-node case.

/etc/elasticsearch/elasticsearch.yml#
network.host: <storage node IP>,127.0.0.1
discovery.seed_hosts: ["127.0.0.1"]
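After changing the configuration, restart Elasticsearch and verify that it answers on the configured address, for example with curl (replace the placeholder with the storage node's IP address):

systemctl restart elasticsearch
curl "http://<storage node IP>:9200/_cluster/health?pretty"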

Cluster Node Installation#

Installation of the Squirro cluster node happens with two separate packages: the first installs the required Linux users and the second installs the services. In addition, a few dependencies need to be installed explicitly. Use the following commands for the installation:

Java#

Squirro depends on a JRE, which is provided in the Squirro Yum repository itself. However, this package has to be installed explicitly, as none of the Squirro packages declare a dependency on it. This is done to provide more flexibility for custom deployments where Squirro does not control the Java version installed on the server. Execute the yum command below to install the JRE:

yum install java-1.8.0-openjdk

Running MySQL and Redis Remotely#

You can choose to run the MySQL server and Redis servers remotely, i.e. not on the Squirro cluster node, provided you set up the MySQL and Redis server installations with the specific configuration described in the Set Up on Linux with Remote MySql and Redis Servers section later on this page.

To set up Squirro with remote MySQL and Redis backends, create a readable file /etc/squirro/backends.ini with the following content:

is_mysql_server_remote = true
is_redis_server_remote = true

Squirro Cluster Node#

Installation of the Squirro cluster node happens with two separate packages. The first package installs the required Linux users and the second installs the services. Use the following commands for the installation:

yum install squirro-cluster-node-users
yum install squirro-cluster-node

If the storage and cluster nodes are not on the same physical machine, you now need to adjust the file /etc/nginx/conf.d/upstream-elastic.inc to point to the IP addresses or hostnames of the storage node(s).

Examples:

Single Server Both Roles#
upstream elastic {
   server 127.0.0.1:9200;
   keepalive 32;
}
Dedicated Storage Node#
upstream elastic {
   server 192.168.0.20:9200;
   keepalive 32;
}
Multiple Storage Nodes#
upstream elastic {
   server 192.168.0.20:9200;
   server 192.168.0.21:9200;
   server 192.168.0.22:9200;
   keepalive 32;
}

If changes have been made to this file, reload the Nginx configuration as follows:

service nginx reload

Caution: If you have chosen to rely on remote MySql and/or Redis server installations, follow the steps in the Set Up on Linux with Remote MySql and Redis Servers section later on this page.

Starting Squirro Services#

Finally, start the Squirro services using the squirro_start or squirro_restart command.

RHEL 7/8 and CentOS 7/8#
source /etc/profile.d/squirro-aliases.sh
squirro_restart

Note: This command does not start the cluster service by default. To also start the cluster service, run the following systemctl command:

systemctl start sqclusterd
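To confirm that the services are running, you can check them with systemctl; sqclusterd is shown here as an example:

systemctl status sqclusterd
systemctl list-units --type=service 'sq*'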

Multi-node Cluster Setup#

There are additional steps that need to be performed when you wish to run Squirro across multiple nodes, as some orchestration between cluster members is needed.

This is handled by a service called Zookeeper. Squirro provides its own Zookeeper library - it needs to be installed separately, though, with:

yum install squirro-python38-squirro.lib.zookeeper

Setting up the License Key#

Squirro packages are provided with a Yum repository. Yum is a utility, provided with the RHEL / CentOS Linux system, that downloads and installs packages from central repositories.

To configure a new repository, follow the steps below:

  1. Set up the repository using a file in the folder /etc/yum.repos.d.

  2. Create or edit /etc/yum.repos.d/squirro.repo :

    [squirro-stable]
    name=Squirro - CentOS-$releasever - Stable
    baseurl=https://FILL_IN_USERNAME_AND:[email protected]/centos/$releasever/stable/$basearch/latest
    enabled=1
    gpgkey=https://mirror.squirro.net/centos/RPM-GPG-KEY-squirro-stable
    gpgcheck=1
    sslverify=1
    metadata_expire=300
    
  3. Insert the repository credentials you received for accessing the Squirro Yum repository into this file: replace FILL_IN_USERNAME_AND with your username and PASSWORD with your password. Also make sure the repository is enabled (enabled=1).

Set Up on Linux with Remote MySql and Redis Servers#

In some cases, e.g. when deploying Squirro to Amazon AWS or Microsoft Azure, you may elect to rely on a remote MySQL server installation (e.g. RDS in the case of AWS) and/or remote Redis servers (e.g. RedisLabs). This section describes how to configure such remote servers for use by Squirro and how to configure the Squirro cluster nodes accordingly.

Preparation#

Before you install the Squirro RPMs, ensure that the file /etc/squirro/backends.ini is in place and readable:

mkdir -p /etc/squirro
touch /etc/squirro/backends.ini

Then add the following content, depending on whether you want to run the MySQL server, the Redis servers, or both remotely. The example below shows the content when both MySQL and Redis servers are configured remotely:

/etc/squirro/backends.ini

# Rely on a remote MySql Server installation
is_mysql_server_remote = true

# Rely on remote Redis Servers
is_redis_server_remote = true

MySql Server Setup#

On the MySQL server side there are server-level configurations, database-level configurations, and Squirro cluster node-level configurations.

MySql Server-level Configuration#

Configure the MySQL client at the user level: make sure your ~/.my.cnf file looks like this:

[client]
host="remote mysql server"
user="user for the remote mysql server"
password="password for the remote mysql server"
ssl_ca="location of your SSL certificate"

If SSL is required, follow the instructions in admin-securing-elastic.

The following configuration is required by Squirro at the MySQL server level:

[mysqld]
character_set_server=utf8
collation_server=utf8_unicode_ci
innodb_flush_log_at_trx_commit=1

Make sure the MySQL client on all Squirro cluster nodes is configured to point to the remote server installation. Also note the name of the MySQL administrator account, often called root.
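With the ~/.my.cnf file in place on a cluster node, you can verify connectivity to the remote MySQL server before continuing:

mysql -e "SELECT VERSION();"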

MySql Database and User Creation#

Run the following on a Squirro Cluster Node on which /etc/squirro/backends.ini has been set up and squirro-cluster-node has been installed.

MYSQL_ADMIN=root  # or the name of the MySql Administrator account able to create users and databases
DATABASE_PASSWORD="password of your choosing"

squirro_mysql_databases=`cd /etc/squirro/; grep "^db *=" *.ini    | sed -e "s/\..*$//"`
for mysql_db in $squirro_mysql_databases; do
    . /opt/squirro/setup/ensure_mysql_user_and_database_exist.sh $MYSQL_ADMIN $mysql_db $DATABASE_PASSWORD
done
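You can then confirm on the same node that the Squirro databases were created on the remote server:

mysql -e "SHOW DATABASES;"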

Squirro Cluster Node Configuration#

Before starting the Squirro services, the configuration files under /etc/squirro/ need to be updated to point to the remote installation:

MYSQL_SERVER_ADDRESS="address or name of MySql installation"
sed -e "s|squirro/4u|$DATABASE_PASSWORD|" -e "s|localhost|$MYSQL_SERVER_ADDRESS|" -i /etc/squirro/*.ini

Note that the script above depends on the same DATABASE_PASSWORD variable from the “MySql Database and User Creation” section above.

Redis Server Setup#

Redis Server-level Configuration#

Squirro writes into about a dozen Redis databases.

Ideally, your Redis server setup provides multiple Redis databases, although it should be possible (somewhat experimentally) to use a single Redis server.

We do, however, recommend having at least two Redis servers: one for key/value storage and one for caching. If required, follow the instructions in Securing Redis Instance over SSL.
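As a quick sanity check before reconfiguring the Squirro services, you can verify that both Redis endpoints are reachable with redis-cli; the variables correspond to those used in the next section. Both commands should reply with PONG.

redis-cli -h "$REDIS_STORAGE_HOST" -p "$REDIS_STORAGE_PORT" -a "$REDIS_STORAGE_PASSWORD" ping
redis-cli -h "$REDIS_CACHE_HOST" -p "$REDIS_CACHE_PORT" -a "$REDIS_CACHE_PASSWORD" ping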

Squirro Cluster Node Configuration#

Point the Squirro services to the two Redis server installations like so:

REDIS_STORAGE_HOST="Redis Server Storage host"
REDIS_STORAGE_PORT="Redis Server Storage port number"
REDIS_STORAGE_PASSWORD="Redis Server Storage password"
REDIS_CACHE_HOST="Redis Server Cache host"
REDIS_CACHE_PORT="Redis Server Cache port number"
REDIS_CACHE_PASSWORD="Redis Server Cache password"

redis_storage_dbs=`grep "^\[redis.*]" /etc/squirro/*.ini    | grep -v "_cache.*]"    | grep -v "redis_key_value"    | sed -e "s/^.*\[//" -e "s/] *$//"    | sort -u`
redis_cache_dbs=`grep "^\[redis.*_cache.*]" /etc/squirro/*.ini    | grep -v "redis_key_value"    | sed -e "s/^.*\[//" -e "s/] *$//"    | sort -u`
redis_key_value_dbs=`grep "^\[redis_key_value_store.*]" /etc/squirro/*.ini    | sed -e "s/^.*\[//" -e "s/] *$//"    | sort -u`
redis_key_value_cache_dbs=`grep "^\[redis_key_value_cache.*]" /etc/squirro/*.ini    | sed -e "s/^.*\[//" -e "s/] *$//"    | sort -u`

# remove old redis passwords:
sed -e "/^password *=/d" -i /etc/squirro/*.ini
sed -e "/^redis_password *=/d" -i /etc/squirro/*.ini

for storage_db in $redis_storage_dbs; do
    sed "s|^\(\[${storage_db}] *\)$|\1\nhost = $REDIS_STORAGE_HOST\nport = $REDIS_STORAGE_PORT\npassword = $REDIS_STORAGE_PASSWORD|" -i /etc/squirro/*.ini
done

for cache_db in $redis_cache_dbs; do
    sed "s|^\(\[${cache_db}] *\)$|\1\nhost = $REDIS_CACHE_HOST\nport = $REDIS_CACHE_PORT\npassword = $REDIS_CACHE_PASSWORD|" -i /etc/squirro/*.ini
done

for storage_db in $redis_key_value_dbs; do
    sed "s|^\(\[${storage_db}] *\)$|\1\nredis_host = $REDIS_STORAGE_HOST\nredis_port = $REDIS_STORAGE_PORT\nredis_password = $REDIS_STORAGE_PASSWORD|" -i /etc/squirro/*.ini
done

for cache_db in $redis_key_value_cache_dbs; do
    sed "s|^\(\[${cache_db}] *\)$|\1\nredis_host = $REDIS_CACHE_HOST\nredis_port = $REDIS_CACHE_PORT\nredis_password = $REDIS_CACHE_PASSWORD|" -i /etc/squirro/*.ini
done

if grep -Fxq "[queues_local]" /etc/squirro/common.ini
then
    sed "s|^\(\[queues_local] *\)$|\1\nredis_host = $REDIS_STORAGE_HOST\nredis_port = $REDIS_STORAGE_PORT\nredis_password = $REDIS_STORAGE_PASSWORD|" -i /etc/squirro/common.ini
else
    cat >>/etc/squirro/common.ini <<FOO

[queues_local]
redis_host = $REDIS_STORAGE_HOST
redis_port = $REDIS_STORAGE_PORT
redis_password = $REDIS_STORAGE_PASSWORD
FOO
fi

Return to the Linux Installation Steps#

Congratulations. You are now ready to continue with the Squirro Cluster Node installation steps.

Linux Users for Squirro#

The following list shows the user accounts that Squirro creates at installation time, together with each account's primary group, purpose, and related services.

  • elasticsearch (primary group: elasticsearch): Main data storage for indexed data. Related service: elasticsearch.

  • mysql (primary group: mysql): MySQL database used for metadata storage. To investigate this metadata, MySQL can be accessed directly. Related service: mysql.

  • nginx (primary group: nginx): Manages the nginx web server. Related service: nginx.

  • redis (primary group: redis): NoSQL database used for caching. Related services: redis-server, redis-server-cache.

  • sqclustr (primary group: squirro): Controls the multi-cluster setup of Squirro. Related service: sqclusterd.

  • sqconfig (primary group: squirro): Manages application configuration. Related service: sqconfigurationd.

  • sqcontnt (primary group: squirro): Responsible for extracting relevant content from web pages. See Noise Removal. Related service: sqcontentd.

  • sqdatasource (primary group: squirro): Manages the data loading process. Related service: sqdatasourced.

  • sqeauth (primary group: squirro): Responsible for managing authentication. Related service: squserd.

  • sqemails (primary group: squirro): Responsible for sending out emails on behalf of Squirro. Related service: sqemailsenderd.

  • sqfilter (primary group: squirro): Handles search tagging rules. See Search Tagging and Alerting. Related service: sqfilteringrd.

  • sqfinger (primary group: squirro): Manages the fingerprints (Smart Filters). Related service: sqfingerprintd.

  • sqfront (primary group: squirro): Manages the frontend service, which implements the main web interface of Squirro. Related service: sqfrontendd.

  • sqingest (primary group: squirro): Manages the process for enriching and indexing data. Related service: sqingesterd.

  • sqmailer (primary group: squirro): Used for preparing the email digest. Related service: sqdigestmailerd.

  • sqplumbr (primary group: squirro): Responsible for executing pipelets as part of the pipeline. See Pipelets for more information. Related service: sqplumberd.

  • sqprovid (primary group: squirro): Manages endpoints for the providers that import data into the system. Related service: sqproviderd.

  • sqptopic (primary group: squirro): Manages the Squirro API. Related service: sqtopicd.

  • sqrstory (primary group: squirro): Detects near-duplicates as part of the pipeline. See Duplicate Detection. Related service: sqrelatedstoryd.

  • sqsched (primary group: squirro): Responsible for executing scheduled jobs that the Squirro application needs to run. Related service: sqschedulerd.

  • sqthumb (primary group: squirro): Calculates thumbnails for item images that are displayed in Squirro. Related service: sqthumblerd.

  • sqtproxy (primary group: squirro): Manages the web service access to some project metadata. Related service: sqtopicproxyd.

  • sqtrends (primary group: squirro): Manages the trend detection service used to analyze trends in time-series data. For more information, see trend-detection. Related service: sqtrendsd.

  • squproxy (primary group: squirro): Manages the web service which gives access to some user data. Related service: squserproxyd.

  • squser (primary group: squirro): Endpoint for user management and authentication. See Authentication. Related service: squserd.

  • sqwebsht (primary group: squirro): Analyzes web stories and identifies the most relevant image to use for visualization. Related service: sqwebshotd.

  • tika (primary group: squirro): Responsible for extracting text from a variety of binary document formats. Related service: tika.

  • zkeeper (primary group: zkeeper): Responsible for cluster coordination services. Related service: zookeeper.

Setup on Linux (Offline)#

This section covers installing Squirro on a Linux system without internet access.

Prerequisites#

For an offline installation, Squirro will provide you with a tar.gz file which contains all the required RPM files. Make that file available on the server, for example by copying it over using scp.

YUM Repositories#

Extract Repository#

The tar.gz file above contains a YUM repository that can be used offline. To use this on the server, extract it:

sudo su
mkdir -p /opt/squirro
cd /opt/squirro
tar -xvf /tmp/squirro-xxx.tar.gz

This creates the folder /opt/squirro/repo and populates it with all the RPM files.

Squirro#

Create the file /etc/yum.repos.d/squirro.repo.

[squirro]
name=Squirro
baseurl=file:///opt/squirro/repo/
enabled=1
metadata_expire=300
gpgcheck=0
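As with the online setup, you can verify that the local repository is picked up:

yum clean metadata
yum repolist squirro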

Storage Node Installation#

Follow the steps in Storage Node Installation to complete the storage node installation.

Cluster Node Installation#

Follow the steps in Cluster Node Installation to complete the cluster node installation.

Disabling User Sign-Up After Installation#

You will create your first user account using the Sign Up button that appears with the first load of a Squirro installation.

The first user created will be a server-level Administrator.

Reference: For further details on server-level roles, see Server-Level Permissions.

After creating your first Administrator account, Squirro recommends disabling the Sign Up feature to prevent unauthorized account creation.

To disable the feature, follow the steps below:

  1. Log in to your Squirro installation as a server administrator.

  2. Navigate to the Server space.

  3. Click Configuration in the left menu.

  4. Search for frontend.userapp; the results list should include frontend.userapp.signup.

  5. Hover over frontend.userapp.signup and click Edit.

  6. Uncheck the box to disable the option, then click the checkmark button to save, as shown in the screenclip below:

Squirro Disable Sign-Up Configuration Option

Troubleshooting & Known Issues#

There is a known issue with MariaDB version 10.6 on RedHat 8.5: Squirro was not able to configure the required SQL tables on MariaDB 10.6. This was resolved by downgrading to MariaDB 10.3.