Installing Squirro on Linux#
Warning
We don’t recommend you install Squirro manually. Instead use our role for Ansible. See Install and Manage Squirro with Ansible for more information.
Architecture Overview#
A Squirro cluster can contain anywhere from just one to dozens of servers. Adding additional servers to a Squirro cluster always serves two different needs: performance and capacity. As these needs do not necessarily scale in the same way, Squirro differentiates two different types of servers:
Storage Nodes: store the data that is indexed into Squirro. Add more storage nodes to scale capacity.
Cluster Nodes: answer requests by users and handle the processing of items that come in. Add more cluster nodes to improve performance.
Storage and cluster nodes can be installed on the same server, and for a single-node setup that is the recommended approach. But if you intend to scale to more than just one server, it is recommended to only install one of the two node types on a server. That makes better use of the available performance and allows for more flexible scaling.
Prerequisites#
This page assumes that a few prerequisites have been met. These include a system with internet connectivity and the relevant ports opened in the firewall.
If any of these conditions cannot be met, please contact Squirro Support.
Linux Setup#
The installation of the base system is not covered on this page. A fully functional Red Hat Enterprise Linux (RHEL) or Rocky Linux installation is assumed to be set up. The supported versions of these Linux distributions are documented on the System Requirements page.
Networking#
The Linux system needs internet access to download the Squirro packages. Additionally, the system should be accessible from your internal network, so that the Squirro interface can be accessed by administrators and users.
Information on how to set up networking is provided by Red Hat in their Official Networking Guide.
Firewall#
Incoming traffic to Squirro servers should be open on several TCP ports. The individual ports and the required access level are documented below:
TCP Port | Usage | Open For |
---|---|---|
9200 | Elasticsearch access | All storage and Squirro nodes |
9300 | Elasticsearch replication | All storage nodes |
TCP Port | Usage | Open For |
---|---|---|
80 | Web access | All Squirro nodes and optionally for all clients if |
111 | Distributed file system | All Squirro nodes |
443 | Web access (SSL-protected) | All clients |
2181 | Zookeeper | All Squirro nodes |
2409 | Distributed file system | All Squirro nodes (in multi-node setups) |
2888 | Zookeeper node discovery | All Squirro nodes (in multi-node setups) |
3306 | MySQL | All Squirro nodes |
3888 | Zookeeper node discovery | All Squirro nodes (in multi-node setups) |
6379 | Redis (storage) | All Squirro nodes |
6380 | Redis (cache) | All Squirro nodes |
24007 | Network file system | All Squirro nodes (in multi-node setups) |
Note
The list and the port numbers reported above may vary depending on the cloud provider or the software components running on top of the core layers. For example, you have the flexibility to implement a distributed file system according to your specific requirements and preferences. For assistance, contact Squirro Support and open a technical support request to receive guidance from experts.
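As a rough illustration of how ports such as 9200 and 9300 from the first table above could be opened, the sketch below prints the corresponding firewalld commands without running them. It assumes firewalld is the active firewall on the server; the helper name `print_open_port_cmds` is made up for this example. The printed commands open each port to all sources, so restricting access to the nodes listed in the "Open For" column remains part of your own firewall policy.

```shell
# Hypothetical helper: print (not run) the firewalld commands for a list of TCP ports.
print_open_port_cmds() {
    for port in "$@"; do
        printf 'firewall-cmd --permanent --add-port=%s/tcp\n' "$port"
    done
    # Permanent rules only take effect after a reload.
    printf 'firewall-cmd --reload\n'
}

# Ports from the first table above (Elasticsearch access and replication).
print_open_port_cmds 9200 9300
```

Review the printed commands and run them as root once they match your environment.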
Users#
Squirro provides packages that set up all the Linux users required on a system. Those packages are used in the instructions below.
However, in some environments, users must not be created by packages. For these cases, users have to be manually created up-front. See Linux Users for Squirro for a detailed list of users that need to be set up.
YUM Repositories#
Squirro packages are provided with a YUM repository. YUM is a utility, provided with the RHEL Linux system, that downloads and installs packages from central repositories. To configure a new repository, it needs to be set up using a file in the folder /etc/yum.repos.d.
Getting the Right Version of Squirro#
For production deployments, Squirro recommends using the latest LTS release. To get the latest LTS release of the 3.2 family, use 3.2-lts as the version in the baseurl of the YUM repository configuration (see below). This ensures that a simple yum update on the server does not update the server to the latest bi-weekly release.
Note also that when Squirro releases the next LTS family, 3.3-lts, switching to it remains a conscious choice: you have to point the baseurl to the LTS release of the 3.3 family yourself. The existing 3.2-lts link will not be updated to point to the LTS release of the 3.3 family.
Squirro#
Create the file /etc/yum.repos.d/squirro.repo. The exact content of the file will be provided to you by Squirro Support when delivering your license.
Use the following examples, but note that the username and password are not filled in:
RHEL 7#
[squirro-stable]
name=Squirro - Red Hat Enterprise Linux 7 - Stable
baseurl=https://<user name>:<password>@mirror.squirro.net/centos/7/stable/$basearch/<specific version or 'latest' or '3.2-lts'>/
enabled=1
gpgkey=https://mirror.squirro.net/centos/RPM-GPG-KEY-squirro-stable
gpgcheck=1
sslverify=1
metadata_expire=300
RHEL 8#
[squirro-stable]
name=Squirro - Red Hat Enterprise Linux 8 - Stable
baseurl=https://<user name>:<password>@mirror.squirro.net/centos/8/stable/$basearch/<specific version or 'latest' or '3.2-lts'>/
enabled=1
gpgkey=https://mirror.squirro.net/centos/RPM-GPG-KEY-squirro-stable
gpgcheck=1
sslverify=1
metadata_expire=300
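The two examples above differ only in the RHEL major version, so the file can also be generated from a few variables. The following is a minimal sketch, not the official procedure: the helper name `render_squirro_repo` and the placeholder credentials are assumptions, and the content provided by Squirro Support takes precedence over anything printed here.

```shell
# Hypothetical helper: print a squirro.repo file.
# Arguments: user name, password, RHEL major version (7 or 8),
# and the version part of the baseurl (e.g. latest or 3.2-lts).
render_squirro_repo() {
    repo_user=$1
    repo_pass=$2
    el_version=$3
    channel=$4
    # \$basearch is escaped so that YUM, not the shell, expands it.
    cat <<EOF
[squirro-stable]
name=Squirro - Red Hat Enterprise Linux ${el_version} - Stable
baseurl=https://${repo_user}:${repo_pass}@mirror.squirro.net/centos/${el_version}/stable/\$basearch/${channel}/
enabled=1
gpgkey=https://mirror.squirro.net/centos/RPM-GPG-KEY-squirro-stable
gpgcheck=1
sslverify=1
metadata_expire=300
EOF
}

# Placeholder credentials; prints the file content to the terminal.
render_squirro_repo "myuser" "mypassword" 8 "3.2-lts"
```

To use the result, redirect the output to /etc/yum.repos.d/squirro.repo as root.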
Storage Node Installation#
Installation of the storage node happens with two separate packages. The first package installs the required Linux users and the second installs the services. Furthermore, we also need to explicitly install a few dependencies. Use the following commands for the installation:
Java#
sudo su
yum install java-1.8.0-openjdk
Squirro Storage Node#
sudo su
yum install squirro-storage-node-users
yum install elasticsearch
yum install squirro-storage-node
Network Connectivity#
If you are setting up a dedicated storage node, instead of mixing cluster and storage node on the same server, then you need to change the Elasticsearch configuration so it listens on a network IP address.
To do this, edit /etc/elasticsearch/elasticsearch.yml to add the server’s IP address to the network.host setting.
When doing this, you also need to declare discovery.seed_hosts. To set up a cluster, see Cluster Expansion. The following value works only for the single-node cluster case.
network.host: <storage node IP>,127.0.0.1
discovery.seed_hosts: ["127.0.0.1"]
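The edit can also be scripted. The sketch below appends the two settings shown above to a configuration file, but only if no uncommented network.host line exists yet. The helper name `configure_es_network` and the address 192.168.0.20 are placeholders for this example, and it runs against a scratch file rather than the real configuration.

```shell
# Hypothetical helper: add network.host and discovery.seed_hosts to an
# Elasticsearch config file unless network.host is already set.
configure_es_network() {
    es_config=$1
    node_ip=$2
    if grep -q '^network\.host:' "$es_config"; then
        echo "network.host already set in $es_config; edit it manually" >&2
        return 1
    fi
    cat >>"$es_config" <<EOF
network.host: ${node_ip},127.0.0.1
discovery.seed_hosts: ["127.0.0.1"]
EOF
}

# Try it on a scratch file first.
scratch=$(mktemp)
configure_es_network "$scratch" 192.168.0.20
cat "$scratch"
```

On a real storage node the first argument would be /etc/elasticsearch/elasticsearch.yml, and Elasticsearch typically needs a restart before the new settings apply.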
Cluster Node Installation#
Installation of the Squirro cluster node happens with two separate packages. The first package installs the required Linux users and the second installs the services. Furthermore, we also need to explicitly install a few dependencies. Use the following commands for the installation:
Java#
Squirro depends on JRE, which is provided in the Squirro YUM repository itself. But this package has to be installed explicitly as none of the Squirro packages declare an explicit dependency on JRE. This is done to provide more flexibility in making Squirro work for various custom deployments where we do not control the version of Java installed on the server. Please execute the yum command below to install JRE:
yum install java-1.8.0-openjdk
Running MySQL and Redis Remotely#
You can choose to run the MySql and Redis servers remotely, i.e. not on the Squirro Cluster Node, provided that you set up the MySql and Redis server installations with a specific configuration.
To set up Squirro with remote MySql and Redis server “backends”, create a readable file /etc/squirro/backends.ini with the content:
is_mysql_server_remote = true
is_redis_server_remote = true
Squirro Cluster Node#
Installation of the Squirro cluster node happens with two separate packages. The first package installs the required Linux users and the second installs the services. Use the following commands for the installation:
yum install squirro-cluster-node-users
yum install squirro-cluster-node
If the storage and cluster nodes are not on the same physical machine, you now need to adjust the file /etc/nginx/conf.d/upstream-elastic.inc to point to the IP addresses or hostnames of the storage node(s).
Examples:
upstream elastic {
server 127.0.0.1:9200;
keepalive 32;
}
upstream elastic {
server 192.168.0.20:9200;
keepalive 32;
}
upstream elastic {
server 192.168.0.20:9200;
server 192.168.0.21:9200;
server 192.168.0.22:9200;
keepalive 32;
}
If changes have been made to this file, reload the Nginx configuration as follows:
service nginx reload
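For setups with several storage nodes, the upstream block can be generated from a list of addresses. This is a small sketch under the assumption that every storage node listens on port 9200, as in the examples above; the helper name `render_elastic_upstream` is made up for this example.

```shell
# Hypothetical helper: print an "upstream elastic" block for the given
# storage node addresses (IP addresses or hostnames).
render_elastic_upstream() {
    printf 'upstream elastic {\n'
    for node in "$@"; do
        printf '    server %s:9200;\n' "$node"
    done
    printf '    keepalive 32;\n}\n'
}

# Placeholder addresses taken from the examples above.
render_elastic_upstream 192.168.0.20 192.168.0.21 192.168.0.22
```

After writing the output to /etc/nginx/conf.d/upstream-elastic.inc, running `nginx -t` before the reload checks the configuration syntax.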
Caution: If you have chosen to rely on remote MySql and/or Redis server installations, follow the steps in the Set Up on Linux with Remote MySql and Redis Servers section later on this page.
Starting Squirro Services#
Finally, start the Squirro Services using the squirro_start command.
source /etc/profile.d/squirro-aliases.sh
squirro_restart
Note: This command does not start the cluster service by default. To start the cluster service as well, run the following systemctl command:
systemctl start sqclusterd
Multi-node Cluster Setup#
There are additional steps that need to be performed when you wish to run Squirro across multiple nodes, as some orchestration between cluster members is needed.
This is handled by a service called Zookeeper. Squirro provides its own Zookeeper library - it needs to be installed separately, though, with:
yum install squirro-python38-squirro.lib.zookeeper
Setting up the License Key#
Squirro packages are provided with a YUM repository. To configure a new repository, follow the steps below:
Set up the repository using a file in the folder /etc/yum.repos.d.
Create or edit /etc/yum.repos.d/squirro.repo:
[squirro-stable]
name=Squirro - CentOS-$releasever - Stable
baseurl=https://FILL_IN_USERNAME_AND:PASSWORD@mirror.squirro.net/centos/$releasever/stable/$basearch/latest
enabled=1
gpgkey=https://mirror.squirro.net/centos/RPM-GPG-KEY-squirro-stable
gpgcheck=1
sslverify=1
metadata_expire=300
Insert the repository credentials you got for accessing the Squirro YUM repository in this file. Replace FILL_IN_USERNAME_AND by the username and PASSWORD by your password. Also enable the repo by changing the value for “enabled”.
Set Up on Linux with Remote MySql and Redis Servers#
In some cases, e.g. when deploying Squirro to Amazon AWS or MS Azure, you may elect to rely on a remote MySql Server installation (e.g. RDS in case of AWS) and/or remote Redis Servers (e.g. RedisLabs). This page describes how you would configure such remote servers for use by Squirro and how to configure Squirro Cluster Nodes accordingly.
Preparation#
Before you install the Squirro RPMs, create a readable file /etc/squirro/backends.ini to set up Squirro with remote MySql and Redis server “backends”:
mkdir -p /etc/squirro
touch /etc/squirro/backends.ini
Then add the following content, depending on whether you want to run the MySql server, the Redis servers, or both remotely. The example below shows the content for configuring both MySql and Redis servers remotely:
/etc/squirro/backends.ini
# Rely on a remote MySql Server installation
is_mysql_server_remote = true
# Rely on remote Redis Servers
is_redis_server_remote = true
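The file creation can be wrapped in a small function that writes only the flags you ask for. This is a sketch under assumptions: the helper name `write_backends_ini` is made up, the 644 file mode is one way of satisfying "readable", and the example writes into a scratch directory instead of /etc/squirro.

```shell
# Hypothetical helper: write backends.ini into a directory.
# Arguments: target directory, then any of "mysql" and "redis".
write_backends_ini() {
    conf_dir=$1
    shift
    mkdir -p "$conf_dir" || return 1
    target="$conf_dir/backends.ini"
    : >"$target"   # start from an empty file
    for backend in "$@"; do
        case $backend in
            mysql) echo "is_mysql_server_remote = true" >>"$target" ;;
            redis) echo "is_redis_server_remote = true" >>"$target" ;;
            *) echo "unknown backend: $backend" >&2; return 1 ;;
        esac
    done
    # Readable by all users, writable only by the owner.
    chmod 644 "$target"
}

# Dry run into a scratch directory.
scratch_dir=$(mktemp -d)
write_backends_ini "$scratch_dir" mysql redis
cat "$scratch_dir/backends.ini"
```

On a real cluster node the target directory would be /etc/squirro, and the function would be run as root before installing the Squirro RPMs.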
MySql Server Setup#
On the MySql Server front there are server-level configurations, database-level configurations, and Squirro Cluster Node-level configurations.
MySql Server-level Configuration#
Configure the MySql client at the user level. Make sure the content of your ~/.my.cnf file looks like this:
[client]
host="remote mysql server"
user="user for the remote mysql server"
password="password for the remote mysql server"
ssl_ca="location of your SSL certificate"
If there is a requirement, follow the instructions in admin-securing-elastic.
The following configuration is required by Squirro at the MySql server level:
[mysqld]
character_set_server=utf8
collation_server=utf8_unicode_ci
innodb_flush_log_at_trx_commit=1
Make sure you configure the MySql client on all Squirro Cluster Nodes to point to the server installation. Also note the name of the MySql administrator account - often called root.
MySql Database and User Creation#
Run the following on a Squirro Cluster Node on which /etc/squirro/backends.ini has been set up and squirro-cluster-node has been installed.
MYSQL_ADMIN=root # or the name of the MySql Administrator account able to create users and databases
DATABASE_PASSWORD="password of your choosing"
# Derive the database names from the .ini files that contain a "db = ..." setting
squirro_mysql_databases=$(cd /etc/squirro/ && grep "^db *=" *.ini | sed -e "s/\..*$//")
for mysql_db in $squirro_mysql_databases; do
    . /opt/squirro/setup/ensure_mysql_user_and_database_exist.sh "$MYSQL_ADMIN" "$mysql_db" "$DATABASE_PASSWORD"
done
Squirro Cluster Node Configuration#
Before starting Squirro services, the configuration files under /etc/squirro/ need to be made to point to the remote installation:
MYSQL_SERVER_ADDRESS="address or name of MySql installation"
sed -e "s|squirro/4u|$DATABASE_PASSWORD|" -e "s|localhost|$MYSQL_SERVER_ADDRESS|" -i /etc/squirro/*.ini
Note that the script above depends on the same DATABASE_PASSWORD variable from the “MySql Database and User Creation” section above.
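To see what the substitution does before touching the real files, it can be run without -i against a scratch file. The sketch below uses a made-up sample line (the actual layout of the Squirro .ini files may differ) and placeholder values for both variables; the helper name `preview_mysql_rewrite` is also an assumption.

```shell
# Placeholder values for illustration only.
DATABASE_PASSWORD="s3cret"
MYSQL_SERVER_ADDRESS="db.example.internal"

# Made-up sample content; real Squirro .ini files may look different.
scratch_ini=$(mktemp)
cat >"$scratch_ini" <<'EOF'
[sample]
db = mysql://squirro:squirro/4u@localhost/sample
EOF

# Same expressions as above, but printing the result instead of editing in place.
preview_mysql_rewrite() {
    sed -e "s|squirro/4u|$DATABASE_PASSWORD|" -e "s|localhost|$MYSQL_SERVER_ADDRESS|" "$1"
}

preview_mysql_rewrite "$scratch_ini"
```

Because the sed expressions use `|` as the delimiter, a password containing `|` or `&` would need escaping before it is used this way.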
Redis Server Setup#
Redis Server-level Configuration#
Squirro writes into about a dozen Redis databases.
Ideally, your Redis Server setup provides multiple Redis databases, although it should be possible, somewhat experimentally, to use a single Redis Server.
We do, however, recommend having at least two Redis Servers: one for key/value storage and the other for caching. If required, follow the instructions in Securing Redis Instance over SSL.
Squirro Cluster Node Configuration#
Point the Squirro services to the two Redis Server installations as follows:
REDIS_STORAGE_HOST="Redis Server Storage host"
REDIS_STORAGE_PORT="Redis Server Storage port number"
REDIS_STORAGE_PASSWORD="Redis Server Storage password"
REDIS_CACHE_HOST="Redis Server Cache host"
REDIS_CACHE_PORT="Redis Server Cache port number"
REDIS_CACHE_PASSWORD="Redis Server Cache password"
redis_storage_dbs=`grep "^\[redis.*]" /etc/squirro/*.ini | grep -v "_cache.*]" | grep -v "redis_key_value" | sed -e "s/^.*\[//" -e "s/] *$//" | sort -u`
redis_cache_dbs=`grep "^\[redis.*_cache.*]" /etc/squirro/*.ini | grep -v "redis_key_value" | sed -e "s/^.*\[//" -e "s/] *$//" | sort -u`
redis_key_value_dbs=`grep "^\[redis_key_value_store.*]" /etc/squirro/*.ini | sed -e "s/^.*\[//" -e "s/] *$//" | sort -u`
redis_key_value_cache_dbs=`grep "^\[redis_key_value_cache.*]" /etc/squirro/*.ini | sed -e "s/^.*\[//" -e "s/] *$//" | sort -u`
# remove old redis passwords:
sed -e "/^password *=/d" -i /etc/squirro/*.ini
sed -e "/^redis_password *=/d" -i /etc/squirro/*.ini
for storage_db in $redis_storage_dbs; do
    sed "s|^\(\[${storage_db}] *\)$|\1\nhost = $REDIS_STORAGE_HOST\nport = $REDIS_STORAGE_PORT\npassword = $REDIS_STORAGE_PASSWORD|" -i /etc/squirro/*.ini
done
for cache_db in $redis_cache_dbs; do
    sed "s|^\(\[${cache_db}] *\)$|\1\nhost = $REDIS_CACHE_HOST\nport = $REDIS_CACHE_PORT\npassword = $REDIS_CACHE_PASSWORD|" -i /etc/squirro/*.ini
done
for storage_db in $redis_key_value_dbs; do
    sed "s|^\(\[${storage_db}] *\)$|\1\nredis_host = $REDIS_STORAGE_HOST\nredis_port = $REDIS_STORAGE_PORT\nredis_password = $REDIS_STORAGE_PASSWORD|" -i /etc/squirro/*.ini
done
for cache_db in $redis_key_value_cache_dbs; do
    sed "s|^\(\[${cache_db}] *\)$|\1\nredis_host = $REDIS_CACHE_HOST\nredis_port = $REDIS_CACHE_PORT\nredis_password = $REDIS_CACHE_PASSWORD|" -i /etc/squirro/*.ini
done
if grep -Fxq "[queues_local]" /etc/squirro/common.ini
then
    sed "s|^\(\[queues_local] *\)$|\1\nredis_host = $REDIS_STORAGE_HOST\nredis_port = $REDIS_STORAGE_PORT\nredis_password = $REDIS_STORAGE_PASSWORD|" -i /etc/squirro/common.ini
else
    cat >>/etc/squirro/common.ini <<FOO
[queues_local]
redis_host = $REDIS_STORAGE_HOST
redis_port = $REDIS_STORAGE_PORT
redis_password = $REDIS_STORAGE_PASSWORD
FOO
fi
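The script above sorts the Redis sections into four groups before rewriting them. The sketch below applies the same grep and sed filters to a scratch file to show which group each section name falls into. The section names in the sample are chosen to match the patterns and are not taken from a real installation; the helper name `strip_brackets` is made up.

```shell
# Made-up sample with one section per group, plus an unrelated one.
sample_ini=$(mktemp)
cat >"$sample_ini" <<'EOF'
[redis]
[redis_cache]
[redis_key_value_store]
[redis_key_value_cache]
[queues_local]
EOF

# Same post-processing as in the script above: keep only the section name.
strip_brackets() {
    sed -e "s/^.*\[//" -e "s/] *$//" | sort -u
}

storage=$(grep "^\[redis.*]" "$sample_ini" | grep -v "_cache.*]" | grep -v "redis_key_value" | strip_brackets)
cache=$(grep "^\[redis.*_cache.*]" "$sample_ini" | grep -v "redis_key_value" | strip_brackets)
kv_store=$(grep "^\[redis_key_value_store.*]" "$sample_ini" | strip_brackets)
kv_cache=$(grep "^\[redis_key_value_cache.*]" "$sample_ini" | strip_brackets)

echo "storage=$storage cache=$cache kv_store=$kv_store kv_cache=$kv_cache"
```

Sections in the first two groups receive host, port, and password keys, while the key/value groups receive the redis_-prefixed variants, as the loops above show.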
Return to the Linux Installation Steps#
Congratulations. You are now ready to continue with the Squirro Cluster Node installation steps.
Linux Users for Squirro#
The following table represents a list of user accounts Squirro creates at installation time.
User Name | Primary Group | Description | Related Services |
---|---|---|---|
elasticsearch | elasticsearch | Main data storage for indexed data. | elasticsearch |
mysql | mysql | MySQL database used for metadata storage. To investigate this metadata, MySQL can be accessed directly. | mysql |
nginx | nginx | Manages nginx web server. | nginx |
redis | redis | NoSQL database used for caching. | redis-server, redis-server-cache |
sqclustr | squirro | Controls multi-cluster setup of Squirro. | sqclusterd |
sqconfig | squirro | Manages application configuration. | sqconfigurationd |
sqcontnt | squirro | Responsible for extracting relevant content from web pages. See Noise Removal. | sqcontentd |
sqdatasource | squirro | Manages the data loading process. | sqdatasourced |
sqeauth | squirro | Responsible for managing authentication. | squserd |
sqemails | squirro | Responsible for sending out emails on behalf of Squirro. | sqemailsenderd |
sqfilter | squirro | Handles search tagging rules. See Search Tagging and Alerting. | sqfilteringrd |
sqfinger | squirro | Manages the fingerprints (Smart Filters). | sqfingerprintd |
sqfront | squirro | Manages the frontend service, which implements the main web interface of Squirro. | sqfrontendd |
sqingest | squirro | Manages the process for enriching and indexing data. | sqingesterd |
sqmailer | squirro | Used for preparing the email digest. | sqdigestmailerd |
sqplumbr | squirro | Responsible for executing pipelets as part of the pipeline. See Pipelets for more information. | sqplumberd |
sqprovid | squirro | Manages endpoints for the providers that import data into the system. | sqproviderd |
sqptopic | squirro | Manages the Squirro API. | sqtopicd |
sqrstory | squirro | Detects near-duplicates as part of the pipeline. See Duplicate Detection. | sqrelatedstoryd |
sqsched | squirro | Responsible for executing scheduled jobs that the Squirro application needs to run. | sqschedulerd |
sqthumb | squirro | Calculates thumbnails for item images that are displayed in Squirro. | sqthumblerd |
sqtproxy | squirro | Manages the web service access to some project metadata. | sqtopicproxyd |
sqtrends | squirro | Manages the trend detection service to analyze trends in time-series data. For more info see trend-detection. | sqtrendsd |
squproxy | squirro | Responsible for managing the web service which gives access to some user data. | squserproxyd |
squser | squirro | Endpoint for user management and authentication. See Authentication. | squserd |
sqwebsht | squirro | Analyzes web stories and identifies the most relevant image to use for visualization. | sqwebshotd |
tika | squirro | Responsible for extracting text from a variety of binary-based documents. | tika |
zkeeper | zkeeper | Responsible for cluster coordination services. | zookeeper |
Setup on Linux (Offline)#
This section covers installing Squirro on a Linux system without internet access.
Prerequisites#
For an offline installation, Squirro will provide you with a tar.gz file which contains all the required RPM files. Make that file available on the server, for example by copying it over using scp.
YUM Repositories#
Extract Repository#
The tar.gz file above contains a YUM repository that can be used offline. To use this on the server, extract it:
sudo su
mkdir -p /opt/squirro
cd /opt/squirro
tar -xvf /tmp/squirro-xxx.tar.gz
This creates the folder /opt/squirro/repo and populates it with all the RPM files.
Squirro#
Create the file /etc/yum.repos.d/squirro.repo.
[squirro]
name=Squirro
baseurl=file:///opt/squirro/repo/
enabled=1
metadata_expire=300
gpgcheck=0
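Before running any yum command against this repository, it can help to confirm that the extracted folder really contains repository metadata. The sketch below checks for repodata/repomd.xml, the index file YUM reads from a repository. It assumes the archive ships that metadata pre-built; the helper name `is_yum_repo` is made up for this example.

```shell
# Hypothetical helper: succeed only if the directory contains the
# repodata/repomd.xml metadata index that YUM expects.
is_yum_repo() {
    [ -f "$1/repodata/repomd.xml" ]
}

if is_yum_repo /opt/squirro/repo; then
    echo "repository metadata found"
else
    echo "repodata/repomd.xml missing under /opt/squirro/repo" >&2
fi
```

If the check fails, the archive may have been extracted into a different folder than the one referenced by baseurl.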
Storage Node Installation#
Follow the steps in Storage Node Installation to complete the storage node installation.
Cluster Node Installation#
Follow the steps in Cluster Node Installation to complete the cluster node installation.
Disabling User Sign-Up After Installation#
You will create your first user account using the Sign Up button that appears with the first load of a Squirro installation.
The first user created will be a server-level Administrator.
Reference: For further details on server-level roles, see Server-Level Permissions.
After creating your first Administrator account, Squirro recommends disabling the Sign Up feature to prevent unauthorized account creation.
To disable the feature, follow the steps below:
Log in to your Squirro installation as a server administrator.
Navigate to the Server space.
Click Configuration in the left menu.
Search for frontend.userapp; the results list should include frontend.userapp.signup.
Hover over frontend.userapp.signup and click Edit.
Uncheck the box to disable the option, then click the checkmark button to save.
Troubleshooting & Known Issues#
There is a known issue with MariaDB 10.6 on Red Hat 8.5: Squirro was unable to configure the required SQL tables on MariaDB 10.6. Downgrading to MariaDB 10.3 resolved the issue.