Cluster Expansion#
This page details how to add cluster nodes to a Squirro installation. For base Linux installation, see the Installing Squirro on Linux page.
For background on Squirro cluster setups, see the How Squirro Scales page. That page provides an overview of Squirro components and their scaling considerations.
Prerequisites#
To add cluster nodes, first ensure that:
The Linux machines are up and running. See the System Requirements page for the list of supported operating systems.
The network is set up and all the machines that are to become part of the Squirro cluster can talk to each other.
The firewalls are open between the machines with the documented ports accessible.
The Squirro YUM repository is configured and accessible. In an enterprise environment, where this poses a problem, offline installation is available. Contact Squirro Support in this case.
Overview#
Any expansion of the cluster requires some work on the old and new nodes. This is outlined in the processes below by splitting the work up into sections, based on where the work is to be executed.
The process described here involves some cluster downtime. It is possible to expand a Squirro cluster without downtime, but that process requires more planning and orchestration. If you need a downtime-free expansion, contact Squirro Support.
Scaling Considerations#
Cluster nodes and storage nodes can be scaled independently. You can add storage nodes without modifying the cluster node configuration, and vice versa.
Adding storage nodes
Add storage nodes one at a time. Adding multiple nodes simultaneously increases the risk of discovery and replication issues during cluster formation.
Master-eligible node count
To avoid split-brain situations, keep an odd number of master-eligible storage nodes (1, 3, 5, and so on). With an even number of nodes, the cluster may fail to elect a leader if the nodes become equally divided across a network partition. Squirro recommends following the Elasticsearch guidance on limiting the number of master-eligible nodes to a small, odd-numbered set. For more information, see the Elasticsearch high-availability guide on the Elasticsearch website.
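The arithmetic behind the odd-number recommendation can be sketched as follows. This is an illustrative calculation only, assuming that a simple majority of floor(n/2) + 1 votes is needed to elect a master. The function names are hypothetical and not part of Squirro or Elasticsearch.

```shell
# Illustrative only: majority-quorum arithmetic for master-eligible nodes.
# Assumes a simple majority of floor(n/2) + 1 votes is needed to elect a master.

# Number of votes required for a majority among n master-eligible nodes.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of master-eligible nodes that can be lost while a majority remains.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
  echo "nodes=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Under this assumption, going from 3 to 4 nodes raises the quorum from 2 to 3 but still tolerates only one failure, so the extra even node adds no resilience.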
Storage Node Installation#
The Squirro storage nodes are based on Elasticsearch. As such, some of the configurations needed for adding a storage node are found in the Elasticsearch configuration files.
Process for New Storage Nodes
Follow the steps below to add a new storage node.
Install/update the storage node package, as described in Storage Node Installation.
Apply the configuration from the previous storage nodes to the new one. Copy over the following setting from /etc/elasticsearch/elasticsearch.yml:
cluster.name
Make sure that this setting value is copied from the previous storage nodes to the new one - and not the other way around.
Process for All Storage Nodes (Existing as Well as New)
Allow the hosts to discover each other. Again in /etc/elasticsearch/elasticsearch.yml, change the following settings:
Allow Elasticsearch to bind to a network interface by using the following config:
/etc/elasticsearch/elasticsearch.yml
network.host: <server ip>,127.0.0.1
This is a list of the server’s own IP addresses (which can be retrieved with ip addr), for example: 10.1.4.5.
Set discovery.seed_hosts and cluster.initial_master_nodes to a list of all the storage nodes that have been set up. For example:
/etc/elasticsearch/elasticsearch.yml
discovery.seed_hosts: ["<storagenode1 ip>", "<storagenode2 ip>", "<storagenode3 ip>"]
cluster.initial_master_nodes: ["<storagenode1 ip>", "<storagenode2 ip>", "<storagenode3 ip>"]
This is the easiest way to set up discovery and ensure all the Elasticsearch nodes can see each other. However, there are also other ways of configuring the discovery of the Elasticsearch nodes.
This is documented by Elasticsearch in the Discovery section of the Elasticsearch Manual.
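Because the same two lists must appear on every storage node, it can help to generate them from one list of addresses. The following is a minimal sketch, not a Squirro tool: the function name es_discovery_settings and the sample IP addresses are assumptions for illustration.

```shell
# Illustrative helper (hypothetical name): print the two discovery settings
# for elasticsearch.yml from a list of storage node IP addresses.
es_discovery_settings() {
  list=""
  for ip in "$@"; do
    # Separate entries with ", " after the first one.
    [ -n "$list" ] && list="$list, "
    list="$list\"$ip\""
  done
  echo "discovery.seed_hosts: [$list]"
  echo "cluster.initial_master_nodes: [$list]"
}

# Example with placeholder addresses; replace with the real storage node IPs.
es_discovery_settings 10.1.4.5 10.1.4.6 10.1.4.7
```

The printed lines can then be pasted into /etc/elasticsearch/elasticsearch.yml on each storage node.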
Also, optionally, you can set the node name to a friendlier name:
/etc/elasticsearch/elasticsearch.yml
# User friendly node name
node.name: test-node1
For new nodes remove the current Elasticsearch state:
mv /var/lib/elasticsearch/nodes /tmp/
Note
You can also remove this folder instead of moving it to /tmp. Moving it allows you to recover if you ran this on the wrong node.
Restart the service for the settings to take effect.
systemctl restart elasticsearch
To verify the nodes discovered each other and formed a cluster, you can debug with this command:
es_curl 'https://localhost:9200/_nodes?pretty' | less
If successful, you should see the correct number of nodes in the output:
{ "_nodes" : { "total" : 3, "successful" : 3, "failed" : 0 } }
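The check can also be scripted. The sketch below is an assumption for illustration, not part of Squirro: the function name check_node_total is hypothetical, and it uses grep and sed rather than a JSON parser, so it relies on the "_nodes" summary being the first object with a "total" field, as in the output above.

```shell
# Illustrative helper (hypothetical name): read the _nodes JSON on stdin and
# succeed only if the reported total matches the expected node count.
check_node_total() {
  expected="$1"
  # Take the first "total" field and strip everything except the digits.
  total="$(grep -o '"total"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | head -n 1 | sed 's/[^0-9]//g')"
  if [ "$total" = "$expected" ]; then
    echo "OK: $total nodes"
  else
    echo "MISMATCH: expected $expected nodes, found ${total:-none}"
    return 1
  fi
}

# Example against a sample response; in practice, pipe the es_curl output in.
echo '{ "_nodes" : { "total" : 3, "successful" : 3, "failed" : 0 } }' | check_node_total 3
```

On a live system, the sample echo would be replaced by the es_curl command shown earlier, piped into check_node_total with the expected number of storage nodes.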
Set up the number of shards and number of replicas.
Modify number_of_shards and number_of_replicas in the templates. For more information, see the Configuring Elasticsearch Templates page. For multi-node storage setups, set number_of_replicas to 1 and number_of_shards to the number of Elasticsearch nodes.
Now update the replica setting of the indices that were already present on the cluster before the templates were updated, using the curl command below. You can also update the replica setting of a particular index (instead of all indices) by replacing the * with the name of the index. This is a cluster-wide setting and only needs to be applied on one of the nodes of the Elasticsearch cluster.
es_curl -XPUT 'https://127.0.0.1:9200/*/_settings' -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'
Cluster Node Installation#
Process for Each New Squirro Cluster Node Server#
Install the cluster node package, as described in Cluster Node Installation.
Install the additional required package for cluster coordination (Zookeeper library):
yum install squirro-python-squirro.lib.zookeeper
Ensure that each of the cluster nodes can talk to the Elasticsearch cluster (Squirro Storage Node). Change the config at /etc/squirro/common.ini to:
[index]
es_index_servers = <storagenode1 ip>:9200,<storagenode2 ip>:9200,<storagenode3 ip>:9200
Allowlist all the Squirro Cluster nodes in the following nginx ipfilter files:
/etc/nginx/conf.d/ipfilter-cluster-nodes.inc
/etc/nginx/conf.d/ipfilter-api-clients.inc
/etc/nginx/conf.d/ipfilter-monitoring.inc
In each of these files, include each IP address as follows:
allow <clusternode1 ip>;
allow <clusternode2 ip>;
allow <clusternode3 ip>;
Alternatively, the allow directive also accepts network addresses, for example 10.1.4.0/24 to allowlist an entire network.
Reload nginx on each of the cluster nodes.
systemctl reload nginx
Create the MariaDB / MySQL control users required by the cluster service. Because these users have elevated permissions, they are not created automatically by the Squirro installer. To create these users, invoke the mysql command as the root user, then enter the following two commands:
grant select, reload, super, lock tables, replication slave, create, drop, alter, insert, process on *.* to "cluster"@"localhost" identified by "CLUSTER_CLUSTER_PASSWORD";
grant replication slave, replication client on *.* to "repl"@"%" identified by "CLUSTER_REPL_PASSWORD";
Generate secure passwords for both users. The passwords need to be the same on all cluster nodes, and they will be added to the configuration file later.
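One way to generate such passwords is sketched here. This is an assumption for illustration, not a Squirro-provided tool: the function name gen_password is hypothetical. It reads 1024 random bytes from /dev/urandom and keeps only letters and digits, which avoids characters that would need escaping in the SQL statements or in a connection URL. That amount of input is comfortably enough for lengths of a few dozen characters.

```shell
# Illustrative helper (hypothetical name): print a random alphanumeric
# password of the requested length (default 32).
gen_password() {
  length="${1:-32}"
  # Keep only letters and digits from 1024 random bytes, then cut to length.
  head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | head -c "$length"
  echo
}

# Generate each password once, then reuse the same values on all cluster nodes.
echo "cluster user password: $(gen_password 32)"
echo "repl user password:    $(gen_password 32)"
```

The generated values take the place of CLUSTER_CLUSTER_PASSWORD and CLUSTER_REPL_PASSWORD.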
Stop all the services by executing the following commands:
cd /usr/lib/systemd/system
for service in $(ls sq*d.service); do echo "Stopping $service"; systemctl stop "$service"; done
for service in redis-server redis-server-cache mariadb zookeeper; do echo "Stopping $service"; systemctl stop "$service"; done
Squirro cluster service
Edit the /etc/squirro/cluster.ini configuration file.
Ensure that the mgmt_iface setting under the cluster section specifies a valid network interface on which all the cluster nodes can communicate with each other. Look it up using the network settings of your cluster node, for example with ifconfig, and set it appropriately:
[cluster]
# If the appropriate network interface is ens142
mgmt_iface = ens142
Also inside the [cluster] section, edit the following settings:
id: change this to the same value as on the previous cluster nodes, ensuring it’s the same value for all cluster nodes.
redis_controller: set this to true so that Redis replication is managed by the Squirro cluster service.
mysql_controller: set this to true so that MySQL replication is managed by the Squirro cluster service.
Add the database passwords to the [mysql] section (changing the passwords to the generated values):
[mysql]
db = mysql+mariadbconnector://cluster:CLUSTER_CLUSTER_PASSWORD@127.0.0.1:3306
repl_password = CLUSTER_REPL_PASSWORD
Add the list of all the Zookeeper nodes (including this new node) to the hosts list in the [zookeeper] section:
/etc/squirro/cluster.ini
[zookeeper]
hosts = <clusternode1 ip>:2181,<clusternode2 ip>:2181,<clusternode3 ip>:2181
Turn on endpoint discovery for all Redis and database connections. This ensures that the services consult the cluster service to know which cluster node is the main node.
This requires all the endpoint_discovery (for Redis) and db_endpoint_discovery (for MySQL) configuration entries in every /etc/squirro/*.ini file to be set to true. This can be automated with the following sed commands:
sed -i -e 's/^endpoint_discovery = false/endpoint_discovery = true/' /etc/squirro/*.ini
sed -i -e 's/^db_endpoint_discovery = false/db_endpoint_discovery = true/' /etc/squirro/*.ini
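To see what the two sed commands change before touching /etc/squirro, they can be tried on a throwaway copy. The following dry run is a sketch under assumptions: the sample file and its section names are made up for illustration and do not reflect the actual contents of any Squirro configuration file.

```shell
# Illustrative dry run: apply the two sed commands to a throwaway file
# instead of /etc/squirro. The section names in the sample file are made up.
workdir="$(mktemp -d)"
cat > "$workdir/sample.ini" <<'EOF'
[redis]
endpoint_discovery = false

[database]
db_endpoint_discovery = false
EOF

sed -i -e 's/^endpoint_discovery = false/endpoint_discovery = true/' "$workdir"/*.ini
sed -i -e 's/^db_endpoint_discovery = false/db_endpoint_discovery = true/' "$workdir"/*.ini

# Show the resulting discovery settings; both should now read "true".
grep -h 'endpoint_discovery' "$workdir"/*.ini
```

Because each pattern is anchored with ^ and matches the exact text "= false", the first command leaves the db_endpoint_discovery line alone, and entries written with different spacing are not changed. After running the real commands, a grep for endpoint_discovery across /etc/squirro/*.ini is a quick way to confirm that no entry was missed.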
MySQL
Turn on MySQL replication. This requires two changes in /etc/mysql/conf.d/replication.cnf. Both of these values are commented out by default:
server_id: this integer value needs to be unique across the whole cluster. For example, use 10 for the first server in the cluster, 11 for the second, etc.
report_host: set this to the human-readable name of the server, as it should be reported to the other hosts, for example node01.
Raise the MySQL limits on open files and maximum connections.
/etc/mysql/conf.d/maxconnections.cnf
[mysqld]
open_files_limit = 8192
max_connections = 500
The max_connections setting should be set higher depending on the number of cluster nodes. Squirro recommends at least 150 connections for each cluster node.
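The sizing rule can be expressed as a small calculation. This is a sketch only: the function name suggested_max_connections is hypothetical, and treating 500 as a lower bound is an assumption taken from the example value above rather than a documented rule.

```shell
# Illustrative helper (hypothetical name): suggest a max_connections value
# from the number of cluster nodes, using 150 connections per node.
# Keeping 500 as a lower bound is an assumption based on the example value.
suggested_max_connections() {
  nodes="$1"
  value=$(( nodes * 150 ))
  [ "$value" -lt 500 ] && value=500
  echo "$value"
}

for n in 3 4 6; do
  echo "cluster nodes=$n max_connections=$(suggested_max_connections "$n")"
done
```

With these assumptions, three nodes stay at 500, four nodes need 600, and six nodes need 900.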
Redis
Extend the list of listening addresses for the redis-server and redis-server-cache services by editing /etc/redis/redis.conf and /etc/redis/cache.conf and listing all the cluster nodes (including this new server):
/etc/redis/redis.conf and /etc/redis/cache.conf
bind 0.0.0.0
Note
This configures Redis to listen on all network interfaces and IP addresses on the server.
Zookeeper
Set the unique Zookeeper node identifier. This ID needs to start at 1 and be incremented by 1 for each additional node. Write this identifier to /var/lib/zookeeper/data/myid.
Add a list of all cluster nodes to Zookeeper. Edit /etc/zookeeper/zoo.cfg and list all the cluster nodes (including this new server):
/etc/zookeeper/zoo.cfg
server.1=<clusternode1 ip>:2888:3888
server.2=<clusternode2 ip>:2888:3888
server.3=<clusternode3 ip>:2888:3888
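Since the server list must be identical on every node, generating it from one ordered list of addresses reduces the chance of a mismatch. The following is a minimal sketch: the function name zk_server_lines and the sample IP addresses are assumptions for illustration.

```shell
# Illustrative helper (hypothetical name): print the zoo.cfg server lines for
# the given cluster node IPs. The position in the argument list becomes the
# Zookeeper ID, so the order must be identical on every node.
zk_server_lines() {
  id=1
  for ip in "$@"; do
    echo "server.$id=$ip:2888:3888"
    id=$(( id + 1 ))
  done
}

# Example with placeholder addresses; replace with the real cluster node IPs.
zk_server_lines 10.1.4.5 10.1.4.6 10.1.4.7
```

The number after "server." for a given node is the value that belongs in that node's /var/lib/zookeeper/data/myid file.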
Start services necessary for a leader election. Do NOT start the cluster service yet to avoid promoting the new node to main. Start the cluster service on this node only after confirming that a leader has been elected from one of the existing nodes by following the next step.
systemctl start zookeeper
systemctl start redis-server
systemctl start redis-server-cache
systemctl start mariadb
At this point, follow the instructions in Process on all the other cluster nodes (existing nodes, before the cluster expansion), then confirm that the election leader is one of the existing nodes.
Starting
Start the cluster node:
systemctl start sqclusterd
Wait for the cluster node to come up. Make sure the election leader is the same one as on the previous nodes.
curl -s http://127.0.0.1/service/cluster/v0/leader/cluster.json | python -mjson.tool | grep electionLeader
This command may have to be repeated a few times until a result is returned.
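Repeating the check by hand can be replaced with a small retry loop. The sketch below is an assumption for illustration, not part of Squirro: the names retry_until_output and leader_check are hypothetical, and the attempt count and delay in the usage comment are arbitrary.

```shell
# Illustrative helper (hypothetical name): run a command repeatedly until it
# prints something. Arguments: attempts, delay in seconds, then the command.
retry_until_output() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Ignore failures of the command itself; only its output matters here.
    output="$("$@" 2>/dev/null)" || true
    if [ -n "$output" ]; then
      echo "$output"
      return 0
    fi
    sleep "$delay"
    i=$(( i + 1 ))
  done
  echo "no result after $attempts attempts" >&2
  return 1
}

# Wrapper around the leader check shown above.
leader_check() {
  curl -s http://127.0.0.1/service/cluster/v0/leader/cluster.json \
    | python -mjson.tool | grep electionLeader
}

# Usage (not executed here): retry_until_output 30 5 leader_check
```

The loop stops at the first non-empty result, so the printed electionLeader line still needs to be compared with the leader reported on the existing nodes.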
Start all other services:
cd /usr/lib/systemd/system
for service in $(ls sq*d.service); do echo "Starting $service"; systemctl start "$service"; done
Process on all the other cluster nodes (existing nodes, before the cluster expansion)#
This process needs to happen together with the Zookeeper configuration on the new cluster node.
Add the new servers to the Zookeeper configuration. Edit /etc/zookeeper/zoo.cfg and list all the cluster nodes (including the new server):
/etc/zookeeper/zoo.cfg
server.1=<clusternode1 ip>:2888:3888
server.2=<clusternode2 ip>:2888:3888
server.3=<clusternode3 ip>:2888:3888
This list should be identical on every cluster node.
Add the new Zookeeper node to the existing list of Zookeeper nodes in the Squirro cluster service config file. Edit /etc/squirro/cluster.ini and list all the Zookeeper hosts.
[zookeeper]
hosts = <clusternode1 ip>:2181,<clusternode2 ip>:2181,<clusternode3 ip>:2181
Extend the list of listening addresses for the redis-server and redis-server-cache services by editing /etc/redis/redis.conf and /etc/redis/cache.conf and listing all the cluster nodes:
bind 127.0.0.1 <clusternode1 ip> <clusternode2 ip> <clusternode3 ip>
Restart Redis, Zookeeper, and then the cluster service:
systemctl restart redis-server
systemctl restart redis-server-cache
systemctl restart zookeeper
systemctl restart sqclusterd
Check that the election leader points to one of the existing nodes:
curl -s http://127.0.0.1/service/cluster/v0/leader/cluster.json | python -mjson.tool | grep electionLeader
This outputs a line containing the node selected as the leader by the Squirro cluster node.
Setting up Cluster Node Storage#
Some parts of Squirro require a shared file system. This is used for:
Uploading data loader plugins, pipelets, and custom widgets to a cluster.
Uploading files through the Squirro frontend and handling crawling output.
Indexing binary data files.
You can implement a shared file system through various methods, such as a NAS or an existing clustering file system. For assistance, contact Squirro Support and open a technical support request to receive guidance from experts.
Troubleshooting#
Network Drop Between Servers#
This could be caused by a network monitoring tool closing all idle connections at a periodic interval. In this case, try lowering the TCP keep-alive used by the system and services.
For example, to set the value to 600 seconds:
Change the /proc/sys/net/ipv4/tcp_keepalive_time value to 600:
# echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time
Change the tcp-keepalive value to 600 in /etc/redis/redis.conf.
Add the following to /etc/zookeeper/zoo.cfg:
tcpKeepAlive=true
Add a new file /etc/sysctl.d/98-elasticsearch.conf with the following content:
# lower keepalive settings to avoid elasticsearch cluster disconnects
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 20
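The three kernel settings work together: the first probe is sent after tcp_keepalive_time seconds of idleness, further probes follow every tcp_keepalive_intvl seconds, and the connection is dropped after tcp_keepalive_probes unanswered probes. The following sketch computes the resulting worst-case detection time. The function name keepalive_detection_seconds is hypothetical.

```shell
# Illustrative helper (hypothetical name): worst-case seconds before the
# kernel drops a dead idle connection, given the three keepalive settings.
keepalive_detection_seconds() {
  idle="$1"; intvl="$2"; probes="$3"
  echo $(( idle + intvl * probes ))
}

# Values from the sysctl file above: 600 + 60 * 20
keepalive_detection_seconds 600 60 20
```

With the values above, this prints 1800, meaning a dead peer is detected after at most 30 minutes. The first probe is sent after 600 seconds of idleness, so that value should be shorter than the idle timeout of the tool closing the connections.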