Installing Squirro Using Ansible#

This page provides instructions on how to install Squirro using Red Hat’s Ansible Automation Platform.

Note

If you prefer to deploy Squirro as a service, see start.squirro.com.

Overview#

The Ansible plugin is the recommended way of managing Squirro. It supports standalone installations as well as highly customized, multi-node deployments using Ansible AWX/Tower.

If you still want to install Squirro manually, see Installing Squirro on Linux. With that said, Squirro strongly recommends the Ansible plugin.

This is the way ;-)

Download#

The current beta version is available on the Squirro mirror:

Download the Squirro Ansible Module

Note: You must have a valid Squirro mirror user account to access this download. If you require assistance, contact Squirro Support.

Quickstart Installation#

If you’re looking for a quick single-server setup of the latest Squirro release, follow the steps below:

  1. Provision a Rocky Linux 8, RHEL 8, CentOS 7, or RHEL 7 server. Squirro recommends AWS, GCP, Azure, Hetzner Cloud, or any physical server.

For development, Vagrant is a great choice, but any VM solution such as VirtualBox or VMware Workstation/Fusion/Player will work.

  2. Ensure your setup meets the requirements detailed in System Requirements.

  3. Log in to the server and become root.

  4. Install Ansible using the following commands:

yum install -y epel-release
yum install -y ansible
  5. Unzip the downloaded Squirro Ansible Module using the following command:

unzip squirro-ansible-x.x.x-beta-release.zip
  6. Edit playbook-quickstart.yml so that the yum username and password match your Squirro mirror username and password.

Example code shown below:

- name: Quickstart Install Squirro
  hosts: all
  become: true
  vars:
    squirro_clusternode: True
    squirro_storagenode: True
    yum_user: ...
    yum_password: ...
    squirro_channel: stable
    squirro_version: latest
    elasticsearch_discovery_type: single-node
  roles:
    - role: squirro-ansible
  7. Install Squirro using the following command:

ansible-playbook --connection=local --inventory 127.0.0.1, playbook-quickstart.yml

Note

Depending on your system, the installation can take 15-20 minutes.

  8. Open the HTTP and HTTPS ports on the firewall, then reload the firewall so that the permanent rules take effect:

# firewall-cmd --zone=public --permanent --add-service=http
# firewall-cmd --zone=public --permanent --add-service=https
# firewall-cmd --reload
  9. Verify that all services are running using the following command:

squirro_status
  10. Access the service at https://your-systems-ip or http://your-systems-ip.

Note

To learn how to use Squirro, visit the Squirro Academy at learn.squirro.com.
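The `--inventory 127.0.0.1,` and `--connection=local` arguments of the install command can also be kept in an inventory file. The following is a minimal sketch, assuming a file named `inventory.yml` (a hypothetical name) stored next to the playbook. `ansible_connection` is a standard Ansible inventory variable; everything else follows the quickstart above.

```yaml
# inventory.yml (hypothetical file name)
# Runs the playbook against the local machine without SSH.
all:
  hosts:
    localhost:
      ansible_connection: local
```

With this file in place, the install command would become `ansible-playbook --inventory inventory.yml playbook-quickstart.yml`.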

Offline Installation#

If your machine cannot reach mirror.squirro.net, you can install Squirro offline as shown below:

- name: Quickstart Install Squirro
  hosts: all
  become: true
  vars:
    squirro_clusternode: True
    squirro_storagenode: True
    squirro_install_mode: offline
    squirro_packages_tar: /path/to/os-n.n-stable-x86_64-n.n.n.tar.gz
    squirro_channel: stable
    squirro_version: latest
    elasticsearch_discovery_type: single-node
  roles:
    - role: squirro-ansible

The corresponding tar.gz file can be downloaded from the mirror.

Example: For Rocky Linux 8, you can get the correct package from: https://mirror.squirro.net/rocky/8/stable/x86_64/.

If you expect to run multiple installations, you can speed things up by extracting the tar.gz file and placing it on a shared filesystem, as shown below:

- name: Quickstart Install Squirro
  hosts: all
  become: true
  vars:
    squirro_clusternode: True
    squirro_storagenode: True
    squirro_install_mode: filesystem
    yum_repo_folder: /path/to/offline/yum/repo
    squirro_channel: stable
    squirro_version: latest
    elasticsearch_discovery_type: single-node
  roles:
    - role: squirro-ansible

This is faster because the archive does not have to be decompressed for every installation.

Note: The yum_repo_folder needs to point at the location that contains the repodata folder.
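A wrong `yum_repo_folder` is easy to catch before the installation starts. The play below is a sketch of such a check, not part of the squirro-ansible role; the file name `check-offline-repo.yml` is hypothetical, and `stat` and `assert` are standard Ansible modules.

```yaml
# check-offline-repo.yml (hypothetical file name)
- name: Check the offline yum repository folder
  hosts: all
  become: true
  vars:
    yum_repo_folder: /path/to/offline/yum/repo
  tasks:
    - name: Look for the repodata folder
      stat:
        path: "{{ yum_repo_folder }}/repodata"
      register: repodata_dir

    - name: Stop early if the repodata folder is missing
      assert:
        that:
          - repodata_dir.stat.exists
          - repodata_dir.stat.isdir | default(false)
        fail_msg: "{{ yum_repo_folder }} does not contain a repodata folder"
```

It can be run with the same `--connection=local --inventory 127.0.0.1,` arguments as the quickstart playbook.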

AWS Standalone Playbook#

On AWS, it is best practice to deploy Squirro on top of managed services, which increases availability and enables horizontal scaling. The example below expands on the quickstart playbook.

The example uses the following services:

  • RDS to externalize MariaDB/MySQL

  • ElastiCache to externalize the Key/Value Store

  • ElastiCache again to externalize the In Memory Cache

  • EFS as a shared filesystem

These resources can be spun up manually in the AWS web console.

Important: Squirro highly recommends you leverage infrastructure automation solutions such as CloudFormation or Terraform.

Once complete, edit the playbook-quickstart-aws.yml file to fill in all the blanks and adjust the RDS and ElastiCache endpoints, credentials, and IPs.

- name: Quickstart Install Squirro
  hosts: all
  become: true
  vars:
    squirro_clusternode: True
    squirro_storagenode: True
    yum_user: ...
    yum_password: ...
    squirro_channel: stable
    squirro_version: latest
    elasticsearch_squirro_cluster_name: squirro-vagrant-testing
    elasticsearch_cluster_nodes: ['10.1.0.2', '10.1.0.3', '...']
    elasticsearch_network_interface: eth0
    remote_mysql_server: True
    mysql_host: yourproject-db.abcdefghij.eu-central-1.rds.amazonaws.com
    mysql_root_user: root
    mysql_root_password: ...
    mysql_shared_service_password: ...
    remote_redis_server: True
    redis_tls: True
    redis_storage_host: master.storage-abcdefgh.euc1.cache.amazonaws.com
    redis_storage_port: 6379
    redis_storage_password: ...
    redis_cache_host: master.storage-abcdefgh.euc1.cache.amazonaws.com
    redis_cache_port: 6380
    redis_cache_password: ...
    remote_filesystem: True
    remote_filesystem_path: /mnt/efs/squirro
  roles:
    - role: squirro-ansible

Then follow the steps in the Quickstart section to install Ansible, but run the following command to perform the installation:

ansible-playbook --connection=local --inventory 127.0.0.1, playbook-quickstart-aws.yml

You can now repeat this on each EC2 instance.

Tip: For production deployments, Squirro typically recommends at least three instances.

This procedure can be fully automated by using cloud-init user data to bootstrap new instances and/or by building AMI images with frameworks such as Packer.
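As an illustration of the user data approach, the commands from the Quickstart section could be wrapped in a cloud-init `#cloud-config` document. This is only a sketch: it assumes that the unpacked Squirro Ansible Module and an already edited playbook-quickstart-aws.yml are present under `/opt/squirro`, a hypothetical path that could, for example, be baked into the AMI.

```yaml
#cloud-config
# Assumption: /opt/squirro (hypothetical path) already holds the unpacked
# Squirro Ansible Module and the edited playbook-quickstart-aws.yml.
runcmd:
  # Same Ansible installation commands as in the Quickstart section
  - yum install -y epel-release
  - yum install -y ansible
  # Same install command as above, run from the module folder
  - cd /opt/squirro && ansible-playbook --connection=local --inventory 127.0.0.1, playbook-quickstart-aws.yml
```

The `runcmd` entries are executed as root on first boot, so no separate step to become root is needed.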

Caution

This only serves as an example. For full production readiness, you will need to delegate the secrets to a secrets manager (e.g. HashiCorp Vault) and also leverage the AWS EC2 discovery plugin for Elasticsearch.
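As a first step away from hard-coded credentials, the secrets can be read from environment variables with Ansible's built-in `env` lookup. This is a simpler stand-in for a real secrets manager, not a replacement for one, and the `SQUIRRO_*` variable names below are hypothetical.

```yaml
- name: Quickstart Install Squirro
  hosts: all
  become: true
  vars:
    squirro_clusternode: True
    squirro_storagenode: True
    # Values are read from the environment of the user running
    # ansible-playbook. The SQUIRRO_* names are hypothetical.
    yum_user: "{{ lookup('env', 'SQUIRRO_YUM_USER') }}"
    yum_password: "{{ lookup('env', 'SQUIRRO_YUM_PASSWORD') }}"
    mysql_root_password: "{{ lookup('env', 'SQUIRRO_MYSQL_ROOT_PASSWORD') }}"
    mysql_shared_service_password: "{{ lookup('env', 'SQUIRRO_MYSQL_SERVICE_PASSWORD') }}"
  roles:
    - role: squirro-ansible
```

The `env` lookup returns an empty string for variables that are not set, so export all of them in the shell before running the playbook.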

Role Variables#

The following variables can be set to control the squirro-ansible role:

ENTRY POINT: main - The main entry point for the squirro-ansible role.

OPTIONS (= is mandatory):

- config_decrypt_command
        Can be set to a command line utility that will be called when
        the Squirro .ini files encounter prefix.
        [Default: None]
        type: str

- custom_pip_index_url
        Custom pip index-url for the squirro virtualenv. This is
        only of relevance if additional pip packages/wheels need to be
        installed and the installation cannot reach the internet /
        pypi.
        [Default: ]
        type: str

- database_type
        Set the sql database squirro is using. mysql is default and
        recommended. postgres support is currently experimental, see
        https://nektoon.atlassian.net/wiki/x/AQAng
        (Choices: mysql, postgres)[Default: mysql]
        type: str

- elasticsearch_cluster_nodes
        List of hostnames or IP addresses of all elasticsearch nodes.
        This is only relevant if `elasticsearch_discovery_type` is set
        to `zen`.
        [Default: ['127.0.0.1']]
        type: list

- elasticsearch_data_dir
        Location of the elasticsearch data folder. Use this to place
        the elasticsearch indices onto dedicated storage volumes, e.g.
        on AWS you can use this to locate the data onto the ephemeral
        but very fast NVMe drives if you use an m5d, r5d or c5d
        instance type. Note that you cannot change this after the
        initial deployment. If you have multiple volumes, you can gain
        extra performance by letting elasticsearch span all of the
        volumes; in this case use `elasticsearch_data_dirs`.
        [Default: /var/lib/elasticsearch]
        type: str

- elasticsearch_data_dirs
        Location of the elasticsearch data folders. See
        `elasticsearch_data_dir` for details. If both
        `elasticsearch_data_dir` and `elasticsearch_data_dirs` are
        set, the latter is used.
        [Default: ['{{ elasticsearch_data_dir }}']]
        type: list
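For illustration, a `vars` fragment that spreads the Elasticsearch data over two volumes might look like the sketch below. The mount points are hypothetical and must exist before the first run, since the data location cannot be changed after the initial deployment.

```yaml
vars:
  elasticsearch_data_dirs:
    - /mnt/nvme0/elasticsearch   # hypothetical mount point
    - /mnt/nvme1/elasticsearch   # hypothetical mount point
```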

- elasticsearch_discovery_type
        Set the elasticsearch discovery mode.
        (Choices: single-node, zen)[Default: zen]
        type: str

- elasticsearch_heap_size
        Set the RAM that elasticsearch allocates in gigabytes. If not
        set ansible will detect the total RAM available and will try
        to find a good setting. On a dedicated machine you should use
        no more than 50% of the available RAM, but not more than 32
        gigabytes. On a machine with clusternode and storagenode,
        giving 25% of RAM to elasticsearch is a good starting point.
        In general less is more. A machine starved out of RAM will not
        perform well.
        [Default: None]
        type: int
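As a worked example of the sizing guidance above: a combined cluster and storage node with 32 GB of RAM would get 25% of 32 GB, which is 8 GB. A dedicated storage node with 128 GB of RAM would reach 64 GB at 50%, so the 32 GB limit applies instead. The fragment below is a sketch for the first case; the machine size is an assumption.

```yaml
vars:
  squirro_clusternode: True
  squirro_storagenode: True
  # Combined node with 32 GB RAM (assumed): 25% of 32 GB = 8 GB
  elasticsearch_heap_size: 8
```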

- elasticsearch_install_aws_plugins
        Install the AWS elasticsearch plugins. Note that this is an
        experimental feature and will need additional work. For example
        this can currently fail upgrades and will not work in offline
        mode.
        [Default: False]
        type: bool

- elasticsearch_install_azure_plugins
        Install the azure elasticsearch plugins. Note that this is an
        experimental feature and will need additional work. For example
        this can currently fail upgrades and will not work in offline
        mode.
        [Default: False]
        type: bool

- elasticsearch_install_google_plugins
        Install the Google elasticsearch plugins. Note that this is an
        experimental feature and will need additional work. For example
        this can currently fail upgrades and will not work in offline
        mode.
        [Default: False]
        type: bool

- elasticsearch_memory_lock
        Should Elasticsearch lock all memory to prevent memory
        fragmentation. This should always be set to True unless in a
        test / dev environment with extremely low RAM. Setting this to
        False will have a severe performance impact.
        [Default: True]
        type: bool

- elasticsearch_network_interface
        Only use a given network interface, e.g. eth0 or eth1.
        [Default: None]
        type: str

- elasticsearch_network_ip_protocol
        Should elasticsearch listen to IPV4 or IPV6.
        (Choices: ipv4, ipv6)[Default: ipv4]
        type: str

- elasticsearch_replica_count
        Number of elasticsearch index shard replicas. In multi-node
        deployments set at least to 1 to ensure fault tolerance.
        [Default: 9]
        type: int

- elasticsearch_shards_number
        Number of elasticsearch index shards for new indices.
        [Default: 6]
        type: int

- elasticsearch_squirro_cluster_name
        Name of the elasticsearch cluster. Set a unique string if you
        intend to run a multi-node setup.
        [Default: squirro-cluster-{{ ansible_hostname | to_uuid }}]
        type: str
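Taken together, the Elasticsearch options above could be combined for a three-node deployment roughly as follows. This is a sketch: the cluster name and the third IP address are hypothetical, and the same values would be used on every node.

```yaml
vars:
  elasticsearch_discovery_type: zen
  # Hypothetical name; use the same unique string on all nodes
  elasticsearch_squirro_cluster_name: squirro-production
  # 10.1.0.4 is a hypothetical third node
  elasticsearch_cluster_nodes: ['10.1.0.2', '10.1.0.3', '10.1.0.4']
  elasticsearch_network_interface: eth0
  # At least 1 for fault tolerance in multi-node deployments
  elasticsearch_replica_count: 1
```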

- frontent_flask_secret_key
        If set, the flask secret key for squirro sessions is set to
        this value. If the variable is later unset, the previous value
        remains in the frontend.ini file.
        [Default: None]
        type: str

- mode
        Is an installation or upgrade performed. This option is marked
        for deprecation.
        (Choices: install, upgrade)[Default: install]
        type: str

- mysql_configuration_password
        Password used for the configuration service sql connection.
        [Default: {{ mysql_shared_service_password }}]
        type: str

- mysql_data_dir
        Path where the local mysql data folder needs to be placed.
        This only works if set before the MariaDB server is installed.
        [Default: /var/lib/mysql]
        type: str

- mysql_datasource_password
        Password used for the datasource service sql connection.
        [Default: {{ mysql_shared_service_password }}]
        type: str

- mysql_emailsender_password
        Password used for the emailsender service sql connection.
        [Default: {{ mysql_shared_service_password }}]
        type: str

- mysql_filtering_password
        Password used for the filtering service sql connection.
        [Default: {{ mysql_shared_service_password }}]
        type: str

- mysql_fingerprint_password
        Password used for the fingerprint service sql connection.
        [Default: {{ mysql_shared_service_password }}]
        type: str

- mysql_host
        Hostname of the remote MySQL/MariaDB server. Example:
        `yourproject-db.abcdefghij.eu-central-1.rds.amazonaws.com`.
        Only relevant if `remote_mysql_server` is set to True
        [Default: None]
        type: str

- mysql_machinelearning_password
        Password used for the machinelearning service sql connection.
        [Default: {{ mysql_shared_service_password }}]
        type: str

- mysql_plumber_password
        Password used for the plumber service sql connection.
        [Default: {{ mysql_shared_service_password }}]
        type: str

- mysql_port
        TCP port of the remote MySQL/MariaDB server. Only relevant if
        `remote_mysql_server` is set to True
        [Default: 3306]
        type: int

- mysql_root_password
        Password of the remote MySQL/MariaDB server root user. Only
        relevant if `remote_mysql_server` is set to True.
        [Default: None]
        type: str

- mysql_root_user
        Username of the remote MySQL/MariaDB server root user. Only
        relevant if `remote_mysql_server` is set to True.
        [Default: None]
        type: str

- mysql_scheduler_password
        Password used for the scheduler service sql connection.
        [Default: {{ mysql_shared_service_password }}]
        type: str

- mysql_shared_service_password
        Default password to use for the various squirro services. Note
        that options exist to have a unique password for each
        service.
        [Default: squirro/4u]
        type: str

- mysql_shared_service_user_hostname
        Username hostname string. This is required for some
        MySQL/MariaDB deployments, e.g. on Microsoft Azure. See the
        Azure example playbook as well.
        [Default: None]
        type: str

- mysql_ssl
        Use SSL for the MySQL/MariaDB connection.
        [Default: False]
        type: bool

- mysql_ssl_ca_certs
        Path to the ca certificate file used for the MySQL/MariaDB
        connections.
        [Default: <python site-packages path>/certifi/cacert.pem]
        type: str

- mysql_topic_password
        Password used for the topic service sql connection.
        [Default: {{ mysql_shared_service_password }}]
        type: str

- mysql_trends_password
        Password used for the trends service sql connection.
        [Default: {{ mysql_shared_service_password }}]
        type: str

- mysql_user_password
        Password used for the user service sql connection.
        [Default: {{ mysql_shared_service_password }}]
        type: str
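Because every `mysql_*_password` option defaults to `mysql_shared_service_password`, it is enough to override the shared value and then only the services that need their own password. The fragment below is a sketch with placeholder values.

```yaml
vars:
  # Replaces the documented default for all services
  mysql_shared_service_password: "replace-with-a-strong-password"
  # Optional per-service overrides; services not listed here
  # keep the shared value
  mysql_user_password: "replace-with-another-password"
  mysql_topic_password: "replace-with-another-password"
```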

- nginx_primary_http_port
        If set, the primary nginx http port tcp 80 will be changed to
        this port
        [Default: None]
        type: int

- postgres_configuration_password
        Password used for the configuration service sql connection.
        [Default: {{ postgres_shared_service_password }}]
        type: str

- postgres_datasource_password
        Password used for the datasource service sql connection.
        [Default: {{ postgres_shared_service_password }}]
        type: str

- postgres_emailsender_password
        Password used for the emailsender service sql connection.
        [Default: {{ postgres_shared_service_password }}]
        type: str

- postgres_filtering_password
        Password used for the filtering service sql connection.
        [Default: {{ postgres_shared_service_password }}]
        type: str

- postgres_fingerprint_password
        Password used for the fingerprint service sql connection.
        [Default: {{ postgres_shared_service_password }}]
        type: str

- postgres_host
        Hostname of the remote PostgreSQL server. Example:
        `yourproject-db.abcdefghij.eu-central-1.rds.amazonaws.com`.
        Only relevant if `remote_postgres_server` is set to True
        [Default: None]
        type: str

- postgres_machinelearning_password
        Password used for the machinelearning service sql connection.
        [Default: {{ postgres_shared_service_password }}]
        type: str

- postgres_plumber_password
        Password used for the plumber service sql connection.
        [Default: {{ postgres_shared_service_password }}]
        type: str

- postgres_port
        TCP port of the remote PostgreSQL server. Only relevant if
        `remote_postgres_server` is set to True
        [Default: 5432]
        type: int

- postgres_scheduler_password
        Password used for the scheduler service sql connection.
        [Default: {{ postgres_shared_service_password }}]
        type: str

- postgres_shared_service_password
        Default password to use for the various squirro services. Note
        that options exist to have a unique password for each
        service.
        [Default: squirro/4u]
        type: str

- postgres_topic_password
        Password used for the topic service sql connection.
        [Default: {{ postgres_shared_service_password }}]
        type: str

- postgres_trends_password
        Password used for the trends service sql connection.
        [Default: {{ postgres_shared_service_password }}]
        type: str

- postgres_user_password
        Password used for the user service sql connection.
        [Default: {{ postgres_shared_service_password }}]
        type: str

- redis_cache_host
        Hostname of the remote Redis server for caching. Example:
        `master.storage-abcdefgh.euc1.cache.amazonaws.com`. Only
        relevant if `remote_redis_server` is set to True.
        [Default: None]
        type: str

- redis_cache_password
        Password of the remote Redis server for caching. Only relevant
        if `remote_redis_server` is set to True.
        [Default: None]
        type: str

- redis_cache_port
        TCP port of the remote Redis server for caching. Only relevant
        if `remote_redis_server` is set to True.
        [Default: None]
        type: int

- redis_data_dir
        Path where the local redis data folder needs to be placed.
        This only works if set before the Redis servers are installed.
        [Default: /var/lib/redis]
        type: str

- redis_ssl
        Use SSL/TLS for the Redis connections
        [Default: False]
        type: bool

- redis_ssl_ca_certs
        Path to the ca certificate file used for the Redis
        connections.
        [Default: <python site-packages path>/certifi/cacert.pem]
        type: str

- redis_ssl_verify
        Set to False if the Redis server SSL/TLS certificate is self
        signed and/or otherwise untrusted or invalid.
        [Default: True]
        type: bool

- redis_storage_host
        Hostname of the remote Redis server for persistent storage.
        Example: `master.storage-abcdefgh.euc1.cache.amazonaws.com`.
        Only relevant if `remote_redis_server` is set to True.
        [Default: None]
        type: str

- redis_storage_password
        Password of the remote Redis server for persistent storage.
        Only relevant if `remote_redis_server` is set to True.
        [Default: None]
        type: str

- redis_storage_port
        TCP port of the remote Redis server for persistent storage.
        Only relevant if `remote_redis_server` is set to True.
        [Default: None]
        type: int
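A remote Redis setup with encrypted connections could be described with the `redis_*` options from this list as sketched below. The hostnames and ports are the example values used elsewhere on this page, and the passwords are placeholders.

```yaml
vars:
  remote_redis_server: True
  redis_ssl: True
  # Only set to False for self-signed or otherwise untrusted certificates
  redis_ssl_verify: True
  redis_storage_host: master.storage-abcdefgh.euc1.cache.amazonaws.com
  redis_storage_port: 6379
  redis_storage_password: "replace-with-storage-password"
  redis_cache_host: master.storage-abcdefgh.euc1.cache.amazonaws.com
  redis_cache_port: 6380
  redis_cache_password: "replace-with-cache-password"
```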

- remote_filesystem
        Set to true if the various Squirro assets must be placed on a
        remote/shared filesystem. Examples are NFS, EFS, SMB. This
        must be set before the first run, and cannot be changed later.
        [Default: False]
        type: bool

- remote_filesystem_path
        Path to the already mounted remote filesystem. Only relevant if
        `remote_filesystem` is set to True. This must be set before the
        first run, and cannot be changed later. Example:
        `/mnt/efs/squirro`
        [Default: None]
        type: str

- remote_mysql_server
        Set to True if you want to leverage a remote MySQL/MariaDB
        server. e.g. via AWS RDS.
        [Default: False]
        type: bool

- remote_postgres_server
        Set to True if you want to leverage a remote PostgreSQL
        server. e.g. via AWS RDS.
        [Default: False]
        type: bool
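For the experimental PostgreSQL support, the options documented above might be combined as in the following sketch. The hostname is the example value from `postgres_host` and the password is a placeholder. Whether further settings such as administrative credentials are required is not covered by this list, so treat the fragment as a starting point only.

```yaml
vars:
  database_type: postgres
  remote_postgres_server: True
  postgres_host: yourproject-db.abcdefghij.eu-central-1.rds.amazonaws.com
  postgres_port: 5432
  postgres_shared_service_password: "replace-with-a-strong-password"
```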

- remote_redis_server
        Set to True if you want to leverage remote Redis servers, e.g.
        via AWS ElastiCache.
        [Default: False]
        type: bool

- service_endpoint_baseurl
        Can be set to a different service endpoint protocol, hostname
        and port. This is only of relevance if a custom nginx server
        needs to be used.
        [Default: http://localhost:81]
        type: str

- squirro_channel
        Which Squirro release channel from the mirror to use for the
        installation. Only applies when squirro_install_mode is set to
        online.
        (Choices: stable, testing, unstable)[Default: stable]
        type: str

- squirro_clusternode
        If set to `True` the squirro-clusternode package is installed
        [Default: False]
        type: bool

- squirro_install_mode
        Where ansible takes the packages from. online: Packages
        are retrieved from mirror.squirro.net. offline: Package is
        installed from a tar.gz file that can be downloaded from
        https://mirror.squirro.net. filesystem: Same as offline, but
        the tar.gz file is already extracted. custom_url: Host the
        yum repo on a custom http or https url
        (Choices: online, offline, filesystem, custom_url)[Default:
        online]
        type: str

- squirro_packages_tar
        Location of an installation tar.gz file downloaded from
        https://mirror.squirro.net. Only relevant if
        `squirro_install_mode` is set to `offline`
        [Default: None]
        type: str

- squirro_service_group
        This parameter is related to 'squirro_service_user'; see there
        for further details. All squirro related files will get this
        gid when squirro_service_user is set to a custom value.
        [Default: None]
        type: str

- squirro_service_user
        Set the linux uid that is used to run all the squirro python
        services. If not set (default), then each service is run with
        its own dedicated user (sqfront, sqtopic, etc.) as provided by
        the squirro packages. If set to any other value, all services
        run with this specific user. The user and its group (see
        squirro_service_group) need to pre-exist. This action cannot
        be reverted, as information about detailed file ownership is
        lost in the process. This is only recommended in scenarios
        where updates are not run in place, but the instances or
        containers are discarded frequently (Cloud Instances, Docker,
        K8S, etc.)
        [Default: None]
        type: str
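Since the service user and group must exist before the role runs, one option is to create them in `pre_tasks`, which Ansible executes before `roles`. The play below is a sketch: the account and group name `squirro` is hypothetical, it is assumed that the role accepts an account name for this option, and the mirror credentials are omitted for brevity. `group` and `user` are standard Ansible modules.

```yaml
- name: Install Squirro with a single service account
  hosts: all
  become: true
  vars:
    squirro_clusternode: True
    squirro_storagenode: True
    squirro_service_user: squirro    # hypothetical account name
    squirro_service_group: squirro   # hypothetical group name
  pre_tasks:
    - name: Create the service group
      group:
        name: "{{ squirro_service_group }}"
        system: true

    - name: Create the service user
      user:
        name: "{{ squirro_service_user }}"
        group: "{{ squirro_service_group }}"
        system: true
        create_home: false
  roles:
    - role: squirro-ansible
```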

- squirro_storagenode
        If set to `True` the squirro-storagenode package is installed
        [Default: False]
        type: bool

- squirro_version
        Which Squirro version to install. Next to specific version
        numbers you can also use the strings `latest` (for the latest
        bi-weekly release) or `x.y-lts` (e.g., `3.4-lts` for the
        latest LTS release in the 3.4 series).
        [Default: latest]
        type: str
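For example, to follow the latest LTS release of the 3.4 series instead of the latest bi-weekly release, the `vars` section could contain:

```yaml
vars:
  squirro_channel: stable
  squirro_version: 3.4-lts
```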

- yum_password
        Password for https://mirror.squirro.net. Only relevant if
        `squirro_install_mode` is set to `online`. Reach out to
        Squirro Support if you don't have this information.
        [Default: None]
        type: str

- yum_repo_folder
        Path to the extracted contents of the offline installer tar.gz
        file downloaded from mirror.squirro.net. Only relevant if
        `squirro_install_mode` is set to `filesystem`.
        [Default: None]
        type: str

- yum_repo_url
        Full URL to a webserver hosting a full Squirro release. Only
        relevant if `squirro_install_mode` is set to `custom_url`.
        [Default: None]
        type: str
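A sketch of the `custom_url` install mode described under `squirro_install_mode`, assuming an internal web server that hosts a copy of the Squirro release. The URL is hypothetical, and the exact folder it must point to is not specified in this list.

```yaml
vars:
  squirro_install_mode: custom_url
  # Hypothetical internal mirror of a full Squirro release
  yum_repo_url: https://repo.example.internal/squirro/
```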

- yum_user
        Username for https://mirror.squirro.net. Only relevant if
        `squirro_install_mode` is set to `online`. Reach out to
        Squirro Support if you don't have this information.
        [Default: None]
        type: str