How to Set Up Oracle Database as Squirro’s Metadata Storage#

This page provides instructions on how to configure Squirro to work with the Oracle Database (instead of MariaDB or PostgreSQL) on a fresh single-node deployment.

Note

This integration has been developed and tested on Oracle Database 21c Express Edition on a Rocky Linux 8 machine with Squirro.

Overview of the Steps#

There are four primary steps to this process, laid out in detail below:

  1. Set up prerequisites, including a running Oracle Database instance.

  2. Configure Oracle for Squirro.

  3. Configure Squirro to use Oracle by installing the cx_Oracle package and editing the configuration files.

  4. Restart Squirro.

Note: Utility snippets are also provided at the end of this page.

I: Set Up Prerequisites#

Firstly, you must have an Oracle Database instance up and running.

Reference: If you require assistance setting one up, see Oracle’s Official Installation Guide for help.

After you have an instance up and running as per the guide above, perform the following three steps:

  1. Change to the oracle system user:

sudo su - oracle
  1. Add the following to the end of ~/.bashrc:

# Setting the Oracle Database XE Environment Variables
export ORACLE_SID=XE
export ORAENV_ASK=NO
export ORACLE_HOME=/opt/oracle/product/21c/dbhomeXE
. /opt/oracle/product/21c/dbhomeXE/bin/oraenv
  1. Install rlwrap to make it a bit easier to work with sqlplus:

dnf install rlwrap

II: Configure Oracle for Squirro#

Firstly, you must configure Oracle by creating a separate pluggable database (PDB) for every Squirro service.

In every PDB, we create two users:

  • an admin user

  • a user that will be used by the service itself to connect to the PDB.

The admin user is optional (but considered a good practice) and not needed by Squirro to operate.

You can skip the creation of the admin user if you think you won’t need it.

sq_user_pwd='<sq_user_pwd>'
pdb_admin_pwd='<pdb_admin_pwd>'
sq_users_dbs=('configuration' 'datasource' 'emailsender' 'filtering' 'fingerprint' 'machinelearning' 'notes' 'plumber' 'scheduler' 'studio' 'topic' 'trends' 'squser')
for sq_user_db in "${sq_users_dbs[@]}"; do
 echo "Setting up the PDB for the $sq_user_db service..."
 sqlplus / as sysdba << EOF
CREATE PLUGGABLE DATABASE $sq_user_db ADMIN USER admin IDENTIFIED BY $pdb_admin_pwd FILE_NAME_CONVERT = ('/opt/oracle/oradata/XE/pdbseed/', '/opt/oracle/oradata/XE/$sq_user_db/');
 ALTER PLUGGABLE DATABASE $sq_user_db OPEN;
ALTER SESSION SET CONTAINER=$sq_user_db;
CREATE TABLESPACE $sq_user_db DATAFILE '/opt/oracle/oradata/XE/$sq_user_db/${sq_user_db}01.dbf' SIZE 100M AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED;
CREATE USER $sq_user_db IDENTIFIED BY $sq_user_pwd DEFAULT TABLESPACE $sq_user_db TEMPORARY TABLESPACE temp;
GRANT CONNECT, RESOURCE TO $sq_user_db;
 ALTER USER $sq_user_db QUOTA UNLIMITED ON $sq_user_db;
EOF
 echo "Setup of the PDB for the $sq_user_db service has been completed."
done

Note

It’s best not to include any special characters, such as /, in the password. Special characters don’t tend to work well with Oracle. Including special characters may result in an inability to connect to the database.

For the user service, we create the database (PDB) and the database user as squser instead of user because it is a reserved word and Oracle does not allow that, as shown below:

SQL>   CREATE PLUGGABLE DATABASE user ADMIN USER admin IDENTIFIED BY <pdb_admin_pwd> FILE_NAME_CONVERT = ('/opt/oracle/oradata/XE/pdbseed/', '/opt/oracle/oradata/XE/user/')
                            *
ERROR at line 1:
ORA-65000: missing or invalid pluggable database name

In PostgreSQL, this can be circumvented by quoting the identifier, but Oracle does not support that. Therefore, a prefix is added to differentiate it.

III: Configure Squirro to Use Oracle#

Now, we need to configure Squirro to use Oracle as its database. This involves two steps:

  1. Install the cx_Oracle package.

  2. Edit the Squirro configuration files.

Install cx_Oracle#

Firstly, we need to install the package to use Oracle from Python:

squirro_activate
pip install cx_Oracle

mkdir -p /opt/oracle
cd /opt/oracle
wget https://download.oracle.com/otn_software/linux/instantclient/2112000/instantclient-basic-linux.x64-21.12.0.0.0dbru.zip
unzip instantclient-basic-linux.x64-21.12.0.0.0dbru.zip
rm -f instantclient-basic-linux.x64-21.12.0.0.0dbru.zip

yum install libaio

sh -c "echo /opt/oracle/instantclient_21_12 > /etc/ld.so.conf.d/oracle-instantclient.conf"
ldconfig

You can execute the following in a Python interpreter to evaluate if connection can be established successfully to one of the created PDBs:

dialect = "oracle+cx_oracle"
username = "datasource"
password = "<sq_user_pwd>"
hostname = "test-oracle-database.squirro.cloud"
port = 1539
service = "datasource.squirro.cloud"

import cx_Oracle
connection = cx_Oracle.connect(user=username, password=password, dsn=f"{hostname}:{port}/{service}")
cursor = connection.cursor()
cursor.execute("create table test (id number primary key)")
connection.commit()
for table_name in cursor.execute("select table_name from user_tables"):
  print(f"Values: {table_name}")
cursor.close()
connection.close()

Tip

As the oracle system user, you can use the lsnrctl status command to discover the correct hostname and service values.

Having verified that we can talk to Oracle from Python, we can also evaluate whether we can talk to it from SQLAlchemy by creating the tables of one of the Squirro services:

dialect = "oracle+cx_oracle"
username = "datasource"
password = "<sq_user_pwd>"
hostname = "test-oracle-database.squirro.cloud"
port = 1539
service = "datasource.squirro.cloud"

import cx_Oracle
from sqlalchemy import create_engine
from squirro.service.datasource.model import Base

dsn = cx_Oracle.makedsn(hostname, port, service_name=service)
database_url = f"{dialect}://{username}:{password}@{dsn}"
engine = create_engine(database_url)
Base.metadata.create_all(engine)

And we can verify from sqlplus that the tables have been created in the appropriate PDB:

# first ensure that you are the oracle system user
sudo su - oracle

rlwrap sqlplus [email protected]:1539/datasource.squirro.cloud
# input password

# verify that we are in datasource PDB
show con_name;

# display the tables in the PDB
select table_name from user_tables;

# you can drop the TEST table created previously (optional)
drop table test;

Edit Squirro Configuration Files#

You will then need to edit all the configuration files that define communication with the database and update them appropriately to connect to Oracle instead of MariaDB:

You can then verify that the database connection strings have been updated correctly for all the services by examining the output of the following command:

grep -E '^db =' /etc/squirro/*.ini

Secondly, you’ll need to disable the execution of the database migration scripts included in every service. This is because many migration scripts are not database agnostic.

However, in a fresh installation you don’t need to execute them, because the schema is created directly from the defined SQLAlchemy models.

To disable this, open the /etc/squirro/common.ini file and add the following:

[migration]
execute_at_startup = false

Thirdly, you’ll need to remove the ssl and ssl_ca options of the [db] section because they are not supported by cx_Oracle.

To do so, open the /etc/squirro/common.ini file and remove or comment out them:

[db]
#ssl = False
#ssl_ca = /etc/ssl/certs/ca-bundle.trust.crt

IV: Restart Squirro#

Finally, you must stop all the Squirro services, stop and disable the MariaDB service, and restart all the Squirro services as follows:

squirro_stop
systemctl stop mariadb; systemctl disable mariadb
squirro_restart

Utility Snippets#

Use the following snippet to open all the PDBs of Squirro after a database restart, as they are reported as closed after restart:

sq_users_dbs=('configuration' 'datasource' 'emailsender' 'filtering' 'fingerprint' 'machinelearning' 'notes' 'plumber' 'scheduler' 'studio' 'topic' 'trends' 'squser')
for sq_user_db in "${sq_users_dbs[@]}"; do
  echo "Opening the PDB of the $sq_user_db service..."
  sqlplus / as sysdba << EOF
  ALTER PLUGGABLE DATABASE $sq_user_db OPEN;
EOF
  echo "Done."
done

Use the following snippet to drop all the PDBs of Squirro:

sq_users_dbs=('configuration' 'datasource' 'emailsender' 'filtering' 'fingerprint' 'machinelearning' 'notes' 'plumber' 'scheduler' 'studio' 'topic' 'trends' 'squser')
for sq_user_db in "${sq_users_dbs[@]}"; do
  echo "Dropping the PDB of the $sq_user_db service..."
  sqlplus / as sysdba << EOF
  ALTER PLUGGABLE DATABASE $sq_user_db CLOSE IMMEDIATE;
  DROP PLUGGABLE DATABASE $sq_user_db INCLUDING DATAFILES;
EOF
  echo "Dropped successfully the PDB of the $sq_user_db service."
done