Elasticsearch Migration#

Starting from Release 3.7.4, Squirro storage nodes use Elasticsearch 8.x.

Warning

If you are upgrading Squirro from Release 3.6.5 or earlier, you must first upgrade Squirro to at least version 3.6.6 before proceeding further.

Before upgrading Elasticsearch to the new major version, it’s crucial to check if existing indices will work in the new Elasticsearch version.

Elasticsearch 8.x can only read indices created in version 7.0 or later. This means all indices created in Elasticsearch 6.x and earlier versions are not supported.

You must reindex them with Elasticsearch 7.x before proceeding with the upgrade.

This page explains how to migrate existing indices to work with Elasticsearch 8.x.

Overview#

To migrate your indices to Elasticsearch 8.x, you must perform the following steps:

  1. Download provided helper scripts.

  2. Detect any incompatible indices.

  3. Migrate any incompatible indices found.

  4. Verify that the total number of documents in the reindexed indices matches that of the original indices.

  5. Begin using the new indices.

  6. Validate changes.

  7. Delete old indices.

  8. Perform a final compatibility verification.

Download Helper Scripts#

Squirro provides helper scripts to make the migration as smooth as possible.

Follow the steps below to download the scripts:

  1. Download the package containing the scripts:

$ yum install squirro-elasticsearch-maintenance

  2. Go to the scripts directory:

$ cd /opt/squirro/elasticsearch/maintenance

Note: The scripts must be executed inside a Squirro virtual environment.

$ squirro_activate

Tip

All scripts allow you to configure the Elasticsearch server URL, user, password, and certificates. To see the available options, add the --help option when calling a script.

Detect Incompatible Indices#

To detect whether you have any indices that are incompatible with Elasticsearch 8.x, use the detect_incompatible_indices.py script as shown below:

$ python detect_incompatible_indices.py --elastic-version 8

The script reports whether all indices are compatible with Elasticsearch 8.x and whether you can safely proceed with the Elasticsearch upgrade.

If all indices are compatible, the output will look like this:

All indices are compatible, you can proceed with the upgrade.

If there are indices that must be reindexed first, the output will look like this:

Found incompatible indices: ['squirro_v9_pudxlusdtiyxytp3xuoi9a', 'squirro_v9_fp']. You must reindex them with Elasticsearch 7.x or higher before proceeding with the upgrade.
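If you run the check as part of a scripted upgrade, the result can be classified from the message text. The following is a minimal sketch, not part of the Squirro helper scripts: the function name classify_detect_output is hypothetical, and it relies only on the two messages shown above. Whether the script also signals the result through its exit code is not covered here, so any other output is treated as unknown rather than as success.

```shell
# Hypothetical helper: classify the output of detect_incompatible_indices.py
# using only the two messages documented in this section.
classify_detect_output() {
  case "$1" in
    *"Found incompatible indices"*) echo "incompatible" ;;
    *"All indices are compatible"*) echo "compatible" ;;
    *) echo "unknown" ;;  # errors, empty output, or unexpected text
  esac
}

# Usage sketch (run inside the Squirro virtual environment):
#   output=$(python detect_incompatible_indices.py --elastic-version 8)
#   if [ "$(classify_detect_output "$output")" != "compatible" ]; then
#     echo "Do not upgrade yet:"
#     echo "$output"
#   fi
```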

Migrate Incompatible Indices#

If you found indices that must be reindexed to work properly with Elasticsearch 8.x, start the process by verifying the prerequisites below:

Prerequisites#

Sufficient Disk Space#

At a minimum, you must have spare disk space equal to the primary store size on the storage node.

Note: Documents marked for deletion are skipped during reindexing. As a result, a source index with many such documents produces a reindexed index that takes up less space than the original.
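The disk space requirement can be checked before you start. The sketch below is an illustration under assumptions: the helper name has_enough_space is hypothetical, the Elasticsearch endpoint is the one used in the curl commands on this page, and the data directory /var/lib/elasticsearch is a guess that you must adjust to your installation. It sums the primary store size of all indices, which is the conservative reading of the requirement.

```shell
# Hypothetical helper: succeed when the available bytes cover the required bytes.
has_enough_space() {
  available_bytes=$1
  required_bytes=$2
  [ "$available_bytes" -ge "$required_bytes" ]
}

# Usage sketch. DATA_PATH is an assumption, adjust it to your data directory.
#   ES_URL=http://localhost:81/ext/elastic
#   DATA_PATH=/var/lib/elasticsearch
#   # Sum of the primary store size of all indices, in bytes
#   required=$(curl -s "$ES_URL/_cat/indices?h=pri.store.size&bytes=b" \
#     | awk '{s+=$1} END {printf "%.0f\n", s}')
#   # Free bytes on the file system that holds the data directory (GNU df)
#   available=$(df -B1 --output=avail "$DATA_PATH" | tail -n 1 | tr -d '[:space:]')
#   has_enough_space "$available" "$required" || echo "Not enough spare disk space"
```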

Permission to Modify the Index Locator#

Verify that the topic.custom-locator option is set to true in the Configuration Service.

Reindex#

Reference: Learn more about reindexing at Reindexing Elasticsearch.

To perform reindexing, follow the steps below:

  1. Stop required Squirro services using the following command:

$ systemctl stop sqingesterd && systemctl stop sqmachinelearningd && systemctl stop sqfilteringd && systemctl stop sqfingerprintd

  2. Run the reindex.py script and provide the name of the index you want to reindex together with the target index name, as shown below:

$ python reindex.py --original-index <ORIGINAL_INDEX> --target-index <TARGET_INDEX>

For example, if you want to reindex the index called squirro_v9_pudxlusdtiyxytp3xuoi9a, run the following command:

$ python reindex.py --original-index squirro_v9_pudxlusdtiyxytp3xuoi9a --target-index squirro_v9_pudxlusdtiyxytp3xuoi9a-reindexed

Tip

You can reindex multiple indices in parallel by running the script in the background. To do this, add the & symbol at the end of the line.

The script saves logs and index statistics to the files with the reindex- suffix so you can review them to detect any issues.
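Running several reindex jobs in the background can be sketched as a small loop. This is an illustration only: the helpers target_name and reindex_all and the PYTHON_CMD variable are hypothetical, and the -reindexed target naming simply follows the example above. A plain wait does not report the exit status of individual jobs, so the log files still need to be reviewed afterwards.

```shell
# Hypothetical helper: derive a target index name by appending "-reindexed",
# matching the naming used in the example above.
target_name() {
  printf '%s-reindexed\n' "$1"
}

# Start one background reindex job per index, then wait for all of them.
# PYTHON_CMD defaults to "python"; set it to "echo python" for a dry run
# that only prints the commands.
reindex_all() {
  for index in "$@"; do
    ${PYTHON_CMD:-python} reindex.py --original-index "$index" \
      --target-index "$(target_name "$index")" &
  done
  wait  # block until every background job has finished
}

# Usage sketch (inside the Squirro virtual environment, in the scripts directory):
#   reindex_all squirro_v9_pudxlusdtiyxytp3xuoi9a squirro_v9_fp
```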

Verify the Total Number of Documents#

To be sure that reindexing went well, you can verify the total number of documents in the reindexed index and compare it to the number of documents in the original one.

To do so, use the compare_docs.py script as shown below:

$ python compare_docs.py --original-index <ORIGINAL_INDEX> --target-index <TARGET_INDEX>

Note

During reindexing, documents with an invalid mapping are filtered out, so the document counts may differ slightly without indicating a problem. Treat the comparison as a sanity check that catches obvious issues, such as an empty index or half of the documents missing.
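Because small differences are expected, a scripted check works better with a tolerance than with strict equality. The sketch below is an independent illustration and does not describe how compare_docs.py works: the helper name counts_close_enough and the 1% threshold in the usage example are assumptions, and the counts are read with the Elasticsearch cat count API through the endpoint used in the curl commands on this page.

```shell
# Hypothetical helper: succeed when the target index holds at least
# (100 - max_loss_pct) percent of the original document count.
counts_close_enough() {
  original=$1
  target=$2
  max_loss_pct=$3
  # Integer arithmetic: target * 100 >= original * (100 - max_loss_pct)
  [ $((target * 100)) -ge $((original * (100 - max_loss_pct))) ]
}

# Usage sketch. The 1% tolerance is an arbitrary example value.
#   ES_URL=http://localhost:81/ext/elastic
#   original=$(curl -s "$ES_URL/_cat/count/<ORIGINAL_INDEX>?h=count" | tr -d '[:space:]')
#   target=$(curl -s "$ES_URL/_cat/count/<TARGET_INDEX>?h=count" | tr -d '[:space:]')
#   counts_close_enough "$original" "$target" 1 \
#     || echo "Document counts differ by more than 1%"
```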

Utilize the New Index#

After reindexing is done, the output is a new index with a different name than the original one.

To tell Squirro to use this new index instead of the original one, you must perform different actions depending on the index type.

Tip

Project-specific indices can be recognized by their naming convention, which is squirro_v9_<RANDOM 22 CHARACTERS>.
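The naming convention can be turned into a quick check when you need to sort a list of index names by type. This is a rough sketch: the function name is_project_index is hypothetical, and the set of characters allowed in the random part (letters, digits, underscore, and hyphen) is an assumption, since only the length is stated above.

```shell
# Hypothetical helper: succeed when the name looks like a project-specific index,
# that is, "squirro_v9_" followed by exactly 22 characters.
# The allowed character set is an assumption.
is_project_index() {
  printf '%s\n' "$1" | grep -Eq '^squirro_v9_[A-Za-z0-9_-]{22}$'
}

# Usage sketch:
#   if is_project_index "$index"; then
#     echo "$index: project-specific, update the index locator"
#   else
#     echo "$index: fixed name, reindex back to the original name"
#   fi
```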

Every Squirro project creates its own index and stores information about that index in the database.

This means that you must update the project locator in the database so that it points to the new index.

To do so, use the update_index_locator.py script as shown below:

$ python update_index_locator.py --original-index <ORIGINAL_INDEX> --target-index <TARGET_INDEX> --token <SQUIRRO_TOKEN>

In addition to changing the pointer in the database, the script also closes the original index.

The script saves logs to the files with the rewire- suffix, so you can review them to detect any issues.

Apart from project-specific indices, where you can replace the original index name with a reindexed one, Squirro also creates other indices that have strictly defined names and cannot be renamed.

Some such indices include:

  • squirro_v9_fp

  • squirro_v9_filter

In these situations, you cannot use a different index name. Instead, you must reindex back into an index with the same name as the original one.

For example, if you reindexed squirro_v9_fp and called the new index squirro_v9_fp-reindexed, you can reindex it back by executing the following command:

$ python reindex.py --original-index squirro_v9_fp-reindexed --target-index squirro_v9_fp

After reindexing back, you can then delete the redundant index using the following command:

$ curl -XDELETE http://localhost:81/ext/elastic/squirro_v9_fp-reindexed

Validate Changes#

Once you’ve migrated your indices and begun using them, you can validate that everything is working correctly by following the steps below:

  1. Restart Squirro services using the following command:

$ squirro_restart

  2. Check the Squirro status using the following command:

$ squirro_status

If all services are healthy, you can now do manual validation by performing searches, clicking on item details, displaying labels, and manually clicking through your dashboards.

Delete Original Index#

Once you’ve validated that the changes are working properly, you can delete the original index.

To do so, use the following command:

$ curl -XDELETE http://localhost:81/ext/elastic/<ORIGINAL_INDEX>

Note

If you reindexed a non-project-specific index (as described above), you’ve likely already deleted the original index in the previous steps.

Check Indices#

After you’ve migrated all incompatible indices, check once again that all indices work with Elasticsearch 8.x.

To do so, use the detect_incompatible_indices.py script as shown below:

$ python detect_incompatible_indices.py --elastic-version 8