Elasticsearch Management#

Squirro relies on Elasticsearch for the storage nodes where the indexed data is persisted. This guide outlines Elasticsearch configuration changes that are often used in the context of Squirro.

Please also refer to the official Elasticsearch Documentation for help.

Note

Starting from Release 3.7.4, Squirro storage nodes use Elasticsearch 8.x. If you are upgrading Squirro from Release 3.7.3 or earlier, you will also need to upgrade Elasticsearch to 8.x. See Elasticsearch Migration to learn more.

For the default setup of Squirro, the following table shows the most important Elasticsearch folders and files on the Squirro storage node:

| Location | Description |
| --- | --- |
| /etc/sysconfig/elasticsearch | Elasticsearch sysconfig file for customizing environment settings, e.g. ES_HOME, CONF_DIR, DATA_DIR, LOG_DIR |
| /etc/elasticsearch/elasticsearch.yml | Elasticsearch config file for customizing settings such as the cluster name and HTTP port |
| /etc/elasticsearch/ | Elasticsearch config directory containing elasticsearch.yml, jvm.options, and certs (symlink not allowed) |
| /var/log/elasticsearch/ | Elasticsearch log folder (symlink allowed) |
| /var/lib/elasticsearch/ | Elasticsearch data folder where the indices are stored (symlink allowed) |
| /usr/share/elasticsearch/ | Elasticsearch installation folder, contains /bin, /lib, /plugins |

Note

Examples in this guide use the es_curl tool and the Squirro storage node URL with secured Elasticsearch, which is the default URL starting from Release 3.7.4. For earlier deployments, use the curl command and the http protocol instead.

curl and es_curl#

The instructions provided on this page assume you have upgraded to ES 8.x using Ansible.

If you have not upgraded using Ansible, or if you are using Squirro 3.7.3 or earlier, you will not have access to the es_curl command.

Instead, use the curl command. If you are using secured Elasticsearch, you will need to provide the username and password in the command.

For example, instead of the following:

es_curl -XGET https://localhost:9200/_cat/indices?v

You can use:

curl -XGET --user elastic:<your password> 'https://localhost:9200/_cat/indices?v'

Warning

To use the es_curl tool, you must be root or a member of the elasticsearch or squirro group.

Monitoring#

Checking the Size of Indices in ES#

es_curl -XGET https://localhost:9200/_cat/indices?v

You should see a list of the Squirro indices with their sizes. The project ID appears at the end of each index name.
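To find the indices that take up the most space, the listing can be sorted by size. The following is a minimal sketch rather than a Squirro tool: the function name largest_indices is made up for illustration, and it assumes the input comes from the _cat/indices API called with the standard h and bytes parameters so that each line holds an index name and its size in bytes.

```shell
# Hypothetical helper: read "index size_in_bytes" pairs on stdin and
# print them largest first, with the size converted to MiB.
largest_indices() {
    sort -k2,2nr | awk '{ printf "%s %.1f MiB\n", $1, $2 / 1048576 }'
}

# Example usage on a storage node (requires es_curl):
#   es_curl -s 'https://localhost:9200/_cat/indices?h=index,store.size&bytes=b' | largest_indices
```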

Check Cluster/Nodes Are Healthy

es_curl https://localhost:9200/_cluster/health?pretty=true

You should see a JSON response containing "status" : "green".
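For scripted checks, the status value can be pulled out of the health response. The following is a minimal sketch under the assumption that the response has the usual _cluster/health shape; the function name health_status is hypothetical and not part of Squirro.

```shell
# Hypothetical helper: read a _cluster/health JSON response on stdin
# and print the value of its "status" field (green, yellow, or red).
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Example usage on a storage node (requires es_curl):
#   status=$(es_curl -s 'https://localhost:9200/_cluster/health?pretty=true' | health_status)
#   [ "$status" = "green" ] || echo "cluster status is $status"
```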

Check that the Elasticsearch Service is Started#

ps aux | grep [e]lasticsearch

This command also allows you to view additional settings of the Elasticsearch service if it’s started, for example memory, default.path, and process id:

496 122224 1 11 10:20 ? 00:02:14 /usr/bin/java -Xms6g -Xmx6g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Djava.security.policy=file:///etc/elasticsearch/squirro.java.policy -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid -d -Edefault.path.logs=/var/log/elasticsearch -Edefault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch

To start, stop, or restart the Elasticsearch service:

systemctl {start|stop|restart} elasticsearch

View Indices, Status, and Size#

Each Squirro project corresponds by default to an index named in the format squirro_{template-version}_{lowercase-project-id}, e.g. squirro_v9_36syha3_ss-zwvn9gyk1ww

es_curl https://localhost:9200/_cat/indices?v
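The naming format can be applied mechanically to find the index of a known project. This is a small sketch based only on the format described above; the function name index_name_for is made up for illustration.

```shell
# Hypothetical helper: build the default index name for a project.
# $1 = template version (e.g. v9), $2 = project ID in any letter case.
index_name_for() {
    printf 'squirro_%s_%s\n' "$1" "$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
}

# Example usage on a storage node (requires es_curl):
#   es_curl "https://localhost:9200/_cat/indices/$(index_name_for v9 "$PROJECT_ID")?v"
```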

View Shards, Status, and Size#

Each Elasticsearch index usually contains several shards. The number_of_shards value is defined in the index settings.

es_curl https://localhost:9200/_cat/shards?v

Tip: There are plenty of useful _cat commands to investigate Elasticsearch, just type es_curl 'https://localhost:9200/_cat' to see them.

View Squirro Templates#

Use this to verify that the Squirro templates are installed on the Squirro storage node:

# see detail of all templates
es_curl -s https://localhost:9200/_index_template?pretty
# see only squirro template names
es_curl -s https://localhost:9200/_index_template?pretty | grep squirro_v

View Number of Shards and Replicas#

Use this to verify that the number of shards and replicas is set correctly on the Squirro storage node:

# in template
es_curl -s https://localhost:9200/_index_template?pretty | grep -e number_of -e squirro_v
# in setting of given index
es_curl -s https://localhost:9200/{index_name}/_settings?pretty | grep -e number_of -e squirro_v

Tip: You can use the wildcard syntax * in index_name, e.g. /squirro_v9_*/

View Mapping and Setting of a Given Index#

Use this to verify that a newly created index uses the correct Squirro template:

es_curl https://localhost:9200/{index_name}/_mappings?pretty
es_curl https://localhost:9200/{index_name}/_settings?pretty

View Elasticsearch Stats#

es_curl https://localhost:9200/_stats?pretty
# stats of given index
es_curl https://localhost:9200/{index_name}/_stats?pretty
# stats of nodes
es_curl https://localhost:9200/_nodes/stats?pretty
# stats of cpu and memory
es_curl https://localhost:9200/_nodes/stats/os?pretty
# stats of file system
es_curl https://localhost:9200/_nodes/stats/fs?pretty
# stats of jvm
es_curl https://localhost:9200/_nodes/stats/jvm?pretty
# stats of fielddata (data of text field is loaded to memory for aggregation, sort or in a script)
es_curl https://localhost:9200/_nodes/stats/indices/fielddata?pretty

Set Replicas in Multiple Nodes Cluster#

When you have multiple storage nodes, it’s suggested that you create at least one replica for the indices in case a node is shut down. Having a replica will ensure the storage cluster still works.

es_curl -XPUT 'https://localhost:9200/squirro_v9_*/_settings' -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'

Investigate Index Content#

View the number of documents and one sample document in the index:

# query whole index (quote the URL so that the shell does not interpret "&")
es_curl 'https://localhost:9200/{index_name}/_search?pretty&size=1&track_total_hits=true'

The response includes the number of documents, for example:

{
    "hits" : {
        "total" : {
            "value" : 12345,
            "relation" : "eq"
        },
        ...
        "hits" : [...list of items...]
    }
}

To get an item by id:

es_curl 'https://localhost:9200/{index_name}/_doc/{item_id}?pretty'

To query the number of items modified before a given time:

es_curl 'https://localhost:9200/{index_name}/_search?pretty' -H "Content-Type: application/json" -d '
{
    "query": {
        "range" : {
            "modified_at" : {
                "lte" : "2018-02-13T18:01:44"
            }
        }
    },
    "size": 1
}'

Delete Documents by Query#

Before deleting by query, make sure that the query returns only the documents you want to delete by running it as a search and reviewing the results:

es_curl 'https://localhost:9200/{index_name}/_search?pretty' -H "Content-Type: application/json" -d '
{
    "query": {
        "term" : {
            "assoc:sources" : "123456789abcdef"
        }
    }
}'

If the results contain only documents you want to delete, e.g. the documents belonging to the source with the ID 123456789abcdef, delete them by sending the same query to _delete_by_query:

es_curl -XPOST 'https://localhost:9200/{index_name}/_delete_by_query' -H "Content-Type: application/json" -d '
{
    "query": {
        "term" : {
            "assoc:sources" : "123456789abcdef"
        }
    }
}'

After deleting by query, the index size on disk is not reduced, because a document is not removed from its segment but only marked as deleted.

If you want to free disk space after deleting, execute the following command:

es_curl -XPOST https://localhost:9200/{index_name}/_forcemerge?only_expunge_deletes=true
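One way to make sure the reviewed search and the delete request use exactly the same query is to generate the body once and reuse it. The following is a sketch only: the function name source_query is hypothetical, and it assumes the source ID contains no characters that need JSON escaping.

```shell
# Hypothetical helper: print the term query for one source ID so that the
# review search and the delete request send an identical body.
source_query() {
    printf '{"query": {"term": {"assoc:sources": "%s"}}}\n' "$1"
}

# Example usage on a storage node (requires es_curl and a real index name):
#   body=$(source_query 123456789abcdef)
#   es_curl 'https://localhost:9200/{index_name}/_search?pretty' -H 'Content-Type: application/json' -d "$body"
#   es_curl -XPOST 'https://localhost:9200/{index_name}/_delete_by_query' -H 'Content-Type: application/json' -d "$body"
```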

Configuring Elasticsearch Templates#

Squirro uses the concept of Component and Index templates to define the settings for the Elasticsearch indices.

For reference, see https://www.elastic.co/guide/en/elasticsearch/reference/current/index-templates.html.

To modify the default template settings, use the Configuration Service.

Server configuration option: topic.elasticsearch.index-templates-settings

{
    "squirro_v9": {
        "number_of_shards": 6,
        "number_of_replicas": 0,
        "additional_components": []
    }
}

You can configure any template you want by specifying an additional JSON object and using the template name as the option key, for example:

{
    "squirro_v9": {
        "number_of_shards": 6,
        "number_of_replicas": 0,
        "additional_components": []
    },
    "squirro_v9_filter": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
        "additional_components": []
    }
}

The additional_components option allows you to define the additional component templates that the index template will be composed of.

For example, if you want to add another field to the mapping called metadata, you can first create a new component template, using the following command:

es_curl -XPUT "https://localhost:9200/_component_template/metadata_property" -H "Content-Type: application/json" -d'
{
    "template": {
        "mappings": {
            "properties": {
                "metadata": {
                    "type": "object",
                    "enabled": false
                }
            }
        }
    }
}'

Then specify the newly created component template in the additional_components option as shown below:

{
    "squirro_v9": {
        "number_of_shards": 6,
        "number_of_replicas": 0,
        "additional_components": ["metadata_property"]
    }
}

After making changes, restart the Squirro services using the following command:

squirro_restart
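After the restart, one way to confirm that the index template picked up the component is to look for it in the template's composed_of list, which is the standard Elasticsearch field listing the component templates of an index template. This is a sketch under the assumption that Squirro writes additional_components into that list; the function name uses_component is hypothetical.

```shell
# Hypothetical helper: read an _index_template JSON response on stdin and
# succeed if the component named in $1 appears in a "composed_of" list.
uses_component() {
    # Strip newlines and spaces so the list can be matched on one line.
    tr -d '\n ' | grep -q "\"composed_of\":\[[^]]*\"$1\""
}

# Example usage on a storage node (requires es_curl):
#   es_curl -s 'https://localhost:9200/_index_template/squirro_v9?pretty' \
#       | uses_component metadata_property && echo "component is in use"
```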

Reindexing#

To learn more about reindexing, see Reindexing Elasticsearch.

Troubleshooting#

Shards are UNASSIGNED#

If the Elasticsearch status is not green because of an UNASSIGNED shard issue, see How to Resolve Unassigned Shards.

For advanced operations, it’s suggested that you contact Squirro Support for help.

However, there are some simple operations you can try:

Disk Free Space#

Check disk space and free some up if needed. By default, Elasticsearch stops allocating shards to a node once its disk usage exceeds 85% (the low disk watermark), so aim to keep about 20% of the disk free. Use df -h ... and du -h ... to check the free disk space and to identify files that can be safely deleted (e.g. files in /var/log/squirro or /var/log/elasticsearch).
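The free-space check can be scripted. The following is a minimal sketch, with hypothetical function names, that assumes the POSIX output format of df -P and uses the 20% figure from this section as the threshold.

```shell
# Hypothetical helper: read `df -P <path>` output on stdin and print the
# used percentage of the listed file system as a bare number.
disk_used_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed when at least $2 percent of the disk is free.
# $1 = used percentage, $2 = required free percentage.
has_free_percent() {
    used=$1
    min_free=$2
    [ $((100 - used)) -ge "$min_free" ]
}

# Example usage on a storage node:
#   used=$(df -P /var/lib/elasticsearch | disk_used_percent)
#   has_free_percent "$used" 20 || echo "less than 20% free on the data disk"
```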

Number of Replicas#

If you have only one instance of Elasticsearch running but number_of_replicas in your index settings is greater than 0, that index will have a yellow status with some unassigned shards. View them with:

es_curl -s https://localhost:9200/_cat/shards | grep UNASSIGNED

To check the number of replicas of an index:

es_curl -XGET https://localhost:9200/{index_name}/_settings?pretty | grep number_of_replicas

If number_of_replicas is greater than 0, set it to 0:

es_curl -XPUT https://localhost:9200/{index_name}/_settings -H "Content-Type: application/json" -d '{"number_of_replicas":0}'

Also check the number of replicas in the template to ensure that future indices do not get the wrong number_of_replicas setting:

es_curl -XGET https://localhost:9200/_index_template/squirro_v9?pretty | grep number_of_replicas

If number_of_replicas is greater than 0, modify the template as described in Configuring Elasticsearch Templates above.
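To tell the replica case apart from a more serious problem, it helps to count unassigned primaries and replicas separately: unassigned replicas alone give a yellow status, while an unassigned primary makes the cluster red. The following is a sketch that assumes the default _cat/shards column order (index, shard, prirep, state, ...); the function name count_unassigned is hypothetical.

```shell
# Hypothetical helper: read `_cat/shards` output on stdin and count the
# unassigned primaries (p) and replicas (r).
count_unassigned() {
    awk '$4 == "UNASSIGNED" { n[$3]++ }
         END { printf "primaries=%d replicas=%d\n", n["p"], n["r"] }'
}

# Example usage on a storage node (requires es_curl):
#   es_curl -s https://localhost:9200/_cat/shards | count_unassigned
```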

Retry Failed Allocation#

To find the explanation for the unassigned shard:

es_curl -XGET https://localhost:9200/_cluster/allocation/explain?pretty

To retry failed allocations:

es_curl -XPOST https://localhost:9200/_cluster/reroute?retry_failed

Indices are Moved#

Usually, Elasticsearch indices are stored in /var/lib/elasticsearch/. However, in some situations (the server went down and cannot be restored to its previous state, a symlink is lost, an old Elasticsearch version stored the indices under the cluster name…), the indices are not found in /var/lib/elasticsearch/ because they reside on another mount point.

To fix this issue:

  1. Check content, permission, and owner of the /var/lib/elasticsearch folder:

sudo ls -l /var/lib/elasticsearch/

You should see the nodes folder, owned by the user elasticsearch and the group elasticsearch.

  2. If /var/lib/elasticsearch is missing or does not contain the expected indices, stop Elasticsearch:

# stop elasticsearch
systemctl stop elasticsearch

  3. Choose one of two solutions:

     a. Create a symlink to the new index location:

ln -s {your_new_index_location} /var/lib/elasticsearch

     b. Point the Elasticsearch data directory to the new location in the config file:

vi /etc/elasticsearch/elasticsearch.yml
# edit the line path.data: {your_new_index_location}

  4. Set the owner and restart the service:

# set owner and group for data folder
chown -R elasticsearch:elasticsearch {your_new_index_location}
# start elasticsearch again
systemctl start elasticsearch
# check indices status
es_curl https://localhost:9200/_cat/indices?v

Too Many Scroll Contexts#

If you encounter this type of exception:

Trying to create too many scroll contexts. Must be less than or equal to: [500]

you can increase the limit of Elasticsearch by running this command on one of the nodes:

es_curl -X PUT https://localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d'{
    "persistent" : {
        "search.max_open_scroll_context": 1000
    },
    "transient": {
        "search.max_open_scroll_context": 1000
    }
}'

A better long-term fix is to reduce the scroll argument of any squirro_client.scan() call from the default 5m to something like 1m, so that scroll contexts expire sooner.
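To see how close the cluster is to the limit, the number of currently open scroll contexts can be read from the node search statistics, which report a scroll_current value per node. The following is a sketch with a hypothetical function name that simply sums those values from the JSON response.

```shell
# Hypothetical helper: read a `_nodes/stats/indices/search` JSON response on
# stdin and print the sum of all "scroll_current" values across nodes.
open_scrolls() {
    grep -o '"scroll_current"[[:space:]]*:[[:space:]]*[0-9]*' |
        awk -F: '{ total += $2 } END { print total + 0 }'
}

# Example usage on a storage node (requires es_curl):
#   es_curl -s 'https://localhost:9200/_nodes/stats/indices/search?pretty' | open_scrolls
```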

Elasticsearch Fails to Start with Error Message#

Elasticsearch fails to start with the “Unable to load JNA native support library, native methods will be disabled” error message in the log.

This happens when Elasticsearch tries to use the /tmp/ folder, but that folder is mounted with the noexec flag.

It can also happen when another temporary folder is used and the Elasticsearch service user has no execute permission in that folder.

The main reason why the noexec flag would be set on /tmp is OS hardening. The /tmp folder can be leveraged by attackers to store and execute malicious files.

In a highly hardened system this is not desirable and hence the noexec flag is often set.

The workaround for this is to edit /etc/sysconfig/elasticsearch and to add this line:

ES_TMPDIR=/usr/share/elasticsearch/tmp

This would be a sensible default. But any location will do, as long as the folder is owned by the elasticsearch uid and gid.

Memory Setting#

You can set Elasticsearch memory in the file /etc/elasticsearch/jvm.options.d/squirro.options.

Minimum heap size (Xms) and maximum heap size (Xmx) must be equal to each other, for example:

-Xms8g
-Xmx8g

Elasticsearch may crash if there is not enough memory on your server. Set Xmx to no more than 50% of your physical RAM so that enough physical RAM is left for the kernel file system caches. For more information, see the Elasticsearch Heap Size Settings page.

If you run the cluster node and storage node on the same machine, then set the memory for Elasticsearch to no more than 30% of your physical RAM.

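Also keep the heap size below 32 GB. Elasticsearch will start with a larger heap, but above that threshold the JVM can no longer use compressed object pointers, so memory is used less efficiently.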
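The sizing rules in this section can be turned into a quick calculation. The following is a sketch, not an official sizing tool: the function name suggest_heap_gb is hypothetical, it works in whole gigabytes, and the cap of 31 is an assumption chosen to stay under the 32 GB mark.

```shell
# Hypothetical helper: suggest a heap size in whole GB.
# $1 = physical RAM in GB
# $2 = "shared" when the cluster node and the storage node run on the same
#      machine (30% rule); any other value applies the 50% rule.
suggest_heap_gb() {
    ram_gb=$1
    if [ "$2" = "shared" ]; then
        heap=$((ram_gb * 30 / 100))
    else
        heap=$((ram_gb / 2))
    fi
    # Assumed cap: stay below the 32 GB mark.
    if [ "$heap" -gt 31 ]; then heap=31; fi
    # Never suggest less than 1 GB.
    if [ "$heap" -lt 1 ]; then heap=1; fi
    echo "$heap"
}

# Example usage: print the two lines for squirro.options on a dedicated
# storage node with 16 GB of RAM.
#   h=$(suggest_heap_gb 16 dedicated); printf -- '-Xms%sg\n-Xmx%sg\n' "$h" "$h"
```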

Warning: In older installations of Squirro, the amount of memory available to Elasticsearch was set in file /etc/elasticsearch/jvm.options.

When both files are present in the system, /etc/elasticsearch/jvm.options.d/squirro.options overrides the options in /etc/elasticsearch/jvm.options.

Testing an Elasticsearch Query#

Sometimes the Elasticsearch log file contains an exception together with the request body that caused it, which can be quite long. To reproduce the request for investigation:

  1. Save the query in JSON format, e.g. to /tmp/es_request.json

  2. Send the request to Elasticsearch, using the saved file as the request body:

es_curl 'https://localhost:9200/{index_name}/_search?pretty' -H "Content-Type: application/json" -d @/tmp/es_request.json

Cluster Block Exception in ES Logs#

If the disk usage on the Elasticsearch cluster goes beyond a certain limit, Elasticsearch marks all the indices read-only, only allowing the deletion of indices/documents to facilitate space recovery.

To make the indices writable again, first make sure that at least 20% of the disk space is free, either by removing old logs and unnecessary files, by adding more disk space, or by any other means. Then execute the following command, which clears the read-only block on all indices:

es_curl -XPUT 'https://localhost:9200/*/_settings' -H "Content-Type: application/json" -d '{"index.blocks.read_only_allow_delete": null}'

Recover From a Corrupted Index#

The following are symptoms of a corrupted index:

  • Elasticsearch cluster state is red

  • One or multiple shards are not allocated

  • The output of the following command looks like the example below:

es_curl -XGET https://localhost:9200/_cluster/allocation/explain?pretty

{
  "index": "squirro_v9_spzhmtdsrrc78oodbnolza",
  "shard": 5,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "CLUSTER_RECOVERED",
    "at": "2020-02-14T19:20:02.789Z",
    "last_allocation_status": "no_valid_shard_copy"
  },
  "can_allocate": "no_valid_shard_copy",
  "allocate_explanation": "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions": [
    {
      "node_id": "7IDt77EJR-uh5PknzF26_Q",
      "node_name": "squirro-node-2a679356-9373-58ec-bad1-d812fbed0cad",
      "transport_address": "127.0.0.1:9300",
      "node_decision": "no",
      "store": {
        "in_sync": true,
        "allocation_id": "xLLfLDCbTgaePDz_oivuQA",
        "store_exception": {
          "type": "corrupt_index_exception",
          "reason": "failed engine (reason: [merge failed]) (resource=preexisting_corruption)",
          "caused_by": {
            "type": "i_o_exception",
            "reason": "failed engine (reason: [merge failed])",
            "caused_by": {
              "type": "corrupt_index_exception",
              "reason": "codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index/_oon.cfs\") [slice=_oon_Lucene50_0.tim]))"
            }
          }
        }
      }
    }
  ]
}

At this point, if you have replicas, snapshots, or backups, the only correct option is to recover from them.

The following steps will get your index back up and running in green state, but you will most likely lose documents in the process:

Return to a Green State Example

Use the Lucene CheckIndex utility to validate and fix the corrupted index.

  1. Stop Elasticsearch.

  2. Note the affected shards in the message above. In this example scenario, it’s shard 5.

  3. Locate the data folder of this shard. In this example it is:

/usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index (also printed in the above message)

  4. Back up the affected shard folder, for example:

tar czvf /tmp/shard5.tar.gz /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5

  5. Enter the lib folder of Elasticsearch:

cd /usr/share/elasticsearch/lib

  6. Run the following command (adjust the folder to your situation):

java -cp lucene-core*.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index/ -verbose -exorcise

  7. Check the summary output of the tool, in this case:

WARNING: 1 broken segments (containing 179 documents) detected
Took 545.101 sec total.
WARNING: 179 documents will be lost

NOTE: will write new segments file in 5 seconds; this will remove 179 docs from the index. YOU WILL LOSE DATA. THIS IS YOUR LAST CHANCE TO CTRL+C!
5...
4...
3...
2...
1...
Writing...
OK
Wrote new segments file "segments_er0"

The example above is good news, given that the tool was able to fix the corrupted segment.

However, 179 documents were lost, and you are not told which ones.

  8. Enter the index folder, in this case cd /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index/

  9. Remove any files whose names start with corrupted, in this case: rm corrupted_Qyj-NdANTo2vr-aUDR6l_g

  10. Start Elasticsearch.

  11. Check the Elasticsearch status.
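Before running the repair for real, it can help to run CheckIndex once without the -exorcise option. In that mode the tool only reports problems and does not modify the index, so you can see how many documents are at stake first. The following sketch, with a hypothetical function name, extracts that number from the summary line shown in the example output above.

```shell
# Hypothetical helper: read CheckIndex output on stdin and print how many
# documents sit in broken segments (0 when no broken segments are reported).
broken_doc_count() {
    sed -n 's/^WARNING: [0-9]* broken segments (containing \([0-9]*\) documents) detected.*/\1/p' |
        awk '{ n = $1 } END { print n + 0 }'
}

# Example dry run from /usr/share/elasticsearch/lib (no -exorcise, so nothing
# is written; replace {shard_index_folder} with the folder of the shard):
#   java -cp lucene-core*.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex \
#       {shard_index_folder} -verbose | broken_doc_count
```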

Install/Remove Elasticsearch Plugins#

Use the following command to install an Elasticsearch plugin:

elasticsearch-plugin install <plugin name>

Use the following command to remove an Elasticsearch plugin:

elasticsearch-plugin remove <plugin name>