Elasticsearch Management#
Squirro relies on Elasticsearch for the storage nodes where the indexed data is persisted. This guide outlines Elasticsearch configuration changes that are often used in the context of Squirro.
Please also refer to the official Elasticsearch Documentation for help.
Note
Starting from Release 3.7.4, Squirro storage nodes use Elasticsearch 8.x. If you are upgrading Squirro from Release 3.7.3 or earlier, you will also need to upgrade Elasticsearch to 8.x. See Elasticsearch Migration to learn more.
For the default setup of Squirro, the following table shows the most important Elasticsearch folders and files on the Squirro storage node:
Location | Description |
---|---|
 | Elasticsearch sysconfig file: customization of the Elasticsearch settings, e.g. |
 | Elasticsearch config file: customization of the Elasticsearch configuration, e.g. cluster name, http port… |
 | Elasticsearch config directory: elasticsearch.yml, jvm.options, certs (symlink not allowed) |
 | Elasticsearch log folder (symlink is allowed) |
 | Elasticsearch data folder, where the indices are stored (symlink is allowed) |
 | Elasticsearch installation directory, contains /bin, /lib, /plugins |
Note
Examples in this guide use the es_curl tool and the Squirro storage node URL with secured Elasticsearch, which is the default URL starting from Release 3.7.4. For earlier deployments, use the curl command and the http protocol instead.
curl and es_curl#
The instructions provided on this page assume you have upgraded to ES 8.x using Ansible.
If you have not upgraded using Ansible, or if you are using Squirro 3.7.3 or earlier, you will not have access to the es_curl command.
Instead, use the curl command. If you are using secured Elasticsearch, you will need to provide the username and password in the command.
For example, instead of the following:
es_curl -XGET https://localhost:9200/_cat/indices?v
You can use:
curl -XGET --user elastic:<your password> 'https://localhost:9200/_cat/indices?v'
Warning
To use the es_curl tool, a user must be either root or a member of the elasticsearch or squirro group.
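The choice between the two tools can be wrapped in a small helper so that the same commands work on both kinds of deployment. The following is a minimal sketch, not part of Squirro: the function names and the ES_USER and ES_PASSWORD variables are hypothetical, and it assumes es_curl is on the PATH whenever it is installed.

```shell
# Print the name of the tool to use: es_curl when available, otherwise curl.
pick_es_tool() {
    if command -v es_curl >/dev/null 2>&1; then
        echo "es_curl"
    else
        echo "curl"
    fi
}

# Forward all arguments to the selected tool.
# ES_USER and ES_PASSWORD are hypothetical variable names used only in this sketch.
es_request() {
    if [ "$(pick_es_tool)" = "es_curl" ]; then
        es_curl "$@"
    else
        curl --user "${ES_USER:-elastic}:${ES_PASSWORD:?set ES_PASSWORD first}" "$@"
    fi
}
```

With this in place, a call such as es_request -XGET 'https://localhost:9200/_cat/indices?v' behaves the same way on both setups.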
Monitoring#
Checking the Size of Indices in ES#
es_curl -XGET https://localhost:9200/_cat/indices?v
You should see a list of the Squirro indices with their sizes. The project ID is at the end of each index name.
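When there are many indices, the listing can be summarized with standard tools. As a sketch, the helper below (a hypothetical name, not a Squirro tool) sums the docs.count column for all indices whose name starts with squirro_. It locates the columns by their header names, so it assumes the command was run with ?v, and it does not handle closed indices, whose rows may lack some columns.

```shell
# Sum docs.count over all indices named squirro_*.
# Reads the output of `_cat/indices?v` (including the header line) on stdin.
total_squirro_docs() {
    awk '
        NR == 1 {
            # Find the positions of the columns we need from the header line.
            for (i = 1; i <= NF; i++) {
                if ($i == "index") name_col = i
                if ($i == "docs.count") count_col = i
            }
            next
        }
        name_col && count_col && $name_col ~ /^squirro_/ { total += $count_col }
        END { print total + 0 }
    '
}

# Usage:
#   es_curl -s 'https://localhost:9200/_cat/indices?v' | total_squirro_docs
```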
Check Cluster/Nodes Are Healthy
es_curl https://localhost:9200/_cluster/health?pretty=true
You should see a JSON response containing "status" : "green".
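For use in scripts or monitoring checks, the status can be extracted from the response. This is a minimal sketch that assumes only sed and head are available; the function names are illustrative.

```shell
# Print the value of the "status" field from a _cluster/health response on stdin.
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Succeed only if the cluster health read from stdin is green.
is_green() {
    [ "$(health_status)" = "green" ]
}

# Usage:
#   es_curl -s 'https://localhost:9200/_cluster/health?pretty=true' | is_green || echo "cluster is not green"
```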
Check that the Elasticsearch Service is Started#
ps aux | grep [e]lasticsearch
This command also shows additional settings of the Elasticsearch service if it is started, for example memory, default.path, and process id:
496 122224 1 11 10:20 ? 00:02:14 /usr/bin/java -Xms6g -Xmx6g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Djava.security.policy=file:///etc/elasticsearch/squirro.java.policy -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid -d -Edefault.path.logs=/var/log/elasticsearch -Edefault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
To start, stop, and restart Elasticsearch service:
systemctl {start|stop|restart} elasticsearch
View Indices, Status, and Size#
Each Squirro project corresponds by default to an index named in the format squirro_{template-version}_{lowercase-project-id}, e.g. squirro_v9_36syha3_ss-zwvn9gyk1ww
es_curl https://localhost:9200/_cat/indices?v
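To map an index back to its project, the name can be split with shell parameter expansion. The sketch below uses hypothetical helper names and assumes the default naming format with a template version that contains no underscore (such as v9). Index names created from other templates may not follow this pattern.

```shell
# Print the template version part of a default Squirro index name.
index_template_version() {
    rest="${1#squirro_}"          # drop the fixed "squirro_" prefix
    printf '%s\n' "${rest%%_*}"   # keep everything before the first underscore
}

# Print the (lowercase) project ID part of a default Squirro index name.
index_project_id() {
    rest="${1#squirro_}"
    printf '%s\n' "${rest#*_}"    # drop the template version and its underscore
}

# Usage:
#   index_project_id squirro_v9_36syha3_ss-zwvn9gyk1ww
```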
View Shards, Status, and Size#
Each Elasticsearch index usually contains several shards. number_of_shards is defined in the index settings.
es_curl https://localhost:9200/_cat/shards?v
Tip: There are plenty of useful _cat commands to investigate Elasticsearch. Run es_curl 'https://localhost:9200/_cat' to see them.
View Squirro Templates#
Use the following commands to verify that the Squirro templates are used in the Squirro storage node:
# see detail of all templates
es_curl -s https://localhost:9200/_index_template?pretty
# see only squirro template names
es_curl -s https://localhost:9200/_index_template?pretty | grep squirro_v
View Number of Shards and Replicas#
Use the following commands to verify that the number of shards and replicas is set correctly in the Squirro storage node:
# in template
es_curl -s https://localhost:9200/_index_template?pretty | grep -e number_of -e squirro_v
# in setting of given index
es_curl -s https://localhost:9200/{index_name}/_settings?pretty | grep -e number_of -e squirro_v
Tip: You can use wildcard syntax * in index_name, e.g /squirro_v9_*/
View Mapping and Setting of a Given Index#
Use the following commands to verify that a newly created index uses the correct Squirro template:
es_curl https://localhost:9200/{index_name}/_mappings?pretty
es_curl https://localhost:9200/{index_name}/_settings?pretty
View Elasticsearch Stats#
es_curl https://localhost:9200/_stats?pretty
# stats of given index
es_curl https://localhost:9200/{index_name}/_stats?pretty
# stats of nodes
es_curl https://localhost:9200/_nodes/stats?pretty
# stats of cpu and memory
es_curl https://localhost:9200/_nodes/stats/os?pretty
# stats of file system
es_curl https://localhost:9200/_nodes/stats/fs?pretty
# stats of jvm
es_curl https://localhost:9200/_nodes/stats/jvm?pretty
# stats of fielddata (data of text field is loaded to memory for aggregation, sort or in a script)
es_curl https://localhost:9200/_nodes/stats/indices/fielddata?pretty
Set Replicas in Multiple Nodes Cluster#
When you have multiple storage nodes, it’s suggested that you create at least one replica for the indices in case a node is shut down. Having a replica will ensure the storage cluster still works.
es_curl -XPUT https://localhost:9200/squirro_v9_*/_settings -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'
Investigate Index Content#
View the number of documents and one sample document in the index:
# query whole index
es_curl 'https://localhost:9200/{index_name}/_search?pretty&size=1'
The response contains the number of documents under hits.total, for example:
{
  "hits" : {
    "total" : { "value" : 1234, "relation" : "eq" },
    ...
    "hits" : [...list of items...]
  }
}
To get an item by id:
es_curl 'https://localhost:9200/{index_name}/_doc/{item_id}?pretty'
To query the number of items modified before a given time:
es_curl 'https://localhost:9200/{index_name}/_search?pretty' -H "Content-Type: application/json" -d '
{
"query": {
"range" : {
"modified_at" : {
"lte" : "2018-02-13T18:01:44"
}
}
},
"size": 1
}'
Delete Documents by Query#
Before deleting by query, make sure that the query returns only the documents you want to delete by running it as a search and reviewing the results:
es_curl 'https://localhost:9200/{index_name}/_search?pretty' -H "Content-Type: application/json" -d '
{
"query": {
"term" : {
"assoc:sources" : "123456789abcdef"
}
}
}'
If you see unwanted documents in the index and you want to delete them, e.g. the documents belonging to the source with the ID 123456789abcdef, you can delete them by using the following query:
es_curl -XPOST 'https://localhost:9200/{index_name}/_delete_by_query' -H "Content-Type: application/json" -d '
{
"query": {
"term" : {
"assoc:sources" : "123456789abcdef"
}
}
}'
After deleting by query, the index size on disk is not reduced, because a document is not removed from its segment; it is only marked as deleted.
If you want to free disk space after deleting, execute the following command:
es_curl -XPOST https://localhost:9200/{index_name}/_forcemerge?only_expunge_deletes=true
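Because the preview and the delete must use exactly the same query, it can help to build the request body once and reuse it. The following is a sketch with a hypothetical helper name. It assumes source IDs consist only of letters, digits, underscores, and hyphens, and it rejects anything else rather than attempting JSON escaping.

```shell
# Print a term query body for the given source ID, or fail on unexpected characters.
source_term_query() {
    case "$1" in
        ''|*[!A-Za-z0-9_-]*)
            echo "invalid source id: $1" >&2
            return 1
            ;;
    esac
    printf '{"query": {"term": {"assoc:sources": "%s"}}}\n' "$1"
}

# Usage (review the search results before running the delete):
#   query="$(source_term_query 123456789abcdef)" || exit 1
#   es_curl 'https://localhost:9200/{index_name}/_search?pretty' -H "Content-Type: application/json" -d "$query"
#   es_curl -XPOST 'https://localhost:9200/{index_name}/_delete_by_query' -H "Content-Type: application/json" -d "$query"
```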
Configuring Elasticsearch Templates#
Squirro uses component and index templates to define the settings for the Elasticsearch indices.
For reference, see https://www.elastic.co/guide/en/elasticsearch/reference/current/index-templates.html.
To modify the default template settings, use the Configuration Service and set the Server Configuration option topic.elasticsearch.index-templates-settings:
{
"squirro_v9": {
"number_of_shards": 6,
"number_of_replicas": 0,
"additional_components": []
}
}
You can configure any template you want by specifying an additional JSON object and using the template name as the option key, for example:
{
"squirro_v9": {
"number_of_shards": 6,
"number_of_replicas": 0,
"additional_components": []
},
"squirro_v9_filter": {
"number_of_shards": 1,
"number_of_replicas": 0,
"additional_components": []
}
}
The additional_components option allows you to define the additional component templates that the index template will be composed of.
For example, if you want to add another field to the mapping called metadata, you can first create a new component template, using the following command:
es_curl -XPUT "https://localhost:9200/_component_template/metadata_property" -H "Content-Type: application/json" -d'
{
"template": {
"mappings": {
"properties": {
"metadata": {
"type": "object",
"enabled": false
}
}
}
}
}'
Then specify the newly created component template in the additional_components option as shown below:
{
"squirro_v9": {
"number_of_shards": 6,
"number_of_replicas": 0,
"additional_components": ["metadata_property"]
}
}
After changes, restart Squirro services using the following command:
$ squirro_restart
Reindexing#
To learn more about reindexing, see Reindexing Elasticsearch.
Troubleshooting#
Disk Free Space#
Check disk space and free it up. By default, Elasticsearch needs about 20% free space on a disk to reassign index shards to nodes. Investigate this issue by using the df -h ... and du -h ... commands to find out how much disk space is free and to identify files that can be safely deleted (e.g. files in /var/log/squirro or /var/log/elasticsearch).
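The free-space check can be scripted. The sketch below uses hypothetical function names and reads the output of df -P (the POSIX output format) for a single path. It reports the free percentage and compares it against a minimum, which defaults to the 20% mentioned above.

```shell
# Print the free-space percentage from `df -P <path>` output read on stdin.
free_pct_from_df() {
    # Line 2 holds the file system; column 5 is the used capacity, e.g. "85%".
    awk 'NR == 2 { sub(/%/, "", $5); print 100 - $5 }'
}

# Succeed if the free percentage is at least the given minimum (default 20).
disk_ok() {
    [ "$(free_pct_from_df)" -ge "${1:-20}" ]
}

# Usage:
#   df -P /var/lib/elasticsearch | disk_ok 20 || echo "less than 20% free"
```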
Number of Replicas#
If you have only one instance of Elasticsearch running but number_of_replicas in your index settings is greater than 0, then you will also have a yellow status for that index with some unassigned shards. View them with:
es_curl -s https://localhost:9200/_cat/shards | grep UNASSIGNED
To check the number of replicas of an index:
es_curl -XGET https://localhost:9200/{index_name}/_settings?pretty | grep number_of_replicas
If number_of_replicas > 0 then set value to 0:
es_curl -XPUT https://localhost:9200/{index_name}/_settings -H "Content-Type: application/json" -d '{"number_of_replicas":0}'
Also check the number of replicas in the template to ensure that future indices do not have the wrong number_of_replicas setting:
es_curl -XGET https://localhost:9200/_index_template/squirro_v9?pretty | grep number_of_replicas
If number_of_replicas > 0 then modify the template (see above).
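To tell this harmless single-node case apart from a real problem, it helps to see whether the unassigned shards are replicas or primaries. This is a sketch with a hypothetical function name. It assumes the default _cat/shards column order, in which the third column is the shard type (p or r) and the fourth is the state.

```shell
# Count unassigned primary and replica shards from `_cat/shards` output on stdin.
unassigned_by_type() {
    awk '
        $4 == "UNASSIGNED" { if ($3 == "p") p++; else r++ }
        END { printf "primaries=%d replicas=%d\n", p, r }
    '
}

# Usage:
#   es_curl -s https://localhost:9200/_cat/shards | unassigned_by_type
```

If only replicas are unassigned on a single-node setup, setting number_of_replicas to 0 as described above resolves the yellow status. Unassigned primaries point to a different problem.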
Retry Failed Allocation#
To find the explanation for the unassigned shard:
es_curl -XGET https://localhost:9200/_cluster/allocation/explain?pretty
To retry failed allocations:
es_curl -XPOST https://localhost:9200/_cluster/reroute?retry_failed
Indices are Moved#
Usually, Elasticsearch indices are stored in /var/lib/elasticsearch/. However, sometimes (the server is down and cannot be restored to its previous state, a symlink is lost, an old Elasticsearch version puts indices under the cluster name…) you cannot find the indices in /var/lib/elasticsearch/ because they reside on another mount point.
To fix this issue:
Check the content, permissions, and owner of the /var/lib/elasticsearch folder:
sudo ls -l /var/lib/elasticsearch/
You should see the folder nodes owned by the user elasticsearch and the group elasticsearch.
If /var/lib/elasticsearch is missing or its content is not the index you expect, perform the following:
# stop elasticsearch
systemctl stop elasticsearch
Now, choose one of two solutions:
Create a symlink to new index location
ln -s {your_new_index_location} /var/lib/elasticsearch
Point Elasticsearch data dir to the new location in the config file:
vi /etc/elasticsearch/elasticsearch.yml
# edit the line path.data: {your_new_index_location}
Set owner, restart service:
# set owner and group for data folder
chown -R elasticsearch:elasticsearch {your_new_index_location}
# start elasticsearch again
systemctl start elasticsearch
# check indices status
es_curl https://localhost:9200/_cat/indices?v
Too Many Scroll Contexts#
If you encounter this type of exception:
Trying to create too many scroll contexts. Must be less than or equal to: [500]
you can increase the limit of Elasticsearch by running this command on one of the nodes:
es_curl -X PUT https://localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d'{
"persistent" : {
"search.max_open_scroll_context": 1000
},
"transient": {
"search.max_open_scroll_context": 1000
}
}'
Alternatively, and better over the long term, reduce the scroll argument of any squirro_client.scan() usage from the default 5m to something like 1m.
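Before raising the limit, it can be useful to see how many scroll contexts are currently open. The sketch below uses a hypothetical function name. It assumes that the search section of the node stats reports a scroll_current counter per node, and it sums that counter over all nodes.

```shell
# Sum the "scroll_current" counters found in node stats JSON read on stdin.
open_scroll_contexts() {
    grep -o '"scroll_current"[[:space:]]*:[[:space:]]*[0-9]*' \
        | awk -F: '{ total += $2 } END { print total + 0 }'
}

# Usage:
#   es_curl -s 'https://localhost:9200/_nodes/stats/indices/search' | open_scroll_contexts
```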
Elasticsearch Fails to Start with Error Message#
Elasticsearch fails to start and logs the error message “Unable to load JNA native support library, native methods will be disabled”.
This happens when Elasticsearch tries to use the /tmp/ folder, but that folder is mounted with the noexec flag.
Alternatively, if another temporary folder is used, the Elasticsearch service user has no execution rights in that folder.
The main reason why the noexec flag would be set on tmp is OS hardening. The tmp folder can be leveraged by bad actors to store and execute things.
In a highly hardened system this is not desirable and hence the noexec flag is often set.
The workaround for this is to edit /etc/sysconfig/elasticsearch and to add this line:
ES_TMPDIR=/usr/share/elasticsearch/tmp
This is a sensible default, but any location will do as long as the folder is owned by the elasticsearch user and group.
Memory Setting#
You can set Elasticsearch memory in the file /etc/elasticsearch/jvm.options.d/squirro.options.
Minimum heap size (Xms) and maximum heap size (Xmx) must be equal to each other, for example:
-Xms8g
-Xmx8g
Elasticsearch may crash if there is not enough memory on your server. Set Xmx to no more than 50% of your physical RAM so that enough physical RAM is left for kernel file system caches. For more information, see the Elasticsearch Heap Size Settings page.
If you run the cluster node and storage node on the same machine, set the memory for Elasticsearch to no more than 30% of your physical RAM.
Elasticsearch also recommends keeping the heap below about 32GB, because larger heaps prevent the JVM from using compressed object pointers.
Warning: In older installations of Squirro, the amount of memory available to Elasticsearch was set in the file /etc/elasticsearch/jvm.options.
When both files are present in the system, /etc/elasticsearch/jvm.options.d/squirro.options overrides the options in /etc/elasticsearch/jvm.options.
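The sizing rules in this section can be expressed as a small calculation. The sketch below uses hypothetical function names. It takes the physical RAM in whole gigabytes, applies 50% (or 30% when the second argument is shared, for a combined cluster and storage node), and caps the result at 31 GB. That cap is an assumption chosen to stay under the 32GB limit mentioned above.

```shell
# Print a heap size in GB for the given amount of physical RAM (whole GB).
# Pass "shared" as the second argument when cluster and storage node share a machine.
heap_gb() {
    total_gb="$1"
    if [ "${2:-}" = "shared" ]; then
        heap=$(( total_gb * 30 / 100 ))
    else
        heap=$(( total_gb * 50 / 100 ))
    fi
    # Cap at 31 GB (assumption: stays safely below the 32GB limit) and never go below 1 GB.
    if [ "$heap" -gt 31 ]; then heap=31; fi
    if [ "$heap" -lt 1 ]; then heap=1; fi
    printf '%s\n' "$heap"
}

# Print the two lines to place in /etc/elasticsearch/jvm.options.d/squirro.options.
jvm_heap_options() {
    heap="$(heap_gb "$@")"
    printf '%s\n' "-Xms${heap}g" "-Xmx${heap}g"
}

# Usage:
#   jvm_heap_options 16
```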
Testing an Elasticsearch Query#
Sometimes the Elasticsearch log file contains an exception together with the request body, which can be quite long. To reproduce the request for investigation:
Save the query in JSON format, e.g. to /tmp/es_request.json
Make the request to ES using that file as the input:
es_curl 'https://localhost:9200/{index_name}/_search?pretty' -H "Content-Type: application/json" -d @/tmp/es_request.json
Cluster Block Exception in ES Logs#
If the disk usage on the Elasticsearch cluster goes beyond a certain limit, Elasticsearch marks all the indices as read-only and only allows the deletion of indices/documents to facilitate space recovery.
To make the indices writable again, first make sure that more than 20% of the disk space is free (disk usage below 80%), either by removing old logs and unnecessary files, adding more disk space, or by any other means. Once done, execute the following to remove the read-only block from the indices:
es_curl -XPUT https://localhost:9200/*/_settings -H "Content-Type: application/json" -d '{"index.blocks.read_only_allow_delete": null}'
Recover From a Corrupted Index#
The following are symptoms of a corrupted index:
Elasticsearch cluster state is red
One or multiple shards are not allocated
Output of the following:
es_curl -XGET https://localhost:9200/_cluster/allocation/explain?pretty
Looks like this example:
{
"index": "squirro_v9_spzhmtdsrrc78oodbnolza",
"shard": 5,
"primary": true,
"current_state": "unassigned",
"unassigned_info": {
"reason": "CLUSTER_RECOVERED",
"at": "2020-02-14T19:20:02.789Z",
"last_allocation_status": "no_valid_shard_copy"
},
"can_allocate": "no_valid_shard_copy",
"allocate_explanation": "cannot allocate because all found copies of the shard are either stale or corrupt",
"node_allocation_decisions": [
{
"node_id": "7IDt77EJR-uh5PknzF26_Q",
"node_name": "squirro-node-2a679356-9373-58ec-bad1-d812fbed0cad",
"transport_address": "127.0.0.1:9300",
"node_decision": "no",
"store": {
"in_sync": true,
"allocation_id": "xLLfLDCbTgaePDz_oivuQA",
"store_exception": {
"type": "corrupt_index_exception",
"reason": "failed engine (reason: [merge failed]) (resource=preexisting_corruption)",
"caused_by": {
"type": "i_o_exception",
"reason": "failed engine (reason: [merge failed])",
"caused_by": {
"type": "corrupt_index_exception",
"reason": "codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index/_oon.cfs\") [slice=_oon_Lucene50_0.tim]))"
}
}
}
}
}
]
}
At this point, if you have replicas, snapshots, or backups, the only correct option is to recover from them.
The following steps will get your index back up and running in green state, but you will most likely lose documents in the process:
Return to a Green State Example
Use the Lucene CheckIndex utility to validate and fix the corrupted index.
Stop elasticsearch
Note the affected shards in the message above. In this example scenario, it’s shard 5.
Locate the data folder of this shard, in this example it is:
/usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index
(also printed in the above message)
Back up the affected shard folder, for example:
tar czvf /tmp/shard5.tar.gz /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5
Enter the lib folder of Elasticsearch:
cd /usr/share/elasticsearch/lib
Run the following command (adjust the folder to your situation):
java -cp lucene-core*.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index/ -verbose -exorcise
Check the summary output of the tool, in this case:
WARNING: 1 broken segments (containing 179 documents) detected
Took 545.101 sec total.
WARNING: 179 documents will be lost
NOTE: will write new segments file in 5 seconds; this will remove 179 docs from the index. YOU WILL LOSE DATA. THIS IS YOUR LAST CHANCE TO CTRL+C!
5...
4...
3...
2...
1...
Writing...
OK
Wrote new segments file "segments_er0"
The example above is good news, given that the tool was able to fix the corrupted segment.
However, 179 documents were lost, and you are not told which ones.
Enter the index folder, in this case:
cd /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index/
Remove any files that start with corrupted, in this case:
rm corrupted_Qyj-NdANTo2vr-aUDR6l_g
Start Elasticsearch
Check Elasticsearch status
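When the CheckIndex output is saved to a file, the number of lost documents can be extracted for an incident record. This is a sketch with a hypothetical function name and log file path. It relies on the exact wording of the summary line shown in the example output above and prints 0 when no such line is present.

```shell
# Print the number of documents CheckIndex reports as lost (0 if none reported).
# Reads the CheckIndex output on stdin.
lost_docs() {
    n="$(sed -n 's/^WARNING: \([0-9][0-9]*\) documents will be lost.*/\1/p' | head -n 1)"
    printf '%s\n' "${n:-0}"
}

# Usage (checkindex.log is a hypothetical file holding the saved tool output):
#   lost_docs < /tmp/checkindex.log
```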
Install/Remove Elasticsearch Plugins#
Use the following command to install an Elasticsearch plugin:
elasticsearch-plugin install <plugin name>
Use the following command to remove an Elasticsearch plugin:
elasticsearch-plugin remove <plugin name>