Elasticsearch Management
Squirro relies on Elasticsearch for the storage nodes where the indexed data is persisted. This guide outlines Elasticsearch configuration changes that are often used in the context of Squirro.
Please also refer to the official Elasticsearch Documentation for help.
For the default setup of Squirro, the following table shows the most important Elasticsearch folders and files on the Squirro storage node:
Location | Description |
---|---|
/etc/sysconfig/elasticsearch | Elasticsearch sysconfig file: customization of the Elasticsearch service settings |
/etc/elasticsearch/elasticsearch.yml | Elasticsearch config file: customization of the Elasticsearch configuration, e.g. cluster name, http port… |
/etc/elasticsearch | Elasticsearch config directory: elasticsearch.yml, jvm.options, templates, scripts, synonyms (symlink not allowed) |
/var/log/elasticsearch | Elasticsearch log folder (symlink is allowed) |
/var/lib/elasticsearch | Elasticsearch data folder, where the indices are stored (symlink is allowed) |
/usr/share/elasticsearch | Elasticsearch installation folder, contains /bin, /lib, /plugins |
Monitoring
Checking the Size of Indices in ES
curl -XGET localhost:9200/_cat/indices?v
You should see a list of the Squirro indices with their sizes. Each index name ends with the project ID (in lowercase).
Check Cluster/Nodes Are Healthy
curl http://localhost:9200/_cluster/health?pretty=true
You should see a JSON response containing "status" : "green".
Check Elasticsearch Access From Cluster Node
The cluster node accesses the storage node through the Nginx service. To check that access, run the following on the cluster node:
curl -L http://127.0.0.1:81/ext/elastic/_cluster/health?pretty=true
You should see JSON containing "status" : "green"
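The two health checks above can be wrapped in a small script. This is a minimal sketch, not part of Squirro: the function names health_status and check_es_health are hypothetical, and the status is extracted with sed rather than a JSON parser, which assumes the response contains a single "status" key, as _cluster/health responses do.

```shell
#!/bin/sh
# Sketch: read the "status" field of a _cluster/health response and fail
# unless it is "green". Works for the direct port (9200) and for the Nginx
# path used by cluster nodes (81/ext/elastic).

# Print the value of the first "status" key found in JSON read from stdin.
health_status() {
    sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p' | head -n 1
}

# Query <base_url>/_cluster/health; return 0 only if the status is green.
check_es_health() {
    base_url=${1:-http://localhost:9200}
    # -L follows redirects, as in the Nginx check above
    status=$(curl -sL "$base_url/_cluster/health" | health_status)
    echo "cluster status: ${status:-unreachable}"
    [ "$status" = "green" ]
}
```

Run check_es_health http://localhost:9200 on the storage node, or check_es_health http://127.0.0.1:81/ext/elastic on the cluster node. The function returns a non-zero exit code unless the status is green, so it can also be used in monitoring scripts.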
Check that the Elasticsearch Service is Started
ps aux | grep '[e]lasticsearch'
If the service is running, this command also shows its settings, for example the heap size (-Xms/-Xmx), the default.path settings, and the process ID:
496 122224 1 11 10:20 ? 00:02:14 /usr/bin/java -Xms6g -Xmx6g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Djava.security.policy=file:///etc/elasticsearch/squirro.java.policy -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid -d -Edefault.path.logs=/var/log/elasticsearch -Edefault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
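To extract just the heap settings from that long command line, a sketch like the following can help. It assumes a Linux host (it reads /proc/<pid>/cmdline) and that the process can be found by its main class org.elasticsearch.bootstrap.Elasticsearch, as in the output above; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: print the -Xms/-Xmx flags of a Java command line read from stdin.
heap_flags() {
    grep -o -e '-Xm[sx][0-9]*[kKmMgG]'
}

# Print the heap flags of the running Elasticsearch process.
es_heap_flags() {
    pid=$(pgrep -f org.elasticsearch.bootstrap.Elasticsearch | head -n 1)
    if [ -z "$pid" ]; then
        echo "Elasticsearch is not running" >&2
        return 1
    fi
    # /proc/<pid>/cmdline separates arguments with NUL bytes (Linux only)
    tr '\0' ' ' < "/proc/$pid/cmdline" | heap_flags
}
```

For the process shown above, es_heap_flags would print -Xms6g and -Xmx6g on separate lines.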
To start, stop, or restart the Elasticsearch service:
systemctl {start|stop|restart} elasticsearch
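After a restart, Elasticsearch needs some time before it answers HTTP requests. A minimal sketch of a generic retry helper (the name retry is hypothetical) that can poll the health endpoint until it responds:

```shell
#!/bin/sh
# Sketch: run a command up to <tries> times, sleeping <delay> seconds between
# attempts. Returns 0 as soon as the command succeeds, 1 if all attempts fail.
# Usage: retry <tries> <delay> <command> [args...]
retry() {
    tries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        "$@" && return 0
        # no sleep after the last failed attempt
        [ "$attempt" -lt "$tries" ] && sleep "$delay"
        attempt=$((attempt + 1))
    done
    return 1
}
```

For example, systemctl restart elasticsearch && retry 30 2 curl -sf -o /dev/null http://localhost:9200/_cluster/health waits up to about one minute (30 attempts, 2 seconds apart) for Elasticsearch to answer.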
View Indices, Status, and Size
By default, each Squirro project corresponds to one index named squirro_{template-version}_{lowercase-project-id}, e.g. squirro_v9_36syha3_ss-zwvn9gyk1ww.
curl http://localhost:9200/_cat/indices?v
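When you know the project ID, the index name can be derived from the naming scheme above. A minimal sketch with hypothetical function names, assuming the default scheme; note that the mapping back from index name to project ID can only return the lowercase form, because Elasticsearch index names are lowercase.

```shell
#!/bin/sh
# Sketch based on the default naming scheme:
# squirro_{template-version}_{lowercase-project-id}

# Usage: squirro_index_name <template-version> <project-id>
squirro_index_name() {
    lower=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    printf 'squirro_%s_%s\n' "$1" "$lower"
}

# Strip the shortest "squirro_v*_" prefix to get the lowercase project ID.
squirro_project_from_index() {
    name=$1
    printf '%s\n' "${name#squirro_v*_}"
}
```

For example, curl "http://localhost:9200/_cat/indices/$(squirro_index_name v9 <project-id>)?v" shows the index of a single project.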
View Shards, Status, and Size
Each Elasticsearch index usually consists of several shards. Their number is defined by the number_of_shards index setting.
curl http://localhost:9200/_cat/shards?v
Tip: There are many other useful _cat commands for investigating Elasticsearch. Run curl 'http://localhost:9200/_cat' to list them.
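For a quick overview of shard health, the state column of _cat/shards can be counted per state. A sketch with hypothetical function names; ES_URL is an assumed environment variable that defaults to http://localhost:9200.

```shell
#!/bin/sh
# Sketch: count shards per state (STARTED, RELOCATING, INITIALIZING,
# UNASSIGNED) from "_cat/shards?h=state" output read on stdin.
count_shard_states() {
    awk 'NF { n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage: es_shard_summary [index pattern]
es_shard_summary() {
    curl -s "${ES_URL:-http://localhost:9200}/_cat/shards${1:+/$1}?h=state" \
        | count_shard_states
}
```

es_shard_summary prints one line of the form <state> <count> per state; any UNASSIGNED count points to the troubleshooting sections below.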
View Squirro Templates
Use these commands to verify that the Squirro templates are installed on the Squirro storage node:
# see detail of all templates
curl -s http://localhost:9200/_template?pretty
# see only squirro template names
curl -s http://localhost:9200/_template?pretty | grep squirro_v
View Number of Shards and Replicas
Use these commands to verify that the number of shards and replicas is set correctly on the Squirro storage node:
# in template
curl -s http://localhost:9200/_template?pretty | grep -e number_of -e squirro_v
# in setting of given index
curl -s http://localhost:9200/{index_name}/_settings?pretty | grep -e number_of -e squirro_v
Tip: You can use the wildcard * in {index_name}, e.g. /squirro_v9_*/. Quote the URL so that the shell does not try to expand the wildcard.
View Mapping and Settings of a Given Index
Use these commands to verify that a newly created index uses the correct Squirro template:
curl http://localhost:9200/{index_name}/_mappings?pretty
curl http://localhost:9200/{index_name}/_settings?pretty
View Elasticsearch Stats
curl http://localhost:9200/_stats?pretty
# stats of given index
curl http://localhost:9200/{index_name}/_stats?pretty
# stats of nodes
curl http://localhost:9200/_nodes/stats?pretty
# stats of cpu and memory
curl http://localhost:9200/_nodes/stats/os?pretty
# stats of file system
curl http://localhost:9200/_nodes/stats/fs?pretty
# stats of jvm
curl http://localhost:9200/_nodes/stats/jvm?pretty
# stats of fielddata (data of text field is loaded to memory for aggregation, sort or in a script)
curl http://localhost:9200/_nodes/stats/indices/fielddata?pretty
Set Replicas in a Multi-Node Cluster
When you have multiple storage nodes, create at least one replica of each index. If a node is shut down, the replicas on the other nodes keep the storage cluster working.
curl -XPUT http://localhost:9200/squirro_v9/_settings -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'
curl -XPUT http://localhost:9200/squirro_v9_*/_settings -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'
curl -XPUT http://localhost:9200/.configsync/_settings -H "Content-Type: application/json" -d '{"index": {"number_of_replicas": 1}}'
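The three commands above differ only in the index pattern, so they can be combined in a loop. A minimal sketch; set_replicas and replicas_body are hypothetical names, and ES_URL is an assumed variable defaulting to http://localhost:9200.

```shell
#!/bin/sh
# Sketch: apply the same number_of_replicas to several index patterns.

# Print the _settings request body; <replicas> must be a non-negative integer.
replicas_body() {
    case $1 in
        ''|*[!0-9]*) echo "invalid replica count: $1" >&2; return 1 ;;
    esac
    printf '{"index": {"number_of_replicas": %s}}\n' "$1"
}

# Usage: set_replicas <replicas> <index pattern>...
set_replicas() {
    body=$(replicas_body "$1") || return 1
    shift
    for pattern in "$@"; do
        echo "== $pattern"
        curl -s -XPUT "${ES_URL:-http://localhost:9200}/$pattern/_settings" \
            -H "Content-Type: application/json" -d "$body"
        echo
    done
}
```

Run set_replicas 1 squirro_v9 'squirro_v9_*' .configsync. Quote the wildcard pattern so that the shell does not expand it.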
Investigate Index Content
To view the number of documents in the index and one sample document:
# query whole index
curl 'http://localhost:9200/{index_name}/_search?pretty&size=1'
The response contains the number of documents, for example:
{
"hits" : {
"total": 12345,
...
"hits": [...list of items...]
}
}
To get an item by id:
curl 'http://localhost:9200/{index_name}/_doc/{item_id}?pretty'
To query the number of items modified at or before a given time:
curl 'http://localhost:9200/{index_name}/_search?pretty' -H "Content-Type: application/json" -d '
{
"query": {
"range" : {
"modified_at" : {
"lte" : "2018-02-13T18:01:44"
}
}
},
"size": 1
}'
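If you only need the number of matching documents, the _count API returns it directly, without any hits. A sketch with hypothetical function names, reusing the modified_at range query from above; ES_URL is an assumed variable defaulting to http://localhost:9200.

```shell
#!/bin/sh
# Sketch: count documents whose modified_at is on or before a timestamp.

# Print a range query body for modified_at <= <timestamp>.
modified_before_query() {
    printf '{"query": {"range": {"modified_at": {"lte": "%s"}}}}\n' "$1"
}

# Usage: count_modified_before <index> <timestamp>
count_modified_before() {
    curl -s "${ES_URL:-http://localhost:9200}/$1/_count?pretty" \
        -H "Content-Type: application/json" \
        -d "$(modified_before_query "$2")"
}
```

Usage: count_modified_before {index_name} 2018-02-13T18:01:44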
Delete Documents by Query
Before deleting by query, make sure that the query returns only the documents you want to delete. Run it as a search first and review the results:
curl 'http://localhost:9200/{index_name}/_search?pretty' -H "Content-Type: application/json" -d '
{
"query": {
"term" : {
"assoc:sources" : "123456789abcdef"
}
}
}'
If the results contain only unwanted documents, e.g. documents belonging to the source with the ID 123456789abcdef, delete them with the following request:
curl -XPOST 'http://localhost:9200/{index_name}/_delete_by_query' -H "Content-Type: application/json" -d '
{
"query": {
"term" : {
"assoc:sources" : "123456789abcdef"
}
}
}'
Deleting by query does not immediately reduce the index size on disk: the documents are not removed from their segments, they are only marked as deleted.
To free the disk space after deleting, force a merge that expunges the deleted documents:
curl -XPOST http://localhost:9200/{index_name}/_forcemerge?only_expunge_deletes=true
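The review-then-delete procedure above can be scripted so that exactly the same query body is used for counting and for deleting. This is a sketch only; the function names are hypothetical, ES_URL is an assumed variable, and the confirmation prompt is there because _delete_by_query cannot be undone.

```shell
#!/bin/sh
# Sketch: count the documents matched by a term query, ask for confirmation,
# then run _delete_by_query with the same body.
ES_URL=${ES_URL:-http://localhost:9200}

# Print a term query body for <field> = <value>.
term_query() {
    printf '{"query": {"term": {"%s": "%s"}}}\n' "$1" "$2"
}

# Usage: delete_by_term <index> <field> <value>
delete_by_term() {
    body=$(term_query "$2" "$3")
    curl -s "$ES_URL/$1/_count?pretty" -H "Content-Type: application/json" -d "$body"
    printf 'Delete these documents from %s? [y/N] ' "$1"
    read -r answer
    if [ "$answer" != "y" ]; then
        echo "aborted"
        return 1
    fi
    curl -s -XPOST "$ES_URL/$1/_delete_by_query?pretty" \
        -H "Content-Type: application/json" -d "$body"
}
```

Usage: delete_by_term {index_name} assoc:sources 123456789abcdef, followed by the _forcemerge command above if you need to reclaim disk space.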
Troubleshooting
Disk Free Space
Check the disk space and free it up if needed. By default, Elasticsearch restricts shard allocation to a node once its disk usage exceeds 85% (the low disk watermark), so keep about 20% of the disk free. Use df -h ... and du -h ... to check the free disk space and to identify files that can be safely deleted (e.g. files in /var/log/squirro or /var/log/elasticsearch).
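A small check of the free space on the data folder, as a sketch: it relies on POSIX df -P output, and the function names and the default threshold of 20% free space are assumptions based on the guideline above.

```shell
#!/bin/sh
# Sketch: report the free space of the file system holding <path>.

# Print the "Capacity" (used %) value reported by df for <path>, without "%".
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

# Usage: check_free_space <path> [min_free_percent, default 20]
check_free_space() {
    used=$(disk_used_pct "$1")
    if [ -z "$used" ]; then
        echo "cannot read disk usage of $1" >&2
        return 1
    fi
    free=$((100 - used))
    if [ "$free" -lt "${2:-20}" ]; then
        echo "WARNING: only $free% free on $1"
        return 1
    fi
    echo "OK: $free% free on $1"
}
```

Run check_free_space /var/lib/elasticsearch. To see the disk usage per node as Elasticsearch sees it, run curl 'http://localhost:9200/_cat/allocation?v'.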
Number of Replicas
If only one Elasticsearch instance is running but number_of_replicas in your index settings is greater than 0, the index has a yellow status with some unassigned shards, because a replica is never allocated on the same node as its primary. View them with:
curl -s localhost:9200/_cat/shards | grep UNASSIGNED
To check the number of replicas of an index:
curl -XGET http://localhost:9200/{index_name}/_settings?pretty | grep number_of_replicas
If number_of_replicas is greater than 0, set it to 0:
curl -XPUT http://localhost:9200/{index_name}/_settings -H "Content-Type: application/json" -d '{"number_of_replicas":0}'
Also check the number of replicas in the template, so that future indices do not get the wrong number_of_replicas setting:
curl -XGET http://localhost:9200/_template/squirro_v9?pretty | grep number_of_replicas
If number_of_replicas is greater than 0, edit squirro_v9.json and reinstall the templates:
vi /etc/elasticsearch/templates/squirro_v9.json
# edit the line "number_of_replicas": ...,
bash /etc/elasticsearch/templates/ensure_templates.sh
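On a single-node setup, the indices that still have replicas configured can be listed with the index and rep columns of _cat/indices. A sketch with hypothetical function names; ES_URL is an assumed variable defaulting to http://localhost:9200.

```shell
#!/bin/sh
# Sketch: list indices whose replica count is not 0.

# Read "<index> <replicas>" lines on stdin; print indices with replicas > 0.
replicated_indices() {
    awk '$2 > 0 { print $1 }'
}

es_replicated_indices() {
    curl -s "${ES_URL:-http://localhost:9200}/_cat/indices?h=index,rep" \
        | replicated_indices
}
```

Each index printed by es_replicated_indices can then be fixed with the number_of_replicas command above.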
Retry Failed Allocation
To see why a shard is unassigned:
curl -XGET localhost:9200/_cluster/allocation/explain?pretty
To retry failed allocations:
curl -XPOST localhost:9200/_cluster/reroute?retry_failed
Indices Are Moved
Usually, Elasticsearch indices are stored in /var/lib/elasticsearch/. In some situations, however (the server went down and cannot be restored to its old state, a symlink was lost, an old Elasticsearch version put the indices under the cluster name…), /var/lib/elasticsearch/ cannot be found because the indices reside on another mount point.
To fix this issue:
Check the content, permissions, and owner of the /var/lib/elasticsearch folder:
sudo ls -l /var/lib/elasticsearch/
You should see a folder named nodes, owned by user elasticsearch and group elasticsearch.
If /var/lib/elasticsearch is missing or does not contain the expected indices, do the following:
# stop elasticsearch
systemctl stop elasticsearch
Now, choose one of two solutions:
Either create a symlink to the new index location:
ln -s {your_new_index_location} /var/lib/elasticsearch
Or point the Elasticsearch data directory to the new location in the config file:
vi /etc/elasticsearch/elasticsearch.yml
# edit the line path.data: {your_new_index_location}
Then set the owner and restart the service:
# set owner and group for data folder
chown -R elasticsearch:elasticsearch {your_new_index_location}
# start elasticsearch again
systemctl start elasticsearch
# check indices status
curl http://localhost:9200/_cat/indices?v
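The checks above can be partly automated. This sketch assumes GNU stat (Linux) and a simple single-line path.data entry in elasticsearch.yml (list or quoted values are not handled); the function names are hypothetical, and only the owner of the top folder is checked, not every file in it.

```shell
#!/bin/sh
# Sketch: locate and sanity-check the Elasticsearch data folder.

# Print the path.data value (empty if the default location is used).
# Usage: configured_data_path [elasticsearch.yml]
configured_data_path() {
    sed -n 's/^path\.data: *//p' "${1:-/etc/elasticsearch/elasticsearch.yml}"
}

# Check that <dir> contains a nodes folder and is owned by <user:group>.
# Usage: data_dir_ok <dir> [user:group]
data_dir_ok() {
    dir=$1
    owner=${2:-elasticsearch:elasticsearch}
    if [ ! -d "$dir/nodes" ]; then
        echo "$dir does not contain a nodes folder" >&2
        return 1
    fi
    # -L follows a symlinked data folder to its target
    actual=$(stat -L -c '%U:%G' "$dir")
    if [ "$actual" != "$owner" ]; then
        echo "$dir is owned by $actual, expected $owner" >&2
        return 1
    fi
    echo "$dir looks OK"
}
```

Run data_dir_ok /var/lib/elasticsearch, or data_dir_ok "$(configured_data_path)" when path.data is set.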
Too Many Scroll Contexts
If you encounter this exception:
Trying to create too many scroll contexts. Must be less than or equal to: [500]
you can raise the limit by running this command against any node of the cluster:
curl -X PUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d'{
"persistent" : {
"search.max_open_scroll_context": 1000
},
"transient": {
"search.max_open_scroll_context": 1000
}
}'
A better long-term solution is to reduce the scroll argument of every squirro_client.scan() call from the default 5m to something like 1m, so that idle scroll contexts expire sooner.
Elasticsearch Fails to Start with Error Message
Elasticsearch fails to start with the error message “Unable to load JNA native support library, native methods will be disabled” in the log.
This happens when Elasticsearch tries to use the /tmp/ folder but that folder is mounted with the noexec flag. It also happens when another temporary folder is used and the Elasticsearch service user has no execution rights in that folder.
The noexec flag is usually set on /tmp as part of OS hardening: attackers can use the /tmp folder to store and execute files, so highly hardened systems often forbid execution there.
To work around this, edit /etc/sysconfig/elasticsearch and add this line:
ES_TMPDIR=/usr/share/elasticsearch/tmp
This is a sensible default, but any location works, as long as the folder exists and is owned by the elasticsearch user and group.
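To confirm the cause and prepare the folder, a sketch like this can be used. It assumes findmnt from util-linux is installed and that the install step runs as root; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: detect a noexec mount and create the ES_TMPDIR folder.

# Return 0 if a comma-separated mount option list contains "noexec".
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
    esac
    return 1
}

# Usage: tmp_is_noexec [path]  (checks the file system containing path)
tmp_is_noexec() {
    has_noexec "$(findmnt -n -o OPTIONS --target "${1:-/tmp}")"
}

# Create the folder owned by elasticsearch (run as root).
prepare_es_tmpdir() {
    install -d -o elasticsearch -g elasticsearch -m 750 \
        "${1:-/usr/share/elasticsearch/tmp}"
}
```

If tmp_is_noexec returns 0, run prepare_es_tmpdir, add the ES_TMPDIR line above to /etc/sysconfig/elasticsearch, and restart Elasticsearch.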
Memory Setting#
You can set the Elasticsearch heap size in the file /etc/elasticsearch/jvm.options.d/squirro.options.
The minimum heap size (Xms) and maximum heap size (Xmx) must be equal, for example:
-Xms8g
-Xmx8g
Elasticsearch may crash if there is not enough memory on your server. Set Xmx to no more than 50% of your physical RAM, so that enough physical RAM is left for the kernel file system cache. For more information, see the Elasticsearch Heap Size Settings page.
If you run the cluster node and storage node on the same machine, then set the memory for Elasticsearch to no more than 30% of your physical RAM.
Also keep the heap below 32GB: above roughly 32GB, the JVM can no longer use compressed object pointers, so memory is used less efficiently.
Warning: In older installations of Squirro, the amount of memory available to Elasticsearch was set in the file /etc/elasticsearch/jvm.options. When both files are present, /etc/elasticsearch/jvm.options.d/squirro.options overrides the options in /etc/elasticsearch/jvm.options.
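The sizing rules above can be expressed as a small calculation. A sketch only: the function names are hypothetical, the 31GB cap is an assumption chosen to stay below 32GB, and RAM is read from /proc/meminfo, so total_ram_gb only works on Linux.

```shell
#!/bin/sh
# Sketch: heap = 50% of RAM (30% when the cluster node runs on the same
# machine), capped at 31 GB and at least 1 GB.

# Usage: recommended_heap_gb <total_ram_gb> [colocated: yes|no]
recommended_heap_gb() {
    if [ "${2:-no}" = "yes" ]; then pct=30; else pct=50; fi
    heap=$(( $1 * $pct / 100 ))
    if [ "$heap" -gt 31 ]; then heap=31; fi
    if [ "$heap" -lt 1 ]; then heap=1; fi
    echo "$heap"
}

# Total physical RAM in whole GB (Linux only).
total_ram_gb() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}

# Print the two lines for jvm.options.d/squirro.options.
heap_options() {
    printf '%s\n' "-Xms${1}g" "-Xmx${1}g"
}
```

heap_options "$(recommended_heap_gb "$(total_ram_gb)")" prints the two lines for a dedicated storage node; pass yes as the second argument of recommended_heap_gb when the cluster node runs on the same machine. Review the output before putting it into squirro.options, then restart Elasticsearch.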
Testing an Elasticsearch Query#
Sometimes an exception in the Elasticsearch log file contains a long request body. To reproduce that request for investigation:
Save the query in JSON format, e.g. to /tmp/es_request.json.
Send it to Elasticsearch, using the file as the request body:
curl 'http://localhost:9200/{index_name}/_search?pretty' -H "Content-Type: application/json" -d @/tmp/es_request.json
Cluster Block Exception in ES Logs#
If disk usage on a node exceeds the flood-stage watermark (95% by default), Elasticsearch marks every index that has a shard on that node as read-only, only allowing deletions so that space can be recovered.
To make the indices writable again, first free up disk space until at least 20% of the disk is free, for example by removing old logs or unnecessary files, or by adding disk space. Then remove the read-only block. The following command removes it from all indices at once:
curl -XPUT http://localhost:9200/*/_settings -H "Content-Type: application/json" -d '{"index.blocks.read_only_allow_delete": null}'
Recover From a Corrupted Index#
The following are symptoms of a corrupted index:
Elasticsearch cluster state is red
One or multiple shards are not allocated
The output of the following command:
curl -XGET localhost:9200/_cluster/allocation/explain?pretty
looks similar to this example:
{
"index": "squirro_v9_spzhmtdsrrc78oodbnolza",
"shard": 5,
"primary": true,
"current_state": "unassigned",
"unassigned_info": {
"reason": "CLUSTER_RECOVERED",
"at": "2020-02-14T19:20:02.789Z",
"last_allocation_status": "no_valid_shard_copy"
},
"can_allocate": "no_valid_shard_copy",
"allocate_explanation": "cannot allocate because all found copies of the shard are either stale or corrupt",
"node_allocation_decisions": [
{
"node_id": "7IDt77EJR-uh5PknzF26_Q",
"node_name": "squirro-node-2a679356-9373-58ec-bad1-d812fbed0cad",
"transport_address": "127.0.0.1:9300",
"node_decision": "no",
"store": {
"in_sync": true,
"allocation_id": "xLLfLDCbTgaePDz_oivuQA",
"store_exception": {
"type": "corrupt_index_exception",
"reason": "failed engine (reason: [merge failed]) (resource=preexisting_corruption)",
"caused_by": {
"type": "i_o_exception",
"reason": "failed engine (reason: [merge failed])",
"caused_by": {
"type": "corrupt_index_exception",
"reason": "codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index/_oon.cfs\") [slice=_oon_Lucene50_0.tim]))"
}
}
}
}
}
]
}
At this point, if you have replicas, snapshots, or backups, the only correct option is to recover from them.
The following steps will get your index back to a green state, but you will most likely lose documents in the process:
Returning to a Green State: Example
Use the Lucene CheckIndex utility to validate and fix the corrupted index.
Stop Elasticsearch.
Note the affected shard in the message above. In this example scenario, it is shard 5. Locate the data folder of this shard (it is also printed in the message above). In this example it is:
/usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index
Back up the affected shard folder, for example:
tar czvf /tmp/shard5.tar.gz /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5
Enter the lib folder of Elasticsearch:
cd /usr/share/elasticsearch/lib
Run the following command (adjust the folder to your situation):
java -cp lucene-core*.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index/ -verbose -exorcise
Check the summary output of the tool, in this case:
WARNING: 1 broken segments (containing 179 documents) detected
Took 545.101 sec total.
WARNING: 179 documents will be lost
NOTE: will write new segments file in 5 seconds; this will remove 179 docs from the index. YOU WILL LOSE DATA. THIS IS YOUR LAST CHANCE TO CTRL+C!
5...
4...
3...
2...
1...
Writing...
OK
Wrote new segments file "segments_er0"
The example above is good news, given that the tool was able to fix the corrupted segment.
However, 179 documents were lost, and you are not told which ones.
Enter the index folder, in this case:
cd /usr/share/elasticsearch/data/nodes/0/indices/7Md9uEZOTk-nAlZvRUvNmg/5/index/
Remove any files whose names start with corrupted, in this case:
rm corrupted_Qyj-NdANTo2vr-aUDR6l_g
Start Elasticsearch
Check Elasticsearch status
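Building the shard path by hand is error-prone. The following sketch derives it from the data folder, index UUID, and shard number, following the nodes/0/indices/<uuid>/<shard> layout shown in the log message above (other Elasticsearch versions may use a different layout), and archives the shard folder before CheckIndex is run. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: locate and back up a shard folder before running CheckIndex.

# Usage: shard_dir <data_dir> <index_uuid> <shard>
shard_dir() {
    printf '%s/nodes/0/indices/%s/%s\n' "$1" "$2" "$3"
}

# Create a gzip-compressed archive of <shard_dir>, then verify it is readable.
# Usage: backup_shard <shard_dir> <archive.tar.gz>
backup_shard() {
    tar czf "$2" -C "$(dirname "$1")" "$(basename "$1")" &&
        tar tzf "$2" > /dev/null
}
```

The index UUID can be looked up with curl 'http://localhost:9200/_cat/indices?v&h=index,uuid,health'. For the example above: backup_shard "$(shard_dir /usr/share/elasticsearch/data 7Md9uEZOTk-nAlZvRUvNmg 5)" /tmp/shard5.tar.gz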
Synonyms File Missing
If full-text search stops working in your Squirro project, the cause may be a missing synonym analyzer and filter in the index configuration.
The error then appears in topic.log, and the analyzer and filter are missing from the ES index configuration. You can retrieve the ES index configuration with the following command:
curl -XGET "localhost:9200/$INDEXID" | python -m json.tool
To restore search functionality, follow the steps below:
Stop any data loading jobs.
Stop the ingester service
Close the affected index of the project with the following command. It is important to replace $INDEXID with the ES index that is experiencing the issue. For more information, see the Elasticsearch Close Index API page.
curl -X POST "localhost:9200/$INDEXID/_close?pretty"
Once the index is closed, update the index settings with the following curl request. For more information, see the Elasticsearch Synonym Token Filter page.
Note: The following placeholders must be replaced. Because the request body is enclosed in single quotes, the shell does not expand them, so replace them by hand:
$INDEXID - ES index
$PROJECTID - Squirro project ID of the affected index
$SYNONYMNAME - name of the synonym file that cannot be found (shown in topic.log, e.g. ‘title_body_summary’)
$SYNONYMID - ID of the synonym file that cannot be found (the ID appears in the Squirro URL when you open Explore Dashboard → Load → Synonyms → EDIT $SYNONYMNAME)
curl -XPUT localhost:9200/$INDEXID/_settings -H "Content-Type: application/json" -d'
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonyms_$PROJECTID_$SYNONYMNAME_$SYNONYMID": {
"type": "custom",
"tokenizer": "icu_tokenizer",
"filter": ["icu_folding", "icu_normalizer", "synonyms_$PROJECTID_$SYNONYMNAME_$SYNONYMID"],
"char_filter": ["html_strip", "quotation_char_filter"]
}
},
"filter": {
"synonyms_$PROJECTID_$SYNONYMNAME_$SYNONYMID": {
"type": "synonym_graph",
"synonyms_path": "/etc/elasticsearch/synonyms/$PROJECTID/$SYNONYMID.txt",
"updateable": true
}
}
}
}
}
}
'
Now that the settings have been updated, reopen the index with the following curl command. For more information, see the Elasticsearch Open Index API page.
curl -X POST "localhost:9200/$INDEXID/_open?pretty"
Resume the Squirro ingester service and data loading jobs.
Test full-text search and make sure that results are returned as usual.
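Replacing the placeholders by hand is error-prone, because the same name appears three times in the body. The following sketch generates the body and runs the close, update, and open requests in order. The function names are hypothetical, ES_URL is an assumed variable defaulting to http://localhost:9200, and stopping and resuming the data loading jobs and the ingester is still done manually.

```shell
#!/bin/sh
# Sketch: generate the synonym settings body with consistent names.
# Usage: synonym_settings_body <project_id> <synonym_name> <synonym_id>
synonym_settings_body() {
    name="synonyms_${1}_${2}_${3}"
    cat <<EOF
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "$name": {
            "type": "custom",
            "tokenizer": "icu_tokenizer",
            "filter": ["icu_folding", "icu_normalizer", "$name"],
            "char_filter": ["html_strip", "quotation_char_filter"]
          }
        },
        "filter": {
          "$name": {
            "type": "synonym_graph",
            "synonyms_path": "/etc/elasticsearch/synonyms/$1/$3.txt",
            "updateable": true
          }
        }
      }
    }
  }
}
EOF
}

# Close the index, update its settings, reopen it; stops at the first
# request that fails (curl -f treats HTTP errors as failures).
# Usage: apply_synonym_settings <index> <project_id> <synonym_name> <synonym_id>
apply_synonym_settings() {
    es=${ES_URL:-http://localhost:9200}
    curl -sSf -XPOST "$es/$1/_close?pretty" &&
        curl -sSf -XPUT "$es/$1/_settings?pretty" \
            -H "Content-Type: application/json" \
            -d "$(synonym_settings_body "$2" "$3" "$4")" &&
        curl -sSf -XPOST "$es/$1/_open?pretty"
}
```

Usage: apply_synonym_settings $INDEXID $PROJECTID title_body_summary $SYNONYMID, then check that each response contains "acknowledged" : true.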
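Install/Remove Elasticsearch Plugins
Use the following command to install an Elasticsearch plugin (the elasticsearch-plugin tool is located in /usr/share/elasticsearch/bin):
elasticsearch-plugin install <plugin name>
Use the following command to remove an Elasticsearch plugin:
elasticsearch-plugin remove <plugin name>
Restart Elasticsearch after installing or removing a plugin so that the change takes effect.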
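To make plugin installation idempotent, a script can check the plugin list first. A sketch with hypothetical function names, assuming the default installation path /usr/share/elasticsearch. The analysis-icu plugin serves as the example because the icu_tokenizer used in the synonym analyzer above comes from that plugin.

```shell
#!/bin/sh
# Sketch: install a plugin only if it is not listed yet.

# Return 0 if <plugin name> appears as a whole line on stdin
# (the format of "elasticsearch-plugin list").
has_plugin() {
    grep -qx -- "$1"
}

# Usage: ensure_plugin <plugin name>   (run as root on each storage node)
ensure_plugin() {
    bin=/usr/share/elasticsearch/bin/elasticsearch-plugin
    if "$bin" list | has_plugin "$1"; then
        echo "$1 is already installed"
    else
        # --batch skips the interactive permission confirmation
        "$bin" install --batch "$1" &&
            echo "$1 installed; restart elasticsearch to load it"
    fi
}
```

Usage: ensure_plugin analysis-icu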