Elasticsearch’s red status means at least one primary shard (and all of its replicas) is missing. In other words, data is missing: searches will return partial results, and indexing into that shard will raise an exception.
x-pack bug
When I opened the Kibana web UI this morning, I found this page:
Don’t panic; I have handled this many times. There must be some unassigned shards used by the x-pack plugin. Let’s see which indices they are.
curl -XGET 'http://xs333:19201/_cluster/health?level=indices' | json_pp
The output below is abbreviated so that we can focus on the monitoring indices only.
{
"cluster_name": "fusiones-v2",
"status": "red",
"timed_out": false,
"number_of_nodes": 38,
"number_of_data_nodes": 28,
"active_primary_shards": 5043,
"active_shards": 5064,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 165,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 96.84452094090649,
"indices": {
".monitoring-kibana-6-2017.08.04" : {
"active_shards" : 0,
"unassigned_shards" : 2,
"relocating_shards" : 0,
"number_of_shards" : 1,
"active_primary_shards" : 0,
"number_of_replicas" : 1,
"status" : "red",
"initializing_shards" : 0
},
".monitoring-es-6-2017.08.04" : {
"initializing_shards" : 0,
"status" : "red",
"number_of_replicas" : 1,
"active_primary_shards" : 0,
"number_of_shards" : 1,
"relocating_shards" : 0,
"unassigned_shards" : 2,
"active_shards" : 0
}
}
}
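With thousands of shards, the full per-index health output is unwieldy. Here is a minimal sketch (Python, standard library only; red_indices is my own helper name) that pulls out just the red indices from the parsed response of GET _cluster/health?level=indices:

```python
def red_indices(health: dict) -> list:
    """Return the names of indices whose status is red, given the parsed
    output of GET _cluster/health?level=indices."""
    return sorted(name
                  for name, stats in health.get("indices", {}).items()
                  if stats.get("status") == "red")

# Sample taken from the abbreviated health output above.
sample = {
    "status": "red",
    "indices": {
        ".monitoring-kibana-6-2017.08.04": {"status": "red"},
        ".monitoring-es-6-2017.08.04": {"status": "red"},
    },
}
print(red_indices(sample))
# → ['.monitoring-es-6-2017.08.04', '.monitoring-kibana-6-2017.08.04']
```

Against a live cluster you would first fetch and parse the JSON, e.g. with json.load(urllib.request.urlopen(...)) pointed at the health endpoint used above.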
x-pack keeps only the last 7 days of monitoring indices by default, so how can the index for 2017.08.04 still exist, let alone be unassigned? I can only speculate that this is an x-pack bug, and delete these red indices.
curl -XDELETE http://xs333:19201/.monitoring-kibana-6-2017.08.04
curl -XDELETE http://xs333:19201/.monitoring-es-6-2017.08.04
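When several stale monitoring indices pile up, deleting them one by one gets tedious. A sketch (Python, standard library only; stale_monitoring_indices is my own helper name, and the 7-day window is x-pack's default retention) to find candidates for curl -XDELETE:

```python
from datetime import date, timedelta

def stale_monitoring_indices(names, today, retention_days=7):
    """Pick out .monitoring-* daily indices older than the retention window.
    Daily index names end in a YYYY.MM.DD suffix,
    e.g. .monitoring-es-6-2017.08.04."""
    cutoff = today - timedelta(days=retention_days)
    stale = []
    for name in names:
        if not name.startswith(".monitoring-"):
            continue
        try:
            # ".monitoring-es-6-2017.08.04" -> "2017.08.04" -> date(2017, 8, 4)
            day = date(*map(int, name.rsplit("-", 1)[-1].split(".")))
        except ValueError:
            continue  # not a daily index; skip it
        if day < cutoff:
            stale.append(name)
    return stale

print(stale_monitoring_indices(
    [".monitoring-es-6-2017.08.04", ".monitoring-es-6-2017.10.09"],
    date(2017, 10, 9)))
# → ['.monitoring-es-6-2017.08.04']
```

Each returned name can then be removed with curl -XDELETE as above.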
After running these commands, we can see a normal Kibana page, like this.
Unfortunately, the cluster status is still red, and there are 165 unassigned shards.
Explain API
GET _cluster/health?level=indices
told us which indices are red. To explain the allocation of one of their shards, use the allocation explain API:
GET _cluster/allocation/explain
{
"index": "rtlogindex_2017-10-09-21_part-00008",
"shard": 0,
"primary": true
}
Specify the index and shard id of the shard you would like an explanation for, as well as the primary flag to indicate whether to explain the primary shard for the given shard id or one of its replica shards. These three request parameters are required.
The above request produces the following output:
{
"index": "rtlogindex_2017-10-09-21_part-00008",
"shard": 0,
"primary": true,
"current_state": "unassigned",
"unassigned_info": {
"reason": "INDEX_CREATED",
"at": "2017-10-09T18:39:56.971Z",
"last_allocation_status": "no"
},
"can_allocate": "no",
"allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
"node_allocation_decisions": [
{
"node_id": "SLcx_8U7S-C-5aaS39-iTw",
"node_name": "xs328-i3",
"transport_address": "10.34.41.52:39301",
"node_attributes": {
"rack": "xs-r1",
"ml.enabled": "true"
},
"node_decision": "no",
"weight_ranking": 1,
"deciders": [
{
"decider": "filter",
"decision": "NO",
"explanation": """node does not match index setting [index.routing.allocation.include] filters [_name:"xs393-i4"]"""
}
]
}
]
}
The explain API found the primary shard 0 of rtlogindex_2017-10-09-21_part-00008 to explain:
- The shard is in the unassigned state (see current_state) because the index has just been created (see unassigned_info).
- The shard cannot be allocated (see can_allocate) because none of the nodes permit allocating it (see allocate_explanation).
- Drilling down into each node’s decision (see node_allocation_decisions), we observe that node xs328-i3 received a decision not to allocate (see node_decision) due to the filter decider (see decider), with the explanation that the node does not match the index setting index.routing.allocation.include (see explanation inside the deciders section). The explanation also contains the exact setting to change to allow the shard to be allocated in the cluster.
As for allocators and deciders: allocators try to find the best nodes to hold the shard, and deciders decide whether allocating the shard to a particular node is allowed.
Given that reason, we can clear the index’s allocation filter:
PUT rtlogindex_2017-10-09-21_part-00008/_settings
{
"index.routing.allocation.include._name": null
}
Afterwards, this index becomes green.
So far we have fixed the unassigned primary shards of this index; the same approach also works for unassigned replica shards and for assigned shards.