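Introduction
Elasticsearch does not support HDFS as a native storage medium (for the supported store types, see https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index-modules-store.html#file-system). Hot-cold separation on top of HDFS therefore works indirectly: you register an HDFS-backed snapshot repository in Elasticsearch and then move cold data to HDFS by taking index snapshots.
Note that snapshots taken this way cannot be searched. Searchable snapshots are available only with an Enterprise subscription.
The other approach to hot-cold separation uses SSD machines as hot nodes and HDD machines as cold nodes, all deployed in one cluster and distinguished by node attributes or node roles. ILM (index lifecycle management) then migrates indices between the hot and cold nodes automatically according to a policy.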
Hot-cold separation based on HDFS
Installing the hdfs plugin
wget https://artifacts.elastic.co/downloads/elasticsearch-plugins/repository-hdfs/repository-hdfs-7.10.2.zip
bin/elasticsearch-plugin install file:///home/elasticsearch/repository-hdfs-7.10.2.zip
[elasticsearch@es2 plugins]$ ll
total 4
drwxr-xr-x 3 elasticsearch elasticsearch 244 Jun 19 17:09 analysis-ik
drwxr-xr-x 2 root root 4096 Jul 10 15:38 repository-hdfs
The plugin is installed.
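Besides listing the plugins directory, the installation can be confirmed with `bin/elasticsearch-plugin list`, which prints one plugin name per line. The sketch below wraps that check in a small helper; the function name `has_hdfs_plugin` and the `ES_HOME` variable are illustrative assumptions, not part of Elasticsearch. Repository plugins have to be present on every node, and a node only loads a newly installed plugin after a restart, so the check is worth running on each machine.

```shell
# Hypothetical helper: reads a plugin listing (one name per line) on stdin
# and succeeds only if repository-hdfs is among the names.
has_hdfs_plugin() {
  grep -qx 'repository-hdfs'
}

# Usage on a node (ES_HOME is assumed to point at the Elasticsearch directory):
#   "$ES_HOME/bin/elasticsearch-plugin" list | has_hdfs_plugin && echo "repository-hdfs is installed"
```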
Installing Hadoop
Download:
https://www.apache.org/dyn/closer.cgi/hadoop/common/
Extract:
tar zxvf hadoop-3.3.6.tar.gz
Configure Java and passwordless SSH
[root@es1 hadoop]# java -version
openjdk version "1.8.0_372"
OpenJDK Runtime Environment BiSheng (build 1.8.0_372-b11)
OpenJDK 64-Bit Server VM BiSheng (build 25.372-b11, mixed mode)
[root@es1 hadoop]# ssh es1
Last login: Fri May 26 19:29:13 2023 from 192.168.56.105
[root@es1 ~]#
Format the file system
bin/hdfs namenode -format
Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/usr/bisheng-jdk1.8.0_372
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
Start the NameNode and DataNode
sbin/start-dfs.sh
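Before pointing Elasticsearch at HDFS it helps to confirm that both daemons actually came up. `jps`, which ships with the JDK, lists running Java processes as `<pid> <name>`; the helper below is a minimal sketch that scans that output. The name `hdfs_daemons_up` is made up for illustration.

```shell
# Hypothetical helper: reads `jps` output on stdin and succeeds only if both
# a NameNode and a DataNode process are listed. SecondaryNameNode does not
# count as a NameNode because the second column is compared exactly.
hdfs_daemons_up() {
  awk '$2 == "NameNode" { nn = 1 }
       $2 == "DataNode" { dn = 1 }
       END { exit !(nn && dn) }'
}

# Usage on the Hadoop host:
#   jps | hdfs_daemons_up && echo "HDFS is up"
# `bin/hdfs dfsadmin -report` gives a fuller picture (capacity, live DataNodes).
```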
Configuring the connection between ES and HDFS
Create a directory in Hadoop
hdfs dfs -mkdir /es_snapshots
Change the configuration in core-site.xml to:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://es1:9000</value>
</property>
</configuration>
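Open up the directory permissions so that Elasticsearch can write to it (777 is convenient for a test setup but too broad for production):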
./bin/hdfs dfs -chmod -R 777 /es_snapshots
Register the snapshot repository in ES
curl -X PUT "localhost:9200/_snapshot/my_hdfs_repository" -H 'Content-Type: application/json' -d'
{
"type": "hdfs",
"settings": {
"uri": "hdfs://es1:9000",
"path": "/es_snapshots",
"conf.dfs.client.read.shortcircuit": "false"
}
}'
{"acknowledged":true}
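The `uri` and `path` settings are the only parts of the request that depend on the environment, so the body can be generated from two arguments. The sketch below does that and, in the usage comments, follows the registration with the repository verify API, which asks the nodes to confirm they can reach the repository. The helper name `hdfs_repo_body` is an illustrative assumption.

```shell
# Hypothetical helper: prints the repository definition for a given
# NameNode URI ($1) and HDFS path ($2), with the same settings as above.
hdfs_repo_body() {
  printf '{"type":"hdfs","settings":{"uri":"%s","path":"%s","conf.dfs.client.read.shortcircuit":"false"}}\n' "$1" "$2"
}

# Usage against a running cluster:
#   curl -X PUT "localhost:9200/_snapshot/my_hdfs_repository" \
#     -H 'Content-Type: application/json' \
#     -d "$(hdfs_repo_body hdfs://es1:9000 /es_snapshots)"
#   curl -X POST "localhost:9200/_snapshot/my_hdfs_repository/_verify?pretty"
```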
Storing an index in HDFS
Create a test index:
curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "integer"
}
}
}
}'
Insert data:
curl -X POST "localhost:9200/my_index/_doc" -H 'Content-Type: application/json' -d'
{
"name": "John Doe",
"age": 30
}'
curl -X POST "localhost:9200/my_index/_doc" -H 'Content-Type: application/json' -d'
{
"name": "Jane Doe",
"age": 25
}'
Create a snapshot:
curl -X PUT "localhost:9200/_snapshot/my_hdfs_repository/my_snapshot?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
"indices": "my_index",
"ignore_unavailable": true,
"include_global_state": false
}'
Check the snapshot status:
curl -X GET "localhost:9200/_snapshot/my_hdfs_repository/my_snapshot?pretty"
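In a script, the field of interest in that response is `state`, which is `SUCCESS` for a completed snapshot and can also be `IN_PROGRESS`, `PARTIAL` or `FAILED`. The following is a rough sketch that pulls the value out with `sed`, assuming `jq` is not available; `snapshot_state` is a made-up helper name.

```shell
# Hypothetical helper: reads a "get snapshot" response on stdin and prints
# the first "state" value it finds. The leading quote in the pattern keeps
# it from matching the "include_global_state" key.
snapshot_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Usage against a running cluster:
#   curl -s "localhost:9200/_snapshot/my_hdfs_repository/my_snapshot?pretty" | snapshot_state
```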
Look at HDFS; the snapshot files are there:
[root@es1 bin]# ./hdfs dfs -ls /es_snapshots
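Restore the index under a new name, restored_my_index, so that it does not collide with the existing my_index: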
curl -X POST "localhost:9200/_snapshot/my_hdfs_repository/my_snapshot/_restore" -H 'Content-Type: application/json' -d'
{
"indices": "my_index",
"ignore_unavailable": true,
"include_global_state": false,
"rename_pattern": "my_index",
"rename_replacement": "restored_my_index"
}'
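One way to confirm the restore worked is to compare document counts: the count API on `restored_my_index` should report the documents that were in `my_index` when the snapshot was taken. The sketch below extracts the number from the response; `doc_count` is a hypothetical helper name, and the parsing assumes the response contains a single top-level `"count"` field, as the count API's does.

```shell
# Hypothetical helper: reads a count API response on stdin and prints
# the value of its "count" field.
doc_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage against a running cluster:
#   curl -s "localhost:9200/restored_my_index/_count" | doc_count
#   curl -s "localhost:9200/my_index/_count" | doc_count
```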
Hot-cold separation based on ILM
In a three-node ES cluster, set the following in elasticsearch.yml on one node:
node.attr.data: hot
On a second node:
node.attr.data: warm
And on the third node:
node.attr.data: cold
Restart the cluster.
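After the restart, the cat nodeattrs API shows which custom attributes each node picked up. As a minimal sketch, the helper below checks that the `data` attribute is present with all three values; `has_all_tiers` is an illustrative name, and the check assumes the attribute is called `data` as in the configuration above.

```shell
# Hypothetical helper: reads "attr value" pairs on stdin and succeeds only
# if the custom attribute "data" appears with hot, warm and cold.
has_all_tiers() {
  awk '$1 == "data" { seen[$2] = 1 }
       END { exit !(seen["hot"] && seen["warm"] && seen["cold"]) }'
}

# Usage against a running cluster:
#   curl -s "localhost:9200/_cat/nodeattrs?h=attr,value" | has_all_tiers \
#     && echo "hot, warm and cold nodes are all present"
```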
Create the policy
curl -X PUT "localhost:9200/_ilm/policy/test_policy" -H 'Content-Type: application/json' -d'
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_size":"10kb",
"max_age":"10m",
"max_docs": 20
}
}
},
"warm": {
"min_age": "0m",
"actions": {
"allocate": {
"require": {
"data": "warm"
}
}
}
},
"cold": {
"min_age": "20m",
"actions": {
"freeze": {},
"allocate": {
"require": {
"data": "cold"
}
}
}
},
"delete": {
"min_age": "1h",
"actions": {
"delete": {}
}
}
}
}
}'
Create the template
curl -X PUT "localhost:9200/_template/my_template" -H 'Content-Type: application/json' -d'
{
"index_patterns": ["test-*"],
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"index.routing.allocation.require.data": "hot",
"index.lifecycle.rollover_alias": "test",
"index.lifecycle.name": "test_policy"
}
}'
Create the index:
curl -X PUT "localhost:9200/test-000001" -H 'Content-Type: application/json' -d'
{
"aliases": {
"test": {}
}
}'
The index was created on the hot node:
$ curl -XGET 'http://localhost:9200/_cat/shards?v'
index shard prirep state docs store ip node
test-000001 0 p STARTED 7 7.9kb localhost 008
After 10 minutes:
$ curl -XGET 'http://localhost:9200/_cat/shards?v'
index shard prirep state docs store ip node
test-000002 0 p STARTED 0 208b localhost 008
test-000001 0 p STARTED 10 11.8kb localhost 009
Note that test-000001 has moved to 009, the warm node, and a new index, test-000002, has been created on the hot node.
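ILM does not evaluate policy conditions continuously. It checks them at the interval set by the cluster setting `indices.lifecycle.poll_interval`, which defaults to 10 minutes, so with thresholds as small as the ones in this policy the transitions tend to show up on that schedule. For experiments the interval can be lowered. The sketch below builds the settings body; `ilm_poll_body` is a made-up helper name and `1m` is only an example value.

```shell
# Hypothetical helper: prints a cluster-settings body that sets the ILM
# poll interval to the value given in $1 (for example 1m).
ilm_poll_body() {
  printf '{"transient":{"indices.lifecycle.poll_interval":"%s"}}\n' "$1"
}

# Usage against a running cluster:
#   curl -X PUT "localhost:9200/_cluster/settings" \
#     -H 'Content-Type: application/json' -d "$(ilm_poll_body 1m)"
# The explain API then shows the phase and step each index is currently in:
#   curl -X GET "localhost:9200/test-*/_ilm/explain?pretty"
```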