Elasticsearch: A Brief Introduction and Basic Operations

2023-08-30 06:20:47


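Elasticsearch (ES) is an open-source distributed search and analytics engine built on the Lucene library. It is designed to handle large data sets and offers high performance, scalability, and powerful full-text search.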

The basic principles and architecture of ES are as follows:

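1. Distributed storage: ES spreads data across multiple nodes, and each node can hold multiple shards. Each shard is a self-contained Lucene index that holds part of the data of an ES index. This layout lets ES handle large data sets while providing high availability and fault tolerance.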

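2. Inverted index: ES uses an inverted index to speed up search. An inverted index is a data structure that maps each term in the documents to the list of documents containing it. With it, ES can quickly locate the documents that contain a keyword, which makes full-text search efficient.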

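3. Distributed search: when a user runs a search, the node that receives the request (the coordinating node) forwards it to one copy of every shard of the target index, rather than to every node in the cluster. Each of those nodes searches the shards it holds and returns the matching results. The coordinating node then merges the results, sorts them by relevance, and returns them to the user.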

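4. Real-time data analysis: ES supports near real-time analytics and aggregations (newly indexed documents become searchable after a refresh, by default about once per second). It can run complex aggregations such as average, sum, maximum, and minimum over large data sets, and it uses distributed computation and caching to improve aggregation performance.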

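5. High availability and fault tolerance: ES keeps replica copies of each shard and places them on different nodes. When a node fails, ES promotes a replica of each lost primary shard to primary and rebuilds the missing replicas on the remaining nodes from the surviving copies, so the data stays available.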

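6. API and plugin ecosystem: ES offers a rich set of APIs and a plugin ecosystem, so developers can integrate with it easily and customize or extend it as needed. Developers can talk to ES through the RESTful API, the Java API, and others, and use plugins to extend its functionality.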
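
To make item 2 concrete, here is a minimal sketch of an inverted index in plain Java. It is only an illustration under simplifying assumptions and not how Lucene or ES actually store their index: the class name InvertedIndexSketch, the toy tokenizer (lowercase, split on anything that is not a letter or digit), and the sample strings are all made up for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: term -> sorted set of ids of the documents containing it. */
final class InvertedIndexSketch {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Hypothetical analyzer: lowercases, then splits on anything that is not a letter or digit. */
    static String[] tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    /** Index one document: record its id under every term it contains. */
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, key -> new TreeSet<>()).add(docId);
        }
    }

    /** Ids of the documents that contain every term of the query (an AND search). */
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);   // first term: start from its posting list
            } else {
                result.retainAll(docs);         // later terms: intersect
            }
        }
        return result == null ? new TreeSet<Integer>() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "880 Holmes Lane");
        index.add(2, "396 Grove Place");
        index.add(3, "975 Dakota Place");
        System.out.println(index.search("place"));        // [2, 3]
        System.out.println(index.search("Grove Place"));  // [2]
    }
}
```

The lookup never scans the documents themselves: it reads one posting list per query term and intersects them, which is why search cost depends mostly on the number of matching documents rather than on the size of the data set.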

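In short, the basic principles and architecture of ES cover distributed storage, the inverted index, distributed search, real-time data analysis, high availability and fault tolerance, and the API and plugin ecosystem. Together these make ES a powerful distributed search and analytics engine.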
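
The merge step of distributed search (item 3) can be sketched as follows. This is a simplified illustration, not ES's real implementation: the names ScatterGatherSketch, ShardHit, and mergeTopK are hypothetical, the scores are invented, and the real engine adds further phases (for example fetching the document bodies only for the final hits).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy gather phase: merge the top hits returned by each shard into one global top-k list. */
final class ScatterGatherSketch {

    /** Hypothetical hit: a document id plus the relevance score computed on its shard. */
    static final class ShardHit {
        final String docId;
        final double score;

        ShardHit(String docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Combine every shard's hits, sort by score descending, and keep the first k. */
    static List<ShardHit> mergeTopK(List<List<ShardHit>> perShardHits, int k) {
        List<ShardHit> all = new ArrayList<>();
        for (List<ShardHit> shardHits : perShardHits) {
            all.addAll(shardHits);
        }
        all.sort(Comparator.comparingDouble((ShardHit hit) -> hit.score).reversed());
        int limit = Math.max(0, Math.min(k, all.size()));
        return new ArrayList<>(all.subList(0, limit));
    }

    public static void main(String[] args) {
        // Pretend two shards each answered the same query with their local hits.
        List<ShardHit> shard0 = Arrays.asList(new ShardHit("a", 2.5), new ShardHit("b", 1.1));
        List<ShardHit> shard1 = Arrays.asList(new ShardHit("c", 3.0), new ShardHit("d", 0.4));
        // Prints c, a, b in that order (highest score first).
        for (ShardHit hit : mergeTopK(Arrays.asList(shard0, shard1), 3)) {
            System.out.println(hit.docId + " " + hit.score);
        }
    }
}
```

Because each shard only has to return its own best hits, the coordinating step handles a small list per shard no matter how many documents the shard stores.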

 

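Beyond the principles, hands-on practice with the basic operations is just as essential. The demo below walks through some basic operations; the source data can be downloaded, and the code can be copied and run locally.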

 

The ES guide documentation and the test data come from the following sources:

package com.example.demo.ES;

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._types.FieldValue;
import co.elastic.clients.elasticsearch._types.SortOrder;
import co.elastic.clients.elasticsearch._types.aggregations.*;
import co.elastic.clients.elasticsearch.core.SearchResponse;
import co.elastic.clients.elasticsearch.core.search.Hit;
import co.elastic.clients.elasticsearch.core.search.HitsMetadata;
import co.elastic.clients.json.JsonData;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 *
 * Data preparation:
 * run these steps manually in Kibana Dev Tools
 * 1. Create an index named account
 * 2. Create the mapping
 * 3. Import the downloaded test data
 *
 * POM:
 * <dependency><groupId>co.elastic.clients</groupId><artifactId>elasticsearch-java</artifactId><version>8.5.3</version></dependency>
 * <dependency><groupId>org.elasticsearch.client</groupId><artifactId>elasticsearch-rest-high-level-client</artifactId><version>7.17.4</version></dependency>
 * <dependency><groupId>com.fasterxml.jackson.core</groupId><artifactId>jackson-databind</artifactId><version>2.12.3</version></dependency>
 * <dependency><groupId>jakarta.json</groupId><artifactId>jakarta.json-api</artifactId><version>2.0.1</version></dependency>
 */
public class ESSearchTest
{
    private static Logger logger = LoggerFactory.getLogger(ESSearchTest.class);

    static ElasticsearchClient client = null;
    static String host = "localhost";
    static int port = 9200;

    public static synchronized ElasticsearchClient getClient()
    {
        // Lazily create the shared client on first use and cache it for later calls
        if (client == null)
        {
            client = new ElasticsearchClient(
                    new RestClientTransport(
                            RestClient.builder(new HttpHost(host, port)).build(),
                            new JacksonJsonpMapper()));
        }
        return client;
    }

    String indexName = "account";

    @Test
    public void queryAll() throws IOException
    {
        ElasticsearchClient client = ESSearchTest.getClient();
        //1. No query specified: matches every document, each hit deserialized into a Map
        SearchResponse<Map> searchResponse = client.search(searchRequestBuilder -> searchRequestBuilder.index(indexName), Map.class);
        logger.info("time cost : {}", searchResponse.took());
        logger.info("size : {}", searchResponse.hits().total().value());

        //2. The same search with an explicit match_all query
        SearchResponse<Map> searchResponse2 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(queryBuilder -> queryBuilder
                                .matchAll(matchAllQueryBuilder -> matchAllQueryBuilder))
                , Map.class);
        logger.info("time cost : {}", searchResponse2.took());
        logger.info("size : {}", searchResponse2.hits().total().value());

        //3. match_all again, with hits deserialized into the Account class
        SearchResponse<Account> searchResponse3 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(queryBuilder -> queryBuilder
                                .matchAll(matchAllQueryBuilder -> matchAllQueryBuilder))
                , Account.class);
        logger.info("time cost : {}", searchResponse3.took());
        logger.info("size : {}", searchResponse3.hits().total().value());
        List<Hit<Account>> hits = searchResponse3.hits().hits();
        List<Account> accounts = new ArrayList<Account>();
        for (Hit<Account> hit : hits)
        {
            accounts.add(hit.source());
        }
        logger.info("query accounts size: {}", accounts.size());
        logger.info(accounts.toString());
    }


    /**
     * Pagination
     * Sorting
     * match query
     *
     * @throws IOException
     */
    @Test
    public void queryAccount() throws IOException
    {
        //query conditions
        AccountSearchQuery accountSearchQuery = new AccountSearchQuery();
        accountSearchQuery.setPageNo(1);
        accountSearchQuery.setPageSize(10);
        accountSearchQuery.setState("ND");

        String indexName = "account";
        ElasticsearchClient client = ESSearchTest.getClient();
        SearchResponse<Account> searchResponse = client.search(searchRequestBuilder -> searchRequestBuilder
                        //index name
                        .index(indexName)
                        //match query
                        .query(queryBuilder -> queryBuilder
                                .match(matchQueryBuilder -> matchQueryBuilder
                                        .field("state").query(accountSearchQuery.getState())))

                        //pagination
                        .from(parsrePageFrom(accountSearchQuery.getPageNo(), accountSearchQuery.getPageSize()))
                        .size(accountSearchQuery.getPageSize())
                        //sorting
                        .sort(sortOptionsBuilder -> sortOptionsBuilder
                                .field(fieldSortBuilder -> fieldSortBuilder
                                        .field("age").order(SortOrder.Asc))),
                Account.class);
        logger.info("time cost : {}", searchResponse.took());
        HitsMetadata<Account> hitsMetadata = searchResponse.hits();
        logger.info("size : {}", hitsMetadata.total().value());
        List<Hit<Account>> hits = hitsMetadata.hits();
        List<Account> results = new ArrayList<Account>();
        for (Hit<Account> hit : hits)
        {
            results.add(hit.source());
        }
        logger.info("account size:{}", results.size());
        logger.info(results.toString());
    }

    @Test
    public void queryAccount2() throws IOException
    {
        String city = "Leming";
        String state = "ND";
        Integer pageNo = 1;
        Integer pageSize = 10;
        Integer[] ageRange = new Integer[]{22,35};


        //query conditions
        AccountSearchQuery accountSearchQuery = new AccountSearchQuery();
        accountSearchQuery.setPageNo(pageNo);
        accountSearchQuery.setPageSize(pageSize);
        accountSearchQuery.setState(state);
        accountSearchQuery.setCity(city);
        accountSearchQuery.setAgeRange(ageRange);

        List<Account> accounts = doQueryAccount(accountSearchQuery);
        logger.info(accounts.toString());

    }
    /**
     * bool compound query
     * Source filtering of the returned fields
     * Range query
     * Helper invoked by queryAccount2, not a test itself
     * @throws IOException
     */
    public List<Account> doQueryAccount(AccountSearchQuery accountSearchQuery)
            throws IOException
    {
        ElasticsearchClient client = ESSearchTest.getClient();
        SearchResponse<Account> searchResponse = client.search(searchRequestBuilder -> searchRequestBuilder
                        //index name
                        .index(indexName)
                        //match and range queries combined in a bool query
                        //bool must: every must clause has to match
                        .query(queryBuilder -> queryBuilder
                                .bool(boolQueryBuilder -> boolQueryBuilder.
                                        //must
                                        must(queryBuilder2 -> queryBuilder2
                                                //match
                                                .match(matchQueryBuilder -> matchQueryBuilder
                                                        .field("state").query(accountSearchQuery.getState())))
                                        //must
                                        .must(queryBuilder3 -> queryBuilder3
                                                //match
                                                .match(matchQueryBuilder2 -> matchQueryBuilder2
                                                        .field("city").query(accountSearchQuery.getCity())))
                                        //must
                                        .must(queryBuilder4 -> queryBuilder4
                                                //range
                                                .range(rangeQueryBuilder -> rangeQueryBuilder
                                                        .field("age")
                                                            .gt(JsonData.of(accountSearchQuery.getAgeRange()[0]))
                                                            .lt(JsonData.of(accountSearchQuery.getAgeRange()[1]))))))
                        //source filtering
                        .source(sourceConfigBuilder -> sourceConfigBuilder
                                .filter(sourceFilterBuilder  -> sourceFilterBuilder
                                        //include
                                        .includes("firstname","age", "state", "city")
                                        //exclude
                                        .excludes("email")))
                        //pagination
                        .from(parsrePageFrom(accountSearchQuery.getPageNo(), accountSearchQuery.getPageSize()))
                        .size(accountSearchQuery.getPageSize())
                        //sorting
                        .sort(sortOptionsBuilder -> sortOptionsBuilder
                                .field(fieldSortBuilder -> fieldSortBuilder
                                        .field("age").order(SortOrder.Asc))),
                Account.class);
        HitsMetadata<Account> hitsMetadata = searchResponse.hits();
        List<Hit<Account>> hits = hitsMetadata.hits();
        List<Account> results = new ArrayList<Account>();
        for (Hit<Account> hit : hits)
        {
            results.add(hit.source());
        }
        return results;
    }

    /**
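     * match_all: matches every document
     * match: full-text search; the query text is analyzed into terms before matching
     * multi_match: runs one match query against several fields
     * match_phrase: the analyzed terms must appear adjacent and in the same order
     * term: exact match on the indexed term, no analysis (usually unsuitable for text fields)
     * terms: matches any of several exact values on one field
     * bool: compound query
     * fuzzy: matches terms within an edit distance of the query term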
     * query_string
     * simple_query_string
     *
     * @throws IOException
     */
    @Test
    public void queryAccount3() throws IOException
    {
        String indexName = "account";
        ElasticsearchClient client = ESSearchTest.getClient();
        //match_all: matches every document
        SearchResponse<Account> s1 = client.search(
                searchRequestBuilder -> searchRequestBuilder.index(indexName)
                        .query(q -> q.matchAll(m -> m)), Account.class);
        logger.info("s1 size:{}, data:{}", s1.hits().total().value(), s1.toString());
        //match: full-text search, the query text is analyzed before matching
        SearchResponse<Account> s2 = client.search(searchRequestBuilder -> searchRequestBuilder
                .index(indexName)
                        .query(q -> q.match(m -> m.field("address").query("Fleet")))
                , Account.class);
        logger.info("s2 size:{}, data:{}", s2.hits().total().value(), s2.toString());
        //multi_match: one query against several fields
        SearchResponse<Account> s3 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(q -> q.multiMatch(m -> m.fields("firstname", "lastname").query("Garrett")))
                , Account.class);
        logger.info("s3 size:{}, data:{}", s3.hits().total().value(), s3.toString());
        //match_phrase: the analyzed terms must appear adjacent and in the same order
        SearchResponse<Account> s4 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(q -> q.matchPhrase(m -> m.field("address").query("396 Grove Place")))
                , Account.class);
        logger.info("s4 size:{}, data:{}", s4.hits().total().value(), s4.toString());
        //term: exact match, no analysis
        SearchResponse<Account> s5 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(q -> q.term(t -> t.field("age").value(40)))
                , Account.class);
        logger.info("s5 size:{}, data:{}", s5.hits().total().value(), s5.toString());
        //terms: several exact values on one field
        List<FieldValue> values = new ArrayList<>();
        values.add(new FieldValue.Builder().anyValue(JsonData.of(40)).build());
        values.add(new FieldValue.Builder().anyValue(JsonData.of(39)).build());
        SearchResponse<Account> s6 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(q -> q.terms(t -> t.field("age").terms(ts -> ts.value(values))))
                , Account.class);
        logger.info("s6 size:{}, data:{}", s6.hits().total().value(), s6.toString());
        //bool: compound query
        SearchResponse<Account> s7 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(q -> q
                                .bool(m -> m
                                        .must(q2 -> q2
                                                .match(q3 -> q3.field("age").query(34)))
                                        .mustNot(q3 -> q3
                                                .term(t -> t.
                                                        field("state").value("DC")))
                                        .should(q4 -> q4.matchPhrase(m2 -> m2
                                                .field("address").query("975 Dakota Place")))))
                , Account.class);
        logger.info("s7 size:{}, data:{}", s7.hits().total().value(), s7.toString());
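        //fuzzy: approximate matching
        //fuzziness is the maximum allowed edit distance; see the ES documentation for prefix_length, max_expansions and the other parameters
        //analyze the query text first, then match fuzzily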
        SearchResponse<Account> s8 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(q -> q.match(m -> m.field("firstname").query("Bush").fuzziness("1")))
                , Account.class);
        logger.info("s8 size:{}, data:{}", s8.hits().total().value(), s8.toString());
        //fuzzy match on the term as given, without analysis
        SearchResponse<Account> s9 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(q -> q.fuzzy(f -> f.field("firstname").fuzziness("1").value("Bush")))
                , Account.class);
        logger.info("s9 size:{}, data:{}", s9.hits().total().value(), s9.toString());
        //query_string
        //similar to match
        SearchResponse<Account> s10 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(q -> q.queryString(qs -> qs.defaultField("address").query("Fleet")))
                , Account.class);
        logger.info("s10 size:{}, data:{}", s10.hits().total().value(), s10.toString());
        //similar to multi_match
        SearchResponse<Account> s11 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(q -> q.queryString(qs -> qs.fields("firstname", "lastname").query("Garrett")))
                , Account.class);
        logger.info("s11 size:{}, data:{}", s11.hits().total().value(), s11.toString());
        //similar to match_phrase
        SearchResponse<Account> s12 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(q -> q.queryString(qs -> qs.defaultField("address").query("\"396 Grove Place\"")))
                , Account.class);
        logger.info("s12 size:{}, data:{}", s12.hits().total().value(), s12.toString());
        //operators
        //contains both Grove and Place
        SearchResponse<Account> s13 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(q -> q.queryString(qs -> qs.fields("address").query("Grove AND Place")))
                , Account.class);
        logger.info("s13 size:{}, data:{}", s13.hits().total().value(), s13.toString());
        //contains both Grove and Place, or contains cobbhumphrey@apexia.com
        SearchResponse<Account> s14 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(q -> q.queryString(qs -> qs
                                .fields("address", "email").query("(Grove AND Place) OR cobbhumphrey@apexia.com")))
                , Account.class);
        logger.info("s14 size:{}, data:{}", s14.hits().total().value(), s14.toString());
        //no explicit operator: the default operator is OR, so this matches Grove or Place
        SearchResponse<Account> s15 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(q -> q.queryString(qs -> qs.fields("address").query("Grove Place")))
                , Account.class);
        logger.info("s15 size:{}, data:{}", s15.hits().total().value(), s15.toString());
        //simple_query_string
        //unlike query_string it does not parse the AND and OR keywords; it uses + (AND), | (OR) and - (NOT)
        SearchResponse<Account> s16 = client.search(searchRequestBuilder -> searchRequestBuilder
                        .index(indexName)
                        .query(q -> q.simpleQueryString(sqs -> sqs.fields("address").query("Grove + Place")))
                , Account.class);
        logger.info("s16 size:{}, data:{}", s16.hits().total().value(), s16.toString());
    }

    /**
     * Aggregation queries
     * Count
     * Sum
     * Max
     * Min
     * Average
     * Grouping
     * Distinct count
     * @throws IOException
     */
    @Test
    public void queryAccount4() throws IOException
    {
        String indexName = "account";
        ElasticsearchClient client = ESSearchTest.getClient();
        //max, min, avg, sum
        SearchResponse<Account> s1 = client.search(
                searchRequestBuilder -> searchRequestBuilder.index(indexName)
                        //max
                        .aggregations("maxAge", aggregationBuilder -> aggregationBuilder
                                .max(maxAggregationBuilder -> maxAggregationBuilder.field("age")))
                        //min
                        .aggregations("minAge", aggregationBuilder -> aggregationBuilder
                                .min(minAggregationBuilder -> minAggregationBuilder.field("age")))
                        //avg
                        .aggregations("avgAge", aggregationBuilder -> aggregationBuilder
                                .avg(avgAggregationBuilder -> avgAggregationBuilder.field("age")))
                        //sum
                        .aggregations("sumAge", aggregationBuilder -> aggregationBuilder
                                .sum(sumAggregationBuilder -> sumAggregationBuilder.field("age")))
                , Account.class);
        logger.info("s1 size:{}, data:{}", s1.hits().total().value(), s1.toString());

        Map<String, Aggregate> aggregations = s1.aggregations();
        Aggregate maxAge = aggregations.get("maxAge");
        Aggregate minAge = aggregations.get("minAge");
        Aggregate avgAge = aggregations.get("avgAge");
        Aggregate sumAge = aggregations.get("sumAge");

        logger.info("max age : {}", maxAge.max().value());
        logger.info("min age : {}", minAge.min().value());
        logger.info("avg age : {}", avgAge.avg().value());
        logger.info("sum age : {}", sumAge.sum().value());

        //stats: count, max, min, avg and sum in one aggregation
        SearchResponse<Account> s2 = client.search(
                searchRequestBuilder -> searchRequestBuilder.index(indexName)
                        .aggregations("accountNumberStat", aggregationBuilder -> aggregationBuilder
                                //stats
                                .stats(statsAggregationBuilder -> statsAggregationBuilder.field("account_number")))
                , Account.class);
        logger.info("s2 size:{}, data:{}", s2.hits().total().value(), s2.toString());
        Aggregate ans = s2.aggregations().get("accountNumberStat");
        logger.info("account number counts:{}", ans.stats().count());
        logger.info("account number sum:{}", ans.stats().sum());
        logger.info("account number max:{}", ans.stats().max());
        logger.info("account number min:{}", ans.stats().min());
        logger.info("account number avg:{}", ans.stats().avg());
        //distinct count (the cardinality aggregation returns an approximate count)
        SearchResponse<Account> s3 = client.search(
                searchRequestBuilder -> searchRequestBuilder.index(indexName)
                        .aggregations("cityDistinct", aggregationBuilder -> aggregationBuilder
                                //distinct count
                                .cardinality(cardinalityAggregationBuilder  -> cardinalityAggregationBuilder
                                        .field("city")))
                , Account.class);
        logger.info("s3 size:{}, data:{}", s3.hits().total().value(), s3.toString());
        Aggregate aggregation = s3.aggregations().get("cityDistinct");
        CardinalityAggregate cityDistinct = aggregation.cardinality();
        logger.info("distinct city number:{}", cityDistinct.value());
        //grouping
        SearchResponse<Map> s4 = client.search(
                searchRequestBuilder -> searchRequestBuilder.index(indexName)
                        .aggregations("stateAgg", aggregationBuilder -> aggregationBuilder
                                .terms(termsAggregation -> termsAggregation
                                        .field("state")))
                , Map.class);
        logger.info("s4 size:{}, data:{}", s4.hits().total().value(), s4.toString());
        List<StringTermsBucket> bs = s4.aggregations().get("stateAgg").sterms().buckets().array();
        for (StringTermsBucket b : bs)
        {
            logger.info("key:{}, value:{}", b.key().stringValue(), b.docCount());
        }
        /**
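         * A more complex statistic.
         * Goal:
         * Query documents whose age is between 20 and 30 and whose address contains "Street"
         * Aggregate the matching documents
         * Group by state (assumed here to be the region, i.e. the US state name)
         * For each state, compute the max, min and average salary and the head count (one document is assumed to be one person)
         * For each state, split by gender and compute the max, min and average salary and the head count per gender
         * The account_number field stands in for the salary value in the code below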
         */
        SearchResponse<Map> s5 = client.search(
                searchRequestBuilder -> searchRequestBuilder.index(indexName)
                        .query(queryBuilder -> queryBuilder
                                .bool(boolQueryBuilder -> boolQueryBuilder
                                        .must(queryBuilder2 -> queryBuilder2
                                                //address contains "Street"
                                                .match(matchQueryBuilder -> matchQueryBuilder
                                                        .field("address").query("Street")))
                                        .must(queryBuilder3 -> queryBuilder3
                                                //age between 20 and 30 (gt and lt exclude both bounds)
                                                .range(rangeBuilder -> rangeBuilder
                                                        .field("age").lt(JsonData.of(30)).gt(JsonData.of(20))))))
                        //aggregation
                        .aggregations("stateAgg", aggregationBuilder -> aggregationBuilder
                                //group by state
                                .terms(termsAggregation -> termsAggregation.field("state"))
                                //max per state (account_number stands in for salary)
                                .aggregations("stateAccountMaxAgg", aggregationBuilder2 -> aggregationBuilder2.max(queryBuilder4 -> queryBuilder4.field("account_number")))
                                //min per state (account_number stands in for salary)
                                .aggregations("stateAccountMinAgg", aggregationBuilder3 -> aggregationBuilder3.min(queryBuilder5 -> queryBuilder5.field("account_number")))
                                //avg per state (account_number stands in for salary)
                                .aggregations("stateAccountAvgAgg", aggregationBuilder4 -> aggregationBuilder4.avg(queryBuilder6 -> queryBuilder6.field("account_number")))
                                //nested sub-aggregation
                                .aggregations("genderAgg", aggregationBuilder5 -> aggregationBuilder5
                                        //group by gender
                                        .terms(termsAggregation2 -> termsAggregation2.field("gender"))
                                        //max, min and avg per gender within the state
                                        .aggregations("GenderAccountMaxAgg", aggregationBuilder6 -> aggregationBuilder6
                                                .max(queryBuilder7 -> queryBuilder7.field("account_number")))
                                        .aggregations("GenderAccountMinAgg", aggregationBuilder7 -> aggregationBuilder7
                                                .min(queryBuilder8 -> queryBuilder8.field("account_number")))
                                        .aggregations("GenderAccountAvgAgg", aggregationBuilder8 -> aggregationBuilder8
                                                .avg(queryBuilder9 -> queryBuilder9.field("account_number")))))
                , Map.class);
        logger.info("s5 size:{}, data:{}", s5.hits().total().value(), s5.toString());
        List<StringTermsBucket> buckets = s5.aggregations().get("stateAgg").sterms().buckets().array();
        for (StringTermsBucket b : buckets)
        {
            StringBuilder s = new StringBuilder();
            s.append("State ").append(b.key().stringValue())
                    .append(" has ").append(b.docCount()).append(" people. ")
                    .append("Account max:").append(b.aggregations().get("stateAccountMaxAgg").max().value())
                    .append(", min:").append(b.aggregations().get("stateAccountMinAgg").min().value())
                    .append(", avg:").append(b.aggregations().get("stateAccountAvgAgg").avg().value())
                    .append(". ");
            List<StringTermsBucket> list = b.aggregations().get("genderAgg").sterms().buckets().array();
            for (StringTermsBucket l : list)
            {
                s.append("Gender: ").append(l.key().stringValue())
                        .append(" , account max:").append(l.aggregations().get("GenderAccountMaxAgg").max().value())
                        .append(",min:").append(l.aggregations().get("GenderAccountMinAgg").min().value())
                        .append(",avg:").append(l.aggregations().get("GenderAccountAvgAgg").avg().value())
                        .append(".    ");
            }
            logger.info(s.toString());

        }
    }

    private Integer parsrePageFrom(Integer pageNo, Integer pageSize)
    {
        if (pageNo != null && pageSize != null && pageSize.intValue() > 0)
        {
            return (pageNo.intValue() > 0 ? pageNo.intValue() - 1 : 0)
                    * pageSize.intValue();

        }
        return 0;
    }

    public class AccountSearchQuery
    {
        public Integer pageNo;
        public Integer pageSize;

        public String city;

        public String state;

        public Integer[] ageRange;

        public Integer getPageNo()
        {
            return pageNo;
        }

        public void setPageNo(Integer pageNo)
        {
            this.pageNo = pageNo;
        }

        public Integer getPageSize()
        {
            return pageSize;
        }

        public void setPageSize(Integer pageSize)
        {
            this.pageSize = pageSize;
        }

        public String getCity()
        {
            return city;
        }

        public void setCity(String city)
        {
            this.city = city;
        }

        public String getState()
        {
            return state;
        }

        public void setState(String state)
        {
            this.state = state;
        }

        public Integer[] getAgeRange()
        {
            return ageRange;
        }

        public void setAgeRange(Integer[] ageRange)
        {
            this.ageRange = ageRange;
        }
    }

    public static class Account
    {
        private Integer account_number;

        private Integer balance;

        private String firstname;

        private String lastname;

        private Integer age;

        private String gender;

        private String address;

        private String employer;

        private String email;

        private String city;

        private String state;

        public Account()
        {
        }

        public Account(Integer account_number, Integer balance,
                String firstname, String lastname, Integer age, String gender,
                String address, String employer, String email, String city,
                String state)
        {
            this.account_number = account_number;
            this.balance = balance;
            this.firstname = firstname;
            this.lastname = lastname;
            this.age = age;
            this.gender = gender;
            this.address = address;
            this.employer = employer;
            this.email = email;
            this.city = city;
            this.state = state;
        }

        public void setAccount_number(Integer account_number){
            this.account_number = account_number;
        }
        public Integer getAccount_number(){
            return this.account_number;
        }
        public void setBalance(Integer balance){
            this.balance = balance;
        }
        public Integer getBalance(){
            return this.balance;
        }
        public void setFirstname(String firstname){
            this.firstname = firstname;
        }
        public String getFirstname(){
            return this.firstname;
        }
        public void setLastname(String lastname){
            this.lastname = lastname;
        }
        public String getLastname(){
            return this.lastname;
        }
        public void setAge(Integer age){
            this.age = age;
        }
        public Integer getAge(){
            return this.age;
        }
        public void setGender(String gender){
            this.gender = gender;
        }
        public String getGender(){
            return this.gender;
        }
        public void setAddress(String address){
            this.address = address;
        }
        public String getAddress(){
            return this.address;
        }
        public void setEmployer(String employer){
            this.employer = employer;
        }
        public String getEmployer(){
            return this.employer;
        }
        public void setEmail(String email){
            this.email = email;
        }
        public String getEmail(){
            return this.email;
        }
        public void setCity(String city){
            this.city = city;
        }
        public String getCity(){
            return this.city;
        }
        public void setState(String state){
            this.state = state;
        }
        public String getState(){
            return this.state;
        }

        @Override public String toString()
        {
            return "Account{" + "account_number=" + account_number
                    + ", balance=" + balance + ", firstname='" + firstname
                    + '\'' + ", lastname='" + lastname + '\'' + ", age=" + age
                    + ", gender='" + gender + '\'' + ", address='" + address
                    + '\'' + ", employer='" + employer + '\'' + ", email='"
                    + email + '\'' + ", city='" + city + '\'' + ", state='"
                    + state + '\'' + '}';
        }
    }

}
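One caveat about the Account class above: its numeric fields are boxed Integer values, so a getter declared to return int unboxes the field on every call. When a search uses source filtering that leaves a field out of the response (the bool query example includes only firstname, age, state and city), the omitted fields stay null in the mapped object, and unboxing null throws a NullPointerException. The sketch below illustrates this with a hypothetical, trimmed-down class; `SlimAccount`, `getBalanceAsInt` and `intGetterThrows` are names invented for this illustration and are not part of the article's code.

```java
public class AccountNullDemo {

    // Hypothetical, trimmed-down stand-in for the article's Account class.
    static class SlimAccount {
        private Integer balance; // stays null when the response omits the field
        private Integer age;

        public void setBalance(Integer balance) { this.balance = balance; }
        public void setAge(Integer age) { this.age = age; }

        // Declared as int: auto-unboxing throws if balance is null.
        public int getBalanceAsInt() { return this.balance; }

        // Declared as Integer: a missing value is passed through as null.
        public Integer getBalance() { return this.balance; }

        public Integer getAge() { return this.age; }
    }

    // Returns true when calling the int getter fails with a NullPointerException.
    static boolean intGetterThrows(SlimAccount account) {
        try {
            account.getBalanceAsInt();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        SlimAccount partial = new SlimAccount();
        partial.setAge(30); // balance was never set, as if it had been filtered out

        System.out.println("int getter throws: " + intGetterThrows(partial));
        System.out.println("Integer getter returns: " + partial.getBalance());
    }
}
```

Returning the boxed type pushes the null check to the caller, which is usually easier to reason about than an exception raised inside a getter.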