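ElasticSearch: An Introduction in One Article

Definition in One Sentence

Built on the open-source Lucene search engine and written in Java, ElasticSearch offers a simple, easy-to-use RESTful API,
scales out horizontally with ease, and supports applications with PB-scale data.

What it can be used for: data warehouse, data analysis engine, full-text search engine, and so on.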

Versions

1.X -> 2.X -> 5.X -> 6.X

Differences

Installation

Single-Node Installation

Download the archive and extract it; that is all.

With the node running, opening 127.0.0.1:9200 returns:

{
  "name" : "JPM5EYQ",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "JG6IvmbgTmicNDFFBDDMCQ",
  "version" : {
    "number" : "6.3.1",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "eb782d0",
    "build_date" : "2018-06-29T21:59:26.107521Z",
    "build_snapshot" : false,
    "lucene_version" : "7.3.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

Installing the HEAD Plugin

The elasticsearch-head plugin provides a web UI for viewing results and performing operations.

npm install
npm run start
open http://localhost:9100/

This requires that the elasticsearch service is running and that cross-origin (CORS) access is enabled in
config/elasticsearch.yml:

http.cors.enabled: true
http.cors.allow-origin: "*"

Cluster Installation

Master node configuration (config/elasticsearch.yml):

cluster.name: jimo
node.name: master
node.master: true

network.host: 127.0.0.1

For a slave node, just open a new terminal and use a new folder. Configure it as follows, changing the port:

cluster.name: jimo
node.name: slave1

network.host: 127.0.0.1

http.port: 8200
discovery.zen.ping.unicast.hosts: ["127.0.0.1"] # used to discover the master node

Note: the elasticsearch directories of the nodes must not be copied from one another.

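Basic Concepts

  1. Index: a collection of documents that share the same properties (a book index, a vehicle index, and so on)

  2. Type: an index can define one or more types, and every document must belong to a type (popular-science vs. literary books; trucks vs. sedans). Indices created in 6.x are limited to a single type.

  3. Document: the basic unit of data that can be indexed (each book, each vehicle)

  4. Shards: every index is split into one or more shards, and each shard is a Lucene index

  5. Replica: a copy of a shard serves as that shard's backup

  6. Cluster: a collection of nodes; each cluster has a name, and clusters are told apart by name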

Basic Usage

RESTful API Format

http://<host>:<port>/<index>/<type>/<document id>

Operations: PUT/GET/POST/DELETE
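
To make the URL pattern concrete, here is a minimal sketch that assembles such URLs. The helper name es_url and the default host and port (taken from the single-node example) are assumptions for illustration, not part of any client library.

```python
from urllib.parse import quote


def es_url(index, doc_type=None, doc_id=None, host="127.0.0.1", port=9200):
    """Build http://<host>:<port>/<index>/<type>/<document id>.

    Trailing segments may be omitted, but an id needs a type before it.
    """
    if doc_id is not None and doc_type is None:
        raise ValueError("a document id requires a type")
    segments = [s for s in (index, doc_type, doc_id) if s is not None]
    # Percent-encode each segment so ids containing '/' or spaces stay one segment.
    path = "/".join(quote(str(s), safe="") for s in segments)
    return f"http://{host}:{port}/{path}"


print(es_url("book"))              # the index
print(es_url("book", "novel"))     # a type inside the index
print(es_url("book", "novel", 1))  # a single document
```

The HTTP verb then selects the operation on that resource, for example GET to read a document and DELETE to remove it.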

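Basic Operations

Creating an Index

  1. Create it with elasticsearch-head

You can see how the shards are distributed: reading down each column, the thin-bordered boxes are replicas of the thick-bordered ones.

In the index info, look at the mappings entry: if it is empty ("mappings": { }), the data is unstructured; otherwise you can define a custom structure.
Below, a novel type with a title field is defined for the book index:

  2. Create it with an HTTP request; Postman is recommended because it makes writing JSON easier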
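
As a rough sketch of what such a request could look like for 6.x, the snippet below builds the body for PUT /book with a novel type that has a text field named title. The shard and replica counts and the function name create_index_body are illustrative assumptions; the original request body is not shown in this article.

```python
import json


def create_index_body(shards=3, replicas=1):
    """Body for `PUT /book`: index settings plus a structured mapping."""
    return {
        "settings": {
            "number_of_shards": shards,      # assumed value
            "number_of_replicas": replicas,  # assumed value
        },
        "mappings": {
            # 6.x style: the mapping is nested under the type name.
            "novel": {
                "properties": {
                    "title": {"type": "text"},
                }
            }
        },
    }


print("PUT /book")
print(json.dumps(create_index_body(), indent=2))
```

The printed JSON can be pasted into Postman as the request body, with Content-Type set to application/json.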

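Inserting

Inserting a document:

Insert with a specified id

Or insert with an auto-generated id (note the POST method and that the id is dropped from the URL).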
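
The two variants can be sketched as follows, assuming the book index and novel type used in this article and a node at 127.0.0.1:9200. The helpers index_request and send are hypothetical names; send is defined but not called here because it needs a running node.

```python
import json
from urllib.request import Request, urlopen

BASE = "http://127.0.0.1:9200"  # assumed address of a local node


def index_request(doc, doc_id=None):
    """Return (method, path, body) for inserting one document."""
    if doc_id is None:
        # No id: POST to the type and let elasticsearch generate one.
        return "POST", "/book/novel", doc
    # Explicit id: PUT to the full document URL.
    return "PUT", f"/book/novel/{doc_id}", doc


def send(method, path, body):
    """Send a JSON request and decode the JSON response."""
    data = json.dumps(body).encode("utf-8")
    req = Request(BASE + path, data=data, method=method,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


doc = {"title": "java", "author": "王三", "word_count": 2000,
       "publish_date": "2017-08-20"}
print(index_request(doc, doc_id=1)[:2])
print(index_request(doc)[:2])
```

With a node running, send(*index_request(doc, doc_id=1)) would perform the insert.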

View the result in head:

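Updating

Update by specifying the id in the URL

Update with a script. The built-in scripting language is Painless; JavaScript and Python were available only through plugins, which were removed in 6.0.
The following uses the built-in language (as you can see, scripts can use parameters flexibly):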
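
A minimal sketch of both update styles against the 6.x _update endpoint. The function names and the choice of word_count as the field being changed are assumptions for illustration; the original requests are not shown here.

```python
def update_by_doc(doc_id, fields):
    """Partial update: only the given fields are merged into the document."""
    return "POST", f"/book/novel/{doc_id}/_update", {"doc": fields}


def update_by_script(doc_id, word_count):
    """Scripted update using Painless, with the new value passed as a parameter."""
    return "POST", f"/book/novel/{doc_id}/_update", {
        "script": {
            "lang": "painless",
            "source": "ctx._source.word_count = params.word_count",
            "params": {"word_count": word_count},
        }
    }


print(update_by_doc(1, {"title": "java入门"}))
print(update_by_script(1, 6000))
```

Passing values through params instead of concatenating them into the script text keeps the script itself constant across calls.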

Deleting

  1. Delete a document

  2. Delete an index

Using head

Using the command line:
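
In REST terms both deletions are DELETE requests on the corresponding URL. The sketch below only builds the method and path; the function names are hypothetical, and the guard against wildcard index names is an extra safety assumption rather than anything elasticsearch requires.

```python
def delete_document(doc_id):
    """DELETE one document of the novel type in the book index."""
    return "DELETE", f"/book/novel/{doc_id}"


def delete_index(index):
    """DELETE a whole index; refuse names that could match several indices."""
    if not index or index in ("_all", "*") or "*" in index:
        raise ValueError("refusing to build a wildcard index deletion")
    return "DELETE", f"/{index}"


print(delete_document(1))
print(delete_index("book"))
```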

Querying

First insert some data:

{
  "title": "python之父",
  "author": "王麻子",
  "word_count": 1000,
  "publish_date": "2002-10-01"
},
{
  "title": "java",
  "author": "王三",
  "word_count": 2000,
  "publish_date": "2017-08-20"
},
{
  "title": "java入门",
  "author": "王四",
  "word_count": 5000,
  "publish_date": "2017-08-15"
},
{
  "title": "C++入门",
  "author": "王五",
  "word_count": 10000,
  "publish_date": "2000-09-20"
},
{
  "title": "java精通",
  "author": "李四",
  "word_count": 8000,
  "publish_date": "2010-09-20"
},
{
  "title": "java大法好",
  "author": "张三",
  "word_count": 3000,
  "publish_date": "2017-08-01"
},
{
  "title": "代码整洁之道",
  "author": "寂寞哥",
  "word_count": 5000,
  "publish_date": "1997-01-20"
},
{
  "title": "太极拳",
  "author": "赵牛",
  "word_count": 1000,
  "publish_date": "2005-08-20"
}
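
One way to load these documents in a single request is the _bulk endpoint, which is an assumption here since the article does not say how the data was inserted. The sketch builds the newline-delimited body for POST /book/novel/_bulk (sent with Content-Type application/x-ndjson); only three of the documents are repeated to keep it short, and bulk_body is a hypothetical helper name.

```python
import json

BOOKS = [
    {"title": "python之父", "author": "王麻子", "word_count": 1000,
     "publish_date": "2002-10-01"},
    {"title": "java", "author": "王三", "word_count": 2000,
     "publish_date": "2017-08-20"},
    {"title": "java入门", "author": "王四", "word_count": 5000,
     "publish_date": "2017-08-15"},
]


def bulk_body(docs):
    """Build an NDJSON bulk body: one action line, then one source line, per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))            # action; id is auto-generated
        lines.append(json.dumps(doc, ensure_ascii=False))  # document source
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


print(bulk_body(BOOKS))
```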
  1. Simple queries

GET query

POST query for all data:
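
A sketch of the two simple queries, assuming the book index and novel type from above: fetching one document by id with GET, and posting a match_all query to _search. The function names are illustrative.

```python
def get_by_id(doc_id):
    """Fetch a single document directly by its id."""
    return "GET", f"/book/novel/{doc_id}"


def match_all():
    """Search request that matches every document in the book index."""
    return "POST", "/book/_search", {"query": {"match_all": {}}}


print(get_by_id(1))
print(match_all())
```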

  2. Conditional queries

Limiting the number of results:

Filtering by a condition and sorting by date in descending order:
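
These two requests might look as follows as bodies for POST /book/_search. The field names come from the sample data; the helper names and the use of a match query on title as the condition are assumptions.

```python
def paged(start=0, size=1):
    """Return `size` hits beginning at offset `start`."""
    return {"query": {"match_all": {}}, "from": start, "size": size}


def title_sorted_by_date(keyword):
    """Match on title and order the hits by publish_date, newest first."""
    return {
        "query": {"match": {"title": keyword}},
        "sort": [{"publish_date": {"order": "desc"}}],
    }


print(paged(start=1, size=1))
print(title_sorted_by_date("java"))
```

When an explicit sort is given, hits are ordered by the sort field instead of by relevance.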

  3. Aggregation queries

Aggregating by date and by word count:

Statistics:

Or specifying a function directly:
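
A sketch of the three aggregation bodies for POST /book/_search: terms buckets per field, a stats aggregation, and a single metric such as min. The aggregation names (group_by_..., ..._stats) and the helper functions are made up for illustration.

```python
def group_by(*fields):
    """One terms aggregation per field, e.g. word_count and publish_date."""
    return {"aggs": {f"group_by_{f}": {"terms": {"field": f}} for f in fields}}


def stats(field):
    """count, min, max, avg and sum of a numeric field in one aggregation."""
    return {"aggs": {f"{field}_stats": {"stats": {"field": field}}}}


def metric(function, field):
    """A single metric instead of the full stats block."""
    if function not in ("min", "max", "avg", "sum"):
        raise ValueError(f"unsupported metric: {function}")
    return {"aggs": {f"{field}_{function}": {function: {"field": field}}}}


print(group_by("word_count", "publish_date"))
print(stats("word_count"))
print(metric("min", "word_count"))
```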

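Advanced Queries

query

query context:

Besides checking whether a document satisfies the query conditions, a query also
computes a _score field that indicates how well the document matches. The score is a non-negative relevance value with no fixed upper bound.

Common queries:
1. Full-text queries: for text data
2. Field-level queries: for structured data such as dates and numbers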

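1. Full-text queries

Fuzzy query

Phrase query: with match, "java入门" would be split into the two terms "java" and "入门"

Multi-keyword query

Query-syntax search: if fields is omitted, all fields are searched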
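
The four variants are sketched below as search bodies. Mapping them to match, match_phrase, multi_match and query_string is an assumption based on the descriptions above, since the original requests are not shown; the helper names are illustrative.

```python
def match(field, text):
    """Analyzed match: the text is split into terms and any term may match."""
    return {"query": {"match": {field: text}}}


def match_phrase(field, text):
    """Phrase match: the terms must appear together and in order."""
    return {"query": {"match_phrase": {field: text}}}


def multi_match(text, fields):
    """Run the same match against several fields."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


def query_string(expression, fields=None):
    """Query-syntax search; without `fields`, all fields are searched."""
    body = {"query": expression}
    if fields:
        body["fields"] = list(fields)
    return {"query": {"query_string": body}}


print(match("title", "java入门"))
print(match_phrase("title", "java入门"))
print(multi_match("java", ["author", "title"]))
print(query_string("(java AND 入门) OR python", ["title", "author"]))
```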

2. Field-level queries

Exact value

Range queries: numbers, dates, and so on
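
A sketch of both field-level queries, using term for an exact value and range for bounds. The helper names and the sample bounds are assumptions; the fields come from the sample data.

```python
def term(field, value):
    """Exact-value lookup, suited to numbers, dates and keyword fields."""
    return {"query": {"term": {field: value}}}


def range_query(field, gte=None, lte=None):
    """Inclusive range; either bound may be left out."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    if not bounds:
        raise ValueError("at least one bound is required")
    return {"query": {"range": {field: bounds}}}


print(term("word_count", 1000))
print(range_query("word_count", gte=1000, lte=2000))
# For date fields, "now" stands for the current time.
print(range_query("publish_date", gte="2017-01-01", lte="now"))
```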

filter

filter context:

A filter only decides yes or no for each document; it does not evaluate how well the document matches.
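
In request form, a filter is usually placed inside a bool query. A minimal sketch, assuming a term condition on word_count; the function name filter_only is hypothetical.

```python
def filter_only(field, value):
    """Yes/no selection: documents either pass the term filter or are dropped."""
    return {"query": {"bool": {"filter": {"term": {field: value}}}}}


print(filter_only("word_count", 1000))
```

Because no score has to be computed, elasticsearch can cache filter results, which tends to make repeated filters cheaper than equivalent scored queries.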

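Compound Queries

These combine queries and filters.

  1. Constant score query: the score is set through boost, and every result that passes the filter gets that score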
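
A sketch of a constant_score body; the wrapped match clause and the boost value of 2 are illustrative assumptions, as is the helper name.

```python
def constant_score(filter_clause, boost=1.0):
    """Wrap a clause so every document it selects gets the same score (the boost)."""
    return {"query": {"constant_score": {"filter": filter_clause, "boost": boost}}}


print(constant_score({"match": {"title": "java"}}, boost=2))
```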

  2. Bool query
    must:

must_not:

should:

With a filter added as well:
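
The four clause kinds can be combined in one bool body. The sketch below assembles only the groups that are actually given; the helper name bool_query and the sample clauses are assumptions built from the sample data.

```python
def bool_query(must=(), must_not=(), should=(), filter_=()):
    """Assemble a bool query, leaving out clause groups that are empty."""
    groups = {
        "must": must,          # all must match, and they contribute to the score
        "must_not": must_not,  # none may match
        "should": should,      # optional matches
        "filter": filter_,     # all must match, without affecting the score
    }
    clauses = {name: list(items) for name, items in groups.items() if items}
    return {"query": {"bool": clauses}}


query = bool_query(
    must=[{"match": {"author": "王三"}}, {"match": {"title": "java"}}],
    filter_=[{"term": {"word_count": 2000}}],
)
print(query)
print(bool_query(must_not=[{"term": {"title": "java"}}]))
```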

In Practice

Integrating with spring-boot (code repository)

Environment: IntelliJ IDEA, JDK 1.8

  1. Create a spring-boot project
  2. pom.xml
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.elasticsearch.client/transport -->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>transport</artifactId>
            <version>6.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>2.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.7</version>
        </dependency>
    </dependencies>
  3. Configure the TransportClient in ESConfig:
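    import java.net.InetAddress;
    import java.net.UnknownHostException;

    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.common.transport.TransportAddress;
    import org.elasticsearch.transport.client.PreBuiltTransportClient;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class ESConfig {

        @Bean
        public TransportClient client() throws UnknownHostException {
            // 9300 is the transport port, not the 9200 HTTP port.
            // With the 6.x transport client the address class is TransportAddress.
            final TransportAddress nodeAddress =
                    new TransportAddress(InetAddress.getByName("localhost"), 9300);

            // Must match cluster.name in elasticsearch.yml.
            final Settings settings = Settings.builder().put("cluster.name", "jimo").build();

            final PreBuiltTransportClient client = new PreBuiltTransportClient(settings);
            client.addTransportAddress(nodeAddress);

            return client;
        }
    }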
  4. CRUD operations
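    import org.elasticsearch.client.transport.TransportClient;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    @RequestMapping("/book/novel")
    public class BookNovelController {

        // The client bean defined in ESConfig is injected through the constructor.
        private final TransportClient client;

        @Autowired
        public BookNovelController(TransportClient client) {
            this.client = client;
        }
        //...
    }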
    Query:

Add:

@PostMapping("/new")
public ResponseEntity<?> addBook(
        @RequestParam("title") String title,
        @RequestParam("author") String author,
        @RequestParam("word_count") int wordCount,
        @RequestParam("publish_date")
        @DateTimeFormat(pattern = "yyyy-MM-dd HH:mm:ss")
                Date publishDate) {
    try {
        // Build the JSON source of the new document.
        final XContentBuilder content = XContentFactory.jsonBuilder()
                .startObject()
                .field("title", title)
                .field("author", author)
                .field("word_count", wordCount)
                // .field("publish_date", publishDate.getTime())
                .endObject();
        // No id is passed, so elasticsearch generates one.
        final IndexResponse result = client.prepareIndex("book", "novel")
                .setSource(content).get();
        return new ResponseEntity<>(result.getId(), HttpStatus.OK);
    } catch (IOException e) {
        e.printStackTrace();
        return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
    }
}

Delete:

@DeleteMapping
public ResponseEntity<?> deleteBook(@RequestParam("id") String id) {
    final DeleteResponse response = client.prepareDelete("book", "novel", id).get();
    return new ResponseEntity<>(response.getResult(), HttpStatus.OK);
}

Update:

@PostMapping("/update")
public ResponseEntity<?> updateBook(
        @RequestParam("id") String id,
        @RequestParam(name = "title", required = false) String title,
        @RequestParam(name = "author", required = false) String author) {
    final UpdateRequest updateRequest = new UpdateRequest("book", "novel", id);
    try {
        // Partial document: only the fields that were supplied are updated.
        final XContentBuilder builder = XContentFactory.jsonBuilder().startObject();
        if (title != null) {
            builder.field("title", title);
        }
        if (author != null) {
            builder.field("author", author);
        }
        builder.endObject();
        updateRequest.doc(builder);
    } catch (IOException e) {
        e.printStackTrace();
        return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
    }
    try {
        final UpdateResponse updateResponse = client.update(updateRequest).get();
        return new ResponseEntity<>(updateResponse.getResult(), HttpStatus.OK);
    } catch (InterruptedException | ExecutionException e) {
        e.printStackTrace();
        return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
    }
}

Query:

@PostMapping("/query")
public ResponseEntity<?> queryBook(
        @RequestParam(name = "author", required = false) String author,
        @RequestParam(name = "title", required = false) String title,
        @RequestParam(name = "gt_word_count", defaultValue = "0") Integer gtWordCount,
        @RequestParam(name = "lt_word_count", required = false) Integer ltWordCount) {
    final BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
    // Scored full-text conditions go into must.
    if (author != null) {
        boolQuery.must(QueryBuilders.matchQuery("author", author));
    }
    if (title != null) {
        boolQuery.must(QueryBuilders.matchQuery("title", title));
    }
    // The word-count range is applied as a filter, so it does not affect the score.
    final RangeQueryBuilder rangeQuery = QueryBuilders.rangeQuery("word_count").from(gtWordCount);
    if (ltWordCount != null && ltWordCount >= gtWordCount) {
        rangeQuery.to(ltWordCount);
    }
    boolQuery.filter(rangeQuery);
    final SearchRequestBuilder builder = client.prepareSearch("book")
            .setTypes("novel")
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
            .setQuery(boolQuery)
            .setFrom(0)
            .setSize(10);

    // Prints the generated query JSON for debugging.
    System.out.println(builder);

    final SearchResponse response = builder.get();
    List<Map<String, Object>> result = new ArrayList<>();

    for (SearchHit hit : response.getHits()) {
        // With the 6.x client the document source is read through getSourceAsMap().
        result.add(hit.getSourceAsMap());
    }
    return new ResponseEntity<>(result, HttpStatus.OK);
}

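Problems Encountered

Reference: blog

  1. java.lang.ClassNotFoundException: org.elasticsearch.transport.Netty3Plugin

  2. failed to parse [publish_date]

    The format of the date field needs to be changed to dateOptionalTime. Note: the data type of a field cannot be changed afterwards, so think it through carefully when creating the index.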
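
As an illustration of the second fix, the mapping for the novel type might declare the date format as sketched below. This is an assumption about where the format setting goes (in the publish_date field of the mapping supplied when the index is created); novel_mapping and format_publish_date are hypothetical helper names.

```python
import json
from datetime import date


def novel_mapping():
    """Mapping fragment with an explicit date format for publish_date."""
    return {
        "novel": {
            "properties": {
                "title": {"type": "text"},
                "publish_date": {"type": "date", "format": "dateOptionalTime"},
            }
        }
    }


def format_publish_date(value):
    """Render a date as yyyy-MM-dd, which the dateOptionalTime format accepts."""
    return value.isoformat()


print(json.dumps(novel_mapping(), indent=2))
print(format_publish_date(date(2017, 8, 20)))
```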

Summary

The official Java client connection example