Elasticsearch Study Notes 1

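Elasticsearch is an open-source search engine built on Lucene (widely regarded as the most advanced, best-performing, and most full-featured search engine library to date).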

Installation

Java is the only dependency; install it first if it is not already present.

The Elasticsearch package can be downloaded here.

Running

After extracting the archive, you can run it directly:

./bin/elasticsearch -d

The -d flag runs it as a daemon.

Verifying the installation

curl 'http://localhost:9200/?pretty'

If you see a response like the following, the installation succeeded.

{
  "status" : 200,
  "name" : "Paul Bailey",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.2",
    "build_hash" : "e43676b1385b8125d647f593f7202acbd816e8ec",
    "build_timestamp" : "2015-09-14T09:49:53Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
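To check the same reply from a script instead of by eye, something like the following could work. It is only a sketch: the function name `summarize_root_response` and the choice of fields are my own, and the sample string is the reply above cut down to the fields used here.

```python
import json


def summarize_root_response(body: str) -> dict:
    """Pick out the fields worth checking from the root endpoint's JSON reply."""
    info = json.loads(body)
    version = info.get("version", {})
    return {
        "ok": info.get("status") == 200,
        "node": info.get("name"),
        "cluster": info.get("cluster_name"),
        "version": version.get("number"),
        "lucene": version.get("lucene_version"),
    }


# The reply from the curl call, abbreviated to the fields used here.
sample = """{
  "status": 200,
  "name": "Paul Bailey",
  "cluster_name": "elasticsearch",
  "version": {"number": "1.7.2", "lucene_version": "4.10.4"},
  "tagline": "You Know, for Search"
}"""

summary = summarize_root_response(sample)
print(summary)
```

Against a live node, the body would come from an HTTP GET on http://localhost:9200/ rather than from a hard-coded string.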

RESTful API

Elasticsearch ships with two built-in clients for Java users: the node client and the transport client. Other languages can interact with it through the RESTful API.

The RESTful API listens on port 9200 by default and exchanges data as JSON.

You can send requests directly with curl, for example:

curl -XPUT "http://120.25.84.82:9200/megacorp/employee/2" -d '
{
  "first_name" : "Jane",
  "last_name" : "Smith",
  "age" : 32,
  "about" : "I like to collect rock albums",
  "interests": [ "music" ]
}'
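Since any language can talk to the RESTful API, here is roughly what the same PUT might look like from Python using only the standard library. This is a sketch under assumptions: `build_index_request` is a name I made up, the base URL is a local node rather than the host in the curl example, and the Content-Type header is added as a precaution rather than because these notes require it. The request is only built, not sent.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that stores one JSON document."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    data = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


employee = {
    "first_name": "Jane",
    "last_name": "Smith",
    "age": 32,
    "about": "I like to collect rock albums",
    "interests": ["music"],
}

# Assumed base URL: a node running locally on the default port.
request = build_index_request(
    "http://localhost:9200", "megacorp", "employee", 2, employee
)
print(request.get_method(), request.full_url)
```

With a node running, passing `request` to `urllib.request.urlopen` would send it.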

Alternatively, use the Marvel plugin, whose Sense console is used for all the examples below.

Install the plugin: ./bin/plugin -i elasticsearch/marvel/latest
Then open http://localhost:9200/_plugin/marvel/sense/index.html

Creating a document
PUT /megacorp/employee/2
{
  "first_name" : "Jane",
  "last_name" : "Smith",
  "age" : 32,
  "about" : "I like to collect rock albums",
  "interests": [ "music" ]
}
Retrieving a single document
GET /megacorp/employee/2

In the response, the original JSON document is in the _source field.

{
  "_index": "megacorp",
  "_type": "employee",
  "_id": "2",
  "_version": 1,
  "found": true,
  "_source": {
    "first_name": "Jane",
    "last_name": "Smith",
    "age": 32,
    "about": "I like to collect rock albums",
    "interests": [
      "music"
    ]
  }
}
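A client usually only wants the document itself, so unwrapping _source is a natural first step. A minimal sketch, assuming the reply has the shape shown above; `extract_source` is a hypothetical helper, and returning None when "found" is false is my own convention.

```python
import json


def extract_source(body: str):
    """Return the stored document from a GET reply, or None if it was not found."""
    reply = json.loads(body)
    if not reply.get("found"):
        return None
    return reply["_source"]


# A reply shaped like the one above, shortened to two source fields.
found_reply = """{
  "_index": "megacorp", "_type": "employee", "_id": "2",
  "_version": 1, "found": true,
  "_source": {"first_name": "Jane", "last_name": "Smith"}
}"""
# Assumed shape of a reply for an id that does not exist.
missing_reply = '{"_index": "megacorp", "_type": "employee", "_id": "99", "found": false}'

document = extract_source(found_reply)
missing = extract_source(missing_reply)
print(document, missing)
```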
Simple search
GET /megacorp/employee/_search

The response contains the first 10 results by default.

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 1,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 1,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [
            "music"
          ]
        }
      },

      ...

    ]
  }
}
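The default of 10 results can be changed with the `size` and `from` URL parameters of the search endpoint, which is how paging is usually done. The notes above do not cover paging, so treat this as a sketch: `search_url` and its `offset` argument are names I chose, and the base URL assumes a local node.

```python
from urllib.parse import urlencode


def search_url(base_url, index, doc_type, size=10, offset=0):
    """Build a search URL that returns `size` hits starting at position `offset`."""
    params = urlencode({"size": size, "from": offset})
    return f"{base_url}/{index}/{doc_type}/_search?{params}"


# First page (same as the default) and second page of 10 hits each.
page_one = search_url("http://localhost:9200", "megacorp", "employee")
page_two = search_url("http://localhost:9200", "megacorp", "employee", offset=10)
print(page_one)
print(page_two)
```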
Simple search with a condition
GET /megacorp/employee/_search?q=first_name:John
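When the q parameter is built in code rather than typed into Sense, the value should be URL-encoded, since field:value pairs contain a colon and may contain spaces. A small sketch using the standard library; `query_string_url` is a hypothetical helper and the base URL assumes a local node.

```python
from urllib.parse import urlencode


def query_string_url(base_url, index, doc_type, field, value):
    """Build a search URL with a URL-encoded field:value condition in q."""
    params = urlencode({"q": f"{field}:{value}"})
    return f"{base_url}/{index}/{doc_type}/_search?{params}"


url = query_string_url(
    "http://localhost:9200", "megacorp", "employee", "first_name", "John"
)
print(url)
```

The colon is percent-encoded as %3A in the result; the server decodes it back to first_name:John.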

Query DSL

The DSL (Domain Specific Language) is expressed as a JSON request body.

GET /megacorp/employee/_search
{
  "query" : {
    "match" : {
      "first_name" : "John"
    }
  }
}
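Because the query is plain JSON, a client can assemble it from ordinary dictionaries and serialize it at the last moment. A minimal sketch; `match_query` is a helper name I made up, not part of any Elasticsearch client.

```python
import json


def match_query(field, text):
    """Return the request body for a match query on a single field."""
    return {"query": {"match": {field: text}}}


# Serialize the same body as in the Sense example above.
body = json.dumps(match_query("first_name", "John"), indent=2)
print(body)
```

Building the body as a dictionary also makes it easy to swap "match" for another query type later.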

Many more complex queries are supported; they will be covered in detail later.

Full-text search

This is something traditional databases generally do not offer. For example, search for all employees who like "rock climbing":

GET /megacorp/employee/_search
{
  "query" : {
    "match" : {
      "about" : "rock climbing"
    }
  }
}

Results are sorted by relevance by default; _score is the relevance score of each document.

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.16273327,
    "hits": [
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "1",
        "_score": 0.16273327,
        "_source": {
          "first_name": "John",
          "last_name": "Smith",
          "age": 25,
          "about": "I love to go rock climbing",
          "interests": [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index": "megacorp",
        "_type": "employee",
        "_id": "2",
        "_score": 0.016878016,
        "_source": {
          "first_name": "Jane",
          "last_name": "Smith",
          "age": 32,
          "about": "I like to collect rock albums",
          "interests": [
            "music"
          ]
        }
      }
    ]
  }
}
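To read the scores out of such a response in code, walking hits.hits is enough. The sketch below uses a cut-down copy of the response above; `ranked_names` is a hypothetical helper, and the explicit sort is only a safeguard, since the hits already arrive ordered by score.

```python
def ranked_names(response):
    """Return (score, full name) pairs, highest relevance first."""
    hits = response["hits"]["hits"]
    ordered = sorted(hits, key=lambda hit: hit["_score"], reverse=True)
    return [
        (
            hit["_score"],
            hit["_source"]["first_name"] + " " + hit["_source"]["last_name"],
        )
        for hit in ordered
    ]


# The response above, reduced to the fields this helper reads.
response = {
    "hits": {
        "total": 2,
        "max_score": 0.16273327,
        "hits": [
            {"_score": 0.16273327,
             "_source": {"first_name": "John", "last_name": "Smith"}},
            {"_score": 0.016878016,
             "_source": {"first_name": "Jane", "last_name": "Smith"}},
        ],
    }
}

ranking = ranked_names(response)
for score, name in ranking:
    print(f"{score:.4f}  {name}")
```

John scores higher because his about field contains both "rock" and "climbing", while Jane's contains only "rock".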
Phrase search

To match several words together as a phrase (all the terms, in the same order), use match_phrase:

GET /megacorp/employee/_search
{
  "query" : {
    "match_phrase" : {
      "about" : "rock climbing"
    }
  }
}
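The difference between match and match_phrase is easy to see with a toy model in plain Python. This is not how Elasticsearch implements either query: it ignores scoring and real analyzers, and the tokenizer here (lowercase, split on non-word characters) is a simplifying assumption of mine. It only illustrates "any term" versus "all terms, adjacent and in order".

```python
import re


def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase words only."""
    return re.findall(r"\w+", text.lower())


def matches_any_term(field_text, query):
    """Toy version of match: true if at least one query term occurs."""
    doc_terms = set(tokenize(field_text))
    return any(term in doc_terms for term in tokenize(query))


def matches_phrase(field_text, query):
    """Toy version of match_phrase: query terms must appear adjacent, in order."""
    doc = tokenize(field_text)
    phrase = tokenize(query)
    n = len(phrase)
    return n > 0 and any(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))


john = "I love to go rock climbing"
jane = "I like to collect rock albums"

for about in (john, jane):
    print(about, matches_any_term(about, "rock climbing"),
          matches_phrase(about, "rock climbing"))
```

Under this model both employees satisfy the match query, but only John satisfies the phrase query, which is consistent with the two hits returned by the full-text search earlier.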

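The RESTful API offers many powerful endpoints that make searching, aggregation, and other operations easy. This is only a brief introduction; see the relevant documentation for details.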