This article uses Elasticsearch (es) 6.6.5 for the demonstrations.
PUT website/_doc/1
{
  "title": "小白学es01",
  "desc": "the first blog about es",
  "level": 1,
  "post_date": "2018-10-10",
  "post_address": {
    "country": "china",
    "province": "guangdong",
    "city": "guangzhou"
  }
}

PUT website/_doc/2
{
  "title": "小白学es02",
  "desc": "the second blog about es",
  "level": 3,
  "post_date": "2018-11-11",
  "post_address": {
    "country": "china",
    "province": "zhejiang",
    "city": "hangzhou"
  }
}
Search condition: find blogs whose level (level) is greater than or equal to 2 and whose publication date (post_date) is 2018-11-11:
(1) Without filter:
GET website/_doc/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "post_date": "2018-11-11" } },
        { "range": { "level": { "gte": 2 } } }
      ]
    }
  }
}

// Result:
"hits": {
  "total": 1,
  "max_score": 2.0,
  "hits": [
    {
      "_index": "website2",
      "_type": "blog",
      "_id": "2",
      "_score": 2.0,    // the score is 2.0
      "_source": {
        "title": "小白学es02",
        "desc": "the second blog about es",
        "level": 3,
        "post_date": "2018-11-11",
        "post_address": {
          "country": "china",
          "province": "zhejiang",
          "city": "hangzhou"
        }
      }
    }
  ]
}
(2) With filter:
GET website/_doc/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "post_date": "2018-11-11" }
      },
      "filter": {
        "range": { "level": { "gte": 2 } }
      }
    }
  }
}

// Result:
"hits": {
  "total": 1,
  "max_score": 1.0,
  "hits": [
    {
      "_index": "website2",
      "_type": "blog",
      "_id": "2",
      "_score": 1.0,    // the score is 1.0
      "_source": {
        "title": "小白学es02",
        "desc": "the second blog about es",
        "level": 3,
        "post_date": "2018-11-11",
        "post_address": {
          "country": "china",
          "province": "zhejiang",
          "city": "hangzhou"
        }
      }
    }
  ]
}
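When filter and query are used together, the filter is conceptually applied first: only documents that pass it go on to be scored.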
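filter: only selects the documents that match the search condition. It ignores TF/IDF information and computes no relevance score, so a filter clause adds nothing to _score; the 1.0 in example (2) comes from the match clause in must.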
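query: first finds the documents that match the search condition, then computes each document's relevance score for that condition, and finally sorts the results by score in descending order.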
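The difference between the two contexts can be sketched with a toy model in plain Python. This is only an illustration, not how Elasticsearch works internally: `DOCS`, `filter_docs`, and `query_docs` are hypothetical names, the sample texts are made up, and a raw term count stands in for real relevance scoring.

```python
# Toy model: a filter only answers yes/no, a query also scores and sorts.
# DOCS holds hypothetical sample texts keyed by document id.
DOCS = {
    "1": "the first blog about es",
    "2": "the second blog about es and more about es",
}


def filter_docs(term):
    """Filter context: keep the ids of matching documents, compute no score."""
    return sorted(doc_id for doc_id, text in DOCS.items() if term in text.split())


def query_docs(term):
    """Query context: score each match, then sort by score, highest first.

    The score here is a raw term count, a stand-in for real relevance scoring.
    """
    hits = []
    for doc_id, text in DOCS.items():
        score = text.split().count(term)
        if score > 0:
            hits.append((doc_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(filter_docs("es"))  # ['1', '2']
print(query_docs("es"))   # [('2', 2), ('1', 1)]
```

Both calls return the same two documents; only the query variant has to do the extra scoring and sorting work.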
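Recommendations:
filter performs better and does no relevance sorting: it computes no relevance score, so there is nothing to sort by, and es also caches the results of frequently used filters, using a bitset (one 0 or 1 per document) to record whether each document matches.
query performs worse and sorts by relevance: it has to compute a relevance score and sort by it, and its results are not cached in this way.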
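The bitset caching idea can be illustrated with a small Python sketch. It is a toy under stated assumptions, not Elasticsearch's actual cache: `FilterCache`, `matching_doc_ids`, and the use of a Python int as the bitset are all hypothetical choices made for illustration, and `DOCS` reuses only the `level` and `post_date` values of the two sample documents.

```python
# Toy illustration only: not how Elasticsearch is implemented internally.
# A "bitset" is modeled as a Python int where bit i is 1 if document i matches.
DOCS = [
    {"level": 1, "post_date": "2018-10-10"},  # document 0
    {"level": 3, "post_date": "2018-11-11"},  # document 1
]


class FilterCache:
    """Hypothetical cache mapping a filter key to its bitset."""

    def __init__(self):
        self._bitsets = {}
        self.evaluations = 0  # how many times a filter was actually computed

    def bitset(self, key, predicate):
        if key not in self._bitsets:
            self.evaluations += 1
            bits = 0
            for doc_id, doc in enumerate(DOCS):
                if predicate(doc):
                    bits |= 1 << doc_id  # set the bit for this document
            self._bitsets[key] = bits
        return self._bitsets[key]


def matching_doc_ids(bits):
    """Return the ids of the documents whose bit is set."""
    return [i for i in range(len(DOCS)) if (bits >> i) & 1]


def level_gte_2(doc):
    return doc["level"] >= 2


cache = FilterCache()
first = cache.bitset("level>=2", level_gte_2)
second = cache.bitset("level>=2", level_gte_2)  # served from the cache

print(bin(first))               # 0b10
print(matching_doc_ids(first))  # [1]
print(cache.evaluations)        # 1
```

The second lookup reuses the stored bitset, which is the kind of saving the filter cache is meant to provide for frequently repeated conditions.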
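1) Put the search conditions that the business cares about, and whose results need to be ranked by how well they match, in query;
2) Put the search conditions that the business does not need ranked by relevance in filter.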
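The two rules can be sketched as a small helper that builds a search request body. This is a minimal sketch using only the Python standard library; `build_search_body` is a hypothetical helper, not part of any Elasticsearch client, and treating a full-text match on `desc` as the condition the business wants ranked is an assumption made for illustration.

```python
import json


def build_search_body(scored_clauses, unscored_clauses):
    """Put clauses that should affect ranking in `must`, the rest in `filter`."""
    return {
        "query": {
            "bool": {
                "must": list(scored_clauses),
                "filter": list(unscored_clauses),
            }
        }
    }


body = build_search_body(
    # Assumed to matter for ranking: how well the description matches.
    scored_clauses=[{"match": {"desc": "second blog"}}],
    # Pure yes/no conditions: level and publication date.
    unscored_clauses=[
        {"range": {"level": {"gte": 2}}},
        {"range": {"post_date": {"gte": "2018-11-11", "lte": "2018-11-11"}}},
    ],
)

# The printed JSON can be used as the body of GET website/_doc/_search.
print(json.dumps(body, indent=2))
```

Keeping the yes/no conditions in `filter` means only the `desc` match influences `_score`, so the ranking reflects the condition the business actually cares about.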
Copyright notice
Author:
Source: 博客园 (cnblogs)