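During testing or in production, you may run into queries whose results do not match expectations. For scenarios where search relevance is hard to adjust, Cloud Search Service provides several tuning strategies. This article describes the tuning approach based on Pipeline and Query templates.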
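In hybrid search, the weights of semantic (vector) retrieval and full-text matching are configured through the search pipeline. To change them, update the existing pipeline configuration, either by running the request in dev_tools or by calling the API. The weights are set in the weights field, and their order follows the order of the vector retrieval and full-text matching sub-queries in the search template.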
PUT _search/pipeline/search_pipeline
{
  "description": "text embedding pipeline for remote inference",
  "request_processors": [
    {
      "remote_embedding": {
        "remote_config": {
          "method": "POST",
          "url": "url",
          "headers": {
            "Content-Type": "application/json"
          },
          "advance_request_body": {
            "model": "model"
          }
        }
      }
    },
    {
      "pre_analyze": {
        "analysis_config": {
          "tokenizer": "ik_smart",
          "filter": [
            "default_dynamic_synonym"
          ]
        }
      }
    }
  ],
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "rrf",
          "parameters": {
            "rank_constant": 60
          }
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.2,
              0.8
            ]
          }
        }
      }
    }
  ]
}
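To make the effect of the weights field more tangible, here is a minimal Python sketch of how the two sub-query results might be merged. It rests on assumptions that the article does not confirm: that the rrf technique scores a document as 1 / (rank_constant + rank) with a 1-based rank, that arithmetic_mean is a weighted mean (sum of weight × score divided by the sum of weights), and that a document missing from one sub-query contributes a score of 0 there. The function names rrf_score and combine are hypothetical.

```python
from typing import Optional, Sequence

RANK_CONSTANT = 60  # "rank_constant" from the pipeline configuration


def rrf_score(rank: Optional[int], rank_constant: int = RANK_CONSTANT) -> float:
    """Reciprocal-rank score for a 1-based rank (assumed formula).

    A rank of None means the sub-query did not return the document;
    treating that as 0.0 is an assumption of this sketch.
    """
    if rank is None:
        return 0.0
    return 1.0 / (rank_constant + rank)


def combine(ranks: Sequence[Optional[int]], weights: Sequence[float]) -> float:
    """Weighted arithmetic mean of the per-sub-query scores (assumed formula)."""
    if len(ranks) != len(weights):
        raise ValueError("one weight is needed per sub-query")
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    weighted = sum(w * rrf_score(r) for r, w in zip(ranks, weights))
    return weighted / total_weight


# Ranks and weights follow the sub-query order of the search template:
# [vector retrieval, full-text match].
weights = [0.2, 0.8]
doc_a = combine([1, 10], weights)  # best vector hit, 10th full-text hit
doc_b = combine([10, 1], weights)  # 10th vector hit, best full-text hit
print(doc_a < doc_b)
```

Under these assumptions, weights of [0.2, 0.8] let the full-text ranking dominate, so doc_b ends up ahead of doc_a; swapping the two weights would favor the vector ranking instead.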
Query statements support a wide range of tuning options. The following example is a demo that combines function_score tuning with hybrid search tuning.
{
  "source": {
    "_source": ["post_title", "post_content", "model_content"],
    "size": 10,
    "query": {
      "hybrid": {
        "queries": [
          {
            "remote_neural": {
              "post_title_knn": {
                "query_text": "{{searchVal}}",
                "k": 1000
              }
            }
          },
          {
            "function_score": {
              "query": {
                "bool": {
                  "should": [
                    {
                      "match_phrase_prefix": {
                        "post_title": {
                          "query": "{{searchVal}}",
                          "analyzer": "standard"
                        }
                      }
                    },
                    {
                      "match": {
                        "model_content": {
                          "query": "{{searchVal}}",
                          "minimum_should_match": "90%"
                        }
                      }
                    },
                    {
                      "match_phrase_prefix": {
                        "post_title.pinyin": {
                          "query": "{{searchVal}}",
                          "analyzer": "standard"
                        }
                      }
                    }
                  ]
                }
              },
              "functions": [
                {
                  "filter": { "match": { "post_title": "游戏" } },
                  "weight": 0.9
                },
                {
                  "filter": { "match": { "post_title": "小说" } },
                  "weight": 0.7
                },
                {
                  "filter": { "term": { "model_content": "攻略" } },
                  "weight": 1.08
                },
                {
                  "script_score": {
                    "script": {
                      "source": "1.0 + 0.3 * doc['priority'].value"
                    }
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}
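The function_score query replaces the original full-text matching sub-query; the query clause inside function_score holds the original full-text retrieval logic.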
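functions holds the score-adjustment logic. For example, "match": {"post_title": "游戏"} together with "weight": 0.9 means that when the post_title of a hit matches 游戏 (game), its overall search score is multiplied by 0.9, which lowers the score; to boost a match instead, set a weight greater than 1. The script_score part applies an overall weighting. If documents carry a field that expresses priority (for example, contracted authors > user-generated content), you can set the priority field to 3 for contracted-author documents and to 1 for user-generated documents, and the score of each matching document is then adjusted according to 1.0 + 0.3 * doc['priority'].value.
Related documentation: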
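Query statements support many kinds of tuning; score-adjustment logic such as Boost and Constant score can also be combined with hybrid search. For details, see the open-source documentation: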
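The arithmetic behind the functions block described earlier can be sketched in Python. This is an illustration under stated assumptions rather than the service's actual scoring code: it assumes that score_mode and boost_mode keep their open-source default of multiply, it replaces the analyzed match and term filters with plain substring checks, and the name adjusted_score and the sample documents are hypothetical.

```python
from typing import Dict


def adjusted_score(query_score: float, doc: Dict[str, object]) -> float:
    """Apply the template's functions to one hit.

    Assumes score_mode = multiply (matching functions are multiplied
    together) and boost_mode = multiply (the result is multiplied with
    the full-text query score).
    """
    title = str(doc.get("post_title", ""))
    content = str(doc.get("model_content", ""))

    factor = 1.0
    if "游戏" in title:    # stand-in for match on post_title, weight 0.9
        factor *= 0.9
    if "小说" in title:    # stand-in for match on post_title, weight 0.7
        factor *= 0.7
    if "攻略" in content:  # stand-in for term on model_content, weight 1.08
        factor *= 1.08

    # The script_score function has no filter, so it applies to every hit.
    factor *= 1.0 + 0.3 * float(doc["priority"])
    return query_score * factor


# Hypothetical documents: a contracted author (priority 3) and a user post (priority 1).
signed = {"post_title": "游戏攻略大全", "model_content": "新手攻略", "priority": 3}
user = {"post_title": "旅行日记", "model_content": "周末出行", "priority": 1}

print(adjusted_score(2.0, signed))  # 2.0 * 0.9 * 1.08 * 1.9
print(adjusted_score(2.0, user))    # 2.0 * 1.3
```

With these sample values the priority term outweighs the 0.9 penalty on 游戏, so the contracted-author document still scores higher than the user post for the same full-text score.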