Pipeline & Query template tuning - Cloud Search Service - Volcengine

Pipeline & Query template tuning

Last updated: 2024.08.30 15:04:55; first published: 2024.08.29 14:12:59

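During testing and production you may run into queries whose results do not match expectations. For these scenarios, where search quality is hard to adjust, Cloud Search Service provides several tuning strategies. This document describes tuning through the search pipeline and the query template.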

Setting semantic weights in the pipeline

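In hybrid search, the relative weights of semantic retrieval and full-text matching are configured in the search pipeline. To change them, update the existing pipeline's configuration, either in dev_tools or through the API. The weights are set in the weights field, and their order follows the order of the vector retrieval and full-text matching sub-queries in the search template. In the example below, the first weight (0.2) applies to vector retrieval and the second (0.8) to full-text matching.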

PUT _search/pipeline/search_pipeline
{
    "description": "text embedding pipeline for remote inference",
    "request_processors": [
      {
        "remote_embedding": {
          "remote_config": {
            "method": "POST",
            "url": "url",
            "headers": {
              "Content-Type": "application/json"
            },
            "advance_request_body": {
              "model": "model"
            }
          }
        }
      },
      {
        "pre_analyze": {
          "analysis_config": {
            "tokenizer": "ik_smart",
            "filter": [
              "default_dynamic_synonym"
            ]
          }
        }
      }
    ],
    "phase_results_processors": [
      {
        "normalization-processor": {
          "normalization": {
            "technique": "rrf",
            "parameters": {
              "rank_constant": 60
            }
          },
          "combination": {
            "technique": "arithmetic_mean",
            "parameters": {
              "weights": [
                0.2,
                0.8
              ]
            }
          }
        }
      }
    ]
  }
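
To build intuition for how the weights field shifts the final ranking, the following Python sketch reproduces the arithmetic offline. It assumes that the rrf technique scores each sub-query hit as 1 / (rank_constant + rank) and that arithmetic_mean computes a weighted mean of those per-sub-query scores. Both formulas are assumptions based on the common definitions, not a statement of how the service implements them, and the document IDs and ranks are made up for illustration.

```python
def rrf_score(rank, rank_constant=60):
    """Score of a hit at a 1-based rank (assumed reciprocal-rank formula)."""
    return 1.0 / (rank_constant + rank)


def combine(scores, weights):
    """Weighted arithmetic mean of the per-sub-query scores (assumed formula)."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def rank_documents(ranks, weights, rank_constant=60):
    """Order document IDs by combined score, best first.

    ranks maps a document ID to (semantic_rank, fulltext_rank), in the same
    order as the sub-queries in the search template.
    """
    combined = {
        doc_id: combine([rrf_score(r, rank_constant) for r in per_query], weights)
        for doc_id, per_query in ranks.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical ranks: doc_a wins on semantic retrieval, doc_b on full text.
ranks = {"doc_a": (1, 5), "doc_b": (4, 1)}

print(rank_documents(ranks, [0.2, 0.8]))  # ['doc_b', 'doc_a']
print(rank_documents(ranks, [0.8, 0.2]))  # ['doc_a', 'doc_b']
```

With weights of 0.2 and 0.8 the document that ranks first in full-text matching comes out on top; swapping the weights favors the document that ranks first in semantic retrieval.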

Query tuning

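Query statements support a variety of tuning approaches. The following example is a demo that combines function_score tuning with hybrid search tuning.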

{
    "source": {
      "_source": ["post_title","post_content","model_content"],
      "size": 10,
      "query": {
        "hybrid": {
          "queries": [{
              "remote_neural": {
                "post_title_knn": {
                  "query_text": "{{searchVal}}",
                  "k": 1000
                }
              }
            },
            {
              "function_score": {
                "query": {
                  "bool": {
                    "should": [{
                        "match_phrase_prefix": {
                          "post_title": {
                            "query": "{{searchVal}}",
                            "analyzer": "standard"
                          }
                        }
                      },
                      {
                        "match": {
                          "model_content": {
                            "query": "{{searchVal}}",
                            "minimum_should_match": "90%"
                          }
                        }
                      },
                      {
                        "match_phrase_prefix": {
                          "post_title.pinyin": {
                            "query": "{{searchVal}}",
                            "analyzer": "standard"
                          }
                        }
                      }
                    ]
                  }
                },
                "functions": [{
                    "filter": {
                      "match": {
                        "post_title": "游戏"
                      }
                    },
                    "weight": 0.9
                  },
                  {
                    "filter": {
                      "match": {
                        "post_title": "小说"
                      }
                    },
                    "weight": 0.7
                  },
                  {
                    "filter": {
                      "term": {
                        "model_content": "攻略"
                      }
                    },
                    "weight": 1.08
                  }, {
                    "script_score": {
                      "script": {
                        "source": " 1.0 +  0.3 * doc['priority'].value"
                      }
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }
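
The {{searchVal}} placeholder appears in every sub-query, so one user input feeds both the vector retrieval and the full-text clauses. The Python sketch below illustrates that substitution with a deliberately simplified stand-in for Mustache rendering: it handles only plain {{name}} placeholders and JSON-escapes the value. The render_template helper and the trimmed template are hypothetical and exist only for this illustration; the service performs the real rendering.

```python
import json
import re


def render_template(source, params):
    """Replace each {{name}} placeholder with the JSON-escaped parameter value."""
    def substitute(match):
        value = params[match.group(1)]
        # json.dumps adds surrounding quotes; drop them because the template
        # already wraps the placeholder in quotes.
        return json.dumps(str(value), ensure_ascii=False)[1:-1]

    rendered = re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, source)
    return json.loads(rendered)


# Trimmed-down template: one vector sub-query and one full-text sub-query.
TEMPLATE = """
{
  "query": {
    "hybrid": {
      "queries": [
        {"remote_neural": {"post_title_knn": {"query_text": "{{searchVal}}", "k": 1000}}},
        {"match": {"model_content": {"query": "{{searchVal}}", "minimum_should_match": "90%"}}}
      ]
    }
  }
}
"""

query = render_template(TEMPLATE, {"searchVal": 'game "guide"'})
print(json.dumps(query, ensure_ascii=False, indent=2))
```

Escaping the value before substitution keeps the rendered query valid JSON even when the input contains quotes.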

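The function_score query replaces the original full-text matching sub-query. The query clause inside function_score holds the original full-text retrieval logic.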

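The functions array holds the score-adjustment logic. For example, "match": {"post_title": "游戏"} combined with "weight": 0.9 means that when a hit's post_title matches 游戏 (games), the score of the function_score sub-query is multiplied by 0.9, which demotes the document. To boost instead, set a weight greater than 1.
The script_score function applies to every matched document and acts as an overall weighting rule. If documents carry a field that expresses priority (for example, signed authors rank above user-generated content), you can set the priority field to 3 for signed-author documents and to 1 for user-generated ones. Each matched document's score is then adjusted by 1.0 + 0.3 * doc['priority'].value, which gives a factor of 1.9 for priority 3 and 1.3 for priority 1.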
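
The following Python sketch works through the arithmetic of the functions array for two made-up documents. It assumes the function_score defaults of score_mode = multiply and boost_mode = multiply, which the template above does not override; if the service uses different defaults the numbers change. The flag names on the sample documents are hypothetical and stand in for whether each filter matches.

```python
def function_multiplier(doc):
    """Product of all functions that apply to a document (score_mode = multiply)."""
    multiplier = 1.0
    if doc.get("title_matches_game"):       # filter: post_title matches 游戏
        multiplier *= 0.9
    if doc.get("title_matches_novel"):      # filter: post_title matches 小说
        multiplier *= 0.7
    if doc.get("content_has_guide_term"):   # filter: model_content term 攻略
        multiplier *= 1.08
    # script_score has no filter, so it applies to every matched document.
    multiplier *= 1.0 + 0.3 * doc["priority"]
    return multiplier


def final_score(query_score, doc):
    """Full-text sub-query score after function_score (boost_mode = multiply)."""
    return query_score * function_multiplier(doc)


# Hypothetical documents with the same full-text score of 2.0.
signed_author_game = {"title_matches_game": True, "priority": 3}
user_created_plain = {"priority": 1}

print(round(final_score(2.0, signed_author_game), 4))  # 3.42 (2.0 * 0.9 * 1.9)
print(round(final_score(2.0, user_created_plain), 4))  # 2.6  (2.0 * 1.3)
```

Even with the 0.9 demotion, the signed-author document ends up ahead because the priority factor of 1.9 outweighs it.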

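Related documentation:
Query statements support a variety of tuning approaches. Score-adjustment logic such as Boost and Constant score can also be combined with hybrid search. For details, see the open-source documentation: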