search_by_id - Vector Database VikingDB - Volcengine

search_by_id

Last updated: 2025.03.24 13:54:48. First published: 2024.04.17 14:21:07.

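Overview

search_by_id performs retrieval by primary key id: given a primary key id, it returns the limit vectors closest to the vector stored under that id.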
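
To make the idea concrete, here is a minimal conceptual sketch of "search by id" done by brute force over an in-memory dictionary. It is not how VikingDB works internally and it does not use the SDK. The `store` dictionary, the helper names, the choice of cosine similarity, and the decision to skip the query item itself are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search_by_id(store, query_id, limit=10):
    """Return up to `limit` (id, score) pairs closest to the vector stored under query_id.

    `store` is a hypothetical in-memory dict {id: vector}.
    Skipping the query item itself is an assumption of this sketch.
    """
    query_vector = store[query_id]  # step 1: look the vector up by primary key
    scored = [
        (other_id, cosine_similarity(query_vector, vector))
        for other_id, vector in store.items()
        if other_id != query_id
    ]
    # step 2: rank every other vector by similarity, best first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


store = {
    1: [1.0, 0.0],
    2: [0.9, 0.1],
    3: [0.0, 1.0],
    4: [-1.0, 0.0],
}
nearest = brute_force_search_by_id(store, query_id=1, limit=2)
print([item_id for item_id, _ in nearest])  # ids of the two closest vectors
```

A real index avoids this full scan by using an approximate nearest neighbor structure, but the two steps (resolve the id to a vector, then rank neighbors) are the same in spirit.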

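Notes

search_by_id is not yet supported on hybrid indexes that use hnsw-hybrid.
After data is written to or deleted from a Collection, the Index takes an estimated 20 s to update, so the change cannot be retrieved from the Index immediately.
When the request parameter filter is set, the request is a hybrid search; when filter is not set, the request is a pure vector search.
For asynchronous calls, use the async_search_by_id interface; the parameters are the same.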
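
Because the Index lags behind Collection writes by roughly 20 s, code that writes and then searches may want to poll. The helper below is a minimal sketch of such a loop. `wait_for_index` is a hypothetical name, not part of the SDK, and treating an exception or an empty result as "not visible yet" is an assumption about how a search on a not-yet-indexed id behaves.

```python
import time


def wait_for_index(search_fn, timeout_s=30.0, interval_s=2.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Call search_fn() until it returns a non-empty result or timeout_s elapses.

    search_fn is any zero-argument callable that performs the search.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            result = search_fn()
        except Exception:
            # Assumption: a search may fail while the id is not yet indexed.
            result = None
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError("index did not reflect the write within the timeout")
        sleep(interval_s)
```

With an index object obtained as in the example further down, the call might look like `wait_for_index(lambda: index.search_by_id("22", limit=2))`.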

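Prerequisites

When the dataset was created with the create_collection interface, a vector field was included in the fields definition.
When data was written with the upsert_data interface, the name and value of the vector-type field were written.
When the index was created with create_index, a vector_index vector index was created.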

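Request parameters

Parameter	Type	Required	Default	Description
id	int64	Yes		Primary key id.
filter	map	No	None	Filter condition; see scalar filtering for details. Empty by default, meaning no filtering. Filter conditions consist of four query operators (must, must_not, range, range_out) and two ways of combining them (and, or).
limit	int	No	10	Number of results to return; at most 5000.
dense_weight	float	No	0.5	For scalar-filtered retrieval, dense_weight controls the weight of the dense vector in retrieval. Range: [0.2, 1]. Only takes effect when the index being searched is a hybrid index.
output_fields	list<string>	No		Field selection: the list of scalar or vector fields to return. When output_fields is not passed, all scalar fields are returned and vector fields are not. When output_fields is an empty list, the fields attribute is not returned. If output_fields is malformed or names a field that is not in the collection, the interface returns an error. If the index distance type is cosine, vector fields are returned as normalized vectors.
partition	string/int	No	"default"	Sub-index name. Its type matches the field_type of partition_by, and its value corresponds to a field_value of partition_by. When field_type is int64 or list<int64>, partition must be an int64. When field_type is string or list<string>, partition must be a string matching "^[a-zA-Z0-9._]+$".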
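
The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch of such a check. `check_search_params` is a hypothetical helper, not an SDK function, and the lower bound of 1 for limit is an assumption (the table only states the maximum).

```python
import re

MAX_LIMIT = 5000                      # "at most 5000" from the table
DENSE_WEIGHT_RANGE = (0.2, 1.0)       # documented range [0.2, 1]
PARTITION_PATTERN = re.compile(r"^[a-zA-Z0-9._]+$")  # documented string format


def check_search_params(limit=10, dense_weight=0.5, partition="default"):
    """Return a list of problems found in the arguments (empty if none)."""
    problems = []
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= MAX_LIMIT:
        problems.append(f"limit must be an int between 1 and {MAX_LIMIT}")
    low, high = DENSE_WEIGHT_RANGE
    if not low <= dense_weight <= high:
        problems.append(f"dense_weight must be within [{low}, {high}]")
    # Only string partitions have a format rule; int partitions pass through.
    if isinstance(partition, str) and not PARTITION_PATTERN.fullmatch(partition):
        problems.append("partition string must match ^[a-zA-Z0-9._]+$")
    return problems


print(check_search_params(limit=6000, dense_weight=0.1, partition="bad name"))
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid argument at once.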

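filter expressions

Operator	Description	Example
must	Applies to the named field; the value must be in [...], that is, "must in".	`{ "op": "must", "field": "region", "conds": ["cn", "sg"] }`
must_not	Applies to the named field; the value must not be in [...], that is, "must not in".	`{ "op": "must_not", "field": "data_type", "conds": [1,2,3] }`
range	Applies to the named field; the value must fall inside the given range. Use `gte` (greater than or equal), `gt` (greater than), `lte` (less than or equal), and `lt` (less than) to bound a one-dimensional range. `center` and `radius` can also be used to describe the inside of a two-dimensional circle.	`// price in [100.0, 500.0) { "op": "range", "field": "price", "gte": 100.0, "lt": 500.0 } // price >= 100.0 { "op": "range", "field": "price", "gte": 100.0 } // inside the circle of radius 50 centered at center { "op": "range", "field": ["pos_x", "pos_y"], "center": [100.0, 123.4], "radius": 50.0 }`
range_out	Applies to the named field; the value must fall outside the given range. Use `gte` (greater than or equal), `gt` (greater than), `lte` (less than or equal), and `lt` (less than) to bound a one-dimensional range.	`// select products priced below 100 or above 500 { "op": "range_out", "field": "price", "gt": 500.0, "lt": 100.0 }`
and	Logical operator; takes the intersection of multiple conditions.	`{ "op": "and", // operator name "conds": [ // condition list; nested logical operators and must/must_not operators are supported { "op": "must", "field": "type", "conds": [1] }, { ... // any number (>= 1) of conditions can be combined } ] }`
or	Logical operator; takes the union of multiple conditions.	`{ "op": "or", // operator name "conds": [ // condition list; nested logical operators and must/must_not operators are supported { "op": "must", "field": "type", "conds": [1] }, { ... // any number (>= 1) of conditions can be combined } ] }`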
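
The semantics in the table can be illustrated with a small local evaluator that applies a filter dictionary to a single record. This is only a sketch for reasoning about filters. `matches` is a hypothetical function, the real filtering happens on the server, and the sketch covers only scalar fields with must, must_not, one-dimensional range, and, and or (it leaves out range_out and the center/radius form).

```python
def matches(record, cond):
    """Return True if `record` (a dict of scalar fields) satisfies the filter `cond`."""
    op = cond["op"]
    if op == "and":   # intersection: every sub-condition must hold
        return all(matches(record, sub) for sub in cond["conds"])
    if op == "or":    # union: at least one sub-condition must hold
        return any(matches(record, sub) for sub in cond["conds"])
    value = record[cond["field"]]
    if op == "must":
        return value in cond["conds"]
    if op == "must_not":
        return value not in cond["conds"]
    if op == "range":
        # Each bound is optional; a missing bound places no restriction.
        return (
            ("gte" not in cond or value >= cond["gte"])
            and ("gt" not in cond or value > cond["gt"])
            and ("lte" not in cond or value <= cond["lte"])
            and ("lt" not in cond or value < cond["lt"])
        )
    raise ValueError(f"operator not covered by this sketch: {op}")


query_filter = {
    "op": "and",
    "conds": [
        {"op": "must", "field": "region", "conds": ["cn", "sg"]},
        {"op": "must_not", "field": "data_type", "conds": [1, 2, 3]},
    ],
}
price_filter = {"op": "range", "field": "price", "gte": 100.0, "lt": 500.0}

print(matches({"region": "cn", "data_type": 4}, query_filter))
print(matches({"price": 500.0}, price_filter))
```

A dictionary such as `query_filter` has the same shape as the map expected by the filter request parameter.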

Example

Request

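import asyncio

# vikingdb_service is assumed to be an already initialized VikingDB service client.
# Get the target index. Call this once at program initialization; there is no need to repeat it.
index = vikingdb_service.get_index("example", "example_index")

res = index.search_by_id("22", limit=2, output_fields=["doc_id", "like", "text_vector"], partition="default")

# Asynchronous call
async def search_by_id():
    index = await vikingdb_service.async_get_index("async", "async")
    res = await index.async_search_by_id("222")
    return res

asyncio.run(search_by_id())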

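Return value

Running the call above from Python returns a List<Data>. The attributes of a Data instance are listed in the table below.

Attribute	Description
id	Primary key id.
fields	The fields entry of the response: the actual data, as a dictionary.
score	How closely the retrieved vector matches the input vector.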
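
As a sketch of how the three attributes might be consumed, the snippet below flattens a result list into plain dictionaries. `Hit` is a hypothetical stand-in that only mimics the documented attributes (id, fields, score) so the example runs without the SDK, and `to_rows` is an illustrative helper, not an SDK function.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Hit:
    """Hypothetical stand-in exposing the three documented Data attributes."""
    id: Any
    fields: Optional[dict] = None
    score: float = 0.0


def to_rows(results):
    """Flatten each result into one dict holding its fields plus id and score."""
    rows = []
    for item in results:
        # fields may be absent, for example when output_fields is an empty list.
        row = dict(getattr(item, "fields", None) or {})
        # id and score are written last, so they win over same-named fields.
        row["id"] = item.id
        row["score"] = item.score
        rows.append(row)
    return rows


rows = to_rows([Hit(22, {"doc_id": "a"}, 0.9), Hit(23, None, 0.5)])
print(rows)
```

Because `to_rows` only reads the attributes id, fields, and score, the same function should also accept the objects returned by search_by_id, provided they expose those attributes as the table describes.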