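This article describes how to use the object detection agent that comes preset on the edge LLM gateway platform.
The agent can recognize a wide range of target objects, including objects of different categories, shapes, sizes, and colors.
To use the object detection agent, you need to:
Create a gateway access key and bind the object detection agent to it. For details, see Calling Platform Preset Agents.
Obtain the API key of the gateway access key. For details, see Viewing a Key (API Key).
Call the object detection agent API to run an object detection task. For instructions on using the API, see API Usage.
The object detection agent largely follows the standard OpenAI Chat interface, with only minor differences. You can refer to the OpenAI documentation when calling it. For the specific differences, see Differences from OpenAI.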
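The following example runs detection on a single image. The prompt text means "Please detect the apples in the image"; replace b64_img_url with the Base64 data URL of your image: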
curl "https://ai-gateway.vei.volces.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $YOUR_API_KEY" \
  -d '{
    "model": "AG-object-detection-agent",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "请检测出其中的苹果"
          },
          {
            "type": "image_url",
            "image_url": {"url": "b64_img_url"}
          }
        ]
      }
    ],
    "stream": false
  }'
In a request sent to the object detection agent, the description of the target:
Example
# pip install openai httpx
# https://platform.openai.com/docs/api-reference
import base64

import httpx
from openai import OpenAI

client = OpenAI(
    base_url="https://ai-gateway.vei.volces.com/v1",
    api_key="YOUR_API_KEY",
)


def img_to_base64(img_path):
    """Return the Base64-encoded content of a local file or an HTTP(S) URL."""
    if img_path.startswith(("http://", "https://")):
        # Remote image: download it first.
        data = httpx.get(img_path).content
    else:
        # Local image: read it from disk.
        with open(img_path, "rb") as f:
            data = f.read()
    return base64.b64encode(data).decode("utf-8")


# Example 1: attribute-based object detection, described in Chinese
# (prompt: "detect the people wearing red safety helmets")
text = "检测出其中戴红色安全帽的人"
image_fn = "./test_data/dod_helmets.jpg"

# Example 2: counting attribute-based objects, described in Chinese
# (prompt: "how many red apples are there")
# text = "有几个红苹果"
# image_fn = "./test_data/dod_apples.jpg"

# Example 3: plain object detection, described in Chinese (prompt: "apple")
# text = "苹果"
# image_fn = "./test_data/dod_apples.jpg"

b64 = img_to_base64(image_fn)
completion = client.chat.completions.create(
    model="AG-object-detection-agent",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                },
            ],
        },
    ],
    max_tokens=300,
)
print(completion)
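The example above hardcodes image/jpeg in the data URL. The following is a minimal sketch of building the data URL with a MIME type guessed from the file extension. It assumes the agent also accepts standard data URLs for other image formats, which this document does not confirm. The helper name to_data_url and its default_mime parameter are hypothetical.

```python
import base64
import mimetypes


def to_data_url(img_path: str, default_mime: str = "image/jpeg") -> str:
    """Encode a local image file as a data URL (hypothetical helper)."""
    # Guess the MIME type from the file extension; fall back to JPEG,
    # the type used in the example above, when the guess is not an image.
    mime, _ = mimetypes.guess_type(img_path)
    if mime is None or not mime.startswith("image/"):
        mime = default_mime
    with open(img_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{b64}"
```

The returned string can be passed as the "url" value of the image_url content part in place of the f-string used above.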
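The object detection agent can return more detailed detection results. The results are stored in jdata["choices"][0]["message"]["content"] (jdata here denotes the parsed JSON response), which contains three fields: "scores", "labels", and "boxes". Where: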
Example of a complete response:
{
  "id": "AG-object-detection-agent-1740727485120",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": {
          "scores": [0.43657687306404114, 0.42889276146888733, 0.3822806775569916, 0.3763623833656311, 0.3696218729019165, 0.3690265417098999],
          "labels": [1, 1, 1, 0, 1, 1],
          "boxes": [
            [878.8975811004639, 258.3551917076111, 1130.8835220336914, 891.4355516433716],
            [826.282527923584, 279.6627961397171, 962.5131340026855, 699.0053584575653],
            [486.19521975517273, 273.1498453617096, 613.9511375427246, 782.4513545036316],
            [77.88547253608704, 268.2019966840744, 293.77442049980164, 877.022209405899],
            [225.4987235069275, 298.51445257663727, 285.8189606666565, 656.4610722064972],
            [235.5941677093506, 268.90106761455536, 555.3854942321777, 887.1528959274292]
          ]
        },
        "refusal": null,
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1740727485120,
  "model": "AG-object-detection-agent",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "",
  "usage": {
    "completion_tokens": 57,
    "prompt_tokens": 1503,
    "total_tokens": 1560,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  }
}
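The three parallel lists in content can be combined into one record per detected object. The following is a minimal sketch under two assumptions that this document does not state: the i-th entries of "scores", "labels", and "boxes" describe the same detection, and each box is [x_min, y_min, x_max, y_max] in pixels (the sample values are consistent with this layout). The helper name parse_detections and the 0.4 threshold are hypothetical. The helper also accepts content as a JSON string, in case a client returns it unparsed.

```python
import json


def parse_detections(content, min_score=0.0):
    """Turn the agent's `content` into a list of detection dicts."""
    # Some clients may hand back `content` as a JSON string rather than
    # a dict (assumption); accept both.
    if isinstance(content, str):
        content = json.loads(content)
    detections = []
    for score, label, box in zip(
        content["scores"], content["labels"], content["boxes"]
    ):
        if score < min_score:
            continue
        # Assumed box layout: [x_min, y_min, x_max, y_max] in pixels.
        x_min, y_min, x_max, y_max = box
        detections.append({
            "score": score,
            "label": label,
            "box": box,
            "width": x_max - x_min,
            "height": y_max - y_min,
        })
    return detections


# Two detections taken from the sample response above (the first and
# the fourth), with values rounded for brevity.
content = {
    "scores": [0.4366, 0.3764],
    "labels": [1, 0],
    "boxes": [[878.9, 258.4, 1130.9, 891.4], [77.9, 268.2, 293.8, 877.0]],
}
kept = parse_detections(content, min_score=0.4)
print(len(kept), kept[0]["label"])  # prints: 1 1
```

With a response obtained through the OpenAI SDK, the same helper would be called on completion.choices[0].message.content.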