Moonshot-v1-128k is a 100-billion-parameter language model from Moonshot AI, with strong semantic understanding, instruction following, and text generation capabilities. It supports a 128K context window, which makes it suitable for understanding and generating very long text. This document describes how to call Moonshot-v1-128k through the SDK and API.
Host: maas-api.ml-platform-cn-beijing.volces.com
Region: cn-beijing
Access is provided through a unified SDK (AK/SK credentials are required for request signing; the Volcengine authentication logic can be used as a reference):
Golang SDK: https://github.com/volcengine/volc-sdk-golang
Python SDK: https://github.com/volcengine/volc-sdk-python
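For example, with the Python SDK the client is constructed against the host/region above and signs requests with the AK/SK read from environment variables (a minimal setup sketch, condensed from the full examples below):

```python
import os
from volcengine.maas.v2 import MaasService

# Point the client at the MaaS host/region and authenticate with AK/SK.
maas = MaasService('maas-api.ml-platform-cn-beijing.volces.com', 'cn-beijing')
maas.set_ak(os.getenv("VOLC_ACCESSKEY"))
maas.set_sk(os.getenv("VOLC_SECRETKEY"))
```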
Note
Before calling, replace VOLC_ACCESSKEY, VOLC_SECRETKEY, and {YOUR_ENDPOINT_ID} with your own values (see the API Specification for details).

Golang:

```go
// Usage:
//
// 1. go get -u github.com/volcengine/volc-sdk-golang
// 2. VOLC_ACCESSKEY=XXXXX VOLC_SECRETKEY=YYYYY go run main.go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"

	api "github.com/volcengine/volc-sdk-golang/service/maas/models/api/v2"
	client "github.com/volcengine/volc-sdk-golang/service/maas/v2"
)

func main() {
	r := client.NewInstance("maas-api.ml-platform-cn-beijing.volces.com", "cn-beijing")

	// fetch ak&sk from environmental variables
	r.SetAccessKey(os.Getenv("VOLC_ACCESSKEY"))
	r.SetSecretKey(os.Getenv("VOLC_SECRETKEY"))

	req := &api.ChatReq{
		Messages: []*api.Message{
			{
				Role:    api.ChatRoleUser,
				Content: "天为什么这么蓝?",
			},
			{
				Role:    api.ChatRoleAssistant,
				Content: "因为有你",
			},
			{
				Role:    api.ChatRoleUser,
				Content: "花儿为什么这么香?",
			},
		},
	}
	endpointId := "{YOUR_ENDPOINT_ID}"

	TestNormalChat(r, endpointId, req)
	TestStreamChat(r, endpointId, req)
}

func TestNormalChat(r *client.MaaS, endpointId string, req *api.ChatReq) {
	got, status, err := r.Chat(endpointId, req)
	if err != nil {
		errVal := &api.Error{}
		if errors.As(err, &errVal) { // the returned error always type of *api.Error
			fmt.Printf("meet maas error=%v, status=%d\n", errVal, status)
		}
		return
	}

	fmt.Println("chat answer", mustMarshalJson(got))
}

func TestStreamChat(r *client.MaaS, endpointId string, req *api.ChatReq) {
	ch, err := r.StreamChat(endpointId, req)
	if err != nil {
		errVal := &api.Error{}
		if errors.As(err, &errVal) { // the returned error always type of *api.Error
			fmt.Println("meet maas error", errVal.Error())
		}
		return
	}

	for resp := range ch {
		if resp.Error != nil {
			// it is possible that error occurs during response processing
			fmt.Println(mustMarshalJson(resp.Error))
			return
		}
		fmt.Println(mustMarshalJson(resp))

		// last response may contain `usage`
		if resp.Usage != nil {
			// last message, will return full response including usage, role, finish_reason, etc.
			fmt.Println(mustMarshalJson(resp.Usage))
		}
	}
}

func mustMarshalJson(v interface{}) string {
	s, _ := json.Marshal(v)
	return string(s)
}
```
Python:

Note
Only python>=3.5 is currently supported.
```python
'''
Usage:
1. python3 -m pip install --user volcengine
2. VOLC_ACCESSKEY=XXXXX VOLC_SECRETKEY=YYYYY python main.py
'''
import os

from volcengine.maas.v2 import MaasService
from volcengine.maas import MaasException, ChatRole


def test_chat(maas, endpoint_id, req):
    try:
        resp = maas.chat(endpoint_id, req)
        print(resp)
    except MaasException as e:
        print(e)


def test_stream_chat(maas, endpoint_id, req):
    try:
        resps = maas.stream_chat(endpoint_id, req)
        for resp in resps:
            print(resp)
    except MaasException as e:
        print(e)


if __name__ == '__main__':
    maas = MaasService('maas-api.ml-platform-cn-beijing.volces.com', 'cn-beijing')

    maas.set_ak(os.getenv("VOLC_ACCESSKEY"))
    maas.set_sk(os.getenv("VOLC_SECRETKEY"))

    # document: "https://www.volcengine.com/docs/82379/1099475"

    # chat
    req = {
        "messages": [
            {
                "role": ChatRole.USER,
                "content": "天为什么这么蓝"
            },
            {
                "role": ChatRole.ASSISTANT,
                "content": "因为有你"
            },
            {
                "role": ChatRole.USER,
                "content": "花儿为什么这么香?"
            },
        ]
    }

    endpoint_id = "{YOUR_ENDPOINT_ID}"

    test_chat(maas, endpoint_id, req)
    test_stream_chat(maas, endpoint_id, req)
```
Java:

```java
/*
# pom.xml
<dependency>
    <groupId>com.volcengine</groupId>
    <artifactId>volc-sdk-java</artifactId>
    <version>LATEST</version>
</dependency>
*/
package com.volcengine.example.maas.v2;

import com.volcengine.model.maas.api.v2.*;
import com.volcengine.service.maas.MaasException;
import com.volcengine.service.maas.v2.MaasService;
import com.volcengine.service.maas.v2.impl.MaasServiceImpl;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.stream.Stream;

public class ChatV2Demo {

    public static void main(String[] args) {
        MaasService maasService = new MaasServiceImpl("maas-api.ml-platform-cn-beijing.volces.com", "cn-beijing");

        // fetch ak&sk from environmental variables
        maasService.setAccessKey(System.getenv("VOLC_ACCESSKEY"));
        maasService.setSecretKey(System.getenv("VOLC_SECRETKEY"));

        ChatReq req = new ChatReq()
                .withMessages(new ArrayList<>(Arrays.asList(
                        new Message().withRole(Message.ChatRole.USER).withContent("天为什么这么蓝?"),
                        new Message().withRole(Message.ChatRole.ASSISTANT).withContent("因为有你"),
                        new Message().withRole(Message.ChatRole.USER).withContent("什么是中国")
                )));

        String endpointId = "${YOUR_ENDPOINT_ID}";
        testChat(maasService, endpointId, req);
        testStreamChat(maasService, endpointId, req);
    }

    private static void testChat(MaasService maasService, String endpointId, ChatReq req) {
        try {
            ChatResp resp = maasService.chat(endpointId, req);
            System.out.println(resp.getChoices().get(0).getMessage().getContent());
            System.out.println(resp.getUsage());
        } catch (MaasException e) {
            System.out.println("req_id: " + e.getRequestId());
            System.out.println("code: " + e.getCode());
            System.out.println("code_n: " + e.getCodeN());
            System.out.println("message: " + e.getMsg());
            e.printStackTrace();
        }
    }

    private static void testStreamChat(MaasService maasService, String endpointId, ChatReq req) {
        Stream<ChatResp> resps = null;
        try {
            resps = maasService.streamChat(endpointId, req);
        } catch (MaasException e) {
            e.printStackTrace();
        }
        assert resps != null;

        // it is possible that error occurs during response processing
        try {
            resps.forEach(resp -> {
                System.out.println(resp.getChoices().get(0).getMessage().getContent());
                // last message, will return full response including usage, role, finish_reason, etc.
                if (resp.getUsage() != null && resp.getUsage().getTotalTokens() > 0) {
                    System.out.println(resp.getUsage());
                }
            });
        } catch (RuntimeException e) {
            Throwable cause = e.getCause();
            if (cause instanceof MaasException) {
                System.out.println("req_id: " + ((MaasException) cause).getRequestId());
                System.out.println("code: " + ((MaasException) cause).getCode());
                System.out.println("code_n: " + ((MaasException) cause).getCodeN());
                System.out.println("message: " + ((MaasException) cause).getMsg());
            }
            System.out.println("caught: " + e);
        }
    }
}
```
The API design mainly follows OpenAI and HuggingFace.
Parameters holds the optional control parameters; exactly which parameters are available depends on the model service (the model detail page lists the supported parameters). The request body supports the following fields (an example request follows the table):
Field | Type | Description |
---|---|---|
messages (required) | list | The list of chat messages; each message carries a role and content (see the code examples above) |
stream | boolean | Whether to stream the result. If true, data is returned per the SSE protocol |
parameters.max_new_tokens | integer | Maximum number of newly generated tokens (not counting the tokens in the prompt) |
parameters.temperature | number | Sampling temperature, (0, 1.0] |
parameters.top_p | number | Nucleus sampling, [0, 1.0] |
parameters.top_k | integer | How many of the highest-probability tokens top-k filtering keeps as candidates; a positive integer |
parameters.stop | list | Tokens at which the model should stop generating. When the generated response contains any of these tokens, generation stops |
parameters.logit_bias | map<string,number> | A map from token (token ids are obtained via the tokenization API) to an associated bias value between -100 and 100. The exact effect varies by model, but values between -1 and 1 decrease or increase the likelihood of selection, while -100 or 100 should effectively ban or force the selection of the corresponding token |
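Based on the table above, a request body might look like the sketch below (Python, using the dict form accepted by the v2 SDK's chat/stream_chat; whether every parameters field is honored depends on the model service, and nesting them under "parameters" is an assumption to be verified against the API Specification):

```python
from volcengine.maas import ChatRole

# Illustrative request body assembled from the parameter table above.
# NOTE: which `parameters` fields are supported depends on the model service,
# and passing them as a nested "parameters" dict is an assumption -- check the
# API Specification for the authoritative schema.
req = {
    "messages": [
        {"role": ChatRole.USER, "content": "天为什么这么蓝?"},
    ],
    "parameters": {
        "max_new_tokens": 512,   # cap on newly generated tokens
        "temperature": 0.8,      # sampling temperature, (0, 1.0]
        "top_p": 0.9,            # nucleus sampling, [0, 1.0]
        "top_k": 50,             # keep the 50 highest-probability candidates
        "stop": ["\n\n"],        # stop generation at these tokens
        # "logit_bias": {"1234": -100},  # token id -> bias in [-100, 100]
    },
}
# With the SDK, streaming is selected by calling stream_chat() instead of
# chat(), so the "stream" field is normally not set by hand.
```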
The response contains the following fields (an illustrative response shape follows the table):

Field | Type | Description |
---|---|---|
req_id | string | Request id |
choice | object | The generated result, including the assistant message and the finish reason |
usage | object | Token usage statistics for the request |
error (optional) | object | Error details; present only when an error occurs |
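For orientation, the sketch below shows roughly how the response fields fit together. It is illustrative only, pieced together from the field table and the fields the code samples access (message content, finish_reason, total_tokens); the exact field names inside choice and usage are assumptions, so consult the API Specification:

```python
# Illustrative (not authoritative) shape of a non-streaming response.
resp_example = {
    "req_id": "xxxxxxxx",  # request id, useful when reporting problems
    "choice": {
        "message": {"role": "assistant", "content": "..."},
        "finish_reason": "stop",      # why generation ended (assumed name)
    },
    "usage": {"total_tokens": 123},   # token accounting (assumed shape)
    # "error": {...}                  # present only when an error occurs
}
```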
In stream mode, the generated content is returned over SSE (Server-Sent Events), and each response carries a partial fragment of the generated content:

- Fragments are returned in the order they are generated; the caller must concatenate them to obtain the complete result (see the sketch after this list).
- If an error occurs right at the start of a streaming request (e.g. an invalid parameter), HTTP returns a non-200 status and the method call returns the error directly.
- If an error occurs mid-stream, HTTP still returns 200 and the error information is delivered in a fragment.
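As a sketch of the fragment handling described above (Python; it assumes each streamed response exposes choices[0].message.content plus optional error/usage fields, mirroring the Go and Java samples, which should be verified against the actual SDK objects):

```python
# Hedged sketch: concatenate streamed fragments into the full answer.
def collect_stream(maas, endpoint_id, req):
    pieces = []
    for resp in maas.stream_chat(endpoint_id, req):
        if getattr(resp, "error", None):
            # mid-stream errors arrive inside a fragment while HTTP stays 200
            raise RuntimeError(resp.error)
        pieces.append(resp.choices[0].message.content)
        if getattr(resp, "usage", None):
            # the last fragment carries the usage statistics
            print("usage:", resp.usage)
    return "".join(pieces)  # fragments are joined in generation order
```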