Before following this document, read the "Integration Must-Read" guide (接入必读) to understand how to connect to the service.
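The `payload` field carries the request parameters serialized as JSON text. For details, see Function Invocation > Common Protocol > WebSocket.
The `payload` parameters are a JSON string with the following fields: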
Field | Description | Type | Required | Default |
---|---|---|---|---|
text | Input text | string | No. At least one of text and ssml must be non-empty; if both are non-empty, ssml is used | - |
ssml | Input text in SSML format | string | No. At least one of text and ssml must be non-empty; if both are non-empty, ssml is used | - |
speaker | Speaker (voice); see Appendix: Speaker List | string | Yes | - |
audio_config | Additional audio parameters | object | No | - |
audio_config.format | Output audio encoding format: wav/mp3/aac | string | No | mp3 |
audio_config.sample_rate | Output audio sample rate, one of [8000, 16000, 22050, 24000, 32000, 44100, 48000] | number | No | 24000 |
audio_config.speech_rate | Speech rate, range [-50, 100]; 100 means 2.0x speed and -50 means 0.5x speed | number | No | 0 |
audio_config.pitch_rate | Pitch, range [-12, 12] | number | No | 0 |
audio_config.enable_timestamp | Whether to also return character and phoneme timestamps | bool | No | false |
Example:
{ "text": "欢迎使用文本转语音服务。", "speaker": "zh_female_qingxin", "audio_config": { "format": "wav", "sample_rate": 16000 } }
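The payload rules in the table above can be sketched in Go as follows. This is a minimal illustration, not part of the official SDK: the type names `TTSPayload` and `AudioConfig` and the helper `buildPayload` are hypothetical, and the range checks simply restate the documented limits. Zero-valued optional fields are omitted so that the service defaults apply.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// AudioConfig mirrors the optional audio_config object (hypothetical type name).
// Zero values are omitted, which matches the documented defaults of 0/false
// and lets the service pick its default format and sample rate.
type AudioConfig struct {
	Format          string `json:"format,omitempty"`
	SampleRate      int    `json:"sample_rate,omitempty"`
	SpeechRate      int    `json:"speech_rate,omitempty"`
	PitchRate       int    `json:"pitch_rate,omitempty"`
	EnableTimestamp bool   `json:"enable_timestamp,omitempty"`
}

// TTSPayload mirrors the request payload fields (hypothetical type name).
type TTSPayload struct {
	Text        string       `json:"text,omitempty"`
	SSML        string       `json:"ssml,omitempty"`
	Speaker     string       `json:"speaker"`
	AudioConfig *AudioConfig `json:"audio_config,omitempty"`
}

// buildPayload validates the documented constraints and returns the JSON
// string to place in the `payload` field of the WebSocket request.
func buildPayload(p TTSPayload) (string, error) {
	if p.Speaker == "" {
		return "", errors.New("speaker is required")
	}
	if p.Text == "" && p.SSML == "" {
		return "", errors.New("at least one of text and ssml must be non-empty")
	}
	if c := p.AudioConfig; c != nil {
		if c.SpeechRate < -50 || c.SpeechRate > 100 {
			return "", fmt.Errorf("speech_rate %d outside [-50, 100]", c.SpeechRate)
		}
		if c.PitchRate < -12 || c.PitchRate > 12 {
			return "", fmt.Errorf("pitch_rate %d outside [-12, 12]", c.PitchRate)
		}
	}
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	payload, err := buildPayload(TTSPayload{
		Text:        "欢迎使用文本转语音服务。",
		Speaker:     "zh_female_qingxin",
		AudioConfig: &AudioConfig{Format: "wav", SampleRate: 16000},
	})
	if err != nil {
		fmt.Println("invalid payload:", err)
		return
	}
	fmt.Println(payload)
}
```

Using `json.Marshal` rather than string formatting also takes care of escaping quotes or backslashes that may appear in the input text.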
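Receiving the different message types in the response:
When `enable_timestamp=false`, the server returns audio as binary messages. Text-message responses are defined as follows: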
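Field | Description | Type |
---|---|---|
task_id | Request task ID, used for tracing and troubleshooting | string |
message_id | Request task message ID, used for tracing and troubleshooting | string |
namespace | Service interface namespace, e.g. TTS | string |
event | Service request task event, e.g. StartTask | string |
data | Binary response data, standard base64 encoded | string |
payload | Textual response information, as a JSON string | string |
status_code | Status code | number |
status_text | Status message | string |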
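As an illustration of the text-message fields above, the following Go sketch decodes one message. The type name `Envelope`, the helper `decodeEnvelope`, and every value in the sample message are made up for the example. It relies on the standard behavior of `encoding/json`, which decodes a base64 JSON string into a `[]byte` field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope mirrors the text-message response fields (hypothetical type name).
type Envelope struct {
	TaskID     string `json:"task_id"`
	MessageID  string `json:"message_id"`
	Namespace  string `json:"namespace"`
	Event      string `json:"event"`
	Data       []byte `json:"data"`    // standard base64 in JSON, decoded by encoding/json
	Payload    string `json:"payload"` // nested JSON that is still a string at this level
	StatusCode int    `json:"status_code"`
	StatusText string `json:"status_text"`
}

// decodeEnvelope parses one text message received from the WebSocket.
func decodeEnvelope(raw []byte) (*Envelope, error) {
	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return nil, fmt.Errorf("decode text message: %w", err)
	}
	return &env, nil
}

func main() {
	// Invented sample message; "AAEC" is base64 for the bytes 0x00 0x01 0x02.
	raw := []byte(`{"task_id":"t-1","message_id":"m-1","namespace":"TTS",` +
		`"event":"TaskStarted","data":"AAEC","payload":"{\"duration\":3.0}"}`)

	env, err := decodeEnvelope(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("event=%s audio_bytes=%d payload=%s\n", env.Event, len(env.Data), env.Payload)
}
```

Because `payload` is itself a JSON string, a second `json.Unmarshal` call on `env.Payload` is needed to reach the fields inside it.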
The response `payload` is a JSON string with the following structure:
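Field | Description | Type |
---|---|---|
duration | Audio duration, in seconds | number |
words | Character (word) timestamps, in seconds. Requires the request parameter audio_config.enable_timestamp=true | array |
words.word | Character content | string |
words.start_time | Start time of the current character | number |
words.end_time | End time of the current character | number |
phonemes | Phoneme timestamps, in seconds. Requires the request parameter audio_config.enable_timestamp=true | array |
phonemes.phone | Phoneme content | string |
phonemes.start_time | Start time of the current phoneme | number |
phonemes.end_time | End time of the current phoneme | number |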
Example payload:
{ "duration": 3.0, "words": [ { "word": "你", "start_time": "0", "end_time": "0.05" }, ... ], "phonemes": [ { "phone": "C0n", "start_time": "0", "end_time": "0.025" }, ... ] }
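A possible way to decode this payload in Go is sketched below. The type and function names are hypothetical. The field table lists the timestamps as numbers while the example payload shows them as quoted strings, so this sketch uses `json.Number`, which accepts both forms; which form the service actually emits is not confirmed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Word is one entry of the `words` array (hypothetical type name).
type Word struct {
	Word      string      `json:"word"`
	StartTime json.Number `json:"start_time"` // accepts 0.05 as well as "0.05"
	EndTime   json.Number `json:"end_time"`
}

// Phoneme is one entry of the `phonemes` array (hypothetical type name).
type Phoneme struct {
	Phone     string      `json:"phone"`
	StartTime json.Number `json:"start_time"`
	EndTime   json.Number `json:"end_time"`
}

// ResultPayload mirrors the response payload fields.
type ResultPayload struct {
	Duration float64   `json:"duration"`
	Words    []Word    `json:"words"`
	Phonemes []Phoneme `json:"phonemes"`
}

// decodeResult parses the JSON string found in the `payload` field.
func decodeResult(payload string) (*ResultPayload, error) {
	var res ResultPayload
	if err := json.Unmarshal([]byte(payload), &res); err != nil {
		return nil, fmt.Errorf("decode payload: %w", err)
	}
	return &res, nil
}

func main() {
	// Invented sample that mixes quoted and unquoted timestamps on purpose.
	res, err := decodeResult(`{"duration":3.0,` +
		`"words":[{"word":"你","start_time":"0","end_time":"0.05"}],` +
		`"phonemes":[{"phone":"C0n","start_time":0,"end_time":0.025}]}`)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, w := range res.Words {
		start, err1 := w.StartTime.Float64()
		end, err2 := w.EndTime.Float64()
		if err1 != nil || err2 != nil {
			continue // skip entries whose timestamps are missing
		}
		fmt.Printf("%s: %.3fs to %.3fs\n", w.Word, start, end)
	}
}
```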
For streaming invocation, refer to the common WebSocket streaming protocol.
    // Code sample:
    // use websocket client to invoke SAMI Streaming Service
    package main

    import (
        "bytes"
        "encoding/json"
        "flag"
        "fmt"
        "log"
        "net/url"
        "os"
        "os/signal"
        "time"

        "github.com/gorilla/websocket"
    )

    type WebSocketRequest struct {
        Token     string  `header:"SAMI-Token,required" json:"token,required" query:"token,required"`
        Appkey    string  `json:"appkey,required" query:"appkey,required" vd:"$!=''"`
        Namespace string  `json:"namespace,required" query:"namespace,required" vd:"$!=''"`
        Version   string  `json:"version,omitempty" query:"version"`
        Event     string  `json:"event,omitempty" query:"event"`
        Payload   *string `form:"payload" json:"payload,omitempty"`
        Data      []byte  `form:"data" json:"data,omitempty"`
        TaskId    string  `json:"task_id,omitempty" query:"task_id"`
    }

    type WebSocketResponse struct {
        TaskId     string  `form:"task_id,required" json:"task_id,required" query:"task_id,required"`
        MessageId  string  `form:"message_id,required" json:"message_id,required" query:"message_id,required"`
        Namespace  string  `form:"namespace,required" json:"namespace,required" query:"namespace,required"`
        Event      string  `form:"event,required" json:"event,required" query:"event,required"`
        StatusCode int32   `form:"status_code,required" json:"status_code,required" query:"status_code,required"`
        StatusText string  `form:"status_text,required" json:"status_text,required" query:"status_text,required"`
        Payload    *string `form:"payload,omitempty" json:"payload,omitempty" query:"payload,omitempty"`
        Data       []byte  `form:"data,omitempty" json:"data,omitempty" query:"data,omitempty"`
    }

    const (
        EventStartTask    = "StartTask"
        EventTaskStarted  = "TaskStarted"
        EventFinishTask   = "FinishTask"
        EventTaskFinished = "TaskFinished"
    )

    var (
        // websocket domain
        addr = flag.String("addr", "sami.bytedance.com", "http service address")

        // user auth
        token  = "your_token"
        appkey = "your_appkey"

        u         url.URL
        c         *websocket.Conn
        interrupt chan os.Signal
        done      chan struct{}
        err       error

        inputFile  = "input.pcm" // not used by this TTS sample
        outputFile = "output.wav"
    )

    func main() {
        flag.Parse()
        log.SetFlags(0)

        interrupt = make(chan os.Signal, 1)
        signal.Notify(interrupt, os.Interrupt)
        done = make(chan struct{})

        u = url.URL{Scheme: "wss", Host: *addr, Path: "/api/v1/ws"}
        log.Printf("connecting to %s", u.String())
        c, _, err = websocket.DefaultDialer.Dial(u.String(), nil)
        if err != nil {
            log.Fatal("dial:", err)
        }
        defer c.Close()

        streamingTTSTest("zh_female_qingxin", "Hello, 欢迎使用文本转语音服务", "wav", 16000, true)
        // streamingTTSTest("zh_female_qingxin", "Hello, 欢迎使用文本转语音服务", "wav", 16000, false)
    }

    // readTaskStartedEvent waits for the TaskStarted text message that
    // acknowledges the StartTask request.
    func readTaskStartedEvent() error {
        msgType, message, err := c.ReadMessage()
        if err != nil {
            log.Println("read TaskStarted event failed, ", err)
            return err
        }
        if msgType != websocket.TextMessage {
            log.Println("read TaskStarted event failed, message type not TextMessage: ", msgType)
            return fmt.Errorf("MessageTypeNotMatch")
        }
        taskStartedEvent := &WebSocketResponse{}
        err = json.Unmarshal(message, taskStartedEvent)
        if err != nil {
            return err
        }
        if taskStartedEvent.Event != EventTaskStarted {
            log.Printf("read TaskStarted event failed, event type not match: %+v", *taskStartedEvent)
            return fmt.Errorf("EventTypeNotMatch")
        }
        log.Printf("read TaskStarted event: %+v", *taskStartedEvent)
        return nil
    }

    func streamingTTSTest(speaker, text, format string, sampleRate int, enableTimestamp bool) {
        var buf bytes.Buffer
        defer func() {
            // save whatever audio was received, even on early exit
            log.Println("save bytes into file:", buf.Len())
            if buf.Len() > 0 {
                _ = os.WriteFile(outputFile, buf.Bytes(), 0644)
            }
        }()

        // send control message
        // NOTE: text is interpolated as-is, so it must not contain characters
        // that require JSON escaping (quotes, backslashes, newlines).
        payloadStr := fmt.Sprintf(
            `{"audio_config":{"format":"%v","speech_rate":0, "sample_rate":%v, "enable_timestamp":%t},"speaker":"%v","text":"%v"}`,
            format, sampleRate, enableTimestamp, speaker, text,
        )
        controlReq := &WebSocketRequest{
            Token:     token,
            TaskId:    "test_mock",
            Appkey:    appkey,
            Namespace: "TTS",
            Event:     EventStartTask,
            Payload:   &payloadStr,
        }
        controlMsg, _ := json.Marshal(controlReq)
        err = c.WriteMessage(websocket.TextMessage, controlMsg)
        if err != nil {
            log.Println("write:", err)
            return
        }

        if err = readTaskStartedEvent(); err != nil {
            log.Println("read failed, ", err)
            return
        }

        // all text was sent in StartTask, so finish the task right away
        controlReq.Event = EventFinishTask
        controlMsg, _ = json.Marshal(controlReq)
        err = c.WriteMessage(websocket.TextMessage, controlMsg)
        if err != nil {
            log.Println("write:", err)
            return
        }

        // receive audio until the TaskFinished event arrives
        go func() {
            defer close(done)
            isFirst := true
            startTime := time.Now()
            for {
                mt, message, err := c.ReadMessage()
                if err != nil {
                    log.Println("read:", err)
                    return
                }
                if isFirst {
                    startTime = time.Now()
                    isFirst = false
                }
                if mt == websocket.BinaryMessage {
                    // binary message: raw audio bytes
                    log.Printf("recv: byte[%v]", len(message))
                    buf.Write(message)
                } else {
                    // text message: JSON with base64 audio in data and timestamps in payload
                    wsResp := WebSocketResponse{}
                    ttsPayload := ""
                    err := json.Unmarshal(message, &wsResp)
                    if err != nil {
                        log.Printf("recv text message, parse failed")
                        continue
                    }
                    if wsResp.Event == EventTaskFinished {
                        log.Printf(
                            "recv TaskFinished event: %+v, cost_time=%v",
                            wsResp, time.Since(startTime).Milliseconds(),
                        )
                        return
                    }
                    if wsResp.Payload != nil {
                        ttsPayload = *wsResp.Payload
                    }
                    buf.Write(wsResp.Data)
                    log.Printf("recv: data=byte[%d], payload=%v", len(wsResp.Data), ttsPayload)
                }
            }
        }()

        for {
            select {
            case <-done:
                return
            case <-interrupt:
                log.Println("interrupt")
                // Cleanly close the connection by sending a close message and then
                // waiting (with timeout) for the server to close the connection.
                err := c.WriteMessage(
                    websocket.CloseMessage,
                    websocket.FormatCloseMessage(websocket.CloseNormalClosure, ""),
                )
                if err != nil {
                    log.Println("write close:", err)
                    return
                }
                select {
                case <-done:
                case <-time.After(5 * time.Second):
                }
                return
            }
        }
    }
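HTTP status code | Business status code | Error message | Description | Solution |
---|---|---|---|---|
400 | 40402004 | TTSInvalidSpeaker | The TTS speaker setting is invalid | Check that the TTS speaker is set correctly |
400 | 40402001 | TTSEmptyText | No TTS text was provided | Check that the TTS text is set |
400 | 40402002 | TTSInvalidText | The TTS text is invalid | Check the TTS text: it may not match the speaker, or it may contain no readable content |
400 | 40402003 | TTSExceededTextLimit | The TTS text exceeds the length limit | Check whether the TTS text exceeds the limit. The non-streaming interface allows up to 1000 UTF-8 characters; the streaming interface allows up to 2000 UTF-8 characters (including spaces, punctuation, Chinese characters, letters, etc.) |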
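Some of these errors can be caught on the client before a request is sent. The Go sketch below is illustrative only: the constant and function names are hypothetical, and it assumes that "UTF-8 characters" means Unicode code points (runes), which the table does not state explicitly.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// Business status codes taken from the error table above.
const (
	CodeTTSEmptyText         = 40402001
	CodeTTSInvalidText       = 40402002
	CodeTTSExceededTextLimit = 40402003
	CodeTTSInvalidSpeaker    = 40402004
)

// Documented text length limits, counted here in runes (an assumption).
const (
	maxNonStreamingChars = 1000
	maxStreamingChars    = 2000
)

var ttsErrorNames = map[int]string{
	CodeTTSEmptyText:         "TTSEmptyText",
	CodeTTSInvalidText:       "TTSInvalidText",
	CodeTTSExceededTextLimit: "TTSExceededTextLimit",
	CodeTTSInvalidSpeaker:    "TTSInvalidSpeaker",
}

// describeStatus maps a business status code to its documented error name.
func describeStatus(code int) string {
	if name, ok := ttsErrorNames[code]; ok {
		return name
	}
	return fmt.Sprintf("unknown status %d", code)
}

// checkText rejects input that would trigger TTSEmptyText or
// TTSExceededTextLimit on the server side.
func checkText(text string, streaming bool) error {
	if text == "" {
		return errors.New("text is empty (server would return TTSEmptyText)")
	}
	limit := maxNonStreamingChars
	if streaming {
		limit = maxStreamingChars
	}
	if n := utf8.RuneCountInString(text); n > limit {
		return fmt.Errorf("text has %d characters, limit is %d (TTSExceededTextLimit)", n, limit)
	}
	return nil
}

func main() {
	fmt.Println(describeStatus(CodeTTSInvalidSpeaker))
	if err := checkText("欢迎使用文本转语音服务。", true); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("text accepted")
}
```

Counting runes rather than bytes matters for Chinese text, where each character occupies three bytes in UTF-8.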