Automatic Subtitle Timing -- Doubao Speech - Volcengine

Automatic Subtitle Timing

Last updated: 2023.09.18 19:19:36
First published: 2022.10.31 17:19:48

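1. Workflow overview

The automatic subtitle timing workflow has three stages:

The client extracts the audio track from the video and converts it to an audio file;
The audio file and the subtitle text are sent to the backend cluster, which returns a task ID;
The result is fetched from the backend API using the task ID.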
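
The first stage (extracting the audio track) happens on the client, and this page does not prescribe a tool for it. As a minimal sketch, assuming ffmpeg is installed and that mono 16 kHz WAV output is acceptable (both are assumptions, not requirements stated here), the command could be assembled as follows. `build_ffmpeg_command` is a hypothetical helper name.

```python
from typing import List


def build_ffmpeg_command(video_path: str, audio_path: str,
                         sample_rate: int = 16000) -> List[str]:
    """Return an ffmpeg argv that drops the video stream and writes WAV audio."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", video_path,          # input video
        "-vn",                     # discard the video stream
        "-ac", "1",                # mono output (assumed, not specified by the service)
        "-ar", str(sample_rate),   # sample rate in Hz (assumed)
        audio_path,
    ]


cmd = build_ffmpeg_command("talk.mp4", "talk.wav")
# To actually run the extraction:
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the sketch easy to inspect before ffmpeg is invoked.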

[Figure: non-blocking query flow]

[Figure: blocking query flow]

2. Authentication

To set up authentication, refer to the authentication methods documentation.

3. Submit audio

3.1 Request

Request URL: http://openspeech.bytedance.com/api/v1/vc/ata/submit
Request method: HTTP POST

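3.1.1 Binary audio request

The header must include the content type:

Content-Type: audio/wav

The URL parameters are as follows:

Field	Description	Required	Remarks
appid	Application ID	✓	Identifies the current application.
caption_type	Subtitle recognition type	✓	speech (speaking) or singing (singing).
audio_text	Subtitle text of the audio	✓	The subtitle text to be timed against the audio.
sta_punc_mode	Punctuation mode of the timing service		Defaults to '1' (omit the trailing comma or period at the end of each sentence in the result). Optional '2' (omit certain sentence-level punctuation and replace it with spaces). Optional '3' (keep the punctuation of the original text in full).
~~caption_category~~	~~Subtitle output mode~~		~~2 (fixed value)~~.
~~cluster~~	~~Request cluster~~		~~ata_cluster (fixed value)~~

The body carries the raw binary audio data.

Request example: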

POST /api/v1/vc/ata/submit?appid=your_appid&caption_type=speech HTTP/1.1
Host: openspeech.bytedance.com
Content-Type: multipart/form-data;boundary="boundary"

--boundary
Content-Disposition: form-data; name="data"; filename="talk.wav"
Content-Type: audio/wav

<binary audio data>
--boundary
Content-Disposition: form-data; name="audio-text"

hello world
--boundary--
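
As a hedged illustration of the parameter table above, the sketch below builds (but does not send) a submit request with the raw audio as the body and `audio_text` in the query string. It follows the table rather than the multipart example, which it does not cover. The `Bearer; ` authorization format is taken from the curl demo on this page; `build_binary_submit_request` is a hypothetical helper name, and the HTTPS scheme is the one used in the demo section.

```python
import urllib.parse
import urllib.request

SUBMIT_URL = "https://openspeech.bytedance.com/api/v1/vc/ata/submit"


def build_binary_submit_request(appid, token, audio_bytes, audio_text,
                                caption_type="speech", sta_punc_mode=None):
    """Build a POST request whose body is the raw WAV data."""
    params = {
        "appid": appid,
        "caption_type": caption_type,   # "speech" or "singing"
        "audio_text": audio_text,       # subtitle text to align
    }
    if sta_punc_mode is not None:
        params["sta_punc_mode"] = str(sta_punc_mode)  # optional: 1, 2 or 3
    url = SUBMIT_URL + "?" + urllib.parse.urlencode(params)
    headers = {
        "Content-Type": "audio/wav",
        "Authorization": "Bearer; " + token,  # format copied from the curl demo
    }
    return urllib.request.Request(url, data=audio_bytes, headers=headers,
                                  method="POST")


request = build_binary_submit_request(
    "your_appid", "your_token", b"RIFF....", "hello world")
# To send it: urllib.request.urlopen(request)
```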

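3.1.2 Audio URL request

The header must include the content type:

Content-Type: application/json

The URL parameters are the same as above.
The body is a JSON string with the following parameters:

Field	Description	Required	Remarks
url	Audio URL	✓	Address of the audio file to be timed.

Request example: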

POST /api/v1/vc/ata/submit?appid=your_appid&caption_type=speech HTTP/1.1
Host: openspeech.bytedance.com
Content-Type: application/json
Connection: keep-alive
Content-Length: xxxxx

{
    "url":"http://xxx.com/talk.wav",
    "audio_text": "hello world"
}
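
The URL-based variant can be sketched the same way. The example below only constructs the request, mirroring the JSON body shown above (`url` plus `audio_text`). The function name is hypothetical, and the `Bearer; ` header format and HTTPS scheme are borrowed from the curl demo on this page.

```python
import json
import urllib.parse
import urllib.request

SUBMIT_URL = "https://openspeech.bytedance.com/api/v1/vc/ata/submit"


def build_url_submit_request(appid, token, audio_url, audio_text,
                             caption_type="speech"):
    """Build a POST request that points the service at a hosted audio file."""
    query = urllib.parse.urlencode({"appid": appid, "caption_type": caption_type})
    body = json.dumps({"url": audio_url, "audio_text": audio_text}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer; " + token,  # format copied from the curl demo
    }
    return urllib.request.Request(SUBMIT_URL + "?" + query, data=body,
                                  headers=headers, method="POST")


request = build_url_submit_request(
    "your_appid", "your_token", "http://xxx.com/talk.wav", "hello world")
# To send it: urllib.request.urlopen(request)
```

Using `json.dumps` rather than hand-written JSON avoids syntax slips such as a trailing comma, which strict JSON parsers reject.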

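3.2 Response

Response format: JSON

Response fields:

Field	Description	Level	Format	Required	Remarks
code	Status code	1	int	✓	0 means success, non-zero means failure.
message	Status message	1	string	✓	Gives the reason when the request fails.
id	Task ID	1	string		Present only when the submission succeeds.

Response example: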

{
    "code": 0,
    "message": "Success",
    "id": "fc5aa03e-6ae4-46a3-b8cf-1910a44e0d8a"
}
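
A minimal sketch of handling this response: check `code`, surface `message` on failure, and return the task `id` on success. `SubmitError` and `parse_submit_response` are hypothetical names; the `int()` conversion is a defensive assumption so that a numeric string is accepted as well as an integer.

```python
import json


class SubmitError(Exception):
    """Raised when the submit endpoint reports a non-zero status code."""


def parse_submit_response(raw):
    """Return the task ID from a submit response, or raise SubmitError."""
    payload = json.loads(raw)
    code = int(payload["code"])  # accepts 0 as well as "0"
    if code != 0:
        raise SubmitError(f"submit failed with code {code}: "
                          f"{payload.get('message', '')}")
    return payload["id"]


task_id = parse_submit_response(
    '{"code": 0, "message": "Success", '
    '"id": "fc5aa03e-6ae4-46a3-b8cf-1910a44e0d8a"}')
```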

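4. Query the result

4.1 Request

Request URL: http://openspeech.bytedance.com/api/v1/vc/ata/query

Request method: HTTP GET

Field	Description	Required	Remarks
appid	Application ID	✓	Identifies the current application.
id	Task ID	✓	The id returned by the submit endpoint.
blocking	Whether the query blocks until the result is ready		0 means non-blocking, 1 means blocking (blocking is the default).

Request example: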

GET /api/v1/vc/ata/query?appid=your_appid&id=fc5aa03e-6ae4-46a3-b8cf-1910a44e0d8a HTTP/1.1
Host: openspeech.bytedance.com
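
The query URL, including the optional `blocking` flag from the table above, could be assembled as in the sketch below. `build_query_url` is a hypothetical helper; the HTTPS scheme follows the curl demo on this page. Leaving `blocking` unset omits the parameter, so the service default (blocking) applies.

```python
import urllib.parse

QUERY_URL = "https://openspeech.bytedance.com/api/v1/vc/ata/query"


def build_query_url(appid, task_id, blocking=None):
    """Return the GET URL for a task; blocking=None keeps the server default."""
    params = {"appid": appid, "id": task_id}
    if blocking is not None:
        params["blocking"] = "1" if blocking else "0"  # 0 = non-blocking, 1 = blocking
    return QUERY_URL + "?" + urllib.parse.urlencode(params)


url = build_query_url("your_appid", "fc5aa03e-6ae4-46a3-b8cf-1910a44e0d8a",
                      blocking=False)
```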

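4.2 Response

Response format: JSON

Response fields:

Field	Description	Level	Format	Required	Remarks
id	Task ID	1	string	✓
code	Status code	1	int	✓	0 means success, non-zero means failure. See the error codes for details.
message	Status message	1	string	✓
utterances	Sentence-level results	1	list		Present only on success.
start_time	Start time	2	int		Offset from the start of the audio, in milliseconds.
end_time	End time	2	int		Offset from the start of the audio, in milliseconds.
text	Text	2	string
words	Word-level information	2	list

Response example: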

{
    "id": "d22cca84-8c8a-4d15-aa2c-ac550518d5ae",
    "code": 0,
    "message": "Success",
    "duration": 5.3174375,
    "utterances": [
        {
            "text": "如果您没有其他需要举报的话这边就先挂断了",
            "start_time": 0,
            "end_time": 3197,
            "words": [
                {
                    "text": "如",
                    "start_time": 0,
                    "end_time": 208
                },
                {
                    "text": "果",
                    "start_time": 208,
                    "end_time": 317
                },
                {
                    "text": "您",
                    "start_time": 322,
                    "end_time": 460
                },
                {
                    "text": "没",
                    "start_time": 460,
                    "end_time": 580
                },
                {
                    "text": "有",
                    "start_time": 580,
                    "end_time": 717
                },
                {
                    "text": "其",
                    "start_time": 722,
                    "end_time": 877
                },
                {
                    "text": "他",
                    "start_time": 882,
                    "end_time": 1037
                },
                {
                    "text": "需",
                    "start_time": 1042,
                    "end_time": 1180
                },
                {
                    "text": "要",
                    "start_time": 1180,
                    "end_time": 1317
                },
                {
                    "text": "举",
                    "start_time": 1322,
                    "end_time": 1477
                },
                {
                    "text": "报",
                    "start_time": 1482,
                    "end_time": 1637
                },
                {
                    "text": "的",
                    "start_time": 1642,
                    "end_time": 1780
                },
                {
                    "text": "话",
                    "start_time": 1780,
                    "end_time": 1917
                },
                {
                    "text": "这",
                    "start_time": 2042,
                    "end_time": 2180
                },
                {
                    "text": "边",
                    "start_time": 2180,
                    "end_time": 2317
                },
                {
                    "text": "就",
                    "start_time": 2322,
                    "end_time": 2460
                },
                {
                    "text": "先",
                    "start_time": 2460,
                    "end_time": 2597
                },
                {
                    "text": "挂",
                    "start_time": 2602,
                    "end_time": 2757
                },
                {
                    "text": "断",
                    "start_time": 2802,
                    "end_time": 2957
                },
                {
                    "text": "了",
                    "start_time": 3042,
                    "end_time": 3197
                }
            ]
        },
        {
            "text": "祝您生活愉快再见",
            "start_time": 3442,
            "end_time": 4877,
            "words": [
                {
                    "text": "祝",
                    "start_time": 3442,
                    "end_time": 3580
                },
                {
                    "text": "您",
                    "start_time": 3580,
                    "end_time": 3717
                },
                {
                    "text": "生",
                    "start_time": 3722,
                    "end_time": 3877
                },
                {
                    "text": "活",
                    "start_time": 3882,
                    "end_time": 4020
                },
                {
                    "text": "愉",
                    "start_time": 4020,
                    "end_time": 4157
                },
                {
                    "text": "快",
                    "start_time": 4162,
                    "end_time": 4317
                },
                {
                    "text": "再",
                    "start_time": 4562,
                    "end_time": 4717
                },
                {
                    "text": "见",
                    "start_time": 4722,
                    "end_time": 4877
                }
            ]
        }
    ]
}
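
One common use of the `utterances` list is to produce a subtitle file. The sketch below converts the sentence-level entries into SRT text, relying only on the `text`, `start_time` and `end_time` fields documented above (times in milliseconds). SRT output is an illustrative choice rather than something the service produces, and the function names are hypothetical.

```python
def ms_to_srt_timestamp(ms):
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def utterances_to_srt(utterances):
    """Render sentence-level results as numbered SRT cues."""
    blocks = []
    for index, utt in enumerate(utterances, start=1):
        start = ms_to_srt_timestamp(utt["start_time"])
        end = ms_to_srt_timestamp(utt["end_time"])
        blocks.append(f"{index}\n{start} --> {end}\n{utt['text']}\n")
    return "\n".join(blocks)  # a blank line separates consecutive cues


sample = [
    {"text": "如果您没有其他需要举报的话这边就先挂断了", "start_time": 0, "end_time": 3197},
    {"text": "祝您生活愉快再见", "start_time": 3442, "end_time": 4877},
]
srt_text = utterances_to_srt(sample)
```

The same loop could walk `words` instead of the sentence entries when per-character highlighting is needed.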

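5. Error codes

Category / status code	Meaning	Description
0	Success
2000	Processing	The task is still being processed.
Request errors
1001	Invalid request parameters	A required field is missing / a field value is invalid / the request is a duplicate.
1002	Access denied	The token is invalid / expired / not authorized for the requested service.
1003	Rate limit exceeded	The QPS of the current appid exceeds the configured threshold.
1004	Quota exceeded	The number of requests from the current appid exceeds the limit.
1005	Server busy	The service is overloaded and cannot handle the current request.
1008 - 1009	Reserved range	To be determined.
Audio errors
1010	Audio too long	The audio duration exceeds the threshold.
1011	Audio too large	The audio data size exceeds the threshold.
1012	Invalid audio format	The audio header is malformed / the audio cannot be decoded.
1013	Silent audio	No text was recognized in the audio.
1014 - 1019	Reserved range	To be determined.
Recognition errors
1020	Recognition wait timeout	Timed out while waiting for recognition.
1021	Recognition processing timeout	Timed out while recognition was being processed.
1022	Recognition error	An error occurred during recognition.
1023 - 1029	Reserved range	To be determined.
1099	Unknown error	Uncategorized error.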
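
With non-blocking queries, a client would typically poll until the status leaves 2000 (processing). The sketch below shows one way to structure that loop. It assumes a non-blocking query returns code 2000 while the task is running, which the table implies but does not state outright. `fetch` is a caller-supplied function that performs one query and returns the decoded JSON; the name, interval and attempt limit are all illustrative.

```python
import time

SUCCESS = 0
PROCESSING = 2000


def poll_until_done(fetch, interval_s=1.0, max_attempts=30, sleep=time.sleep):
    """Call fetch() until the task succeeds, fails, or attempts run out."""
    for _ in range(max_attempts):
        result = fetch()
        code = int(result["code"])
        if code == SUCCESS:
            return result
        if code != PROCESSING:
            # Any other code is treated as a terminal failure.
            raise RuntimeError(
                f"task failed with code {code}: {result.get('message', '')}")
        sleep(interval_s)  # still processing, wait before the next query
    raise TimeoutError("task still processing after max_attempts polls")
```

Passing `fetch` and `sleep` in as arguments keeps the loop independent of any particular HTTP client and lets it be exercised without network access.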

7. Demo

All demos below use token authentication.

curl

submit

curl -X POST -H 'Accept: */*' -H 'Authorization: Bearer; ${your_token}' -H 'Connection: keep-alive' -H 'User-Agent: python-requests/2.22.0' -H 'content-type: application/json' -d '{"url": "${your_url}", "audio_text": "${your_audio_text}"}' 'https://openspeech.bytedance.com/api/v1/vc/ata/submit?appid=${your_appid}&caption_type=speech'

query

curl -X GET -H 'Accept: */*' -H 'Authorization: Bearer; ${your_token}' -H 'Connection: keep-alive' -H 'User-Agent: python-requests/2.22.0' 'https://openspeech.bytedance.com/api/v1/vc/ata/query?appid=${your_appid}&id=fbf5edcf-378e-4f54-8a7f-5d8fd5e5e2f1'

Python

URL audio version:
url_sta_demo.py
1.35KB
Binary audio version:
binary_sta_demo.py
1.39KB

Java

URL audio version:
url_sta_demo.java
2.35KB

C++

URL audio version:
url_sta_demo.cc
3.08KB