Tokenization API - Volcano Ark Large Model Service Platform - Volcengine

Tokenization API

Last updated: 2025.04.09 16:00:44. First published: 2024.09.05 11:06:24.

POST https://ark.cn-beijing.volces.com/api/v3/tokenization

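Call this API to convert text into token IDs that the model can understand. The response includes the number of tokens in the text, the token IDs, and each token's offset in the original text.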

Debugging

API Explorer

You can use API Explorer to make calls online and get results quickly, without handling signature generation yourself.

Authentication

This API supports API Key authentication. For details, see the documentation on signature authentication methods.

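Request parameters

Request body

Parameter	Type	Required	Description	Example
model	String	Yes	The Model ID or inference endpoint ID (Endpoint ID) of the model used for this request.	doubao-pro-32k-241215 or ep-20240918***_***
text	String or Array of String	Yes	The list of texts to tokenize.	["天空为什么这么蓝", "花儿为什么这么香"]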
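The request body described above can be assembled as in the following minimal sketch. The helper name `build_tokenization_payload` is hypothetical, and always sending `text` as a list (even for a single string) is an assumption based on the example value in the table, not something the documentation states.

```python
import json
from typing import List, Union


def build_tokenization_payload(model: str, text: Union[str, List[str]]) -> str:
    """Return a JSON request body with the two required fields, model and text."""
    if not model:
        raise ValueError("model is required")
    # Accept a single string or a list of strings; this sketch always sends a list.
    texts = [text] if isinstance(text, str) else list(text)
    if not texts or not all(isinstance(t, str) for t in texts):
        raise ValueError("text must be a string or a non-empty list of strings")
    # ensure_ascii=False keeps non-ASCII characters readable in the body.
    return json.dumps({"model": model, "text": texts}, ensure_ascii=False)
```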

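Response parameters

Response body

Parameter	Type	Description	Example
id	String	Unique identifier of this request.	2024122611112****
model	String	Name and version of the model actually used for this request.	doubao-pro-32k-241215
created	Integer	Unix timestamp (in seconds) of when this request was created.	1724902147
object	String	Always `list`.	list
data	Array of Tokenization	The tokenization output for this request.	-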

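Data structures

Tokenization

Parameter	Type	Description	Example
index	Integer	Sequence number of the tokenization result, corresponding to the order of items in the `text` request parameter.	0
object	String	Always `tokenization`.	tokenization
total_tokens	Integer	Total number of tokens for the corresponding text.	4
token_ids	Array of Integer	Vocabulary IDs of the tokens produced by tokenizing the text.	[14539, 4752, 5189, 5399]
offset_mapping	Array of Array of Integer	Offsets of the tokens in the original text. Each element is a list of two integers: the first is the start index of the token in the original text (0-based), and the second is the end index (exclusive).	[[0, 2], [2, 5], [5, 7], [7, 8]]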
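To illustrate how `offset_mapping` relates to the input, the sketch below slices the original text with each `[start, end)` pair. It assumes the offsets count characters (Python string indices), which matches the example values for an 8-character string but is not stated explicitly in the documentation. The function name `split_by_offsets` is hypothetical.

```python
from typing import List


def split_by_offsets(text: str, offset_mapping: List[List[int]]) -> List[str]:
    """Recover the text piece for each token; end indices are exclusive."""
    return [text[start:end] for start, end in offset_mapping]


pieces = split_by_offsets("天空为什么这么蓝", [[0, 2], [2, 5], [5, 7], [7, 8]])
print(pieces)  # ['天空', '为什么', '这么', '蓝']
```

Because consecutive pairs share a boundary in this example, joining the pieces reproduces the original text.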

Request example

curl -X POST  https://ark.cn-beijing.volces.com/api/v3/tokenization \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ea764f0f-3b60-45b3-****-************" \
  -d '{
    "model":"doubao-pro-32k-241215",
    "text":["天空为什么这么蓝"]
  }'
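The same call can be sketched in Python using only the standard library. The function names `make_request` and `tokenize`, the timeout value, and reading the key from an `ARK_API_KEY` environment variable are illustrative assumptions; the URL, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request
from typing import List

ARK_TOKENIZATION_URL = "https://ark.cn-beijing.volces.com/api/v3/tokenization"


def make_request(api_key: str, model: str, texts: List[str]) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"model": model, "text": texts}).encode("utf-8")
    return urllib.request.Request(
        ARK_TOKENIZATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def tokenize(api_key: str, model: str, texts: List[str], timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response."""
    req = make_request(api_key, model, texts)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Example usage (performs a real network call, so it is left commented out):
# import os
# result = tokenize(os.environ["ARK_API_KEY"], "doubao-pro-32k-241215", ["天空为什么这么蓝"])
# print(result["data"][0]["total_tokens"])
```

Splitting request construction from sending makes it possible to inspect the headers and body locally before any call is made.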

Response example

{
        "object": "list",
        "id": "021718067849899d92fcbe0865fdffdde********************",
        "model": "doubao-pro-32k-241215",
        "data": [
                {
                        "object": "tokenization",
                        "index": 0,
                        "total_tokens": 4,
                        "token_ids": [
                                14539,
                                4752,
                                5189,
                                5399
                        ],
                        "offset_mapping": [
                                [
                                        0,
                                        2
                                ],
                                [
                                        2,
                                        5
                                ],
                                [
                                        5,
                                        7
                                ],
                                [
                                        7,
                                        8
                                ]
                        ]
                }
        ],
        "created": 1724902147
}
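A response of this shape can be paired back with the input texts through the `index` field. The sketch below is one possible way to do that; the helper name `token_counts` is hypothetical, and the sample dictionary is a trimmed copy of the response above (the `id` field is omitted).

```python
from typing import List, Tuple


def token_counts(response: dict, texts: List[str]) -> List[Tuple[str, int]]:
    """Pair each input text with its total_tokens, ordered by result index."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [(texts[item["index"]], item["total_tokens"]) for item in items]


sample_response = {
    "object": "list",
    "model": "doubao-pro-32k-241215",
    "data": [
        {
            "object": "tokenization",
            "index": 0,
            "total_tokens": 4,
            "token_ids": [14539, 4752, 5189, 5399],
            "offset_mapping": [[0, 2], [2, 5], [5, 7], [7, 8]],
        }
    ],
    "created": 1724902147,
}

counts = token_counts(sample_response, ["天空为什么这么蓝"])
print(counts)  # [('天空为什么这么蓝', 4)]
```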

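Error handling

Error response

For the response structure and parameter descriptions returned when a call to this API fails, see the response structure documentation.

Error codes

For the error codes of this API, see the common error codes documentation.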