Ray Access to TOS: Usage Examples - E-MapReduce - Volcengine

Last updated: 2024.05.13 11:37:04. First published: 2024.04.10 13:28:29.

In Ray, data in TOS object storage can be accessed either through pyarrow.fs.S3FileSystem or through the HDFS protocol. This section gives a worked example of each approach.

Accessing TOS through pyarrow.fs.S3FileSystem

Define the FileSystem used for object storage and fill in access_key, secret_key, and endpoint_override:

# Set the correct AK, SK, and endpoint (replace the "xxxx" placeholders).
import ray
from pyarrow import fs

s3 = fs.S3FileSystem(
    access_key="xxxx",
    secret_key="xxxx",
    endpoint_override="tos-s3-cn-beijing.ivolces.com",
    force_virtual_addressing=True,
)

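In this example, access_key and secret_key are the Access Key ID and Secret Access Key used to access TOS, which can be obtained from the key list. endpoint_override is the S3 Endpoint value of the TOS access domain name (see Regions and Access Domain Names). Fill in all three according to your environment.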
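Hard-coded keys are easy to leak when a script is shared. As a minimal sketch, the constructor arguments can be assembled from environment variables instead. The helper s3_options and the variable names TOS_ACCESS_KEY, TOS_SECRET_KEY, and TOS_S3_ENDPOINT are hypothetical and not part of Ray, pyarrow, or TOS; the default endpoint is simply the value used in the example above.

```python
import os

# Default taken from the example above; adjust it to your region.
DEFAULT_S3_ENDPOINT = "tos-s3-cn-beijing.ivolces.com"


def s3_options(env=None):
    """Build keyword arguments for pyarrow.fs.S3FileSystem.

    The environment variable names are hypothetical; use whatever
    naming your deployment already follows.
    """
    env = os.environ if env is None else env
    required = ("TOS_ACCESS_KEY", "TOS_SECRET_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "access_key": env["TOS_ACCESS_KEY"],
        "secret_key": env["TOS_SECRET_KEY"],
        "endpoint_override": env.get("TOS_S3_ENDPOINT", DEFAULT_S3_ENDPOINT),
        "force_virtual_addressing": True,
    }


# Usage (requires pyarrow):
#   from pyarrow import fs
#   s3 = fs.S3FileSystem(**s3_options())
```

Failing early on a missing variable keeps a misconfigured job from reaching TOS with empty credentials.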

Write data to object storage:

# Create a 100-row dataset and write it to the bucket as CSV files.
ds = ray.data.range(100)
ds.write_csv(path="{bucket_name}/test_tos_data", filesystem=s3)

In this example, {bucket_name} is the name of the TOS bucket to access.
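The path in the example has the form bucket/key with no scheme prefix, because the filesystem object is passed explicitly. A small sketch of building such a path follows; the helper tos_s3_path is hypothetical and only illustrates the string format used above.

```python
def tos_s3_path(bucket_name, key):
    """Return "bucket/key" without a scheme, as used in the example above."""
    bucket = bucket_name.strip("/")
    if not bucket or "/" in bucket:
        raise ValueError("bucket_name must be a single non-empty path segment")
    # Drop leading slashes so the result never contains "bucket//key".
    return f"{bucket}/{key.lstrip('/')}"


# Usage with the s3 filesystem defined earlier ("my-bucket" is a placeholder):
#   ds.write_csv(path=tos_s3_path("my-bucket", "test_tos_data"), filesystem=s3)
```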

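Read the CSV files stored in object storage and print one row:

# Read back the CSV files written above and show the first row.
ds = ray.data.read_csv(paths="{bucket_name}/test_tos_data", filesystem=s3)
ds.show(limit=1)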

Accessing TOS through the HDFS protocol

Define a pyarrow.fs.HadoopFileSystem object and fill in the endpoint, access-key-id, secret-access-key, and bucket_name:

import ray
from pyarrow import fs

ray.init()

# Replace {bucket_name} and the "xxxx" placeholders with your own values.
hdfs_fs = fs.HadoopFileSystem(
    host="tos://{bucket_name}",
    extra_conf={
        "fs.AbstractFileSystem.tos.impl": "io.proton.tos.TOS",
        "fs.tos.impl": "io.proton.fs.RawFileSystem",
        "fs.tos.endpoint": "tos-cn-beijing.ivolces.com",
        "fs.tos.credentials.provider": "io.proton.common.object.tos.auth.DefaultCredentialsProviderChain",
        "fs.tos.access-key-id": "xxxx",
        "fs.tos.secret-access-key": "xxxx",
    },
)

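In this example, fs.tos.access-key-id and fs.tos.secret-access-key are the Access Key ID and Secret Access Key used to access TOS, which can be obtained from the key list. fs.tos.endpoint is the Endpoint value of the TOS access domain name, and {bucket_name} is the name of the TOS bucket to access. Fill in all of these according to your environment.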
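The extra_conf dictionary mixes fixed class names with per-user values. As a rough sketch, the per-user values can be pulled from environment variables while the fixed entries are copied from the example above. The helper tos_hadoop_conf and the variable names TOS_ACCESS_KEY, TOS_SECRET_KEY, and TOS_ENDPOINT are hypothetical.

```python
import os

# Default taken from the example above; adjust it to your region.
DEFAULT_TOS_ENDPOINT = "tos-cn-beijing.ivolces.com"


def tos_hadoop_conf(env=None):
    """Build the extra_conf dict for pyarrow.fs.HadoopFileSystem.

    The environment variable names are hypothetical; the configuration
    keys and class names are the ones shown in the example above.
    """
    env = os.environ if env is None else env
    return {
        "fs.AbstractFileSystem.tos.impl": "io.proton.tos.TOS",
        "fs.tos.impl": "io.proton.fs.RawFileSystem",
        "fs.tos.endpoint": env.get("TOS_ENDPOINT", DEFAULT_TOS_ENDPOINT),
        "fs.tos.credentials.provider": "io.proton.common.object.tos.auth.DefaultCredentialsProviderChain",
        # Indexing (not .get) raises KeyError if a credential is not set.
        "fs.tos.access-key-id": env["TOS_ACCESS_KEY"],
        "fs.tos.secret-access-key": env["TOS_SECRET_KEY"],
    }


# Usage (requires pyarrow and a Hadoop client; "my-bucket" is a placeholder):
#   from pyarrow import fs
#   hdfs_fs = fs.HadoopFileSystem(host="tos://my-bucket", extra_conf=tos_hadoop_conf())
```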

Write data to object storage:

# Create a 100-row dataset and write it to /test_data as CSV files.
ds = ray.data.range(100)
ds.write_csv(path="/test_data", filesystem=hdfs_fs)

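Read the CSV files stored in object storage and print one row:

# Read back the CSV files written above and show the first row.
ds = ray.data.read_csv(paths="/test_data", filesystem=hdfs_fs)
ds.show(limit=1)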
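In the HDFS-protocol example the bucket appears in the host argument ("tos://{bucket_name}") while read and write paths such as "/test_data" start with a slash and omit the bucket. Assuming that split, a full tos:// URI can be separated into the two parts as sketched below; the helper split_tos_uri is hypothetical.

```python
from urllib.parse import urlsplit


def split_tos_uri(uri):
    """Split "tos://bucket/key" into (host, path) as used in the example above."""
    parts = urlsplit(uri)
    if parts.scheme != "tos" or not parts.netloc:
        raise ValueError("expected a URI of the form tos://bucket/key")
    host = f"tos://{parts.netloc}"
    path = parts.path or "/"  # a bare bucket URI maps to the root path
    return host, path


# Usage ("my-bucket" is a placeholder):
#   host, path = split_tos_uri("tos://my-bucket/test_data")
#   hdfs_fs = fs.HadoopFileSystem(host=host, extra_conf={...})
#   ds = ray.data.read_csv(paths=path, filesystem=hdfs_fs)
```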