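The DolphinScheduler resource center supports object storage. According to the open-source project's official documentation, it already supports S3, OSS, and others. EMR on ECS integrates the open-source component DolphinScheduler 3.1.9. To broaden the product ecosystem, this document describes how to configure Volcengine TOS as the external object store for the resource center.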
Note
This document uses Proton 2.2.1 as an example.
wget https://proton-pkgs.tos-cn-beijing.volces.com/public/proton-2.2.1-bin.tar.gz
Example: after the package is extracted, its plugins directory contains proton-hadoop3-bundle-2.2.1.jar.
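The extraction step can be sketched as follows. This is a minimal sketch, not an official procedure: the function name extract_proton_bundle is hypothetical, and because the internal layout of the archive is assumed rather than documented here, the function searches for the bundle jar instead of hard-coding its path.

```shell
# Extract a Proton release archive and print the path of the Hadoop 3 bundle jar.
# Usage: extract_proton_bundle <archive.tar.gz> <dest_dir>
# NOTE: hypothetical helper; the archive layout is an assumption, so we search for the jar.
extract_proton_bundle() {
    local archive="$1"
    local dest="$2"
    local jar

    mkdir -p "$dest" || return 1
    tar -xzf "$archive" -C "$dest" || return 1

    # Take the first matching bundle jar found under the destination directory.
    jar="$(find "$dest" -type f -name 'proton-hadoop3-bundle-*.jar' | head -n 1)"
    if [ -z "$jar" ]; then
        echo "bundle jar not found under $dest" >&2
        return 1
    fi
    printf '%s\n' "$jar"
}
```

For instance, extract_proton_bundle proton-2.2.1-bin.tar.gz ./proton would print the location of the jar if the archive contains one.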
Place proton-hadoop3-bundle-2.2.1.jar in the libs directory of both api-server and worker-server.
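Because the same jar must end up in two directories, copying is more practical than moving. The sketch below assumes that api-server and worker-server each have a libs subdirectory under a common DolphinScheduler home directory; that home path depends on the installation and is passed in as an argument. The function name install_proton_jar is hypothetical.

```shell
# Copy the Proton bundle jar into the libs directory of api-server and worker-server.
# Usage: install_proton_jar <path/to/bundle.jar> <dolphinscheduler_home>
# NOTE: hypothetical helper; <dolphinscheduler_home>/<server>/libs is an assumed layout.
install_proton_jar() {
    local jar="$1"
    local ds_home="$2"
    local server

    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar" >&2
        return 1
    fi

    for server in api-server worker-server; do
        if [ ! -d "$ds_home/$server/libs" ]; then
            echo "missing directory: $ds_home/$server/libs" >&2
            return 1
        fi
        cp "$jar" "$ds_home/$server/libs/" || return 1
    done
}
```

A JVM only loads jars from its classpath at startup, so the two services would normally need a restart before the new jar takes effect.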
Update common.properties for both api-server and worker-server as follows.
# resource storage type: HDFS, S3, OSS, NONE
resource.storage.type=HDFS
# resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
resource.storage.upload.base.path=/dolphinscheduler
# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
resource.hdfs.root.user=hdfs
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
# resource storage path
resource.hdfs.fs.defaultFS=tos://${tos_bucket}/dolphinscheduler
# use the Proton implementation to access TOS
fs.tos.impl=io.proton.fs.ProtonFileSystem
fs.AbstractFileSystem.tos.impl=io.proton.fs.ProtonFS
# endpoint of the TOS bucket, using Beijing as an example; choose the internal or public domain as appropriate
fs.tos.endpoint=tos-cn-beijing.volces.com
# region of the TOS bucket
fs.tos.region=cn-beijing
# AK/SK used to access TOS
fs.tos.access-key-id=xxx
fs.tos.secret-access-key=xxx
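Since the same settings have to be present in two separate files, a quick consistency check can help catch a missed key. The following is a minimal sketch, assuming the key list shown above is complete for this setup; the function name check_tos_properties is hypothetical. It only checks that each key is set, not that the values are valid.

```shell
# Report which TOS-related keys are missing from a common.properties file.
# Usage: check_tos_properties <path/to/common.properties>
# Returns 0 if every key is set, 1 otherwise. NOTE: hypothetical helper.
check_tos_properties() {
    local file="$1"
    local key
    local pattern
    local missing=0

    for key in \
        resource.storage.type \
        resource.storage.upload.base.path \
        resource.hdfs.fs.defaultFS \
        fs.tos.impl \
        fs.AbstractFileSystem.tos.impl \
        fs.tos.endpoint \
        fs.tos.region \
        fs.tos.access-key-id \
        fs.tos.secret-access-key
    do
        # Escape dots so they match literally, then look for "key=" on a non-comment line.
        pattern="$(printf '%s' "$key" | sed 's/\./\\./g')"
        if ! grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$file"; then
            echo "missing: $key"
            missing=1
        fi
    done
    return "$missing"
}
```

Running it once against the api-server file and once against the worker-server file would show whether either copy lacks a setting.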