S3 External Tables -- ByteHouse Cloud Data Warehouse Edition - Volcano Engine

S3 External Tables

Last updated: 2024.11.06 13:58:49. First published: 2024.11.06 13:58:49

The CnchS3 engine integrates ByteHouse with object storage, so that data stored on S3 can be queried and written through ByteHouse.

Usage - table engine

-- Create a table over a single file
CREATE TABLE s3_engine_table (name String, value UInt32) ENGINE=CnchS3('http://....' or 'https://' or 's3://', '[format]', '[compression]', '[ak_id, ak_secret]');

-- Create a table over multiple files using a wildcard
CREATE TABLE s3_engine_table_glob_uri (name String, value UInt32) ENGINE=CnchS3('http://....' or 'https://' or 's3://', '[format]', '[compression]', '[ak_id, ak_secret]');

-- Query
select * from s3_engine_table;
select * from s3_engine_table_glob_uri where _file = 'fileSimple*.csv';

-- Insert
insert into s3_engine_table values ('one', 1), ('two', 2), ('three', 3);

Queries can prune the set of files to read by filtering on the _path and _file columns.
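As a rough illustration of file pruning, the queries below filter the wildcard table defined above on _file and _path. The patterns are hypothetical and only show the shape of such a filter; they assume that LIKE works on these columns for table-engine queries in the same way it does in the other examples on this page.

```sql
-- Read only files whose name starts with 'fileSimple' and ends with '.csv'
-- (the pattern is a hypothetical example).
select *
from s3_engine_table_glob_uri
where _file like 'fileSimple%.csv';

-- Read only objects whose full path contains 'test'
-- (again a hypothetical pattern).
select *
from s3_engine_table_glob_uri
where _path like '%test%';
```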

CnchS3 engine parameters

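s3_path:
- s3://bucket/file_path uses the S3 storage configured by default for the ByteHouse cluster
- http[s]://s3_endpoints/bucket/prefix/*.csv accesses the object storage named in the path
Format: the file format; see "CNCH S3/HDFS external table usage" for details
- CSV
- JSON
- JSONEachRow
- Parquet
- ORC
Compression: gzip/zstd/lz4/snappy/bzip2/xz/brotli are supported. If not specified, the compression is inferred from the file suffix
access_id, access_key: the key pair the user holds for accessing the object storage. If not specified, ByteHouse uses its configured default keys, or the keys given via settings s3_access_key_id = xx, s3_access_key_secret = xxx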
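To make the parameter list more concrete, here is a minimal sketch of two table definitions. The table names, bucket, endpoint, and object paths are hypothetical placeholders. The examples assume that Format and Compression are passed positionally in the order shown in the syntax template above, and that the key arguments can be left out so that the default credentials apply.

```sql
-- s3:// path: uses the S3 storage configured for the ByteHouse cluster.
-- Format and compression are given explicitly. Bucket and path are hypothetical.
CREATE TABLE s3_events_gz (name String, value UInt32)
ENGINE = CnchS3('s3://example-bucket/events/data.csv.gz', 'CSV', 'gzip');

-- https:// path with a wildcard: accesses the endpoint named in the path.
-- Compression is omitted, so it would be inferred from the '.zst' suffix.
-- Endpoint and bucket are hypothetical.
CREATE TABLE s3_events_glob (name String, value UInt32)
ENGINE = CnchS3('https://s3.example-endpoint.com/example-bucket/events/*.csv.zst', 'CSV');
```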

Usage - table function

-- Query
select * from CnchS3('http://s3.region.amazonaws.com.cn/bucket/normal.csv', 'name String, id String');
select * from CnchS3('http://s3.region.amazonaws.com.cn/bucket/*.csv', 'name String, id String') where _path like '%test%';

-- Insert
insert into function CnchS3('http://s3.region.amazonaws.com.cn/bucket/normal_write.csv', 'name String, id String') values ('one', '1'), ('two', '2'), ('three', '3');

-- Partitioned insert
-- Make sure the path contains the '{_partition_id}' placeholder
insert into function
    CnchS3('http://s3.region.amazonaws.com.cn/bucket/{_partition_id}_partition_write.csv', 'name String, id String')
    partition by name
values ('one', '1'),('one', '2'),('two', '1'),('two', '2');

-- Check the written files (jiashuo_db.s3_glob is not defined on this page;
-- presumably it is an existing wildcard table over the same bucket)
select * from jiashuo_db.s3_glob where _file like '%partition_write%';

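For the table function, Format and Compression are optional parameters, but the schema of the data (for example 'name String, id String') must be specified.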
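One way to check a partitioned write without a pre-created table is to read the files back through the table function with a wildcard. The sketch below reuses only the endpoint, bucket, and schema from the examples above. Which files actually exist depends on how ByteHouse expands {_partition_id}, which this page does not spell out, so the 'one%' filter is an assumption that the partition id equals the value of name.

```sql
-- Read back every file produced by the partitioned insert.
-- The wildcard stands in for the '{_partition_id}' part of the path.
select *
from CnchS3('http://s3.region.amazonaws.com.cn/bucket/*_partition_write.csv', 'name String, id String');

-- Narrow the scan to a single partition file via the _file column.
-- Assumes the file for name = 'one' is named 'one_partition_write.csv'.
select *
from CnchS3('http://s3.region.amazonaws.com.cn/bucket/*_partition_write.csv', 'name String, id String')
where _file like 'one%';
```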