HDFS External Table - ByteHouse Cloud Data Warehouse Edition - Volcano Engine

HDFS External Table

Last updated: 2024.11.06 13:58:49. First published: 2024.11.06 13:58:49

This engine integrates with the Apache Hadoop ecosystem and lets you manage data stored on HDFS through ByteHouse.

Usage - table engine

# Create a table backed by a single file
CREATE TABLE hdfs_engine_table (name String, value UInt32) ENGINE=CnchHDFS('hdfs://ip:port/prefix/file.csv', 'CSV', '[compression]') settings key=value

# Create a table backed by multiple files, using a wildcard
CREATE TABLE hdfs_engine_table_glob_uri (name String, value UInt32) ENGINE=CnchHDFS('hdfs://ip:port/prefix/test/file*.csv', 'CSV', '[compression]') settings key=value

# Query
select * from hdfs_engine_table
select * from hdfs_engine_table_glob_uri where _file = 'fileSimple*.csv'

# Insert
insert into hdfs_engine_table values ('one', 1), ('two', 2), ('three', 3)

Queries can be pruned by filtering on the _path and _file columns.
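
As an illustration of pruning, the sketch below filters the multi-file table hdfs_engine_table_glob_uri created above. The file-name prefix 'file2024' is a hypothetical naming scheme, and using like on _file is an assumption modeled on the _path like example in this document.

```sql
-- Keep only files whose name starts with "file2024" (hypothetical naming).
select name, value from hdfs_engine_table_glob_uri where _file like 'file2024%';

-- Keep only files whose full path contains the /prefix/test/ directory.
select name, value from hdfs_engine_table_glob_uri where _path like '%/prefix/test/%';
```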

CnchHDFS engine parameters

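hdfs_path: a path to a specific file, or a path with wildcards that selects multiple files
Format: the file format (see the CNCH S3/HDFS external table usage guide)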
- CSV
- JSON
- JSONEachRow
- Parquet
- ORC
Compression: supports gzip/zstd/lz4/snappy/bzip2/xz/brotli. If not specified, it is inferred from the file suffix
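
To make the parameters concrete, here is a minimal sketch with hypothetical table names and paths. It assumes that a ".gz" suffix is recognized as gzip when the compression argument is left out; the exact suffix mapping is not listed here.

```sql
-- Compression omitted: it would be inferred from the ".gz" file suffix (assumed mapping).
CREATE TABLE hdfs_events_csv (name String, value UInt32) ENGINE=CnchHDFS('hdfs://ip:port/prefix/events/part*.csv.gz', 'CSV');

-- Same files, with the compression stated explicitly.
CREATE TABLE hdfs_events_csv_explicit (name String, value UInt32) ENGINE=CnchHDFS('hdfs://ip:port/prefix/events/part*.csv.gz', 'CSV', 'gzip');
```

Stating the compression explicitly avoids relying on suffix inference when file names do not follow a conventional extension.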

Usage - table function

# Query
select * from CnchHDFS('hdfs://ip:port/prefix/test/file.csv', 'city String, name String', '[format]', '[compression]')
select * from CnchHDFS('hdfs://ip:port/prefix/test/file*.csv', 'city String, name String', '[format]', '[compression]') where _path like '%test%'


# Insert
insert into function CnchHDFS('hdfs://ip:port/prefix/test/file.csv', 'city String, name String', '[format]', '[compression]') values ('one', '1'), ('two', '2'), ('three', '3')

# Partitioned insert
# Make sure the path contains the '{_partition_id}' placeholder
insert into function CnchHDFS('hdfs://ip:port/prefix/test/{_partition_id}_file.csv', 'city String, name String', '[format]', '[compression]') partition by city values ('one', '1'), ('two', '2'), ('three', '3')

Format and Compression are optional parameters. The schema of the data must be specified.
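
The optional arguments can be illustrated with the sketch below, which reuses the placeholder path and schema from the examples above. How the format is determined when it is omitted is not stated here, so the second query is an assumption about valid usage.

```sql
-- Format stated, compression omitted.
select city, name from CnchHDFS('hdfs://ip:port/prefix/test/file.csv', 'city String, name String', 'CSV');

-- Both optional arguments omitted; only the path and the schema remain.
select city, name from CnchHDFS('hdfs://ip:port/prefix/test/file.csv', 'city String, name String');
```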