在 Trino 或 Presto 上可以实现对 Icerberg 表的操作。火山引擎 E-MapReduce(EMR)默认配置了Trino(或 Presto)和 Icerberg 的连接器。本文介绍下在 Trino 中操作 Iceberg 表,同样的方法也可以操作在 Presto 中操作。
参考 Trino组件操作, 在 Trino的 cli 命令后面添加下 Iceberg 的连接信息
--catalog iceberg --schema default
参数描述说明如下:
参数 | 说明 |
---|---|
catalog iceberg: | iceberg 为使用的 Iceberg 的 catalog 名称 |
schema default | default 为使用的 Iceberg 的 schema 名称 |
CREATE TABLE iceberg.default.customer_orders ( order_id BIGINT, order_date DATE, account_number BIGINT, customer VARCHAR, country VARCHAR ) WITH ( format = 'ORC', partitioning = ARRAY['bucket(account_number, 10)','country'] )
INSERT INTO iceberg.default.customer_orders VALUES (2, CAST('2021-01-12' AS DATE), 670011, 'bcd', 'ca'); INSERT INTO iceberg.default.customer_orders VALUES (2, CAST('2021-01-12' AS DATE), 670012, 'bcde', 'cn');
查询语句示例:
SELECT * FROM iceberg.default.customer_orders;
输出示例:
order_id | order_date | account_number | customer | country ----------+------------+----------------+----------+--------- 1 | 2021-10-11 | 1002123 | abc | us 2 | 2021-01-12 | 670011 | bcd | ca 2 | 2022-02-11 | 9070011 | efg | cn 2 | 2022-01-01 | 560012 | zxc | us (4 rows)
增加筛选条件:
SELECT * FROM iceberg.default.customer_orders WHERE account_number = 670012;
EXPLAIN ANALYZE SELECT * SELECT iceberg.default.customer_orders WHERE account_number = 670012;
DESC customer_orders;
查看快照:
SELECT * FROM "customer_orders$snapshots"
输出示例:
committed_at | snapshot_id | parent_id | operation | ---------------------------------------+---------------------+---------------------+-----------+------------------------------------------------------------------ 2022-01-13 13:34:00.657 Asia/Shanghai | 1942301704603642054 | NULL | append | hdfs://emr-master-1-1:8020/warehouse/tablespace/managed/hive/custom 2022-01-13 13:39:18.665 Asia/Shanghai | 2127546335909550653 | 1942301704603642054 | append | hdfs://emr-master-1-1:8020/warehouse/tablespace/managed/hive/custom 2022-01-17 19:56:28.807 Asia/Shanghai | 6215604771056148566 | 2127546335909550653 | append | hdfs://emr-master-1-1:8020/warehouse/tablespace/managed/hive/custom 2022-01-17 19:57:12.160 Asia/Shanghai | 7079051850206849569 | 6215604771056148566 | append | hdfs://emr-master-1-1:8020/warehouse/tablespace/managed/hive/custom 2022-01-17 19:57:55.779 Asia/Shanghai | 3727192433427494890 | 7079051850206849569 | append | hdfs://emr-master-1-1:8020/warehouse/tablespace/managed/hive/custom (5 rows)
时间旅行查询:
SELECT count(*) FROM "customer_orders@6215604771056148566"; _col0 ------- 2 (1 row) SELECT count(*) FROM "customer_orders@3727192433427494890"; _col0 ------- 4 (1 row)
CALL iceberg.system.rollback_to_snapshot('default', 'customer_orders', 3727192433427494890);
SELECT * FROM "customer_orders$partitions";
partition | record_count | file_count | total_size | ---------------------------------------+--------------+------------+------------+--------------------------------------------------------------------------------- {account_number_bucket=9, country=us} | 1 | 1 | 714 | {order_id={min=2, max=2, null_count=0}, order_date={min=2022-01-01, max=2022-01- {account_number_bucket=2, country=ca} | 1 | 1 | 711 | {order_id={min=2, max=2, null_count=0}, order_date={min=2021-01-12, max=2021-01- {account_number_bucket=1, country=us} | 1 | 1 | 711 | {order_id={min=1, max=1, null_count=0}, order_date={min=2021-10-11, max=2021-10- {account_number_bucket=5, country=cn} | 1 | 1 | 716 | {order_id={min=2, max=2, null_count=0}, order_date={min=2022-02-11, max=2022-02-