每个表都有对应的系统表,系统表包含每个表相关的元数据及信息,例如创建的快照和使用的选项。用户可以通过批的方式查询访问系统表。目前,Flink、Spark、Trino 和 StarRocks 支持查询系统表。在某些情况下,表名需要用反引号包裹以避免语法解析冲突,例如三重访问模式:
SELECT * FROM my_catalog.my_db.`my_table$snapshots`;
通过快照表可以查询表的快照历史信息,包括快照中发生的记录数。
SELECT * FROM my_table$snapshots; /* +--------------+------------+-----------------+-------------------+--------------+-------------------------+--------------------------------+------------------------------- +--------------------------------+---------------------+---------------------+-------------------------+----------------+ | snapshot_id | schema_id | commit_user | commit_identifier | commit_kind | commit_time | base_manifest_list | delta_manifest_list | changelog_manifest_list | total_record_count | delta_record_count | changelog_record_count | watermark | +--------------+------------+-----------------+-------------------+--------------+-------------------------+--------------------------------+------------------------------- +--------------------------------+---------------------+---------------------+-------------------------+----------------+ | 2 | 0 | 7ca4cd28-98e... | 2 | APPEND | 2022-10-26 11:44:15.600 | manifest-list-31323d5f-76e6... | manifest-list-31323d5f-76e6... | manifest-list-31323d5f-76e6... | 2 | 2 | 0 | 1666755855600 | | 1 | 0 | 870062aa-3e9... | 1 | APPEND | 2022-10-26 11:44:15.148 | manifest-list-31593d5f-76e6... | manifest-list-31593d5f-76e6... | manifest-list-31593d5f-76e6... | 1 | 1 | 0 | 1666755855148 | +--------------+------------+-----------------+-------------------+--------------+-------------------------+--------------------------------+------------------------------- +--------------------------------+---------------------+---------------------+-------------------------+----------------+ 2 rows in set */
通过查询快照表,可以了解表的提交和过期信息,并通过时间旅行访问历史数据。
通过模式表可以查询表的历史模式结构。
SELECT * FROM my_table$schemas; /* +-----------+--------------------------------+----------------+--------------+---------+---------+-------------------------+ | schema_id | fields | partition_keys | primary_keys | options | comment | update_time | +-----------+--------------------------------+----------------+--------------+---------+---------+-------------------------+ | 0 | [{"id":0,"name":"word","typ... | [] | ["word"] | {} | | 2022-10-28 11:44:20.600 | | 1 | [{"id":0,"name":"word","typ... | [] | ["word"] | {} | | 2022-10-27 11:44:15.600 | | 2 | [{"id":0,"name":"word","typ... | [] | ["word"] | {} | | 2022-10-26 11:44:10.600 | +-----------+--------------------------------+----------------+--------------+---------+---------+-------------------------+ 3 rows in set */
可以通过联接快照表和模式表,获取指定快照的字段信息。
SELECT s.snapshot_id, t.schema_id, t.fields FROM my_table$snapshots s JOIN my_table$schemas t ON s.schema_id=t.schema_id where s.snapshot_id=100;
通过选项表可以查询表在DDL中指定的选项信息。未显示的选项将使用默认值。
SELECT * FROM my_table$options; /* +------------------------+--------------------+ | key | value | +------------------------+--------------------+ | snapshot.time-retained | 5 h | +------------------------+--------------------+ 1 rows in set */
如需审计表的变更日志,可使用audit_log
系统表。通过该表,可以获取增量数据的rowkind
列,用于过滤等操作以完成审计。
rowkind
有四种取值:
+I
:插入操作-U
:更新操作(旧值)+U
:更新操作(新值)-D
:删除操作SELECT * FROM my_table$audit_log; /* +------------------+-----------------+-----------------+ | rowkind | column_0 | column_1 | +------------------+-----------------+-----------------+ | +I | ... | ... | +------------------+-----------------+-----------------+ | -U | ... | ... | +------------------+-----------------+-----------------+ | +U | ... | ... | +------------------+-----------------+-----------------+ 3 rows in set */
若需要极致的读取性能且可接受稍旧数据,可使用ro
(读优化)系统表。该表通过仅扫描无需合并的文件提升性能。
对于主键表,ro
系统表仅扫描最顶层文件(即最新 Full Compaction 结果)。注意不同分桶可能在不同时间进行 Full Compaction,因此不同键的值可能来自不同快照。
对于追加表,由于所有文件都无需合并,ro
系统表行为与普通追加表一致。
SELECT * FROM my_table$ro;
可查询表在特定快照下的文件信息。
-- 查询最新快照的文件 SELECT * FROM my_table$files; /* +-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+ | partition | bucket | file_path | file_format | schema_id | level | record_count | file_size_in_bytes | min_key | max_key | null_value_counts | min_value_stats | max_value_stats | min_sequence_number | max_sequence_number | creation_time | +-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+ | [3] | 0 | data-8f64af95-29cc-4342-adc... | orc | 0 | 0 | 1 | 593 | [c] | [c] | {cnt=0, val=0, word=0} | {cnt=3, val=33, word=c} | {cnt=3, val=33, word=c} | 1691551246234 | 1691551246637 |2023-02-24T16:06:21.166| | [2] | 0 | data-8b369068-0d37-4011-aa5... | orc | 0 | 0 | 1 | 593 | [b] | [b] | {cnt=0, val=0, word=0} | {cnt=2, val=22, word=b} | {cnt=2, val=22, word=b} | 1691551246233 | 1691551246732 |2023-02-24T16:06:21.166| | [2] | 0 | data-83aa7973-060b-40b6-8c8... | orc | 0 | 0 | 1 | 605 | [d] | [d] | {cnt=0, val=0, word=0} | {cnt=2, val=32, word=d} | {cnt=2, val=32, word=d} | 1691551246267 | 1691551246798 |2023-02-24T16:06:21.166| | [5] | 0 | data-3d304f4a-bcea-44dc-a13... | orc | 0 | 0 | 1 | 593 | [c] | [c] | {cnt=0, val=0, word=0} | {cnt=5, val=51, word=c} | {cnt=5, val=51, word=c} | 1691551246788 | 1691551246152 |2023-02-24T16:06:21.166| | [1] | 0 | data-10abb5bc-0170-43ae-b6a... | orc | 0 | 0 | 1 | 595 | [a] | [a] | {cnt=0, val=0, word=0} | {cnt=1, val=11, word=a} | {cnt=1, val=11, word=a} | 1691551246722 | 1691551246273 |2023-02-24T16:06:21.166| | [4] | 0 | data-2c9b7095-65b7-4013-a7a... | orc | 0 | 0 | 1 | 593 | [a] | [a] | {cnt=0, val=0, word=0} | {cnt=4, val=12, word=a} | {cnt=4, val=12, word=a} | 1691551246321 | 1691551246109 |2023-02-24T16:06:21.166| +-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+ 6 rows in set */ -- 查询指定快照的文件 SELECT * FROM my_table$files /*+ OPTIONS('scan.snapshot-id'='1') */; /* +-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+ | partition | bucket | file_path | file_format | schema_id | level | record_count | file_size_in_bytes | min_key | max_key | null_value_counts | min_value_stats | max_value_stats | min_sequence_number | max_sequence_number | creation_time | +-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+ | [3] | 0 | data-8f64af95-29cc-4342-adc... | orc | 0 | 0 | 1 | 593 | [c] | [c] | {cnt=0, val=0, word=0} | {cnt=3, val=33, word=c} | {cnt=3, val=33, word=c} | 1691551246234 | 1691551246637 |2023-02-24T16:06:21.166| | [2] | 0 | data-8b369068-0d37-4011-aa5... | orc | 0 | 0 | 1 | 593 | [b] | [b] | {cnt=0, val=0, word=0} | {cnt=2, val=22, word=b} | {cnt=2, val=22, word=b} | 1691551246233 | 1691551246732 |2023-02-24T16:06:21.166| | [1] | 0 | data-10abb5bc-0170-43ae-b6a... | orc | 0 | 0 | 1 | 595 | [a] | [a] | {cnt=0, val=0, word=0} | {cnt=1, val=11, word=a} | {cnt=1, val=11, word=a} | 1691551246267 | 1691551246798 |2023-02-24T16:06:21.166| +-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+ 3 rows in set */
通过标签表可以查询表的标签历史信息,包括标签基于的快照和快照的部分历史信息。您还可以获取所有标签名称并按名称时间旅行到特定标签数据。
SELECT * FROM my_table$tags; /* +----------+-------------+-----------+-------------------------+--------------+--------------+ | tag_name | snapshot_id | schema_id | commit_time | record_count | branches | +----------+-------------+-----------+-------------------------+--------------+--------------+ | tag1 | 1 | 0 | 2023-06-28 14:55:29.344 | 3 | [] | | tag3 | 3 | 0 | 2023-06-28 14:58:24.691 | 7 | [branch-1] | +----------+-------------+-----------+-------------------------+--------------+--------------+ 2 rows in set */
可查询包含下一个快照ID的所有消费者。
SELECT * FROM my_table$consumers; /* +-------------+------------------+ | consumer_id | next_snapshot_id | +-------------+------------------+ | id1 | 1 | | id2 | 3 | +-------------+------------------+ 2 rows in set */
可查询表的最新快照或指定快照中包含的所有清单文件。
-- 查询最新快照的清单 SELECT * FROM my_table$manifests; /* +--------------------------------+-------------+------------------+-------------------+---------------+ | file_name | file_size | num_added_files | num_deleted_files | schema_id | +--------------------------------+-------------+------------------+-------------------+---------------+ | manifest-f4dcab43-ef6b-4713... | 12365| 40 | 0 | 0 | | manifest-f4dcab43-ef6b-4713... | 1648 | 1 | 0 | 0 | +--------------------------------+-------------+------------------+-------------------+---------------+ 2 rows in set */ -- 查询指定快照的清单 SELECT * FROM my_table$manifests /*+ OPTIONS('scan.snapshot-id'='1') */; /* +--------------------------------+-------------+------------------+-------------------+---------------+ | file_name | file_size | num_added_files | num_deleted_files | schema_id | +--------------------------------+-------------+------------------+-------------------+---------------+ | manifest-f4dcab43-ef6b-4713... | 12365| 40 | 0 | 0 | +--------------------------------+-------------+------------------+-------------------+---------------+ 1 rows in set */
通过聚合字段表可以查询表的历史聚合信息。
SELECT * FROM my_table$aggregation_fields; /* +------------+-----------------+--------------+--------------------------------+---------+ | field_name | field_type | function | function_options | comment | +------------+-----------------+--------------+--------------------------------+---------+ | product_id | BIGINT NOT NULL | [] | [] | <NULL> | | price | INT | [true,count] | [fields.price.ignore-retrac... | <NULL> | | sales | BIGINT | [sum] | [fields.sales.aggregate-fun... | <NULL> | +------------+-----------------+--------------+--------------------------------+---------+ 3 rows in set */
可查询表的分区文件信息。
SELECT * FROM my_table$partitions; /* +---------------+----------------+--------------------+--------------------+------------------------+ | partition | record_count | file_size_in_bytes| file_count| last_update_time| +---------------+----------------+--------------------+--------------------+------------------------+ | [1] | 1 | 645 | 1 | 2024-06-24 10:25:57.400| +---------------+----------------+--------------------+--------------------+------------------------+ */
全局系统表包含Paimon中所有表的统计信息。为了方便查询,我们创建了一个名为sys
的系统数据库。在Flink中可通过以下SQL显示所有全局系统表:
USE sys; SHOW TABLES;
该表类似于选项表,但会显示所有数据库中所有表的选项。
SELECT * FROM sys.all_table_options; /* +---------------+--------------------------------+--------------------------------+------------------+ | database_name | table_name | key | value | +---------------+--------------------------------+--------------------------------+------------------+ | my_db | Orders_orc | bucket | -1 | | my_db | Orders2 | bucket | -1 | | my_db | Orders2 | sink.parallelism | 7 | | my_db2| OrdersSum | bucket | 1 | +---------------+--------------------------------+--------------------------------+------------------+ 7 rows in set */
通过目录选项表可查询目录的选项信息。未显示的选项将使用默认值。
SELECT * FROM sys.catalog_options; /* +-----------+---------------------------+ | key | value | +-----------+---------------------------+ | warehouse | hdfs:///path/to/warehouse | +-----------+---------------------------+ 1 rows in set */