ByteHouse集群空间使用率是日常运维中重点关注的监控项之一。集群存储空间的不足可能导致严重后果,例如数据无法写入、无法备份,以及存储空间扩容任务耗时过长等。本文介绍如何通过SQL语句查看ByteHouse集群的磁盘空间使用情况。
以下示例以 192.18.. 节点为基础环境,在实际使用过程中,请根据您的场景修改对应参数。如果您不知道如何获取节点名称,可以通过以下方式获取。
SELECT * FROM system.clusters;
通过查看表数据大小,可以识别表占用磁盘空间的大小,从而为您优化数据库性能以及合理规划存储资源提供有效分析依据。
查看192.18.*.*
节点下每个表活跃数据的情况。
SELECT `database`, table, formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed, --压缩数据大小 formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed, --未压缩数据大小 round(usize / size, 2) AS compr_rate, --压缩率 sum(rows) AS rows, --总行数 count() AS part_count --part数量 FROM clusterAllReplicas('<cluster_name>', system.parts) WHERE (active = 1) AND (table LIKE '%') AND (`database` LIKE '%') AND host() = '192.18.*.*' GROUP BY `database`, table ORDER BY size DESC;
查看每个副本中的表数据。
SELECT hostname() AS h, `database` , table, count(*) AS data_part_cnt, --数据部分总量 sum(rows) AS total_rows, --总行数 formatReadableSize(sum(bytes_on_disk)) AS total_compressed_bytes, --压缩数据大小 sum(data_uncompressed_bytes) AS total_uncompressed_bytes --未压缩数据大小 FROM clusterAllReplicas('<cluster_name>', system.parts) WHERE active = 1 GROUP BY h, `database`, table ORDER BY total_rows DESC;
查看集群中占用磁盘空间排名前十的表。
SELECT `database`, table, sum(bytes_on_disk) AS bytes_on_disk FROM clusterAllReplicas('<cluster_name>', system.parts) WHERE active AND (`database` != 'system') GROUP BY `database`, table ORDER BY bytes_on_disk DESC LIMIT 10;
查询模板
SELECT `database`, table, column, formatReadableSize(sum(column_data_compressed_bytes) AS size) AS compressed, formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed, round(usize / size, 2) AS compr_rate, sum(rows) rows_cnt, round(sum(column_data_uncompressed_bytes)/sum(rows) ,2) avg_row_size FROM clusterAllReplicas('<cluster_name>', system.parts_columns) WHERE (active = <active_type>) AND (table LIKE '<table_name>') AND host() = '<node_name>' GROUP BY `database`, table, column ORDER BY size DESC;
参数说明
参数 | 说明 |
---|---|
cluster_name | 集群名称 |
table_name | 目标数据表表名。 |
node_name | 目标集群节点名。 |
active_type | 数据是否活跃。
|
示例
查看192.18.*.*
节点上query_log
表中存储活跃数据的列。
SELECT `database`, table, column, formatReadableSize(sum(column_data_compressed_bytes) AS size) AS compressed, formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed, round(usize / size, 2) AS compr_rate, sum(rows) rows_cnt, round(sum(column_data_uncompressed_bytes)/sum(rows) ,2) avg_row_size FROM clusterAllReplicas('<cluster_name>', system.parts_columns) WHERE (active = 1) AND (table LIKE 'query_log') AND host() = '192.18.*.*' GROUP BY `database`, table, column ORDER BY size DESC;
查询模板
SELECT partition AS `分区`, sum(rows) AS `总行数`, formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`, formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`, round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率` FROM clusterAllReplicas('<cluster_name>', system.parts) WHERE (database IN ('<database_name>')) AND (table IN ('<table_name>')) AND (partition LIKE '<partition_prefix>') GROUP BY partition ORDER BY partition ASC
参数说明
参数 | 说明 |
---|---|
cluster_name | 集群名称 |
database_name | 数据库名。 |
table_name | 数据表表名。 |
partition_prefix | 分区前缀。 |
示例
查看<cluster name>
数据库中test
表分区前缀为2019-12-
的分区信息。
SELECT partition AS `分区`, sum(rows) AS `总行数`, formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`, formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`, round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率` FROM clusterAllReplicas('<cluster_name>', system.parts) WHERE (`database` IN ('<cluster name>')) AND (table IN ('test')) AND (partition LIKE '2019-12-%') GROUP BY partition ORDER BY partition ASC
在ByteHouse中,system.part
表存储了datapart
的状态、大小、创建时间等信息。您可以通过此表了解datapart
的详细信息。
活跃的datapart
是表中当前被活跃使用的数据。了解活跃数据的数据量大小,可以帮助您识别表的实际数据大小,从而识别表占用磁盘空间的大小。
查看192.18.*.*
节点上所有非系统表的活跃数据分区信息。
SELECT `database`, table, count(*) AS data_part_cnt, --活跃数据分区的数量 sum(rows) AS total_rows, --总行数 formatReadableSize(sum(bytes_on_disk)) AS total_compressed_bytes, --磁盘上的总压缩数据大小 sum(data_uncompressed_bytes) AS total_uncompressed_bytes --未压缩的总数据大小 FROM clusterAllReplicas('<cluster_name>', system.parts ) WHERE active = 1 AND `database` != 'system' AND host() = '192.18.*.*' GROUP BY `database`, table ORDER BY total_rows DESC;
非活跃的datapart
可能包含过时或已标记为删除的数据。若这些数据不再需要,建议及时清理,以减少磁盘占用。
查看192.18.*.*
节点上所有非系统表的非活跃数据分区信息。
SELECT `database`, table, count(*) AS data_part_cnt, --非活跃数据分区数量 sum(rows) AS total_rows, --总行数 formatReadableSize(sum(bytes_on_disk)) AS total_compressed_bytes, --磁盘上的压缩数据总大小 sum(data_uncompressed_bytes) AS total_uncompressed_bytes --未压缩数据总大小 FROM clusterAllReplicas('<cluster_name>', system.parts ) WHERE active = 0 AND `database` != 'system' AND host() = '192.18.*.*' GROUP BY `database`, table;
Projections是ByteHouse中一种用于优化查询性能的数据结构,其类似于物化视图,存储了预先计算的聚合或者数据变换结果。了解Projections的大小有助于评估其对磁盘空间的影响。
查看192.18.*.*
节点上test
表的Projections占用磁盘空间大小。
SELECT database, table, name, formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed, --压缩数据大小 formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed, --未压缩数据大小 round(usize / size, 2) AS compr_rate, --压缩比率 sum(rows) AS rows, --总行数 count() AS part_count --part总数 FROM clusterAllReplicas('<cluster_name>', system.projection_parts ) WHERE (table = 'test') AND (active = 1) AND host() = '192.18.*.*' GROUP BY database, table, name ORDER BY size DESC;
查看192.18.*.*
节点上test
表的每个Projection列占用磁盘空间大小。
SELECT database, table, column, formatReadableSize(sum(column_data_compressed_bytes) AS size) AS compressed, --压缩数据大小 formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed, --未压缩数据大小 round(usize / size, 2) AS compr_rate --压缩比率 FROM clusterAllReplicas('<cluster_name>', system.projection_parts_columns ) WHERE (active = 1) AND (table LIKE 'test') AND host() = '192.18.*.*' GROUP BY database, table, column ORDER BY size DESC;