You need to enable JavaScript to run this app.
导航
性能诊断之磁盘空间分析
最近更新时间:2025.03.05 11:03:41首次发布时间:2025.03.05 11:03:41
我的收藏
有用
有用
无用
无用

ByteHouse集群空间使用率是日常运维中重点关注的监控项之一。集群存储空间的不足可能导致严重后果,例如数据无法写入、无法备份,以及存储空间扩容任务耗时过长等。本文介绍如何通过SQL语句查看ByteHouse集群的磁盘空间使用情况。

示例环境

以下示例以 192.18.. 节点为基础环境,在实际使用过程中,请根据您的场景修改对应参数。如果您不知道如何获取节点名称,可以通过以下方式获取。

  • 通过SQL语句:您可以执行以下语句,获取集群所有节点的名称。
SELECT * FROM system.clusters;

查看表占用磁盘空间大小

通过查看表数据大小,可以识别表占用磁盘空间的大小,从而为您优化数据库性能以及合理规划存储资源提供有效分析依据。

  • 查看表数据详情

查看192.18.*.*节点下每个表活跃数据的情况。

SELECT
    `database`,
    table,
    formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed, --压缩数据大小
    formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed, --未压缩数据大小
    round(usize / size, 2) AS compr_rate, --压缩率
    sum(rows) AS rows, --总行数
    count() AS part_count --part数量
FROM clusterAllReplicas('<cluster_name>', system.parts)
WHERE (active = 1) AND (table LIKE '%') AND (`database` LIKE '%') 
AND host() = '192.18.*.*'
GROUP BY
    `database`,
    table
ORDER BY size DESC;
  • 查看副本表数据详情

查看每个副本中的表数据。

SELECT
    hostname() AS h,
   `database` ,
    table,
    count(*) AS data_part_cnt, --数据部分总量
    sum(rows) AS total_rows, --总行数
    formatReadableSize(sum(bytes_on_disk)) AS total_compressed_bytes, --压缩数据大小
    sum(data_uncompressed_bytes) AS total_uncompressed_bytes  --未压缩数据大小
FROM clusterAllReplicas('<cluster_name>', system.parts) 
WHERE active = 1
GROUP BY h, `database`, table
ORDER BY total_rows DESC;
  • 查看表占用磁盘空间排行

查看集群中占用磁盘空间排名前十的表。

SELECT
    `database`,
    table,
    sum(bytes_on_disk) AS bytes_on_disk
FROM clusterAllReplicas('<cluster_name>', system.parts)
WHERE active AND (`database` != 'system')
GROUP BY
    `database`,
    table
ORDER BY bytes_on_disk DESC
LIMIT 10;

查看表各列的存储信息

查询模板

SELECT
    `database`,
    table,
    column,
    formatReadableSize(sum(column_data_compressed_bytes) AS size) AS compressed,
    formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed,
    round(usize / size, 2) AS compr_rate,
    sum(rows) rows_cnt,
    round(sum(column_data_uncompressed_bytes)/sum(rows) ,2) avg_row_size
FROM clusterAllReplicas('<cluster_name>', system.parts_columns) 
WHERE (active = <active_type>) AND (table LIKE '<table_name>')  
   AND host() = '<node_name>'
GROUP BY
    `database`,
    table,
    column
ORDER BY size DESC;

参数说明

参数

说明

cluster_name

集群名称

table_name

目标数据表表名。
值为%时,表示匹配所有表。

node_name

目标集群节点名。

active_type

数据是否活跃。

  • 0:不活跃。
  • 1:活跃。

示例
查看192.18.*.*节点上query_log表中存储活跃数据的列。

SELECT
    `database`,
    table,
    column,
    formatReadableSize(sum(column_data_compressed_bytes) AS size) AS compressed,
    formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed,
    round(usize / size, 2) AS compr_rate,
    sum(rows) rows_cnt,
    round(sum(column_data_uncompressed_bytes)/sum(rows) ,2) avg_row_size
FROM clusterAllReplicas('<cluster_name>', system.parts_columns) 
WHERE (active = 1) 
AND (table LIKE 'query_log')  
AND host() = '192.18.*.*'
GROUP BY
    `database`,
    table,
    column
ORDER BY size DESC;

查看表分区信息

查询模板

SELECT 
    partition AS `分区`,
    sum(rows) AS `总行数`,
    formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
    formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
    round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
FROM clusterAllReplicas('<cluster_name>', system.parts)
WHERE (database IN ('<database_name>')) 
AND (table IN ('<table_name>')) 
AND (partition LIKE '<partition_prefix>')
GROUP BY partition
ORDER BY partition ASC

参数说明

参数

说明

cluster_name

集群名称

database_name

数据库名。

table_name

数据表表名。

partition_prefix

分区前缀。

示例
查看<cluster name>数据库中test表分区前缀为2019-12-的分区信息。

SELECT 
    partition AS `分区`,
    sum(rows) AS `总行数`,
    formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
    formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
    round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
FROM clusterAllReplicas('<cluster_name>', system.parts)
WHERE (`database` IN ('<cluster name>')) 
AND (table IN ('test')) 
AND (partition LIKE '2019-12-%')
GROUP BY partition
ORDER BY partition ASC

查看datapart数据的大小

在ByteHouse中,system.part表存储了datapart的状态、大小、创建时间等信息。您可以通过此表了解datapart的详细信息。

查看活跃datapart数据的大小

活跃的datapart是表中当前被活跃使用的数据。了解活跃数据的数据量大小,可以帮助您识别表的实际数据大小,从而识别表占用磁盘空间的大小。
查看192.18.*.*节点上所有非系统表的活跃数据分区信息。

SELECT 
    `database`, 
    table, 
    count(*) AS data_part_cnt,  --活跃数据分区的数量
    sum(rows) AS total_rows,  --总行数
    formatReadableSize(sum(bytes_on_disk)) AS total_compressed_bytes, --磁盘上的总压缩数据大小
    sum(data_uncompressed_bytes) AS total_uncompressed_bytes --未压缩的总数据大小
FROM clusterAllReplicas('<cluster_name>', system.parts )
WHERE active = 1 AND `database` != 'system' 
AND host() = '192.18.*.*'
GROUP BY `database`, table 
ORDER BY total_rows DESC;

查看非活跃datapart数据大小

非活跃的datapart可能包含过时或已标记为删除的数据。若这些数据不再需要,建议及时清理,以减少磁盘占用。
查看192.18.*.*节点上所有非系统表的非活跃数据分区信息。

SELECT 
    `database`, 
    table, 
    count(*) AS data_part_cnt, --非活跃数据分区数量
    sum(rows) AS total_rows,  --总行数
    formatReadableSize(sum(bytes_on_disk)) AS total_compressed_bytes,  --磁盘上的压缩数据总大小
    sum(data_uncompressed_bytes) AS total_uncompressed_bytes  --未压缩数据总大小
FROM clusterAllReplicas('<cluster_name>', system.parts )
WHERE active = 0 
AND `database` != 'system' 
AND host() = '192.18.*.*'
GROUP BY `database`, table;

Projection占用磁盘空间大小

Projections是ByteHouse中一种用于优化查询性能的数据结构,其类似于物化视图,存储了预先计算的聚合或者数据变换结果。了解Projections的大小有助于评估其对磁盘空间的影响。
查看192.18.*.*节点上test表的Projections占用磁盘空间大小。

SELECT
    database,
    table,
    name,
    formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed, --压缩数据大小
    formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed, --未压缩数据大小
    round(usize / size, 2) AS compr_rate, --压缩比率
    sum(rows) AS rows,  --总行数
    count() AS part_count  --part总数
FROM clusterAllReplicas('<cluster_name>', system.projection_parts )
WHERE (table = 'test') 
AND (active = 1) AND host() = '192.18.*.*'
GROUP BY
    database,
    table,
    name
ORDER BY size DESC;

查看192.18.*.*节点上test表的每个Projection列占用磁盘空间大小。

SELECT
    database,
    table,
    column,
    formatReadableSize(sum(column_data_compressed_bytes) AS size) AS compressed, --压缩数据大小
    formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed, --未压缩数据大小
    round(usize / size, 2) AS compr_rate --压缩比率
FROM clusterAllReplicas('<cluster_name>', system.projection_parts_columns )
WHERE (active = 1) AND (table LIKE 'test') 
AND host() = '192.18.*.*'
GROUP BY
    database,
    table,
    column
ORDER BY size DESC;