当前 ByteHouse 自研的查询优化器使用统计信息来提高查询性能。当前支持用户手动 create stats 来收集数据表的统计信息或修改查询设计,从而提高查询性能。后续将支持自动收集和更新能力。支持语法如下:
CREATE STATS [IF NOT EXISTS] (<db>.* | ALL | [<db>.]<table>['(' <col1>, <col2>, ..., <coln> ')']) [WITH (FULLSCAN | SAMPLE [<rows> ROWS] [<ratio> RATIO])]
SHOW (STATS|COLUMN_STATS) (<db>.* | ALL | [<db>.]<table>'('<col1>, <col2>, ..., <coln> ')' )
DROP STATS (<db>.* | ALL | [<db>.]<table>['(' <col1>, <col2>, ..., <coln> ')' ])
对于数据工程师而言,通过执行计划诊断问题是很重要的排障手段。例如 Join order 的问题,统计信息是否缺失,估算是否合理,下推是否生效等等。ByteHouse 提供展示物理执行计划能力,用 Explain + Query 的方式对执行进行分析,便于更加直观地对查询进行分析和优化,并配合 Query Profiler 查询详情进行可视化展示。
注:当前需要先手动收集统计信息后,再生成执行计划
EXPLAIN (QUERY Statement) ;
注:当前 EXPLAIN 语义仅支持 DQL SELECT 语法的分析。
CREATE DATABASE sampledb; CREATE TABLE sampledb.t1 ( id Int64, name Int64 ) ENGINE = CnchMergeTree() ORDER BY id; CREATE TABLE sampledb.t2 ( uid Int64, name Int64 ) ENGINE = CnchMergeTree() ORDER BY uid;
INSERT INTO sampledb.t1 VALUES (1,'bob'),(2,'alice'),(3,'peter'),(4,'jay'),(5,'yiyang'),(6,'aurora'); INSERT INTO sampledb.t2 VALUES (1,'smith'),(2,'parker'),(3,'taylor'),(4,'davis'),(5,'joines');
CREATE stats sampledb.t1; CREATE stats sampledb.t2;
展示如下结果
table_name | row_count | elapsed_time |
---|---|---|
t1 | 6 | 0.026777215 |
SHOW stats sampledb.t1;
USE sampledb; EXPLAIN SELECT * FROM t1 JOIN t2 ON (t1.id = t2.uid);
将展示如下信息:
Gather Exchange est. 6 rows └─ Projection est. 6 rows │ Expressions: [id, name, uid], t2.name:=name_1 └─ Inner Join est. 6 rows │ Condition: id == uid ├─ Local Exchange est. 6 rows │ └─ ReadFromStorage 2100069045.sampledb.t1 est. 6 rows │ Outputs: [id, name] └─ Broadcast Exchange est. 5 rows └─ ReadFromStorage 2100069045.sampledb.t2 est. 5 rows Outputs: [uid], name_1:=name
DROP stats sampledb.t1; DROP stats sampledb.t1;