本文将介绍使用 GIS 时空分析功能所涉及的基本概念、SQL 语句,以及具体使用方法。
使用之前请确认已经开启服务。
Geometry('Point',
<SRID>
)
,或Geometry('Point')
(缺省SRID
为 0)。Geometry('LineString',
<SRID>
)
,或Geometry('LineString')
(缺省SRID
为 0)。Geometry('Polygon',
<SRID>
)
,或Geometry('Polygon')
(缺省SRID为
0)。ByteHouse 提供了常用的 GIS 函数,可参考下文”SQL参考-函数“章节。
如果想创建一个 GIS 相关的表,需要在建表的时候声明对应的 geometry 列。
常见的声明数据类型包括:
Geometry('Point', 4326)
;Geometry('LineString', 4326)
,其中 4326 为 SRID 值,不填写的话 Geometry('Point')默认值为 0。如果想利用空间索引,需要在 order by 里面指定 geometry 列为主键索引。
CREATE TABLE IF NOT EXISTS gis_table ON CLUSTER XXX ( `id` UInt64, `point` Geometry('Point', 4326) ) ENGINE = HaMergeTree ORDER BY point SETTINGS index_granularity = 8192;
您也可以在可视化建表中使用本功能。
Geometry
。Geometry
类型,以加快查询速度。声明点类型的方式为:Geometry('Point')
或者Geometry('Point', SRID)
,其中 SRID
是一个 UInt16
的整型代表了该几何列所对应的坐标系统。
Geometry('Point') 属于 DataTypeGeometry
类型,DataTypeGeometry
内部存储了一个DataTypeTuple
类型用来表示Point
数据,一个Point
底层存储为一个Tuple
,即(Float64, Float64)。
当插入数据的时候,我们可以通过 insert into select 的方式把其他 table 中的两列数据通过(x, y)的形式插入 Point 几何列中,也可以通过字符串 Tuple 的形式插入到 point 中。
CREATE TABLE IF NOT EXISTS test_gis_tmp ( `id` UInt64, `lon` Float64, `lat` Float64 ) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; insert into test_gis_tmp values (1, 1, 1), (2, 2, 2), (3, 3.1, 3.1); CREATE TABLE IF NOT EXISTS test_gis_point ( `id` UInt64, `p` Geometry('Point', 4326) ) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; insert into test_gis_point select id, (lon, lat) from test_gis_tmp; insert into test_gis_point values (4, '(4, 4)'), (5, '(5.5, 5.5)'); select * from test_gis_point order by id;
预期结果:
1 (1,1) 2 (2,2) 3 (3.1,3.1) 4 (4,4) 5 (5.5,5.5)
LineString
类型与 Point
类型用法类似,区别在于LineString
中存储的是坐标数组。
-- LineString CREATE TABLE IF NOT EXISTS test_gis_linestring_type ( `p` Geometry('LineString', 4326) ) ENGINE = MergeTree ORDER BY p SETTINGS index_granularity = 8192; insert into test_gis_linestring_type values ([(0,0), (0, 0)]), ('[(2,2), (2,2)]'), (ST_GeomFromText('LINESTRING (3 3, 3 3)')); select ST_AsText(p) from test_gis_linestring_type order by p; DROP TABLE IF EXISTS test_gis_linestring_type;
与前面类似,Polygon
中存储的是多个坐标数组。
CREATE TABLE IF NOT EXISTS test_gis_polygon_type ( `p` Geometry('Polygon', 4326) ) ENGINE = MergeTree ORDER BY p SETTINGS index_granularity = 8192; insert into test_gis_polygon_type values ([[(0,0), (1,0), (1,1), (0,1), (0,0)]]), ('[[(0,0), (2,0), (2,2), (0,2), (0,0)]]'), (ST_GeomFromText('POLYGON ((0 0, 3 0, 3 3, 0 3, 0 0))'));
ST_GeomFromText(WKT_Geometry)或者 ST_GeomFromText(WKT_Geometry, SRID)函数用来将 OGC 标准定义的几何文本表示字符串转化为上述的几何数据。
ST_AsText(GeometryType)与 ST_GeomFromText 相反,用来将 ck 中的几何数据转化为 WKT 表示的字符串形式。
CREATE TABLE IF NOT EXISTS test_gis_point ( `id` UInt64, `p` Geometry('Point', 4326) ) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; insert into test_gis_point values(1, ST_GeomFromText('POINT (0 0)')); insert into test_gis_point values(2, ST_GeomFromText('POINT (1 1)')); insert into test_gis_point values(3, ST_GeomFromText('POINT (2 2)')); insert into test_gis_point values(4, ST_GeomFromText('POINT (3.1 3.1)')); insert into test_gis_point values(5, ST_GeomFromText('POINT (4 4)', 4326)); insert into test_gis_point values(6, ST_GeomFromText('POINT (5.1 5.1)', 4326)); select ST_AsText(p) from test_gis_point order by id;
预期结果:
POINT (0 0) POINT (1 1) POINT (2 2) POINT (3.1 3.1) POINT (4 4) POINT (5.1 5.1)
ST_Distance(p1, p2) 函数用来计算笛卡尔空间中的两个几何之间的距离,返回为 Float64 类型。
其中,p1 和 p2 可以是常量或非常量类型。
CREATE TABLE IF NOT EXISTS test_gis_point ( `id` UInt64, `p` Geometry('Point', 4326) ) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; insert into test_gis_point values(1, ST_GeomFromText('POINT (0 0)')); insert into test_gis_point values(2, ST_GeomFromText('POINT (1 1)')); insert into test_gis_point values(3, ST_GeomFromText('POINT (2 2)')); insert into test_gis_point values(4, ST_GeomFromText('POINT (3 3)')); insert into test_gis_point values(5, ST_GeomFromText('POINT (4 4)')); select ST_Distance(p, ST_GeomFromText('POINT (1 1)')) from test_gis_point order by id; select '--'; select ST_Distance(ST_GeomFromText('POINT (1 1)'), p) from test_gis_point order by id;
预期结果:
1.4142135623730951 0 1.4142135623730951 2.8284271247461903 4.242640687119285 -- 1.4142135623730951 0 1.4142135623730951 2.8284271247461903 4.242640687119285
ST_Within(p1, p2)代表了几何 p1 是否被 p2 所包含,返回值为布尔类型,代表了是否被包含。目前函数仅支持 p1 为 Point 类型,p2 为常量的 Polygon 类型。
如下图所示,分别代表了不同几何类型之间的被包含关系。
CREATE TABLE IF NOT EXISTS test_gis_point ( `id` UInt64, `p` Geometry('Point', 4326) ) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; insert into test_gis_point values(1, ST_GeomFromText('POINT (1 1)')); insert into test_gis_point values(2, ST_GeomFromText('POINT (2 2)')); insert into test_gis_point values(3, ST_GeomFromText('POINT (3 3)')); insert into test_gis_point values(4, ST_GeomFromText('POINT (4 4)')); insert into test_gis_point values(5, ST_GeomFromText('POINT (5 5)')); optimize table test_gis_point final; select id from test_gis_point where ST_Within(p, ST_GeomFromText('POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))')); select '--'; select id from test_gis_point where ST_Within(p, ST_GeomFromText('POLYGON ((0 0, 3 0, 3 3, 0 3, 0 0))')); select '--'; select id from test_gis_point where ST_Within(p, ST_GeomFromText('POLYGON ((0 0, 5 0, 5 5, 0 5, 0 0))'));
预期结果:
1 -- 1 2 3 -- 1 2 3 4 5
ST_MultiAddressFilter(point, WKT_Geometry, distance, longitude1, latitude1, longitude2, latitude2, ...)
ST_MultiAddressFilter 函数用来给定一个中心点和多个半径,过滤出位于该多个圆中的几何数据,其中 WKT_Geometry 代表了 WKT 表示的一个矩形范围,用来利用 r tree 索引做主键过滤。distance 代表了圆的半径,(longitude, latitude)代表了不同的圆心。
ST_MultiAddressFilter 函数返回为布尔类型。
CREATE TABLE test_gis_point ( `id` UInt64, `p` Geometry('Point', 4326) ) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; insert into test_gis_point values(1, ST_GeomFromText('POINT (1 1)')); insert into test_gis_point values(2, ST_GeomFromText('POINT (2 1)')); insert into test_gis_point values(3, ST_GeomFromText('POINT (3 1)')); insert into test_gis_point values(4, ST_GeomFromText('POINT (4 1)')); insert into test_gis_point values(5, ST_GeomFromText('POINT (5 1)')); insert into test_gis_point values(6, ST_GeomFromText('POINT (6 1)')); insert into test_gis_point values(7, ST_GeomFromText('POINT (7 1)')); insert into test_gis_point values(8, ST_GeomFromText('POINT (8 1)')); insert into test_gis_point values(9, ST_GeomFromText('POINT (9 1)')); insert into test_gis_point values(10, ST_GeomFromText('POINT (10 1)')); optimize table test_gis_point final; SELECT p from test_gis_point WHERE ST_MultiAddressFilter(p, ST_GeomFromText('POLYGON ((0 0, 20 0, 20 20, 0 20, 0 0))'), 1., 5., 1.); select '--'; SELECT p from test_gis_point WHERE ST_MultiAddressFilter(p, ST_GeomFromText('POLYGON ((0 0, 20 0, 20 20, 0 20, 0 0))'), 120000., 5., 1.);
预期:
(5,1) -- (4,1) (5,1) (6,1)
ST_X/ST_Y 函数主要针对 Point 类型,即返回 Point 的 x 坐标值和 y 坐标值。
CREATE TABLE IF NOT EXISTS test_gis_point ( `id` UInt64, `p` Geometry('Point', 4326) ) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; insert into test_gis_point values(1, ST_GeomFromText('POINT (1 2)')); insert into test_gis_point values(2, ST_GeomFromText('POINT (2 3)')); insert into test_gis_point values(3, ST_GeomFromText('POINT (3 4)')); insert into test_gis_point values(4, ST_GeomFromText('POINT (4 5)')); insert into test_gis_point values(5, ST_GeomFromText('POINT (5 6)')); optimize table test_gis_point final; select ST_X(p), ST_Y(p) from test_gis_point;
预期:
1 2 2 3 3 4 4 5 5 6
GIS 空间查询时,建议将 GIS 字段设置为 INDEX 索引。当我们按照几何列进行排序(即 order by point),ByteHouse 会为该表构建 R-tree 索引,用来加速空间查询。
R-tree 索引是 mark 级别的,即每一个 mark 中的所有几何数据的最小 bouding box 会构成 R-tree 叶子结点中的一个条目。
当我们为空间几何函数配置了 R-tree 索引时,查询会先通过传入的几何参数通过 R 树索引进行主键过滤,然后过滤出的数据才会交给空间函数真正执行。
例如下面例子中,table 按照几何列p
进行排序,因此该列上会构建 R-tree 索引,同时我们也为 ST_Within 配置了 R-tree 索引。因此,当我们进行 select id from test_gis_point where ST_Within(p, ST_GeomFromText('POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))'))
查询时,在主键索引过滤过程中,ST_Within 会提取出 p 列和 ST_GeomFromText('POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))')
常量,首先通过 R-tree 索引过滤 p 列中的数据是否与'POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))'
相交,过滤后的数据才会交给 ST_Within
做真正的计算。
CREATE TABLE IF NOT EXISTS test_gis_point ( `id` UInt64, `p` Geometry('Point', 4326) ) ENGINE = MergeTree ORDER BY p SETTINGS index_granularity = 1; insert into test_gis_point values(1, ST_GeomFromText('POINT (1 1)')); insert into test_gis_point values(2, ST_GeomFromText('POINT (2 2)')); insert into test_gis_point values(3, ST_GeomFromText('POINT (3 3)')); insert into test_gis_point values(4, ST_GeomFromText('POINT (4 4)')); insert into test_gis_point values(5, ST_GeomFromText('POINT (5 5)')); optimize table test_gis_point final; select id from test_gis_point where ST_Within(p, ST_GeomFromText('POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))')); select '--'; select id from test_gis_point where ST_Within(p, ST_GeomFromText('POLYGON ((0 0, 5 0, 5 5, 0 5, 0 0))'));
预期结果:
1 2 -- 1 2 3 4 5