Phoenix是构建在HBase上的一个SQL层,能让我们用标准的JDBC APIs而不是HBase客户端APIs来创建表,插入数据和对HBase数据进行查询。
Phoenix完全使用Java编写,作为HBase内嵌的JDBC驱动。Phoenix查询引擎会将SQL查询转换为一个或多个HBase扫描,并编排执行以生成标准的JDBC结果集。直接使用HBase API、协同处理器与自定义过滤器,对于简单查询来说,其性能量级是毫秒,对于百万级别的行数来说,其性能量级是秒。
创建user_phoenix.sql文件,包含建表信息
CREATE TABLE IF NOT EXISTS us_population ( state CHAR(2) NOT NULL, city VARCHAR NOT NULL, population BIGINT CONSTRAINT my_pk PRIMARY KEY (state, city));
创建user_phoenix.csv数据文件,包含要插入的数据
NY,New York,8143197 CA,Los Angeles,3844829 IL,Chicago,2842518 TX,Houston,2016582 PA,Philadelphia,1463281 AZ,Phoenix,1461575 TX,San Antonio,1256509 CA,San Diego,1255540 TX,Dallas,1213825 CA,San Jose,912332
通过命令建表并写入数据
/usr/lib/emr/current/phoenix/bin/psql.py <your_zookeeper_quorum> us_population.sql us_population.csv
/usr/lib/emr/current/phoenix/bin/sqlline.py <your_zookeeper_quorum>
SELECT state as "State",count(city) as "City Count",sum(population) as "Population Sum" FROM us_population GROUP BY state ORDER BY sum(population) DESC;