本文介绍如何通过 Hive 在 E-MapReduce(EMR) 集群上创建库和表等操作。
已创建集群,详情请参见 创建集群。
su hive
hive
返回信息如下所示时,表示进入 Hive 命令行
Hive Session ID = aaa9c23d-4975-4c10-bb9a-1817c5fa36e6
Logging initialized using configuration in file:/etc/emr/hive/conf/hive-log4j2.properties Async: true
Hive Session ID = 258437d2-f601-42c9-bab3-731b975b0642
本文示例中的数据库以 testdb 为例介绍。
CREATE DATABASE IF NOT EXISTS testdb;
当返回信息为:
OK
Time taken: 0.617 seconds
表示创建成功。
DESC DATABASE testdb;
查看信息如下所示:
OK
testdb hdfs://iv-ybtfk74d08kdvbpw7bf3:8020/warehouse/tablespace/managed/hive/testdb.db hive USER
Time taken: 0.155 seconds, Fetched: 1 row(s)
USE testdb;
返回信息如下所示:
OK
Time taken: 0.034 seconds
DROP DATABASE IF EXISTS testdb;
返回信息如下所示,表示删除成功:
OK
Time taken: 0.104 seconds
本文示例中的表以 test_table 为例介绍。
CREATE TABLE IF NOT EXISTS test_table ( id BIGINT, value STRING ) STORED AS ORC;
返回信息如下所示,表示表创建成功:
OK
Time taken: 0.426 seconds
DESC FORMATTED test_table;
返回信息如下:
OK
col_name data_type commentid bigint
value stringDetailed Table InformationDatabase: testdb
OwnerType: USER
Owner: hive
CreateTime: Mon Aug 01 17:48:56 CST 2022
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://iv-ybtfk74d08kdvbpw7bf3:8020/warehouse/tablespace/managed/hive/testdb.db/test_table
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true","value":"true"}}
bucketing_version 2
numFiles 0
numRows 0
rawDataSize 0
totalSize 0
transient_lastDdlTime 1659347336Storage InformationSerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.259 seconds, Fetched: 32 row(s)
SHOW TABLES;
返回信息如下:
OK
test_table
Time taken: 0.036 seconds, Fetched: 1 row(s)
DROP TABLE IF EXISTS test_table;
OK
Time taken: 0.227 seconds
INSERT INTO TABLE test_table VALUES(1, 'value1');
返回信息包含:
OK
Time taken: 15.676 seconds
表示插入成功
SELECT * FROM test_table LIMIT 5;
返回信息如下:
OK
1 value1
Time taken: 0.195 seconds, Fetched: 1 row(s)
SELECT value, COUNT(*) FROM test_table GROUP BY value;
返回信息如下:
OK
value1 1
Time taken: 4.207 seconds, Fetched: 1 row(s)