This section walks you through using a secure cluster from scratch and running MapReduce, Spark, Hive, and other example workloads.
Volcengine E-MapReduce (EMR) provides Kerberos capabilities for clusters by integrating the Apache Kerby service. If Kerberos authentication is enabled when the cluster is created, the Kerby component is installed at creation time and a basic Kerberos environment is provided.
For other cluster operations, see Creating a Cluster.
Currently, Kerberos can only be enabled when the cluster is created; converting an ordinary cluster to a Kerberos cluster is not supported.
The user management module is integrated with the Kerby component: a user added in user management is registered as a corresponding principal in Kerby at the same time, and the user's keytab file can be downloaded.
See the User Management section to create an authenticated user and download the corresponding keytab file.
Following the User Management section, create the user user01, download its keytab file, and upload it to the /etc/krb5/keytab/user directory on the ECS instance.
Obtain a ticket for the user
kinit -k -t /etc/krb5/keytab/user/user01.keytab user01
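You can check that the ticket was obtained successfully (klist is part of the standard Kerberos client tools):
klist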
Run the MapReduce WordCount example
hadoop fs -mkdir /tmp/input
hadoop fs -put /usr/lib/emr/current/hadoop/README.txt /tmp/input/data
yarn jar /usr/lib/emr/current/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4-ve-1.jar wordcount /tmp/input/data /tmp/output
Note
The Hadoop version number in the path to hadoop-mapreduce-examples-3.3.4-ve-1.jar differs across EMR versions; adjust it for your environment.
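Once the job completes, you can inspect the word counts, assuming the /tmp/output path used above:
hadoop fs -cat /tmp/output/*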
Obtain a ticket for the user
kinit -k -t /etc/krb5/keytab/user/user01.keytab user01
You can also pass the user's authentication information on the command line, for example:
spark-submit --conf spark.kerberos.principal=user01 --conf spark.kerberos.keytab=/etc/krb5/keytab/user/user01.keytab ....
Run the Spark SparkPi example
spark-submit --master yarn --class org.apache.spark.examples.SparkPi --conf spark.kerberos.principal=user01 --conf spark.kerberos.keytab=/etc/krb5/keytab/user/user01.keytab /usr/lib/emr/current/spark/examples/jars/spark-examples_2.12-3.2.1.jar
Note
The Spark version number in spark-examples_2.12-3.2.1.jar differs across EMR versions; use the version in your environment.
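With the default client deploy mode, the computed value of Pi is printed to the console. If you submit with --deploy-mode cluster instead, the driver output goes to the YARN application logs, which you can fetch with the application ID reported by spark-submit (the ID below is a placeholder):
yarn logs -applicationId application_1234567890123_0001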
Run the Spark SQL example
Enter spark-sql to open the Spark SQL console:
spark-sql --conf spark.kerberos.principal=user01 --conf spark.kerberos.keytab=/etc/krb5/keytab/user/user01.keytab
Then enter the following SQL statements:
CREATE TABLE IF NOT EXISTS t1(i INT, s STRING);
INSERT INTO t1 VALUES(0, 'a');
SELECT * FROM t1;
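To remove the test table afterwards, you can run:
DROP TABLE IF EXISTS t1;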
Obtain a ticket for the user
kinit -k -t /etc/krb5/keytab/user/user01.keytab user01
Run the Hive CLI
Enter the hive command to open the Hive console, then enter the following HQL statements:
CREATE TABLE IF NOT EXISTS t2(i INT, s STRING);
INSERT INTO t2 VALUES(0, 'a');
SELECT * FROM t2;
Connect to HiveServer2 and run HQL statements
Enter the following command:
beeline -u "jdbc:hive2://${zookeeper_address}/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=${hiveserver2_principal}"
Example:
beeline -u "jdbc:hive2://10.0.255.9:2181,10.0.255.7:2181,10.0.255.8:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@2FDE5608A274D0A2320C.EMR.COM"
Note
2FDE5608A274D0A2320C.EMR.COM in the example is the KDC realm; fill in the realm for your environment.
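If you are unsure of the HiveServer2 principal, it can usually be read from the Hive client configuration. The path below is an assumption, following the /etc/emr/<service>/conf layout used elsewhere in this guide:
grep -A 1 "hive.server2.authentication.kerberos.principal" /etc/emr/hive/conf/hive-site.xml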
Then enter the following HQL statements:
CREATE TABLE IF NOT EXISTS t3(i INT, s STRING);
INSERT INTO t3 VALUES(0, 'a');
SELECT * FROM t3;
Obtain a ticket for the user
kinit -k -t /etc/krb5/keytab/user/user01.keytab user01
Run a Flink jar command
Run Flink's built-in WordCount example
hadoop fs -put /usr/lib/emr/current/hadoop/README.txt /tmp/flink_data.txt
flink run -m yarn-cluster /usr/lib/emr/current/flink/examples/streaming/WordCount.jar --input hdfs://emr-cluster/tmp/flink_data.txt --output hdfs://emr-cluster/tmp/flinktest_out
Note
Adjust the HDFS paths for your environment.
View the WordCount results
hadoop fs -cat /tmp/flinktest_out/*/*
Obtain a ticket for the user
kinit -k -t /etc/krb5/keytab/user/user01.keytab user01
Run the HBase shell
Enter the hbase shell command to open the console, then enter the following statements:
create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
put 't1', 'r1', 'f1:c1', 'value'
scan 't1'
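You can also read a single row back (get is a standard HBase shell command):
get 't1', 'r1'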
Note
Because Presto and Trino are used in the same way, Presto is used as the example below.
Run the Presto command:
presto \
  --krb5-principal=${user_principal} \
  --krb5-keytab-path=${keytab_file_for_principal}
Example:
presto --krb5-principal=user01 --krb5-keytab-path=/etc/krb5/keytab/user/user01.keytab
Enter the Presto SQL statement select * from tpch.sf1.nation:
presto> select * from tpch.sf1.nation;
 nationkey |      name      | regionkey | comment
-----------+----------------+-----------+----------------------------------------------------------------------------------------------------------------------
         0 | ALGERIA        |         0 | haggle. carefully final deposits detect slyly agai
         1 | ARGENTINA      |         1 | al foxes promise slyly according to the regular accounts. bold requests alon
         2 | BRAZIL         |         1 | y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special
         3 | CANADA         |         1 | eas hang ironic, silent packages. slyly regular packages are furiously over the tithes. fluffily bold
         4 | EGYPT          |         4 | y above the carefully unusual theodolites. final dugouts are quickly across the furiously regular d
         5 | ETHIOPIA       |         0 | ven packages wake quickly. regu
         6 | FRANCE         |         3 | refully final requests. regular, ironi
         7 | GERMANY        |         3 | l platelets. regular accounts x-ray: unusual, regular acco
         8 | INDIA          |         2 | ss excuses cajole slyly across the packages. deposits print aroun
         9 | INDONESIA      |         2 | slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull
        10 | IRAN           |         4 | efully alongside of the slyly final dependencies.
        11 | IRAQ           |         4 | nic deposits boost atop the quickly final requests? quickly regula
        12 | JAPAN          |         2 | ously. final, express gifts cajole a
        13 | JORDAN         |         4 | ic deposits are blithely about the carefully regular pa
        14 | KENYA          |         0 | pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t
        15 | MOROCCO        |         0 | rns. blithely bold courts among the closely regular packages use furiously bold platelets?
        16 | MOZAMBIQUE     |         0 | s. ironic, unusual asymptotes wake blithely r
        17 | PERU           |         1 | platelets. blithely pending dependencies use fluffily across the even pinto beans. carefully silent accoun
        18 | CHINA          |         2 | c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos
        19 | ROMANIA        |         3 | ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account
        20 | SAUDI ARABIA   |         4 | ts. silent requests haggle. closely express packages sleep across the blithely
        21 | VIETNAM        |         2 | hely enticingly express accounts. even, final
        22 | RUSSIA         |         3 | requests against the platelets use never according to the quickly regular pint
        23 | UNITED KINGDOM |         3 | eans boost carefully special requests. accounts are. carefull
        24 | UNITED STATES  |         1 | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be
(25 rows)

Query 20230720_031745_00001_v2hmn, FINISHED, 2 nodes
Splits: 32 total, 32 done (100.00%)
[Latency: client-side: 225ms, server-side: 192ms] [25 rows, 0B] [130 rows/s, 0B/s]
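Since Trino is used the same way, the analogous Trino CLI invocation would look like this (assuming the trino client is installed on the cluster):
trino --krb5-principal=user01 --krb5-keytab-path=/etc/krb5/keytab/user/user01.keytab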
Prepare the client configuration files:
Edit the kafka_client_jaas.conf file (for example, place it in the /etc/emr/kafka/conf directory; if a file with the same name already exists there, use a different directory or file name, or rename the existing file):
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true;
};
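If you prefer not to run kinit beforehand, a keytab-based JAAS variant also works, mirroring the Flume configuration later in this guide; the keytab path and principal below are user01's from the earlier steps:
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/krb5/keytab/user/user01.keytab"
  storeKey=true
  serviceName="kafka"
  principal="user01";
};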
Edit the client.properties file (for example, place it in the /etc/emr/kafka/conf directory; if a file with the same name already exists there, use a different directory or file name, or rename the existing file):
security.protocol=SASL_PLAINTEXT
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
Obtain a ticket for the user
kinit -k -t /etc/krb5/keytab/user/user01.keytab user01
Create a topic
export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/etc/emr/kafka/conf/kafka_client_jaas.conf" && /usr/lib/emr/current/kafka/bin/kafka-topics.sh --create --bootstrap-server {broker_address}:9092 --command-config /etc/emr/kafka/conf/client.properties --topic test --replication-factor 3 --partitions 12
Note
{broker_address} in the example is the IP address of a Kafka broker; set it for your environment.
List topics
export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/etc/emr/kafka/conf/kafka_client_jaas.conf" && /usr/lib/emr/current/kafka/bin/kafka-topics.sh --list --bootstrap-server {broker_address}:9092 --command-config /etc/emr/kafka/conf/client.properties
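To inspect the partition and replica layout of the topic, kafka-topics.sh also supports --describe:
export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/etc/emr/kafka/conf/kafka_client_jaas.conf" && /usr/lib/emr/current/kafka/bin/kafka-topics.sh --describe --bootstrap-server {broker_address}:9092 --command-config /etc/emr/kafka/conf/client.properties --topic test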
Produce messages
export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/etc/emr/kafka/conf/kafka_client_jaas.conf" && /usr/lib/emr/current/kafka/bin/kafka-console-producer.sh --broker-list {broker_address}:9092 --topic test --producer.config /etc/emr/kafka/conf/client.properties
Consume messages
export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/etc/emr/kafka/conf/kafka_client_jaas.conf" && /usr/lib/emr/current/kafka/bin/kafka-console-consumer.sh --bootstrap-server {broker_address}:9092 --group testGroup --topic test --from-beginning --consumer.config /etc/emr/kafka/conf/client.properties
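When you are done, the test topic can be removed with the standard --delete action:
export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/etc/emr/kafka/conf/kafka_client_jaas.conf" && /usr/lib/emr/current/kafka/bin/kafka-topics.sh --delete --bootstrap-server {broker_address}:9092 --command-config /etc/emr/kafka/conf/client.properties --topic test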
Obtain a ticket for the user
kinit -k -t /etc/krb5/keytab/user/user01.keytab user01
Connect to Kyuubi
Enter the following command, where the principal is the value of the kyuubi.kinit.principal parameter in the Kyuubi service configuration:
beeline -u "jdbc:hive2://${zookeeper_address}/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi;principal=${kyuubi_kinit_principal}"
Example:
beeline -u "jdbc:hive2://10.0.255.9:2181,10.0.255.7:2181,10.0.255.8:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi;principal=hive/_HOST@2FDE5608A274D0A2320C.EMR.COM"
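The kyuubi.kinit.principal value can be read from the Kyuubi configuration; kyuubi-defaults.conf is Kyuubi's standard configuration file, and the directory below is an assumption consistent with the other /etc/emr/<service>/conf paths in this guide:
grep "kyuubi.kinit.principal" /etc/emr/kyuubi/conf/kyuubi-defaults.conf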
Then enter the following SQL statements:
CREATE TABLE IF NOT EXISTS t3(i INT, s STRING);
INSERT INTO t3 VALUES(0, 'a');
SELECT * FROM t3;
In the /etc/emr/flume/conf directory, add a configuration file flume_jaas.conf with the following content:
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/krb5/keytab/serv/kafka.keytab"
  storeKey=true
  serviceName="kafka"
  principal="${principal}";
};
Note
${principal} in the example must match the kafka.keytab file; you can obtain it with the command klist -k -t /etc/krb5/keytab/serv/kafka.keytab.
In the /etc/emr/flume/conf directory, create the parameter file flume-kerberos-kafka.properties for Flume to read from Kafka, with the following content:
a1.sources = source1
a1.sinks = k1
a1.channels = c1

a1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.source1.channels = c1
a1.sources.source1.kafka.consumer.group.id = flume-test-group

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://emr-cluster/tmp/flume/test-data
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1001
a1.channels.c1.transactionCapacity = 1000

# Bind the source and sink to the channel
a1.sources.source1.channels = c1
a1.sinks.k1.channel = c1

a1.sources.source1.kafka.bootstrap.servers = ${broker_address}:9092
a1.sources.source1.kafka.topics = ${topic_name}
a1.sources.source1.kafka.consumer.security.protocol = SASL_PLAINTEXT
a1.sources.source1.kafka.consumer.sasl.mechanism = GSSAPI
a1.sources.source1.kafka.consumer.sasl.kerberos.service.name = kafka
Note
${broker_address} and ${topic_name} in the example must be set for your environment.
Append the following parameters to the flume-env.sh configuration file:
export JAVA_OPTS="$JAVA_OPTS -Djava.security.krb5.conf=/etc/krb5.conf"
export JAVA_OPTS="$JAVA_OPTS -Djava.security.auth.login.config=/etc/emr/flume/conf/flume_jaas.conf"
Start Flume
nohup /usr/lib/emr/current/flume/bin/flume-ng agent --name a1 --conf /etc/emr/flume/conf/ --conf-file flume-kerberos-kafka.properties > nohup.out 2>&1 &
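After messages are produced to the Kafka topic, you can confirm that Flume is writing them to HDFS, using the sink path from the configuration above:
hadoop fs -ls hdfs://emr-cluster/tmp/flume/test-data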