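Running Spark data-processing jobs on Elastic Container Instance (VCI) means you are not limited by the compute capacity of the nodes in your Container Service (VKE) cluster. Pods are created dynamically on demand, which can effectively reduce compute costs. This topic describes how to install Spark Operator in a VKE cluster and run Spark data-processing jobs on VCI.
This topic uses the Kubernetes-native Spark Operator approach to run Spark jobs on VCI. The procedure is described below.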
To use more advanced VCI features, configure VCI parameters by setting Pod annotations. For details, see Pod Annotation Description.
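Note
When you add nodes while creating the cluster, choose node specifications that fit both your actual workload and the requirements of installing and running Spark Operator, so that node vCPU, memory, and other resources meet your workload's needs.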
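Go to the .kube directory in your home directory and edit the config file:
cd .kube
vi config
When you have finished editing, enter :wq to save the file.
Run the following command. If it returns the nodes of the cluster, kubectl is connected to the cluster.
kubectl get nodes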
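Besides confirming that kubectl can reach the cluster, you may want to confirm that every node is Ready before installing anything. The following is a minimal sketch, assuming kubectl is on your PATH and the config file edited above is the active kubeconfig; the function name check_nodes_ready is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): succeed only if kubectl can reach the cluster
# and every node's STATUS column is exactly "Ready".
check_nodes_ready() {
  local nodes
  # --no-headers prints one line per node: NAME STATUS ROLES AGE VERSION
  if ! nodes="$(kubectl get nodes --no-headers)"; then
    echo "kubectl cannot reach the cluster; check ~/.kube/config" >&2
    return 1
  fi
  # Fail when no nodes are listed, or when any node is not plainly "Ready"
  # (for example "NotReady" or "Ready,SchedulingDisabled").
  printf '%s\n' "$nodes" | awk '
    NF == 0 { next }
    { total++ }
    $2 != "Ready" { bad++ }
    END { exit (total > 0 && bad == 0) ? 0 : 1 }'
}
```

Source the script and run check_nodes_ready; a non-zero exit status means kubectl could not reach the cluster or at least one node is not Ready.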
Install Spark Operator from the kubectl client (the installation also uses Helm).
Create the spark service account and bind it to the edit cluster role:
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
Create the spark-operator namespace:
kubectl create namespace spark-operator
Add the Helm repository and install Spark Operator:
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install my-release spark-operator/spark-operator --namespace spark-operator --set enableBatchScheduler=true --set enableWebhook=true
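Caution
If pulling the spark-operator image fails while you install Spark Operator, open the Stateless Workloads (Deployment) page of the target cluster in the Container Service console and change the image address of the my-release workload in the spark-operator namespace to doc-cn-beijing.cr.volces.com/vke/spark:v1beta2-1.3.8-3.1.1. Spark Operator then installs normally.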
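After installation, two quick checks can catch common problems: whether the spark service account is allowed to create Pods (in cluster mode the driver uses this account to launch executor Pods), and whether the Spark Operator Pods in the spark-operator namespace are Running. The following is a minimal sketch, assuming the names used in this topic and a kubeconfig user that is permitted to impersonate service accounts (needed for kubectl auth can-i --as); both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): post-install checks using this topic's names.

# Succeeds if the "spark" service account in "default" may create Pods there.
# kubectl auth can-i exits 0 for "yes" and non-zero for "no".
check_spark_rbac() {
  kubectl auth can-i create pods \
    --as=system:serviceaccount:default:spark \
    --namespace=default >/dev/null
}

# Succeeds when at least one Pod in spark-operator is Running and every other
# Pod there is either Running or Completed (for example, a finished setup Job).
check_operator_running() {
  local pods
  pods="$(kubectl get pods --namespace=spark-operator --no-headers)" || return 1
  # Columns of `kubectl get pods --no-headers`: NAME READY STATUS RESTARTS AGE
  printf '%s\n' "$pods" | awk '
    NF == 0 { next }
    $3 == "Running" { running++ }
    $3 != "Running" && $3 != "Completed" { bad++ }
    END { exit (running > 0 && bad == 0) ? 0 : 1 }'
}
```

If check_operator_running fails with an image pull status such as ImagePullBackOff, the image replacement described in the caution above applies.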
This example creates a SparkApplication YAML file named spark-pi.
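Note
The image referenced in the upstream Git repository cannot be pulled reliably, so it has been copied to Volcengine Container Registry (CR). This example can use the following image address directly: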
cr-share-cn-shanghai.cr.volces.com/spark/spark-operator:v3.1.1
# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "cr-share-cn-shanghai.cr.volces.com/spark/spark-operator:v3.1.1" # Official image address replaced with the Volcengine CR address
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    annotations:
      vke.volcengine.com/burst-to-vci: enforce # Runs the Spark executor Pods on VCI
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.1.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
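The manifest is submitted like any other Kubernetes object, with kubectl apply. The following is a minimal sketch that applies it and then confirms the SparkApplication object exists, assuming the manifest was saved as spark-pi.yaml (a hypothetical file name); the function name submit_spark_app is also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): submit a SparkApplication manifest and confirm
# the object was created in the default namespace.
# Assumption: the manifest file is spark-pi.yaml and the app is named spark-pi.
submit_spark_app() {
  local manifest="${1:-spark-pi.yaml}" name="${2:-spark-pi}"
  kubectl apply -f "$manifest" || return 1
  # Non-zero exit if the SparkApplication was not created.
  kubectl get sparkapplication "$name" --namespace=default >/dev/null
}
```

Both arguments are optional; pass a different file and name, for example submit_spark_app my-job.yaml my-job, for other applications.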
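Check the status of the Spark job. When the job has finished, the output shows the application in the COMPLETED state.
kubectl get sparkapplication
To view the details of the job, run:
kubectl describe sparkapplication spark-pi
Expected output (the Status section reports Application State COMPLETED):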
Name:         spark-pi
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  sparkoperator.k8s.io/v1beta2
Kind:         SparkApplication
Metadata:
  Creation Timestamp:  2023-11-27T15:45:20Z
  Generation:          1
  Resource Version:    24933
  UID:                 ad8fa50c-1d45-4a33-97bd-2c1de4155f7b
Spec:
  Driver:
    Core Limit:  1200m
    Cores:       1
    Labels:
      Version:        3.1.1
    Memory:           512m
    Service Account:  spark
    Volume Mounts:
      Mount Path:  /tmp
      Name:        test-volume
  Executor:
    Annotations:
      vke.volcengine.com/burst-to-vci:  enforce
    Cores:      1
    Instances:  1
    Labels:
      Version:  3.1.1
    Memory:     512m
    Volume Mounts:
      Mount Path:  /tmp
      Name:        test-volume
  Image:                  cr-share-cn-shanghai.cr.volces.com/spark/spark-operator:v3.1.1
  Image Pull Policy:      Always
  Main Application File:  local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
  Main Class:             org.apache.spark.examples.SparkPi
  Mode:                   cluster
  Restart Policy:
    Type:  Never
  Spark Version:  3.1.1
  Type:           Scala
  Volumes:
    Host Path:
      Path:  /tmp
      Type:  Directory
    Name:    test-volume
Status:
  Application State:
    State:  COMPLETED
  Driver Info:
    Pod Name:             spark-pi-driver
    Web UI Address:       10.234.70.207:0
    Web UI Port:          4040
    Web UI Service Name:  spark-pi-ui-svc
  Execution Attempts:  1
  Executor State:
    spark-pi-4a08948c1174fd43-exec-1:  FAILED
  Last Submission Attempt Time:  2023-11-27T15:45:23Z
  Spark Application Id:          spark-aadd883935dd422eb0526412b846c9b1
  Submission Attempts:           1
  Submission ID:                 923c3ed6-50de-4b65-a0e2-7c506b72dd2a
  Termination Time:              2023-11-27T15:48:10Z
Events:
  Type     Reason                     Age  From            Message
  ----     ------                     ---- ----            -------
  Normal   SparkApplicationAdded      18m  spark-operator  SparkApplication spark-pi was added, enqueuing it for submission
  Normal   SparkApplicationSubmitted  18m  spark-operator  SparkApplication spark-pi was submitted successfully
  Normal   SparkDriverRunning         18m  spark-operator  Driver spark-pi-driver is running
  Normal   SparkExecutorPending       18m  spark-operator  Executor [spark-pi-4a08948c1174fd43-exec-1] is pending
  Normal   SparkExecutorRunning       16m  spark-operator  Executor [spark-pi-4a08948c1174fd43-exec-1] is running
  Normal   SparkDriverCompleted       16m  spark-operator  Driver spark-pi-driver completed
  Normal   SparkApplicationCompleted  16m  spark-operator  SparkApplication spark-pi completed
  Warning  SparkExecutorFailed        15m  spark-operator  Executor [spark-pi-4a08948c1174fd43-exec-1 %!s(int=-1) Unknown (Container not Found)] failed with ExitCode: %!d(MISSING), Reason: %!s(MISSING)
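Instead of re-running kubectl get sparkapplication by hand, you can poll the value shown as Application State until it reaches a terminal state. The following is a minimal sketch: it reads .status.applicationState.state with a JSONPath query, treats COMPLETED as success, and treats FAILED or SUBMISSION_FAILED as failure. The function name wait_spark_app, the default timeout (600 s) and the polling interval (10 s) are assumptions.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): poll a SparkApplication in the default namespace
# until it reaches a terminal state or the timeout expires.
# Arguments: name (default spark-pi), timeout in seconds (600), interval in seconds (10).
wait_spark_app() {
  local name="${1:-spark-pi}" timeout="${2:-600}" interval="${3:-10}"
  local waited=0 state
  while [ "$waited" -lt "$timeout" ]; do
    # Same field as "Application State: State" in kubectl describe output.
    state="$(kubectl get sparkapplication "$name" --namespace=default \
      -o jsonpath='{.status.applicationState.state}')" || return 1
    case "$state" in
      COMPLETED) echo "$name: $state"; return 0 ;;
      FAILED|SUBMISSION_FAILED) echo "$name: $state" >&2; return 1 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$name: timed out after ${timeout}s (last state: ${state:-unknown})" >&2
  return 1
}
```

For example, wait_spark_app spark-pi 900 15 waits up to 15 minutes and checks every 15 seconds. In the output above, the executor is listed as FAILED while the application state is COMPLETED; the sketch keys off the application state only.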
You can collect VCI container logs with Volcengine Log Service. For more information, see Collect VCI container logs.
If the Spark image is large (over 2 GB), pulling it takes a long time. You can use ImageCache to accelerate image pulls. For more information, see Image cache.