Running Spark data processing tasks on elastic container instances (VCI) is not limited by the node compute capacity of your Container Service (VKE) cluster: Pods are created flexibly and on demand, which effectively reduces compute costs. This topic describes how to install Spark Operator in a VKE cluster and use VCI to run Spark data processing tasks.
This topic uses the Kubernetes-native Spark Operator approach to run Spark tasks on VCI. The main workflow is as follows:
If you need more advanced VCI features, you can configure VCI parameters by setting Annotations. For details, see the Pod Annotation documentation.
The following describes a practice that has been tested and verified on VCI. To get the expected results and stay within the usage limits of VCI, follow this topic's procedure (or use the resources it recommends). If you need an alternative solution, contact your Volcengine account manager.
Note
When adding nodes during cluster creation, choose node specifications based on your actual business needs and the requirements of installing and running Spark Operator, so that the nodes' vCPU, memory, and other resources can support your workload.
Go to the .kube directory and edit the config file.
cd .kube
vi config
Paste the cluster's kubeconfig content into the file, then save and exit with the :wq command. Run the following command. If node information is returned, kubectl is connected to the cluster.
kubectl get nodes
Install Spark Operator through the kubectl client.
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
kubectl create namespace spark-operator
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install my-release spark-operator/spark-operator --namespace spark-operator --set enableBatchScheduler=true --set enableWebhook=true
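Before submitting Spark tasks, it can be worth confirming that the operator actually came up. Below is a minimal check sketch, assuming the release name my-release and namespace spark-operator used above; the kubectl guard only makes it safe to run on a machine where no cluster is configured.

```shell
# Post-install sanity checks for the Spark Operator.
ns="spark-operator"    # namespace used in the helm install above
release="my-release"   # helm release name used above
if command -v kubectl >/dev/null 2>&1; then
  # The operator pod should reach Running before tasks are submitted.
  kubectl get pods -n "$ns"
  # The chart registers the SparkApplication CRD used later in this topic.
  kubectl get crd sparkapplications.sparkoperator.k8s.io
else
  echo "kubectl not found; run these checks from the machine configured earlier"
fi
```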
Caution
If pulling the spark-operator image fails while installing Spark Operator, go to the Deployments page of the target cluster in the Container Service console and update the image address of the my-release workload in the spark-operator namespace to doc-cn-beijing.cr.volces.com/vke/spark:v1beta2-1.3.8-3.1.1. Spark Operator can then install normally.
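The console fix above can also be applied from kubectl. In the sketch below, the Deployment name my-release and the container name spark-operator are assumptions about the chart's naming, not from the source; list the workloads first and adjust both names to what you actually see.

```shell
# Point the operator workload at the mirror image from the note above.
image="doc-cn-beijing.cr.volces.com/vke/spark:v1beta2-1.3.8-3.1.1"
if command -v kubectl >/dev/null 2>&1; then
  # Confirm the real deployment and container names before patching.
  kubectl get deployment -n spark-operator
  # "my-release" (deployment) and "spark-operator" (container) are assumed names.
  kubectl set image deployment/my-release spark-operator="$image" -n spark-operator
fi
echo "mirror image: $image"
```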
This example creates a YAML file that defines a SparkApplication named spark-pi.
Note
Pulling the image from the address in the Git repository is not very reliable, so the image has been pulled into the Volcengine image registry (CR). This example can directly use the following image address:
cr-share-cn-shanghai.cr.volces.com/spark/spark-operator:v3.1.1
# Copyright 2017 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "cr-share-cn-shanghai.cr.volces.com/spark/spark-operator:v3.1.1"  # The upstream image address is replaced with the Volcengine image registry address.
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    annotations:
      vke.volcengine.com/burst-to-vci: enforce  # Run the Spark executors on VCI.
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.1.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
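Once the manifest above is saved locally, it can be submitted with kubectl apply. The filename spark-pi.yaml below is illustrative, not from the source; the pod label query relies on the app-name label that Spark Operator attaches to the driver and executor pods it creates.

```shell
# Submit the SparkApplication and list the pods it creates.
manifest="spark-pi.yaml"   # assumed local filename for the YAML above
if command -v kubectl >/dev/null 2>&1 && [ -f "$manifest" ]; then
  kubectl apply -f "$manifest"
  # The executor pod should be scheduled onto VCI because of the
  # burst-to-vci annotation in the manifest.
  kubectl get pods -l sparkoperator.k8s.io/app-name=spark-pi
fi
echo "manifest: $manifest"
```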
Run the following command to check the task status.
kubectl get sparkapplication
Then run the following command to view the task details. Output similar to the following indicates that the Spark task has completed.
kubectl describe sparkapplication spark-pi
Expected output:
Name:         spark-pi
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  sparkoperator.k8s.io/v1beta2
Kind:         SparkApplication
Metadata:
  Creation Timestamp:  2023-11-27T15:45:20Z
  Generation:          1
  Resource Version:    24933
  UID:                 ad8fa50c-1d45-4a33-97bd-2c1de4155f7b
Spec:
  Driver:
    Core Limit:  1200m
    Cores:       1
    Labels:
      Version:        3.1.1
    Memory:           512m
    Service Account:  spark
    Volume Mounts:
      Mount Path:  /tmp
      Name:        test-volume
  Executor:
    Annotations:
      vke.volcengine.com/burst-to-vci:  enforce
    Cores:      1
    Instances:  1
    Labels:
      Version:  3.1.1
    Memory:     512m
    Volume Mounts:
      Mount Path:  /tmp
      Name:        test-volume
  Image:                  cr-share-cn-shanghai.cr.volces.com/spark/spark-operator:v3.1.1
  Image Pull Policy:      Always
  Main Application File:  local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
  Main Class:             org.apache.spark.examples.SparkPi
  Mode:                   cluster
  Restart Policy:
    Type:         Never
  Spark Version:  3.1.1
  Type:           Scala
  Volumes:
    Host Path:
      Path:  /tmp
      Type:  Directory
    Name:    test-volume
Status:
  Application State:
    State:  COMPLETED
  Driver Info:
    Pod Name:             spark-pi-driver
    Web UI Address:       10.234.70.207:0
    Web UI Port:          4040
    Web UI Service Name:  spark-pi-ui-svc
  Execution Attempts:     1
  Executor State:
    spark-pi-4a08948c1174fd43-exec-1:  FAILED
  Last Submission Attempt Time:        2023-11-27T15:45:23Z
  Spark Application Id:                spark-aadd883935dd422eb0526412b846c9b1
  Submission Attempts:                 1
  Submission ID:                       923c3ed6-50de-4b65-a0e2-7c506b72dd2a
  Termination Time:                    2023-11-27T15:48:10Z
Events:
  Type     Reason                     Age  From            Message
  ----     ------                     ---- ----            -------
  Normal   SparkApplicationAdded      18m  spark-operator  SparkApplication spark-pi was added, enqueuing it for submission
  Normal   SparkApplicationSubmitted  18m  spark-operator  SparkApplication spark-pi was submitted successfully
  Normal   SparkDriverRunning         18m  spark-operator  Driver spark-pi-driver is running
  Normal   SparkExecutorPending       18m  spark-operator  Executor [spark-pi-4a08948c1174fd43-exec-1] is pending
  Normal   SparkExecutorRunning       16m  spark-operator  Executor [spark-pi-4a08948c1174fd43-exec-1] is running
  Normal   SparkDriverCompleted       16m  spark-operator  Driver spark-pi-driver completed
  Normal   SparkApplicationCompleted  16m  spark-operator  SparkApplication spark-pi completed
  Warning  SparkExecutorFailed        15m  spark-operator  Executor [spark-pi-4a08948c1174fd43-exec-1 %!s(int=-1) Unknown (Container not Found)] failed with ExitCode: %!d(MISSING), Reason: %!s(MISSING)
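The computed result itself appears in the driver log rather than in the status above. A quick way to read it, assuming the driver pod name spark-pi-driver reported under Driver Info; "Pi is roughly" is the line the upstream SparkPi example prints.

```shell
# Read the computed value of Pi from the driver pod's log.
driver="spark-pi-driver"   # pod name reported under Driver Info above
if command -v kubectl >/dev/null 2>&1; then
  kubectl logs "$driver" | grep "Pi is roughly"
fi
echo "driver pod: $driver"
```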
You can collect elastic container instance (VCI) logs through the Volcengine Log Service. For more information, see Collect VCI Container Logs.
If the Spark image is large (over 2 GB), pulling it takes a long time. You can accelerate image pulls with ImageCache. For more information, see Image Cache.