Scheduling Jobs with Elastic Container Instances (VCI)
Last updated: 2025.01.27 15:03:43. First published: 2024.07.29 19:02:40.

To run jobs on Elastic Container Instances (VCI), turn on the switch for the corresponding service in the console.

Scheduling Spark Jobs with Elastic Container Instances (VCI)

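After you turn on the Spark service switch, the Driver and Executor pods of any job submitted as a SparkApplication run on Elastic Container Instances (VCI). EMR creates these resources with the default instance family provided by VCI. If you need a different instance specification (for example, a GPU instance type), customize it as described below.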

Running Spark Jobs with a Custom VCI Instance Family

Prerequisites

You have a VKE cluster, have created an EMR on VKE Spark cluster, and have turned on the VCI scheduling switch.

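Specifying the VCI Instance Family When Submitting a Job

Set the vci.vke.volcengine.com/preferred-instance-family annotation on both the driver and the executor of the SparkApplication: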

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-wordcount
spec:
  type: Scala
  sparkVersion: 3.2.1
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "xxx/spark-examples_2.12-3.3.3.jar"
  arguments:
    - "1000"
  driver:
    annotations:
      # Preferred VCI instance family for the driver pod
      vci.vke.volcengine.com/preferred-instance-family: vci.n3i
    nodeSelector: {}
    cores: 1
    coreLimit: 1000m
    memory: 2g
  executor:
    annotations:
      # Preferred VCI instance family for the executor pods
      vci.vke.volcengine.com/preferred-instance-family: vci.n3i
    nodeSelector: {}
    cores: 1
    coreLimit: 1000m
    memory: 2g
    memoryOverhead: 1g
    instances: 1

For more information about instance families, see: Pod Annotation Description (Container Service, Volcengine)
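Because the annotation is set separately under driver and executor, the two roles can in principle request different instance families. The manifest below is a minimal sketch of that idea, not a verified configuration: it assumes the annotation is evaluated per pod, reuses the values from the example above, and uses a hypothetical job name and a placeholder family that you must replace with a real family name from the Pod Annotation documentation.

```yaml
# Sketch only: driver on the family used in the example above, executors on
# a different family. "<your-instance-family>" is a placeholder, not a real value.
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi-mixed-families   # hypothetical name
spec:
  type: Scala
  sparkVersion: 3.2.1
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "xxx/spark-examples_2.12-3.3.3.jar"
  arguments:
    - "1000"
  driver:
    annotations:
      # Family taken from the example above
      vci.vke.volcengine.com/preferred-instance-family: vci.n3i
    cores: 1
    coreLimit: 1000m
    memory: 2g
  executor:
    annotations:
      # Placeholder: replace with the family you need (for example, a GPU family)
      vci.vke.volcengine.com/preferred-instance-family: "<your-instance-family>"
    cores: 1
    coreLimit: 1000m
    memory: 2g
    memoryOverhead: 1g
    instances: 2
```

Keeping the driver on a general-purpose family while moving only the executors is one way to limit cost when only the compute-heavy tasks need the special hardware.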

Scheduling Ray Jobs with Elastic Container Instances (VCI)

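After you turn on the Ray switch, the Head and Worker pods of any job submitted as a RayCluster or RayJob run on Elastic Container Instances (VCI). EMR creates these resources with the default instance family provided by VCI. If you need a different instance specification (for example, a GPU instance type), customize it as described below.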

Running Ray Jobs with a Custom VCI Instance Family

Prerequisites

You have a VKE cluster, have created an EMR on VKE Ray cluster, and have turned on the VCI scheduling switch.

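Specifying the VCI Instance Family When Submitting a Job

Set the vci.vke.volcengine.com/preferred-instance-family annotation on the pod templates of the head group and of each worker group. The manifest below also includes the ConfigMap that supplies the TOS core-site.xml mounted into the pods, and an Ingress that exposes the Ray dashboard (port 8265) and metrics (port 8080):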

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$1
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: kuberay
    helm.sh/chart: ray-cluster-1.0.0
  name: raycluster
spec:
  enableInTreeAutoscaling: false
  headGroupSpec:
    rayStartParams:
      num-cpus: "0"
      dashboard-host: 0.0.0.0
    serviceType: ClusterIP
    template:
      metadata:
        annotations:
          # Preferred VCI instance family for the head pod
          vci.vke.volcengine.com/preferred-instance-family: vci.n3i
          prometheus.io/path: /metrics
          prometheus.io/port: "8080"
          prometheus.io/scrape: "true"
        labels:
          app.kubernetes.io/managed-by: Helm
          app.kubernetes.io/name: kuberay
          helm.sh/chart: ray-cluster-1.0.0
      spec:
        terminationGracePeriodSeconds: 600
        affinity: {}
        containers:
        - env:
          - name: LOG_UPDATE_INTERVAL_S
            value: '5'
          - name: VOLC_REGION
            value: cn-beijing
          - name: EMR_TOS_BUCKET_TAG_ENABLED
            value: "true"
          image: emr-vke-qa-cn-beijing.cr.volces.com/emr/ray:2.36.0-py3.11-ubuntu20.04-278
          imagePullPolicy: IfNotPresent
          name: ray-head
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
            requests:
              cpu: "1"
              memory: 2Gi
          securityContext:
            capabilities:
              add:
              - SYS_PTRACE
          volumeMounts:
          - mountPath: /opt/hadoop/etc/hadoop
            name: core-site-volume
        imagePullSecrets:
        - name: emr-image-regsecret
        tolerations: []
        volumes:
        - configMap:
            defaultMode: 420
            name: ray-cluster-core-site
          name: core-site-volume
  workerGroupSpecs:
  - groupName: workergroup
    maxReplicas: 2147483647
    minReplicas: 0
    rayStartParams: {}
    replicas: 1
    template:
      metadata:
        annotations:
          # Preferred VCI instance family for the worker pods
          vci.vke.volcengine.com/preferred-instance-family: vci.n3i
          prometheus.io/path: /metrics
          prometheus.io/port: "8080"
          prometheus.io/scrape: "true"
        labels:
          app.kubernetes.io/managed-by: Helm
          app.kubernetes.io/name: kuberay
          helm.sh/chart: ray-cluster-1.0.0
      spec:
        affinity: {}
        containers:
        - env:
          - name: LOG_UPDATE_INTERVAL_S
            value: '5'
          - name: VOLC_REGION
            value: cn-beijing
          - name: EMR_TOS_BUCKET_TAG_ENABLED
            value: "true"
          image: emr-vke-qa-cn-beijing.cr.volces.com/emr/ray:2.36.0-py3.11-ubuntu20.04-278
          imagePullPolicy: IfNotPresent
          name: ray-worker
          resources:
            limits:
              cpu: "1"
              memory: 1Gi
            requests:
              cpu: "1"
              memory: 1Gi
          securityContext:
            capabilities:
              add:
              - SYS_PTRACE
          volumeMounts:
          - mountPath: /opt/hadoop/etc/hadoop
            name: core-site-volume
        imagePullSecrets:
        - name: emr-image-regsecret
        tolerations: []
        volumes:
        - configMap:
            defaultMode: 420
            name: ray-cluster-core-site
          name: core-site-volume
---
apiVersion: v1
data:
  core-site.xml: |
    <configuration>
    <property>
        <name>fs.AbstractFileSystem.tos.impl</name>
        <value>io.proton.tos.TOS</value>
    </property>
    <property>
        <name>fs.tos.impl</name>
        <value>io.proton.fs.RawFileSystem</value>
    </property>
    <property>
        <name>fs.tos.endpoint</name>
        <value>tos-cn-beijing.ivolces.com</value>
    </property>
    <property>
        <name>mapreduce.outputcommitter.factory.scheme.tos</name>
        <value>io.proton.commit.CommitterFactory</value>
    </property>
    <property>
        <name>fs.tos.credentials.provider</name>
        <value>io.proton.common.object.tos.auth.DefaultCredentialsProviderChain</value>
    </property>
    </configuration>
kind: ConfigMap
metadata:
  name: ray-cluster-core-site
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    meta.helm.sh/release-name: ingress-release-name
    meta.helm.sh/release-namespace: ingress-namespace
    nginx.ingress.kubernetes.io/rewrite-target: /$1
  labels:
    app.kubernetes.io/instance: ingress-release-name
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: kuberay
    helm.sh/chart: ray-cluster-1.0.0
  name: ingress-release-name
  namespace: ingress-namespace
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - backend:
          service:
            name: ingress-release-name-head-svc
            port:
              number: 8265
        path: /ingress-namespace/ingress-release-name/(.*)
        pathType: Exact
      - backend:
          service:
            name: ingress-release-name-head-svc
            port:
              number: 8080
        path: /ingress-namespace/ingress-release-name-metrics/(.*)
        pathType: Exact

For more information about instance families, see: Pod Annotation Description (Container Service, Volcengine)
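The example above covers only the RayCluster case. For a RayJob, the same annotation would presumably go on the pod templates nested under rayClusterSpec. The following is a minimal sketch under that assumption: the job name, entrypoint, and replica counts are hypothetical, the image and pull secret are copied from the RayCluster example, and the environment variables, core-site volume, and Ingress from that example are omitted for brevity and may still be required in your environment.

```yaml
# Sketch only: a minimal RayJob carrying the same VCI annotation as the
# RayCluster example. Name, entrypoint, and sizes are hypothetical.
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-vci-sample   # hypothetical name
spec:
  # Hypothetical entrypoint: prints the resources Ray sees in the cluster
  entrypoint: 'python -c "import ray; ray.init(); print(ray.cluster_resources())"'
  # Delete the Ray cluster once the job has finished
  shutdownAfterJobFinishes: true
  rayClusterSpec:
    headGroupSpec:
      rayStartParams:
        dashboard-host: 0.0.0.0
      template:
        metadata:
          annotations:
            # Preferred VCI instance family for the head pod
            vci.vke.volcengine.com/preferred-instance-family: vci.n3i
        spec:
          containers:
          - name: ray-head
            image: emr-vke-qa-cn-beijing.cr.volces.com/emr/ray:2.36.0-py3.11-ubuntu20.04-278
            resources:
              limits:
                cpu: "1"
                memory: 2Gi
              requests:
                cpu: "1"
                memory: 2Gi
          imagePullSecrets:
          - name: emr-image-regsecret
    workerGroupSpecs:
    - groupName: workergroup
      replicas: 1
      minReplicas: 0
      maxReplicas: 1
      rayStartParams: {}
      template:
        metadata:
          annotations:
            # Preferred VCI instance family for the worker pods
            vci.vke.volcengine.com/preferred-instance-family: vci.n3i
        spec:
          containers:
          - name: ray-worker
            image: emr-vke-qa-cn-beijing.cr.volces.com/emr/ray:2.36.0-py3.11-ubuntu20.04-278
            resources:
              limits:
                cpu: "1"
                memory: 1Gi
              requests:
                cpu: "1"
                memory: 1Gi
          imagePullSecrets:
          - name: emr-image-regsecret
```

Setting shutdownAfterJobFinishes to true is a design choice for this sketch: it releases the VCI-backed pods as soon as the job completes, so the elastic instances do not keep running after the work is done.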