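This page lists the Ray images published by the EMR on VKE service, so that users can build on them for secondary development.
Note
Currently, the emr-vke-public-{region}.cr.volces.com/emr/ray image has no username and password set, while emr-vke-public-{region}.cr.volces.com/emr/ray-ds and emr-vke-public-{region}.cr.volces.com/emr/ray-ml require a username and password for access.
Ray is an open-source distributed computing framework for building and running distributed applications. It supports several programming models, including task parallelism and data parallelism. Its flexible API and efficient runtime make parallel and distributed computing simpler and more efficient.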
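Ray images are provided in three repositories, one per scenario:
emr-vke-public-{region}.cr.volces.com/emr/ray: the image containing Ray and its required dependencies, with Conda, Java 8, and Hadoop built in.
emr-vke-public-{region}.cr.volces.com/emr/ray-ds: built on top of the ray base image; adds Spark, RayDP, and their dependencies. Intended for data processing scenarios.
emr-vke-public-{region}.cr.volces.com/emr/ray-ml: built on top of the ray-ds image; adds Torch, TensorFlow, and their dependencies. Intended for machine learning scenarios.
Here {region} is the English name of the region where the Ray cluster is located. The currently supported regions are North China, East China, South China, and Asia Pacific Southeast (Johor).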
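Region | Region (English name) | Image repository names for each scenario |
---|---|---|
North China | cn-beijing | emr-vke-public-cn-beijing.cr.volces.com/emr/ray, emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ds, emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ml |
East China | cn-shanghai | emr-vke-public-cn-shanghai.cr.volces.com/emr/ray, emr-vke-public-cn-shanghai.cr.volces.com/emr/ray-ds, emr-vke-public-cn-shanghai.cr.volces.com/emr/ray-ml |
South China | cn-guangzhou | emr-vke-public-cn-guangzhou.cr.volces.com/emr/ray, emr-vke-public-cn-guangzhou.cr.volces.com/emr/ray-ds, emr-vke-public-cn-guangzhou.cr.volces.com/emr/ray-ml |
Asia Pacific Southeast (Johor) | ap-southeast-1 | emr-vke-public-ap-southeast-1.cr.volces.com/emr/ray, emr-vke-public-ap-southeast-1.cr.volces.com/emr/ray-ds, emr-vke-public-ap-southeast-1.cr.volces.com/emr/ray-ml |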
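The naming rule described above (a fixed registry host pattern with {region} substituted, followed by the scenario name) can be sketched in Python. The helper name image_repository and the two constant sets are hypothetical and exist only for illustration; they are not part of any EMR SDK.

```python
# Hypothetical helper for illustration only; not part of any EMR SDK.
# The region IDs and scenario names are the ones listed in this document.
SUPPORTED_REGIONS = {"cn-beijing", "cn-shanghai", "cn-guangzhou", "ap-southeast-1"}
SCENARIOS = {"ray", "ray-ds", "ray-ml"}


def image_repository(region: str, scenario: str = "ray") -> str:
    """Return the repository name for a region ID and a scenario."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"unsupported region: {region!r}")
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return f"emr-vke-public-{region}.cr.volces.com/emr/{scenario}"


if __name__ == "__main__":
    print(image_repository("cn-shanghai", "ray-ds"))
```

Validating the inputs up front is a design choice for the sketch: a typo in the region would otherwise only surface later as an image pull failure.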
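In addition, different EMR product versions may provide different Ray versions and operating system versions. These are distinguished by the image tag, which takes one of the following two forms: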
{ray.version}-py{python.version}-ubuntu{ubuntu.version}[-{build.version}]-{emr.version}
{ray.version}-cu{cuda.version}-py{python.version}-ubuntu{ubuntu.version}[-{build.version}]-{emr.version}
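Parameter | Description |
---|---|
ray.version | Ray version number. Example: 2.9.3 |
python.version | Python version. Example: 3.9 |
ubuntu.version | Ubuntu version. Example: 20.04 |
cuda.version | CUDA version. Example: 11.8.0 |
emr.version | Product version of the EMR on VKE service. Example: 1.2.0 |
build.version | Build version of Ray; version information maintained internally by EMR |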
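To make the two tag forms concrete, here is a minimal Python sketch that splits a tag into the fields from the table above. The regular expression and the name parse_tag are assumptions made for illustration; in particular, the sketch assumes build.version is a plain integer, which matches the tags listed in this document but is not stated as a rule.

```python
import re

# Assumed pattern derived from the two tag forms in this document.
# build.version is assumed to be a plain integer (for example 329).
TAG_PATTERN = re.compile(
    r"^(?P<ray>\d+\.\d+\.\d+)"          # ray.version
    r"(?:-cu(?P<cuda>\d+\.\d+\.\d+))?"  # optional cuda.version
    r"-py(?P<python>\d+\.\d+)"          # python.version
    r"-ubuntu(?P<ubuntu>\d+\.\d+)"      # ubuntu.version
    r"(?:-(?P<build>\d+))?"             # optional build.version
    r"-(?P<emr>\d+\.\d+\.\d+)$"         # emr.version
)


def parse_tag(tag: str) -> dict:
    """Split an image tag into its fields; absent optional fields are None."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"tag does not follow the documented format: {tag!r}")
    return match.groupdict()


if __name__ == "__main__":
    print(parse_tag("2.36.0-cu11.8.0-py3.11-ubuntu20.04-329-3.12.0"))
    print(parse_tag("2.9.3-py3.9-ubuntu20.04-1.2.0"))
```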
Supported tags:
Tag | Compatibility |
---|---|
2.9.3-py3.9-ubuntu20.04-1.2.0 | EMR product version 1.2.0 and later |
2.9.3-cu11.8.0-py3.9-ubuntu20.04-1.2.0 | EMR product version 1.2.0 and later |
2.22.0-py3.9-ubuntu20.04-1.4.0 | EMR product version 1.2.0 and later |
2.22.0-py3.9-ubuntu20.04-155-1.4.0 | EMR product version 1.2.0 and later |
2.22.0-cu11.8.0-py3.9-ubuntu20.04-155-1.4.0 | EMR product version 1.2.0 and later |
2.30.0-py3.11-ubuntu20.04-207-1.5.0 | EMR product version 1.2.0 and later |
2.30.0-cu11.8.0-py3.11-ubuntu20.04-207-1.5.0 | EMR product version 1.2.0 and later |
2.30.0-cu12.1.0-py3.11-ubuntu20.04-211-1.5.0 | EMR product version 1.2.0 and later |
2.33.0-py3.11-ubuntu20.04-244-1.6.0 | EMR product version 1.2.0 and later |
2.33.0-cu11.8.0-py3.11-ubuntu20.04-244-1.6.0 | EMR product version 1.2.0 and later |
2.36.0-py3.11-ubuntu20.04-329-3.12.0 | EMR product version 3.12.0 and later |
2.36.0-cu11.8.0-py3.11-ubuntu20.04-329-3.12.0 | EMR product version 3.12.0 and later |
Taking the North China region as an example:
Ray version | Image name | Size | Notes (dependency packages) |
---|---|---|---|
2.36.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.36.0-py3.11-ubuntu20.04-329-3.12.0 | 4.11GB | |
2.36.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.36.0-cu11.8.0-py3.11-ubuntu20.04-329-3.12.0 | 13.6GB | |
2.36.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ds:2.36.0-py3.11-ubuntu20.04-329-3.12.0 | 4.6GB | |
2.36.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ds:2.36.0-cu11.8.0-py3.11-ubuntu20.04-329-3.12.0 | 14.1GB | |
2.36.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ml:2.36.0-py3.11-ubuntu20.04-329-3.12.0 | 8.84GB | |
2.36.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ml:2.36.0-cu11.8.0-py3.11-ubuntu20.04-329-3.12.0 | 22.1GB | |
2.33.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.33.0-cu11.8.0-py3.11-ubuntu20.04-244-1.6.0 | 6.67GiB | |
2.33.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.33.0-py3.11-ubuntu20.04-244-1.6.0 | 1.75GiB | |
2.33.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ds:2.33.0-cu11.8.0-py3.11-ubuntu20.04-244-1.6.0 | 7.02GiB | |
2.33.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ds:2.33.0-py3.11-ubuntu20.04-244-1.6.0 | 2.11GiB | |
2.33.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ml:2.33.0-cu11.8.0-py3.11-ubuntu20.04-244-1.6.0 | 10.44GiB | |
2.33.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ml:2.33.0-py3.11-ubuntu20.04-244-1.6.0 | 3.48GiB | |
2.30.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.30.0-py3.11-ubuntu20.04-207-1.5.0 | 1.76GiB | |
2.30.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ds:2.30.0-py3.11-ubuntu20.04-207-1.5.0 | 2.11GiB | |
2.30.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ml:2.30.0-py3.11-ubuntu20.04-207-1.5.0 | 3.46GiB | |
2.30.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.30.0-cu11.8.0-py3.11-ubuntu20.04-207-1.5.0 | 6.67GiB | |
2.30.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ds:2.30.0-cu11.8.0-py3.11-ubuntu20.04-207-1.5.0 | 7.02GiB | |
2.30.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ml:2.30.0-cu11.8.0-py3.11-ubuntu20.04-207-1.5.0 | 10.42GiB | |
2.30.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.30.0-cu12.1.0-py3.11-ubuntu20.04-211-1.5.0 | 6.66GiB | |
2.30.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ds:2.30.0-cu12.1.0-py3.11-ubuntu20.04-211-1.5.0 | 7.01GiB | |
2.30.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ml:2.30.0-cu12.1.0-py3.11-ubuntu20.04-211-1.5.0 | 10.82GiB | |
2.22.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.22.0-py3.9-ubuntu20.04-178-1.5.0 | 1.66GiB | |
2.22.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ds:2.22.0-py3.9-ubuntu20.04-178-1.5.0 | 2.01GiB | |
2.22.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ml:2.22.0-py3.9-ubuntu20.04-178-1.5.0 | 3.45GiB | |
2.22.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.22.0-cu11.8.0-py3.9-ubuntu20.04-178-1.5.0 | 6.56GiB | |
2.22.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ds:2.22.0-cu11.8.0-py3.9-ubuntu20.04-178-1.5.0 | 6.91GiB | |
2.22.0 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ml:2.22.0-cu11.8.0-py3.9-ubuntu20.04-178-1.5.0 | 12.09GiB | |
2.9.3 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.9.3-py3.9-ubuntu20.04-1.2.0 | 1.66GiB | |
2.9.3 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ds:2.9.3-py3.9-ubuntu20.04-1.2.0 | 2.01GiB | |
2.9.3 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ml:2.9.3-py3.9-ubuntu20.04-1.2.0 | 3.45GiB | |
2.9.3 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.9.3-cu11.8.0-py3.9-ubuntu20.04-1.2.0 | 6.56GiB | |
2.9.3 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ds:2.9.3-cu11.8.0-py3.9-ubuntu20.04-1.2.0 | 6.91GiB | |
2.9.3 | emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ml:2.9.3-cu11.8.0-py3.9-ubuntu20.04-1.2.0 | 12.09GiB | |
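The following walks through using an image directly. You can also refer to the topic on running jobs with a custom Docker image.
You can create the image pull secret with the following command: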
kubectl create secret docker-registry my-docker-secret --docker-server={DOCKER_REGISTRY_SERVER} --docker-username={DOCKER_USER} --docker-password={DOCKER_PASSWORD} -n {YOUR_NAMESPACE}
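In this command:
my-docker-secret: the secret name, used as the value of imagePullSecrets in the YAML file below.
YOUR_NAMESPACE: the namespace in which the Ray job runs.
DOCKER_REGISTRY_SERVER, DOCKER_USER, DOCKER_PASSWORD: the Docker image registry address, username, and password. Contact an EMR service engineer to obtain these values.
The following test.yaml file is an example: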
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample
spec:
  entrypoint: python /home/ray/samples/sample_code.py
  # shutdownAfterJobFinishes specifies whether the RayCluster should be deleted after the RayJob finishes. Default is false.
  shutdownAfterJobFinishes: true
  # ttlSecondsAfterFinished specifies the number of seconds after which the RayCluster will be deleted after the RayJob finishes.
  ttlSecondsAfterFinished: 300
  submitterPodTemplate:
    spec:
      imagePullSecrets:
        - name: my-docker-secret
      restartPolicy: OnFailure
      containers:
        - name: submit-image
          image: emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ds:2.9.3-py3.9-ubuntu20.04-1.2.0
  # rayClusterSpec specifies the RayCluster instance to be created by the RayJob controller.
  rayClusterSpec:
    rayVersion: '2.9.3' # should match the Ray version in the image of the containers
    # Ray head pod template
    headGroupSpec:
      rayStartParams:
        dashboard-host: '0.0.0.0'
      template:
        spec:
          imagePullSecrets:
            - name: my-docker-secret
          restartPolicy: OnFailure
          containers:
            - name: ray-head
              image: emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.9.3-py3.9-ubuntu20.04-1.2.0
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265 # Ray dashboard
                  name: dashboard
                - containerPort: 10001
                  name: client
              resources:
                limits:
                  cpu: "1"
                requests:
                  cpu: "200m"
              volumeMounts:
                - mountPath: /home/ray/samples
                  name: code-sample
          volumes:
            - name: code-sample
              configMap:
                # Provide the name of the ConfigMap you want to mount.
                name: ray-job-code-sample
                # An array of keys from the ConfigMap to create as files
                items:
                  - key: sample_code.py
                    path: sample_code.py
    workerGroupSpecs:
      # the pod replicas in this group typed worker
      - replicas: 1
        minReplicas: 1
        maxReplicas: 5
        # logical group name, for this called small-group, also can be functional
        groupName: small-group
        # The `rayStartParams` are used to configure the `ray start` command.
        # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
        # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
        rayStartParams: {}
        # pod template
        template:
          spec:
            imagePullSecrets:
              - name: my-docker-secret
            restartPolicy: OnFailure
            containers:
              - name: ray-worker
                image: emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.9.3-py3.9-ubuntu20.04-1.2.0
                lifecycle:
                  preStop:
                    exec:
                      command: [ "/bin/sh","-c","ray stop" ]
                resources:
                  limits:
                    cpu: "1"
                  requests:
                    cpu: "200m"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ray-job-code-sample
data:
  sample_code.py: |
    import ray
    import os
    import requests

    ray.init()

    @ray.remote
    class Counter:
        def __init__(self):
            self.name = "test_counter"
            self.counter = 0

        def inc(self):
            self.counter += 1

        def get_counter(self):
            return "{} got {}".format(self.name, self.counter)

    counter = Counter.remote()

    for _ in range(5):
        ray.get(counter.inc.remote())
        print(ray.get(counter.get_counter.remote()))
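The comment on rayVersion in the manifest notes that it should match the Ray version in the container images. As a rough illustration of that check, the Python sketch below reads the Ray version from the first field of each image tag and reports any image that disagrees. The function names are hypothetical, and the sketch assumes every image reference carries an explicit tag after the final colon.

```python
def ray_version_of(image: str) -> str:
    """Return the Ray version encoded at the start of an image tag.

    Assumes the reference has the form <repository>:<tag> and that the
    tag starts with {ray.version}, as in the tag format described earlier.
    """
    _, separator, tag = image.rpartition(":")
    if not separator or not tag:
        raise ValueError(f"image reference has no tag: {image!r}")
    return tag.split("-", 1)[0]


def mismatched_images(ray_version: str, images: list) -> list:
    """Return the images whose tag does not start with ray_version."""
    return [image for image in images if ray_version_of(image) != ray_version]


if __name__ == "__main__":
    images = [
        "emr-vke-public-cn-beijing.cr.volces.com/emr/ray-ds:2.9.3-py3.9-ubuntu20.04-1.2.0",
        "emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.9.3-py3.9-ubuntu20.04-1.2.0",
    ]
    # An empty list means every image agrees with rayVersion '2.9.3'.
    print(mismatched_images("2.9.3", images))
```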
Submit the RayJob:
kubectl apply -f test.yaml -n {YOUR_NAMESPACE}
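As background for the secret-creation step earlier on this page: kubectl create secret docker-registry stores the credentials in a Secret of type kubernetes.io/dockerconfigjson, under the .dockerconfigjson key. The Python sketch below builds a roughly equivalent manifest, which may help when the secret has to be generated by tooling rather than by hand. The function name is hypothetical and the registry address and credentials in the usage lines are placeholders, not real values.

```python
import base64
import json


def docker_registry_secret(name: str, namespace: str, server: str,
                           username: str, password: str) -> dict:
    """Build a Secret manifest similar to `kubectl create secret docker-registry`."""
    # The "auth" field is base64("username:password").
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {
        "auths": {
            server: {"username": username, "password": password, "auth": auth}
        }
    }
    # Values under "data" in a Secret must themselves be base64-encoded.
    payload = base64.b64encode(json.dumps(config).encode()).decode()
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": payload},
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    manifest = docker_registry_secret(
        "my-docker-secret", "default", "registry.example.com", "user", "secret"
    )
    print(json.dumps(manifest, indent=2))
```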