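In Kubernetes, an Ingress controller manages the routing of external traffic to services inside the cluster. This section describes how to expose the RayCluster Dashboard UI through an Ingress so that you can check job status in the Ray UI.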
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample
  annotations:
    # This annotation is required; without it, requests to the dashboard return 404.
    nginx.ingress.kubernetes.io/rewrite-target: /$1
spec:
  entrypoint: python /home/ray/samples/sample_code.py
  # shutdownAfterJobFinishes specifies whether the RayCluster should be deleted after the RayJob finishes. Default is false.
  shutdownAfterJobFinishes: true
  # ttlSecondsAfterFinished specifies the number of seconds after which the RayCluster will be deleted after the RayJob finishes.
  ttlSecondsAfterFinished: 3000
  runtimeEnvYAML: |
    env_vars:
      counter_name: "test_counter"
  # rayClusterSpec specifies the RayCluster instance to be created by the RayJob controller.
  rayClusterSpec:
    rayVersion: '2.9.3' # should match the Ray version in the image of the containers
    # Ray head pod template
    headGroupSpec:
      # Enable ingress
      enableIngress: true
      rayStartParams:
        dashboard-host: '0.0.0.0'
      template:
        spec:
          containers:
            - name: ray-head
              image: emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.9.3-py3.9-ubuntu20.04-1.2.0
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265 # Ray dashboard
                  name: dashboard
                - containerPort: 10001
                  name: client
              resources:
                limits:
                  cpu: "1"
                requests:
                  cpu: "200m"
              volumeMounts:
                - mountPath: /home/ray/samples
                  name: code-sample
          volumes:
            - name: code-sample
              configMap:
                # Provide the name of the ConfigMap you want to mount.
                name: ray-job-code-sample
                # An array of keys from the ConfigMap to create as files
                items:
                  - key: sample_code.py
                    path: sample_code.py
    workerGroupSpecs:
      # the pod replicas in this group typed worker
      - replicas: 1
        minReplicas: 1
        maxReplicas: 5
        # logical group name, for this called small-group, also can be functional
        groupName: small-group
        # The `rayStartParams` are used to configure the `ray start` command.
        # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
        # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
        rayStartParams: {}
        # pod template
        template:
          spec:
            containers:
              - name: ray-worker
                image: emr-vke-public-cn-beijing.cr.volces.com/emr/ray:2.9.3-py3.9-ubuntu20.04-20240402
                lifecycle:
                  preStop:
                    exec:
                      command: [ "/bin/sh","-c","ray stop" ]
                resources:
                  limits:
                    cpu: "1"
                  requests:
                    cpu: "200m"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: ray-job-code-sample
data:
  sample_code.py: |
    import ray
    import os
    import requests

    ray.init()

    @ray.remote
    class Counter:
        def __init__(self):
            self.name = os.getenv("counter_name")
            assert self.name == "test_counter"
            self.counter = 0

        def inc(self):
            self.counter += 1

        def get_counter(self):
            return "{} got {}".format(self.name, self.counter)

    counter = Counter.remote()

    for _ in range(5):
        ray.get(counter.inc.remote())
        print(ray.get(counter.get_counter.remote()))

    assert requests.__version__ == "2.31.0"
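In the YAML file, nginx.ingress.kubernetes.io/rewrite-target: /$1 is an annotation specific to the Nginx Ingress controller. It sets the new URL path to which requests routed through the Ingress are rewritten before they reach the backend service. /$1 refers to the first capture group matched by the regular expression in the Ingress path rule: the controller replaces the request path with / followed by the captured text, so the path prefix outside the capture group is stripped before the request is passed to the backend.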
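The effect of the rewrite can be sketched in Python. This only illustrates how a capture group is substituted into /$1; it is not how Nginx is implemented. The path rule /<cluster-name>/(.*) is an assumption about what the generated Ingress contains (verify it with kubectl describe ingress), the cluster name is taken from the example URL later in this section, and rewrite_path is a hypothetical helper.

```python
import re

# Assumed Ingress path rule: "/<cluster-name>/(.*)".
# The cluster name comes from the example URL in this section.
CLUSTER_NAME = "rayjob-sample-raycluster-c59cq"
PATH_PATTERN = re.compile(r"^/" + re.escape(CLUSTER_NAME) + r"/(.*)$")


def rewrite_path(request_path: str, target: str = "/$1") -> str:
    """Mimic rewrite-target by substituting $1 with the first capture group."""
    match = PATH_PATTERN.match(request_path)
    if match is None:
        # The rule does not match, so this Ingress path would not serve the request.
        raise ValueError(f"path does not match the Ingress rule: {request_path}")
    return target.replace("$1", match.group(1))


if __name__ == "__main__":
    # The dashboard root and a sub-path, as the backend would see them.
    print(rewrite_path(f"/{CLUSTER_NAME}/"))
    print(rewrite_path(f"/{CLUSTER_NAME}/api/jobs/"))
```

Under these assumptions the backend receives / and /api/jobs/ respectively, which is why the dashboard can serve its pages from the root path even though the external URL carries the cluster name as a prefix.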
Apply the YAML file
kubectl apply -f rayjob.ingress.yaml -n <namespace>
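Access the Ray Dashboard UI
Use the following commands to obtain the entry endpoint of the Ingress:
kubectl get ingress -n <namespace>
kubectl describe ingress <ingress-name> -n <namespace>
Combine the address and the path from the output to form the final URL. Taking the figure above as an example, the Dashboard is available at http://<ingress-address>/rayjob-sample-raycluster-c59cq/.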
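As a rough sketch, the same URL can be assembled in Python from the JSON form of the Ingress object. The function names below are hypothetical, the code assumes the Ingress reports its address under status.loadBalancer.ingress, and it assumes the path segment is the RayCluster name as in the example URL; adjust it if your Ingress differs.

```python
import json
import subprocess


def ingress_address(ingress: dict) -> str:
    """Return the first load balancer IP or hostname reported in the Ingress status."""
    entries = ingress.get("status", {}).get("loadBalancer", {}).get("ingress", [])
    for entry in entries:
        address = entry.get("ip") or entry.get("hostname")
        if address:
            return address
    raise LookupError("the Ingress has not been assigned an address yet")


def dashboard_url(ingress: dict, cluster_name: str) -> str:
    """Combine the Ingress address and the cluster name into the Dashboard URL."""
    return f"http://{ingress_address(ingress)}/{cluster_name}/"


def fetch_ingress(name: str, namespace: str) -> dict:
    """Read one Ingress object through kubectl (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "ingress", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For example, dashboard_url(fetch_ingress("<ingress-name>", "<namespace>"), "rayjob-sample-raycluster-c59cq") would produce a URL of the form shown above once the Ingress has been assigned an address.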
Delete the RayJob
kubectl delete -f rayjob.ingress.yaml -n <namespace>