KubeRay 1.0: Distributed Computing on Kubernetes with Ray

K8s Guru

Introduction

KubeRay 1.0, released on November 25, 2023, is most relevant if you run Ray workloads on Kubernetes in production and want cluster lifecycle, scaling, and job submission handled declaratively by an operator. This post walks through the highlights and the operational scenarios where the changes tend to matter first.


Ray Operator Features

  • Ray cluster management provides declarative configuration for Ray clusters on Kubernetes.
  • Autoscaling enables dynamic scaling of Ray workers based on workload demands.
  • Resource management sets per-pod CPU and memory requests and limits for Ray workloads.
  • Service integration enables seamless integration with Kubernetes services and ingress.
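
As a rough illustration of the autoscaling item above, the fragment below shows the RayCluster fields that turn on the Ray autoscaler. It is a sketch rather than a tuned configuration: the cluster name and the 60-second idle timeout are assumptions chosen for the example, and the head and worker groups are omitted for brevity.

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: ray-cluster-autoscaling   # hypothetical name
spec:
  # Run the Ray autoscaler alongside the head pod.
  enableInTreeAutoscaling: true
  autoscalerOptions:
    # Default rate-limits upscaling; Aggressive and Conservative are the alternatives.
    upscalingMode: Default
    # Seconds a worker may sit idle before it is removed (assumed value).
    idleTimeoutSeconds: 60
  # headGroupSpec and workerGroupSpecs are omitted here; use the same
  # structure as the RayCluster manifest under Getting Started.
```

With this enabled, the autoscaler keeps each worker group between its minReplicas and maxReplicas.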

Distributed Computing Capabilities

  1. Distributed training enables scalable machine learning model training across multiple nodes.
  2. Distributed inference enables high-throughput model serving with horizontal scaling.
  3. Hyperparameter tuning provides efficient hyperparameter search using distributed computing.
  4. Distributed data processing enables large-scale data processing workloads.
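
Workloads like these are typically submitted as batch jobs. The manifest below is a minimal RayJob sketch, assuming a training script already exists inside the container image at the hypothetical path /home/ray/app/train.py; the job name and resource sizes are illustrative as well.

```yaml
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-training          # hypothetical name
spec:
  # Command submitted to the Ray cluster once it is ready.
  # The script path is an assumption; it must exist in the image.
  entrypoint: python /home/ray/app/train.py
  # Delete the Ray cluster after the job completes.
  shutdownAfterJobFinishes: true
  rayClusterSpec:
    headGroupSpec:
      rayStartParams:
        dashboard-host: '0.0.0.0'
      template:
        spec:
          containers:
          - name: ray-head
            image: rayproject/ray:latest
            resources:
              limits:
                cpu: "1"
                memory: "2Gi"
    workerGroupSpecs:
    - replicas: 2
      minReplicas: 1
      maxReplicas: 4
      groupName: worker-group
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-worker
            image: rayproject/ray:latest
            resources:
              limits:
                cpu: "1"
                memory: "2Gi"
```

The operator creates the cluster, submits the entrypoint, and tears the cluster down afterwards, which suits one-off training or data processing runs.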

Kubernetes Integration

  • CRD support provides Kubernetes-native resources for Ray clusters and jobs.
  • RBAC integration provides fine-grained permissions for Ray operations.
  • Monitoring integration exposes Ray cluster health and metrics to Kubernetes monitoring tools.
  • Logging integration supports log aggregation and analysis for Ray workloads.
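
Because Ray clusters and jobs are ordinary Kubernetes resources in the ray.io API group, standard RBAC objects can scope access to them. The sketch below grants read-only access within one namespace; the role name, the namespace, and the ml-team group are hypothetical and should be replaced with your own.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ray-viewer               # hypothetical name
  namespace: default             # assumed namespace
rules:
# Read-only access to the KubeRay custom resources.
- apiGroups: ["ray.io"]
  resources: ["rayclusters", "rayjobs", "rayservices"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ray-viewer-binding       # hypothetical name
  namespace: default
subjects:
- kind: Group
  name: ml-team                  # hypothetical group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: ray-viewer
  apiGroup: rbac.authorization.k8s.io
```

Adding verbs such as create and delete to a separate role would let a smaller set of users manage clusters while everyone else can only inspect them.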

Getting Started

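Install the KubeRay operator, pinning the release with ?ref so the manifests match the version discussed here:

kubectl create -k "https://github.com/ray-project/kuberay/ray-operator/config/default?ref=v1.0.0"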

Create a RayCluster:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: ray-cluster
spec:
  headGroupSpec:
    serviceType: ClusterIP
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:latest
          resources:
            limits:
              cpu: "1"
              memory: "1Gi"
            requests:
              cpu: "1"
              memory: "1Gi"
  workerGroupSpecs:
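  - replicas: 2      # desired number of worker pods
    minReplicas: 1   # lower bound; used by the autoscaler when enabled
    maxReplicas: 4   # upper bound; used by the autoscaler when enabled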
    groupName: worker-group
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:latest
          resources:
            limits:
              cpu: "1"
              memory: "1Gi"
            requests:
              cpu: "1"
              memory: "1Gi"
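
To reach the Ray dashboard of the cluster above from outside Kubernetes, one option is an Ingress in front of the head service. This is a sketch under several assumptions: an NGINX ingress controller is installed, ray.example.com is a placeholder host, and the head service follows the <cluster-name>-head-svc naming with the dashboard on its default port 8265. Confirm the service name with kubectl get svc before applying.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ray-dashboard            # hypothetical name
spec:
  ingressClassName: nginx        # assumes an NGINX ingress controller
  rules:
  - host: ray.example.com        # placeholder host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            # Head service created by the operator for the cluster named ray-cluster.
            name: ray-cluster-head-svc
            port:
              number: 8265       # default Ray dashboard port
```

The dashboard has no built-in authentication, so in practice this route should sit behind whatever access control your ingress layer provides.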

Summary

| Aspect | Details |
| --- | --- |
| Release Date | November 25, 2023 |
| Headline Features | Ray operator for Kubernetes, distributed computing capabilities, Kubernetes integration |
| Why it Matters | Delivers scalable distributed computing and ML workloads on Kubernetes with native operator support |

KubeRay 1.0 provides teams with powerful distributed computing capabilities for machine learning and data processing workloads on Kubernetes.