Comet Kubernetes Support

Comet Docker Images

Run the following command from the root of this repository to build the Comet Docker image, or use a published Docker image.

docker build -t apache/datafusion-comet -f kube/Dockerfile .
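
Alternatively, pull a published image. The tag below matches the image used in the Helm example later on this page and is illustrative; check the published tags for the combination of Comet, Spark, Scala, and Java versions that matches your cluster.

docker pull apache/datafusion-comet:0.7.0-spark3.5.4-scala2.12-java11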

Example Spark Submit

The exact syntax will vary depending on the Kubernetes distribution, but an example spark-submit command is shown below.
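
The command below is a minimal sketch: the API server address, namespace, and service account are placeholders, and the image tag and Comet jar path must match the image you built or pulled above. The Comet settings mirror the sparkConf used in the Helm example later on this page.

$SPARK_HOME/bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.namespace=default \
  --conf spark.kubernetes.container.image=apache/datafusion-comet:0.7.0-spark3.5.4-scala2.12-java11 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=<service-account> \
  --conf spark.executor.instances=1 \
  --conf spark.driver.extraClassPath=/opt/spark/jars/comet-spark-spark3.5_2.12-0.7.0.jar \
  --conf spark.executor.extraClassPath=/opt/spark/jars/comet-spark-spark3.5_2.12-0.7.0.jar \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.comet.enabled=true \
  --conf spark.comet.exec.enabled=true \
  --conf spark.comet.exec.shuffle.enabled=true \
  --conf spark.comet.exec.shuffle.mode=auto \
  --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.4.jar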

Helm chart

Install the Kubernetes Spark operator using Helm

# Add the Helm repository
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update

# Install the operator into the spark-operator namespace and wait for deployments to be ready
helm install spark-operator spark-operator/spark-operator --namespace spark-operator --create-namespace --wait

Check that the operator is deployed

helm status --namespace spark-operator spark-operator

NAME: spark-operator
NAMESPACE: spark-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
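
You can also confirm that the operator pod itself is running (pod names will vary per cluster):

kubectl get pods --namespace spark-operator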

Create an example Spark application file named spark-pi.yaml

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: apache/datafusion-comet:0.7.0-spark3.5.4-scala2.12-java11
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.4.jar
  sparkConf:
    "spark.executor.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-0.7.0.jar"
    "spark.driver.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-0.7.0.jar"
    "spark.plugins": "org.apache.spark.CometPlugin"
    "spark.comet.enabled": "true"
    "spark.comet.exec.enabled": "true"
    "spark.comet.cast.allowIncompatible": "true"
    "spark.comet.exec.shuffle.enabled": "true"
    "spark.comet.exec.shuffle.mode": "auto"
    "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
  sparkVersion: 3.5.4
  driver:
    labels:
      version: 3.5.4
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    labels:
      version: 3.5.4
    instances: 1
    cores: 1
    coreLimit: 1200m
    memory: 512m

Refer to Comet builds for the image tags and Comet jar versions that match your Spark version.

Run the Apache Spark application with Comet enabled

kubectl apply -f spark-pi.yaml
sparkapplication.sparkoperator.k8s.io/spark-pi created

Check the application status

kubectl get sparkapp spark-pi

NAME       STATUS    ATTEMPTS   START                  FINISH       AGE
spark-pi   RUNNING   1          2025-03-18T21:19:48Z   <no value>   65s
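
To inspect the driver and executor pods directly, you can filter on the labels that Spark on Kubernetes adds to the pods it creates:

# Spark labels its pods with spark-role=driver and spark-role=executor
kubectl get pods -l spark-role=driver
kubectl get pods -l spark-role=executor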

To see more runtime details

kubectl describe sparkapplication spark-pi

....
Events:
  Type    Reason                     Age    From                          Message
  ----    ------                     ----   ----                          -------
  Normal  SparkApplicationSubmitted  8m15s  spark-application-controller  SparkApplication spark-pi was submitted successfully
  Normal  SparkDriverRunning         7m18s  spark-application-controller  Driver spark-pi-driver is running
  Normal  SparkExecutorPending       7m11s  spark-application-controller  Executor [spark-pi-68732195ab217303-exec-1] is pending
  Normal  SparkExecutorRunning       7m10s  spark-application-controller  Executor [spark-pi-68732195ab217303-exec-1] is running
  Normal  SparkExecutorCompleted     7m5s   spark-application-controller  Executor [spark-pi-68732195ab217303-exec-1] completed
  Normal  SparkDriverCompleted       7m4s   spark-application-controller  Driver spark-pi-driver completed

Get Driver Logs

kubectl logs spark-pi-driver
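
For the SparkPi example, the driver log contains the computed approximation of Pi. A quick, illustrative way to confirm that Comet classes were loaded is to search the driver log for Comet-related entries (the exact messages vary between Comet versions):

kubectl logs spark-pi-driver | grep -i comet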

More information on the Kubernetes Spark operator is available at https://github.com/kubeflow/spark-operator