Comet Kubernetes Support

Comet Docker Images

Run the following command from the root of this repository to build the Comet Docker image, or use a published Docker image.

docker build -t apache/datafusion-comet -f kube/Dockerfile .
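
Alternatively, pull a published image instead of building one locally. The tag below is the one used by the example application later in this guide; other Spark and Scala combinations may be published under different tags.

docker pull apache/datafusion-comet:0.9.1-spark3.5.5-scala2.12-java11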

Example Spark Submit

The exact syntax will vary depending on the Kubernetes distribution; a sketch of an example spark-submit command is shown below.
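
This is a minimal sketch, assuming the API server placeholder is replaced with your cluster endpoint, that the Comet image built or pulled above is used, and that the Comet jar path matches the one baked into that image; a service account with permission to create pods may also be required.

spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.namespace=default \
  --conf spark.kubernetes.container.image=apache/datafusion-comet:0.9.1-spark3.5.5-scala2.12-java11 \
  --conf spark.executor.instances=1 \
  --conf spark.driver.extraClassPath=/opt/spark/jars/comet-spark-spark3.5_2.12-0.9.1.jar \
  --conf spark.executor.extraClassPath=/opt/spark/jars/comet-spark-spark3.5_2.12-0.9.1.jar \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.comet.enabled=true \
  --conf spark.comet.exec.enabled=true \
  --conf spark.comet.exec.shuffle.enabled=true \
  --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.5.jar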

Helm Chart

Install the Kubernetes Spark operator using Helm

# Add the Helm repository
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update

# Install the operator into the spark-operator namespace and wait for deployments to be ready
helm install spark-operator spark-operator/spark-operator --namespace spark-operator --create-namespace --wait

Check that the operator is deployed

helm status --namespace spark-operator spark-operator

NAME: spark-operator
NAMESPACE: spark-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
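
You can also confirm that the operator pods are running; the exact pod names depend on the release name and chart version.

kubectl get pods --namespace spark-operator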

Create an example Spark application file named spark-pi.yaml

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: apache/datafusion-comet:0.9.1-spark3.5.5-scala2.12-java11
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.5.jar
  sparkConf:
    "spark.executor.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-0.9.1.jar"
    "spark.driver.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-0.9.1.jar"
    "spark.plugins": "org.apache.spark.CometPlugin"
    "spark.comet.enabled": "true"
    "spark.comet.exec.enabled": "true"
    "spark.comet.cast.allowIncompatible": "true"
    "spark.comet.exec.shuffle.enabled": "true"
    "spark.comet.exec.shuffle.mode": "auto"
    "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
  sparkVersion: 3.5.5
  driver:
    labels:
      version: 3.5.5
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    labels:
      version: 3.5.5
    instances: 1
    cores: 1
    coreLimit: 1200m
    memory: 512m

Refer to the available Comet builds for other image tags and matching Comet jar versions.

Run the Apache Spark application with Comet enabled

kubectl apply -f spark-pi.yaml
sparkapplication.sparkoperator.k8s.io/spark-pi created

Check application status

kubectl get sparkapp spark-pi

NAME       STATUS    ATTEMPTS   START                  FINISH       AGE
spark-pi   RUNNING   1          2025-03-18T21:19:48Z   <no value>   65s
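
To follow the application until it finishes, watch the resource; a successful run should eventually report a COMPLETED status.

kubectl get sparkapp spark-pi --watch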

For more runtime details, describe the application

kubectl describe sparkapplication spark-pi

....
Events:
  Type    Reason                     Age    From                          Message
  ----    ------                     ----   ----                          -------
  Normal  SparkApplicationSubmitted  8m15s  spark-application-controller  SparkApplication spark-pi was submitted successfully
  Normal  SparkDriverRunning         7m18s  spark-application-controller  Driver spark-pi-driver is running
  Normal  SparkExecutorPending       7m11s  spark-application-controller  Executor [spark-pi-68732195ab217303-exec-1] is pending
  Normal  SparkExecutorRunning       7m10s  spark-application-controller  Executor [spark-pi-68732195ab217303-exec-1] is running
  Normal  SparkExecutorCompleted     7m5s   spark-application-controller  Executor [spark-pi-68732195ab217303-exec-1] completed
  Normal  SparkDriverCompleted       7m4s   spark-application-controller  Driver spark-pi-driver completed
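
The driver and executor pods created for the run can also be listed directly; pod names include generated suffixes and will differ from the ones shown above.

kubectl get pods | grep spark-pi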

Get Driver Logs

kubectl logs spark-pi-driver
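
To confirm that the Comet plugin was actually loaded, search the driver log for Comet-related entries; the exact messages vary between Comet versions.

kubectl logs spark-pi-driver | grep -i comet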

More information on the Kubernetes Spark operator is available from the Kubeflow Spark operator project (https://github.com/kubeflow/spark-operator).