Warning
This is out-of-date documentation. The latest Comet release is version 0.10.0.
Comet Kubernetes Support
Comet Docker Images
Run the following command from the root of this repository to build the Comet Docker image, or use a published Docker image.
docker build -t apache/datafusion-comet -f kube/Dockerfile .
Example Spark Submit
The exact syntax varies by Kubernetes distribution, but an example spark-submit command can be found here.
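As a rough illustration of what such a command can look like, the sketch below assembles a spark-submit invocation that reuses the image tag, JAR path, and Comet settings from the SparkApplication example on this page. The function name `comet_spark_submit`, the API server URL, and the `spark` service account are placeholders chosen for this example and are not part of Comet or this guide; adjust them for your cluster. By default the function only prints the command, and it runs spark-submit when `SUBMIT=1` is set.

```shell
# Hypothetical helper: build a spark-submit command for a Comet-enabled SparkPi run.
# $1 is the Kubernetes API server URL (placeholder value used below).
comet_spark_submit() {
  api_server="$1"
  comet_image="apache/datafusion-comet:0.9.1-spark3.5.5-scala2.12-java11"
  comet_jar="/opt/spark/jars/comet-spark-spark3.5_2.12-0.9.1.jar"

  # Replace the positional parameters with the spark-submit argument list.
  set -- \
    --master "k8s://${api_server}" \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf "spark.kubernetes.container.image=${comet_image}" \
    --conf "spark.kubernetes.authenticate.driver.serviceAccountName=spark" \
    --conf "spark.driver.extraClassPath=${comet_jar}" \
    --conf "spark.executor.extraClassPath=${comet_jar}" \
    --conf "spark.plugins=org.apache.spark.CometPlugin" \
    --conf "spark.comet.enabled=true" \
    --conf "spark.comet.exec.enabled=true" \
    --conf "spark.comet.exec.shuffle.enabled=true" \
    --conf "spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager" \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.5.5.jar

  # Dry run unless SUBMIT=1: print the command instead of executing it.
  if [ "${SUBMIT:-0}" = "1" ]; then
    spark-submit "$@"
  else
    echo "spark-submit $*"
  fi
}

# Print the command for a placeholder API server address.
comet_spark_submit "https://kubernetes.example.com:6443"
```

Printing the command first makes it easy to review the settings before submitting anything to a cluster.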
Helm chart
Install the Spark operator for Kubernetes using Helm
# Add the Helm repository
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update
# Install the operator into the spark-operator namespace and wait for deployments to be ready
helm install spark-operator spark-operator/spark-operator --namespace spark-operator --create-namespace --wait
Check that the operator is deployed
helm status --namespace spark-operator spark-operator
NAME: spark-operator
NAMESPACE: spark-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
Create an example Spark application file named spark-pi.yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: apache/datafusion-comet:0.9.1-spark3.5.5-scala2.12-java11
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.5.jar
  sparkConf:
    "spark.executor.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-0.9.1.jar"
    "spark.driver.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-0.9.1.jar"
    "spark.plugins": "org.apache.spark.CometPlugin"
    "spark.comet.enabled": "true"
    "spark.comet.exec.enabled": "true"
    "spark.comet.cast.allowIncompatible": "true"
    "spark.comet.exec.shuffle.enabled": "true"
    "spark.comet.exec.shuffle.mode": "auto"
    "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
  sparkVersion: 3.5.6
  driver:
    labels:
      version: 3.5.6
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    labels:
      version: 3.5.6
    instances: 1
    cores: 1
    coreLimit: 1200m
    memory: 512m
Refer to Comet builds
Run the Apache Spark application with Comet enabled
kubectl apply -f spark-pi.yaml
sparkapplication.sparkoperator.k8s.io/spark-pi created
Check application status
kubectl get sparkapp spark-pi
NAME       STATUS    ATTEMPTS   START                  FINISH       AGE
spark-pi   RUNNING   1          2025-03-18T21:19:48Z   <no value>   65s
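Rather than re-running the status command by hand, the check can be scripted. The sketch below polls the application until it reaches a terminal state. The function name `wait_for_sparkapp` is made up for this example, and it assumes the operator exposes the state at `.status.applicationState.state` with values such as `COMPLETED`, `FAILED`, and `SUBMISSION_FAILED`; verify this against the operator version you have installed.

```shell
# Hypothetical helper: poll a SparkApplication until it finishes.
# Usage: wait_for_sparkapp NAME [NAMESPACE] [MAX_TRIES] [SLEEP_SECONDS]
# Returns 0 on COMPLETED, 1 on failure, 2 on timeout.
wait_for_sparkapp() {
  app="$1"
  ns="${2:-default}"
  tries="${3:-60}"
  pause="${4:-5}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # Assumption: the state lives at .status.applicationState.state
    state="$(kubectl get sparkapplication "$app" -n "$ns" \
      -o jsonpath='{.status.applicationState.state}' 2>/dev/null)"
    echo "state: ${state:-unknown}"
    case "$state" in
      COMPLETED) return 0 ;;
      FAILED|SUBMISSION_FAILED) return 1 ;;
    esac
    i=$((i + 1))
    sleep "$pause"
  done
  echo "timed out waiting for $app" >&2
  return 2
}
```

For the example on this page, the call would be `wait_for_sparkapp spark-pi default`, which checks every 5 seconds for up to 5 minutes with the defaults above.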
To see more runtime details, describe the application
kubectl describe sparkapplication spark-pi
....
Events:
  Type    Reason                     Age    From                          Message
  ----    ------                     ----   ----                          -------
  Normal  SparkApplicationSubmitted  8m15s  spark-application-controller  SparkApplication spark-pi was submitted successfully
  Normal  SparkDriverRunning         7m18s  spark-application-controller  Driver spark-pi-driver is running
  Normal  SparkExecutorPending       7m11s  spark-application-controller  Executor [spark-pi-68732195ab217303-exec-1] is pending
  Normal  SparkExecutorRunning       7m10s  spark-application-controller  Executor [spark-pi-68732195ab217303-exec-1] is running
  Normal  SparkExecutorCompleted     7m5s   spark-application-controller  Executor [spark-pi-68732195ab217303-exec-1] completed
  Normal  SparkDriverCompleted       7m4s   spark-application-controller  Driver spark-pi-driver completed
Get the driver logs
kubectl logs spark-pi-driver
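The driver log is verbose, so it can help to filter it for the job result. The SparkPi example prints a line beginning with "Pi is roughly", and the small sketch below extracts that line from a log read on standard input. The function name `pi_result` is hypothetical and only serves this example.

```shell
# Hypothetical helper: print the SparkPi result line(s) from a driver log on stdin.
# Exits non-zero when no result line is found.
pi_result() {
  grep "Pi is roughly"
}

# Usage (requires access to the cluster):
#   kubectl logs spark-pi-driver | pi_result
```

A non-zero exit status here is a quick hint that the job did not reach the point of printing its result.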
More information is available in the Kubeflow Spark operator documentation.