Comet Kubernetes Support
Comet Docker Images
Run the following command from the root of this repository to build the Comet Docker image, or use a published Docker image.
docker build -t apache/datafusion-comet -f kube/Dockerfile .
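Alternatively, a published image can be pulled directly. The tag below matches the Comet 0.7.0 / Spark 3.5.4 image used in the examples on this page and may need to be adjusted for other versions.
docker pull apache/datafusion-comet:0.7.0-spark3.5.4-scala2.12-java11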
Example Spark Submit
The exact syntax will vary depending on the Kubernetes distribution, but an example spark-submit command can be found here.
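As a rough sketch, a Comet-enabled submission against Kubernetes might look like the following. The API server URL, namespace, and service account are placeholders, and the Comet settings mirror the operator example later on this page.
# A sketch only: replace the placeholders with values from your cluster
$SPARK_HOME/bin/spark-submit \
    --master k8s://https://<kubernetes-api-host>:<port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.namespace=default \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=<service-account> \
    --conf spark.kubernetes.container.image=apache/datafusion-comet:0.7.0-spark3.5.4-scala2.12-java11 \
    --conf spark.driver.extraClassPath=/opt/spark/jars/comet-spark-spark3.5_2.12-0.7.0.jar \
    --conf spark.executor.extraClassPath=/opt/spark/jars/comet-spark-spark3.5_2.12-0.7.0.jar \
    --conf spark.plugins=org.apache.spark.CometPlugin \
    --conf spark.comet.enabled=true \
    --conf spark.comet.exec.enabled=true \
    --conf spark.comet.exec.shuffle.enabled=true \
    --conf spark.comet.exec.shuffle.mode=auto \
    --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.5.4.jar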
Helm chart
Install the Spark operator for Kubernetes using Helm:
# Add the Helm repository
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update
# Install the operator into the spark-operator namespace and wait for deployments to be ready
helm install spark-operator spark-operator/spark-operator --namespace spark-operator --create-namespace --wait
Check that the operator is deployed:
helm status --namespace spark-operator spark-operator
NAME: spark-operator
NAMESPACE: spark-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
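The operator pods can also be listed directly to confirm they are running:
kubectl get pods --namespace spark-operator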
Create an example Spark application file named spark-pi.yaml:
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: apache/datafusion-comet:0.7.0-spark3.5.4-scala2.12-java11
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.4.jar
  sparkConf:
    "spark.executor.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-0.7.0.jar"
    "spark.driver.extraClassPath": "/opt/spark/jars/comet-spark-spark3.5_2.12-0.7.0.jar"
    "spark.plugins": "org.apache.spark.CometPlugin"
    "spark.comet.enabled": "true"
    "spark.comet.exec.enabled": "true"
    "spark.comet.cast.allowIncompatible": "true"
    "spark.comet.exec.shuffle.enabled": "true"
    "spark.comet.exec.shuffle.mode": "auto"
    "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
  sparkVersion: 3.5.4
  driver:
    labels:
      version: 3.5.4
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    labels:
      version: 3.5.4
    instances: 1
    cores: 1
    coreLimit: 1200m
    memory: 512m
The image tag and the Comet jar name depend on the Comet, Spark, and Scala versions in use; refer to Comet builds for the available combinations.
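As a quick sanity check, the Comet jar path referenced by extraClassPath above can be compared against the image contents (a sketch; adjust the tag and jar name to the versions you are using):
docker run --rm --entrypoint ls apache/datafusion-comet:0.7.0-spark3.5.4-scala2.12-java11 /opt/spark/jars | grep comet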
Run the Apache Spark application with Comet enabled:
kubectl apply -f spark-pi.yaml
sparkapplication.sparkoperator.k8s.io/spark-pi created
Check the application status:
kubectl get sparkapp spark-pi
NAME STATUS ATTEMPTS START FINISH AGE
spark-pi RUNNING 1 2025-03-18T21:19:48Z <no value> 65s
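The status can also be watched until the run finishes:
kubectl get sparkapp spark-pi --watch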
To see more runtime details, describe the application:
kubectl describe sparkapplication spark-pi
....
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SparkApplicationSubmitted 8m15s spark-application-controller SparkApplication spark-pi was submitted successfully
Normal SparkDriverRunning 7m18s spark-application-controller Driver spark-pi-driver is running
Normal SparkExecutorPending 7m11s spark-application-controller Executor [spark-pi-68732195ab217303-exec-1] is pending
Normal SparkExecutorRunning 7m10s spark-application-controller Executor [spark-pi-68732195ab217303-exec-1] is running
Normal SparkExecutorCompleted 7m5s spark-application-controller Executor [spark-pi-68732195ab217303-exec-1] completed
Normal SparkDriverCompleted 7m4s spark-application-controller Driver spark-pi-driver completed
Get the driver logs:
kubectl logs spark-pi-driver
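To confirm that Comet was actually loaded, the driver log can be filtered for Comet-related entries (the exact messages depend on the Spark and Comet versions):
kubectl logs spark-pi-driver | grep -i comet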
For more information, see the Kubernetes Spark operator documentation.