# Comet Kubernetes Support

## Comet Docker Images
Run the following command from the root of this repository to build the Comet Docker image, or use a published Docker image instead:

```shell
docker build -t apache/datafusion-comet -f kube/Dockerfile .
```
## Example Spark Submit

The exact syntax will vary depending on the Kubernetes distribution, but an example `spark-submit` command can be found here.
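As a rough sketch of such a submission, the command below assumes a reachable Kubernetes API server (the `<k8s-apiserver-host>:<port>` placeholder), the image built above, and the same Comet jar and example jar paths used in the Helm example later in this page; adapt all of these to your cluster.

```shell
# Hypothetical example: replace the API server address, namespace,
# image tag, and jar paths with values appropriate for your cluster.
$SPARK_HOME/bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.namespace=default \
    --conf spark.kubernetes.container.image=apache/datafusion-comet \
    --conf spark.driver.extraClassPath=/opt/spark/jars/comet-spark-spark3.4_2.12-0.2.0.jar \
    --conf spark.executor.extraClassPath=/opt/spark/jars/comet-spark-spark3.4_2.12-0.2.0.jar \
    --conf spark.plugins=org.apache.spark.CometPlugin \
    --conf spark.comet.enabled=true \
    --conf spark.comet.exec.enabled=true \
    --conf spark.comet.exec.shuffle.enabled=true \
    --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.4.2.jar
```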
## Helm chart

Install the Helm-based Spark operator for Kubernetes:

```shell
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update
helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace --set webhook.enable=true
```
Check that the operator is deployed:

```shell
helm status --namespace spark-operator my-release
```

```
NAME: my-release
NAMESPACE: spark-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
```
Create an example Spark application file named `spark-pi.yaml`:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: ghcr.io/apache/datafusion-comet:spark-3.4-scala-2.12-0.2.0
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.4.2.jar
  sparkConf:
    "spark.executor.extraClassPath": "/opt/spark/jars/comet-spark-spark3.4_2.12-0.2.0.jar"
    "spark.driver.extraClassPath": "/opt/spark/jars/comet-spark-spark3.4_2.12-0.2.0.jar"
    "spark.plugins": "org.apache.spark.CometPlugin"
    "spark.comet.enabled": "true"
    "spark.comet.exec.enabled": "true"
    "spark.comet.cast.allowIncompatible": "true"
    "spark.comet.exec.shuffle.enabled": "true"
    "spark.comet.exec.shuffle.mode": "auto"
    "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
  sparkVersion: 3.4.3
  driver:
    labels:
      version: 3.4.3
    cores: 1
    coreLimit: 1200m
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    labels:
      version: 3.4.3
    instances: 1
    cores: 2
    coreLimit: 1200m
    memory: 512m
```
Refer to Comet builds for the available image and Spark version combinations.
Run the Apache Spark application with Comet enabled:

```shell
kubectl apply -f spark-pi.yaml
```
Check the application status (the `SparkApplication` above is created in the `default` namespace):

```shell
kubectl describe sparkapplication spark-pi --namespace=default
```
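To confirm that Comet was actually activated, you can also search the driver pod logs for Comet-related messages. The pod name below is an assumption based on the operator's usual `<app-name>-driver` naming convention and may differ in your cluster.

```shell
# Pod name assumes the operator's default <app-name>-driver convention;
# check `kubectl get pods --namespace=default` for the actual name.
kubectl logs spark-pi-driver --namespace=default | grep -i comet
```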
More information on the Spark operator for Kubernetes is available in its documentation.