Apache DataFusion Comet: Benchmarks Derived From TPC-DS#

The following benchmarks were performed on an EKS cluster (r6i.24xlarge instances with EBS storage) with data stored in S3.

The tracking issue for improving TPC-DS performance is #858.

Configuration#

Common:

spark.executor.instances=32
spark.executor.cores=16
spark.memory.fraction=0.6
spark.memory.storageFraction=0.2
# Kubernetes CPU constraints
spark.kubernetes.executor.request.cores=8
spark.kubernetes.executor.limit.cores=8

Spark:

spark.executor.memory=64G
spark.executor.memoryOverhead=10G

Comet:

spark.executor.memory=64G
spark.executor.memoryOverhead=10G
spark.memory.offHeap.enabled=true
spark.memory.offHeap.size=32G
spark.comet.memoryPool.fraction=0.8

Benchmark Results#

Here is a breakdown showing relative performance of Spark and Comet for each query.

The following chart shows how much Comet currently accelerates each query from the benchmark in relative terms.

The following chart shows how much Comet currently accelerates each query from the benchmark in absolute terms.