Comet Overview
Apache DataFusion Comet is a high-performance accelerator for Apache Spark, built on top of the Apache DataFusion query engine. Comet is designed to speed up Apache Spark workloads on commodity hardware and to integrate with the Spark ecosystem without requiring any code changes.
The following diagram provides an overview of Comet’s architecture.
Architecture
The following diagram shows how Comet integrates with Apache Spark.
Feature Parity with Apache Spark
The project strives to maintain feature parity with Apache Spark: users should expect the same behavior (features, configurations, query results, etc.) in their Spark jobs whether Comet is turned on or off. In addition, the Comet extension should automatically detect unsupported features and fall back to the Spark engine.
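One way to check this parity in practice is to run the same job twice, once with Comet enabled and once with it disabled, and compare the results. The sketch below only builds the `--conf` arguments for the two runs. It assumes the plugin class `org.apache.spark.CometPlugin` and the `spark.comet.enabled` setting described in the Comet Installation Guide; the helper names `comet_conf` and `to_submit_args` are hypothetical, and the setting names should be verified against the Comet version in use.

```python
from typing import Dict, List


def comet_conf(enabled: bool) -> Dict[str, str]:
    """Spark settings that load the Comet plugin and toggle it on or off.

    Assumption: key names follow the Comet Installation Guide.
    """
    return {
        "spark.plugins": "org.apache.spark.CometPlugin",
        "spark.comet.enabled": str(enabled).lower(),
    }


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render settings as spark-submit style `--conf key=value` arguments."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Two argument lists for the same job: one accelerated, one plain Spark.
with_comet = to_submit_args(comet_conf(True))
without_comet = to_submit_args(comet_conf(False))
```

Because the plugin stays loaded in both runs and only the enable flag changes, any difference in query results between the two runs points at Comet rather than at the rest of the job setup.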
Comparison with other open-source Spark accelerators
There are two other major open-source Spark accelerators: Apache Gluten and Spark RAPIDS.
We have a detailed guide comparing Apache DataFusion Comet with Apache Gluten.
Spark RAPIDS is a solution that provides hardware acceleration on NVIDIA GPUs. Comet does not require specialized hardware.
Getting Started
Refer to the Comet Installation Guide to get started.