Apache DataFusion Comet 0.10.0 Release
Posted on: Tue 16 September 2025 by pmc
The Apache DataFusion PMC is pleased to announce version 0.10.0 of the Comet subproject.
Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.
This release covers approximately ten weeks of development work and is the result of merging 183 PRs from 26 contributors. See the change log for more information.
Release Highlights
Improved Support for Apache Iceberg
It is now possible to use Comet with Apache Iceberg 1.8.1 to accelerate reads of Iceberg Parquet tables. Please refer to Comet's Iceberg Guide for information on building Iceberg with Comet.
Improved Spark 4.0.0 Support
Comet no longer falls back to Spark for all queries when ANSI mode is enabled (which is the default in Spark 4.0.0). Instead, Comet now falls back to Spark only for arithmetic and aggregate expressions that support ANSI mode.
Setting spark.comet.ansi.ignore=true will override this behavior and force these expressions to continue to be accelerated by Comet. Full support for ANSI mode will be available in a future release.
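As a rough illustration of how the override might be passed to a job, the sketch below builds a spark-submit command line with the setting as a --conf argument. Only the spark.comet.ansi.ignore key comes from this announcement; the build_submit_args helper and the my_job.py application name are hypothetical, and a real Comet deployment needs further settings that are not shown here.

```python
# Sketch only: build spark-submit arguments that opt in to the ANSI override.
# The helper and the application file name are hypothetical; only the
# spark.comet.ansi.ignore key is taken from the release notes.

def build_submit_args(app, conf):
    """Return a spark-submit argument list with one --conf per setting."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args

ansi_conf = {"spark.comet.ansi.ignore": "true"}
submit_args = build_submit_args("my_job.py", ansi_conf)
print(" ".join(submit_args))
# prints: spark-submit --conf spark.comet.ansi.ignore=true my_job.py
```

Because the override makes Comet accelerate expressions whose ANSI behavior is not yet fully implemented, it is probably best treated as an explicit opt-in per job rather than a cluster-wide default.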
Comet will now use the native_iceberg_compat scan for Spark 4.0.0 in most cases, which supports reading complex types.
New Functionality
The following SQL functions are now supported:
- array_min
- map_entries
- map_from_arrays
- randn
- from_unixtime
- monotonically_increasing_id
- spark_partition_id
- try_add
- try_divide
- try_mod
- try_multiply
- try_subtract
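For readers unfamiliar with the try_* family, the following Python sketch emulates the behavior Spark documents for these functions: they return NULL (None here) instead of raising an error on overflow or division by zero. It assumes non-null 32-bit integer inputs and is purely illustrative; it is not how Comet or Spark implements them.

```python
import math

# Illustrative emulation of Spark's documented try_* semantics for 32-bit
# integers. None stands in for SQL NULL. Not Comet's implementation.
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def _checked(value):
    """Return value if it fits in a signed 32-bit integer, else None."""
    return value if INT_MIN <= value <= INT_MAX else None

def try_add(a, b):
    return _checked(a + b)

def try_subtract(a, b):
    return _checked(a - b)

def try_multiply(a, b):
    return _checked(a * b)

def try_divide(a, b):
    # Fractional division; a zero divisor yields NULL instead of an error.
    return None if b == 0 else a / b

def try_mod(a, b):
    # A zero divisor yields NULL; the remainder takes the dividend's sign.
    return None if b == 0 else int(math.fmod(a, b))

print(try_add(2147483647, 1))  # None (overflow)
print(try_divide(3, 2))        # 1.5
print(try_divide(1, 0))        # None
print(try_mod(-7, 3))          # -1
```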
Other new features include:
- Support for array literals
- Support for limit with offset
UX Improvements
- Improved reporting of reasons why Comet cannot accelerate some operators and expressions
- New spark.comet.logFallbackReasons.enabled configuration setting for logging all fallback reasons
- CometScan nodes in the physical plan now show which scan implementation is being used (native_comet, native_datafusion, or native_iceberg_compat)
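To keep fallback logging switched on across jobs, the setting could be placed in Spark's spark-defaults.conf, which takes one whitespace-separated key and value per line. The sketch below renders such a line; the value true is an assumption based on the .enabled suffix of the key, and the render_spark_defaults helper is hypothetical.

```python
# Sketch only: render settings in spark-defaults.conf style.
# The value "true" is assumed from the ".enabled" suffix of the key;
# the helper name is hypothetical.

def render_spark_defaults(conf):
    """Return one 'key value' line per setting, in insertion order."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())

logging_conf = {"spark.comet.logFallbackReasons.enabled": "true"}
rendered = render_spark_defaults(logging_conf)
print(rendered)
# prints: spark.comet.logFallbackReasons.enabled true
```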
Bug Fixes
- Improved memory safety for FFI transfers
- Fixed a double-free issue in the shuffle unified memory pool
- Fixed an FFI issue with non-zero offsets
- Fixed an issue with buffered reads from HDFS
Benchmarking
Scripts for running benchmarks based on TPC-H and TPC-DS are now available in the repository under dev/benchmarks.
Documentation Updates
- The documentation for supported operators and expressions is now more complete, and Spark-compatibility status per operator/expression is now documented.
- The documentation now contains a roadmap section.
- New guide comparing Comet with Apache Gluten (incubating) + Velox
- User guides are now available for multiple Comet versions
Spark Compatibility
- Spark 3.4.3 with JDK 11 & 17, Scala 2.12 & 2.13
- Spark 3.5.4 through 3.5.6 with JDK 11 & 17, Scala 2.12 & 2.13
- Experimental support for Spark 4.0.0 with JDK 17, Scala 2.13
We are looking for help from the community to fully support Spark 4.0.0. See EPIC: Support 4.0.0 for more information.
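The compatibility list above can also be written as a small lookup table, which may be handy for pre-flight checks in deployment scripts. The sketch below transcribes the three entries from this section; the function names are hypothetical, and "unsupported" only means that a combination is not listed for this release.

```python
# Support matrix transcribed from the compatibility list in this section.
# Each row: (lowest Spark, highest Spark, JDKs, Scala versions, experimental).
SUPPORTED = [
    ((3, 4, 3), (3, 4, 3), {11, 17}, {"2.12", "2.13"}, False),
    ((3, 5, 4), (3, 5, 6), {11, 17}, {"2.12", "2.13"}, False),
    ((4, 0, 0), (4, 0, 0), {17}, {"2.13"}, True),
]

def parse_version(text):
    """Turn '3.5.6' into the comparable tuple (3, 5, 6)."""
    return tuple(int(part) for part in text.split("."))

def support_status(spark, jdk, scala):
    """Classify a Spark/JDK/Scala combination against the matrix."""
    version = parse_version(spark)
    for low, high, jdks, scalas, experimental in SUPPORTED:
        if low <= version <= high and jdk in jdks and scala in scalas:
            return "experimental" if experimental else "supported"
    return "unsupported"

print(support_status("3.5.5", 17, "2.12"))  # supported
print(support_status("4.0.0", 17, "2.13"))  # experimental
print(support_status("4.0.0", 11, "2.13"))  # unsupported
```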
Getting Involved
The Comet project welcomes new contributors. We use the same Slack and Discord channels as the main DataFusion project and have a weekly DataFusion video call.
The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or performance regressions that you find. See the Getting Started guide for instructions on downloading and installing Comet.
There are also many good first issues waiting for contributions.