Apache DataFusion Comet 0.10.0 Release

Posted on: Tue 16 September 2025 by pmc

The Apache DataFusion PMC is pleased to announce version 0.10.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

This release covers approximately ten weeks of development work and is the result of merging 183 PRs from 26 contributors. See the change log for more information.

Release Highlights

Improved Support for Apache Iceberg

It is now possible to use Comet with Apache Iceberg 1.8.1 to accelerate reads of Iceberg Parquet tables. Please refer to Comet's Iceberg Guide for information on building Iceberg with Comet.

Improved Spark 4.0.0 Support

Comet no longer falls back to Spark for all queries when ANSI mode is enabled (which is the default in Spark 4.0.0). Instead, Comet now falls back to Spark only for arithmetic and aggregate expressions that support ANSI mode.

Setting spark.comet.ansi.ignore=true will override this behavior and force these expressions to continue to be accelerated by Comet. Full support for ANSI mode will be available in a future release.
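The fallback rule described above can be modelled in a few lines. The following Python sketch only illustrates the behavior as stated in this post and is not Comet's actual implementation: the function name `runs_in_comet`, the category labels, and the `ANSI_SENSITIVE` set are hypothetical, and the model ignores every other reason Comet might fall back to Spark.

```python
# Hypothetical model of the ANSI fallback rule described in this post.
# It is not Comet's code; names and categories are illustrative only.

# Expression categories that this release still hands back to Spark
# when ANSI mode is enabled.
ANSI_SENSITIVE = {"arithmetic", "aggregate"}


def runs_in_comet(expr_kind, ansi_enabled, ansi_ignore=False):
    """Return True if the expression would stay in Comet under this model.

    ansi_enabled mirrors spark.sql.ansi.enabled and ansi_ignore mirrors
    spark.comet.ansi.ignore.
    """
    if not ansi_enabled:
        return True  # ANSI mode is off, so the rule does not apply
    if ansi_ignore:
        return True  # the override keeps these expressions in Comet
    return expr_kind not in ANSI_SENSITIVE


# Spark 4.0.0 defaults: ANSI mode on, override off.
for kind in ("arithmetic", "aggregate", "string"):
    print(kind, runs_in_comet(kind, ansi_enabled=True))
```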

Comet will now use the native_iceberg_compat scan for Spark 4.0.0 in most cases, which supports reading complex types.

New Functionality

The following SQL functions are now supported:

array_min
map_entries
map_from_arrays
randn
from_unixtime
monotonically_increasing_id
spark_partition_id
try_add
try_divide
try_mod
try_multiply
try_subtract
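
The try_* functions are the error-tolerant counterparts of Spark's ANSI arithmetic: where ANSI mode raises an error, they return NULL. The Python sketch below models that difference for 32-bit integers. It illustrates Spark's documented semantics rather than Comet's native implementation, and the helper names `add_legacy` and `add_ansi` are made up for this example.

```python
# Illustrative model only; None stands in for SQL NULL.
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of Spark's 32-bit INT type


def add_legacy(a, b):
    """Non-ANSI behavior: the result silently wraps around on overflow."""
    return (a + b - INT_MIN) % 2**32 + INT_MIN


def add_ansi(a, b):
    """ANSI behavior: overflow is an error rather than a wrong answer."""
    result = a + b
    if not INT_MIN <= result <= INT_MAX:
        raise ArithmeticError("integer overflow")
    return result


def try_add(a, b):
    """try_add behavior: overflow yields NULL instead of an error."""
    try:
        return add_ansi(a, b)
    except ArithmeticError:
        return None


def try_divide(a, b):
    """try_divide behavior: floating-point division, NULL if divisor is 0."""
    return None if b == 0 else a / b


print(add_legacy(INT_MAX, 1))  # wraps around to INT_MIN
print(try_add(INT_MAX, 1))     # None
print(try_divide(1, 0))        # None
```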

Other new features include:

Support for array literals
Support for limit with offset
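
As a reminder of what limit with offset means, a query ending in `LIMIT 3 OFFSET 2` skips the first two rows of the result and then returns at most three. The sketch below models this over a Python iterable. The function name `limit_with_offset` is hypothetical, and the code says nothing about how Comet executes the operator.

```python
from itertools import islice


def limit_with_offset(rows, limit, offset=0):
    """Skip `offset` rows, then return at most `limit` rows."""
    return list(islice(rows, offset, offset + limit))


rows = range(10)  # stand-in for an ordered query result
print(limit_with_offset(rows, limit=3, offset=2))  # [2, 3, 4]
```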

UX Improvements

Improved reporting of reasons why Comet cannot accelerate some operators and expressions
New spark.comet.logFallbackReasons.enabled configuration setting for logging all fallback reasons
CometScan nodes in the physical plan now show which scan implementation is being used (native_comet, native_datafusion, or native_iceberg_compat)
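
One way to switch on the new logging is to pass the setting on the spark-submit command line. The helper below is a small, hypothetical convenience (`to_conf_args` is not part of Spark or Comet) that renders a dictionary of settings as `--conf` arguments. Only the configuration key itself comes from this release.

```python
def to_conf_args(settings):
    """Render a dict of Spark settings as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(settings.items()):
        # Spark configuration files and flags use lowercase true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        args += ["--conf", f"{key}={value}"]
    return args


# Setting named in this release; add it to your existing Comet settings.
logging_settings = {"spark.comet.logFallbackReasons.enabled": True}

print(" ".join(to_conf_args(logging_settings)))
```

The printed arguments can be appended to an existing spark-submit or spark-shell invocation that already loads Comet.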

Bug Fixes

Improved memory safety for FFI transfers
Fixed a double-free issue in the shuffle unified memory pool
Fixed an FFI issue with non-zero offsets
Fixed an issue with buffered reads from HDFS

Benchmarking

Scripts for running benchmarks based on TPC-H and TPC-DS are now available in the repository under dev/benchmarks.

Documentation Updates

The documentation for supported operators and expressions is now more complete, and Spark-compatibility status per operator/expression is now documented.
The documentation now contains a roadmap section.
New guide comparing Comet with Apache Gluten (incubating) + Velox
User guides are now available for multiple Comet versions

Spark Compatibility

Spark 3.4.3 with JDK 11 & 17, Scala 2.12 & 2.13
Spark 3.5.4 through 3.5.6 with JDK 11 & 17, Scala 2.12 & 2.13
Experimental support for Spark 4.0.0 with JDK 17, Scala 2.13
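
The combinations above can also be written as a small lookup table, which may be handy when checking a build matrix. This is only a sketch: the `SUPPORTED` structure and the `support_level` function are hypothetical, and the table simply transcribes the list in this post.

```python
# Supported combinations as listed in this post.
SUPPORTED = [
    # (Spark versions, JDK versions, Scala versions, experimental?)
    ({"3.4.3"}, {11, 17}, {"2.12", "2.13"}, False),
    ({"3.5.4", "3.5.5", "3.5.6"}, {11, 17}, {"2.12", "2.13"}, False),
    ({"4.0.0"}, {17}, {"2.13"}, True),
]


def support_level(spark, jdk, scala):
    """Return 'supported', 'experimental', or 'unsupported'."""
    for sparks, jdks, scalas, experimental in SUPPORTED:
        if spark in sparks and jdk in jdks and scala in scalas:
            return "experimental" if experimental else "supported"
    return "unsupported"


print(support_level("3.5.6", 17, "2.13"))
print(support_level("4.0.0", 17, "2.13"))
print(support_level("4.0.0", 11, "2.13"))
```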

We are looking for help from the community to fully support Spark 4.0.0. See EPIC: Support 4.0.0 for more information.

Getting Involved

The Comet project welcomes new contributors. We use the same Slack and Discord channels as the main DataFusion project and have a weekly DataFusion video call.

The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or performance regressions that you find. See the Getting Started guide for instructions on downloading and installing Comet.

There are also many good first issues waiting for contributions.

Copyright 2025, The Apache Software Foundation, Licensed under the Apache License, Version 2.0.
Apache® and the Apache feather logo are trademarks of The Apache Software Foundation.