Apache DataFusion Comet 0.10.0 Release

Posted on: Tue 16 September 2025 by pmc

The Apache DataFusion PMC is pleased to announce version 0.10.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

This release covers approximately ten weeks of development work and is the result of merging 183 PRs from 26 contributors. See the change log for more information.

Release Highlights

Improved Support for Apache Iceberg

It is now possible to use Comet with Apache Iceberg 1.8.1 to accelerate reads of Iceberg Parquet tables. Please refer to Comet's Iceberg Guide for information on building Iceberg with Comet.

Improved Spark 4.0.0 Support

Comet no longer falls back to Spark for all queries when ANSI mode is enabled (which is the default in Spark 4.0.0). Instead, Comet will now only fall back to Spark for arithmetic and aggregate expressions that support ANSI mode.

Setting spark.comet.ansi.ignore=true will override this behavior and force these expressions to continue to be accelerated by Comet. Full support for ANSI mode will be available in a future release.
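As a rough illustration of the setting above, the following Python sketch builds the relevant configuration entries and turns them into spark-submit arguments. The helper names (comet_ansi_conf, to_spark_submit_args) are hypothetical and exist only for this example. spark.sql.ansi.enabled is Spark's own ANSI switch, and the Comet plugin setup is deliberately omitted (see the Getting Started guide for that).

```python
def comet_ansi_conf(ignore_ansi=False):
    """Return Spark settings for an ANSI-mode job (hypothetical helper).

    With ignore_ansi=True, spark.comet.ansi.ignore is set so that Comet keeps
    accelerating arithmetic and aggregate expressions instead of falling back.
    """
    conf = {"spark.sql.ansi.enabled": "true"}  # Spark's ANSI switch
    if ignore_ansi:
        conf["spark.comet.ansi.ignore"] = "true"  # setting named in this post
    return conf


def to_spark_submit_args(conf):
    """Render a settings dict as a list of `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_spark_submit_args(comet_ansi_conf(ignore_ansi=True))))
```

Because this override bypasses the fallback, results for these expressions may not follow ANSI semantics, so it is probably best reserved for workloads where that difference is acceptable.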

In most cases, Comet will now use the native_iceberg_compat scan for Spark 4.0.0. This scan supports reading complex types.

New Functionality

The following SQL functions are now supported:

  • array_min
  • map_entries
  • map_from_arrays
  • randn
  • from_unixtime
  • monotonically_increasing_id
  • spark_partition_id
  • try_add
  • try_divide
  • try_mod
  • try_multiply
  • try_subtract

Other new features include:

  • Support for array literals
  • Support for limit with offset
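To try some of these features, the sketch below collects a few illustrative Spark SQL queries and a small helper that runs them on an existing SparkSession. The names EXAMPLE_QUERIES and run_examples are hypothetical, and the queries use Spark's built-in range table function so that no tables are needed. Whether a given query is actually accelerated depends on your Comet configuration, which this sketch does not check.

```python
# Illustrative queries only (not taken from the Comet test suite).
EXAMPLE_QUERIES = {
    # try_* functions return NULL instead of raising an error
    "try_divide": "SELECT id, try_divide(id, id - 5) AS q FROM range(10)",
    "try_add": "SELECT id, try_add(id, 9223372036854775807) AS s FROM range(3)",
    "array_min": "SELECT array_min(array(id, id + 1, id - 1)) AS m FROM range(5)",
    "map_entries": "SELECT map_entries(map('k', id)) AS e FROM range(3)",
    "from_unixtime": "SELECT from_unixtime(id) AS ts FROM range(3)",
    "spark_partition_id": "SELECT id, spark_partition_id() AS p FROM range(8)",
    "limit with offset": "SELECT id FROM range(100) ORDER BY id LIMIT 10 OFFSET 20",
}


def run_examples(spark, queries=EXAMPLE_QUERIES):
    """Run each query on the given SparkSession and print its result."""
    for name, sql in queries.items():
        print(f"-- {name}")
        spark.sql(sql).show(truncate=False)
```

In a PySpark shell where a session named spark already exists, calling run_examples(spark) runs all of the queries. Appending .explain() to an individual spark.sql(...) call shows whether Comet operators appear in the physical plan.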

UX Improvements

  • Improved reporting of reasons why Comet cannot accelerate some operators and expressions
  • New spark.comet.logFallbackReasons.enabled configuration setting for logging all fallback reasons
  • CometScan nodes in the physical plan now show which scan implementation is being used (native_comet, native_datafusion, or native_iceberg_compat)
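Since CometScan nodes now name their scan implementation, a plan can be checked programmatically. The Python sketch below searches plan text for CometScan lines and reports which of the three implementation names appear on them. The function name and the sample plan are hypothetical: the sample is invented for illustration and is not real Comet output, and the only assumption about the plan format is that the implementation name appears on the same line as CometScan.

```python
import re

# The three scan implementations named in this post.
SCAN_IMPLS = ("native_comet", "native_datafusion", "native_iceberg_compat")

# Made-up plan text for demonstration; real plans will look different.
SAMPLE_PLAN = """
CometProject [id#0]
+- CometFilter [id#0]
   +- CometScan [native_datafusion] parquet [id#0]
"""


def comet_scan_impls(plan_text):
    """Return scan implementation names found on CometScan lines, in order."""
    found = []
    for line in plan_text.splitlines():
        if "CometScan" not in line:
            continue
        for impl in SCAN_IMPLS:
            # Whole-word match so that a longer identifier is not miscounted.
            if re.search(rf"\b{impl}\b", line) and impl not in found:
                found.append(impl)
    return found


if __name__ == "__main__":
    print(comet_scan_impls(SAMPLE_PLAN))
```

One way to obtain plan text is spark.sql("EXPLAIN " + query).collect()[0][0], which returns the plan as a single string. When a query is not accelerated at all, enabling spark.comet.logFallbackReasons.enabled (presumably by setting it to true) should log the reasons.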

Bug Fixes

  • Improved memory safety for FFI transfers
  • Fixed a double-free issue in the shuffle unified memory pool
  • Fixed an FFI issue with non-zero offsets
  • Fixed an issue with buffered reads from HDFS

Benchmarking

Scripts for running benchmarks based on TPC-H and TPC-DS are now available in the repository under dev/benchmarks.

Documentation Updates

  • The documentation for supported operators and expressions is now more complete, and Spark-compatibility status per operator/expression is now documented.
  • The documentation now contains a roadmap section.
  • New guide comparing Comet with Apache Gluten (incubating) + Velox
  • User guides are now available for multiple Comet versions

Spark Compatibility

  • Spark 3.4.3 with JDK 11 & 17, Scala 2.12 & 2.13
  • Spark 3.5.4 through 3.5.6 with JDK 11 & 17, Scala 2.12 & 2.13
  • Experimental support for Spark 4.0.0 with JDK 17, Scala 2.13

We are looking for help from the community to fully support Spark 4.0.0. See EPIC: Support 4.0.0 for more information.

Getting Involved

The Comet project welcomes new contributors. We use the same Slack and Discord channels as the main DataFusion project and have a weekly DataFusion video call.

The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or performance regressions that you find. See the Getting Started guide for instructions on downloading and installing Comet.

There are also many good first issues waiting for contributions.

