Apache DataFusion Comet 0.12.0 Release
Posted on: Thu 04 December 2025 by pmc
The Apache DataFusion PMC is pleased to announce version 0.12.0 of the Comet subproject.
Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.
This release covers approximately four weeks of development work and is the result of merging 105 PRs from 13 contributors. See the change log for more information.
Release Highlights
Experimental Native Apache Iceberg Scan Support
Comet has a new, experimental native Iceberg scan. This work relies on iceberg-rust and on the arrow-rs Parquet reader that Comet already uses to great effect. Comet’s existing Iceberg integration relies on a modified Iceberg Java build to accelerate Parquet decoding. The new approach lets unmodified Iceberg Java handle query planning (e.g., catalog access and partition pruning). Comet then serializes the resulting Iceberg FileScanTask objects and passes them directly to iceberg-rust, enabling native execution of Iceberg table scans through DataFusion.
This represents a significant step forward in Comet's support for data lakehouse architectures and expands the range of workloads that can benefit from native acceleration. Please take a look at the PR and Comet’s documentation to understand the current limitations and try it on your workloads! We are eager for feedback on this approach.
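The division of labor in the new Iceberg scan can be illustrated with a small Python sketch: Iceberg Java plans, Comet hands the planned tasks across, and the native side executes them. Everything here is hypothetical. ScanTask is a simplified stand-in for Iceberg's FileScanTask, the file paths are made up, partition pruning is reduced to an equality check, and JSON is used only for readability. It is not Comet's actual JVM-to-native wire format.

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ScanTask:
    # Hypothetical stand-in for an Iceberg FileScanTask: one byte range of one data file.
    file_path: str
    partition: str
    start: int
    length: int

def plan_scan(files, wanted_partition):
    """'JVM side': Iceberg Java does planning, reduced here to partition pruning."""
    return [
        ScanTask(path, part, 0, size)
        for path, part, size in files
        if part == wanted_partition
    ]

def serialize(tasks):
    """Hand-off: encode the planned tasks (JSON purely for readability)."""
    return json.dumps([asdict(t) for t in tasks]).encode("utf-8")

def execute_native(payload):
    """'Native side': decode the tasks and return the byte ranges a reader would fetch."""
    tasks = [ScanTask(**d) for d in json.loads(payload.decode("utf-8"))]
    return [(t.file_path, t.start, t.start + t.length) for t in tasks]

# Made-up example table with two partitions.
files = [
    ("s3://bucket/t/date=2025-12-01/a.parquet", "date=2025-12-01", 1024),
    ("s3://bucket/t/date=2025-12-02/b.parquet", "date=2025-12-02", 2048),
]
tasks = plan_scan(files, "date=2025-12-02")
ranges = execute_native(serialize(tasks))
print(ranges)  # [('s3://bucket/t/date=2025-12-02/b.parquet', 0, 2048)]
```

Keeping planning on the JVM means the native side never needs catalog access. It only has to honor the tasks it is given.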
Code Architecture Improvements
This release includes significant refactoring to improve code maintainability and extensibility, and we will continue those efforts into 0.13.0 development:
- Unified operator serialization: The CometExecRule refactor unifies CometNativeExec creation with serialization through the new CometOperatorSerde trait
- Expression serde refactoring: Multiple PRs (#2738, #2741, #2791) moved expression serialization logic out of QueryPlanSerde into specialized traits
- Aggregate expression improvements: Added getSupportLevel to the CometAggregateExpressionSerde trait for better aggregate function handling
These architectural improvements make it easier for contributors to add new operators and expressions while reducing code complexity.
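The following Python analog gives a feel for the serde pattern behind these refactors: a per-expression serde paired with a support-level check. The real traits (CometOperatorSerde, CometAggregateExpressionSerde) are Scala. The class names, the three support levels, the dict-based expression format, and the rule rejecting DISTINCT are all assumptions made for this sketch.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class SupportLevel:
    kind: str                    # "compatible", "incompatible", or "unsupported" (assumed levels)
    notes: Optional[str] = None  # human-readable fallback reason

class ExpressionSerde(ABC):
    """Hypothetical analog of a per-expression serde trait."""

    def get_support_level(self, expr: dict) -> SupportLevel:
        # Default: treat the expression as fully supported natively.
        return SupportLevel("compatible")

    @abstractmethod
    def convert(self, expr: dict) -> dict:
        """Translate the expression into a (toy) native representation."""

class AbsSerde(ExpressionSerde):
    def convert(self, expr: dict) -> dict:
        return {"op": "abs", "child": expr["child"]}

class SumSerde(ExpressionSerde):
    def get_support_level(self, expr: dict) -> SupportLevel:
        # Invented rule for illustration: DISTINCT sums fall back to Spark.
        if expr.get("distinct"):
            return SupportLevel("unsupported", "DISTINCT not handled in this sketch")
        return SupportLevel("compatible")

    def convert(self, expr: dict) -> dict:
        return {"op": "sum", "child": expr["child"]}

REGISTRY = {"abs": AbsSerde(), "sum": SumSerde()}

def to_native(expr: dict) -> Tuple[Optional[dict], Optional[str]]:
    """Return (native_expr, None) on success, or (None, fallback_reason)."""
    serde = REGISTRY.get(expr["name"])
    if serde is None:
        return None, f"{expr['name']} has no serde"
    level = serde.get_support_level(expr)
    if level.kind != "compatible":
        return None, level.notes
    return serde.convert(expr), None

print(to_native({"name": "abs", "child": "x"}))
print(to_native({"name": "sum", "child": "x", "distinct": True}))
```

Each expression owns both its support check and its conversion. Adding a new expression therefore means registering one new class rather than editing one large shared serializer.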
New SQL Functions
The following SQL functions are now supported:
- concat: String concatenation
- abs: Absolute value
- sha1: SHA-1 hash function
- cot: Cotangent function
- Hyperbolic trigonometric functions: sinh, cosh, tanh, and their inverse functions
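A tiny pure-Python reference can help spot discrepancies when comparing Comet output against vanilla Spark for these functions. This is a hedged sketch, not Comet or Spark code. It assumes Spark's behavior that concat returns NULL if any input is NULL and that sha1 returns a lowercase hex string. It compares floating-point results with a tolerance because native and JVM math libraries may differ in the last few bits.

```python
import hashlib
import math
from typing import Optional

def ref_concat(*parts: Optional[str]) -> Optional[str]:
    # Assumed Spark semantics: any NULL (None) input makes the result NULL.
    if any(p is None for p in parts):
        return None
    return "".join(parts)

def ref_sha1(s: str) -> str:
    # SHA-1 of the UTF-8 bytes, as a 40-character lowercase hex string.
    return hashlib.sha1(s.encode("utf-8")).hexdigest()

def ref_cot(x: float) -> float:
    # Cotangent is the reciprocal of the tangent.
    return 1.0 / math.tan(x)

def results_match(a: float, b: float, rel_tol: float = 1e-12) -> bool:
    # Compare with a relative tolerance instead of exact equality.
    return math.isclose(a, b, rel_tol=rel_tol)

print(ref_concat("Com", "et"))                          # Comet
print(ref_sha1("abc"))                                  # a9993e364706816aba3e25717850c26c9cd0d89d
print(results_match(ref_cot(math.pi / 4), 1.0))         # True
print(results_match(math.asinh(math.sinh(0.5)), 0.5))  # True
```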
New Operators
- CometLocalTableScanExec: Native support for local table scans, eliminating fallback to Spark for small, in-memory datasets
Configuration and Usability Improvements
- Simplified on-heap configuration: On-heap memory configuration is now simpler to set up
- Extended explain format: Renamed and improved COMET_EXTENDED_EXPLAIN_FORMAT with better defaults
- Environment variable support: Improved the framework for setting configs with environment variables
- Native config passing: All Comet configs are now passed to the native plan
- Config categorization: Categorized testing configs and added notes about known timezone issues
- Removed legacy configs: Removed the COMET_EXPR_ALLOW_INCOMPATIBLE config to simplify configuration
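The release notes do not spell out how environment variables map to Comet configs. The following Python sketch only illustrates one common precedence scheme: an explicit Spark setting wins, then an environment variable, then the built-in default. The env_var_name rule and the precedence order are assumptions, not Comet's documented behavior. Consult Comet's configuration documentation for the real mapping.

```python
import os
from typing import Mapping, Optional

def env_var_name(key: str) -> str:
    # Hypothetical rule: "spark.comet.exec.enabled" -> "COMET_EXEC_ENABLED".
    if key.startswith("spark."):
        key = key[len("spark."):]
    return key.replace(".", "_").upper()

def resolve_config(
    key: str,
    spark_conf: Mapping[str, str],
    default: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Assumed precedence: explicit Spark conf, then environment variable, then default."""
    env = os.environ if env is None else env
    if key in spark_conf:
        return spark_conf[key]
    from_env = env.get(env_var_name(key))
    if from_env is not None:
        return from_env
    return default

# Usage with an explicit environment mapping (no real env vars are read here).
print(resolve_config("spark.comet.exec.enabled", {}, "false",
                     env={"COMET_EXEC_ENABLED": "true"}))  # true
```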
Bug Fixes
This release includes numerous bug fixes:
- Fixed a None.get error in stringDecode when the binary child cannot be converted
- Added proper fallback for lpad/rpad with unsupported arguments
- Fixed trunc/date_trunc with unsupported format strings
- Corrected single partition handling in native_datafusion
- Fixed LeftSemi join handling so that sort-merge joins (SMJ) are not replaced with hash joins (HJ)
- Fixed a CometLiteral class cast exception with arrays
- Fixed a missing SortOrder fallback reason in range partitioning
- Improved checkSparkMaybeThrows to compare results in the success case
- Fixed null handling in CometVector implementations
Documentation Improvements
- Added FFI documentation to the contributor guide
- Updated the contributor guide for adding new expressions and operators
- Improved documentation layout and navigation
- Added prettier enforcement for consistent markdown formatting
- Added a CI check to ensure generated docs are in sync
- Various documentation updates for SortOrder expressions, LocalTableScan and WindowExec, and Spark SQL tests
Dependency Updates
- Upgraded to Spark 3.5.7
- Upgraded to DataFusion 50.3.0
- Upgraded Parquet from 56.0.0 to 56.2.0
- Various other dependency updates via Dependabot
Spark Compatibility
- Spark 3.4.3 with JDK 11 & 17, Scala 2.12 & 2.13
- Spark 3.5.4 through 3.5.7 with JDK 11 & 17, Scala 2.12 & 2.13
- Spark 4.0.1 with JDK 17, Scala 2.13
We are looking for help from the community to fully support Spark 4.0.1. See EPIC: Support 4.0.0 for more information.
Getting Involved
The Comet project welcomes new contributors. We use the same Slack and Discord channels as the main DataFusion project and have a weekly DataFusion video call.
The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or performance regressions that you find. See the Getting Started guide for instructions on downloading and installing Comet.
There are also many good first issues waiting for contributions.