Apache DataFusion Comet 0.12.0 Release

Posted on: Thu 04 December 2025 by pmc

The Apache DataFusion PMC is pleased to announce version 0.12.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

This release covers approximately four weeks of development work and is the result of merging 105 PRs from 13 contributors. See the change log for more information.

Release Highlights

Experimental Native Apache Iceberg Scan Support

Comet has a new, experimental, native Iceberg scan. This work relies on iceberg-rust and the Parquet reader from arrow-rs that Comet already uses to great effect. Comet’s existing Iceberg integration relies on a modified Iceberg Java build to accelerate Parquet decoding. The new approach lets unmodified Iceberg Java handle query planning (e.g., catalog access and partition pruning). Comet then serializes Iceberg FileScanTask objects directly to iceberg-rust, enabling native execution of Iceberg table scans through DataFusion.

This represents a significant step forward in Comet's support for data lakehouse architectures and expands the range of workloads that can benefit from native acceleration. Please take a look at the PR and Comet’s documentation to understand the current limitations and try it on your workloads! We are eager for feedback on this approach.
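The split between JVM-side planning and native execution described above can be illustrated with a small sketch. Everything here is hypothetical: `ScanTask` is a simplified stand-in for Iceberg's FileScanTask, the function names are invented, and JSON is only an assumed placeholder for whatever wire format Comet actually uses.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ScanTask:
    """Hypothetical stand-in for an Iceberg FileScanTask (not the real class)."""
    file_path: str
    start: int       # byte offset where this task's split begins
    length: int      # number of bytes covered by the split
    partition: str   # partition value, used here only for pruning


def plan_scan(tasks, wanted_partition):
    """Planning step (JVM side in Comet): prune partitions, keep matching tasks."""
    return [t for t in tasks if t.partition == wanted_partition]


def serialize(tasks):
    """Encode planned tasks for the native side (JSON is an assumed format)."""
    return json.dumps([asdict(t) for t in tasks]).encode("utf-8")


def native_scan(payload):
    """Execution step (native side in Comet): decode tasks, list files to read."""
    tasks = [ScanTask(**d) for d in json.loads(payload.decode("utf-8"))]
    return [(t.file_path, t.start, t.length) for t in tasks]


all_tasks = [
    ScanTask("data/a.parquet", 0, 1024, "2025-12-01"),
    ScanTask("data/b.parquet", 0, 2048, "2025-12-02"),
]
planned = plan_scan(all_tasks, "2025-12-02")
print(native_scan(serialize(planned)))
```

The point of the sketch is only that the executing side needs nothing beyond the serialized tasks: catalog access and pruning have already happened before the hand-off.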

Code Architecture Improvements

This release includes significant refactoring to improve code maintainability and extensibility, and we will continue those efforts into 0.13.0 development:

Unified operator serialization: The CometExecRule refactor unifies CometNativeExec creation with serialization through the new CometOperatorSerde trait
Expression serde refactoring: Multiple PRs (#2738, #2741, #2791) moved expression serialization logic out of QueryPlanSerde into specialized traits
Aggregate expression improvements: Added getSupportLevel to the CometAggregateExpressionSerde trait for better aggregate function handling

These architectural improvements make it easier for contributors to add new operators and expressions while reducing code complexity.
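To show why per-expression serde traits make new expressions easier to add, here is a rough Python sketch of the general pattern. It is not Comet's Scala API: `ExprSerde`, `REGISTRY`, the `Expr` node, and the format subset in `TruncSerde` are all hypothetical names and values chosen for illustration, with `get_support_level` loosely mirroring the getSupportLevel method mentioned above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SupportLevel(Enum):
    COMPATIBLE = "compatible"
    UNSUPPORTED = "unsupported"


@dataclass(frozen=True)
class Expr:
    """Hypothetical expression node: a function name plus its arguments."""
    name: str
    args: tuple = ()


class ExprSerde(ABC):
    """Hypothetical per-expression serde; one small class per expression."""

    def get_support_level(self, expr: Expr) -> SupportLevel:
        # Default: the expression is fully supported.
        return SupportLevel.COMPATIBLE

    @abstractmethod
    def convert(self, expr: Expr) -> dict:
        """Produce the serialized form handed to the native side."""


class AbsSerde(ExprSerde):
    def convert(self, expr: Expr) -> dict:
        return {"fn": "abs", "args": list(expr.args)}


class TruncSerde(ExprSerde):
    SUPPORTED_FORMATS = {"year", "month"}  # made-up subset for illustration

    def get_support_level(self, expr: Expr) -> SupportLevel:
        fmt = expr.args[1]
        if fmt in self.SUPPORTED_FORMATS:
            return SupportLevel.COMPATIBLE
        return SupportLevel.UNSUPPORTED

    def convert(self, expr: Expr) -> dict:
        return {"fn": "trunc", "args": list(expr.args)}


# Adding an expression means adding one class and one registry entry.
REGISTRY = {"abs": AbsSerde(), "trunc": TruncSerde()}


def serialize_expr(expr: Expr) -> Optional[dict]:
    """Return the serialized form, or None to signal fallback to Spark."""
    serde = REGISTRY.get(expr.name)
    if serde is None:
        return None
    if serde.get_support_level(expr) is not SupportLevel.COMPATIBLE:
        return None
    return serde.convert(expr)


print(serialize_expr(Expr("abs", ("col_a",))))
print(serialize_expr(Expr("trunc", ("col_d", "week"))))
```

In this arrangement the support check and the conversion live next to each other, so the central dispatcher stays small no matter how many expressions are registered.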

New SQL Functions

The following SQL functions are now supported:

concat - String concatenation
abs - Absolute value
sha1 - SHA-1 hash function
cot - Cotangent function
Hyperbolic trigonometric functions - sinh, cosh, tanh, and their inverse functions
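For readers who want a quick reminder of what these functions compute, the following sketch reproduces their basic mathematical meaning with the Python standard library. It is not Comet or Spark code, and edge cases such as SQL NULL handling are not modeled; the helper names are hypothetical.

```python
import hashlib
import math


def concat(*parts: str) -> str:
    """String concatenation of all arguments, in order."""
    return "".join(parts)


def sha1_hex(text: str) -> str:
    """SHA-1 digest of the UTF-8 bytes, as a 40-character hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def cot(x: float) -> float:
    """Cotangent, defined as 1 / tan(x)."""
    return 1.0 / math.tan(x)


print(concat("Data", "Fusion"))
print(abs(-3))
print(sha1_hex("abc"))
print(cot(math.pi / 4))
# Hyperbolic functions and their inverses
print(math.sinh(1.0), math.cosh(1.0), math.tanh(1.0))
print(math.asinh(1.0), math.acosh(1.0), math.atanh(0.5))
```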

New Operators

CometLocalTableScanExec - Native support for local table scans, eliminating fallback to Spark for small, in-memory datasets

Configuration and Usability Improvements

Simplified on-heap configuration: Streamlined on-heap memory settings for easier setup
Extended explain format: Renamed and improved COMET_EXTENDED_EXPLAIN_FORMAT with better defaults
Environment variable support: Improved framework for setting configs with environment variables
Native config passing: All Comet configs are now passed to the native plan
Config categorization: Categorized testing configs and added notes about known timezone issues
Removed legacy configs: Removed COMET_EXPR_ALLOW_INCOMPATIBLE config to simplify configuration

Bug Fixes

This release includes numerous bug fixes:

Fixed None.get in stringDecode when binary child cannot be converted
Proper fallback for lpad/rpad with unsupported arguments
Fixed trunc/date_trunc with unsupported format strings
Corrected single partition handling in native_datafusion
Fixed LeftSemi join handling: a sort-merge join (SMJ) is no longer replaced with a hash join (HJ)
Fixed CometLiteral class cast exception with arrays
Fixed missing SortOrder fallback reason in range partitioning
Improved checkSparkMaybeThrows to compare results in the success case
Fixed null handling in CometVector implementations

Documentation Improvements

Added FFI documentation to contributor guide
Updated contributor guide for adding new expressions and operators
Improved documentation layout and navigation
Added prettier enforcement for consistent markdown formatting
Added a CI check to ensure generated docs are in sync
Various documentation updates for SortOrder expressions, LocalTableScan and WindowExec, and Spark SQL tests

Dependency Updates

Upgraded to Spark 3.5.7
Upgraded to DataFusion 50.3.0
Upgraded Parquet from 56.0.0 to 56.2.0
Various other dependency updates via Dependabot

Spark Compatibility

Spark 3.4.3 with JDK 11 & 17, Scala 2.12 & 2.13
Spark 3.5.4 through 3.5.7 with JDK 11 & 17, Scala 2.12 & 2.13
Spark 4.0.1 with JDK 17, Scala 2.13

We are looking for help from the community to fully support Spark 4.0.1. See EPIC: Support 4.0.0 for more information.

Getting Involved

The Comet project welcomes new contributors. We use the same Slack and Discord channels as the main DataFusion project and have a weekly DataFusion video call.

The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or performance regressions that you find. See the Getting Started guide for instructions on downloading and installing Comet.
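As a rough idea of what trying Comet on an existing job involves, the sketch below assembles a spark-submit command line that enables the plugin. It assumes the configuration keys from the Comet installation documentation still apply to the release you install; the jar path, the off-heap size, and the helper names are hypothetical placeholders, so check the Getting Started guide for the exact settings.

```python
import shlex

# Hypothetical location of the downloaded Comet jar.
COMET_JAR = "/path/to/comet-spark.jar"


def comet_confs(offheap_size: str = "4g") -> dict:
    """Spark settings that enable Comet (keys assumed from the install docs)."""
    return {
        "spark.plugins": "org.apache.spark.CometPlugin",
        "spark.shuffle.manager": (
            "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
        ),
        "spark.comet.enabled": "true",
        "spark.comet.exec.enabled": "true",
        "spark.comet.exec.shuffle.enabled": "true",
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": offheap_size,  # placeholder value
    }


def spark_submit_command(app: str, jar: str = COMET_JAR) -> str:
    """Build a spark-submit command string; the job itself is unchanged."""
    args = ["spark-submit", "--jars", jar]
    for key, value in comet_confs().items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return shlex.join(args)


print(spark_submit_command("my_job.py"))
```

Only the submission flags differ from a plain Spark run, which is what makes it practical to compare results and timings with and without Comet before filing an issue.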

There are also many good first issues waiting for contributions.


Copyright 2026, The Apache Software Foundation, Licensed under the Apache License, Version 2.0.
Apache® and the Apache feather logo are trademarks of The Apache Software Foundation.