Apache DataFusion Comet 0.14.0 Release

Posted on: Wed 18 March 2026 by pmc

The Apache DataFusion PMC is pleased to announce version 0.14.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

This release covers approximately eight weeks of development work and is the result of merging 189 PRs from 21 contributors. See the change log for more information.

Key Features

Native Iceberg Improvements

Comet's fully native Iceberg integration received several enhancements:

Per-Partition Plan Serialization: CometExecRDD now supports per-partition plan data, reducing serialization overhead for native Iceberg scans and enabling dynamic partition pruning (DPP).

Vended Credentials: Native Iceberg scans now support passing vended credentials from the catalog, improving integration with cloud storage services.

Upstream Reader Performance Improvements: The Comet team contributed a number of reader performance improvements to iceberg-rust 0.9.0, which Comet now uses. These improvements benefit all iceberg-rust users.

Performance Optimizations:

  • Single-pass FileScanTask validation for reduced planning overhead
  • Configurable data file concurrency via spark.comet.scan.icebergNative.dataFileConcurrencyLimit
  • Channel-based executor thread parking instead of yield_now() for reduced CPU overhead
  • Reuse of CometConf and native utility instances in batch decoding
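
The data file concurrency limit listed above is an ordinary Spark configuration key, so it can be supplied like any other setting. The following is a minimal sketch that renders it as spark-submit arguments. The helper name to_submit_args and the value 4 are illustrative assumptions, not recommended settings.

```python
# Hypothetical helper (not part of Comet or Spark): render a dict of
# Spark settings as a flat list of spark-submit "--conf key=value" arguments.
def to_submit_args(conf):
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


iceberg_conf = {
    # Key taken from the release notes; the value 4 is only an example.
    "spark.comet.scan.icebergNative.dataFileConcurrencyLimit": 4,
}

print(" ".join(to_submit_args(iceberg_conf)))
```

A suitable limit depends on executor size and storage throughput, so treat the value as something to tune per workload.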

Native Columnar-to-Row Conversion

Comet now uses a native columnar-to-row (C2R) conversion by default. This feature replaces Comet's JVM-based columnar-to-row transition with a native Rust implementation, reducing JVM memory overhead when data flows from Comet's native execution back to Spark operators that require row-based input.
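
For readers unfamiliar with the transition, the sketch below shows conceptually what a columnar-to-row conversion does: it takes one array per column and produces one record per row. This is a toy illustration in Python only. Comet's implementation is written in Rust and produces Spark's internal row format, neither of which is modeled here, and the function name columnar_to_rows is hypothetical.

```python
# Toy illustration of columnar-to-row conversion (not Comet's implementation).
def columnar_to_rows(batch):
    """Turn {column_name: [values, ...]} into a list of row tuples."""
    columns = list(batch.values())
    lengths = {len(col) for col in columns}
    if len(lengths) > 1:
        raise ValueError("all columns in a batch must have the same length")
    # zip(*columns) walks the columns in lockstep, yielding one tuple per row.
    return list(zip(*columns))


batch = {"id": [1, 2, 3], "name": ["a", "b", None]}
rows = columnar_to_rows(batch)
print(rows)
```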

New Expressions

This release adds support for the following expressions:

  • Date/time functions: make_date, next_day
  • String functions: right, string_split, luhn_check
  • Math functions: crc32
  • Map functions: map_contains_key, map_from_entries
  • Conversion functions: to_csv
  • Cast support: date to timestamp, numeric to timestamp, integer to binary, boolean to decimal, date to numeric
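
As an illustration of what one of these expressions computes, here is a minimal reference sketch of the Luhn checksum that luhn_check is named after. It is plain Python written for this post, not Comet's or Spark's code. The function name luhn_check_reference is hypothetical, and the handling of empty or non-digit input (returning False) is a choice made in this sketch that should be checked against Spark's documented behavior.

```python
# Reference sketch of the Luhn checksum algorithm (not Comet's implementation).
def luhn_check_reference(number: str) -> bool:
    # Assumption for this sketch: empty or non-digit input is simply invalid.
    if not number or any(ch not in "0123456789" for ch in number):
        return False
    total = 0
    # Walk digits from the right; double every second digit.
    for index, ch in enumerate(reversed(number)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_check_reference("79927398713"))  # True: digit sum is 70
print(luhn_check_reference("79927398714"))  # False: digit sum is 71
```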

ANSI Mode Error Messages

ANSI SQL mode now produces proper error messages matching Spark's expected output, improving compatibility for workloads that rely on strict SQL error handling.

DataFusion Configuration Passthrough

DataFusion session-level configurations can now be set directly from Spark using the spark.comet.datafusion.* prefix. This enables tuning DataFusion internals such as batch sizes and memory limits without modifying Comet code.
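
A minimal sketch of how such settings might be assembled is shown below. It assumes that the text after the spark.comet.datafusion. prefix is the DataFusion option name without its leading "datafusion." segment, which this post does not spell out, so confirm the exact mapping in the Comet configuration guide. The helper name datafusion_passthrough and the value 8192 are illustrative.

```python
# Prefix named in the release notes.
PREFIX = "spark.comet.datafusion."


# Hypothetical helper: build Spark conf entries for DataFusion options.
# Assumption: "execution.batch_size" here stands for DataFusion's
# "datafusion.execution.batch_size" option; verify the mapping before use.
def datafusion_passthrough(options):
    return {PREFIX + name: str(value) for name, value in options.items()}


conf = datafusion_passthrough({"execution.batch_size": 8192})
for key, value in conf.items():
    print(f"--conf {key}={value}")
```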

Performance Improvements

This release includes extensive performance optimizations:

  • Sum aggregation: Specialized implementations for each eval mode eliminate per-row mode checks
  • Contains expression: SIMD-based scalar pattern search for faster string matching
  • Batch coalescing: Reduced IPC schema overhead in BufBatchWriter by coalescing small batches
  • Tokio runtime: Worker threads now initialize from spark.executor.cores for better resource utilization
  • Decimal expressions: Optimized decimal arithmetic operations
  • Row-to-columnar transition: Improved performance for JVM shuffle data conversion
  • Aligned pointer reads: Optimized SparkUnsafeRow field accessors using aligned memory reads

Deprecations and Removals

The deprecated native_comet scan mode has been removed. Use native_datafusion instead. Note that the native_iceberg_compat scan is now deprecated and will be removed in a future release.
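
When upgrading, it can help to check existing job configurations for the affected scan modes. The sketch below does this over a plain dictionary of settings. It assumes the scan mode is selected with the spark.comet.scan.impl key, which is not stated in this post and should be verified against the Comet configuration guide. The function name check_scan_mode is hypothetical.

```python
# Assumed configuration key for selecting the scan implementation.
SCAN_IMPL_KEY = "spark.comet.scan.impl"

REMOVED_MODES = {"native_comet"}              # removed in 0.14.0
DEPRECATED_MODES = {"native_iceberg_compat"}  # deprecated in 0.14.0


# Hypothetical helper: classify the scan mode found in a settings dict.
def check_scan_mode(conf):
    mode = conf.get(SCAN_IMPL_KEY)
    if mode in REMOVED_MODES:
        return "removed"
    if mode in DEPRECATED_MODES:
        return "deprecated"
    return "ok"


print(check_scan_mode({SCAN_IMPL_KEY: "native_comet"}))       # removed
print(check_scan_mode({SCAN_IMPL_KEY: "native_datafusion"}))  # ok
```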

Compatibility

This release upgrades to DataFusion 52.3, Arrow 57.3, and iceberg-rust 0.9.0. Published binaries now target x86-64-v3 and neoverse-n1 CPU architectures for improved performance on modern hardware.

Supported platforms include Spark 3.4.3, Spark 3.5.4 through 3.5.8, and Spark 4.0.x, with various JDK and Scala combinations.

The community encourages users to test Comet with existing Spark workloads and welcomes contributions to ongoing development.

