Apache DataFusion Comet 0.11.0 Release

Posted on: Tue 21 October 2025 by pmc

The Apache DataFusion PMC is pleased to announce version 0.11.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

This release covers approximately five weeks of development work and is the result of merging 131 PRs from 15 contributors. See the change log for more information.

Release Highlights

Parquet Modular Encryption Support

Spark supports Parquet Modular Encryption to independently encrypt column values and metadata. Spark also supports custom encryption factories, which let users provide their own key-management service (KMS) implementations. Thanks to a number of contributions in upstream DataFusion and arrow-rs, Comet now supports Parquet Modular Encryption with Spark KMS for native readers, enabling secure reading of encrypted Parquet files in production environments.
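As a rough illustration of the Spark-side setup, the sketch below collects the Hadoop properties described in Spark's Parquet columnar encryption documentation and renders them as `spark-submit` flags. The mock `InMemoryKMS` client and the sample base64 keys come from that documentation's example and are suitable for testing only. The helper name `to_submit_flags` is hypothetical, and a production deployment would substitute its own KMS client class. Any Comet-specific settings that may be required are not covered here.

```python
# Sketch: Parquet Modular Encryption properties as documented for Spark,
# rendered as spark-submit flags. All values are placeholders.

ENCRYPTION_PROPS = {
    # Activates Parquet encryption driven by Hadoop properties.
    "parquet.crypto.factory.class":
        "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory",
    # Mock KMS client for tests only; replace with a real KMS client class.
    "parquet.encryption.kms.client.class":
        "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS",
    # Explicit base64 master keys are only needed by the mock KMS.
    "parquet.encryption.key.list":
        "keyA:AAECAwQFBgcICQoLDA0ODw==,keyB:AAECAAECAAECAAECAAECAA==",
}


def to_submit_flags(props):
    """Prefix each Hadoop property with 'spark.hadoop.' so Spark forwards it."""
    flags = []
    for key, value in props.items():
        flags += ["--conf", f"spark.hadoop.{key}={value}"]
    return flags


if __name__ == "__main__":
    print(" ".join(to_submit_flags(ENCRYPTION_PROPS)))
```

The same properties can also be set programmatically on the Hadoop configuration of a running session. The flag form is shown only because it is easy to paste into a job launcher.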

Improved Memory Management

Comet 0.11.0 introduces significant improvements to memory management, making it easier to deploy and more resilient to out-of-memory conditions.

These changes make Comet significantly easier to configure and deploy in production environments.

Improved Apache Spark 4.0 Support

Comet has improved its support for Apache Spark 4.0.1 with several important enhancements.

Spark 4.0 compatible jar files are now available on Maven Central. See the installation guide for instructions on using published jar files.

Complex Types for Columnar Shuffle

ashdnazg submitted a fantastic refactoring PR that simplified the logic for writing rows in Comet’s JVM-based, columnar shuffle. A benefit of this refactoring is better support for complex types (e.g., structs, lists, and arrays) in columnar shuffle. Comet no longer falls back to Spark to shuffle these types, enabling native acceleration for queries involving nested data structures. This enhancement significantly expands the range of queries that can benefit from Comet's columnar shuffle implementation.

RangePartitioning for Native Shuffle

Comet's native shuffle now supports RangePartitioning, providing better performance for operations that require range-based data distribution. Comet now matches Spark behavior for computing and distributing range boundaries, and serializes them to native execution for faster shuffle operations.
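For readers unfamiliar with range partitioning, the general idea can be sketched in a few lines: sample the sort keys, choose sorted split points from the sample, then route each row to a partition with a binary search over those boundaries. This is a simplified illustration of the technique, not Comet's or Spark's actual implementation. The function names and the sampling strategy are assumptions made for the example.

```python
import bisect
import random


def compute_boundaries(sample, num_partitions):
    """Pick num_partitions - 1 sorted split points from a sample of keys."""
    ordered = sorted(sample)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def partition_for(key, boundaries):
    """Keys <= boundaries[i] go to partition i; larger keys go to the last one."""
    return bisect.bisect_left(boundaries, key)


if __name__ == "__main__":
    random.seed(0)
    keys = [random.randint(0, 999) for _ in range(10_000)]
    # Boundaries come from a small sample rather than a full sort of the data.
    bounds = compute_boundaries(random.sample(keys, 200), num_partitions=4)
    counts = [0] * 4
    for k in keys:
        counts[partition_for(k, bounds)] += 1
    print("boundaries:", bounds)
    print("rows per partition:", counts)
```

Because every partition holds a contiguous key range, sorting each partition locally yields a globally ordered result, which is why range-based distribution is used for operations such as global sorts.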

New Functionality

The following SQL functions are now supported:

New expression capabilities include:

Performance Improvements

Comet 0.11.0 TPC-H Performance

Comet 0.11.0 continues to deliver significant performance improvements over Spark. In our TPC-H benchmarks, Comet reduced overall query runtime from 687 seconds to 302 seconds when processing 100 GB of Parquet data using a single 8-core executor, a speedup of roughly 2.3x.
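The headline figure follows directly from the two runtimes quoted above. The short calculation below uses only those two numbers: the ratio works out to about 2.27, which corresponds to a reduction in total runtime of about 56%.

```python
# Runtimes reported in this post for TPC-H at 100 GB on one 8-core executor.
spark_seconds = 687
comet_seconds = 302

# Speedup is the ratio of the baseline runtime to the accelerated runtime.
speedup = spark_seconds / comet_seconds

# Fraction of the original runtime that was eliminated, as a percentage.
time_saved_pct = 100 * (1 - comet_seconds / spark_seconds)

print(f"speedup: {speedup:.2f}x")
print(f"runtime reduction: {time_saved_pct:.0f}%")
```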

[Figure: TPC-H Overall Performance]

The performance gains are consistent across individual queries, with most queries showing substantial improvements:

[Figure: TPC-H Query-by-Query Comparison]

You can reproduce these benchmarks using our Comet Benchmarking Guide. We encourage you to run your own performance tests with your workloads.

Apache Iceberg Support

UX Improvements

Bug Fixes

Documentation Updates

Spark Compatibility

  • Spark 3.4.3 with JDK 11 & 17, Scala 2.12 & 2.13
  • Spark 3.5.4 through 3.5.6 with JDK 11 & 17, Scala 2.12 & 2.13
  • Spark 4.0.1 with JDK 17, Scala 2.13
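The list above can be transcribed into a small lookup for pre-flight checks in a build or deployment script. This is only a convenience sketch: the table mirrors the three bullet points exactly, and the `is_supported` helper is hypothetical rather than part of Comet.

```python
# Supported combinations, transcribed from the compatibility list above.
# Each entry: (lowest Spark version, highest Spark version, JDKs, Scala versions).
SUPPORTED = [
    ((3, 4, 3), (3, 4, 3), {11, 17}, {"2.12", "2.13"}),
    ((3, 5, 4), (3, 5, 6), {11, 17}, {"2.12", "2.13"}),
    ((4, 0, 1), (4, 0, 1), {17}, {"2.13"}),
]


def is_supported(spark, jdk, scala):
    """Return True if the Spark/JDK/Scala combination appears in the list."""
    version = tuple(int(part) for part in spark.split("."))
    return any(
        low <= version <= high and jdk in jdks and scala in scalas
        for low, high, jdks, scalas in SUPPORTED
    )


if __name__ == "__main__":
    print(is_supported("3.5.5", 17, "2.12"))
    print(is_supported("4.0.1", 11, "2.13"))
```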

We are looking for help from the community to fully support Spark 4.0.1. See EPIC: Support 4.0.0 for more information.

Getting Involved

The Comet project welcomes new contributors. We use the same Slack and Discord channels as the main DataFusion project and have a weekly DataFusion video call.

The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or performance regressions that you find. See the Getting Started guide for instructions on downloading and installing Comet.

There are also many good first issues waiting for contributions.

