Comet Roadmap
Comet is an open-source project and contributors are welcome to work on any issues at any time, but we find it helpful to have a roadmap for some of the major items that require coordination between contributors.
Major Initiatives
Iceberg Integration
Iceberg integration is still a work in progress (#2060), with major improvements expected in the next few
releases. The default auto scan mode now uses native_iceberg_compat instead of native_comet, enabling
support for complex types.
Spark 4.0 Support
Comet has experimental support for Spark 4.0, but there is more work to do (#1637), such as enabling more Spark SQL tests and fully implementing ANSI support (#313) for all supported expressions.
Removing the native_comet scan implementation
The native_comet scan implementation is now deprecated and will be removed in a future release (#2186, #2177).
It is the original scan implementation; it uses mutable buffers (which are incompatible with best practices around
Arrow FFI) and does not support complex types.
Now that the default auto scan mode uses native_iceberg_compat (which is based on DataFusion’s DataSourceExec),
we can proceed with removing the native_comet scan implementation, and then improve the efficiency of our use of
Arrow FFI (#2171).
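For readers who want to check whether a job still pins the deprecated scan, the following is a minimal sketch in Python. It assumes the scan implementation is selected through a Spark configuration key named `spark.comet.scan.impl`; that key is not stated on this page, so verify it against the Comet configuration guide for your release. The helper names (`comet_scan_conf`, `uses_deprecated_scan`, `to_submit_args`) are hypothetical and exist only for illustration.

```python
# Assumption: Comet reads the scan implementation from this Spark conf key.
# Check the configuration guide for your Comet release before relying on it.
SCAN_IMPL_KEY = "spark.comet.scan.impl"

# Scan modes named in this roadmap; other releases may accept more values.
KNOWN_SCAN_IMPLS = ("auto", "native_iceberg_compat", "native_comet")
DEPRECATED_SCAN_IMPLS = ("native_comet",)


def comet_scan_conf(impl="auto"):
    """Return Spark conf entries selecting a scan implementation (hypothetical helper)."""
    if impl not in KNOWN_SCAN_IMPLS:
        raise ValueError(f"unknown scan implementation: {impl!r}")
    return {SCAN_IMPL_KEY: impl}


def uses_deprecated_scan(conf):
    """Return True if the conf dict pins a scan implementation slated for removal."""
    return conf.get(SCAN_IMPL_KEY) in DEPRECATED_SCAN_IMPLS


def to_submit_args(conf):
    """Render conf entries as spark-submit style command-line arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

The same key/value pairs could be passed to `SparkSession.builder.config(key, value)` in PySpark. Leaving the setting at `auto` lets Comet choose the scan, which per this roadmap now resolves to native_iceberg_compat.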
Ongoing Improvements
In addition to the major initiatives above, we have the following ongoing areas of work:
Adding support for more Spark expressions
Moving more expressions to the datafusion-spark crate in the core DataFusion repository
Performance tuning
Nested type support improvements