Spark Operator Support
This page is the complete reference for how Apache Comet handles each Spark physical operator. Comet replaces supported operators with native equivalents. Because Comet runs whole subtrees of native operators together, a query stage that contains an operator Comet does not support falls back to regular Spark execution. Fallback does not change query results.
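As a rough illustration of the stage-level rule described above, the toy model below marks a stage as native only if every operator in it is supported. It is a simplified sketch under that stated rule, not Comet's planner: the operator names, the supported set, and the helper `plan_stages` are all hypothetical.

```python
# Toy model of stage-level fallback: a stage runs natively only if every
# operator in it is supported; otherwise the whole stage runs on Spark.
# Operator names and the supported set are hypothetical placeholders,
# not Comet's actual support matrix or planning logic.
from typing import Dict, List, Set


def plan_stages(stages: Dict[str, List[str]], supported: Set[str]) -> Dict[str, str]:
    """Return 'comet' or 'spark' for each stage, keyed by stage name."""
    result: Dict[str, str] = {}
    for name, operators in stages.items():
        # A single unsupported operator sends the entire stage to Spark.
        native = all(op in supported for op in operators)
        result[name] = "comet" if native else "spark"
    return result


if __name__ == "__main__":
    supported = {"Scan", "Filter", "Project", "HashAggregate"}
    stages = {
        "stage_0": ["Scan", "Filter", "Project", "HashAggregate"],
        "stage_1": ["Scan", "Sample", "Project"],  # "Sample" is unsupported here
    }
    print(plan_stages(stages, supported))
    # {'stage_0': 'comet', 'stage_1': 'spark'}
```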
Operators marked ✅ Supported are enabled by default. Each can be turned off individually with spark.comet.exec.OPERATOR.enabled=false (for example, spark.comet.exec.sort.enabled=false), and all native execution can be turned off with spark.comet.exec.enabled=false. See the Comet Configuration Guide for the full list of settings.
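A minimal sketch, assuming you assemble Spark settings in Python, that builds these overrides as a plain dict of config keys. It uses only the key patterns quoted above; the helper name `comet_exec_conf` is hypothetical, and which OPERATOR names are valid should be checked against the Comet Configuration Guide.

```python
# Builds Comet "turn off" overrides as Spark config key/value strings.
# Only the key patterns from this page are used:
#   spark.comet.exec.<operator>.enabled  and  spark.comet.exec.enabled
# The helper name is hypothetical; valid <operator> names are an assumption
# to verify in the Comet Configuration Guide.
from typing import Dict, Iterable


def comet_exec_conf(disabled_operators: Iterable[str] = (), disable_all: bool = False) -> Dict[str, str]:
    conf: Dict[str, str] = {}
    if disable_all:
        # Turns off all native execution, so Spark runs every operator.
        conf["spark.comet.exec.enabled"] = "false"
    for op in disabled_operators:
        # Turns off one operator, e.g. "sort" -> spark.comet.exec.sort.enabled
        conf[f"spark.comet.exec.{op}.enabled"] = "false"
    return conf


if __name__ == "__main__":
    print(comet_exec_conf(["sort"]))
    # {'spark.comet.exec.sort.enabled': 'false'}
```

Each pair can then be passed to Spark in the usual way, for example with --conf key=value on spark-submit or SparkSession.builder.config(key, value) in PySpark.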
Status legend
| Status | Meaning |
|---|---|
| ✅ Supported | Native implementation, enabled by default; works in the common case. Some inputs or forms may fall back to Spark. |
| ⚠️ Supported (caveats) | Experimental or disabled by default, or accelerates only a limited subset. See the Compatibility Guide. |
| 🔜 Planned | Intended; tracked by an open issue or pull request. |
Not currently planned
The following operator families fall back to Spark and are not on the current roadmap. They are omitted from the tables below and may be reconsidered based on demand:
- Structured Streaming operators (StateStoreSaveExec, StateStoreRestoreExec, StreamingSymmetricHashJoinExec, and similar): Comet targets batch execution.
- Cartesian / cross joins (CartesianProductExec): rare and expensive, with little acceleration benefit.
- Sampling and range generation (SampleExec, RangeExec): niche leaf operators.
- Pickled (non-Arrow) Python UDFs (BatchEvalPythonExec): Comet's Python UDF support targets Arrow-based UDFs only (#4234).
Scans
| Operator | Status | Notes |
|---|---|---|
| | ✅ | Parquet only. Some types and configurations fall back. See Parquet Scan Compatibility. |
| | ✅ | Parquet, Apache Iceberg Parquet, and CSV (native) scans. See Parquet Scan Compatibility and the Iceberg Guide. |
| | ⚠️ | Disabled by default: it offers no acceleration advantage and is typically used only in test code. Can be enabled through configuration (#4393). |
| | 🔜 | Cached / in-memory table scans fall back today. |
Projection and filtering
| Operator | Status | Notes |
|---|---|---|
| | ✅ | |
| | ✅ | |
Sorting and limiting
| Operator | Status | Notes |
|---|---|---|
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | |
Aggregation
| Operator | Status | Notes |
|---|---|---|
| | ✅ | |
| | ✅ | Supports a limited set of aggregates, such as |
| | 🔜 | Falls back today; Comet currently accelerates hash aggregates. |
Joins
| Operator | Status | Notes |
|---|---|---|
| | ✅ | |
| | ✅ | |
| | ✅ | |
| | ✅ | Falls back to Spark when the preserved side is broadcast (for example, LEFT OUTER with BROADCAST on the left) (#4429). |
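In an outer join, the preserved side is the input whose rows all appear in the output: the left side of a LEFT OUTER join, the right side of a RIGHT OUTER join, and both sides of a FULL OUTER join. The fallback condition in the last Joins row can be sketched as a predicate, assuming the rule is exactly "the broadcast side is a preserved side". The function `falls_back` and its string arguments are hypothetical, and Comet's actual checks may cover other cases (#4429).

```python
# Sketch of the broadcast/preserved-side fallback rule, assuming the rule is
# exactly "fall back if the broadcast side is a preserved side of the join".
# Join-type strings and the function name are hypothetical illustrations.

# Sides whose rows are all kept in the output for each join type.
PRESERVED_SIDES = {
    "inner": set(),
    "left_outer": {"left"},
    "right_outer": {"right"},
    "full_outer": {"left", "right"},
}


def falls_back(join_type: str, broadcast_side: str) -> bool:
    """True if broadcasting this side would send the join back to Spark."""
    return broadcast_side in PRESERVED_SIDES[join_type]


if __name__ == "__main__":
    print(falls_back("left_outer", "left"))   # True: the example from the note
    print(falls_back("left_outer", "right"))  # False
```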
Exchanges
| Operator | Status | Notes |
|---|---|---|
| | ✅ | |
| | ✅ | |
Window
| Operator | Status | Notes |
|---|---|---|
| | ⚠️ | Runs natively, but only a subset of window functions is accelerated; the rest fall back. See the expression reference (#2721). |
| | 🔜 | Window-based limit pushdown falls back today. |
Generators and set operations
| Operator | Status | Notes |
|---|---|---|
| | ✅ | Supports |
| | ✅ | |
| | ✅ | |
| | ✅ | |
Writes
| Operator | Status | Notes |
|---|---|---|
| | ⚠️ | Experimental native Parquet writes, disabled by default (opt-in). |
Python and UDF
| Operator | Status | Notes |
|---|---|---|
| | 🔜 | Experimental accelerated PyArrow UDF support is in progress (#4234). |
See also
- Comet Compatibility Guide: known incompatibilities and edge cases.
- Supported Spark Expressions: the equivalent reference for expressions.