Spark Expression Support#
This page is the complete reference for how Apache Comet handles each Spark built-in expression. Comet accelerates expressions either with a native (Rust) implementation or by dispatching to a Spark-compatible codegen path. When an expression is not supported, Comet transparently falls back to Spark for that part of the plan; results are unaffected.
Expressions marked ✅ Supported are enabled by default and produce Spark-compatible results.
Some ✅ Supported expressions have specific incompatible cases that fall back to Spark by
default. Those cases must be opted into per expression with
spark.comet.expression.EXPRNAME.allowIncompatible=true (where EXPRNAME is the Spark
expression class name, for example Cast). There is no global opt-in.
Most expressions can also be disabled with spark.comet.expression.EXPRNAME.enabled=false, where
EXPRNAME is the Spark expression class name (for example Length or StartsWith). See the
Comet Configuration Guide for the full list.
Status legend#
Status |
Meaning |
|---|---|
✅ Supported |
Comet produces Spark-compatible results by default. Some inputs or forms may fall back to Spark, and any incompatible behavior is opt-in (off by default). |
🔜 Planned |
Intended; tracked by an open issue or pull request. |
Not currently planned#
Comet focuses acceleration on mainstream relational, string, datetime, math, and collection expressions. The following function families are not currently planned for native acceleration (they are not on the 1.0 roadmap): specialized functionality with narrow real-world analytics use and high implementation cost. They fall back to Spark and may be reconsidered based on demand:
Probabilistic sketches and approximate top-k (
kll_sketch_*,hll_*,theta_*,count_min_sketch,bitmap_*,approx_top_k*): specialized data structures with exact-correctness traps.Geospatial (
st_*): brand-new Spark 4.1 functionality, specialized.Avro / Protobuf codecs (
from_avro,to_avro,from_protobuf,to_protobuf,schema_of_avro): format conversion belongs at the IO layer, not expression evaluation.JVM reflection (
java_method,reflect): niche, and they invoke arbitrary JVM methods (a security concern).UTF-8 validation (
is_valid_utf8,make_valid_utf8,validate_utf8,try_validate_utf8): niche Spark 4.x string-validation helpers.Miscellaneous niche (
histogram_numeric,version,sentences,quote): low-value or specialized functions with little benefit from native acceleration.
The file-metadata functions input_file_name, input_file_block_start, and input_file_block_length depend on scan-internal per-row file information rather than the expression layer; their support status is covered in the scan compatibility guide.
Note that approx_count_distinct, median, and mode are planned: they are mainstream (median and mode are exact aggregates). approx_percentile / percentile_approx are not currently planned because their approximate results cannot be made bit-identical to Spark.
The tables below list every Spark built-in expression with its current status.
agg_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
|
|
✅ |
|
|
🔜 |
tracking #4098 |
|
🔜 |
Array aggregate (related to |
|
✅ |
Interval types fall back |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
🔜 |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
🔜 |
Grouping indicator for ROLLUP/CUBE/GROUPING SETS |
|
🔜 |
Grouping indicator for ROLLUP/CUBE/GROUPING SETS |
|
🔜 |
tracking #4098 |
|
✅ |
|
|
✅ |
|
|
🔜 |
String aggregation |
|
✅ |
|
|
🔜 |
|
|
✅ |
|
|
🔜 |
tracking #4098 |
|
✅ |
|
|
🔜 |
|
|
🔜 |
|
|
🔜 |
|
|
🔜 |
Percentile aggregate |
|
🔜 |
Percentile aggregate |
|
✅ |
Native: Spark rewrites to |
|
✅ |
Native: Spark rewrites to |
|
✅ |
Native: Spark rewrites to |
|
🔜 |
Falls back; can reuse |
|
🔜 |
Falls back; can reuse the |
|
🔜 |
Falls back; can reuse |
|
🔜 |
Falls back; can reuse |
|
🔜 |
Falls back; can reuse |
|
🔜 |
Falls back; can reuse |
|
🔜 |
tracking #4098 |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
🔜 |
String aggregation (alias of |
|
✅ |
|
|
🔜 |
tracking #4098 |
|
🔜 |
tracking #4098 |
|
✅ |
|
|
✅ |
|
|
✅ |
array_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
NaN/signed-zero handling may differ (details) |
|
✅ |
NaN/signed-zero handling may differ (details) |
|
✅ |
Incompatible; falls back by default (details) |
|
✅ |
|
|
✅ |
Incompatible; falls back by default (details) |
|
✅ |
Incompatible; falls back by default (details) |
|
✅ |
NaN ordering may differ (details) |
|
✅ |
NaN ordering may differ (details) |
|
✅ |
Binary/struct/map/null elements fall back |
|
🔜 |
Sibling of |
|
✅ |
|
|
✅ |
|
|
✅ |
NaN/signed-zero handling may differ (details) |
|
✅ |
|
|
✅ |
|
|
✅ |
MapType input falls back |
|
✅ |
Binary/struct/map elements fall back |
|
✅ |
|
|
✅ |
|
|
🔜 |
Random array shuffle |
|
✅ |
Native (#4149) |
|
✅ |
Nested struct/null arrays fall back |
bitwise_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Operator alias for |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
collection_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
|
|
✅ |
MapType input falls back |
|
✅ |
Binary/array children fall back |
|
✅ |
Binary-element arrays fall back (Incompatible) (details) |
|
✅ |
MapType input falls back |
conditional_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Lowers to |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Lowers to |
conversion_funcs#
The type-name conversion functions (bigint, binary, boolean, date, decimal, double, float, int, smallint, string, timestamp, tinyint) are SQL aliases for CAST(... AS <type>) and share the support and caveats of cast.
Function |
Status |
Notes |
|---|---|---|
|
✅ |
Some casts fall back; float-to-decimal is opt-in (details) |
csv_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
|
|
✅ |
|
|
✅ |
datetime_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
|
|
✅ |
|
|
✅ |
Constant-folded to a literal (alias of |
|
✅ |
Constant-folded to a literal before Comet sees the plan |
|
🔜 |
Blocked on Spark 4.1 TIME type support (#4288) |
|
✅ |
Constant-folded to a literal before Comet sees the plan |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
🔜 |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Legacy zone forms fall back (Incompatible) (details) |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
🔜 |
|
|
🔜 |
Produces legacy CalendarInterval; tracked by #4540 |
|
🔜 |
Spark 4.1 TIME type; tracked by #4288 |
|
✅ |
|
|
✅ |
2-arg TIME form falls back |
|
✅ |
2-arg TIME form falls back |
|
🔜 |
|
|
✅ |
|
|
✅ |
|
|
🔜 |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Constant-folded to a literal (alias of |
|
✅ |
|
|
✅ |
|
|
🔜 |
Time-window grouping; tracked by #4553 |
|
🔜 |
Spark 4.1 TIME type; tracked by #4288 |
|
🔜 |
Spark 4.1 TIME type; tracked by #4288 |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Rewrites to |
|
🔜 |
Spark 4.1 TIME type; tracked by #4288 |
|
✅ |
Rewrites to |
|
✅ |
Rewrites to |
|
✅ |
Rewrites to |
|
✅ |
|
|
✅ |
Legacy zone forms fall back (Incompatible) (details) |
|
✅ |
|
|
🔜 |
Produces legacy CalendarInterval; tracked by #4540 |
|
✅ |
|
|
🔜 |
Rewrites to |
|
🔜 |
Spark 4.1 TIME type; tracked by #4288 |
|
🔜 |
Rewrites to |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
🔜 |
Time-window grouping; tracked by #4553 |
|
🔜 |
Time-window grouping; tracked by #4553 |
|
✅ |
generator_funcs#
explode and posexplode are supported via CometExplodeExec (operator-level, not
expression-level). The outer variants are wired but marked Incompatible; they require
spark.comet.exec.explode.enabled=true and allowIncompatible.
Function |
Status |
Notes |
|---|---|---|
|
✅ |
via |
|
✅ |
outer=true falls back (Incompatible) (audit) |
|
🔜 |
Operator-level generator (like |
|
🔜 |
Operator-level generator (like |
|
✅ |
via |
|
✅ |
outer=true falls back (Incompatible) (audit) |
|
🔜 |
Operator-level generator |
hash_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
json_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
Falls back by default; opt-in via allowIncompatible (audit) |
|
✅ |
Some inputs need allowIncompatible (audit) |
|
✅ |
Single-quoted/trailing JSON needs allowIncompatible (audit) |
|
✅ |
|
|
🔜 |
|
|
✅ |
|
|
✅ |
Options and map/array inputs fall back (audit) |
lambda_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
🔜 |
General lambda not yet wired; the |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
map_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
MapType input falls back |
|
🔜 |
Constructs a map |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
BinaryType key/value falls back (Incompatible) (details) |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Lowers to |
math_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
|
|
✅ |
Interval multiplication falls back |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Interval types fall back |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Two-arg form falls back |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Folds to a literal (like |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Two-arg form falls back |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Alias for |
|
🔜 |
Random string (Spark 4.0+) |
|
✅ |
|
|
✅ |
Float/double inputs fall back |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Datetime/interval form falls back |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Constant-folded; literal arguments only (Spark 4.0+) |
|
✅ |
misc_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
Routed through the JVM codegen dispatcher |
|
✅ |
Routed through the JVM codegen dispatcher; nondeterministic IV by default |
|
🔜 |
Lowers to |
|
✅ |
Resolved to a literal by the analyzer ( |
|
✅ |
Resolved to a literal by the analyzer ( |
|
✅ |
Alias of |
|
✅ |
Resolved to a literal by the analyzer; same as |
|
✅ |
Lowers to |
|
🔜 |
tracking #4098 |
|
✅ |
|
|
🔜 |
tracking #4098 |
|
🔜 |
Raises a runtime error |
|
✅ |
Seed must be a literal |
|
✅ |
Seed must be a literal |
|
🔜 |
tracking #4098 |
|
🔜 |
tracking #4098 |
|
✅ |
Alias of |
|
✅ |
|
|
🔜 |
tracking #4098 |
|
✅ |
Routed through the JVM codegen dispatcher |
|
🔜 |
tracking #4098 |
|
🔜 |
tracking #4098 |
|
✅ |
Foldable; resolved to a literal before Comet sees the plan |
|
✅ |
Resolved to a literal by the Spark analyzer before reaching Comet |
|
🔜 |
Nondeterministic random UUID |
|
🔜 |
tracking #4098 |
predicate_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Falls back by default; opt-in via allowIncompatible (details) |
|
✅ |
Falls back by default; opt-in via allowIncompatible (details) |
|
✅ |
Falls back by default; opt-in via allowIncompatible (details) |
string_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
|
|
🔜 |
Lowers to |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
🔜 |
Spark collation (umbrella #2190) |
|
✅ |
Constant-folded to a literal (Spark 4.0+) |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
🔜 |
Lowers to |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Native via |
|
🔜 |
Data masking |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
🔜 |
tracking #4098 |
|
🔜 |
tracking #4098 |
|
🔜 |
tracking #4098 |
|
🔜 |
tracking #4098 |
|
✅ |
|
|
🔜 |
tracking #4098 |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
🔜 |
Lowers to |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Hex form accelerated; other formats fall back |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
🔜 |
Lowers to |
|
🔜 |
TRY variant of |
|
✅ |
|
|
✅ |
|
|
✅ |
struct_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
Duplicate field names fall back |
|
✅ |
url_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
window_funcs#
Window functions run via CometWindowExec. Window support is disabled by default due to known
correctness issues (tracking #2721).
When enabled, lag and lead are explicitly wired; aggregate window functions (count, min,
max, sum) are also supported. Ranking functions (rank, dense_rank, row_number,
ntile, percent_rank, cume_dist, nth_value) are not yet wired in the window serde and
fall back to Spark.
Function |
Status |
Notes |
|---|---|---|
|
🔜 |
Window function; tracked by #2721 |
|
🔜 |
Window function; tracked by #2721 |
|
✅ |
via |
|
✅ |
via |
|
🔜 |
Window function; tracked by #2721 |
|
🔜 |
Window function; tracked by #2721 |
|
🔜 |
Window function; tracked by #2721 |
|
🔜 |
Window function; tracked by #2721 |
|
🔜 |
Window function; tracked by #2721 |
xml_funcs#
Function |
Status |
Notes |
|---|---|---|
|
✅ |
Spark 4.0+ |
|
✅ |
Spark 4.0+ |
|
✅ |
Spark 4.0+ |
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
|
|
✅ |
Alias of |
|
✅ |
|
|
✅ |
Beyond SQL functions#
Comet also accelerates a number of Catalyst expressions that have no Spark SQL function name and therefore do not appear in the tables above. These arise from the DataFrame API, from SQL syntax other than function calls, or from the query optimizer. They include:
Operator and optimizer-injected expressions: runtime bloom-filter join probes (
BloomFilterMightContain,BloomFilterAggregate), optimizedINsets (InSet), scalar subqueries (ScalarSubquery), and floating-point normalization (KnownFloatingPointNormalized).Accessor expressions (subscript and field access, not functions): struct field access (
col.field), array element access (arr[i]), and map value access (map[key]).Internal decimal arithmetic:
CheckOverflow,MakeDecimal, andUnscaledValue, which the analyzer inserts around decimal operations.User-defined functions: Scala UDFs registered through the DataFrame or SQL API.
Structural expressions: aliases, attribute references, literals, sort orders, and
CASE WHEN.
This list is illustrative, not exhaustive: the per-function tables are not the complete set of expressions Comet can accelerate.
See also#
Comet Compatibility Guide - known incompatibilities and edge cases for supported expressions.
Expression Audits (contributor guide) - per-version (Spark 3.4 / 3.5 / 4.0 / 4.1) audit notes for audited expressions.