array_funcs Expression Audits#
Audit notes for expressions in this category that have been audited. Absence of an entry means the expression has not been audited yet, not that it is unsupported. See the user guide Spark Expression Support for current support status.
array#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
CreateArray(children, useStringTypeWhenEmpty); element type is the common type of children. Comet routes viaCometCreateArray(nativemake_array) and special-cases the empty-array case to dodge a known DataFusioncoerce_typesissue (#3338).Spark 4.0.1 (audited 2026-05-27): semantics unchanged.
Spark 4.1.1 (audited 2026-05-27): adds
contextIndependentFoldableoverride; runtime semantics unchanged.
array_append#
Spark 3.4.3 (audited 2026-05-27): standalone
BinaryExpression, evaluated directly. Comet routes viaCometArrayAppend.Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3.
Spark 4.0.1 (audited 2026-05-27): now
RuntimeReplaceableand rewritten toArrayInsert(arr, Literal(-1), elem).CometArrayAppendis therefore unreachable; dispatch goes throughCometArrayInsert(which carries its ownIncompatiblenotes documented at thearray_insertentry).Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
array_compact#
Spark 3.4.3 (audited 2026-05-27):
RuntimeReplaceable->ArrayFilter(arr, IsNotNull(lambda)). Comet receives the rewritten form, dispatches throughCometArrayFilter, which delegates back toCometArrayCompact.convertfor the actual proto emission. The native path uses Comet’sspark_array_compactUDF rather than DataFusion’sarray_remove_allbecause DataFusion 53 changedarray_remove_all’s NULL semantics.Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3.
Spark 4.0.1 (audited 2026-05-27): the replacement is wrapped in
KnownNotContainsNull(...)(analysis-only hint, no semantic change).Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
array_contains#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
ArrayContains(left, right) extends BinaryExpression with NullIntolerant with Predicate;inputTypesusesfindWiderTypeWithoutStringPromotionForTwo. Wired asCometScalarFunction("array_contains").Spark 4.0.1 (audited 2026-05-27):
NullIntoleranttrait replaced bynullIntolerant: Boolean;checkInputDataTypesadoptsDataTypeUtils.sameType(collation-aware in 4.x).Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
Known limitation: no NaN-canonicalization guard in
getSupportLevel. ForFloat/Doublearrays containing NaN, Spark’sSQLOrderingUtilmay produce different results than DataFusion’s IEEE comparison (https://github.com/apache/datafusion-comet/issues/4481).
array_distinct#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
ArrayDistinct(child)overArraySetLike; usesSQLOpenHashSetso NaN and+0.0/-0.0are canonicalized. Wired asCometScalarFunction("array_distinct").Spark 4.0.1 (audited 2026-05-27):
NullIntolerant->nullIntolerantfield refactor.Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
Known divergence: DataFusion
array_distinctuses hash-based equality without NaN/signed-zero canonicalization, so float/double arrays may produce different results (https://github.com/apache/datafusion-comet/issues/4481).
array_except#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
ArrayExcept(left, right) extends ArrayBinaryLike with ComplexTypeMergingExpression; result preserves left-side first occurrences not present in right. Comet routes viaCometArrayExceptand unconditionally flagsIncompatible(“Null handling and ordering may differ from Spark”); also falls back forBinaryType/StructTypeelement types.Spark 4.0.1 (audited 2026-05-27):
nullIntolerant = truemoves intoArrayBinaryLike; the overflow path usesarrayFunctionWithElementsExceedLimitError.Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
Known divergence: same NaN/signed-zero canonicalization gap as
array_distinctfor float/double arrays (https://github.com/apache/datafusion-comet/issues/4481).
array_insert#
Spark 3.4.3 audited 2026-04-02
Spark 3.5.8 audited 2026-04-02
Spark 4.0.1 audited 2026-04-02 (pos=0 error message differs from Spark)
Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
array_intersect#
Spark 3.4.3 audited 2026-04-24 (result element order may differ from Spark when the right array is longer than the left; DataFusion probes the longer side)
Spark 3.5.8 audited 2026-04-24 (same ordering incompatibility as 3.4.3)
Spark 4.0.1 audited 2026-04-24 (ordering incompatibility as above; collated strings now fall back to Spark)
Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
array_join#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
ArrayJoin(array, delimiter, nullReplacement). Comet routes viaCometArrayJointo DataFusion’sarray_to_stringand is unconditionally flaggedIncompatible(“Null handling may differ from Spark”, #3178).Spark 4.0.1 (audited 2026-05-27):
inputTypeswidened toAbstractArrayType(StringTypeWithCollation(supportsTrimCollation = true)); non-binary collations not propagated (https://github.com/apache/datafusion-comet/issues/2190).Spark 4.1.1 (audited 2026-05-27): adds
contextIndependentFoldableoverride; runtime unchanged.
array_max#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
ArrayMax(child) extends UnaryExpression with ImplicitCastInputTypes; skips NULL elements; for float/double Spark’sSQLOrderingUtiltreats NaN as greater than any non-NaN. Wired asCometScalarFunction("array_max").Spark 4.0.1 (audited 2026-05-27):
NullIntolerant->nullIntolerantfield refactor.Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
Known divergence: DataFusion’s
array_maxuses Arrowpartial_cmp-based ordering, so float/double arrays containing NaN may produce different results (https://github.com/apache/datafusion-comet/issues/4482).
array_min#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): mirror of
ArrayMaxwithevalInternalreturning the minimum. Same NULL-skip and NaN-ordering semantics. Wired asCometScalarFunction("array_min").Spark 4.0.1 (audited 2026-05-27): same trait refactor as
array_max.Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
Known divergence: same NaN-handling gap as
array_max(https://github.com/apache/datafusion-comet/issues/4482).
array_position#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
ArrayPosition(left, right); returns 1-basedLongTypeposition, 0 if not found, NULL if either input is NULL.CometArrayPositionfalls back for all-foldable args (constant folding handles those) and for unsupported element types (binary/struct/map/null).Spark 4.0.1 (audited 2026-05-27):
NullIntolerant->nullIntolerantfield refactor.Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
array_remove#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
ArrayRemove(left, right); removes all occurrences equal toright. Wired asCometScalarFunction("array_remove"). Falls back viaArraysBase.isTypeSupportedfor binary/struct/map/null child types.Spark 4.0.1 (audited 2026-05-27):
NullIntolerant->nullIntolerantfield refactor.Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
array_repeat#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
ArrayRepeat(left, right) extends BinaryExpression with ExpectsInputTypes;inputTypes = Seq(AnyDataType, IntegerType). NULL count yields NULL; count <= 0 yields empty array; count >MAX_ROUNDED_ARRAY_LENGTHthrows at runtime. Comet wraps the call inCaseWhen(IsNotNull(right), array_repeat(...), null).Spark 4.0.1 (audited 2026-05-27): error message uses
createArrayWithElementsExceedLimitError(prettyName, count); semantics unchanged.Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
array_union#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
ArrayUnion(left, right) extends ArrayBinaryLike with ComplexTypeMergingExpression; result is left-side distinct elements followed by new right-side elements. Wired asCometScalarFunction("array_union").Spark 4.0.1 (audited 2026-05-27):
nullIntolerant = truemoves intoArrayBinaryLike; overflow path usesarrayFunctionWithElementsExceedLimitError.Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
Known divergence: same NaN/signed-zero canonicalization gap as
array_distinct(https://github.com/apache/datafusion-comet/issues/4481). Result ordering versus DataFusion is also unverified; compare thearray_intersectordering caveat.
arrays_overlap#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
ArraysOverlap(left, right); three-valued logic (TRUE if any common non-null element, NULL if a null is present and no overlap is found in non-nulls, FALSE otherwise). Comet routes viaCometArraysOverlapto the nativespark_arrays_overlapUDF, which implements the same three-valued logic.Spark 4.0.1 (audited 2026-05-27):
NullIntolerant->nullIntolerantfield refactor.Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
arrays_zip#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
ArraysZip(children, names); returns an array of structs, padding shorter inputs with NULL. Comet routes viaCometArraysZipand rejects unsupported child element types (anything outside primitives, decimals, dates/timestamps, strings, binary, and nested arrays/structs of those).Spark 4.0.1 (audited 2026-05-27): the length-mismatch error switches from
IllegalArgumentExceptiontoSparkIllegalArgumentException("_LEGACY_ERROR_TEMP_3235"); runtime unchanged.Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
element_at#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
ElementAt(left, right, defaultValueOutOfBound, failOnError); group labelmap_funcs. Comet supports onlyArrayTypeinput;MapTypeinput falls back.Spark 4.0.1 (audited 2026-05-27):
NullIntolerant->nullIntolerantfield refactor; group label changes tocollection_funcs; ANSI default flips totrueso out-of-bound throws by default. Comet wiresfailOnErrorthrough to nativeListExtract.Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
flatten#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
Flatten(child) extends UnaryExpression; returns NULL if any inner sub-array is NULL. Comet routes viaCometFlattenand falls back for child types containingBinaryType/StructType/MapType(limitation ofArraysBase.isTypeSupported).Spark 4.0.1 (audited 2026-05-27):
NullIntolerant->nullIntolerantfield refactor.Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
get#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
GetArrayItem(child, ordinal, failOnError);inputTypes = Seq(AnyDataType, IntegralType). Comet routes viaCometGetArrayItem, wiringfailOnErrorthrough to the proto.Spark 4.0.1 (audited 2026-05-27): semantics unchanged; ANSI default flips to
true.Spark 4.1.1 (audited 2026-05-27):
inputTypestightened toSeq(ArrayType, IntegralType)(analysis-time only); runtime unchanged.
sort_array#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline.
SortArray(base, ascendingOrder) extends BinaryExpression with ArraySortLike; the second arg must be aLiteral(_: Boolean, BooleanType). CometCometSortArrayflagsIncompatibleunder strict floating-point and falls back for nested arrays whose innermost element isStructorNull.Spark 4.0.1 (audited 2026-05-27): trait set changes substantively:
ArraySortLikeandNullIntolerantare removed,nullIntolerant = truebecomes an override, andascendingOrderis widened to accept any foldable boolean (not justLiteral). Comet’sCometSortArraystill requires aLiteral, so the new foldable form falls back at convert time.Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.