conversion_funcs Expression Audits#
Audit notes for expressions in this category that have been audited. Absence of an entry means the expression has not been audited yet, not that it is unsupported. See the user guide Spark Expression Support for current support status.
cast#
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8 modulo Cast.canUpCast refactored to delegate to UpCastRule.canUpCast.
Spark 3.5.8 (audited 2026-05-27): baseline. Cast(child, dataType, timeZoneId, evalMode); eval modes are LEGACY, ANSI, TRY. The legacy Cast.canCast matrix and the Cast.canAnsiCast matrix decide acceptance per type pair. Comet routes via CometCast (spark/src/main/scala/org/apache/comet/expressions/CometCast.scala) using a per-source-type support matrix that returns Compatible, Incompatible(reason), or Unsupported(reason); literal children are short-circuited to Compatible() so CometLiteral validates them. The serialized Cast proto carries datatype, evalMode, timezone (default UTC), allowIncompat (from spark.comet.expression.Cast.allowIncompatible), and isSpark4Plus. The native side (native/spark-expr/src/conversion_funcs/cast.rs) implements explicit per-eval-mode branches for narrowing numeric casts that match Spark’s overflow exceptions, and falls through to DataFusion cast_with_options(safe = !ANSI) for the rest.
Spark 4.0.1 (audited 2026-05-27): VariantType added; StringType literals replaced with _: StringType to accommodate collated strings. (TimestampType, ByteType|ShortType|IntegerType) added to canAnsiCast. NullIntolerant -> nullIntolerant: Boolean refactor. New ToPrettyString.BinaryFormatter semantics for Binary -> String are replicated natively via spark_binary_formatter. Numeric-to-numeric matrix unchanged.
Spark 4.1.1 (audited 2026-05-27): TimeType added; many TimeType arms in canCast/canAnsiCast. Geospatial GeographyType/GeometryType types added with their own conversion rules. Numeric-to-numeric matrix unchanged.
Known divergences and gaps:
CAST(<binary> AS STRING) uses unsafe { String::from_utf8_unchecked } in native code, which is undefined behaviour for non-UTF8 inputs (https://github.com/apache/datafusion-comet/issues/4488).
Spark 4.0 collated StringType is not explicitly guarded; pattern equality is expected to keep collated-string casts falling back, but there is no test (https://github.com/apache/datafusion-comet/issues/4489; umbrella #2190).
Spark 4.1 TimeType casts have no explicit Unsupported arm; they fall back implicitly but do not appear in the auto-generated compatibility doc (https://github.com/apache/datafusion-comet/issues/4490).
CAST(<map> AS <map>) falls back to Spark even though native cast_map_to_map exists (https://github.com/apache/datafusion-comet/issues/4491).
spark.sql.legacy.castComplexTypesToString.enabled=true is not honoured by Comet (https://github.com/apache/datafusion-comet/issues/4492).
CAST(<float|double> AS DECIMAL) rounding may differ from Spark (Incompatible, gated by spark.comet.expression.Cast.allowIncompatible, tracked at https://github.com/apache/datafusion-comet/issues/1371).
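The cast notes above describe two decisions: whether a cast runs natively at all, and how a narrowing numeric cast behaves under each eval mode. The following is a minimal Rust sketch of both, not the code in CometCast.scala or cast.rs. The names SupportLevel, runs_natively, EvalMode, and narrow_i64_to_i32 are hypothetical, the error text is illustrative rather than Spark's actual message, and the sketch assumes the usual Spark semantics for a long-to-int cast: LEGACY wraps around, ANSI raises on overflow, TRY yields null.

```rust
/// The three outcomes of the support lookup named in the notes.
/// (Hypothetical Rust mirror; the real matrix lives on the Scala side.)
#[derive(Debug, Clone, PartialEq)]
pub enum SupportLevel {
    Compatible,
    Incompatible(String),
    Unsupported(String),
}

/// Hypothetical gate: true means "run natively", false means "fall back to Spark".
/// `allow_incompat` stands in for spark.comet.expression.Cast.allowIncompatible.
pub fn runs_natively(level: &SupportLevel, allow_incompat: bool) -> bool {
    match level {
        SupportLevel::Compatible => true,
        // Incompatible casts run natively only when the user opts in.
        SupportLevel::Incompatible(_) => allow_incompat,
        SupportLevel::Unsupported(_) => false,
    }
}

/// Eval modes listed in the notes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum EvalMode {
    Legacy,
    Ansi,
    Try,
}

/// Hypothetical narrowing cast from a 64-bit to a 32-bit integer.
/// Ok(None) models a SQL NULL result; Err models an overflow exception.
pub fn narrow_i64_to_i32(value: i64, mode: EvalMode) -> Result<Option<i32>, String> {
    match mode {
        // LEGACY: keep the low 32 bits, so out-of-range values wrap around.
        EvalMode::Legacy => Ok(Some(value as i32)),
        // ANSI: out-of-range values are an error (illustrative message only).
        EvalMode::Ansi => i32::try_from(value)
            .map(Some)
            .map_err(|_| format!("overflow casting {} to a 32-bit integer", value)),
        // TRY: out-of-range values become NULL instead of an error.
        EvalMode::Try => Ok(i32::try_from(value).ok()),
    }
}
```

Under these assumptions, narrow_i64_to_i32(2147483648, EvalMode::Legacy) wraps to i32::MIN, the same input is an Err under EvalMode::Ansi, and it is Ok(None) under EvalMode::Try. An Incompatible pair such as float or double to decimal passes runs_natively only when allow_incompat is true.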
Spark registers the type-name conversion functions (bigint, binary, boolean, date, decimal, double, float, int, smallint, string, timestamp, tinyint) as cast aliases. Each lowers to the same Cast node, so Comet handles it via the cast implementation with the same compatibility profile.
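To make the alias relationship concrete, here is a small Rust sketch that maps each type-name function to the Spark data type it casts to and renders the equivalent CAST expression. The functions alias_target_type and lower_alias are hypothetical illustrations, not part of Spark or Comet; the type names are the standard Spark SQL data types for each keyword.

```rust
/// Hypothetical lookup: type-name conversion function -> Spark data type it casts to.
/// Returns None for anything that is not one of the cast aliases listed above.
pub fn alias_target_type(function_name: &str) -> Option<&'static str> {
    match function_name.to_ascii_lowercase().as_str() {
        "bigint" => Some("LongType"),
        "binary" => Some("BinaryType"),
        "boolean" => Some("BooleanType"),
        "date" => Some("DateType"),
        "decimal" => Some("DecimalType"),
        "double" => Some("DoubleType"),
        "float" => Some("FloatType"),
        "int" => Some("IntegerType"),
        "smallint" => Some("ShortType"),
        "string" => Some("StringType"),
        "timestamp" => Some("TimestampType"),
        "tinyint" => Some("ByteType"),
        _ => None,
    }
}

/// Hypothetical rendering of the CAST expression an alias call is equivalent to,
/// e.g. int('42') and CAST('42' AS INT) produce the same Cast node.
pub fn lower_alias(function_name: &str, arg_sql: &str) -> Option<String> {
    let sql_type = function_name.to_ascii_uppercase();
    alias_target_type(function_name)
        .map(|_| format!("CAST({} AS {})", arg_sql, sql_type))
}
```

Because every alias reduces to a cast with a fixed target type, any support or compatibility decision made for the cast applies unchanged to the alias call.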