Map Expressions

MapSort (Spark 4.0+)

Spark 4.0 inserts MapSort to normalize map values when they appear in shuffle hash partitioning keys, in try_element_at, and in other contexts where the order of map entries must be deterministic. Comet runs MapSort natively, so shuffles and group-by operations on map-typed keys stay on Comet under Spark 4.0.

When spark.comet.exec.strictFloatingPoint=true, MapSort falls back to Spark for maps whose keys contain Float or Double, consistent with SortOrder and SortArray. The fallback exists because Arrow’s sort uses IEEE total ordering for floating-point values, which differs from Spark’s Double.compare semantics for NaN and -0.0.
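
To make the phrase "IEEE total ordering" concrete, the sketch below sorts a few doubles by the IEEE 754 totalOrder relation in plain Python. It illustrates the ordering in general and is not Arrow's or Comet's actual implementation; the names `total_order_key`, `from_bits`, `POS_NAN`, and `NEG_NAN` are made up for this example.

```python
import math
import struct


def total_order_key(x: float) -> int:
    """Map a double to an integer whose natural order is IEEE 754 totalOrder."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    if bits >> 63:
        # Sign bit set: flip every bit so larger magnitudes sort earlier.
        return bits ^ 0xFFFFFFFFFFFFFFFF
    # Sign bit clear: set the top bit so these sort after all negatives.
    return bits | (1 << 63)


def from_bits(bits: int) -> float:
    """Build a double from an explicit 64-bit pattern."""
    (value,) = struct.unpack("<d", struct.pack("<Q", bits))
    return value


POS_NAN = from_bits(0x7FF8000000000000)  # quiet NaN, sign bit clear
NEG_NAN = from_bits(0xFFF8000000000000)  # quiet NaN, sign bit set

values = [1.0, POS_NAN, 0.0, -0.0, -1.0, math.inf, NEG_NAN]
ordered = sorted(values, key=total_order_key)

# Under total ordering, -0.0 and 0.0 are distinct and NaN has a definite
# position that depends on its sign bit.
for v in ordered:
    print(repr(v), "sign bit set" if math.copysign(1.0, v) < 0 else "sign bit clear")
```

Under this relation signed zeros and NaNs have definite positions, so an engine that applies different rules to those values can produce a different entry order for the same map. That difference is what the strict floating-point fallback avoids.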

MapFromEntries

By default, Comet accelerates MapFromEntries using JVM codegen dispatch, which runs Spark’s generated code inside Comet’s native pipeline and matches Spark exactly. Set spark.comet.expression.MapFromEntries.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:

  • BinaryType is not supported as a map key in map_from_entries

  • BinaryType is not supported as a map value in map_from_entries
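
One way to opt in to the native implementation is to pass the setting on the spark-submit command line. The sketch below renders the property named above as `--conf` arguments; `to_submit_args` and `comet_conf` are hypothetical names for this example, not part of Comet or Spark.

```python
def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Render Spark settings as spark-submit `--conf key=value` arguments."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Opt in to Comet's native MapFromEntries implementation.
# The property name is taken from the text above.
comet_conf = {
    "spark.comet.expression.MapFromEntries.allowIncompatible": "true",
}

submit_args = to_submit_args(comet_conf)
print(" ".join(["spark-submit", *submit_args]))
```

Since the native path does not support BinaryType map keys or values, it is worth checking the schemas passed to map_from_entries before enabling it.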