Map Expressions#

MapSort (Spark 4.0+)#

Spark 4.0 inserts MapSort to normalize map values when they appear in shuffle hash partitioning keys, in try_element_at, and in other contexts where map ordering must be deterministic. Comet runs MapSort natively, so map shuffle and group-by-on-map stay on Comet under Spark 4.0.

When spark.comet.exec.strictFloatingPoint=true, MapSort falls back to Spark for maps whose keys contain Float or Double (consistent with SortOrder and SortArray). Arrow’s sort uses IEEE total ordering for floating-point, which differs from Spark’s Double.compare semantics for NaN and -0.0.

MapFromEntries#

The following incompatibilities cause MapFromEntries to fall back to Spark by default. Set spark.comet.expression.MapFromEntries.allowIncompatible=true to enable Comet acceleration despite these differences.

  • Using BinaryType as Map keys is not allowed in map_from_entries

  • Using BinaryType as Map values is not allowed in map_from_entries