Aggregate Expressions

Average

The following cases are not supported by Comet:

  • Inputs of type YearMonthIntervalType or DayTimeIntervalType.

CollectSet

The following incompatibilities cause CollectSet to fall back to Spark by default. Set spark.comet.expression.CollectSet.allowIncompatible=true to enable Comet acceleration despite these differences.

  • Comet deduplicates NaN values (treats NaN == NaN) while Spark treats each NaN as a distinct value. When spark.comet.exec.strictFloatingPoint=true, collect_set on floating-point types falls back to Spark unless spark.comet.expression.CollectSet.allowIncompatible=true is set.

First

The following differences from Spark are always present and do not require any additional configuration:

  • This function is non-deterministic: its result depends on the order of the input rows, which is not guaranteed after a shuffle, so results may not match Spark.

Last

The following differences from Spark are always present and do not require any additional configuration:

  • This function is non-deterministic: its result depends on the order of the input rows, which is not guaranteed after a shuffle, so results may not match Spark.
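A minimal plain-Python sketch of why first and last can disagree between two runs over the same data. The partitions and their arrival orders are hypothetical; the sketch only models row order and involves neither Spark nor Comet.

```python
def first(rows):
    """Value of the first row seen."""
    return rows[0]


def last(rows):
    """Value of the last row seen."""
    return rows[-1]


# Two hypothetical partitions holding values of the same column.
partition_a = [3, 1]
partition_b = [2, 5]

# The same data, with the partitions arriving in two different orders.
arrival_1 = partition_a + partition_b
arrival_2 = partition_b + partition_a

print(first(arrival_1), last(arrival_1))  # 3 5
print(first(arrival_2), last(arrival_2))  # 2 1
```

When a stable result is needed, a common approach is to use an aggregate that does not depend on arrival order, such as min or max over an explicit ordering column.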

Percentile

The following incompatibilities cause Percentile to fall back to Spark by default. Set spark.comet.expression.Percentile.allowIncompatible=true to enable Comet acceleration despite these differences.

  • Interpolated values may differ from Spark by up to (upper - lower) * 1e-6 because DataFusion quantizes the interpolation weight to 6 decimal places (#4719).
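The size of this difference can be sketched in plain Python. The sketch assumes linear interpolation between the two neighboring sorted values and models the quantization as rounding the weight to 6 decimal places; the exact DataFusion behavior (rounding versus truncation) is an assumption here, and the function name is hypothetical.

```python
import math


def percentile(sorted_vals, p, weight_decimals=None):
    """Linear-interpolation percentile over an already sorted list.

    If weight_decimals is set, the interpolation weight is rounded to that
    many decimal places (an assumed model of the quantization).
    """
    pos = (len(sorted_vals) - 1) * p
    lo = math.floor(pos)
    hi = math.ceil(pos)
    weight = pos - lo
    if weight_decimals is not None:
        weight = round(weight, weight_decimals)
    lower, upper = sorted_vals[lo], sorted_vals[hi]
    return lower + (upper - lower) * weight


vals = [0.0, 1_000_000.0]
p = 0.1234567

exact = percentile(vals, p)
quantized = percentile(vals, p, weight_decimals=6)
bound = (vals[1] - vals[0]) * 1e-6

print(abs(quantized - exact) <= bound)  # True
```

With a gap of 1,000,000 between the neighboring values, the bound is 1.0, so the difference becomes visible only when adjacent values are far apart relative to the precision the caller needs.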

The following cases are not supported by Comet:

  • An array of percentages is not supported.

  • A non-literal percentage argument is not supported.

  • A frequency argument is not supported.

  • Descending order in WITHIN GROUP (ORDER BY ... DESC) is not supported.

  • Non-numeric input types are not supported.
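As a rough illustration, the snippet below pairs a few Spark SQL query shapes with the listed case each one hits, if any. The table t and the columns x (assumed numeric) and freq are hypothetical, and treating percentile_cont as the same Percentile expression is an assumption. The classification simply restates the list above and is not taken from Comet's code.

```python
# (query, listed unsupported case it hits, or None if it hits none of them)
EXAMPLES = [
    ("SELECT percentile(x, 0.5) FROM t", None),
    ("SELECT percentile(x, array(0.25, 0.75)) FROM t", "array of percentages"),
    ("SELECT percentile(x, 0.5, freq) FROM t", "frequency argument"),
    ("SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY x) FROM t", None),
    ("SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY x DESC) FROM t",
     "descending order"),
]

for sql, reason in EXAMPLES:
    if reason:
        status = "not supported by Comet: " + reason
    else:
        status = "hits none of the listed cases"
    print(f"{sql}\n    -> {status}")
```

Queries that hit none of the listed cases are still subject to the allowIncompatible setting described earlier in this section.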