Date/Time Expressions#

  • Hour, Minute, Second: These expressions incorrectly apply timezone conversion to TimestampNTZ inputs. TimestampNTZ stores local time without a timezone, so no conversion should be applied. Timestamp inputs are handled correctly. #3180

  • TruncTimestamp (date_trunc): In non-UTC sessions the native path is marked Incompatible and routes through the JVM codegen dispatcher by default, producing Spark-identical results. The native path is itself correct for dates within chrono-tz’s DST horizon (approximately year 2100; see “Date and Time Functions” below) and can be enabled by setting spark.comet.expression.TruncTimestamp.allowIncompatible=true. TimestampNTZ inputs are handled correctly regardless of session timezone (timezone-independent truncation). #2649
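
As a minimal sketch of the per-expression opt-in mentioned above, the Python helper below assembles the configuration entries from expression names. The helper name allow_incompatible_conf is hypothetical and not part of Comet; only the key pattern spark.comet.expression.<Name>.allowIncompatible is taken from this page.

```python
from typing import Dict


def allow_incompatible_conf(*expressions: str) -> Dict[str, str]:
    """Build conf entries that opt the named expressions into Comet's native path.

    Hypothetical helper: it only formats keys following the pattern
    documented on this page and does not talk to Spark or Comet.
    """
    return {
        f"spark.comet.expression.{name}.allowIncompatible": "true"
        for name in expressions
    }


if __name__ == "__main__":
    # Print the entries in key=value form, e.g. for use with spark-submit --conf.
    for key, value in allow_incompatible_conf("TruncTimestamp").items():
        print(f"{key}={value}")
```

Each resulting entry can be supplied like any other Spark setting, for example with --conf on spark-submit. Enabling a flag accepts the differences listed for that expression on this page.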

Date and Time Functions#

Comet’s native implementation of date and time functions may produce different results than Spark for dates far in the future (approximately beyond year 2100). This is because Comet uses the chrono-tz library for timezone calculations, which has limited support for Daylight Saving Time (DST) rules beyond the IANA time zone database’s explicit transitions.

For dates within a reasonable range (approximately 1970-2100), Comet’s date and time functions are compatible with Spark. For dates beyond this range, functions that involve timezone-aware calculations (such as date_trunc with timezone-aware timestamps) may produce results with incorrect DST offsets.

If you need to process dates far in the future with accurate timezone handling, consider:

  • Using timezone-naive types (timestamp_ntz) when timezone conversion is not required

  • Falling back to Spark for these specific operations
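
To illustrate why timezone-naive values sidestep timezone handling, the sketch below uses Python's standard datetime module as a stand-in for Spark's types. It assumes a hypothetical session timezone modeled as a fixed UTC-08:00 offset. It does not run any Comet or Spark code; it only shows the distinction between reading a stored wall-clock value and converting an instant into a session timezone.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical session timezone, modeled as a fixed UTC-08:00 offset.
SESSION_TZ = timezone(timedelta(hours=-8))


def hour_of_naive(ts: datetime) -> int:
    """Timezone-naive value (like timestamp_ntz): read the stored hour as-is."""
    return ts.hour


def hour_of_aware(ts: datetime, session_tz: timezone) -> int:
    """Timezone-aware value (like timestamp): convert to the session zone first."""
    return ts.astimezone(session_tz).hour


naive = datetime(2024, 3, 10, 9, 30, 0)                       # wall-clock 09:30
aware = datetime(2024, 3, 10, 9, 30, 0, tzinfo=timezone.utc)  # instant 09:30 UTC

print(hour_of_naive(naive))              # 9: no conversion applied
print(hour_of_aware(aware, SESSION_TZ))  # 1: 09:30 UTC is 01:30 at UTC-08:00

# Treating the naive value as if it were a UTC instant and then converting it
# changes the hour, which is the kind of error to avoid for naive inputs.
mistaken = naive.replace(tzinfo=timezone.utc).astimezone(SESSION_TZ).hour
print(mistaken)                          # 1 instead of the stored 9
```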

ConvertTimezone#

By default, Comet runs a Spark-compatible implementation of ConvertTimezone. Set spark.comet.expression.ConvertTimezone.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:

  • Comet’s native timezone parser only accepts IANA zone IDs (e.g. America/Los_Angeles) and fixed offsets in +HH:MM form. Spark also accepts forms such as GMT+1, UTC+1, or three-letter abbreviations like PST; queries using those forms will throw a native parse error at execution time. See https://github.com/apache/datafusion-comet/issues/2013.

  • convert_timezone does not support non-UTF8_BINARY collations (https://github.com/apache/datafusion-comet/issues/4646)
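
Because unsupported zone strings only fail at execution time on the native path, a rough pre-flight check can help when opting in. The sketch below is a heuristic written for illustration, not Comet's parser: the function name classify_zone and its return labels are hypothetical, accepting a leading minus sign for fixed offsets is an assumption, and any three-letter code other than UTC or GMT is conservatively flagged for review.

```python
import re

# Fixed offset in +HH:MM form (a leading "-" is assumed to be accepted as well).
_FIXED_OFFSET = re.compile(r"^[+-]\d{2}:\d{2}$")
# Forms such as GMT+1 or UTC+1, which this page lists as Spark-only.
_PREFIXED_OFFSET = re.compile(r"^(GMT|UTC|UT)[+-]\d")
# Three-letter abbreviations such as PST.
_THREE_LETTER = re.compile(r"^[A-Z]{3}$")


def classify_zone(zone: str) -> str:
    """Return 'fixed-offset', 'assumed-iana', or 'review' for a zone string."""
    if _FIXED_OFFSET.match(zone):
        return "fixed-offset"
    if _PREFIXED_OFFSET.match(zone):
        return "review"
    if _THREE_LETTER.match(zone) and zone not in {"UTC", "GMT"}:
        return "review"
    # Everything else is assumed to be an IANA zone ID; this is not validated
    # against the time zone database.
    return "assumed-iana"


if __name__ == "__main__":
    for candidate in ["America/Los_Angeles", "+08:00", "GMT+1", "PST"]:
        print(candidate, "->", classify_zone(candidate))
```

Strings classified as review are candidates for rewriting to an IANA zone ID or a +HH:MM offset before enabling the native implementation.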

DateFormatClass#

The following differences from Spark are always present and do not require any additional configuration:

  • Format strings in a curated allow-list run natively via DataFusion’s to_char for UTC sessions. Other format strings (including non-literal formats), as well as non-UTC sessions, route through Spark’s own DateFormatClass.doGenCode via the Arrow-direct codegen dispatcher when spark.comet.exec.scalaUDF.codegen.enabled=true (the default). When the codegen dispatcher is disabled the operator falls back to Spark in those cases.
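
The routing described above can be summarized as a small decision function. This is a sketch of the documented behavior only, not Comet's code: the function name, the route labels, and the allow-list passed in the example are hypothetical, and the real allow-list is not reproduced here.

```python
from typing import Optional, Set


def date_format_route(
    fmt: Optional[str],
    session_tz: str,
    allow_list: Set[str],
    codegen_enabled: bool = True,
) -> str:
    """Return which path handles a date_format call.

    fmt is None models a non-literal format expression.
    codegen_enabled models spark.comet.exec.scalaUDF.codegen.enabled.
    """
    if fmt is not None and fmt in allow_list and session_tz == "UTC":
        return "native"
    if codegen_enabled:
        return "jvm-codegen"
    return "spark-fallback"


if __name__ == "__main__":
    # Placeholder allow-list for illustration only.
    placeholder_allow_list = {"yyyy-MM-dd"}
    print(date_format_route("yyyy-MM-dd", "UTC", placeholder_allow_list))
    print(date_format_route("yyyy-MM-dd", "America/Los_Angeles", placeholder_allow_list))
    print(date_format_route(None, "UTC", placeholder_allow_list, codegen_enabled=False))
```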

By default, Comet runs a Spark-compatible implementation of DateFormatClass. Set spark.comet.expression.DateFormatClass.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:

  • Non-UTC timezones may produce different results than Spark

  • date_format does not support non-UTF8_BINARY collations (https://github.com/apache/datafusion-comet/issues/4646)

Days#

The following cases are not supported by Comet:

  • Only DateType and TimestampType inputs are supported. TimestampNTZType is not supported.

FromUTCTimestamp#

By default, Comet runs a Spark-compatible implementation of FromUTCTimestamp. Set spark.comet.expression.FromUTCTimestamp.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:

  • Comet’s native timezone parser only accepts IANA zone IDs (e.g. America/Los_Angeles) and fixed offsets in +HH:MM form. Spark also accepts forms such as GMT+1, UTC+1, or three-letter abbreviations like PST; queries using those forms will throw a native parse error at execution time. See https://github.com/apache/datafusion-comet/issues/2013.

FromUnixTime#

By default, Comet runs a Spark-compatible implementation of FromUnixTime. Set spark.comet.expression.FromUnixTime.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:

  • Only the default datetime format pattern yyyy-MM-dd HH:mm:ss is supported.

  • DataFusion’s valid timestamp range differs from Spark’s (https://github.com/apache/datafusion/issues/16594)

  • from_unixtime does not support non-UTF8_BINARY collations (https://github.com/apache/datafusion-comet/issues/4646)
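
For orientation, the sketch below shows what the default pattern yyyy-MM-dd HH:mm:ss looks like for a few epoch values, together with a check for whether a call uses that pattern. It is plain Python rather than Spark or Comet code, the function names are hypothetical, and it assumes a UTC session timezone.

```python
from datetime import datetime, timezone
from typing import Optional

DEFAULT_PATTERN = "yyyy-MM-dd HH:mm:ss"


def uses_default_pattern(fmt: Optional[str] = None) -> bool:
    """True when no pattern is given or the given pattern is the default one."""
    return fmt is None or fmt == DEFAULT_PATTERN


def render_default(epoch_seconds: int) -> str:
    """Render epoch seconds in the default pattern, assuming a UTC session."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    print(render_default(0))                   # 1970-01-01 00:00:00
    print(render_default(86400))               # 1970-01-02 00:00:00
    print(uses_default_pattern("yyyy/MM/dd"))  # False
```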

Hours#

The following cases are not supported by Comet:

  • Only TimestampType and TimestampNTZType inputs are supported.

MakeTimestamp#

By default, Comet runs a Spark-compatible implementation of MakeTimestamp. Set spark.comet.expression.MakeTimestamp.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:

  • make_timestamp does not support non-UTF8_BINARY collations (https://github.com/apache/datafusion-comet/issues/4646)

NextDay#

The following incompatibilities cause NextDay to fall back to Spark by default. Set spark.comet.expression.NextDay.allowIncompatible=true to enable Comet acceleration despite these differences.

  • next_day does not support non-UTF8_BINARY collations (https://github.com/apache/datafusion-comet/issues/4646)

PreciseTimestampConversion#

The following cases are not supported by Comet:

  • Only reinterpretation between TimestampType/TimestampNTZType and LongType is supported.

SecondsToTimestamp#

The following cases are not supported by Comet:

  • Only IntegerType, LongType, FloatType, and DoubleType inputs are supported. DecimalType, ByteType, and ShortType fall back to Spark.

ToUTCTimestamp#

By default, Comet runs a Spark-compatible implementation of ToUTCTimestamp. Set spark.comet.expression.ToUTCTimestamp.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:

  • Comet’s native timezone parser only accepts IANA zone IDs (e.g. America/Los_Angeles) and fixed offsets in +HH:MM form. Spark also accepts forms such as GMT+1, UTC+1, or three-letter abbreviations like PST; queries using those forms will throw a native parse error at execution time. See https://github.com/apache/datafusion-comet/issues/2013.

ToUnixTimestamp#

By default, Comet runs a Spark-compatible implementation of ToUnixTimestamp. Set spark.comet.expression.ToUnixTimestamp.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:

  • to_unix_timestamp does not support non-UTF8_BINARY collations (https://github.com/apache/datafusion-comet/issues/4646)

TruncDate#

By default, Comet runs a Spark-compatible implementation of TruncDate. Set spark.comet.expression.TruncDate.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:

  • Non-literal format strings will throw an exception instead of returning NULL

  • trunc does not support non-UTF8_BINARY collations (https://github.com/apache/datafusion-comet/issues/4646)

The following cases are not supported by Comet:

  • Only the following formats are supported: year, yyyy, yy, quarter, mon, month, mm, week

TruncTimestamp#

By default, Comet runs a Spark-compatible implementation of TruncTimestamp. Set spark.comet.expression.TruncTimestamp.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:

  • May produce incorrect results when used with non-UTC timezones. Compatible when the timezone is UTC. (https://github.com/apache/datafusion-comet/issues/2649)

  • Non-literal format strings will throw an exception instead of returning NULL

  • date_trunc does not support non-UTF8_BINARY collations (https://github.com/apache/datafusion-comet/issues/4646)

The following cases are not supported by Comet:

  • Only the following formats are supported: year, yyyy, yy, quarter, mon, month, mm, week, day, dd, hour, minute, second, millisecond, microsecond
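
The format lists for TruncDate and TruncTimestamp can be captured in a small lookup, for example to check a literal format before relying on Comet. The sketch below copies the lists from this page; the function name is hypothetical, and comparing formats case-insensitively is an assumption rather than documented Comet behavior.

```python
# Formats listed on this page for TruncDate (trunc).
TRUNC_DATE_FORMATS = frozenset(
    {"year", "yyyy", "yy", "quarter", "mon", "month", "mm", "week"}
)

# TruncTimestamp (date_trunc) lists the same formats plus finer-grained units.
TRUNC_TIMESTAMP_FORMATS = TRUNC_DATE_FORMATS | frozenset(
    {"day", "dd", "hour", "minute", "second", "millisecond", "microsecond"}
)

_FORMATS_BY_EXPRESSION = {
    "TruncDate": TRUNC_DATE_FORMATS,
    "TruncTimestamp": TRUNC_TIMESTAMP_FORMATS,
}


def is_listed_format(expression: str, fmt: str) -> bool:
    """Check a literal format against the list for the given expression.

    Lower-casing the format is an assumption made for this sketch.
    """
    return fmt.lower() in _FORMATS_BY_EXPRESSION[expression]


if __name__ == "__main__":
    print(is_listed_format("TruncDate", "month"))       # True
    print(is_listed_format("TruncDate", "hour"))        # False
    print(is_listed_format("TruncTimestamp", "hour"))   # True
```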

UnixTimestamp#

The following incompatibilities cause UnixTimestamp to fall back to Spark by default. Set spark.comet.expression.UnixTimestamp.allowIncompatible=true to enable Comet acceleration despite these differences.

  • unix_timestamp does not support non-UTF8_BINARY collations (https://github.com/apache/datafusion-comet/issues/4646)

The following cases are not supported by Comet:

  • Only DateType, TimestampType, and TimestampNTZType inputs are supported.
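
The input-type restrictions stated on this page for Days, Hours, SecondsToTimestamp, and UnixTimestamp can be collected into one lookup table. The sketch below is illustrative only: the table is transcribed from the sections above, and the function name is hypothetical.

```python
from typing import Dict, FrozenSet

# Supported input types per expression, as listed on this page.
SUPPORTED_INPUT_TYPES: Dict[str, FrozenSet[str]] = {
    "Days": frozenset({"DateType", "TimestampType"}),
    "Hours": frozenset({"TimestampType", "TimestampNTZType"}),
    "SecondsToTimestamp": frozenset(
        {"IntegerType", "LongType", "FloatType", "DoubleType"}
    ),
    "UnixTimestamp": frozenset({"DateType", "TimestampType", "TimestampNTZType"}),
}


def is_supported_input(expression: str, input_type: str) -> bool:
    """True when this page lists input_type as supported for the expression."""
    return input_type in SUPPORTED_INPUT_TYPES[expression]


if __name__ == "__main__":
    print(is_supported_input("Days", "TimestampNTZType"))          # False
    print(is_supported_input("SecondsToTimestamp", "DecimalType")) # False
    print(is_supported_input("UnixTimestamp", "DateType"))         # True
```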