Spark Data Type Support

This page is the complete reference for how Apache Comet handles each Spark data type. Comet’s native execution path is built on Apache Arrow, so the set of types Comet can express natively is constrained by Arrow’s type system. When a query references a type Comet does not support, the relevant operator falls back to Spark; results are unaffected.

For per-scan and per-operator type caveats (for example, Parquet read-time conversions or hash-aggregate group-key restrictions), see the Compatibility Guide.

Status legend

| Status | Meaning |
| --- | --- |
| ✅ Supported | Native support; enabled by default. |
| ⚠️ Supported (caveats) | Works, but with limits: certain values, contexts, or configurations fall back to Spark. |
| 🔜 Planned | Intended; tracked by an open issue or pull request. |

Not currently planned

The following types fall back to Spark and are not on the current roadmap. They are omitted from the tables below and may be reconsidered based on demand:

  • UserDefinedType: user-defined types are application-specific and outside the scope of native acceleration; queries referencing UDTs fall back to Spark.

Numeric

| Type | Status | Notes |
| --- | --- | --- |
| ByteType | | |
| ShortType | | |
| IntegerType | | |
| LongType | | |
| FloatType | | NaN and signed-zero handling can diverge from Spark in comparisons and aggregations. See Floating-point Compatibility. |
| DoubleType | | NaN and signed-zero handling can diverge from Spark in comparisons and aggregations. See Floating-point Compatibility. |
| DecimalType | | |
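
The NaN and signed-zero caveat for FloatType and DoubleType is easiest to see at the IEEE 754 level. The sketch below uses plain Python floats, not Comet or Spark, to show the two properties that make these values awkward for any engine: NaN is not equal to itself, and 0.0 and -0.0 compare equal while having different bit patterns. Spark SQL documents that `NaN = NaN` is true and that NaN values are grouped together in aggregations; an engine that compares or hashes the raw bits can answer differently. The helper names (`bits`, `spark_style_equal`, `group_by_bits`) are hypothetical and exist only for this illustration.

```python
import math
import struct


def bits(x: float) -> str:
    """Return the 64-bit IEEE 754 pattern of a double as a hex string."""
    return struct.pack(">d", x).hex()


nan = float("nan")

# IEEE 754: NaN compares unequal to everything, including itself.
ieee_nan_equal = (nan == nan)

# IEEE 754: +0.0 and -0.0 compare equal but are stored differently.
zeros_equal = (0.0 == -0.0)
zeros_same_bits = (bits(0.0) == bits(-0.0))


def spark_style_equal(a: float, b: float) -> bool:
    """Hypothetical model of Spark SQL's documented rule that NaN = NaN."""
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b


def group_by_bits(values):
    """Group doubles by raw bit pattern, as a bitwise hash key would."""
    groups = {}
    for v in values:
        groups.setdefault(bits(v), []).append(v)
    return groups


# Grouping by value merges the two zeros; grouping by bits does not.
by_value = {}
for v in [0.0, -0.0]:
    by_value.setdefault(v, []).append(v)
by_bits = group_by_bits([0.0, -0.0])
```

Here `by_value` has one group and `by_bits` has two, which is the kind of difference that can surface in comparisons and aggregations when two engines normalize these values differently.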

String and binary

| Type | Status | Notes |
| --- | --- | --- |
| StringType | | Default UTF-8 binary collation is supported. Non-default collations (Spark 4.0+) fall back (#2190). |
| BinaryType | | |
| CharType | | Spark normalizes CHAR(n) to StringType for evaluation; same caveats apply. |
| VarcharType | | Spark normalizes VARCHAR(n) to StringType for evaluation; same caveats apply. |

Boolean

| Type | Status | Notes |
| --- | --- | --- |
| BooleanType | | |

Datetime

| Type | Status | Notes |
| --- | --- | --- |
| DateType | | |
| TimestampType | | |
| TimestampNTZType | | |
| TimeType | ⚠️ | Spark 4.1+. Native serialization is in place; some operators (sort, shuffle, min/max) are still being wired up (#4288). |

Interval

Interval types fall back to Spark today. Native acceleration is tracked by #4540.

| Type | Status | Notes |
| --- | --- | --- |
| YearMonthIntervalType | 🔜 | Tracked by #4540. |
| DayTimeIntervalType | 🔜 | Tracked by #4540. |
| CalendarIntervalType | 🔜 | Tracked by #4540. |

Complex

| Type | Status | Notes |
| --- | --- | --- |
| StructType | | Empty structs (no fields) fall back. |
| ArrayType | | |
| MapType | | Hash aggregate group keys cannot contain a MapType (transitively): Arrow’s row format used by DataFusion’s grouped hash aggregate does not support Map, so such groupings fall back. |
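
The "transitively" qualifier on the MapType restriction means a group key is affected even when the map is nested inside an array or struct. The sketch below illustrates that check with a tiny stand-in type model; the classes (`Atomic`, `Array`, `Map`, `Struct`) and functions (`contains_map`, `is_empty_struct`) are hypothetical and are not Comet or Spark APIs. It assumes only what the notes above state: a group key containing a map anywhere in its type falls back, and a struct with no fields falls back.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Atomic:
    name: str  # e.g. "IntegerType"; illustrative label only


@dataclass
class Array:
    element: "DataType"


@dataclass
class Map:
    key: "DataType"
    value: "DataType"


@dataclass
class Struct:
    fields: List["DataType"] = field(default_factory=list)


DataType = Union[Atomic, Array, Map, Struct]


def contains_map(t: DataType) -> bool:
    """Return True if t is a Map or has a Map nested anywhere inside it."""
    if isinstance(t, Map):
        return True
    if isinstance(t, Array):
        return contains_map(t.element)
    if isinstance(t, Struct):
        return any(contains_map(f) for f in t.fields)
    return False  # atomic types never contain a map


def is_empty_struct(t: DataType) -> bool:
    """Return True for a struct with no fields."""
    return isinstance(t, Struct) and not t.fields
```

For example, `contains_map(Struct([Atomic("IntegerType"), Array(Map(Atomic("StringType"), Atomic("LongType")))]))` is true because the map sits two levels down, so under the stated rule a grouping on that column would fall back.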

Variant

| Type | Status | Notes |
| --- | --- | --- |
| VariantType | 🔜 | Spark 4.0+. Native scan support is tracked by #4295; shredded Parquet read/write by #3983. |
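
When reviewing a schema against this page, it can help to collect the types that are listed as planned or not planned into one lookup. The sketch below is a hypothetical helper, not a Comet API: the dictionary holds only the types and tracking references stated on this page, and it assumes that a 🔜 status means the type is not yet executed natively.

```python
# Types this page lists as planned or not planned, with the reference it gives.
# Assumption: a "Planned" status means the type is not yet native.
NOT_YET_NATIVE = {
    "YearMonthIntervalType": "#4540",
    "DayTimeIntervalType": "#4540",
    "CalendarIntervalType": "#4540",
    "VariantType": "#4295, #3983",
    "UserDefinedType": "not currently planned",
}


def fallback_report(type_names):
    """Map each listed type name in a schema to its tracking reference."""
    return {
        name: NOT_YET_NATIVE[name]
        for name in type_names
        if name in NOT_YET_NATIVE
    }


report = fallback_report(["IntegerType", "VariantType", "DayTimeIntervalType"])
```

Types absent from the lookup are simply not reported; that absence says nothing about their status, which should be read from the tables.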

Other

| Type | Status | Notes |
| --- | --- | --- |
| NullType | | |

See also