Spark Data Type Support

This page is the complete reference for how Apache Comet handles each Spark data type. Comet’s native execution path is built on Apache Arrow, so the set of types Comet can express natively is constrained by Arrow’s type system. When a query references a type Comet does not support, the operator that touches that type falls back to Spark, so query results are unaffected.

For per-scan and per-operator type caveats (for example, Parquet read-time conversions or hash-aggregate group-key restrictions), see the Compatibility Guide.

Status legend

Status	Meaning
✅ Supported	Native support; enabled by default.
⚠️ Supported (caveats)	Works, but with limits: certain values, contexts, or configurations fall back to Spark.
🔜 Planned	Intended; tracked by an open issue or pull request.

Not currently planned

The following types fall back to Spark and are not on the current roadmap. They are omitted from the tables below and may be reconsidered based on demand:

UserDefinedType: user-defined types are application-specific and outside the scope of native acceleration; queries referencing UDTs fall back to Spark.

Numeric

Type	Status	Notes
`ByteType`	✅
`ShortType`	✅
`IntegerType`	✅
`LongType`	✅
`FloatType`	✅	NaN and signed-zero handling can diverge from Spark in comparisons and aggregations. See Floating-point Compatibility.
`DoubleType`	✅	NaN and signed-zero handling can diverge from Spark in comparisons and aggregations. See Floating-point Compatibility.
`DecimalType`	✅
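
The NaN and signed-zero notes above are easier to follow with concrete values. The following is a minimal, self-contained Python sketch (no Spark or Comet involved) that contrasts plain IEEE 754 behavior with the grouping semantics Spark documents for floating-point keys, where all NaN values form one group and `-0.0` is treated as `0.0`. The helper names `bits` and `spark_style_key` are hypothetical and exist only for this illustration; the sketch does not describe how Comet itself behaves.

```python
import math
import struct


def bits(x: float) -> str:
    """Return the 64-bit IEEE 754 pattern of a double as a hex string."""
    return struct.pack(">d", x).hex()


nan = float("nan")

# Plain IEEE 754 comparison: NaN is not equal to itself.
ieee_nan_equal = (nan == nan)  # False

# 0.0 and -0.0 compare equal, yet their bit patterns differ, so an engine
# that hashes or compares raw bits sees two distinct values.
zero_equal = (0.0 == -0.0)  # True
zero_bits = (bits(0.0), bits(-0.0))


def spark_style_key(x: float) -> float:
    """Hypothetical helper: normalize a double for use as a grouping key.

    Assumption for illustration: every NaN collapses to one canonical NaN
    and -0.0 collapses to 0.0, mirroring Spark's documented semantics.
    """
    if math.isnan(x):
        return float("nan")
    return 0.0 if x == 0.0 else x


values = [0.0, -0.0, nan, float("nan"), 1.5]

# Grouping on normalized bit patterns yields three groups:
# zero, NaN, and 1.5.
normalized_groups = {bits(spark_style_key(v)) for v in values}
```

An engine that compares raw bit patterns, or that applies IEEE 754 equality to NaN, can produce different groups or comparison results than Spark unless it applies a normalization of this kind, which is the class of divergence the notes refer to.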

String and binary

Type	Status	Notes
`StringType`	✅	Default UTF-8 binary collation is supported. Non-default collations (Spark 4.0+) fall back (#2190).
`BinaryType`	✅
`CharType`	✅	Spark normalizes `CHAR(n)` to `StringType` for evaluation, so the `StringType` caveats apply.
`VarcharType`	✅	Spark normalizes `VARCHAR(n)` to `StringType` for evaluation, so the `StringType` caveats apply.

Boolean

Type	Status	Notes
`BooleanType`	✅

Datetime

Type	Status	Notes
`DateType`	✅
`TimestampType`	✅
`TimestampNTZType`	✅
`TimeType`	⚠️	Spark 4.1+. Native serialization is in place; some operators (sort, shuffle, min/max) are still being wired up (#4288).

Interval

Interval types fall back to Spark today. Native acceleration is tracked by #4540.

Type	Status	Notes
`YearMonthIntervalType`	🔜	Tracked by #4540.
`DayTimeIntervalType`	🔜	Tracked by #4540.
`CalendarIntervalType`	🔜	Tracked by #4540.

Complex

Type	Status	Notes
`StructType`	✅	Empty structs (no fields) fall back.
`ArrayType`	✅
`MapType`	✅	Hash aggregate group keys cannot contain a `MapType` at any nesting depth: the Arrow row format used by DataFusion’s grouped hash aggregate does not support `Map`, so such groupings fall back.
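
The `MapType` restriction applies to maps nested anywhere inside a group key, not only to top-level map columns. The check can be sketched as a recursive walk over the type tree. This is a minimal illustration in plain Python: the `DataType` class and the `contains_map` function are hypothetical stand-ins, not Comet's or Spark's actual classes, and Comet's real check may be structured differently.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataType:
    """Hypothetical, simplified model of a Spark data type."""
    name: str
    # Child types: the element type for arrays, field types for structs,
    # key and value types for maps. Empty for primitive types.
    children: Tuple["DataType", ...] = ()


def contains_map(dt: DataType) -> bool:
    """Return True if dt is a MapType or has one at any nesting depth."""
    if dt.name == "MapType":
        return True
    return any(contains_map(child) for child in dt.children)


int_t = DataType("IntegerType")
str_t = DataType("StringType")
map_t = DataType("MapType", (str_t, int_t))

# array<struct<int, map<string, int>>>: the map is two levels down.
nested = DataType("ArrayType", (DataType("StructType", (int_t, map_t)),))

# array<int>: no map anywhere in the tree.
plain = DataType("ArrayType", (int_t,))
```

Under this model, `contains_map(nested)` is true even though the top-level type is an array, so a grouping on such a column would be expected to fall back, while `contains_map(plain)` is false.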

Variant

Type	Status	Notes
`VariantType`	🔜	Spark 4.0+. Native scan support is tracked by #4295; shredded Parquet read/write by #3983.

Other

Type	Status	Notes
`NullType`	✅
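
For tooling that needs to consult this reference programmatically, the tables above can be condensed into a lookup. The sketch below is plain Python that only restates the statuses listed on this page. The names `STATUS`, `status_of`, and `types_with_status` are hypothetical, and the mapping is not an API exposed by Comet.

```python
SUPPORTED = "supported"
CAVEATS = "supported_with_caveats"
PLANNED = "planned"
NOT_PLANNED = "not_currently_planned"

# Status per Spark type, transcribed from the tables on this page.
STATUS = {
    "ByteType": SUPPORTED,
    "ShortType": SUPPORTED,
    "IntegerType": SUPPORTED,
    "LongType": SUPPORTED,
    "FloatType": SUPPORTED,
    "DoubleType": SUPPORTED,
    "DecimalType": SUPPORTED,
    "StringType": SUPPORTED,
    "BinaryType": SUPPORTED,
    "CharType": SUPPORTED,
    "VarcharType": SUPPORTED,
    "BooleanType": SUPPORTED,
    "DateType": SUPPORTED,
    "TimestampType": SUPPORTED,
    "TimestampNTZType": SUPPORTED,
    "TimeType": CAVEATS,
    "YearMonthIntervalType": PLANNED,
    "DayTimeIntervalType": PLANNED,
    "CalendarIntervalType": PLANNED,
    "StructType": SUPPORTED,
    "ArrayType": SUPPORTED,
    "MapType": SUPPORTED,
    "VariantType": PLANNED,
    "NullType": SUPPORTED,
    "UserDefinedType": NOT_PLANNED,
}


def status_of(type_name: str) -> str:
    """Look up the documented status; raises KeyError for unknown names."""
    return STATUS[type_name]


def types_with_status(status: str) -> list:
    """Return the type names carrying the given status, sorted by name."""
    return sorted(name for name, s in STATUS.items() if s == status)
```

A status of `SUPPORTED` here does not capture the per-type notes (for example, collations on `StringType` or map-typed group keys), so the Notes column remains the authority for caveats.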