Compatibility Guide

Comet aims to produce results consistent with the version of Apache Spark in use.

This guide documents the areas of functionality where Comet is known to differ from Spark.

ANSI mode

Comet currently ignores ANSI mode in most cases and can therefore produce results that differ from Spark's. By default, Comet falls back to Spark when ANSI mode is enabled. To let Comet accelerate queries while ANSI mode is enabled, set spark.comet.ansi.enabled=true in the Spark configuration. Comet’s ANSI support is experimental and should not be used in production.
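As a minimal sketch of the configuration described above, the following Python snippet collects the relevant settings and renders them as --conf arguments for spark-submit. The helper name render_conf_flags is hypothetical and not part of Spark or Comet; spark.sql.ansi.enabled is Spark's own ANSI switch, and spark.comet.ansi.enabled is the Comet setting named above. The sketch assumes Comet itself is already installed and enabled.

```python
# Settings for running Comet with ANSI mode on (experimental, not for production).
ansi_conf = {
    "spark.sql.ansi.enabled": "true",    # Spark's ANSI mode switch
    "spark.comet.ansi.enabled": "true",  # opt in to Comet acceleration under ANSI mode
}


def render_conf_flags(conf):
    """Render a dict of Spark settings as spark-submit --conf arguments.

    Hypothetical helper for illustration; not part of Spark or Comet.
    """
    flags = []
    for key, value in sorted(conf.items()):
        flags.extend(["--conf", f"{key}={value}"])
    return flags


print(" ".join(render_conf_flags(ansi_conf)))
```

The same key and value pairs can be supplied through any other mechanism Spark offers for configuration, such as spark-defaults.conf.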

The work to fully implement ANSI support is being tracked in an epic.

Cast

Cast operations in Comet fall into three levels of support:

  • Compatible: The results match Apache Spark.

  • Incompatible: The results may match Apache Spark for some inputs, but there are known issues where some inputs produce incorrect results or exceptions. The query stage falls back to Spark by default. Setting spark.comet.cast.allowIncompatible=true allows all incompatible casts to run natively in Comet, but this is not recommended for production use.

  • Unsupported: Comet does not provide a native version of this cast expression and the query stage will fall back to Spark.
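The three levels above amount to a small decision rule, sketched below in Python. This models only the behavior described in this guide, not Comet's actual implementation: the names CastSupport, KNOWN_CASTS, support_level, and runs_natively are hypothetical, the allow_incompatible parameter stands in for spark.comet.cast.allowIncompatible, and the table holds just a few pairs copied from this guide for illustration.

```python
from enum import Enum


class CastSupport(Enum):
    COMPATIBLE = "compatible"
    INCOMPATIBLE = "incompatible"
    UNSUPPORTED = "unsupported"


# A few (from_type, to_type) pairs taken from this guide; not the full list.
KNOWN_CASTS = {
    ("boolean", "string"): CastSupport.COMPATIBLE,
    ("float", "string"): CastSupport.COMPATIBLE,
    ("integer", "decimal"): CastSupport.INCOMPATIBLE,
    ("string", "timestamp"): CastSupport.INCOMPATIBLE,
}


def support_level(from_type, to_type):
    """Look up the support level; any cast not listed counts as unsupported."""
    return KNOWN_CASTS.get((from_type, to_type), CastSupport.UNSUPPORTED)


def runs_natively(from_type, to_type, allow_incompatible=False):
    """Return True if the cast would run in Comet, False if it falls back to Spark."""
    level = support_level(from_type, to_type)
    if level is CastSupport.COMPATIBLE:
        return True
    if level is CastSupport.INCOMPATIBLE:
        # Incompatible casts run natively only when explicitly allowed.
        return allow_incompatible
    return False


print(runs_natively("boolean", "string"))
print(runs_natively("integer", "decimal"))
print(runs_natively("integer", "decimal", allow_incompatible=True))
```

With the default setting, only the first call reports native execution; the incompatible cast runs natively only once the flag is set.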

Compatible Casts

The following cast operations are generally compatible with Spark except for the differences noted here.

From Type    To Type      Notes
---------    ---------    -----
boolean      byte
boolean      short
boolean      integer
boolean      long
boolean      float
boolean      double
boolean      string
byte         boolean
byte         short
byte         integer
byte         long
byte         float
byte         double
byte         decimal
byte         string
short        boolean
short        byte
short        integer
short        long
short        float
short        double
short        decimal
short        string
integer      boolean
integer      byte
integer      short
integer      long
integer      float
integer      double
integer      string
long         boolean
long         byte
long         short
long         integer
long         float
long         double
long         string
float        boolean
float        byte
float        short
float        integer
float        long
float        double
float        decimal
float        string       There can be differences in precision. For example, the input “1.4E-45” will produce 1.0E-45 instead of 1.4E-45
double       boolean
double       byte
double       short
double       integer
double       long
double       float
double       decimal
double       string       There can be differences in precision. For example, the input “1.4E-45” will produce 1.0E-45 instead of 1.4E-45
decimal      byte
decimal      short
decimal      integer
decimal      long
decimal      float
decimal      double
string       boolean
string       byte
string       short
string       integer
string       long
string       binary
string       date         Only supports years between 262143 BC and 262142 AD
date         string
timestamp    long
timestamp    decimal
timestamp    string
timestamp    date
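The precision note on the float-to-string cast can be made concrete with a short check. The sketch below uses Python's struct module to show that the decimal literals 1.4E-45 and 1.0E-45 both round to the same 32-bit float, the smallest positive subnormal value (2^-149). This is an observation about IEEE 754 single precision, not a description of Comet's or Spark's formatting code; it suggests that the two outputs may differ in how many digits are printed rather than in the underlying value. The helper name float32_bits is hypothetical.

```python
import struct


def float32_bits(value):
    """Round a Python float to IEEE 754 single precision and return its bit pattern."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return bits


# Both literals round to the smallest positive subnormal float32 (bit pattern 0x1).
print(hex(float32_bits(1.4e-45)))  # 0x1
print(hex(float32_bits(1.0e-45)))  # 0x1
print(float32_bits(1.4e-45) == float32_bits(1.0e-45))  # True
```

If downstream code compares the string output byte for byte, this kind of difference matters even though parsing either string back to a float yields the same value.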

Incompatible Casts

The following cast operations are not compatible with Spark for all inputs and are disabled by default.

From Type    To Type      Notes
---------    ---------    -----
integer      decimal      No overflow check
long         decimal      No overflow check
string       timestamp    Not all valid formats are supported
binary       string       Only works for binary data representing valid UTF-8 strings
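Two of the notes above can be illustrated with small checks. The sketch below shows, in plain Python, what an overflow check for an integer-to-decimal cast looks like and how to test whether binary data is valid UTF-8. The helper names fits_decimal and is_valid_utf8 are hypothetical, and the code models the conditions described in the notes rather than Comet's or Spark's implementation.

```python
def fits_decimal(value, precision, scale):
    """Return True if an integer fits in decimal(precision, scale) without overflow.

    decimal(p, s) leaves p - s digits for the integer part, so the magnitude
    must be below 10 ** (p - s).
    """
    if not 0 <= scale <= precision:
        raise ValueError("require 0 <= scale <= precision")
    return abs(value) < 10 ** (precision - scale)


def is_valid_utf8(data):
    """Return True if the bytes decode as UTF-8 under strict error handling."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(fits_decimal(12345, 10, 2))  # True: 8 integer digits are available
print(fits_decimal(12345, 6, 2))   # False: only 4 integer digits are available
print(is_valid_utf8("héllo".encode("utf-8")))  # True
print(is_valid_utf8(b"\xff\xfe"))  # False: the byte 0xFF never appears in UTF-8
```

Inputs for which these checks fail are the ones most likely to expose the differences noted in the table, so they make reasonable test cases before setting spark.comet.cast.allowIncompatible=true.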

Unsupported Casts

Any cast not listed in the previous tables is currently unsupported. We are working on adding more. See the tracking issue for more details.