Compatibility Guide¶
Comet aims to provide consistent results with the version of Apache Spark that is being used.
This guide offers information about areas of functionality where there are known differences.
ANSI mode¶
Comet currently ignores ANSI mode in most cases, and therefore can produce different results than Spark. By default,
Comet will fall back to Spark if ANSI mode is enabled. To enable Comet to accelerate queries when ANSI mode is enabled,
specify spark.comet.ansi.enabled=true
in the Spark configuration. Comet’s ANSI support is experimental and should not
be used in production.
There is an epic where we are tracking the work to fully implement ANSI support.
Regular Expressions¶
Comet uses the Rust regexp crate for evaluating regular expressions, and this has different behavior from Java’s
regular expression engine. Comet will fall back to Spark for patterns that are known to produce different results, but
this can be overridden by setting spark.comet.regexp.allowIncompatible=true
.
Floating number comparison¶
Spark normalizes NaN and zero for floating point numbers for several cases. See NormalizeFloatingNumbers
optimization rule in Spark.
However, one exception is comparison. Spark does not normalize NaN and zero when comparing values
because they are handled well in Spark (e.g., SQLOrderingUtil.compareFloats
). But the comparison
functions of arrow-rs used by DataFusion do not normalize NaN and zero (e.g., arrow::compute::kernels::cmp::eq).
So Comet will add additional normalization expression of NaN and zero for comparison.
Cast¶
Cast operations in Comet fall into three levels of support:
Compatible: The results match Apache Spark
Incompatible: The results may match Apache Spark for some inputs, but there are known issues where some inputs will result in incorrect results or exceptions. The query stage will fall back to Spark by default. Setting
spark.comet.cast.allowIncompatible=true
will allow all incompatible casts to run natively in Comet, but this is not recommended for production use.Unsupported: Comet does not provide a native version of this cast expression and the query stage will fall back to Spark.
Compatible Casts¶
The following cast operations are generally compatible with Spark except for the differences noted here.
From Type |
To Type |
Notes |
---|---|---|
boolean |
byte |
|
boolean |
short |
|
boolean |
integer |
|
boolean |
long |
|
boolean |
float |
|
boolean |
double |
|
boolean |
string |
|
byte |
boolean |
|
byte |
short |
|
byte |
integer |
|
byte |
long |
|
byte |
float |
|
byte |
double |
|
byte |
decimal |
|
byte |
string |
|
short |
boolean |
|
short |
byte |
|
short |
integer |
|
short |
long |
|
short |
float |
|
short |
double |
|
short |
decimal |
|
short |
string |
|
integer |
boolean |
|
integer |
byte |
|
integer |
short |
|
integer |
long |
|
integer |
float |
|
integer |
double |
|
integer |
string |
|
long |
boolean |
|
long |
byte |
|
long |
short |
|
long |
integer |
|
long |
float |
|
long |
double |
|
long |
string |
|
float |
boolean |
|
float |
byte |
|
float |
short |
|
float |
integer |
|
float |
long |
|
float |
double |
|
float |
decimal |
|
float |
string |
There can be differences in precision. For example, the input “1.4E-45” will produce 1.0E-45 instead of 1.4E-45 |
double |
boolean |
|
double |
byte |
|
double |
short |
|
double |
integer |
|
double |
long |
|
double |
float |
|
double |
decimal |
|
double |
string |
There can be differences in precision. For example, the input “1.4E-45” will produce 1.0E-45 instead of 1.4E-45 |
decimal |
byte |
|
decimal |
short |
|
decimal |
integer |
|
decimal |
long |
|
decimal |
float |
|
decimal |
double |
|
decimal |
string |
There can be formatting differences in some case due to Spark using scientific notation where Comet does not |
string |
boolean |
|
string |
byte |
|
string |
short |
|
string |
integer |
|
string |
long |
|
string |
binary |
|
string |
date |
Only supports years between 262143 BC and 262142 AD |
date |
string |
|
timestamp |
long |
|
timestamp |
decimal |
|
timestamp |
string |
|
timestamp |
date |
Incompatible Casts¶
The following cast operations are not compatible with Spark for all inputs and are disabled by default.
From Type |
To Type |
Notes |
---|---|---|
integer |
decimal |
No overflow check |
long |
decimal |
No overflow check |
string |
float |
Does not support inputs ending with ‘d’ or ‘f’. Does not support ‘inf’. Does not support ANSI mode. |
string |
double |
Does not support inputs ending with ‘d’ or ‘f’. Does not support ‘inf’. Does not support ANSI mode. |
string |
decimal |
Does not support inputs ending with ‘d’ or ‘f’. Does not support ‘inf’. Does not support ANSI mode. Returns 0.0 instead of null if input contains no digits |
string |
timestamp |
Not all valid formats are supported |
binary |
string |
Only works for binary data representing valid UTF-8 strings |
Unsupported Casts¶
Any cast not listed in the previous tables is currently unsupported. We are working on adding more. See the tracking issue for more details.