Compatibility Guide#

Comet aims to provide consistent results with the version of Apache Spark that is being used.

This guide offers information about areas of functionality where there are known differences.

Parquet#

Comet has the following limitations when reading Parquet files:

Comet does not support reading decimals encoded in binary format.
No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported.

ANSI Mode#

Comet will fall back to Spark for the following expressions when ANSI mode is enabled. Thes expressions can be enabled by setting spark.comet.expression.EXPRNAME.allowIncompatible=true, where EXPRNAME is the Spark expression class name. See the Comet Supported Expressions Guide for more information on this configuration setting.

Average
Sum
Cast (in some cases)

There is an epic where we are tracking the work to fully implement ANSI support.

Floating-point Number Comparison#

Spark normalizes NaN and zero for floating point numbers for several cases. See NormalizeFloatingNumbers optimization rule in Spark. However, one exception is comparison. Spark does not normalize NaN and zero when comparing values because they are handled well in Spark (e.g., SQLOrderingUtil.compareFloats). But the comparison functions of arrow-rs used by DataFusion do not normalize NaN and zero (e.g., arrow::compute::kernels::cmp::eq). So Comet adds additional normalization expression of NaN and zero for comparisons, and may still have differences to Spark in some cases, especially when the data contains both positive and negative zero. This is likely an edge case that is not of concern for many users. If it is a concern, setting spark.comet.exec.strictFloatingPoint=true will make relevant operations fall back to Spark.

Incompatible Expressions#

Expressions that are not 100% Spark-compatible will fall back to Spark by default and can be enabled by setting spark.comet.expression.EXPRNAME.allowIncompatible=true, where EXPRNAME is the Spark expression class name. See the Comet Supported Expressions Guide for more information on this configuration setting.

Regular Expressions#

Comet uses the Rust regexp crate for evaluating regular expressions, and this has different behavior from Java’s regular expression engine. Comet will fall back to Spark for patterns that are known to produce different results, but this can be overridden by setting spark.comet.regexp.allowIncompatible=true.

Window Functions#

Comet’s support for window functions is incomplete and known to be incorrect. It is disabled by default and should not be used in production. The feature will be enabled in a future release. Tracking issue: #2721.

Cast#

Cast operations in Comet fall into three levels of support:

Compatible: The results match Apache Spark
Incompatible: The results may match Apache Spark for some inputs, but there are known issues where some inputs will result in incorrect results or exceptions. The query stage will fall back to Spark by default. Setting spark.comet.expression.Cast.allowIncompatible=true will allow all incompatible casts to run natively in Comet, but this is not recommended for production use.
Unsupported: Comet does not provide a native version of this cast expression and the query stage will fall back to Spark.

Compatible Casts#

The following cast operations are generally compatible with Spark except for the differences noted here.

From Type	To Type	Notes
boolean	byte
boolean	short
boolean	integer
boolean	long
boolean	float
boolean	double
boolean	string
byte	boolean
byte	short
byte	integer
byte	long
byte	float
byte	double
byte	decimal
byte	string
short	boolean
short	byte
short	integer
short	long
short	float
short	double
short	decimal
short	string
integer	boolean
integer	byte
integer	short
integer	long
integer	float
integer	double
integer	decimal
integer	string
long	boolean
long	byte
long	short
long	integer
long	float
long	double
long	decimal
long	string
float	boolean
float	byte
float	short
float	integer
float	long
float	double
float	string	There can be differences in precision. For example, the input “1.4E-45” will produce 1.0E-45 instead of 1.4E-45
double	boolean
double	byte
double	short
double	integer
double	long
double	float
double	string	There can be differences in precision. For example, the input “1.4E-45” will produce 1.0E-45 instead of 1.4E-45
decimal	boolean
decimal	byte
decimal	short
decimal	integer
decimal	long
decimal	float
decimal	double
decimal	decimal
decimal	string	There can be formatting differences in some case due to Spark using scientific notation where Comet does not
string	boolean
string	byte
string	short
string	integer
string	long
string	binary
string	date	Only supports years between 262143 BC and 262142 AD
binary	string
date	string
timestamp	long
timestamp	string
timestamp	date

Incompatible Casts#

The following cast operations are not compatible with Spark for all inputs and are disabled by default.

From Type	To Type	Notes
float	decimal	There can be rounding differences
double	decimal	There can be rounding differences
string	float	Does not support inputs ending with ‘d’ or ‘f’. Does not support ‘inf’. Does not support ANSI mode.
string	double	Does not support inputs ending with ‘d’ or ‘f’. Does not support ‘inf’. Does not support ANSI mode.
string	decimal	Does not support inputs ending with ‘d’ or ‘f’. Does not support ‘inf’. Does not support ANSI mode. Returns 0.0 instead of null if input contains no digits
string	timestamp	Not all valid formats are supported

Unsupported Casts#

Any cast not listed in the previous tables is currently unsupported. We are working on adding more. See the tracking issue for more details.