String Expressions#
BitLength#
The following cases are not supported by Comet:
BinaryType input is not supported
Concat#
By default, Comet runs a Spark-compatible implementation of Concat. Set spark.comet.expression.Concat.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:
concat does not support non-UTF8_BINARY collations (https://github.com/apache/datafusion-comet/issues/2190)
The following cases are not supported by Comet:
CONCAT supports only string input parameters
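The opt-in settings for most expressions on this page share one key pattern, spark.comet.expression.<Name>.allowIncompatible. A minimal sketch of building the matching --conf arguments for spark-submit follows; the helper names are hypothetical and not part of Comet. Lower and Upper use spark.comet.caseConversion.enabled instead, so they are not covered by this helper.

```python
def allow_incompatible_key(expression: str) -> str:
    """Build the opt-in key for one expression (hypothetical helper)."""
    return f"spark.comet.expression.{expression}.allowIncompatible"


def spark_submit_conf(expressions):
    """Return --conf arguments that enable the native implementations."""
    args = []
    for name in expressions:
        args += ["--conf", f"{allow_incompatible_key(name)}=true"]
    return args


# Example: opt in to the native Concat and InitCap implementations.
conf_args = spark_submit_conf(["Concat", "InitCap"])
print(" ".join(conf_args))
```

Enabling a key accepts the differences listed for that expression, so it is worth opting in one expression at a time.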
GetJsonObject#
By default, Comet runs a Spark-compatible implementation of GetJsonObject. Set spark.comet.expression.GetJsonObject.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:
Spark allows single-quoted JSON and unescaped control characters which Comet does not support
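Before opting in, it may help to check whether the JSON documents in a column rely on the lenient syntax mentioned above. The sketch below uses Python's standard json module as a stand-in for a strict parser; it is not Comet's parser, and the two may disagree on other edge cases (for example, Python accepts NaN literals by default).

```python
import json


def is_strict_json(doc: str) -> bool:
    """Return True if Python's strict JSON parser accepts the document."""
    try:
        json.loads(doc)  # strict mode rejects unescaped control characters
        return True
    except json.JSONDecodeError:
        return False


samples = [
    '{"a": 1}',        # standard JSON
    "{'a': 1}",        # single-quoted keys
    '{"a": "x\ty"}',   # raw tab character inside a string value
]
results = [is_strict_json(s) for s in samples]
```

Documents that fail this check are candidates for producing different results under the native implementation.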
InitCap#
By default, Comet runs a Spark-compatible implementation of InitCap. Set spark.comet.expression.InitCap.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:
Treats hyphen as a word separator (e.g. robert rose-smith produces Robert Rose-Smith instead of Spark’s Robert Rose-smith) (https://github.com/apache/datafusion-comet/issues/1052)
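The hyphen difference can be modeled in a few lines of Python. This is a simplified sketch, not either engine's implementation: it assumes a space is the only separator on the Spark side and that the native side adds the hyphen, which is enough to reproduce the example above.

```python
def initcap(text: str, separators: str) -> str:
    """Uppercase the first letter after each separator, lowercase the rest."""
    out = []
    start_of_word = True
    for ch in text:
        if ch in separators:
            out.append(ch)
            start_of_word = True
        else:
            out.append(ch.upper() if start_of_word else ch.lower())
            start_of_word = False
    return "".join(out)


spark_like = initcap("robert rose-smith", " ")    # hyphen is part of the word
native_like = initcap("robert rose-smith", " -")  # hyphen starts a new word
```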
Left#
The following cases are not supported by Comet:
Only supports BinaryType and StringType input
The length argument must be a literal value
Length#
The following cases are not supported by Comet:
BinaryType input is not supported
Lower#
By default, Comet runs a Spark-compatible implementation of Lower. Set spark.comet.caseConversion.enabled=true to use Comet’s faster native implementation instead, which has the following differences from Spark:
Results can vary depending on locale and character set
OctetLength#
The following cases are not supported by Comet:
BinaryType input is not supported
RLike#
By default, Comet runs a Spark-compatible implementation of RLike. Set spark.comet.expression.RLike.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:
Uses the Rust regexp engine, which behaves differently from the Java regexp engine
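One way to gauge the risk before opting in is to scan patterns for constructs that Java supports but a Rust engine may not. The heuristic below assumes the Rust engine is the regex crate, which does not support look-around or backreferences; the function name and the list of constructs are illustrative and not exhaustive, and an escaped backslash followed by a digit would be flagged as a false positive.

```python
import re

# Constructs accepted by java.util.regex but not by Rust's regex crate.
_RISKY = {
    "look-ahead": re.compile(r"\(\?[=!]"),
    "look-behind": re.compile(r"\(\?<[=!]"),
    "backreference": re.compile(r"\\[1-9]"),
}


def risky_constructs(pattern: str):
    """Return the names of risky constructs found in a regexp pattern."""
    return sorted(name for name, rx in _RISKY.items() if rx.search(pattern))


flagged = risky_constructs(r"(\w+)\s\1")  # repeated-word pattern
clean = risky_constructs(r"^abc.*$")      # plain anchors and wildcards
```

An empty result does not guarantee identical behavior; it only means none of the listed constructs were found.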
RegExpReplace#
By default, Comet runs a Spark-compatible implementation of RegExpReplace. Set spark.comet.expression.RegExpReplace.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:
Regexp patterns may not be compatible with Spark
Reverse#
By default, Comet runs a Spark-compatible implementation of Reverse. Set spark.comet.expression.Reverse.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:
native reverse does not support arrays whose element type contains binary, struct, or map
reverse does not support non-UTF8_BINARY collations (https://github.com/apache/datafusion-comet/issues/2190)
Right#
The following cases are not supported by Comet:
Only supports StringType input
StringLPad#
The following cases are not supported by Comet:
Scalar values are not supported for the str argument.
Only scalar values are supported for the pad argument.
StringRPad#
The following cases are not supported by Comet:
Scalar values are not supported for the str argument.
Only scalar values are supported for the pad argument.
StringRepeat#
The following differences from Spark are always present and do not require any additional configuration:
A negative argument for the number of times to repeat throws an exception instead of returning an empty string as Spark does
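The two behaviors described above can be modeled as follows. Both functions are illustrative stand-ins written for this page; the exception type that Comet raises is not stated here, so ValueError is a placeholder.

```python
def spark_like_repeat(text: str, times: int) -> str:
    """Model of Spark: a negative count yields an empty string."""
    return text * max(times, 0)


def comet_like_repeat(text: str, times: int) -> str:
    """Model of Comet: a negative count is rejected with an error."""
    if times < 0:
        raise ValueError("repeat count must not be negative")  # placeholder type
    return text * times


empty = spark_like_repeat("ab", -1)  # empty string under the Spark model
```

Queries that can pass a negative count would need the count clamped to zero beforehand to get the same result from both.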
StringReplace#
By default, Comet runs a Spark-compatible implementation of StringReplace. Set spark.comet.expression.StringReplace.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:
Produces different results from Spark when the search string is empty
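To see why an empty search string is an edge case, compare two plausible interpretations. The sketch assumes Spark returns the input unchanged when the search string is empty; the second function shows what a straightforward substring replace does in Python. Which result the native implementation produces is not stated here, so treat both as illustrations only.

```python
def unchanged_on_empty_search(text: str, search: str, replacement: str) -> str:
    """Assumed Spark-style behavior: an empty search string is a no-op."""
    if search == "":
        return text
    return text.replace(search, replacement)


def plain_replace(text: str, search: str, replacement: str) -> str:
    """Python's str.replace: an empty search matches at every position."""
    return text.replace(search, replacement)


no_op = unchanged_on_empty_search("abc", "", "-")
everywhere = plain_replace("abc", "", "-")
```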
StringSplit#
By default, Comet runs a Spark-compatible implementation of StringSplit. Set spark.comet.expression.StringSplit.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:
Regex engine differences between Java and Rust
StringTranslate#
The following incompatibilities cause StringTranslate to fall back to Spark by default. Set spark.comet.expression.StringTranslate.allowIncompatible=true to enable Comet acceleration despite these differences.
DataFusion’s translate iterates over Unicode graphemes (Spark uses code points) and substitutes U+0000 instead of treating it as a deletion sentinel
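The code point versus grapheme distinction matters for text with combining marks. The sketch below is a rough model, not either engine's code: grapheme clusters are approximated as a base character plus any following combining marks, and the first occurrence of a repeated source unit wins. The U+0000 difference is not modeled.

```python
import unicodedata


def simple_graphemes(text):
    """Approximate grapheme clusters: base character plus combining marks."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def translate_units(units, src, dst):
    """Map each unit in src to the unit at the same index in dst.

    Units in src with no counterpart in dst are deleted.
    """
    table = {}
    for i, unit in enumerate(src):
        table.setdefault(unit, dst[i] if i < len(dst) else "")
    return "".join(table.get(unit, unit) for unit in units)


def translate_code_points(text, src, dst):
    return translate_units(list(text), list(src), list(dst))


def translate_graphemes(text, src, dst):
    return translate_units(
        simple_graphemes(text), simple_graphemes(src), simple_graphemes(dst)
    )


word = "cafe\u0301"  # "e" followed by a combining acute accent
by_code_point = translate_code_points(word, "e", "E")  # replaces the bare "e"
by_grapheme = translate_graphemes(word, "e", "E")      # accented cluster differs from "e"
```

In the code point model the base letter is replaced and the accent stays attached to the new letter; in the grapheme model the accented cluster does not equal the plain "e", so the input is returned unchanged.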
Upper#
By default, Comet runs a Spark-compatible implementation of Upper. Set spark.comet.caseConversion.enabled=true to use Comet’s faster native implementation instead, which has the following differences from Spark:
Results can vary depending on locale and character set