String Expressions

Concat

The following incompatibilities cause Concat to fall back to Spark by default. Set spark.comet.expression.Concat.allowIncompatible=true to enable Comet acceleration despite these differences.

  • concat does not support non-UTF8_BINARY collations (https://github.com/apache/datafusion-comet/issues/2190)

The following cases are not supported by Comet:

  • Non-string input parameters (CONCAT supports only string inputs)

Left

The following cases are not supported by Comet:

  • Input types other than BinaryType and StringType

  • A length argument that is not a literal value

Length

The following cases are not supported by Comet:

  • BinaryType input is not supported

RLike

The following incompatibilities cause RLike to fall back to Spark by default. Set spark.comet.expression.RLike.allowIncompatible=true to enable Comet acceleration despite these differences.

  • Comet uses the Rust regexp engine, which behaves differently from the Java regexp engine used by Spark
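The engine difference can be made concrete with a rough pre-check. The sketch below is plain Python, not part of Comet or Spark, and it assumes the Rust engine is the `regex` crate, whose documentation states that it does not support look-around or backreferences. The function name `java_only_constructs` is hypothetical, and the scan is a heuristic: it covers only these two feature families and does not detect every behavioral difference between the engines.

```python
# Heuristic scan for Java regex constructs that the Rust `regex` crate
# does not support (look-around and backreferences). Illustrative only.

BACKREF_DIGITS = frozenset("123456789")


def java_only_constructs(pattern: str) -> list[str]:
    """Return the names of look-around/backreference constructs found."""
    found = []
    in_class = False  # True while scanning inside [...]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # An escape consumes the next character as well.
            nxt = pattern[i + 1 : i + 2]
            if not in_class:
                if nxt in BACKREF_DIGITS:
                    found.append("backreference")
                elif pattern.startswith("k<", i + 1):
                    found.append("named backreference")
            i += 2
            continue
        if in_class:
            # Stay inside the character class until the closing bracket.
            in_class = ch != "]"
        elif ch == "[":
            in_class = True
        elif pattern.startswith(("(?=", "(?!"), i):
            found.append("lookahead")
        elif pattern.startswith(("(?<=", "(?<!"), i):
            found.append("lookbehind")
        i += 1
    return found
```

A pattern for which this returns a non-empty list is a reasonable candidate to leave on the Spark fallback path. An empty list does not guarantee identical results in both engines.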

RegExpReplace

The following incompatibilities cause RegExpReplace to fall back to Spark by default. Set spark.comet.expression.RegExpReplace.allowIncompatible=true to enable Comet acceleration despite these differences.

  • The regexp pattern may not behave the same way in Comet as it does in Spark

The following cases are not supported by Comet:

  • regexp_replace with an offset other than 1 (only an offset of 1, that is, no offset, is supported)

Reverse

The following incompatibilities cause Reverse to fall back to Spark by default. Set spark.comet.expression.Reverse.allowIncompatible=true to enable Comet acceleration despite these differences.

  • reverse on an array containing binary values is not supported

  • reverse does not support non-UTF8_BINARY collations (https://github.com/apache/datafusion-comet/issues/2190)

StringLPad

The following cases are not supported by Comet:

  • A scalar value for the str argument is not supported. The pad argument, by contrast, must be a scalar value.

StringRPad

The following cases are not supported by Comet:

  • A scalar value for the str argument is not supported. The pad argument, by contrast, must be a scalar value.

StringRepeat

The following differences from Spark are always present and do not require any additional configuration:

  • A negative repeat count throws an exception in Comet, whereas Spark returns an empty string
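The difference can be modeled in a few lines of plain Python. This is a sketch of the behavior described above, not Comet or Spark code: the function names are hypothetical, and `ValueError` is only a stand-in for whatever exception Comet actually raises.

```python
def spark_repeat(s: str, n: int) -> str:
    """Model of Spark: a negative count yields an empty string."""
    return s * max(n, 0)


def comet_repeat(s: str, n: int) -> str:
    """Model of Comet: a negative count raises instead."""
    if n < 0:
        # Placeholder exception type; the real error is not specified here.
        raise ValueError(f"negative repeat count: {n}")
    return s * n


def clamp_count(n: int) -> int:
    """Clamp the count to zero so both models agree."""
    return max(n, 0)
```

With the count clamped, `comet_repeat("ab", clamp_count(-1))` and `spark_repeat("ab", -1)` both give an empty string. In a query, one possible equivalent is to wrap the count in `greatest(n, 0)` before passing it to `repeat`.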

StringSplit

The following incompatibilities cause StringSplit to fall back to Spark by default. Set spark.comet.expression.StringSplit.allowIncompatible=true to enable Comet acceleration despite these differences.

  • Comet uses a Rust regex engine, whose behavior differs from that of the Java regex engine used by Spark
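The allowIncompatible settings named in the sections above all follow one key pattern, spark.comet.expression.<Name>.allowIncompatible. As a minimal sketch, the Python helper below builds those keys and the matching `--conf` flags for `spark-submit`. The helper names are hypothetical, and the expression list is simply the five expressions for which this page names such a setting.

```python
# Expressions for which this page names an allowIncompatible setting.
INCOMPATIBLE_EXPRESSIONS = [
    "Concat",
    "RLike",
    "RegExpReplace",
    "Reverse",
    "StringSplit",
]


def allow_incompatible_key(expression: str) -> str:
    """Build the configuration key for one expression."""
    return f"spark.comet.expression.{expression}.allowIncompatible"


def spark_submit_flags(expressions: list[str]) -> list[str]:
    """Build `--conf key=true` argument pairs for spark-submit."""
    flags = []
    for name in expressions:
        flags += ["--conf", f"{allow_incompatible_key(name)}=true"]
    return flags
```

For example, `spark_submit_flags(["RLike", "StringSplit"])` yields the arguments that opt in those two expressions only. The same keys can also be passed to `SparkSession.builder.config(key, "true")` when the session is created in code. Opting in per expression keeps the Spark fallback in place for every expression whose differences have not been reviewed.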