String Expressions#

BitLength#

The following cases are not supported by Comet:

  • BinaryType input is not supported

Concat#

The following incompatibilities cause Concat to fall back to Spark by default. Set spark.comet.expression.Concat.allowIncompatible=true to enable Comet acceleration despite these differences.

  • concat does not support non-UTF8_BINARY collations (https://github.com/apache/datafusion-comet/issues/2190)
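The per-expression setting above follows the pattern `spark.comet.expression.<Name>.allowIncompatible`, which this page also uses for Reverse and StringTranslate. A minimal sketch of building such settings programmatically follows; the helper names `allow_incompatible_conf` and `as_submit_args` are hypothetical and not part of Comet or Spark.

```python
def allow_incompatible_conf(*expressions):
    """Return a dict that enables allowIncompatible for each expression name.

    The key pattern is taken from this page; the helper itself is illustrative.
    """
    return {
        f"spark.comet.expression.{name}.allowIncompatible": "true"
        for name in expressions
    }


def as_submit_args(conf):
    """Flatten a config dict into `--conf key=value` command-line arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


conf = allow_incompatible_conf("Concat")
print(as_submit_args(conf))
```

The resulting key/value pairs can be supplied wherever Spark configuration is normally set, for example as `--conf` arguments or when building a session.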

The following cases are not supported by Comet:

  • Input parameters of any type other than string (CONCAT supports only string inputs)

Left#

The following cases are not supported by Comet:

  • Input of any type other than BinaryType or StringType

  • A length argument that is not a literal value

Length#

The following cases are not supported by Comet:

  • BinaryType input is not supported

OctetLength#

The following cases are not supported by Comet:

  • BinaryType input is not supported

Reverse#

By default, Comet accelerates Reverse using JVM codegen dispatch, which runs Spark’s generated code inside Comet’s native pipeline and matches Spark exactly. Set spark.comet.expression.Reverse.allowIncompatible=true to use Comet’s faster native implementation instead, which has the following differences from Spark:

  • reverse on an array containing binary elements is not supported

  • reverse does not support non-UTF8_BINARY collations (https://github.com/apache/datafusion-comet/issues/2190)

StringLPad#

The following cases are not supported by Comet:

  • Scalar values are not supported for the str argument.

  • Non-scalar values are not supported for the pad argument.

StringRPad#

The following cases are not supported by Comet:

  • Scalar values are not supported for the str argument.

  • Non-scalar values are not supported for the pad argument.

StringRepeat#

The following differences from Spark are always present and do not require any additional configuration:

  • A negative argument for the number of times to repeat throws an exception instead of returning an empty string as Spark does
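The difference can be modeled in plain Python. This is only a sketch of the two behaviors described above, not Comet or Spark code: the function names are hypothetical, and `ValueError` is a stand-in because this page does not say which exception Comet raises.

```python
def repeat_like_spark(s, times):
    """Model of the Spark behavior: a negative count yields an empty string."""
    return s * max(times, 0)


def repeat_rejecting_negative(s, times):
    """Model of the behavior described for Comet: a negative count raises.

    ValueError is a placeholder for whatever exception is actually thrown.
    """
    if times < 0:
        raise ValueError(f"repeat count must be non-negative, got {times}")
    return s * times


def clamp_count(times):
    """Guard that maps negative counts to zero before calling repeat."""
    return max(times, 0)


# With the guard applied, both models agree on negative input.
print(repr(repeat_like_spark("ab", -1)))
print(repr(repeat_rejecting_negative("ab", clamp_count(-1))))
```

If negative counts can occur in the data, one possible guard in SQL is `repeat(s, greatest(n, 0))`. Note that `greatest` skips null values, so a null count would become 0 rather than staying null.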

StringTranslate#

The following incompatibilities cause StringTranslate to fall back to Spark by default. Set spark.comet.expression.StringTranslate.allowIncompatible=true to enable Comet acceleration despite these differences.

  • DataFusion’s translate iterates over Unicode graphemes (Spark uses code points) and substitutes U+0000 instead of treating it as a deletion sentinel
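To make the two differences concrete, here is a simplified Python model of the Spark-side semantics described above: matching by code point, with U+0000 acting as a deletion marker. It is an illustration under those assumptions, not the Spark or DataFusion implementation, and `translate_by_code_point` is a hypothetical name. Python iterates a `str` by code point, which is why it fits this model.

```python
DELETE = "\u0000"  # sentinel meaning "drop this character" in the model


def translate_by_code_point(src, matching, replace):
    """Replace each code point found in `matching` with its peer in `replace`.

    Code points in `matching` without a peer are mapped to the sentinel,
    and any code point mapped to the sentinel is deleted from the output.
    """
    mapping = {}
    for i, ch in enumerate(matching):
        if ch not in mapping:  # the first occurrence wins
            mapping[ch] = replace[i] if i < len(replace) else DELETE
    out = []
    for ch in src:
        mapped = mapping.get(ch, ch)
        if mapped != DELETE:
            out.append(mapped)
    return "".join(out)


# "e" followed by a combining acute accent: two code points, one grapheme.
accented = "e\u0301"
print(len(accented))
print(repr(translate_by_code_point(accented, "e", "a")))

# A U+0000 replacement deletes the matched character in this model.
print(repr(translate_by_code_point("abc", "b", "\u0000")))
```

Under the code-point model, the base letter in `accented` is replaced and the combining mark is kept. An implementation that iterates over graphemes would see a single cluster that does not equal `"e"`, and one that substitutes U+0000 literally would emit it where this model deletes the character.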