JSON Compatibility#

Comet can evaluate JSON expressions (get_json_object, from_json, to_json, json_array_length) two ways:

  • Codegen dispatcher (default): Spark’s own doGenCode for the expression runs inside the Comet pipeline (via Comet’s Arrow-direct codegen dispatcher), giving byte-exact compatibility with Spark at the cost of a JNI roundtrip per batch. This rides the codegen dispatcher (spark.comet.exec.scalaUDF.codegen.enabled, enabled by default); if the dispatcher is disabled, the operator falls back to Spark.

  • Native (rust) path: the native DataFusion implementation. Faster, but has known compatibility gaps with Spark on certain inputs, so it is opt-in per expression via the expression’s allowIncompatible config. Any expression or input case with no native implementation falls back to the codegen dispatcher.

Expression coverage#

SQL

Native (rust) path

Opt-in config

get_json_object

Supported, with gaps on single-quoted JSON and unescaped control characters

spark.comet.expression.GetJsonObject.allowIncompatible

from_json

Supported with restrictions (PERMISSIVE mode only, simple schema types only)

spark.comet.expression.JsonToStructs.allowIncompatible

to_json

Supported for struct inputs only, no options

spark.comet.expression.StructsToJson.allowIncompatible

json_array_length

Supported, with gaps on single-quoted JSON, unescaped control characters, and trailing content

spark.comet.expression.LengthOfJsonArray.allowIncompatible

When the native path is enabled but an expression or input case has no native implementation (for example to_json with map or array inputs, or from_json with an unsupported schema), Comet falls back to the codegen dispatcher for that case.

When to use the native path#

  • You want the faster native path and your inputs avoid the known compatibility gaps above.

  • Enable it per expression, for example spark.comet.expression.GetJsonObject.allowIncompatible=true. Cases the native path does not cover still fall back to the codegen dispatcher.