hash_funcs Expression Audits
Audit notes for the expressions in this category that have been audited so far. The absence of an entry means the expression has not been audited yet, not that it is unsupported. See the Spark Expression Support page in the user guide for the current support status.
crc32
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline. Crc32(child) extends UnaryExpression; inputTypes = Seq(BinaryType) -> LongType. Wired as CometScalarFunction("crc32").
Spark 4.0.1 (audited 2026-05-27): semantics unchanged; the NullIntolerant trait is replaced by a nullIntolerant: Boolean override.
Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
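A minimal Python sketch of the Crc32 contract above, assuming Spark computes the checksum with java.util.zip.CRC32, which uses the same CRC-32 polynomial as Python's zlib.crc32. The name spark_crc32 is hypothetical, and encoding a string as UTF-8 stands in for the implicit string-to-binary cast (also an assumption of this sketch).

```python
import zlib
from typing import Optional


def spark_crc32(data: Optional[bytes]) -> Optional[int]:
    """Hypothetical model of Crc32: BinaryType in, LongType out."""
    if data is None:
        # Null-intolerant: a NULL input yields NULL.
        return None
    # On Python 3, zlib.crc32 returns an unsigned value in [0, 2**32),
    # the same non-negative range that a Java CRC32.getValue() long covers.
    return zlib.crc32(data)


# Assumption: the implicit string -> binary cast encodes as UTF-8.
print(spark_crc32("Spark".encode("utf-8")))
print(spark_crc32(None))  # None, standing in for SQL NULL
```

Returning a plain int rather than a 32-bit signed value keeps the result non-negative, which is why the Spark result type is LongType rather than IntegerType.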
hash
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline. Murmur3Hash(children, seed) extends HashExpression[Int]; it produces a Murmur3 hash with a configurable Int seed and an IntegerType result. Comet routes it via CometMurmur3Hash to the native murmur3_hash UDF.
Spark 4.0.1 (audited 2026-05-27): semantics unchanged; only some internal helpers were refactored.
Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
Known limitation:
DecimalType children with precision > 18 fall back to Spark because Spark hashes them through Java BigDecimal; TimeType (Spark 4.0+) is also unsupported. The same limitations apply to xxhash64, sha1, and sha2 through the shared HashUtils.
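How HashExpression combines several children can be sketched in Python. This is a minimal model that assumes IntegerType children only: hash_int follows a Murmur3_x86_32 hashInt-style routine (mix one 32-bit block into the seed, then finalize with length 4), and the running hash of each child becomes the seed for the next, with NULL children leaving it unchanged. The default seed 42 matches the SQL hash() function; hash_int and murmur3_hash are hypothetical names, and longs, strings, and decimals are not modelled.

```python
MASK32 = 0xFFFFFFFF
C1 = 0xCC9E2D51
C2 = 0x1B873593


def _rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK32


def _to_signed32(x):
    # Reinterpret an unsigned 32-bit value as a Java int.
    return x - (1 << 32) if x & 0x80000000 else x


def _mix_k1(k1):
    k1 = (k1 * C1) & MASK32
    k1 = _rotl32(k1, 15)
    return (k1 * C2) & MASK32


def _mix_h1(h1, k1):
    h1 ^= k1
    h1 = _rotl32(h1, 13)
    return (h1 * 5 + 0xE6546B64) & MASK32


def _fmix(h1, length):
    # Murmur3 finalizer: fold in the length, then avalanche the bits.
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    return h1


def hash_int(value, seed):
    """Murmur3_x86_32 of one 32-bit int (one block, length 4)."""
    k1 = _mix_k1(value & MASK32)
    h1 = _mix_h1(seed & MASK32, k1)
    return _to_signed32(_fmix(h1, 4))


def murmur3_hash(values, seed=42):
    """Chain children: each child's hash becomes the next seed."""
    h = seed
    for v in values:
        if v is None:
            continue  # a NULL child leaves the running hash unchanged
        h = hash_int(v, h)
    return h


print(murmur3_hash([1, None, 2]))
```

Because each child's output is fed in as the next seed, reordering the children generally changes the result.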
md5
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline. Md5(child) extends UnaryExpression; inputTypes = Seq(BinaryType) -> StringType. Wired as CometScalarFunction("md5").
Spark 4.0.1 (audited 2026-05-27): semantics unchanged; the trait set gains DefaultStringProducingExpression and picks up the nullIntolerant: Boolean refactor.
Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
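A hedged Python sketch of the Md5 contract: binary in, string out, NULL in, NULL out. It assumes Spark returns the digest as a 32-character lowercase hex string and treats Python's hashlib.md5 as an equivalent digest; spark_md5 is a hypothetical name.

```python
import hashlib
from typing import Optional


def spark_md5(data: Optional[bytes]) -> Optional[str]:
    """Hypothetical model of Md5: BinaryType in, StringType out."""
    if data is None:
        return None  # NULL input yields NULL
    # hexdigest() returns lowercase hex: 16 digest bytes -> 32 characters.
    return hashlib.md5(data).hexdigest()


print(spark_md5(b"abc"))
```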
sha
Spark 3.4.3 (audited 2026-05-27): registry alias of Sha1, with the same support as sha1.
Spark 3.5.8 (audited 2026-05-27): identical to 3.4.3.
Spark 4.0.1 (audited 2026-05-27): identical to 3.4.3.
Spark 4.1.1 (audited 2026-05-27): identical to 3.4.3.
sha1
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline. Sha1(child) extends UnaryExpression with NullIntolerant; inputTypes = Seq(BinaryType) -> StringType. Comet routes it via CometSha1 to the native sha1 UDF.
Spark 4.0.1 (audited 2026-05-27): the trait set gains DefaultStringProducingExpression, and NullIntolerant is replaced by a nullIntolerant: Boolean override. Runtime unchanged.
Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
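Because sha is only a registry alias, both names resolve to a single implementation. The Python sketch below models that with a hypothetical FUNCTIONS lookup table and a hypothetical spark_sha1 helper built on hashlib.sha1, assuming Spark returns the 40-character lowercase hex digest.

```python
import hashlib
from typing import Callable, Dict, Optional


def spark_sha1(data: Optional[bytes]) -> Optional[str]:
    """Hypothetical model of Sha1: BinaryType in, StringType out."""
    if data is None:
        return None  # null-intolerant: NULL input yields NULL
    return hashlib.sha1(data).hexdigest()  # 20 bytes -> 40 hex chars


# Hypothetical name -> implementation table: the alias maps to the same function.
FUNCTIONS: Dict[str, Callable[[Optional[bytes]], Optional[str]]] = {
    "sha": spark_sha1,
    "sha1": spark_sha1,
}

print(FUNCTIONS["sha"](b"abc"))
```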
sha2
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline. Sha2(left, right) extends BinaryExpression; inputTypes = Seq(BinaryType, IntegerType) -> StringType. The numBits argument selects SHA-224, SHA-256, SHA-384, or SHA-512 (0 is treated as 256); any other value returns NULL. Comet routes it via CometSha2 to the native sha2 UDF; a non-foldable numBits falls back to Spark.
Spark 4.0.1 (audited 2026-05-27): the trait set gains DefaultStringProducingExpression and picks up the nullIntolerant: Boolean refactor. Runtime unchanged.
Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
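The numBits dispatch can be sketched as a lookup table. This Python model assumes lowercase hex output, as for md5 and sha1, and that a NULL in either argument yields NULL; spark_sha2 is a hypothetical name, and Comet's foldability check on numBits is not modelled.

```python
import hashlib
from typing import Optional

# numBits -> digest constructor; 0 is treated as 256.
_SHA2_BY_BITS = {
    0: hashlib.sha256,
    224: hashlib.sha224,
    256: hashlib.sha256,
    384: hashlib.sha384,
    512: hashlib.sha512,
}


def spark_sha2(data: Optional[bytes], num_bits: Optional[int]) -> Optional[str]:
    """Hypothetical model of Sha2(left, right)."""
    if data is None or num_bits is None:
        return None  # NULL in either argument yields NULL
    algo = _SHA2_BY_BITS.get(num_bits)
    if algo is None:
        return None  # unsupported bit length returns NULL
    return algo(data).hexdigest()


print(spark_sha2(b"abc", 256))
print(spark_sha2(b"abc", 128))  # None
```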
xxhash64
Spark 3.4.3 (audited 2026-05-27): identical to 3.5.8.
Spark 3.5.8 (audited 2026-05-27): baseline. XxHash64(children, seed) extends HashExpression[Long]; it produces an xxHash64 hash with a configurable Long seed and a LongType result. Comet routes it via CometXxHash64 to the native xxhash64 UDF.
Spark 4.0.1 (audited 2026-05-27): semantics unchanged.
Spark 4.1.1 (audited 2026-05-27): identical to 4.0.1.
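A minimal Python sketch of XxHash64 over LongType children, assuming a hashLong-style routine (the single 8-byte-lane path of XXH64 followed by the XXH64 avalanche) and the same seed chaining and NULL skipping as hash. The constants are the standard XXH64 primes; hash_long and xxhash64 are hypothetical names, other input types are not modelled, and the default seed 42 matches the SQL xxhash64() function.

```python
MASK64 = (1 << 64) - 1
P1 = 0x9E3779B185EBCA87
P2 = 0xC2B2AE3D27D4EB4F
P3 = 0x165667B19E3779F9
P4 = 0x85EBCA77C2B2AE63
P5 = 0x27D4EB2F165667C5


def _rotl64(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK64


def _to_signed64(x):
    # Reinterpret an unsigned 64-bit value as a Java long.
    return x - (1 << 64) if x & (1 << 63) else x


def _avalanche(h):
    h ^= h >> 33
    h = (h * P2) & MASK64
    h ^= h >> 29
    h = (h * P3) & MASK64
    h ^= h >> 32
    return h


def hash_long(value, seed):
    """XXH64 of a single 8-byte lane holding `value`."""
    h = (seed + P5 + 8) & MASK64  # seed + PRIME64_5 + input length
    k1 = (_rotl64((value * P2) & MASK64, 31) * P1) & MASK64
    h ^= k1
    h = (_rotl64(h, 27) * P1 + P4) & MASK64
    return _to_signed64(_avalanche(h))


def xxhash64(values, seed=42):
    """Chain children: each child's hash becomes the next seed."""
    h = seed
    for v in values:
        if v is None:
            continue  # a NULL child leaves the running hash unchanged
        h = hash_long(v, h)
    return h


print(xxhash64([5, None, -3]))
```

Masking with MASK64 after each arithmetic step emulates Java's wrapping long arithmetic, so negative inputs and negative chained seeds are handled as two's-complement values.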