Updating the DataFusion / protobuf schema version

Three things must move together when bumping DataFusion:

  1. native/Cargo.toml — the datafusion crate dependency.

  2. pom.xml — the <datafusion.version> Maven property. Must equal the Cargo version; a mismatch means JVM-built protobuf plans won’t deserialize on the native side.

  3. pom.xml — the <sha512> checksums on the two download-maven-plugin executions. These pin the exact contents of the downloaded .proto files. If upstream silently re-tags a release and the files change, the build fails, which is the desired behavior.
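Items 1 and 2 can drift apart without anyone noticing. Comparing the two version strings before building catches that early. The sketch below is not part of the project's build, and its function names are hypothetical. It makes these assumptions:

- `native/Cargo.toml` declares the dependency on a single line, as either `datafusion = "X"` or `datafusion = { version = "X", ... }`.
- `pom.xml` has `<datafusion.version>X</datafusion.version>` on one line.

The strings are compared literally. A Cargo requirement such as `"=43.0.0"` or a shortened `"43"` would therefore be reported as a mismatch.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repo) that compare the DataFusion
# version in native/Cargo.toml with <datafusion.version> in pom.xml.

# Print the version from the first `datafusion = "X"` or
# `datafusion = { version = "X", ... }` line. Other datafusion-* crates
# (e.g. datafusion-proto) are ignored because the name must be followed
# by whitespace or "=".
cargo_datafusion_version() {
  sed -nE 's/^datafusion[[:space:]]*=[[:space:]]*([{][^}]*version[[:space:]]*=[[:space:]]*)?"([^"]+)".*/\2/p' "$1" |
    head -n 1
}

# Print the value of the first single-line <datafusion.version> element.
pom_datafusion_version() {
  sed -nE 's|.*<datafusion\.version>[[:space:]]*([^<[:space:]]+)[[:space:]]*</datafusion\.version>.*|\1|p' "$1" |
    head -n 1
}

# Return 0 if both versions are found and equal, 1 on a mismatch,
# 2 if either one is missing. Strings are compared literally.
check_datafusion_versions() {
  local cargo_toml=$1 pom=$2 cv mv
  cv=$(cargo_datafusion_version "$cargo_toml")
  mv=$(pom_datafusion_version "$pom")
  if [ -z "$cv" ] || [ -z "$mv" ]; then
    echo "no DataFusion version found in $cargo_toml or $pom" >&2
    return 2
  fi
  if [ "$cv" != "$mv" ]; then
    echo "mismatch: Cargo.toml has $cv, pom.xml has $mv" >&2
    return 1
  fi
  echo "ok: DataFusion $cv in both files"
}
```

Run it from the repository root after steps 1 and 2 of the recipe, for example `check_datafusion_versions native/Cargo.toml pom.xml`. A non-zero return stops you before a mismatched build reaches `make test`.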

Recipe

# 1. Bump the Cargo dep
$EDITOR native/Cargo.toml             # set datafusion = "<new>"
(cd native && cargo update -p datafusion)

# 2. Bump the Maven property to match
$EDITOR pom.xml                       # set <datafusion.version>

# 3. Compute the new SHA-512 hashes of both `.proto` files at the upstream
#    tag matching the version set in step 2, then paste them into the two
#    <sha512> elements in pom.xml.
NEW=$(grep -m1 -oE '<datafusion.version>[^<]+' pom.xml | cut -d'>' -f2)
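# -f makes curl fail on an HTTP error (e.g. a tag that doesn't exist) and -S
# still reports it, rather than piping the error page into shasum and
# producing a plausible-looking hash.
curl -fsSL "https://raw.githubusercontent.com/apache/datafusion/$NEW/datafusion/proto-common/proto/datafusion_common.proto" | shasum -a 512 | awk '{print $1}'
curl -fsSL "https://raw.githubusercontent.com/apache/datafusion/$NEW/datafusion/proto/proto/datafusion.proto" | shasum -a 512 | awk '{print $1}'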
$EDITOR pom.xml                       # paste the two hashes into the <sha512> elements

# Drop the local download cache so the next build re-downloads against
# the new hashes.
rm -rf ~/.m2/repository/.cache/download-maven-plugin target/proto

# 4. Verify
make && make test
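After pasting the new hashes, you can check that they match the files before running the full build. The helper below is hypothetical and not something the repository ships. It assumes each `<sha512>` value sits on one line in `pom.xml`. It uses `sha512sum` when available and falls back to `shasum -a 512`. It only checks that each file's hash appears among the pinned values. It does not know which download-maven-plugin execution each hash belongs to.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repo) to check that local copies of
# the .proto files match the <sha512> pins in pom.xml.

# Print the hex SHA-512 of a file (prints nothing if the file is unreadable).
sha512_of() {
  if command -v sha512sum >/dev/null 2>&1; then
    sha512sum "$1" | awk '{print $1}'
  else
    shasum -a 512 "$1" | awk '{print $1}'
  fi
}

# List every <sha512> value in a pom, one per line, lowercased.
# Assumes each value is on the same line as its <sha512> tags.
pom_sha512_pins() {
  sed -nE 's|.*<sha512>[[:space:]]*([0-9A-Fa-f]+)[[:space:]]*</sha512>.*|\1|p' "$1" |
    tr 'A-F' 'a-f'
}

# Return 0 only if every given file's SHA-512 appears among the pom's pins.
check_proto_pins() {
  local pom=$1 pins f h status=0
  shift
  pins=$(pom_sha512_pins "$pom")
  for f in "$@"; do
    h=$(sha512_of "$f")
    # An empty hash (unreadable file) must never count as a match.
    if [ -n "$h" ] && printf '%s\n' "$pins" | grep -qx "$h"; then
      echo "ok       $f"
    else
      echo "MISSING  $f  sha512=$h" >&2
      status=1
    fi
  done
  return "$status"
}
```

To use it, first download the two files at the new tag with `curl -fsSL -o <file>`, using the URLs from step 3. Then run `check_proto_pins pom.xml datafusion_common.proto datafusion.proto`. A `MISSING` line means the pasted hash and the file at that tag disagree.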

Why the protobuf runtime version is separate

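The protobuf runtime version (<protobuf.version> in pom.xml) tracks the Java ecosystem (security and JDK compatibility), not DataFusion. Bump it independently when one of those concerns calls for it, such as a security fix or a new JDK requirement. A DataFusion upgrade alone is not a reason to change it.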