Apache DataFusion Comet: Release Process

This documentation explains the release process for Apache DataFusion Comet. Some preparation tasks can be performed by any contributor, while certain release tasks can only be performed by a DataFusion Project Management Committee (PMC) member.

Checklist

The following is a quick-reference checklist for the full release process. See the detailed sections below for instructions on each step.

  • Release preparation: review expression support status and user guide

  • Create release branch

  • Generate release documentation

  • Update Maven version in release branch

  • Update version in main for next development cycle

  • Generate the change log and create PR against main

  • Cherry-pick the change log commit into the release branch

  • Build the jars

  • Tag the release candidate

  • Update documentation for the new release

  • Publish Maven artifacts to staging

  • Create the release candidate tarball

  • Start the email voting thread

  • Once the vote passes:

    • Publish source tarball

    • Create GitHub release

    • Promote Maven artifacts to production

    • Push the release tag

    • Close the vote and announce the release

  • Post release:

    • Register the release with Apache Reporter

    • Delete old RCs and releases from SVN

    • Write a blog post

Release Preparation

Before starting the release process, review the user guide to ensure it accurately reflects the current state of the project:

  • Review the supported expressions and operators lists in the user guide. Verify that any expressions added since the last release are included and that their support status is accurate.

  • Spot-check the support status of individual expressions by running tests or queries to confirm they work as documented.

  • Look for any expressions that may have regressed or changed behavior since the last release and update the documentation accordingly.

It is also recommended to run benchmarks (such as TPC-H and TPC-DS) comparing performance against the previous release to check for regressions. See the Comet Benchmarking Guide for instructions.

These are tasks where agentic coding tools can be particularly helpful, for example by scanning the codebase for newly registered expressions and cross-referencing them against the documented list, or by generating test queries to verify expression support status.

Any issues found should be addressed before creating the release branch.

Creating the Release Candidate

This part of the process can be performed by any committer.

Here are the steps, using the 0.13.0 release as an example.

Create Release Branch

This document assumes that GitHub remotes are set up as follows:

$ git remote -v
apache	git@github.com:apache/datafusion-comet.git (fetch)
apache	git@github.com:apache/datafusion-comet.git (push)
origin	git@github.com:yourgithubid/datafusion-comet.git (fetch)
origin	git@github.com:yourgithubid/datafusion-comet.git (push)

Create a release branch from the latest commit in main and push to the apache repo:

git fetch apache
git checkout main
git reset --hard apache/main
git checkout -b branch-0.13
git push apache branch-0.13
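The branch name follows from the release version. As a minimal sketch, assuming the `branch-<major>.<minor>` pattern seen in the 0.13.0 example holds for every release (the helper name `release_branch` is hypothetical and not part of the Comet repository):

```python
import re


def release_branch(version: str) -> str:
    """Return the release branch name for a version such as "0.13.0".

    Assumption: branches are named branch-<major>.<minor>, as in the
    0.13.0 example in this guide.
    """
    match = re.fullmatch(r"([0-9]+)\.([0-9]+)\.([0-9]+)", version)
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, _patch = match.groups()
    return f"branch-{major}.{minor}"


print(release_branch("0.13.0"))  # branch-0.13
```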

Generate Release Documentation

Generate the documentation content for this release. The docs on main contain only template markers, so we need to generate the actual content (config tables, compatibility matrices) for the release branch:

./dev/generate-release-docs.sh
git add docs/source/user-guide/latest/
git commit -m "Generate docs for 0.13.0 release"
git push apache branch-0.13

This freezes the documentation to reflect the configs and expressions available in this release.

Update Maven Version

In the release branch, change the Maven version in the pom.xml files from 0.13.0-SNAPSHOT to 0.13.0.

There is no need to update the Rust crate versions because they will already be 0.13.0.

Update Version in main

Create a PR against the main branch to prepare for developing the next release:

  • Update the Rust crate version to 0.14.0.

  • Update the Maven version to 0.14.0-SNAPSHOT (both in the pom.xml files and also in the diff files under dev/diffs).
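The version strings involved in the release branch and in main can be derived from the current snapshot version. The following is a rough sketch only, assuming the next development cycle always bumps the minor version, as in the 0.13.0 to 0.14.0 example. The function names are hypothetical, and the sketch does not edit any pom.xml or Cargo.toml files:

```python
import re
from typing import Tuple

SNAPSHOT_SUFFIX = "-SNAPSHOT"


def release_version(snapshot_version: str) -> str:
    """Strip the -SNAPSHOT suffix: 0.13.0-SNAPSHOT -> 0.13.0."""
    if not snapshot_version.endswith(SNAPSHOT_SUFFIX):
        raise ValueError(f"not a snapshot version: {snapshot_version!r}")
    return snapshot_version[: -len(SNAPSHOT_SUFFIX)]


def next_development_versions(release: str) -> Tuple[str, str]:
    """Return (rust_crate_version, maven_version) for main.

    Assumption: the next cycle bumps the minor version and resets the
    patch version to zero.
    """
    match = re.fullmatch(r"([0-9]+)\.([0-9]+)\.([0-9]+)", release)
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    crate_version = f"{major}.{minor + 1}.0"
    return crate_version, crate_version + SNAPSHOT_SUFFIX


print(release_version("0.13.0-SNAPSHOT"))   # 0.13.0
print(next_development_versions("0.13.0"))  # ('0.14.0', '0.14.0-SNAPSHOT')
```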

Generate the Change Log

Generate a change log to cover changes between the previous release and the release branch HEAD by running the provided dev/release/generate-changelog.py.

It is recommended that you set up a virtual Python environment and then install the dependencies:

cd dev/release
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt

To generate the changelog, set the GITHUB_TOKEN environment variable to a valid token, then run the script with two commit ids or tags followed by the version number of the release being created. The following example generates a change log of all changes between the previous release (0.12.0) and the HEAD of the current release branch.

export GITHUB_TOKEN=<your-token-here>
python3 generate-changelog.py 0.12.0 HEAD 0.13.0 > ../changelog/0.13.0.md

Create a PR against the main branch to add this change log. Once the PR is approved and merged, cherry-pick the commit into the release branch.

Build the jars

Set up the build

The build process requires Docker. Download the latest Docker Desktop from https://www.docker.com/products/docker-desktop/. If you have multiple Docker contexts, switch to the Docker Desktop context. For example:

$ docker context ls
NAME              DESCRIPTION                               DOCKER ENDPOINT                               ERROR
default           Current DOCKER_HOST based configuration   unix:///var/run/docker.sock
desktop-linux     Docker Desktop                            unix:///Users/parth/.docker/run/docker.sock
my_custom_context *                                         tcp://192.168.64.2:2376

$ docker context use desktop-linux

Run the build script

The build-release-comet.sh script creates a Docker image for each architecture and uses that image to build the platform-specific binaries. The builder images are recreated every time the script runs. The script optionally allows overriding the repository and branch from which the binaries are built. Note that the local git repo is not used to build the binaries, but it is used to build the final uber jar.

Usage: build-release-comet.sh [options]

This script builds comet native binaries inside a docker image. The image is named
"comet-rm" and will be generated by this script

Options are:

-r [repo]   : git repo (default: https://github.com/apache/datafusion-comet.git)
-b [branch] : git branch (default: release)
-t [tag]    : tag for the spark-rm docker image to use for building (default: "latest").

Example:

cd dev/release && ./build-release-comet.sh && cd ../..

Build output

The build output is installed to a temporary local Maven repository. The build script prints the location of this repository at the end. You will need this location when deploying the artifacts to a staging repository.

Tag the Release Candidate

Ensure that the Maven version update and changelog cherry-pick have been pushed to the release branch before tagging.

Tag the release branch with 0.13.0-rc1 and push the tag to the apache repo:

git fetch apache
git checkout branch-0.13
git reset --hard apache/branch-0.13
git tag 0.13.0-rc1
git push apache 0.13.0-rc1

Note that pushing a release candidate tag will trigger a GitHub workflow that will build a Docker image and publish it to GitHub Container Registry at https://github.com/apache/datafusion-comet/pkgs/container/datafusion-comet

Publishing Documentation

In the docs directory:

  • Update docs/source/index.rst and add a new navigation menu link for the new release in the section _toc.user-guide-links-versioned

  • Add a new line to build.sh to delete the locally cloned comet-* branch for the new release, e.g. comet-0.13

  • Update the main method in generate-versions.py:

    latest_released_version = "0.13.0"
    previous_versions = ["0.11.0", "0.12.0"]

Test the documentation build locally, following the instructions in docs/README.md.

Once verified, create a PR against the main branch with these documentation changes. After merging, the docs will be deployed to https://datafusion.apache.org/comet/ by the documentation publishing workflow.

Note that the download links in the installation guide will not work until the release is finalized, but having the documentation available could be useful for anyone testing out the release candidate during the voting period.

Publishing the Release Candidate

Most of this part of the process can only be performed by a PMC member.

Publish the Maven artifacts

Set up Maven

One-time project setup

Set up the project in the ASF Nexus Repository by following the instructions at https://infra.apache.org/publishing-maven-artifacts.html

Release Manager Setup

Set up your development environment by following the instructions at https://infra.apache.org/publishing-maven-artifacts.html

Build and publish a release candidate to Nexus

The publish-to-maven.sh script publishes the artifacts created by the build-release-comet.sh script. The artifacts are signed with the release manager's GPG key and uploaded to the Maven staging repository.

Installed GPG keys can be listed with gpg --list-keys. The GPG key is a 40-character hex string.
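Before passing a key to the script, it can help to check that the value really is a 40-character hex string. The following is a minimal sketch of such a check. The function name is hypothetical, the key shown is a made-up placeholder rather than a real key, and stripping whitespace is an assumption made here to tolerate values copied in grouped form:

```python
import re

_FINGERPRINT = re.compile(r"[0-9A-Fa-f]{40}")


def normalize_gpg_key(key: str) -> str:
    """Remove whitespace and check for a 40-character hex string."""
    compact = "".join(key.split())
    if _FINGERPRINT.fullmatch(compact) is None:
        raise ValueError("expected a 40-character hex string")
    return compact.upper()


# Placeholder value for illustration only, not a real key.
print(normalize_gpg_key("0123 4567 89ab cdef 0123 4567 89ab cdef 0123 4567"))
```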

Note: This script requires xmllint. On macOS, xmllint is available by default.

On Ubuntu: apt-get install -y libxml2-utils

On RedHat: yum install -y xmlstarlet

./dev/release/publish-to-maven.sh -h
usage: publish-to-maven.sh options

Publish signed artifacts to Maven.

Options
-u ASF_USERNAME - Username of ASF committer account
-r LOCAL_REPO - path to temporary local maven repo (created and written to by 'build-release-comet.sh')

The following will be prompted for -
ASF_PASSWORD - Password of ASF committer account
GPG_KEY - GPG key used to sign release artifacts
GPG_PASSPHRASE - Passphrase for GPG key

Example:

./dev/release/publish-to-maven.sh -u release_manager_asf_id -r /tmp/comet-staging-repo-VsYOX
ASF Password :
GPG Key (Optional):
GPG Passphrase :
Creating Nexus staging repository
...

In the Nexus repository UI (https://repository.apache.org/), locate and verify the artifacts in staging (see https://central.sonatype.org/publish/release/#locate-and-examine-your-staging-repository).

If the artifacts appear to be correct, close and release the repository so that it is made visible. This should happen automatically when running the script.

Create the Release Candidate Tarball

The create-tarball.sh script creates a signed source tarball and uploads it to the dev subversion repository.

Prerequisites

Before running this script, ensure you have:

  1. A GPG key set up for signing, with your public key uploaded to https://pgp.mit.edu/

  2. Apache SVN credentials (you must be logged into the Apache SVN server)

  3. The requests Python package installed (pip3 install requests)

Run the script

Run the create-tarball script on the release candidate tag (0.13.0-rc1):

./dev/release/create-tarball.sh 0.13.0 1

This will generate an email template for starting the vote.
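The release scripts in this guide take the version and the release candidate number as separate arguments (0.13.0 and 1), while the git tag combines them (0.13.0-rc1). As an illustrative sketch, assuming tags always follow the `<version>-rc<N>` form used in this guide (the helper name is hypothetical):

```python
import re
from typing import List


def script_args_for_tag(tag: str) -> List[str]:
    """Split an RC tag into the [version, rc_number] script arguments."""
    match = re.fullmatch(r"([0-9]+\.[0-9]+\.[0-9]+)-rc([0-9]+)", tag)
    if match is None:
        raise ValueError(f"not a release candidate tag: {tag!r}")
    return [match.group(1), match.group(2)]


print(script_args_for_tag("0.13.0-rc1"))  # ['0.13.0', '1']
```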

Start an Email Voting Thread

Send the email that is generated in the previous step to dev@datafusion.apache.org.

The verification procedure for voters is documented in Verifying Release Candidates. Voters can also use the dev/release/verify-release-candidate.sh script to assist with verification:

./dev/release/verify-release-candidate.sh 0.13.0 1

If the Vote Fails

If the vote does not pass, address the issues raised, increment the release candidate number, and repeat from the Tag the Release Candidate step. For example, the next attempt would be tagged 0.13.0-rc2.
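Incrementing the release candidate number can be sketched as follows. This is an illustration only, with a hypothetical function name, and it assumes the `<version>-rc<N>` tag form shown above:

```python
import re


def next_rc_tag(tag: str) -> str:
    """Return the tag for the next release candidate of the same version."""
    match = re.fullmatch(r"([0-9]+\.[0-9]+\.[0-9]+)-rc([0-9]+)", tag)
    if match is None:
        raise ValueError(f"not a release candidate tag: {tag!r}")
    version, rc_number = match.group(1), int(match.group(2))
    return f"{version}-rc{rc_number + 1}"


print(next_rc_tag("0.13.0-rc1"))  # 0.13.0-rc2
```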

Publishing Binary Releases

Once the vote passes, we can publish the source and binary releases.

Publishing Source Tarball

Run the release-tarball script to move the tarball to the release subversion repository.

./dev/release/release-tarball.sh 0.13.0 1

Create a release in the GitHub repository

Go to https://github.com/apache/datafusion-comet/releases and create a release for the release tag, and paste the changelog in the description.

Publishing Maven Artifacts

Promote the Maven artifacts from staging to production by visiting https://repository.apache.org/#stagingRepositories, selecting the staging repository, and clicking the “release” button.

Push a release tag to the repo

Push a release tag (0.13.0) to the apache repository.

git fetch apache
git checkout 0.13.0-rc1
git tag 0.13.0
git push apache 0.13.0

Note that pushing a release tag will trigger a GitHub workflow that will build a Docker image and publish it to GitHub Container Registry at https://github.com/apache/datafusion-comet/pkgs/container/datafusion-comet

Reply to the vote thread to close the vote and announce the release. The announcement email should include:

  • The release version

  • A link to the release notes / changelog

  • A link to the download page or Maven coordinates

  • Thanks to everyone who contributed and voted

Post Release

Register the release

Register the release with the Apache Reporter Service using a version such as COMET-0.13.0.

Delete old RCs and Releases

See the ASF documentation on when to archive for more information.

Deleting old release candidates from dev svn

Release candidates should be deleted once the release is published.

Get a list of DataFusion Comet release candidates:

svn ls https://dist.apache.org/repos/dist/dev/datafusion | grep comet

Delete a release candidate:

svn delete -m "delete old DataFusion Comet RC" https://dist.apache.org/repos/dist/dev/datafusion/apache-datafusion-comet-0.13.0-rc1/

Deleting old releases from release svn

Only the latest release should be available. Delete old releases after publishing the new release.

Get a list of DataFusion Comet releases:

svn ls https://dist.apache.org/repos/dist/release/datafusion | grep comet

Delete a release:

svn delete -m "delete old DataFusion Comet release" https://dist.apache.org/repos/dist/release/datafusion/datafusion-comet-0.12.0
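When the listing is long, the entries to delete can be picked out programmatically. The sketch below filters the text output of an svn ls command and returns the Comet entries whose version differs from the release being kept. The function name and the sample listing are made up for illustration, and the `comet-<version>` naming is an assumption based on the paths shown in this guide:

```python
import re
from typing import List

_COMET_ENTRY = re.compile(r"comet-([0-9]+\.[0-9]+\.[0-9]+)")


def old_comet_entries(svn_ls_output: str, keep_version: str) -> List[str]:
    """Return Comet entries whose version is not keep_version."""
    old = []
    for line in svn_ls_output.splitlines():
        entry = line.strip()
        match = _COMET_ENTRY.search(entry)
        if match is not None and match.group(1) != keep_version:
            old.append(entry)
    return old


# Hypothetical listing, not real repository contents.
listing = """\
some-other-project-1.0.0/
datafusion-comet-0.12.0/
datafusion-comet-0.13.0/
"""
print(old_comet_entries(listing, "0.13.0"))  # ['datafusion-comet-0.12.0/']
```

Review the returned entries by hand before running svn delete, since deletions from the release repository are visible to all users.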

Write a blog post

Writing a blog post about the release is a great way to generate more interest in the project. We typically create a Google document where the community can collaborate on the post. Once the content is agreed on, a PR can be created against the datafusion-site repository to add the blog post. Any contributor can drive this process.