Building Comet From Source

It is sometimes preferable to build Comet from source, for example to target a specific platform.

Using a Published Source Release

Official source releases can be downloaded from https://dist.apache.org/repos/dist/release/datafusion/

# Set this to the release you want to build (check the download page for the latest version)
export COMET_VERSION=0.3.0
# Download the tarball
curl -O "https://dist.apache.org/repos/dist/release/datafusion/datafusion-comet-$COMET_VERSION/apache-datafusion-comet-$COMET_VERSION.tar.gz"
# Unpack it and change into the source directory
tar -xzf "apache-datafusion-comet-$COMET_VERSION.tar.gz"
cd "apache-datafusion-comet-$COMET_VERSION"

Build

make release-nogit PROFILES="-Pspark-3.4"

Building From the GitHub Repository

Clone the repository:

git clone https://github.com/apache/datafusion-comet.git

Build Comet for a specific Spark version:

cd datafusion-comet
make release PROFILES="-Pspark-3.4"

By default the project builds for Scala 2.12. To build for Scala 2.13, add the -Pscala-2.13 profile:

make release PROFILES="-Pspark-3.4 -Pscala-2.13"
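The Spark and Scala profiles are combined in a single PROFILES value. As a minimal sketch, assuming a Bash shell, the hypothetical comet_profiles function below (not part of the Comet repository) builds that value from a Spark version and an optional Scala version. It only knows the -Pspark-<version> and -Pscala-2.13 profiles shown above and does not check whether a given Spark version is actually supported by Comet.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Comet repository.
# Prints the PROFILES value for a Spark version and an optional Scala version.
# Only the -Pspark-<version> and -Pscala-2.13 profiles come from the Comet docs;
# the supported Spark versions are NOT validated here.
comet_profiles() {
  local spark_version="$1"
  local scala_version="${2:-2.12}"   # Scala 2.12 is the default build

  if [ -z "$spark_version" ]; then
    echo "usage: comet_profiles <spark-version> [scala-version]" >&2
    return 1
  fi

  local profiles="-Pspark-${spark_version}"
  case "$scala_version" in
    2.12) ;;                                    # default: no extra profile needed
    2.13) profiles="$profiles -Pscala-2.13" ;;  # documented Scala 2.13 profile
    *)
      echo "unsupported Scala version: $scala_version" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$profiles"
}
```

For example, running make release PROFILES="$(comet_profiles 3.4 2.13)" from the datafusion-comet directory is equivalent to the Scala 2.13 command above. Rejecting unknown Scala versions, rather than guessing a profile name, keeps the helper from passing Maven a profile that may not exist.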

To build Comet from the source distribution in an isolated environment without access to github.com, you must disable git-commit-id-maven-plugin; otherwise the build fails because the plugin cannot access git information. In that case, use the release-nogit target:

make release-nogit PROFILES="-Pspark-3.4"
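If you build both from git clones and from unpacked source releases, a small helper can choose the target for you. The sketch below is hypothetical (not part of the Comet repository) and assumes Bash. It uses a simple heuristic: a directory inside a git work tree uses release, and anything else, such as an unpacked tarball, uses release-nogit. A tarball unpacked inside some other git repository would be misclassified, so treat the result as a suggestion.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Comet repository.
# Heuristic: inside a git work tree -> "release"; otherwise -> "release-nogit",
# the target the docs recommend when git information is unavailable.
comet_make_target() {
  local dir="${1:-.}"
  # git rev-parse fails outside a work tree, and the call also fails
  # if git itself is not installed; both cases fall back to release-nogit.
  if git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    printf 'release\n'
  else
    printf 'release-nogit\n'
  fi
}
```

For example, from the source directory: make "$(comet_make_target)" PROFILES="-Pspark-3.4".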