Ballista Development#

We welcome participation from everyone and encourage you to join us, ask questions, and get involved.

All participation in the Apache DataFusion Ballista project is governed by the Apache Software Foundation’s code of conduct.

Development Environment#

If you use VS Code or IntelliJ IDEA, the easiest way to get started is to open the provided Dev Container, which installs all the required dependencies, including Rust, Docker, Node.js, and Yarn. A Dev Container is a development environment that runs in a Docker container and is configured with everything needed to build and test the project. It also includes the Rust and Node.js extensions for VS Code. Other tools that support Dev Containers are listed here.

To use the Dev Container, open the project in VS Code and then click the “Reopen in Container” button in the bottom right corner of the IDE.

If you are not using the Dev Container or VS Code, you will need to install these dependencies yourself.
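When installing the dependencies by hand, a quick check of which tools are already on your PATH can save a failed build. The following is a minimal sketch, not part of the project: `check_tools` is a hypothetical helper, and the command names (`rustc`, `cargo`, `docker`, `node`, `yarn`) are assumed from the dependency list above.

```shell
# Report which of the given command-line tools are on PATH.
# Prints one "found: <tool>" or "missing: <tool>" line per argument
# and returns non-zero if any tool is missing.
# check_tools is a hypothetical helper, not something the project ships.
check_tools() {
    rc=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Command names are assumptions based on the dependencies listed above.
check_tools rustc cargo docker node yarn || echo "install the missing tools before building"
```

Any tool reported as missing needs to be installed through your platform's usual package manager or the tool's own installer.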

Build the project#

From the root of the project, build release binaries.

cargo build --release

Testing the project#

cargo test

Running the examples#

cd examples
cargo run --example standalone_sql --features=ballista/standalone

Building the Python Client from Source#

The Python client (ballista on PyPI) lives in the python/ directory and is versioned and released independently from the main Ballista project, so it is intentionally not part of the default Cargo workspace. Building it from source uses maturin to compile the Rust extension module and install it into a Python virtual environment.

Prerequisites#

All commands in this section are run from the python/ directory:

cd python

Create a virtual environment#

Using pip:

python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt

The pip upgrade is required because maturin develop invokes pip install --group, which needs pip 25.1 or newer. The pip shipped with python3 -m venv is often older than that.
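To confirm that the pip in your virtual environment meets the 25.1 requirement, you can compare version strings before running maturin. This is a sketch under stated assumptions: `version_ge` is a hypothetical helper, it relies on `sort -V` being available (GNU coreutils and recent BSD/macOS `sort`), and it assumes `pip --version` prints the version as its second field.

```shell
# Return success if version $1 is greater than or equal to version $2.
# version_ge is a hypothetical helper; it relies on `sort -V` (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: `pip --version` prints a line like "pip 25.1 from ...",
# so the second field is the version number.
if pip_version=$(python3 -m pip --version 2>/dev/null | awk '{print $2}') \
        && [ -n "$pip_version" ] && version_ge "$pip_version" 25.1; then
    echo "pip $pip_version is new enough"
else
    echo "pip ${pip_version:-not found}: upgrade with pip3 install --upgrade pip"
fi
```

Run it with the virtual environment activated so that `python3` resolves to the interpreter inside `.venv`.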

Using uv:

uv sync --dev --no-install-package ballista

Build and install#

maturin develop compiles the Rust extension and installs it into the active virtual environment as an editable package, so subsequent Python imports pick up your local changes.

Using pip:

maturin develop            # debug build
maturin develop --release  # release build (slower to compile, faster at runtime)

Using uv:

uv run --no-project maturin develop --uv

To produce a wheel without installing it (for example, to share or publish):

uv run --no-project maturin build --release --strip
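After the build finishes, you may want to locate the wheel that was produced. The sketch below assumes maturin's default output directory, `target/wheels` relative to where the build ran; `list_wheels` is a hypothetical helper, so pass a different directory if your setup writes wheels elsewhere.

```shell
# List wheel files in a directory, or report that none were found.
# list_wheels is a hypothetical helper; target/wheels is assumed to be
# maturin's default output directory.
list_wheels() {
    dir=${1:-target/wheels}
    found=1
    for wheel in "$dir"/*.whl; do
        # When the glob matches nothing it stays literal, so check the file exists.
        [ -e "$wheel" ] || continue
        echo "$wheel"
        found=0
    done
    [ "$found" -eq 0 ] || echo "no wheels found in $dir" >&2
    return "$found"
}

list_wheels target/wheels || true
```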

Run the tests#

Using pip:

python3 -m pytest

Using uv:

uv run --no-project pytest

For more detail on the underlying workflow, including tips for improving build speed, see the DataFusion Python contributor guide.

Benchmarking#

For performance testing and benchmarking with TPC-H and other datasets, see the benchmarks README.

This includes instructions for:

  • Generating TPC-H test data
  • Running benchmarks against DataFusion and Ballista
  • Comparing performance with Apache Spark
  • Running load tests