Ballista Development

We welcome participation from everyone and encourage you to join us, ask questions, and get involved.

All participation in the Apache DataFusion Ballista project is governed by the Apache Software Foundation’s code of conduct.

Development Environment

If you use VS Code or IntelliJ IDEA, the easiest way to get started is to open the provided Dev Container. A Dev Container is a development environment that runs in a Docker container. This one is configured with everything required to build and test the project, including Rust, Docker, Node.js, and Yarn, along with the Rust and Node.js extensions for VS Code. Other supporting tools that use Dev Containers can be seen here.

To use the Dev Container, open the project in VS Code and then click the “Reopen in Container” button in the bottom right corner of the IDE.

If you are not using the Dev Container or VS Code, you will need to install these dependencies yourself.

Build the project

From the root of the project, build release binaries.

cargo build --release

Testing the project

cargo test

Running the examples

cd examples
cargo run --example standalone_sql --features=ballista/standalone

Building the Python Client from Source

The Python client (ballista on PyPI) lives in the python/ directory and is versioned and released independently from the main Ballista project, so it is intentionally not part of the default Cargo workspace. Building it from source uses maturin to compile the Rust extension module and install it into a Python virtual environment.

Prerequisites

All commands in this section are run from the python/ directory:

cd python

Create a virtual environment

Using pip:

python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt

The pip upgrade is required because maturin develop invokes pip install --group, which needs pip 25.1 or newer. The pip shipped with python3 -m venv is often older than that.
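To confirm that the active virtual environment meets the pip requirement described above, a check along these lines may help. This is a minimal sketch, not part of Ballista or pip: `version_at_least` is a hypothetical helper that compares only the numeric major and minor parts, so it does not handle pre-release version strings.

```shell
# Hypothetical helper (not part of Ballista or pip): succeeds when version $1
# is at least version $2, comparing only the numeric major and minor parts.
version_at_least() {
    have=$1
    want=$2
    have_major=${have%%.*}
    want_major=${want%%.*}
    # Take the second dot-separated field as the minor version, defaulting to 0.
    case $have in
        *.*) have_minor=${have#*.}; have_minor=${have_minor%%.*} ;;
        *)   have_minor=0 ;;
    esac
    case $want in
        *.*) want_minor=${want#*.}; want_minor=${want_minor%%.*} ;;
        *)   want_minor=0 ;;
    esac
    if [ "$have_major" -ne "$want_major" ]; then
        [ "$have_major" -gt "$want_major" ]
        return
    fi
    [ "$have_minor" -ge "$want_minor" ]
}

# `pip --version` prints "pip <version> from <path> (python <x.y>)";
# the second field is the version number.
pip_version=$(python3 -m pip --version 2>/dev/null | awk '{print $2}')
if [ -z "$pip_version" ]; then
    echo "pip not found in this environment"
elif version_at_least "$pip_version" 25.1; then
    echo "pip $pip_version is new enough for maturin develop"
else
    echo "pip $pip_version is older than 25.1; run: pip3 install --upgrade pip"
fi
```

Run it with the virtual environment activated so that python3 resolves to the interpreter inside .venv.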

Using uv:

uv sync --dev --no-install-package ballista

Build and install

maturin develop compiles the Rust extension and installs it into the active virtual environment as an editable package, so subsequent Python imports pick up your local changes.

Using pip:

maturin develop            # debug build
maturin develop --release  # release build (slower to compile, faster at runtime)

Using uv:

uv run --no-project maturin develop --uv

To produce a wheel without installing it (for example, to share or publish):

uv run --no-project maturin build --release --strip
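After the build finishes, the wheel still has to be located before it can be shared. The sketch below assumes maturin's default output directory of target/wheels relative to python/; if the build is configured with a different target or output directory, adjust the path. `latest_wheel` is a hypothetical helper, not a maturin command.

```shell
# Hypothetical helper (not part of maturin): print the most recently
# modified wheel in directory $1, or nothing if the directory has none.
latest_wheel() {
    ls -t "$1"/*.whl 2>/dev/null | head -n 1
}

# Assumes the default maturin output directory, relative to python/.
wheel=$(latest_wheel target/wheels)
if [ -n "$wheel" ]; then
    echo "Built wheel: $wheel"
else
    echo "No wheel found in target/wheels"
fi
```

The printed path can then be passed to pip3 install in another virtual environment to try the wheel out.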

Run the tests

Using pip:

python3 -m pytest

Using uv:

uv run --no-project pytest

For more detail on the underlying workflow, including tips for improving build speed, see the DataFusion Python contributor guide.

Benchmarking

For performance testing and benchmarking with TPC-H and other datasets, see the benchmarks README.

This includes instructions for:

  • Generating TPC-H test data

  • Running benchmarks against DataFusion and Ballista

  • Comparing performance with Apache Spark

  • Running load tests