Ballista Development
We welcome participation from everyone and encourage you to join us, ask questions, and get involved.
All participation in the Apache DataFusion Ballista project is governed by the Apache Software Foundation’s code of conduct.
Development Environment
The easiest way to get started if you are using VS Code or IntelliJ IDEA is to open the provided Dev Container, which installs all the required dependencies, including Rust, Docker, Node.js and Yarn. A Dev Container is a development environment that runs in a Docker container and is configured with everything needed to build and test the project. It also includes the Rust and Node.js extensions for VS Code. Other supporting tools that use Dev Containers can be seen here.
To use the Dev Container, open the project in VS Code and then click the “Reopen in Container” button in the bottom right corner of the IDE.
If you are not using the Dev Container or VS Code, you will need to install the following dependencies yourself:
Protobuf Compiler is required to build the project.
Node.js is required to build the project.
Yarn is required to build the UI.
Docker is required to run the integration tests.
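When installing the dependencies by hand, a quick check that each tool is visible on your PATH can save a failed build. The following is a minimal sketch, not part of the project: the helper name check_tools is hypothetical, and the executable names (cargo, protoc, node, yarn, docker) are assumptions about how these tools are installed on your system.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are on PATH.
# Returns 0 when every tool is found, 1 when at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed executable names: cargo (Rust toolchain), protoc (Protobuf Compiler),
# node (Node.js), yarn, and docker.
check_tools cargo protoc node yarn docker || echo "Install the missing tools before building."
```

The check only confirms that a command exists; it does not verify versions or that the Docker daemon is running.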
Build the project
From the root of the project, build release binaries.
cargo build --release
Testing the project
cargo test
Running the examples
cd examples
cargo run --example standalone_sql --features=ballista/standalone
Building the Python Client from Source
The Python client (ballista on PyPI) lives in the python/ directory and is versioned and released
independently from the main Ballista project, so it is intentionally not part of the default Cargo
workspace. Building it from source uses maturin to compile the Rust
extension module and install it into a Python virtual environment.
Prerequisites
Python 3.10 or newer
The Rust toolchain (see Development Environment)
protoc on your PATH
All commands in this section are run from the python/ directory:
cd python
Create a virtual environment
Using pip:
python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
The pip upgrade is required because maturin develop invokes pip install --group, which
needs pip 25.1 or newer. The pip shipped with python3 -m venv is often older than that.
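To confirm that the upgrade took effect, you can compare the pip version in the active environment against the 25.1 minimum. This is a rough sketch under stated assumptions: version_ge is a hypothetical helper that compares only the major and minor components, and the parsing assumes the usual `pip X.Y.Z from ...` output of `pip --version`.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if dotted version $1 is >= $2.
# Only the first two components (major.minor) are compared.
version_ge() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        split(have, h, "."); split(need, n, ".")
        for (i = 1; i <= 2; i++) {
            if ((h[i] + 0) > (n[i] + 0)) exit 0
            if ((h[i] + 0) < (n[i] + 0)) exit 1
        }
        exit 0
    }'
}

# Assumes `pip --version` prints "pip X.Y.Z from ..."; field 2 is the version.
pip_version=$(python3 -m pip --version 2>/dev/null | awk '{print $2}' || true)

if [ -z "$pip_version" ]; then
    echo "pip not found in this environment"
elif version_ge "$pip_version" 25.1; then
    echo "pip $pip_version is new enough"
else
    echo "pip $pip_version is too old; run: pip3 install --upgrade pip"
fi
```

Run it with the virtual environment activated so that python3 resolves to the interpreter inside .venv.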
Using uv:
uv sync --dev --no-install-package ballista
Build and install
maturin develop compiles the Rust extension and installs it into the active virtual environment
as an editable package, so subsequent Python imports pick up your local changes.
Using pip:
maturin develop # debug build
maturin develop --release # release build (slower to compile, faster at runtime)
Using uv:
uv run --no-project maturin develop --uv
To produce a wheel without installing it (for example, to share or publish):
uv run --no-project maturin build --release --strip
Run the tests
Using pip:
python3 -m pytest
Using uv:
uv run --no-project pytest
For more detail on the underlying workflow, including tips for improving build speed, see the DataFusion Python contributor guide.
Benchmarking
For performance testing and benchmarking with TPC-H and other datasets, see the benchmarks README.
This includes instructions for:
Generating TPC-H test data
Running benchmarks against DataFusion and Ballista
Comparing performance with Apache Spark
Running load tests