Starting a Ballista Cluster using Docker Compose

Docker Compose is a convenient way to launch a cluster when testing locally.

Build Docker Images

Run the following commands to download the official Docker image:

docker pull ghcr.io/apache/datafusion-ballista-standalone:latest

Altenatively run the following commands to clone the source repository and build the Docker images from source:

git clone git@github.com:apache/datafusion-ballista.git -b latest
cd datafusion-ballista
./dev/build-ballista-docker.sh

This will create the following images:

  • apache/datafusion-ballista-benchmarks:latest

  • apache/datafusion-ballista-cli:latest

  • apache/datafusion-ballista-executor:latest

  • apache/datafusion-ballista-scheduler:latest

  • apache/datafusion-ballista-standalone:latest

Start a Cluster

Using the docker-compose.yml from the source repository, run the following command to start a cluster:

docker-compose up --build

This should show output similar to the following:

$ docker-compose up
Creating network "ballista-benchmarks_default" with the default driver
Creating ballista-benchmarks_etcd_1 ... done
Creating ballista-benchmarks_ballista-scheduler_1 ... done
Creating ballista-benchmarks_ballista-executor_1  ... done
Attaching to ballista-benchmarks_etcd_1, ballista-benchmarks_ballista-scheduler_1, ballista-benchmarks_ballista-executor_1
ballista-executor_1   | [2021-08-28T15:55:22Z INFO  ballista_executor] Running with config:
ballista-executor_1   | [2021-08-28T15:55:22Z INFO  ballista_executor] work_dir: /tmp/.tmpLVx39c
ballista-executor_1   | [2021-08-28T15:55:22Z INFO  ballista_executor] concurrent_tasks: 4
ballista-scheduler_1  | [2021-08-28T15:55:22Z INFO  ballista_scheduler] Ballista v0.12.0 Scheduler listening on 0.0.0.0:50050
ballista-executor_1   | [2021-08-28T15:55:22Z INFO  ballista_executor] Ballista v0.12.0 Rust Executor listening on 0.0.0.0:50051

The scheduler listens on port 50050 and this is the port that clients will need to connect to.

Connect from the Ballista CLI

docker run --network=host -it apache/datafusion-ballista-cli:latest --host localhost --port 50050