Starting a Ballista Cluster using Docker
Build Docker Images
Run the following command to download the official Docker image:
docker pull ghcr.io/apache/datafusion-ballista-standalone:latest
Alternatively, run the following commands to clone the source repository and build the Docker images from source:
git clone git@github.com:apache/datafusion-ballista.git
cd datafusion-ballista
./dev/build-ballista-docker.sh
This will create the following images:
apache/datafusion-ballista-benchmarks:latest
apache/datafusion-ballista-cli:latest
apache/datafusion-ballista-executor:latest
apache/datafusion-ballista-scheduler:latest
apache/datafusion-ballista-standalone:latest
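To confirm that the build produced all five images listed above, a check along the following lines may help. This is only a sketch: the function names ballista_images and missing_images are illustrative and not part of Ballista, and the check relies on nothing more than docker image inspect.

```shell
# Print the five image names that the build script is expected to create.
ballista_images() {
  for component in benchmarks cli executor scheduler standalone; do
    echo "apache/datafusion-ballista-${component}:latest"
  done
}

# Print every expected image that is not present in the local image store.
# Hypothetical helper; requires the docker CLI to be installed.
missing_images() {
  ballista_images | while read -r image; do
    docker image inspect "$image" >/dev/null 2>&1 || echo "$image"
  done
}
```

Calling missing_images after the build should print nothing if every image exists locally; any name it prints is an image that still needs to be built.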
Start a Cluster
Start a Scheduler
Start a scheduler using the following syntax:
docker run --network=host \
-d apache/datafusion-ballista-scheduler:latest \
--bind-port 50050
Run docker ps to check that the process is running:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a756055576f3 apache/datafusion-ballista-scheduler:latest "/root/scheduler-ent…" 8 seconds ago Up 8 seconds xenodochial_carson
Run docker logs CONTAINER_ID to check the output from the process:
$ docker logs a756055576f3
INFO ballista_scheduler::scheduler_process: Ballista v52.0.0 Scheduler listening on 0.0.0.0:50050
INFO ballista_scheduler::scheduler_process: Starting Scheduler grpc server with task scheduling policy of PullStaged
INFO ballista_scheduler::scheduler_server::query_stage_scheduler: Starting QueryStageScheduler
INFO ballista_core::event_loop: Starting the event loop query_stage
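When scripting the startup, it can be useful to wait until the scheduler reports that it is listening before launching executors. The sketch below polls docker logs for the "Scheduler listening on" text from the sample output above. The helper names scheduler_ready and wait_for_scheduler, and the default of 30 attempts, are assumptions made for illustration, and the exact log wording may differ between Ballista versions.

```shell
# Succeed if the log text on stdin contains the scheduler's "listening" line.
scheduler_ready() {
  grep -q "Scheduler listening on"
}

# Poll a container's logs once per second until the scheduler is listening.
# Hypothetical helper: $1 is the container ID, $2 the number of attempts
# (default 30). Returns 0 when ready and 1 if the attempts run out.
wait_for_scheduler() {
  container_id=$1
  attempts=${2:-30}
  while [ "$attempts" -gt 0 ]; do
    if docker logs "$container_id" 2>&1 | scheduler_ready; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}
```

For example, wait_for_scheduler a756055576f3 would return once the container shown above has logged its listening address.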
Start Executors
Start one or more executor processes. Each executor process will need to listen on a different port.
docker run --network=host \
-d apache/datafusion-ballista-executor:latest \
--external-host localhost --bind-port 50051
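Since each executor needs its own port, starting several of them means repeating the command above with a different --bind-port each time. The following sketch generates those commands. The helper name executor_commands is hypothetical, and the sketch assumes that --bind-port is the only setting that has to differ between executors on the same host; if your Ballista version binds any additional port per executor, that would need to vary as well.

```shell
# Print one `docker run` command per executor, each with a distinct
# --bind-port. $1 is the executor count, $2 the first port (default 50051).
# Hypothetical helper: it only prints the commands and does not run them.
executor_commands() {
  count=$1
  port=${2:-50051}
  i=0
  while [ "$i" -lt "$count" ]; do
    echo "docker run --network=host -d apache/datafusion-ballista-executor:latest --external-host localhost --bind-port $port"
    port=$((port + 1))
    i=$((i + 1))
  done
}
```

Running executor_commands 2 prints commands for ports 50051 and 50052. Printing rather than executing makes it easy to review the commands first; piping the output to sh would then start the containers.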
Use docker ps to check that both the scheduler and executor(s) are now running:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fb8b530cee6d apache/datafusion-ballista-executor:latest "/root/executor-entr…" 2 seconds ago Up 1 second gallant_galois
a756055576f3 apache/datafusion-ballista-scheduler:latest "/root/scheduler-ent…" 8 seconds ago Up 8 seconds xenodochial_carson
Use docker logs CONTAINER_ID to check the output from the executor(s):
$ docker logs fb8b530cee6d
INFO ballista_executor::executor_process: Running with config:
INFO ballista_executor::executor_process: work_dir: /tmp/.tmpAkP3pZ
INFO ballista_executor::executor_process: concurrent_tasks: 48
INFO ballista_executor::executor_process: Ballista v52.0.0 Rust Executor Flight Server listening on 0.0.0.0:50051
INFO ballista_executor::execution_loop: Starting poll work loop with scheduler
Connect from the CLI
docker run --network=host -it apache/datafusion-ballista-cli:latest --host localhost --port 50050