Deploying a standalone Ballista cluster using cargo install

Another simple way to start a local cluster for testing purposes is to use cargo to install the scheduler and executor crates.

cargo install --locked ballista-scheduler
cargo install --locked ballista-executor

With these crates installed, it is now possible to start a scheduler process.

RUST_LOG=info ballista-scheduler

The scheduler will bind to port 50050 by default.

Next, start an executor processes in a new terminal session.

RUST_LOG=info ballista-executor

The executor will bind to port 50051 by default. Additional executors can be started by manually specifying a bind port. For example:

RUST_LOG=info ballista-executor --bind-port 50052

Installing with Optional Features

Ballista supports optional features that can be enabled during installation using the --features flag.

Spark-Compatible Functions

To enable Spark-compatible scalar, aggregate, and window functions from the datafusion-spark crate:

# Install scheduler with spark-compat feature
cargo install --locked --features spark-compat ballista-scheduler

# Install executor with spark-compat feature
cargo install --locked --features spark-compat ballista-executor

# Install CLI with spark-compat feature
cargo install --locked --features spark-compat ballista-cli

When the spark-compat feature is enabled, additional functions like sha1, expm1, sha2, and others become available in SQL queries.

Note: The spark-compat feature provides Spark-compatible expressions and functions only, not full Apache Spark API compatibility.

For more details about Spark-compatible functions, see Spark-Compatible Functions.