Using FlightSQL to Connect to Ballista

One of the easiest ways to get started with Ballista is to plug it into your existing data infrastructure through its support for the Arrow Flight SQL JDBC driver.

Flight SQL support is an optional scheduler feature and must be enabled with the flight-sql feature.

Getting started involves these main steps:

  1. Install the prerequisites

  2. Run the Ballista docker container

  3. Download the Arrow Flight SQL JDBC Driver

  4. Install the driver into your favorite JDBC tool

  5. Run a “hello, world!” query

  6. Register a table and run more complicated queries

Prerequisites

Ubuntu

sudo apt-get update
sudo apt-get install -y docker.io

macOS

brew install docker

Windows

choco install docker-desktop

Run Docker Container

docker run -p 50050:50050 --rm ghcr.io/apache/datafusion-ballista-standalone:0.10.0

Download the FlightSQL JDBC Driver

Download the FlightSQL JDBC Driver from Maven Central.

Use the Driver in your Favorite Data Tool

The important pieces of information:

Key               Value
----------------  --------------------------------------------------
Driver file       flight-sql-jdbc-driver-10.0.0-SNAPSHOT.jar
Class Name        org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver
Authentication    User & Password
Username          admin
Password          password
Advanced Options  useEncryption=false
URL               jdbc:arrow-flight://127.0.0.1:50050

Run a “Hello, World!” Query

select 'Hello from DataFusion Ballista!' as greeting;
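
If you would rather connect from code than from a GUI tool, the same settings can be passed to plain JDBC. The following is a minimal sketch, not an official client: it assumes the driver jar is on the classpath, that the standalone container is listening on 127.0.0.1:50050, and that the driver accepts `user`, `password` and `useEncryption` as connection properties. The class name `BallistaHello` and its helper methods are hypothetical names chosen for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class BallistaHello {
    // Builds a JDBC URL in the same form as the URL setting above.
    static String jdbcUrl(String host, int port) {
        return "jdbc:arrow-flight://" + host + ":" + port;
    }

    // Collects the username, password and advanced option listed above.
    // The property names are an assumption about the driver's options.
    static Properties connectionProperties() {
        Properties props = new Properties();
        props.setProperty("user", "admin");
        props.setProperty("password", "password");
        props.setProperty("useEncryption", "false");
        return props;
    }

    public static void main(String[] args) {
        String url = jdbcUrl("127.0.0.1", 50050);
        try (Connection conn = DriverManager.getConnection(url, connectionProperties());
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "select 'Hello from DataFusion Ballista!' as greeting")) {
            while (rs.next()) {
                System.out.println(rs.getString("greeting"));
            }
        } catch (SQLException e) {
            // Reached when the driver jar is missing or the container is not running.
            System.err.println("Could not query " + url + ": " + e.getMessage());
        }
    }
}
```

Keeping the URL and properties in small helpers makes it easy to point the same code at a different host or port later.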

Run a Complex Query

To run queries against data, tables must first be “registered” with the current session. Registrations do not carry over between connections, so tables must be re-registered on each new connection.

To register the built-in demo table, use the syntax below:

create external table taxi stored as parquet location '/data/yellow_tripdata_2022-01.parquet';

Once the table has been registered, you can run ordinary SQL queries against it:

select * from taxi limit 10;
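
Because registration is tied to the session, a program that opens its own connection needs to register the table before querying it. The sketch below shows one way this might look over JDBC. It makes the same assumptions as any JDBC client of this setup (driver jar on the classpath, container running on 127.0.0.1:50050, `user`/`password`/`useEncryption` accepted as properties), and it further assumes that `Statement.execute` is suitable for the `create external table` statement with your driver version. `BallistaTaxiQuery` and `registerParquetSql` are hypothetical names.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class BallistaTaxiQuery {
    // Builds the registration statement shown above for a Parquet file.
    // No escaping is performed, so pass trusted values only.
    static String registerParquetSql(String table, String location) {
        return "create external table " + table
            + " stored as parquet location '" + location + "'";
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("user", "admin");
        props.setProperty("password", "password");
        props.setProperty("useEncryption", "false");

        String url = "jdbc:arrow-flight://127.0.0.1:50050";
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement()) {
            // Registration is per session, so it runs on every new connection.
            stmt.execute(registerParquetSql("taxi", "/data/yellow_tripdata_2022-01.parquet"));

            try (ResultSet rs = stmt.executeQuery("select * from taxi limit 10")) {
                ResultSetMetaData meta = rs.getMetaData();
                int columns = meta.getColumnCount();
                while (rs.next()) {
                    // Print each row as "column=value" pairs.
                    StringBuilder row = new StringBuilder();
                    for (int i = 1; i <= columns; i++) {
                        if (i > 1) row.append(", ");
                        row.append(meta.getColumnLabel(i)).append('=').append(rs.getObject(i));
                    }
                    System.out.println(row);
                }
            }
        } catch (SQLException e) {
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```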

🎉 Happy querying! 🎉