Using FlightSQL to Connect to Ballista
One of the easiest ways to start with Ballista is to plug it into your existing data infrastructure through its support for the Arrow Flight SQL JDBC driver.
This is an optional scheduler feature, which must be enabled with the flight-sql feature.
Getting started involves these main steps:
1. Run the Ballista docker container
2. Download the Arrow Flight SQL JDBC Driver
3. Install the driver into your favorite JDBC tool
4. Run a “hello, world!” query
5. Register a table and run more complicated queries
Prerequisites
Ubuntu
sudo apt-get update
sudo apt-get install -y docker.io
MacOS
brew install docker
Windows
choco install docker-desktop
Run Docker Container
docker run -p 50050:50050 --rm ghcr.io/apache/datafusion-ballista-standalone:0.10.0
Download the FlightSQL JDBC Driver
Download the FlightSQL JDBC Driver from Maven Central.
Use the Driver in your Favorite Data Tool
The important pieces of information:
| Key | Value |
| --- | --- |
| Driver file | flight-sql-jdbc-driver-10.0.0-SNAPSHOT.jar |
| Class Name | org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver |
| Authentication | User & Password |
| Username | admin |
| Password | password |
| Advanced Options | useEncryption=false |
| URL | jdbc:arrow-flight://127.0.0.1:50050 |
Run a “Hello, World!” Query
select 'Hello from DataFusion Ballista!' as greeting;
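If you prefer to check the connection from code rather than a GUI tool, the same settings can be passed through plain JDBC. The following is a minimal sketch, not an official client. It assumes the driver JAR is on the classpath, that the driver registers itself with DriverManager, and that it accepts user, password, and useEncryption as connection properties. The class and method names (BallistaHello, jdbcUrl, connectionProperties) are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

// Hypothetical helper class; not part of Ballista or the Arrow driver.
class BallistaHello {
    static final String HELLO_SQL =
        "select 'Hello from DataFusion Ballista!' as greeting";

    // Builds a URL in the same form as the URL row of the settings table.
    static String jdbcUrl(String host, int port) {
        return "jdbc:arrow-flight://" + host + ":" + port;
    }

    // Mirrors the Username, Password, and Advanced Options rows.
    // The property names are an assumption about the driver.
    static Properties connectionProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useEncryption", "false");
        return props;
    }

    public static void main(String[] args) throws SQLException {
        String url = jdbcUrl("127.0.0.1", 50050);
        Properties props = connectionProperties("admin", "password");
        // Requires the Ballista container to be running on port 50050.
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(HELLO_SQL)) {
            while (rs.next()) {
                System.out.println(rs.getString("greeting"));
            }
        }
    }
}
```

Run it with the driver JAR on the classpath, for example `java -cp flight-sql-jdbc-driver-10.0.0-SNAPSHOT.jar BallistaHello.java`, adjusting the JAR name to the file you downloaded.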
Run a Complex Query
In order to run queries against data, tables need to be “registered” with the current session (and re-registered upon each new connection).
To register the built-in demo table, use the syntax below:
create external table taxi stored as parquet location '/data/yellow_tripdata_2022-01.parquet';
Once the table has been registered, you can run ordinary SQL queries against it:
select * from taxi limit 10;
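Because registration is per session, a program has to issue the create external table statement on the same connection before it queries the table. The sketch below illustrates that ordering over JDBC. It assumes the driver JAR is on the classpath, that the container is running with the settings shown earlier, and that the driver accepts the DDL statement through Statement.execute. The class name BallistaTaxiQuery and the formatRow helper are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical example class; not part of Ballista or the Arrow driver.
class BallistaTaxiQuery {
    static final String URL = "jdbc:arrow-flight://127.0.0.1:50050";
    static final String REGISTER_SQL =
        "create external table taxi stored as parquet "
            + "location '/data/yellow_tripdata_2022-01.parquet'";
    static final String QUERY_SQL = "select * from taxi limit 10";

    // Joins the column values of one row with tabs for printing.
    static String formatRow(List<String> values) {
        return String.join("\t", values);
    }

    public static void main(String[] args) throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", "admin");
        props.setProperty("password", "password");
        props.setProperty("useEncryption", "false"); // assumed property name

        try (Connection conn = DriverManager.getConnection(URL, props);
             Statement stmt = conn.createStatement()) {
            // Register the table first: it is only visible to this session.
            stmt.execute(REGISTER_SQL);

            try (ResultSet rs = stmt.executeQuery(QUERY_SQL)) {
                int columns = rs.getMetaData().getColumnCount();
                while (rs.next()) {
                    List<String> values = new ArrayList<>();
                    for (int i = 1; i <= columns; i++) { // JDBC columns are 1-based
                        values.add(rs.getString(i));
                    }
                    System.out.println(formatRow(values));
                }
            }
        }
    }
}
```

Opening a second connection and running only the select would be expected to fail, since the taxi table would not be registered in that new session.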
🎉 Happy querying! 🎉