Apache DataFusion Ballista 0.9.0 Changelog

Full Changelog

Implemented enhancements:

  • Support count distinct aggregation function #411

  • Use multi-task definition in pull-based execution loop #400

  • Make the scheduler event loop buffer size configurable #397

  • Remove active execution graph when the related job is successful or failed. #391

  • Improve launch task efficiency by calling LaunchMultiTask #389

  • Use tokio::sync::Semaphore to wait for available task slots #388

  • stdout and file log level settings are inconsistent #385

  • Use dedicated executor in pull based loop #383

  • Avoid calling scheduler when the executor cannot accept new tasks #377

  • Add round robin executor slots reservation policy for the scheduler to evenly assign tasks to executors #371

  • Switch to mimalloc and enable by default #369

  • Integration test script should use docker-compose #364

  • Use local shuffle reader in containerized environments #356

  • Add --ext option to benchmark #352

  • Add job cancel in the UI #350

  • Use local shuffle reader to avoid Flight RPC calls #346

  • Add a Helm Chart #321

  • [UI] Show list of query stages with metrics #306

  • [UI] Add ability to specify job name and have it show in the job listing page in the UI #277

  • [UI] Add ability to download query plans in dot format #276

  • [UI] Add ability to render query plans #275

  • Add REST API documentation to User Guide #272

  • Graceful shutdown: Handle SIGTERM #266

  • [EPIC] Scheduler UI #265

  • Introduce the datafusion-objectstore-hdfs in datafusion-contrib as an object store feature #259

  • Add a feature based object store provider #257

  • Add docker build files #248

  • Allow IDEs to recognize generated code #246

  • Add user guide section on Flight SQL support #230

  • dev/release/README.md is outdated #228

  • Make ShuffleReaderExec output less verbose #211

  • Add LaunchMultiTask rpc interface for executor #209

  • Make executor fetch shuffle partition data in parallel #208

  • Concurrency control and rate limit during shuffle reader #195

  • Update User Guide #160

  • Ballista 0.8.0 Release #159

  • Save encoded execution plan in the ExecutionStage to reduce cost of task serialization and deserialization #142

  • Failed task retry #140

  • Redefine the executor task slots #132

  • Use ArrowFlight bearer token auth to create a session key for FlightSql clients #112

  • Leverage Atomic for the in-memory states in Scheduler #101

  • Introduce the object stores in datafusion-contrib as optional features #87

  • Support multiple paths for ListingTableScanNode #75

  • Need clean up intermediate data in Ballista #9

  • Ballista does not support external file systems #10

Fixed bugs:

  • Build errors in ./dev/build-ballista-rust.sh #407

  • The Ballista Scheduler Dockerfile copies a file that no longer exists #402

  • Benchmark q20 fails #374

  • Integration tests fail #360

  • Helm deploy fails #344

  • Executor gets stopped unexpectedly #333

  • Executor poll work loop failure #311

  • Queries with LIMIT are failing with “PhysicalExtensionCodec is not provided” #300

  • Schema inference does not work in Ballista-cli with a remote context #287

  • There are bugs in the yarn build github misses but break our internal build #270

  • Race condition running docker-compose #267

  • Scheduler UI not working in Docker image #250

  • Use bind host rather than the external host for starting a local executor service #244

  • Initial query stages read parquet files and repartition them needlessly #243

  • Cannot build Docker images on macOS 12.5.1 with M1 chip #234

  • CLI uses DataFusion context if no host or port are provided #219

  • Unsupported binary operator StringConcat #201

  • Ballista assumes all aggregate expressions are not DISTINCT #5

  • Ballista UI started with Docker cannot find the Ballista scheduler #11

  • Cannot build Ballista docker images on Apple silicon #17

Documentation updates:

Closed issues:

  • Automatic version updates for github actions with dependabot #127

Merged pull requests: