Apache DataFusion Ballista 0.9.0 Changelog
Implemented enhancements:
Support count distinct aggregation function #411
Use multi-task definition in pull-based execution loop #400
Make the scheduler event loop buffer size configurable #397
Remove active execution graph when the related job is successful or failed. #391
Improve launch task efficiency by calling LaunchMultiTask #389
Use tokio::sync::Semaphore to wait for available task slots #388
stdout and file log level settings are inconsistent #385
Use dedicated executor in pull based loop #383
Avoid calling scheduler when the executor cannot accept new tasks #377
Add round robin executor slots reservation policy for the scheduler to evenly assign tasks to executors #371
Switch to mimalloc and enable by default #369
Integration test script should use docker-compose #364
Use local shuffle reader in containerized environments #356
Add --ext option to benchmark #352
Add job cancel in the UI #350
Using local shuffle reader avoid flight rpc call. #346
Add a Helm Chart #321
[UI] Show list of query stages with metrics #306
[UI] Add ability to specify job name and have it show in the job listing page in the UI #277
[UI] Add ability to download query plans in dot format #276
[UI] Add ability to render query plans #275
Add REST API documentation to User Guide #272
Graceful shutdown: Handle SIGTERM #266
[EPIC] Scheduler UI #265
Introduce the datafusion-objectstore-hdfs in datafusion-contrib as an object store feature #259
Add a feature based object store provider #257
Add docker build files #248
Allow IDEs to recognize generated code #246
Add user guide section on Flight SQL support #230
dev/release/README.md is outdated #228
Make ShuffleReaderExec output less verbose #211
Add LaunchMultiTask rpc interface for executor #209
Make executor fetch shuffle partition data in parallel #208
Concurrency control and rate limit during shuffle reader #195
Update User Guide #160
Ballista 0.8.0 Release #159
Save encoded execution plan in the ExecutionStage to reduce cost of task serialization and deserialization #142
Failed task retry #140
Redefine the executor task slots #132
Use ArrowFlight bearer token auth to create a session key for FlightSql clients #112
Leverage Atomic for the in-memory states in Scheduler #101
Introduce the object stores in datafusion-contrib as optional features #87
Support multiple paths for ListingTableScanNode #75
Need clean up intermediate data in Ballista #9
Ballista does not support external file systems #10
Fixed bugs:
Build errors in ./dev/build-ballista-rust.sh #407
The Ballista Scheduler Dockerfile copies a file that no longer exists #402
Benchmark q20 fails #374
Integration tests fail #360
Helm deploy fails #344
Executor get stopped unexpected #333
Executor poll work loop failure #311
Queries with LIMIT are failing with “PhysicalExtensionCodec is not provided” #300
Schema inference does not work in Ballista-cli with a remote context #287
There are bugs in the yarn build github misses but break our internal build #270
Race condition running docker-compose #267
Scheduler UI not working in Docker image #250
Use bind host rather than the external host for starting a local executor service #244
Initial query stages read parquet files and repartition them needlessly #243
Cannot build Docker images on macOS 12.5.1 with M1 chip #234
CLI uses DataFusion context if no host or port are provided #219
Unsupported binary operator StringConcat #201
Ballista assumes all aggregate expressions are not DISTINCT #5
Start ballista ui with docker, but it can not found ballista scheduler #11
Cannot build Ballista docker images on Apple silicon #17
Documentation updates:
Closed issues:
Automatic version updates for github actions with dependabot #127
Merged pull requests:
Return multiple tasks in poll_work based on free slots #429 (Dandandan)
Run integration tests as part of release verification script #426 (andygrove)
Bump actions/setup-node from 2 to 3 #424 (dependabot[bot])
Bump actions/setup-python from 2 to 4 #423 (dependabot[bot])
Bump actions/checkout from 2 to 3 #422 (dependabot[bot])
Bump actions/download-artifact from 2 to 3 #421 (dependabot[bot])
Bump actions/upload-artifact from 2 to 3 #420 (dependabot[bot])
Use local shuffle reader in containerized environments and some impro… #399 (Ted-Jiang)
Make the scheduler event loop buffer size configurable #398 (yahoNanJing)
Add RoundRobinLocal slots policy for caching executor data to avoid sled persistency #396 (yahoNanJing)
Add round robin executor slots reservation policy for the scheduler to evenly assign tasks to executors #395 (yahoNanJing)
Improve launch task efficiency by calling LaunchMultiTask #394 (yahoNanJing)
Cache encoded stage plan #393 (yahoNanJing)
Remove active execution graph when the related job is successful or failed #392 (yahoNanJing)
Update flatbuffers requirement from 2.1.2 to 22.9.29 #390 (dependabot[bot])
Avoid calling scheduler when the executor cannot accept new tasks #378 (Dandandan)
Switch to mimalloc and enable by default in executor #370 (Dandandan)
Benchmark looks for path with and without extension #354 (andygrove)
Using local shuffle reader avoid flight rpc call. #347 (Ted-Jiang)
Make helm deployable #345 (avantgardnerio)
Check executor id consistency when receive stop executor request #335 (yahoNanJing)
Downgrade docker-compose.yaml to version 3.3 so that we can support Ubuntu 20.04.4 LTS #329 (andygrove)
Dependabot stop suggesting arrow and datafusion updates #324 (andygrove)
Show job stages metrics #323 (onthebridgetonowhere)
Add helm chart #322 (avantgardnerio)
Atomic support for enhancement #319 (metesynnada)
Allow automatic schema inference when registering csv #313 (r4ntix)
Add ability to specify job name and have it show in the job listing page in the UI #312 (andygrove)
Add REST API to generate DOT graph for individual query stage #310 (andygrove)
[UI] Use tabbed pane with Queries and Executors tabs #309 (andygrove)
Add support for SortPreservingMergeExec; fix LIMIT bug #304 (andygrove)
[UI] Add ability to view query plans directly in the UI #301 (onthebridgetonowhere)
Replace function from_proto_binary_op from upstream #298 (askoa)
Fix dead link in contribution guideline readme file #297 (onthebridgetonowhere)
UI code cleanup #291 (KenSuenobu)
Fix documentation example #288 (onthebridgetonowhere)
Enabled download of dot files from Download icon #279 (KenSuenobu)
Also run yarn build to catch JavaScript errors in CI #271 (avantgardnerio)
Store sessions so users can register tables and query them through flight #269 (avantgardnerio)
Fix compose for Ian #268 (avantgardnerio)
Introduce the datafusion-objectstore-hdfs in datafusion-contrib as an object store feature #260 (yahoNanJing)
Add a feature based object store provider #258 (yahoNanJing)
Make fetch shuffle partition data in parallel #256 (yahoNanJing)
Add LaunchMultiTask rpc interface for executor #255 (yahoNanJing)
CLI uses ballista context instead of datafusion context in local mode #252 (r4ntix)
Generate into source folder to make IDEs happy #247 (avantgardnerio)
Use bind host rather than the external host for starting a local executor service #245 (yahoNanJing)
Add REST endpoint to get DOT graph of a job #242 (andygrove)
Clean up job data on both Scheduler and Executor #188 (mingmwang)
Update etcd-client requirement from 0.9 to 0.10 #111 (dependabot[bot])
Bump terser from 4.8.0 to 4.8.1 in /ballista/ui/scheduler #91 (dependabot[bot])
Bump jsdom from 16.4.0 to 16.7.0 in /ballista/ui/scheduler #74 (dependabot[bot])
Bump numpy from 1.21.3 to 1.22.0 in /python #72 (dependabot[bot])