# Apache DataFusion Ballista 0.9.0 Changelog

[Full Changelog](https://github.com/apache/datafusion-ballista/compare/0.8.0...0.9.0)

**Implemented enhancements:**

- Support count distinct aggregation function [\#411](https://github.com/apache/datafusion-ballista/issues/411)
- Use multi-task definition in pull-based execution loop [\#400](https://github.com/apache/datafusion-ballista/issues/400)
- Make the scheduler event loop buffer size configurable [\#397](https://github.com/apache/datafusion-ballista/issues/397)
- Remove active execution graph when the related job is successful or failed. [\#391](https://github.com/apache/datafusion-ballista/issues/391)
- Improve launch task efficiency by calling LaunchMultiTask [\#389](https://github.com/apache/datafusion-ballista/issues/389)
- Use `tokio::sync::Semaphore` to wait for available task slots [\#388](https://github.com/apache/datafusion-ballista/issues/388)
- stdout and file log level settings are inconsistent [\#385](https://github.com/apache/datafusion-ballista/issues/385)
- Use dedicated executor in pull based loop [\#383](https://github.com/apache/datafusion-ballista/issues/383)
- Avoid calling scheduler when the executor cannot accept new tasks [\#377](https://github.com/apache/datafusion-ballista/issues/377)
- Add round robin executor slots reservation policy for the scheduler to evenly assign tasks to executors [\#371](https://github.com/apache/datafusion-ballista/issues/371)
- Switch to mimalloc and enable by default [\#369](https://github.com/apache/datafusion-ballista/issues/369)
- Integration test script should use docker-compose [\#364](https://github.com/apache/datafusion-ballista/issues/364)
- Use local shuffle reader in containerized environments [\#356](https://github.com/apache/datafusion-ballista/issues/356)
- Add `--ext` option to benchmark [\#352](https://github.com/apache/datafusion-ballista/issues/352)
- Add job cancel in the UI [\#350](https://github.com/apache/datafusion-ballista/issues/350)
- Using local shuffle reader avoid flight rpc call. [\#346](https://github.com/apache/datafusion-ballista/issues/346)
- Add a Helm Chart [\#321](https://github.com/apache/datafusion-ballista/issues/321)
- \[UI\] Show list of query stages with metrics [\#306](https://github.com/apache/datafusion-ballista/issues/306)
- \[UI\] Add ability to specify job name and have it show in the job listing page in the UI [\#277](https://github.com/apache/datafusion-ballista/issues/277)
- \[UI\] Add ability to download query plans in dot format [\#276](https://github.com/apache/datafusion-ballista/issues/276)
- \[UI\] Add ability to render query plans [\#275](https://github.com/apache/datafusion-ballista/issues/275)
- Add REST API documentation to User Guide [\#272](https://github.com/apache/datafusion-ballista/issues/272)
- Graceful shutdown: Handle `SIGTERM` [\#266](https://github.com/apache/datafusion-ballista/issues/266)
- \[EPIC\] Scheduler UI [\#265](https://github.com/apache/datafusion-ballista/issues/265)
- Introduce the datafusion-objectstore-hdfs in datafusion-contrib as an object store feature [\#259](https://github.com/apache/datafusion-ballista/issues/259)
- Add a feature based object store provider [\#257](https://github.com/apache/datafusion-ballista/issues/257)
- Add docker build files [\#248](https://github.com/apache/datafusion-ballista/issues/248)
- Allow IDEs to recognize generated code [\#246](https://github.com/apache/datafusion-ballista/issues/246)
- Add user guide section on Flight SQL support [\#230](https://github.com/apache/datafusion-ballista/issues/230)
- `dev/release/README.md` is outdated [\#228](https://github.com/apache/datafusion-ballista/issues/228)
- Make ShuffleReaderExec output less verbose [\#211](https://github.com/apache/datafusion-ballista/issues/211)
- Add LaunchMultiTask rpc interface for executor [\#209](https://github.com/apache/datafusion-ballista/issues/209)
- Make executor fetch shuffle partition data in parallel [\#208](https://github.com/apache/datafusion-ballista/issues/208)
- Concurrency control and rate limit during shuffle reader [\#195](https://github.com/apache/datafusion-ballista/issues/195)
- Update User Guide [\#160](https://github.com/apache/datafusion-ballista/issues/160)
- Ballista 0.8.0 Release [\#159](https://github.com/apache/datafusion-ballista/issues/159)
- Save encoded execution plan in the ExecutionStage to reduce cost of task serialization and deserialization [\#142](https://github.com/apache/datafusion-ballista/issues/142)
- Failed task retry [\#140](https://github.com/apache/datafusion-ballista/issues/140)
- Redefine the executor task slots [\#132](https://github.com/apache/datafusion-ballista/issues/132)
- Use ArrowFlight bearer token auth to create a session key for FlightSql clients [\#112](https://github.com/apache/datafusion-ballista/issues/112)
- Leverage Atomic for the in-memory states in Scheduler [\#101](https://github.com/apache/datafusion-ballista/issues/101)
- Introduce the object stores in datafusion-contrib as optional features [\#87](https://github.com/apache/datafusion-ballista/issues/87)
- Support multiple paths for ListingTableScanNode [\#75](https://github.com/apache/datafusion-ballista/issues/75)
- Need clean up intermediate data in Ballista [\#9](https://github.com/apache/datafusion-ballista/issues/9)
- Ballista does not support external file systems [\#10](https://github.com/apache/datafusion-ballista/issues/10)

**Fixed bugs:**

- Build errors in ./dev/build-ballista-rust.sh [\#407](https://github.com/apache/datafusion-ballista/issues/407)
- The Ballista Scheduler Dockerfile copies a file that no longer exists [\#402](https://github.com/apache/datafusion-ballista/issues/402)
- Benchmark q20 fails [\#374](https://github.com/apache/datafusion-ballista/issues/374)
- Integration tests fail [\#360](https://github.com/apache/datafusion-ballista/issues/360)
- Helm deploy fails [\#344](https://github.com/apache/datafusion-ballista/issues/344)
- Executor get stopped unexpected [\#333](https://github.com/apache/datafusion-ballista/issues/333)
- Executor poll work loop failure [\#311](https://github.com/apache/datafusion-ballista/issues/311)
- Queries with `LIMIT` are failing with "PhysicalExtensionCodec is not provided" [\#300](https://github.com/apache/datafusion-ballista/issues/300)
- Schema inference does not work in Ballista-cli with a remote context [\#287](https://github.com/apache/datafusion-ballista/issues/287)
- There are bugs in the yarn build github misses but break our internal build [\#270](https://github.com/apache/datafusion-ballista/issues/270)
- Race condition running docker-compose [\#267](https://github.com/apache/datafusion-ballista/issues/267)
- Scheduler UI not working in Docker image [\#250](https://github.com/apache/datafusion-ballista/issues/250)
- Use bind host rather than the external host for starting a local executor service [\#244](https://github.com/apache/datafusion-ballista/issues/244)
- Initial query stages read parquet files and repartition them needlessly [\#243](https://github.com/apache/datafusion-ballista/issues/243)
- Cannot build Docker images on macOS 12.5.1 with M1 chip [\#234](https://github.com/apache/datafusion-ballista/issues/234)
- CLI uses DataFusion context if no host or port are provided [\#219](https://github.com/apache/datafusion-ballista/issues/219)
- Unsupported binary operator `StringConcat` [\#201](https://github.com/apache/datafusion-ballista/issues/201)
- Ballista assumes all aggregate expressions are not DISTINCT [\#5](https://github.com/apache/datafusion-ballista/issues/5)
- Start ballista ui with docker, but it can not found ballista scheduler [\#11](https://github.com/apache/datafusion-ballista/issues/11)
- Cannot build Ballista docker images on Apple silicon [\#17](https://github.com/apache/datafusion-ballista/issues/17)

**Documentation updates:**

- Fixup links in README.md [\#366](https://github.com/apache/datafusion-ballista/pull/366) ([romanz](https://github.com/romanz))
- Update README in preparation for 0.9.0 release [\#318](https://github.com/apache/datafusion-ballista/pull/318) ([andygrove](https://github.com/andygrove))
- User Guide improvements [\#274](https://github.com/apache/datafusion-ballista/pull/274) ([andygrove](https://github.com/andygrove))

**Closed issues:**

- Automatic version updates for github actions with dependabot [\#127](https://github.com/apache/datafusion-ballista/issues/127)

**Merged pull requests:**

- Return multiple tasks in poll_work based on free slots [\#429](https://github.com/apache/datafusion-ballista/pull/429) ([Dandandan](https://github.com/Dandandan))
- Run integration tests as part of release verification script [\#426](https://github.com/apache/datafusion-ballista/pull/426) ([andygrove](https://github.com/andygrove))
- Bump actions/setup-node from 2 to 3 [\#424](https://github.com/apache/datafusion-ballista/pull/424) ([dependabot[bot]](https://github.com/apps/dependabot))
- Bump actions/setup-python from 2 to 4 [\#423](https://github.com/apache/datafusion-ballista/pull/423) ([dependabot[bot]](https://github.com/apps/dependabot))
- Bump actions/checkout from 2 to 3 [\#422](https://github.com/apache/datafusion-ballista/pull/422) ([dependabot[bot]](https://github.com/apps/dependabot))
- Bump actions/download-artifact from 2 to 3 [\#421](https://github.com/apache/datafusion-ballista/pull/421) ([dependabot[bot]](https://github.com/apps/dependabot))
- Bump actions/upload-artifact from 2 to 3 [\#420](https://github.com/apache/datafusion-ballista/pull/420) ([dependabot[bot]](https://github.com/apps/dependabot))
- MINOR: Fix yarn warnings [\#415](https://github.com/apache/datafusion-ballista/pull/415) ([andygrove](https://github.com/andygrove))
- Fix q20 sql typo in benchmarks [\#409](https://github.com/apache/datafusion-ballista/pull/409) ([r4ntix](https://github.com/r4ntix))
- MINOR: Add notes on Apache Reporter [\#401](https://github.com/apache/datafusion-ballista/pull/401) ([andygrove](https://github.com/andygrove))
- Use local shuffle reader in containerized environments and some impro… [\#399](https://github.com/apache/datafusion-ballista/pull/399) ([Ted-Jiang](https://github.com/Ted-Jiang))
- Make the scheduler event loop buffer size configurable [\#398](https://github.com/apache/datafusion-ballista/pull/398) ([yahoNanJing](https://github.com/yahoNanJing))
- Add RoundRobinLocal slots policy for caching executor data to avoid seld persistency [\#396](https://github.com/apache/datafusion-ballista/pull/396) ([yahoNanJing](https://github.com/yahoNanJing))
- Add round robin executor slots reservation policy for the scheduler to evenly assign tasks to executors [\#395](https://github.com/apache/datafusion-ballista/pull/395) ([yahoNanJing](https://github.com/yahoNanJing))
- Improve launch task efficiency by calling LaunchMultiTask [\#394](https://github.com/apache/datafusion-ballista/pull/394) ([yahoNanJing](https://github.com/yahoNanJing))
- Cache encoded stage plan [\#393](https://github.com/apache/datafusion-ballista/pull/393) ([yahoNanJing](https://github.com/yahoNanJing))
- Remove active execution graph when the related job is successful or failed [\#392](https://github.com/apache/datafusion-ballista/pull/392) ([yahoNanJing](https://github.com/yahoNanJing))
- Update flatbuffers requirement from 2.1.2 to 22.9.29 [\#390](https://github.com/apache/datafusion-ballista/pull/390) ([dependabot[bot]](https://github.com/apps/dependabot))
- Unified the log level configuration behavior [\#386](https://github.com/apache/datafusion-ballista/pull/386) ([r4ntix](https://github.com/r4ntix))
- Add DistinctCount support [\#384](https://github.com/apache/datafusion-ballista/pull/384) ([r4ntix](https://github.com/r4ntix))
- Pull-based execution loop improvements [\#380](https://github.com/apache/datafusion-ballista/pull/380) ([Dandandan](https://github.com/Dandandan))
- Fix latest commit [\#379](https://github.com/apache/datafusion-ballista/pull/379) ([Dandandan](https://github.com/Dandandan))
- Avoid calling scheduler when the executor cannot accept new tasks [\#378](https://github.com/apache/datafusion-ballista/pull/378) ([Dandandan](https://github.com/Dandandan))
- Switch to mimalloc and enable by default in executor [\#370](https://github.com/apache/datafusion-ballista/pull/370) ([Dandandan](https://github.com/Dandandan))
- Benchmark looks for path with and without extension [\#354](https://github.com/apache/datafusion-ballista/pull/354) ([andygrove](https://github.com/andygrove))
- Implement job cancellation in UI [\#349](https://github.com/apache/datafusion-ballista/pull/349) ([Dandandan](https://github.com/Dandandan))
- Using local shuffle reader avoid flight rpc call. [\#347](https://github.com/apache/datafusion-ballista/pull/347) ([Ted-Jiang](https://github.com/Ted-Jiang))
- Make helm deployable [\#345](https://github.com/apache/datafusion-ballista/pull/345) ([avantgardnerio](https://github.com/avantgardnerio))
- Benchmark & UI improvements [\#343](https://github.com/apache/datafusion-ballista/pull/343) ([andygrove](https://github.com/andygrove))
- Add `cancel_job` REST API [\#340](https://github.com/apache/datafusion-ballista/pull/340) ([tfeda](https://github.com/tfeda))
- Fix labeler [\#337](https://github.com/apache/datafusion-ballista/pull/337) ([andygrove](https://github.com/andygrove))
- Upgrade to DataFusion 13.0.0 [\#336](https://github.com/apache/datafusion-ballista/pull/336) ([andygrove](https://github.com/andygrove))
- Check executor id consistency when receive stop executor request [\#335](https://github.com/apache/datafusion-ballista/pull/335) ([yahoNanJing](https://github.com/yahoNanJing))
- Enable more benchmark serde tests [\#331](https://github.com/apache/datafusion-ballista/pull/331) ([andygrove](https://github.com/andygrove))
- Downgrade `docker-compose.yaml` to version 3.3 so that we can support Ubuntu 20.04.4 LTS [\#329](https://github.com/apache/datafusion-ballista/pull/329) ([andygrove](https://github.com/andygrove))
- update labeler [\#326](https://github.com/apache/datafusion-ballista/pull/326) ([andygrove](https://github.com/andygrove))
- Upgrade to DataFusion 13.0.0-rc1 [\#325](https://github.com/apache/datafusion-ballista/pull/325) ([andygrove](https://github.com/andygrove))
- Dependabot stop suggesting arrow and datafusion updates [\#324](https://github.com/apache/datafusion-ballista/pull/324) ([andygrove](https://github.com/andygrove))
- Show job stages metrics [\#323](https://github.com/apache/datafusion-ballista/pull/323) ([onthebridgetonowhere](https://github.com/onthebridgetonowhere))
- Add helm chart [\#322](https://github.com/apache/datafusion-ballista/pull/322) ([avantgardnerio](https://github.com/avantgardnerio))
- Atomic support for enhancement [\#319](https://github.com/apache/datafusion-ballista/pull/319) ([metesynnada](https://github.com/metesynnada))
- Allow automatic schema inference when registering csv [\#313](https://github.com/apache/datafusion-ballista/pull/313) ([r4ntix](https://github.com/r4ntix))
- Add ability to specify job name and have it show in the job listing page in the UI [\#312](https://github.com/apache/datafusion-ballista/pull/312) ([andygrove](https://github.com/andygrove))
- Add REST API to generate DOT graph for individual query stage [\#310](https://github.com/apache/datafusion-ballista/pull/310) ([andygrove](https://github.com/andygrove))
- \[UI\] Use tabbed pane with Queries and Executors tabs [\#309](https://github.com/apache/datafusion-ballista/pull/309) ([andygrove](https://github.com/andygrove))
- REST API to get query stages [\#305](https://github.com/apache/datafusion-ballista/pull/305) ([andygrove](https://github.com/andygrove))
- Add support for SortPreservingMergeExec; fix LIMIT bug [\#304](https://github.com/apache/datafusion-ballista/pull/304) ([andygrove](https://github.com/andygrove))
- Add Python script to run benchmarks [\#302](https://github.com/apache/datafusion-ballista/pull/302) ([andygrove](https://github.com/andygrove))
- \[UI\] Add ability to view query plans directly in the UI [\#301](https://github.com/apache/datafusion-ballista/pull/301) ([onthebridgetonowhere](https://github.com/onthebridgetonowhere))
- Update datafusion.proto [\#299](https://github.com/apache/datafusion-ballista/pull/299) ([andygrove](https://github.com/andygrove))
- Replace function `from_proto_binary_op` from upstream [\#298](https://github.com/apache/datafusion-ballista/pull/298) ([askoa](https://github.com/askoa))
- Fix dead link in contribution guideline readme file [\#297](https://github.com/apache/datafusion-ballista/pull/297) ([onthebridgetonowhere](https://github.com/onthebridgetonowhere))
- UI code cleanup [\#291](https://github.com/apache/datafusion-ballista/pull/291) ([KenSuenobu](https://github.com/KenSuenobu))
- Add support for S3 data sources [\#290](https://github.com/apache/datafusion-ballista/pull/290) ([andygrove](https://github.com/andygrove))
- Use latest datafusion [\#289](https://github.com/apache/datafusion-ballista/pull/289) ([andygrove](https://github.com/andygrove))
- Fix documentation example [\#288](https://github.com/apache/datafusion-ballista/pull/288) ([onthebridgetonowhere](https://github.com/onthebridgetonowhere))
- Improve formatting of job status in UI [\#286](https://github.com/apache/datafusion-ballista/pull/286) ([andygrove](https://github.com/andygrove))
- Enabled download of dot files from Download icon [\#279](https://github.com/apache/datafusion-ballista/pull/279) ([KenSuenobu](https://github.com/KenSuenobu))
- Executor graceful shutdown: Handle SIGTERM [\#278](https://github.com/apache/datafusion-ballista/pull/278) ([mingmwang](https://github.com/mingmwang))
- Also run yarn build to catch JavaScript errors in CI [\#271](https://github.com/apache/datafusion-ballista/pull/271) ([avantgardnerio](https://github.com/avantgardnerio))
- Store sessions so users can register tables and query them through flight [\#269](https://github.com/apache/datafusion-ballista/pull/269) ([avantgardnerio](https://github.com/avantgardnerio))
- Fix compose for Ian [\#268](https://github.com/apache/datafusion-ballista/pull/268) ([avantgardnerio](https://github.com/avantgardnerio))
- Task level retry and Stage level retry [\#261](https://github.com/apache/datafusion-ballista/pull/261) ([mingmwang](https://github.com/mingmwang))
- Introduce the datafusion-objectstore-hdfs in datafusion-contrib as an object store feature [\#260](https://github.com/apache/datafusion-ballista/pull/260) ([yahoNanJing](https://github.com/yahoNanJing))
- Add a feature based object store provider [\#258](https://github.com/apache/datafusion-ballista/pull/258) ([yahoNanJing](https://github.com/yahoNanJing))
- Make fetch shuffle partition data in parallel [\#256](https://github.com/apache/datafusion-ballista/pull/256) ([yahoNanJing](https://github.com/yahoNanJing))
- Add LaunchMultiTask rpc interface for executor [\#255](https://github.com/apache/datafusion-ballista/pull/255) ([yahoNanJing](https://github.com/yahoNanJing))
- CLI uses ballista context instead of datafusion context in local mode [\#252](https://github.com/apache/datafusion-ballista/pull/252) ([r4ntix](https://github.com/r4ntix))
- Fix Scheduler UI in Docker image [\#251](https://github.com/apache/datafusion-ballista/pull/251) ([andygrove](https://github.com/andygrove))
- Generate into source folder to make IDEs happy [\#247](https://github.com/apache/datafusion-ballista/pull/247) ([avantgardnerio](https://github.com/avantgardnerio))
- Use bind host rather than the external host for starting a local executor service [\#245](https://github.com/apache/datafusion-ballista/pull/245) ([yahoNanJing](https://github.com/yahoNanJing))
- Add REST endpoint to get DOT graph of a job [\#242](https://github.com/apache/datafusion-ballista/pull/242) ([andygrove](https://github.com/andygrove))
- Add list of jobs to scheduler UI [\#241](https://github.com/apache/datafusion-ballista/pull/241) ([andygrove](https://github.com/andygrove))
- Clean up job data on both Scheduler and Executor [\#188](https://github.com/apache/datafusion-ballista/pull/188) ([mingmwang](https://github.com/mingmwang))
- Update etcd-client requirement from 0.9 to 0.10 [\#111](https://github.com/apache/datafusion-ballista/pull/111) ([dependabot[bot]](https://github.com/apps/dependabot))
- Bump terser from 4.8.0 to 4.8.1 in /ballista/ui/scheduler [\#91](https://github.com/apache/datafusion-ballista/pull/91) ([dependabot[bot]](https://github.com/apps/dependabot))
- Bump jsdom from 16.4.0 to 16.7.0 in /ballista/ui/scheduler [\#74](https://github.com/apache/datafusion-ballista/pull/74) ([dependabot[bot]](https://github.com/apps/dependabot))
- Bump numpy from 1.21.3 to 1.22.0 in /python [\#72](https://github.com/apache/datafusion-ballista/pull/72) ([dependabot[bot]](https://github.com/apps/dependabot))