Apache DataFusion Ballista 0.12.0 Changelog

Full Changelog

Documentation updates:

  • docs: fix link #799 (haoxins)

Merged pull requests:

  • [minor] remove outdated todo #683 (Ted-Jiang)

  • Add executor terminating status for graceful shutdown #667 (thinkharderdev)

  • Allow BallistaContext::read_* methods to read multiple paths. #679 (luckylsk34)

  • Update scheduler.md #657 (psvri)

  • Mark SchedulerState as pub #688 (Dandandan)

  • Update graphviz-rust requirement from 0.5.0 to 0.6.1 #651 (dependabot[bot])

  • Upgrade DataFusion to 19.0.0 #691 (r4ntix)

  • Update release docs #692 (andygrove)

  • Mark SchedulerServer::with_task_launcher as pub #695 (Dandandan)

  • Make task_manager pub #698 (Dandandan)

  • Add ExecutionEngine abstraction #687 (andygrove)

  • Allow accessing s3 locations in client mode #700 (luckylsk34)

  • git clone branch incorrect #699 (BubbaJoe)

  • Fix for error message during testing #707 (yahoNanJing)

  • Upgrade datafusion to 20.0.0 & sqlparser to 0.32.0 #711 (r4ntix)

  • Update README.md #729 (jiangzhx)

  • Update link to scheduler proto file in dev docs #713 (JAicewizard)

  • Fix show tables fails #715 (r4ntix)

  • Remove redundant fields in ExecutorManager #728 (yahoNanJing)

  • Fix parameter ‘--config-backend’ to ‘--cluster-backend’ #720 (paolorechia)

  • Upgrade DataFusion to 21.0.0 #727 (r4ntix)

  • [minor] remove useless brackets #739 (Ted-Jiang)

  • Only decode plan in LaunchMultiTaskParams once #743 (Dandandan)

  • Upgrade DataFusion to 22.0.0 #740 (r4ntix)

  • [feature] support shuffle read with retry when facing IO error. #738 (Ted-Jiang)

  • [log] Print long running task status. #750 (Ted-Jiang)

  • Upgrade DataFusion to 23.0.0 #755 (yahoNanJing)

  • Fix plan metrics length and stage metrics length not match #764 (yahoNanJing)

  • added match arms to create ClusterStorageConfig #766 (BokarevNik)

  • [Improve] refactor the offer_reservation avoid wait result #760 (Ted-Jiang)

  • [fea] Avoid multithreaded write lock conflicts in event queue. #754 (Ted-Jiang)

  • Upgrade DataFusion to 24.0.0, Object_Store to 0.5.6 #769 (r4ntix)

  • Refine create_datafusion_context() #778 (yahoNanJing)

  • Remove output_partitioning for task definition #776 (yahoNanJing)

  • Upgrade DataFusion to 25.0.0 #779 (r4ntix)

  • Disable the ansi feature of tracing-subscriber #784 (yahoNanJing)

  • Add config grpc_server_max_decoding_message_size to make the maximum size of a decoded message at the grpc server side configurable #782 (yahoNanJing)

  • Fix nodejs issues in Docker build #731 (jnaous)

  • Upgrade node version to fix build in main #794 (avantgardnerio)

  • Remove redundant mod session_registry #792 (yahoNanJing)

  • Make last_seen_ts_threshold for getting alive executor at the scheduler side larger than the heartbeat time interval #786 (yahoNanJing)

  • Remove the prometheus-metrics from the default feature #788 (yahoNanJing)

  • Refine the ExecuteQuery grpc interface #790 (yahoNanJing)

  • Add config to collect statistics, enable in TPC-H benchmark #796 (Dandandan)

  • Add support for GCS data sources #805 (haoxins)

  • Update DataFusion to 26 #798 (Dandandan)

  • Issue 162 build docker image in ci #716 (paolorechia)

  • Fix index out of bounds panic #819 (yahoNanJing)

  • Refactor the TaskDefinition by changing encoding execution plan to the decoded one #817 (yahoNanJing)

  • Fix ballista-cli docs #800 (jonahgao)

  • docs: fix link #799 (haoxins)

  • Implement the with_new_children for ShuffleReaderExec #821 (yahoNanJing)

  • Update to point to the correct documentation #838 (dadepo)

  • Remove ExecutorReservation and change the task assignment philosophy from executor first to task first #823 (yahoNanJing)

  • Upgrade DataFusion to 27.0.0 #834 (r4ntix)

  • Reduce the number of calls to create_logical_plan #842 (jonahgao)

  • Bump semver from 5.7.1 to 5.7.2 in /ballista/scheduler/ui #843 (dependabot[bot])

  • Bump actions/labeler from 4.1.0 to 4.3.0 #841 (dependabot[bot])

  • Bump tough-cookie from 4.1.2 to 4.1.3 in /ballista/scheduler/ui #840 (dependabot[bot])

  • Update flatbuffers requirement from 22.9.29 to 23.5.26 #801 (dependabot[bot])

  • Update dirs requirement from 4.0.0 to 5.0.1 #767 (dependabot[bot])

  • Update libloading requirement from 0.7.3 to 0.8.0 #761 (dependabot[bot])

  • Introduce a cache crate supporting concurrent cache value loading #825 (yahoNanJing)

  • Fix cargo clippy for latest rust version #848 (yahoNanJing)

  • Introduce CachedBasedObjectStoreRegistry to use data source cache transparently #827 (yahoNanJing)

  • Add ConsistentHash for node topology management #830 (yahoNanJing)

  • Implement 3-phase consistent hash based task assignment policy #833 (yahoNanJing)

  • Update tonic requirement from 0.8 to 0.9 #733 (dependabot[bot])

  • Update itertools requirement from 0.10 to 0.11 #844 (dependabot[bot])

  • Update etcd-client requirement from 0.10 to 0.11 #845 (dependabot[bot])

  • Update hashbrown requirement from 0.13 to 0.14 #846 (dependabot[bot])

  • Bump word-wrap from 1.2.3 to 1.2.4 in /ballista/scheduler/ui #849 (dependabot[bot])

  • Update hdfs requirement from 0.1.1 to 0.1.4 #856 (yahoNanJing)

  • Update to DataFusion 28 #858 (Dandandan)

  • Upgrade datafusion to 30.0.0 #866 (r4ntix)

  • refactor: port get_scan_files to Ballista #877 (alamb)

  • Upgrade datafusion to 31.0.0 #878 (r4ntix)

  • Upgrade datafusion to 32.0.0 #899 (r4ntix)

  • Update to DataFusion 33 #900 (Dandandan)

  • Refactor lru mod, remove linked_hash_map #918 (PsiACE)

  • Dynamically optimize aggregate (count) based on shuffle stats #919 (Dandandan)

  • Use lz4 compression for shuffle files & flight stream, refactoring / improvements #920 (Dandandan)

  • Make max encoding message size configurable #928 (andygrove)

  • Set max message size to 16MB in gRPC clients #931 (andygrove)

  • Upgrade to DataFusion 34.0.0-rc1 #927 (andygrove)

  • Use official DF 34 release #939 (andygrove)

  • Use StreamWriter instead of FileWriter #943 (avantgardnerio)

  • Remove some TODO comments related to context fetching schemas from scheduler #946 (andygrove)

  • Fix Docker build #947 (andygrove)

  • Fix regression in DataFrame.write_xxx #945 (andygrove)