Apache DataFusion Ballista 53.0.0 Changelog
This release consists of 174 commits from 16 contributors. See credits at the end of this changelog for more information.
Fixed bugs:
fix: remove unwrap from executor and improve task error handling #1540 (eachsaj)
fix: handle None task slot in update_task_info after executor lost #1523 (milenkovicm)
fix: [Python] serialize optimized logical plan to fix subquery support #1586 (andygrove)
fix: allow S3 access without explicit credentials #1584 (andygrove)
fix: route collect/show/to_pandas through Ballista cluster #1585 (andygrove)
fix: propagate session config from Python client to Ballista cluster #1592 (andygrove)
fix(python): ignore ballista-namespaced cluster_config keys locally #1613 (andygrove)
fix: df.write_ fix as it was broken after update #1625 (milenkovicm)
fix(rest): remove unwrap and return 404 if executor does not exist #1628 (milenkovicm)
fix(metrics): avoid stage metrics inflation by tracking partition snapshots #1652 (danielhumanmod)
fix: compilation issue after merge #1658 (milenkovicm)
fix: rest api calculates stage running time correctly #1675 (milenkovicm)
fix: REST API does not show running jobs #1703 (gittihub-jpg)
fix: no executor warning, correct prometheus feature name in TUI #1698 (killzoner)
Implemented enhancements:
[REST]: Cancelling a completed/failed job should return “false” #1494 (martin-g)
feat: Add plain “status” field to the JobResponse #1497 (martin-g)
feat: Expose Logical and Physical plan details in the REST API #1498 (milenkovicm)
feat: (remote) shuffle reader cleanup #1503 (milenkovicm)
feat: enable scheduler rest api by default #1506 (milenkovicm)
feat: [REST] Return the job’s start and end times #1511 (martin-g)
feat: remove hash policy task distribution as not used #1526 (eachsaj)
feat: remove full path from partition locations #1527 (milenkovicm)
feat: add config to ExecutorEngine::create_query_stage_exec #1542 (milenkovicm)
feat: Improve REST API adding task related info to job stages #1543 (milenkovicm)
feat: Ballista Text User Interface #1436 (martin-g)
feat: jupyter notebook support #1513 (sandugood)
feat: ExecutionEngine::create_query_stage_exec accepts partition_id #1556 (milenkovicm)
feat: make ballista client retry policy configurable #1577 (eachsaj)
feat: Executor system & process metrics reporting #1547 (sandugood)
feat: Scheduler config update #1597 (sandugood)
feat: Support EXPLAIN ANALYZE in Ballista #1567 (danielhumanmod)
feat: Add standalone shuffle writer benchmark that shuffles real Parquet input #1600 (andygrove)
feat: defer sort-shuffle materialization with interleave_record_batch #1598 (andygrove)
feat(executor): bound executor memory via --memory-pool-size #1624 (andygrove)
feat(bench): show TPC-H query timings in seconds and add total time #1641 (andygrove)
feat(aqe): Support sort-based shuffle writer in AQE #1640 (danielhumanmod)
feat: rest api supports plan tree rendering #1650 (sandugood)
feat: Cache ballista clients on executor #1578 (milenkovicm)
feat(aqe): Lazy stage evaluation in AQE #1649 (milenkovicm)
feat: move shuffle writer disk I/O off tokio worker threads #1537 (hcrosse)
feat(scheduler): broadcast-style hash join for small-side joins #1647 (andygrove)
feat: default to sort-merge join #1651 (andygrove)
feat: TUI shows running job information #1717 (milenkovicm)
feat(aqe): CoalescePartitionsRule — shuffle-partition coalescing on resolved stats #1684 (metegenez)
feat: TUI make task popup scrollable #1725 (milenkovicm)
feat(tui): Use separate areas for the table and its associated scrollbar #1729 (martin-g)
Documentation updates:
docs: document hash-based and sort-based shuffle implementations #1595 (andygrove)
docs: improve Python documentation structure #1579 (andygrove)
docs: add Python client build-from-source section to contributor guide #1620 (andygrove)
chore: remove NYC Taxi benchmark #1644 (andygrove)
docs: document experimental Adaptive Query Execution in tuning guide #1645 (andygrove)
build(bench): rework docker-compose TPC-H stack #1646 (andygrove)
docs: add Ballista TUI documentation #1593 (goingforstudying-ctrl)
[TUI] Show executor’s details in a popup #1670 (martin-g)
[TUI] Add a config setting for rendering job stage’s plan as a tree #1704 (martin-g)
[TUI] Add support for horizontal scrolling to the job/stage plan popups #1711 (martin-g)
[TUI] Add screenshots of the TUI application in README/cli.md #1714 (martin-g)
Other:
CI: Add CodeQL workflow for GitHub Actions security scanning #1484 (kevinjqliu)
ci: Harden labeler workflow, remove unnecessary checkout from pull_request_target job #1487 (kevinjqliu)
chore(deps): bump tokio from 1.49.0 to 1.50.0 #1490 (dependabot[bot])
chore(deps): bump actions/setup-node from 6.2.0 to 6.3.0 #1489 (dependabot[bot])
ci: add take and stale #1488 (kevinjqliu)
chore(deps): bump uuid from 1.21.0 to 1.22.0 #1492 (dependabot[bot])
chore(deps): bump github/codeql-action from 4.32.5 to 4.32.6 #1491 (dependabot[bot])
chore(deps): bump libc from 0.2.182 to 0.2.183 #1495 (dependabot[bot])
chore(deps): bump graphviz-rust from 0.9.6 to 0.9.7 #1496 (dependabot[bot])
chore(deps): bump quinn-proto from 0.11.13 to 0.11.14 #1499 (dependabot[bot])
chore(deps): bump quinn-proto from 0.11.13 to 0.11.14 in /python #1500 (dependabot[bot])
chore(deps): bump tempfile from 3.26.0 to 3.27.0 #1501 (dependabot[bot])
chore(deps): cargo update deps #1502 (milenkovicm)
chore(deps): bump once_cell from 1.21.3 to 1.21.4 #1505 (dependabot[bot])
chore(deps): bump clap from 4.5.60 to 4.6.0 #1504 (dependabot[bot])
minor: task scheduling policy config cleanup #1507 (milenkovicm)
minor: address task policy config comments #1508 (milenkovicm)
chore(deps): bump tracing-subscriber from 0.3.22 to 0.3.23 #1510 (dependabot[bot])
minor: [REST] add datafusion version info #1512 (milenkovicm)
chore(deps): bump lz4_flex from 0.12.0 to 0.12.1 in /python #1514 (dependabot[bot])
chore(deps): bump github/codeql-action from 4.32.6 to 4.33.0 #1515 (dependabot[bot])
chore(deps): bump rustls-webpki from 0.103.9 to 0.103.10 in /python #1517 (dependabot[bot])
chore(deps): bump lz4_flex from 0.12.0 to 0.12.1 #1518 (dependabot[bot])
chore(deps): bump github/codeql-action from 4.33.0 to 4.34.1 #1519 (dependabot[bot])
chore(deps): bump rustls-webpki from 0.103.9 to 0.103.10 #1521 (dependabot[bot])
ci: pin third-party actions to Apache-approved SHAs #1516 (kevinjqliu)
chore(deps): bump env_logger from 0.11.9 to 0.11.10 #1522 (dependabot[bot])
chore(deps): update to datafusion v.53 #1486 (milenkovicm)
chore(deps): bump insta from 1.46.3 to 1.47.0 #1524 (dependabot[bot])
chore(deps): bump uuid from 1.22.0 to 1.23.0 #1525 (dependabot[bot])
chore: update datafusion proto #1528 (milenkovicm)
minor: cleanup scheduler clap configuration #1529 (milenkovicm)
chore(deps): bump insta from 1.47.0 to 1.47.1 #1532 (dependabot[bot])
chore(deps): bump ctor from 0.6.3 to 0.8.0 #1534 (dependabot[bot])
chore(deps): bump md-5 from 0.10.6 to 0.11.0 #1533 (dependabot[bot])
chore(deps): bump github/codeql-action from 4.34.1 to 4.35.1 #1530 (dependabot[bot])
chore(deps): bump insta from 1.47.1 to 1.47.2 #1535 (dependabot[bot])
chore(deps): bump libc from 0.2.183 to 0.2.184 #1538 (dependabot[bot])
chore(deps): bump tokio from 1.50.0 to 1.51.0 #1544 (dependabot[bot])
chore(deps): bump tui-big-text from 0.8.3 to 0.8.4 #1546 (dependabot[bot])
chore(deps): bump tokio from 1.51.0 to 1.51.1 #1545 (dependabot[bot])
chore(deps): bump ctor from 0.8.0 to 0.9.1 #1548 (dependabot[bot])
chore(deps): bump ctor from 0.9.1 to 0.10.0 #1549 (dependabot[bot])
chore(deps): bump rustls from 0.23.37 to 0.23.38 #1550 (dependabot[bot])
chore(deps): bump libc from 0.2.184 to 0.2.185 #1553 (dependabot[bot])
chore(deps): bump rand from 0.9.2 to 0.9.4 in /python #1552 (dependabot[bot])
chore(deps): bump tokio from 1.51.1 to 1.52.0 #1555 (dependabot[bot])
chore(deps): bump axum from 0.8.8 to 0.8.9 #1554 (dependabot[bot])
chore(deps): bump github/codeql-action from 4.35.1 to 4.35.2 #1557 (dependabot[bot])
chore: Fix Clippy issues with Rust 1.95.0 #1558 (martin-g)
chore: update deps #1561 (milenkovicm)
chore(deps): bump actions/setup-node from 6.3.0 to 6.4.0 #1562 (dependabot[bot])
chore(deps): bump mimalloc from 0.1.48 to 0.1.49 #1566 (dependabot[bot])
chore(deps): bump aws-config from 1.8.15 to 1.8.16 #1564 (dependabot[bot])
chore(deps): bump rand from 0.9.4 to 0.10.1 #1565 (dependabot[bot])
chore(deps): bump ctor from 0.10.0 to 0.10.1 #1571 (dependabot[bot])
chore(deps): bump mimalloc from 0.1.49 to 0.1.50 #1570 (dependabot[bot])
chore(deps): bump rustls from 0.23.38 to 0.23.39 #1569 (dependabot[bot])
chore(deps): bump libc from 0.2.185 to 0.2.186 #1572 (dependabot[bot])
[TUI] Change the key binding for job’s plans #1573 (martin-g)
chore(deps): bump rustls-webpki from 0.103.12 to 0.103.13 #1576 (dependabot[bot])
chore(deps): bump rustls-webpki from 0.103.10 to 0.103.13 in /python #1575 (dependabot[bot])
minor: make config naming convention consistent #1580 (milenkovicm)
chore: Remove unnecessary optimizer rules due to datafusion upgrade to v53 #1594 (sandugood)
chore(deps): bump reqwest from 0.13.2 to 0.13.3 #1605 (dependabot[bot])
ci: drop Intel macOS Python wheel build #1612 (andygrove)
chore(deps): bump ctor from 0.10.1 to 0.11.1 #1618 (dependabot[bot])
chore(deps): bump rustls from 0.23.39 to 0.23.40 #1619 (dependabot[bot])
chore: update python deps to ballista and datafusion 52 #1590 (andygrove)
feat(sort-shuffle): byte-copy spill files and enable block-IO transport #1615 (andygrove)
Use BallistaSessionContext instead of BallistaBuilder in tpch.py #1621 (martin-g)
feat(sort-shuffle): enable sort-based shuffle by default #1623 (andygrove)
perf(sort-shuffle): fix performance regression caused by datafusion upgrade #1626 (andygrove)
chore(deps): bump ctor from 0.11.1 to 0.12.0 #1639 (dependabot[bot])
fix(sort-shuffle): bound writer memory with per-task spill threshold #1636 (andygrove)
chore(deps): bump graphviz-rust from 0.9.7 to 0.9.8 #1655 (dependabot[bot])
chore(deps): bump github/codeql-action from 4.35.2 to 4.35.3 #1653 (dependabot[bot])
chore(deps): bump tokio from 1.52.1 to 1.52.2 #1657 (dependabot[bot])
chore(deps): bump actions/labeler from 6.0.1 to 6.1.0 #1659 (dependabot[bot])
[TUI] Show job’s stages and their tasks #1574 (martin-g)
chore(deps): bump tonic-build from 0.14.5 to 0.14.6 #1662 (dependabot[bot])
chore(deps): bump tonic from 0.14.5 to 0.14.6 #1661 (dependabot[bot])
chore(deps): bump astral-tokio-tar from 0.6.0 to 0.6.1 #1664 (dependabot[bot])
chore(deps): bump ctor from 0.12.0 to 1.0.2 #1663 (dependabot[bot])
chore(deps): bump github/codeql-action from 4.35.3 to 4.35.4 #1665 (dependabot[bot])
chore(deps): bump ctor from 1.0.2 to 1.0.3 #1668 (dependabot[bot])
chore(deps): bump tonic-prost from 0.14.5 to 0.14.6 #1667 (dependabot[bot])
chore(deps): bump tonic-prost-build from 0.14.5 to 0.14.6 #1666 (dependabot[bot])
[TUI] Configurable tick interval #1669 (martin-g)
minor: add pending stage indicator for AdaptiveDatafusionExec #1672 (milenkovicm)
chore(deps): bump ctor from 1.0.3 to 1.0.4 #1680 (dependabot[bot])
chore(deps): bump tokio from 1.52.2 to 1.52.3 #1678 (dependabot[bot])
minor: change parameter ordering in AdaptivePlanner::try_new_with_optimizers #1687 (milenkovicm)
chore(deps): bump ctor from 1.0.4 to 1.0.5 #1694 (dependabot[bot])
Fix REST API panic on job list/detail when end_time < start_time #1693 (abhinavgautam01)
Right align all numeric columns in the TUI tables #1695 (martin-g)
fix(join-selection): guard CollectLeft swap when right has multiple partitions #1691 (andygrove)
Minor improvements for #1675 #1686 (martin-g)
[CLI/TUI] Use only tracing crate for logging in CLI and TUI #1697 (martin-g)
chore(deps): bump urllib3 from 2.6.3 to 2.7.0 in /python #1699 (dependabot[bot])
chore(deps): bump pyjwt from 2.10.1 to 2.12.0 in /python #1700 (dependabot[bot])
chore(deps): bump pytest from 9.0.2 to 9.0.3 in /python #1702 (dependabot[bot])
minor: change log level for few statements #1706 (milenkovicm)
chore(deps): bump config from 0.15.22 to 0.15.23 #1709 (dependabot[bot])
Saturate scheduler job elapsed time #1708 (MukundaKatta)
[TUI] Executor’s id is not a numeric column. It should be center aligned #1713 (martin-g)
Make use of Swatinem/rust-cache to make the CI workflows faster #1705 (martin-g)
Merge Executor’s brief and extended properties #1716 (martin-g)
[INFRA] Set up default rulesets for default and release branches #1715 (asf-gitbox-commits)
chore(deps): bump github/codeql-action from 4.35.4 to 4.35.5 #1719 (dependabot[bot])
chore(deps): bump ctor from 1.0.5 to 1.0.6 #1720 (dependabot[bot])
chore(deps): bump dashmap from 6.1.0 to 6.2.1 #1721 (dependabot[bot])
ci: add TPC-H SF10 workflow #1688 (andygrove)
minor: [TUI] Extract a helper method for the table/scrollbar area splitter #1730 (martin-g)
minor: [TUI] Sort the metrics before rendering them #1731 (martin-g)
chore: Do not run -cli tests twice #1732 (martin-g)
chore: generate changelog for ballista 53 #1735 (milenkovicm)
Credits
Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
74 dependabot[bot]
29 Marko Milenković
25 Andy Grove
23 Martin Grigorov
5 alexander domenti
4 Kevin Liu
3 Daniel Tu
3 Saj
1 Abhinav Gautam
1 Harrison Crosse
1 Mukunda Rao Katta
1 The Apache Software Foundation
1 gittihub-jpg
1 goingforstudying-ctrl
1 jgrim
1 mete
Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.