Apache DataFusion 53.0.0 Released

Posted on: Thu 02 April 2026 by pmc

We are proud to announce the release of DataFusion 53.0.0. This post highlights some of the major improvements since DataFusion 52.0.0. The complete list of changes is available in the changelog. Thanks to the 114 contributors for making this release possible.

Performance Improvements 🚀

Performance over time

Figure 1: Average and median normalized execution times for DataFusion 53.0.0 on ClickBench queries, compared to previous releases. Query times are normalized using the ClickBench definition. See the DataFusion Benchmarking Page for more details.

DataFusion 53 continues the project-wide focus on performance. This release reduces planning overhead, skips more unnecessary I/O, and pushes more work into earlier and cheaper stages of execution.

LIMIT-Aware Parquet Row Group Pruning

DataFusion 53 includes a new optimization that makes Parquet pruning aware of LIMIT. This optimization is described in full in the limit pruning blog post. If DataFusion can prove that an entire row group matches the predicate, and those fully matching row groups contain enough rows to satisfy the LIMIT, partially matching row groups are skipped entirely.

Pruning pipeline with limit pruning highlighted
Figure 2: Limit pruning is inserted between row group and page index pruning.
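
The decision described above can be sketched in a few lines of Python. This is a simplified illustration, not DataFusion's implementation: the `Match`, `RowGroup`, and `limit_prune` names are hypothetical, and the sketch assumes each row group has already been classified from its statistics.

```python
from dataclasses import dataclass
from enum import Enum


class Match(Enum):
    NONE = 0     # statistics prove no row can match the predicate
    PARTIAL = 1  # some rows may match; the predicate must be evaluated
    FULL = 2     # statistics prove every row matches the predicate


@dataclass(frozen=True)
class RowGroup:
    # Hypothetical stand-in for Parquet row group metadata.
    index: int
    num_rows: int
    match: Match


def limit_prune(row_groups, limit):
    """Return the row groups that still need to be scanned."""
    # Row groups that cannot match are pruned regardless of LIMIT.
    candidates = [rg for rg in row_groups if rg.match is not Match.NONE]
    fully_matching = [rg for rg in candidates if rg.match is Match.FULL]

    # If the fully matching row groups alone hold enough rows to satisfy
    # the LIMIT, the partially matching ones can be skipped entirely.
    if sum(rg.num_rows for rg in fully_matching) >= limit:
        return fully_matching
    return candidates


groups = [
    RowGroup(0, 1000, Match.PARTIAL),
    RowGroup(1, 1000, Match.FULL),
    RowGroup(2, 1000, Match.NONE),
    RowGroup(3, 1000, Match.FULL),
]

# LIMIT 1500: row groups 1 and 3 hold 2000 fully matching rows.
kept_small = [rg.index for rg in limit_prune(groups, limit=1500)]
# LIMIT 5000: the fully matching row groups are not enough on their own.
kept_large = [rg.index for rg in limit_prune(groups, limit=5000)]
```

With the smaller limit, the partially matching row group 0 is never read. With the larger limit, the sketch falls back to scanning every row group that might contain a match.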

Thanks to @xudong963 for implementing this feature. Related PRs: #18868

Improved Filter Pushdown

DataFusion 53 pushes filters down through more join types and through UnionExec, and expands support for pushing down dynamic filters. More pushdown means fewer rows flow into joins, repartitions, and later operators, which reduces CPU, memory, and I/O.

For example:

SELECT *
FROM (
    SELECT *
    FROM t1
    LEFT ANTI JOIN t2 ON t1.k = t2.k
) a
JOIN t1 b ON a.k = b.k
WHERE b.v = 1;

Now DataFusion can often transform the physical plan so filters and dynamic filters are pushed deeper into the plan, even through subqueries and nested joins. In this example, the filter on b.v helps produce dynamic filters that can be pushed into both sides of the nested anti join.

Before and after diagram of dynamic filter pushdown through a subquery with nested joins
Figure 3: DataFusion 53 pushes dynamic filters through subqueries and into both sides of nested joins.
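
To make the idea of a dynamic filter concrete, the following Python sketch shows a filter derived at run time from one side of an inner join and applied to the scan of the other side. It is a toy model under stated assumptions: the filter is a simple min/max bound on the join key, and the `build_dynamic_filter`, `scan_with_filter`, and `hash_join` helpers are hypothetical names, not DataFusion APIs.

```python
def build_dynamic_filter(build_rows, key):
    """Summarize the build side's join keys as a (min, max) bound."""
    keys = [row[key] for row in build_rows]
    if not keys:
        return None  # empty build side: no probe row can match an inner join
    return (min(keys), max(keys))


def scan_with_filter(rows, key, bounds):
    """Scan the probe side, dropping rows that the bounds rule out."""
    if bounds is None:
        return []
    lo, hi = bounds
    return [row for row in rows if lo <= row[key] <= hi]


def hash_join(build_rows, probe_rows, key):
    """Plain inner hash join on a single key."""
    table = {}
    for row in build_rows:
        table.setdefault(row[key], []).append(row)
    return [(p, b) for p in probe_rows for b in table.get(p[key], [])]


# Toy table: only keys 40..44 have v = 1.
t1 = [{"k": k, "v": 1 if 40 <= k < 45 else 0} for k in range(100)]

# The static filter (WHERE b.v = 1) shrinks the build side first.
build = [row for row in t1 if row["v"] == 1]

# The dynamic filter is only known once the build side has been read.
bounds = build_dynamic_filter(build, "k")
probe_filtered = scan_with_filter(t1, "k", bounds)

result = hash_join(build, probe_filtered, "k")
unfiltered_result = hash_join(build, t1, "k")
```

The join result is the same either way, but the filtered probe scan hands 5 rows to the join instead of 100. Pushing such a filter further down, for example into both inputs of a nested join, saves the same work earlier in the plan.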

Thanks to @nuno-faria, @haohuaijin, and @jackkleeman for driving this work. Related PRs: #19918, #20145, #20192

Faster Query Planning

DataFusion 53 improves query planning performance by making immutable pieces of execution plans cheaper to clone. This helps applications that need extremely low latency, plan many or complex queries, or use prepared statements or parameterized queries. In some benchmarks, overall execution time drops from roughly 4-5 ms to about 100 µs.
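
The general technique, sharing immutable parts between copies instead of rebuilding them, can be illustrated with a small Python sketch. This is only an analogy under stated assumptions: `PlanProperties` and `PlanNode` are hypothetical types, and the sketch does not reflect how DataFusion's plan nodes are actually laid out.

```python
import copy
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PlanProperties:
    # Hypothetical stand-in for a node's immutable metadata.
    schema: tuple
    partitions: int


@dataclass(frozen=True)
class PlanNode:
    # Hypothetical plan node; frozen, so it can be shared safely.
    name: str
    properties: PlanProperties
    children: tuple = ()


scan = PlanNode("scan", PlanProperties(("id", "v"), 8))
plan = PlanNode("filter: v = $1", PlanProperties(("id", "v"), 8), (scan,))

# Expensive clone: every nested object is rebuilt.
deep = copy.deepcopy(plan)

# Cheap clone: only the changed node is new; immutable parts are shared.
bound = replace(plan, name="filter: v = 42")

shares_properties = bound.properties is plan.properties
shares_children = bound.children[0] is scan
deep_shares_children = deep.children[0] is scan
```

Because the shared pieces can never change, binding a new parameter value costs one small allocation rather than a copy of the whole tree, which matters most when the same plan is re-planned or re-bound many times.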

Thanks to @askalt for leading this work. Related PRs: #19792, #19893

Faster Functions

DataFusion includes 235 built-in functions. Improving the performance of these functions benefits a wide range of workloads. This release improves the performance of 42 of those functions, such as strpos, replace, concat, translate, array_has, array_agg, left, right, and case_when.

Thanks to the contributors who drove this work, especially @neilconway, @theirix, @lyne7-sc, @kumarUjjawal, @pepijnve, @zhangxffff, and @UBarney.

Nested Field Pushdown

DataFusion 53 pushes expressions such as get_field down the plan and into data sources. This is especially important for nested data such as structs in Parquet files. Instead of reading an entire struct column and then extracting the field of interest, DataFusion 53 pushes the field extraction into the scan.

For example, the following query reads a struct column s and extracts the label field for rows where the value field is greater than 150:

SELECT id, s['label']
FROM t
WHERE s['value'] > 150;

Before and after diagram of field access pushdown into a data source
Figure 4: DataFusion 53 pushes field-access expressions closer to the scan.
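
The effect of the pushdown can be mimicked with a small Python model of the query above. It assumes a columnar layout in which each struct field is stored as its own leaf column, as Parquet does. The table contents, the `s.payload` field, and the helper names are hypothetical and exist only for illustration.

```python
# Each struct field is stored as a separate leaf column.
TABLE = {
    "id": [1, 2, 3, 4],
    "s.label": ["a", "b", "c", "d"],
    "s.value": [100, 200, 150, 300],
    "s.payload": ["x" * 1000] * 4,  # large field the query never uses
}


def scan(columns, reads):
    """Read the requested leaf columns and record which ones were touched."""
    reads.extend(columns)
    return {name: TABLE[name] for name in columns}


def query_without_pushdown():
    reads = []
    cols = scan(["id", "s.label", "s.value", "s.payload"], reads)
    # Materialize whole struct values, then extract fields afterwards.
    structs = [
        {"label": label, "value": value, "payload": payload}
        for label, value, payload in zip(
            cols["s.label"], cols["s.value"], cols["s.payload"]
        )
    ]
    rows = [
        (row_id, s["label"])
        for row_id, s in zip(cols["id"], structs)
        if s["value"] > 150
    ]
    return rows, reads


def query_with_pushdown():
    reads = []
    # The field accesses are pushed into the scan: only the leaves the
    # query references are read.
    cols = scan(["id", "s.label", "s.value"], reads)
    rows = [
        (row_id, label)
        for row_id, label, value in zip(cols["id"], cols["s.label"], cols["s.value"])
        if value > 150
    ]
    return rows, reads


rows_before, reads_before = query_without_pushdown()
rows_after, reads_after = query_with_pushdown()
```

Both versions return the same rows, but the pushed-down version never touches `s.payload`. The wider the struct, the more I/O and decoding this avoids.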

Special thanks to @adriangb for designing and implementing this optimizer work. Related PRs: #20065, #20117, #20239

New Features ✨

Stability and Release Engineering 🦺

The community spent significant time this release cycle stabilizing the release branch and improving the release process. While such improvements are not as headline-friendly as new features, they are highly important for real deployments. We are discussing ways to improve the process on #21034 and would welcome suggestions and contributions to help with release engineering work in the future.

Thanks to @comphead for running this release, and to @jonathanc-n, @alamb, @xanderbailey, @haohuaijin, @friendlymatthew, @fwojciec, @Kontinuation, @nathanb9, and many others who helped stabilize the release branch.

Upgrade Notes

DataFusion 53 includes some breaking changes, including updates to the SQL parser, optimizer behavior, and some physical-plan APIs. Please see the upgrade guide and changelog for the full details before upgrading.

Known Issues

A small number of issues were discovered after the 53.0.0 release, and we expect to publish DataFusion 53.1.0 soon. See the 53.1.0 release tracking issue for the latest status.

Thank You

Thank you to everyone in the DataFusion community who contributed code, reviews, testing, bug reports, documentation, and release engineering work for 53.0.0. This release contains direct contributions from 114 different people, and we are grateful for the time and effort that everyone put in to make it happen.

