<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - Oznur Hanci and Berkay Sahin on behalf of the PMC</title><link href="https://datafusion.apache.org/blog/" rel="alternate"/><link href="https://datafusion.apache.org/blog/feeds/oznur-hanci-and-berkay-sahin-on-behalf-of-the-pmc.atom.xml" rel="self"/><id>https://datafusion.apache.org/blog/</id><updated>2025-03-24T00:00:00+00:00</updated><entry><title>Apache DataFusion 46.0.0 Released</title><link href="https://datafusion.apache.org/blog/2025/03/24/datafusion-46.0.0" rel="alternate"/><published>2025-03-24T00:00:00+00:00</published><updated>2025-03-24T00:00:00+00:00</updated><author><name>Oznur Hanci and Berkay Sahin on behalf of the PMC</name></author><id>tag:datafusion.apache.org,2025-03-24:/blog/2025/03/24/datafusion-46.0.0</id><summary type="html">&lt;!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements.  See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License.  You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
--&gt;

&lt;p&gt;We’re excited to announce the release of &lt;strong&gt;Apache DataFusion 46.0.0&lt;/strong&gt;! This new version represents a significant milestone for the project, packing in a wide range of improvements and fixes. You can find the complete details in the full &lt;a href="https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md"&gt;changelog&lt;/a&gt;. We’ll highlight the most important changes below …&lt;/p&gt;</summary><content type="html">&lt;!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements.  See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License.  You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
--&gt;

&lt;p&gt;We’re excited to announce the release of &lt;strong&gt;Apache DataFusion 46.0.0&lt;/strong&gt;! This new version represents a significant milestone for the project, packing in a wide range of improvements and fixes. You can find the complete details in the full &lt;a href="https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md"&gt;changelog&lt;/a&gt;. We’ll highlight the most important changes below and guide you through upgrading.&lt;/p&gt;
&lt;h2 id="breaking-changes"&gt;Breaking Changes&lt;a class="headerlink" href="#breaking-changes" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;DataFusion 46.0.0 brings a few &lt;strong&gt;breaking changes&lt;/strong&gt; that may require adjustments to your code as described in the &lt;a href="https://datafusion.apache.org/library-user-guide/upgrading.html"&gt;Upgrade Guide&lt;/a&gt;. Here are the most notable ones:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/apache/datafusion/pull/14224#"&gt;Unified &lt;code&gt;DataSourceExec&lt;/code&gt; Execution Plan&lt;/a&gt;&lt;strong&gt;:&lt;/strong&gt; DataFusion 46.0.0 introduces a major refactor of scan operators. The separate file-format-specific execution plan nodes (&lt;code&gt;ParquetExec&lt;/code&gt;, &lt;code&gt;CsvExec&lt;/code&gt;, &lt;code&gt;JsonExec&lt;/code&gt;, &lt;code&gt;AvroExec&lt;/code&gt;, etc.) have been &lt;strong&gt;deprecated and merged into a single &lt;code&gt;DataSourceExec&lt;/code&gt; plan&lt;/strong&gt;. Format-specific logic is now encapsulated in new &lt;code&gt;DataSource&lt;/code&gt; and &lt;code&gt;FileSource&lt;/code&gt; traits. This change simplifies the execution model, but if you have code that directly references the old plan nodes, you’ll need to update it to use &lt;code&gt;DataSourceExec&lt;/code&gt; (see the &lt;a href="https://datafusion.apache.org/library-user-guide/upgrading.html"&gt;Upgrade Guide&lt;/a&gt; for examples of the new API).&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/apache/arrow-datafusion/issues/7360#:~:text=2"&gt;Error Handling Improvements&lt;/a&gt;&lt;strong&gt; (&lt;code&gt;DataFusionError::Collection&lt;/code&gt;):&lt;/strong&gt; We have begun overhauling DataFusion’s approach to error handling. This release introduces a new error variant, &lt;code&gt;DataFusionError::Collection&lt;/code&gt; (and related mechanisms), that aggregates multiple errors into one. This is part of a broader effort to provide richer error context and reduce internal panics. As a result, some error types and messages have changed, and downstream code that matches on specific &lt;code&gt;DataFusionError&lt;/code&gt; variants might need adjustment.&lt;/li&gt;
&lt;/ul&gt;
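&lt;p&gt;To illustrate the idea behind an error variant that bundles several errors, here is a minimal Python analogy. &lt;code&gt;CollectionError&lt;/code&gt; and &lt;code&gt;check_columns&lt;/code&gt; are hypothetical names made up for this sketch. They are not part of DataFusion’s Rust API or its Python bindings. The point is that a caller can see every problem at once instead of only the first one:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;class CollectionError(Exception):
    # Hypothetical stand-in for an error that aggregates several errors
    def __init__(self, errors):
        self.errors = list(errors)
        super().__init__('; '.join(str(e) for e in self.errors))


def check_columns(requested, available):
    # Collect one error per missing column instead of stopping at the first
    errors = [
        ValueError(f"column '{name}' not found")
        for name in requested
        if name not in available
    ]
    # This sketch only wraps errors when there is more than one
    if len(errors) == 1:
        raise errors[0]
    if errors:
        raise CollectionError(errors)


try:
    check_columns(['id', 'nmae', 'agee'], {'id', 'name', 'age'})
except CollectionError as err:
    print(len(err.errors))  # 2
    print(err)  # column 'nmae' not found; column 'agee' not found
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In a sketch like this, code that used to handle a single error type may also need to look inside the aggregated list.&lt;/p&gt;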
&lt;h2 id="performance-improvements"&gt;Performance Improvements&lt;a class="headerlink" href="#performance-improvements" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;DataFusion 46.0.0 comes with a slew of performance enhancements across the board. Here are some of the noteworthy optimizations in this release:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Faster &lt;code&gt;median()&lt;/code&gt; (no grouping):&lt;/strong&gt; The &lt;code&gt;median()&lt;/code&gt; aggregate function gained a dedicated fast path for use without a &lt;code&gt;GROUP BY&lt;/code&gt;. Thanks to an optimized accumulator, median calculation is about &lt;strong&gt;2× faster&lt;/strong&gt; in this single-group case. If you compute &lt;code&gt;MEDIAN()&lt;/code&gt; over large datasets without grouping, you should notice reduced query times (PR &lt;a href="https://github.com/apache/datafusion/pull/14399"&gt;#14399&lt;/a&gt; by &lt;a href="https://github.com/2010YOUY01"&gt;@2010YOUY01&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Optimized &lt;code&gt;FIRST_VALUE&lt;/code&gt;/&lt;code&gt;LAST_VALUE&lt;/code&gt;:&lt;/strong&gt; The &lt;code&gt;FIRST_VALUE&lt;/code&gt; and &lt;code&gt;LAST_VALUE&lt;/code&gt; window functions have been improved by avoiding an internal sort of rows. Instead of sorting each partition, the implementation now uses a direct approach to pick the first/last element. This yields &lt;strong&gt;10–100% performance improvement&lt;/strong&gt; for these functions, depending on the scenario. Queries using &lt;code&gt;FIRST_VALUE(...) OVER (PARTITION BY ... ORDER BY ...)&lt;/code&gt; will run faster, especially when partitions are large (PR &lt;a href="https://github.com/apache/datafusion/pull/14402"&gt;#14402&lt;/a&gt; by &lt;a href="https://github.com/blaginin"&gt;@blaginin&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;repeat()&lt;/code&gt; String Function Boost:&lt;/strong&gt; Repeating strings is now more efficient – the &lt;code&gt;repeat(text, n)&lt;/code&gt; function was optimized by about &lt;strong&gt;50%&lt;/strong&gt;. This was achieved by reducing allocations and using a more efficient concatenation strategy. If you generate large repeated strings in queries, this can cut the time nearly in half (PR &lt;a href="https://github.com/apache/datafusion/pull/14697"&gt;#14697&lt;/a&gt; by &lt;a href="https://github.com/zjregee"&gt;@zjregee&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ultra-fast &lt;code&gt;uuid()&lt;/code&gt; UDF:&lt;/strong&gt; The &lt;code&gt;uuid()&lt;/code&gt; function (which generates random UUID strings) received a major speed-up. It’s now roughly &lt;strong&gt;40× faster&lt;/strong&gt; than before! The new implementation avoids unnecessary string copying and uses a more direct conversion to hex, making bulk UUID generation far more practical (PR &lt;a href="https://github.com/apache/datafusion/pull/14675"&gt;#14675&lt;/a&gt; by &lt;a href="https://github.com/simonvandel"&gt;@simonvandel&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Accelerated &lt;code&gt;chr()&lt;/code&gt; and &lt;code&gt;to_hex()&lt;/code&gt;:&lt;/strong&gt; Both of these scalar functions have been micro-optimized. The &lt;code&gt;chr()&lt;/code&gt; function (which returns the character for a given ASCII code) is now about &lt;strong&gt;4× faster&lt;/strong&gt;, and the &lt;code&gt;to_hex()&lt;/code&gt; function (which converts numbers to hexadecimal strings) is roughly &lt;strong&gt;2× faster&lt;/strong&gt;. The gains are likely most noticeable when these functions are applied to large arrays of values (PR &lt;a href="https://github.com/apache/datafusion/pull/14700"&gt;#14700&lt;/a&gt; for &lt;code&gt;chr&lt;/code&gt;, &lt;a href="https://github.com/apache/datafusion/pull/14686"&gt;#14686&lt;/a&gt; for &lt;code&gt;to_hex&lt;/code&gt; by &lt;a href="https://github.com/simonvandel"&gt;@simonvandel&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No More RowConverter in Grouped Ordering:&lt;/strong&gt; We removed an inefficient step in the &lt;em&gt;partial grouping&lt;/em&gt; algorithm. The &lt;code&gt;GroupOrderingPartial&lt;/code&gt; operator no longer converts each batch to the row format (via &lt;code&gt;RowConverter&lt;/code&gt;). Instead, it detects sort key changes directly on the Arrow arrays. This eliminates the per-batch conversion overhead and speeds up certain aggregation queries (PR &lt;a href="https://github.com/apache/datafusion/pull/14566"&gt;#14566&lt;/a&gt; by &lt;a href="https://github.com/ctsk"&gt;@ctsk&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Predicate Pruning for &lt;code&gt;NOT LIKE&lt;/code&gt;:&lt;/strong&gt; DataFusion’s Parquet reader can now prune row groups using &lt;code&gt;NOT LIKE&lt;/code&gt; filters, similar to how it handles &lt;code&gt;LIKE&lt;/code&gt;. Given a filter such as &lt;code&gt;column NOT LIKE 'prefix%'&lt;/code&gt;, DataFusion can use min/max statistics to skip reading files or row groups whose statistics prove that no row can satisfy the predicate. For example, with &lt;code&gt;NOT LIKE 'X%'&lt;/code&gt;, a row group whose minimum and maximum values both start with "X" contains only values starting with "X", so it can be skipped entirely. While a niche case, it improves query efficiency in those scenarios (PR &lt;a href="https://github.com/apache/datafusion/pull/14567"&gt;#14567&lt;/a&gt; by &lt;a href="https://github.com/UBarney"&gt;@UBarney&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
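&lt;p&gt;The reasoning behind &lt;code&gt;NOT LIKE&lt;/code&gt; pruning can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not DataFusion’s implementation. It only handles patterns of the form &lt;code&gt;'prefix%'&lt;/code&gt; with no other wildcards or escape characters, and it treats each row group as a hypothetical &lt;code&gt;(min, max)&lt;/code&gt; pair of string statistics. All strings that share a prefix form one contiguous range in sort order. So if a row group’s minimum and maximum both start with the prefix, every value in it does too. In that case &lt;code&gt;NOT LIKE 'prefix%'&lt;/code&gt; is false for every row, and the row group can be skipped:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;def literal_prefix(pattern):
    # Return the literal prefix of a 'prefix%' pattern, or None when the
    # pattern has any other shape (this sketch prunes conservatively)
    if not pattern.endswith('%'):
        return None
    prefix = pattern[:-1]
    if prefix == '' or any(ch in prefix for ch in '%_\\'):
        return None
    return prefix


def can_skip_not_like(col_min, col_max, pattern):
    # True when the statistics prove that `col NOT LIKE pattern` matches no row
    prefix = literal_prefix(pattern)
    if prefix is None:
        return False
    return col_min.startswith(prefix) and col_max.startswith(prefix)


# Hypothetical (min, max) statistics for three row groups of a string column
row_groups = [('Xavier', 'Xylophone'), ('Apple', 'Zebra'), ('Xa', 'Y')]
kept = [rg for rg in row_groups if not can_skip_not_like(rg[0], rg[1], 'X%')]
print(kept)  # [('Apple', 'Zebra'), ('Xa', 'Y')]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The third row group is kept because its maximum, &lt;code&gt;'Y'&lt;/code&gt;, shows that it may contain values that do not start with "X".&lt;/p&gt;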
&lt;h2 id="google-summer-of-code-2025"&gt;Google Summer of Code 2025&lt;a class="headerlink" href="#google-summer-of-code-2025" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Another exciting development: &lt;strong&gt;Apache DataFusion has been accepted as a mentoring organization for Google Summer of Code (GSoC) 2025&lt;/strong&gt;! 🎉 This means that this summer, students from around the world will have the opportunity to contribute to DataFusion under the guidance of our committers. We have put together &lt;a href="https://datafusion.apache.org/contributor-guide/gsoc_project_ideas.html"&gt;a list of project ideas&lt;/a&gt; that candidates can choose from.&lt;/p&gt;
&lt;p&gt;If you’re interested, check out our &lt;a href="https://datafusion.apache.org/contributor-guide/gsoc_application_guidelines.html"&gt;GSoC Application Guidelines&lt;/a&gt;. We encourage students to reach out, discuss ideas with us, and apply.&lt;/p&gt;
&lt;h2 id="highlighted-new-features"&gt;Highlighted New Features&lt;a class="headerlink" href="#highlighted-new-features" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
&lt;h3 id="improved-diagnostics"&gt;Improved Diagnostics&lt;a class="headerlink" href="#improved-diagnostics" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;DataFusion 46.0.0 introduces a new &lt;a href="https://github.com/apache/datafusion/issues/14429"&gt;&lt;strong&gt;SQL Diagnostics framework&lt;/strong&gt;&lt;/a&gt; to make error messages more understandable. This comes in the form of new &lt;code&gt;Diagnostic&lt;/code&gt; and &lt;code&gt;DiagnosticEntry&lt;/code&gt; types, which allow the system to attach rich context (like source query text spans) to error messages. In practical terms, certain planner errors will now point to the exact location in your SQL query that caused the issue. &lt;/p&gt;
&lt;p&gt;For example, if you reference an unknown table or omit a column from &lt;code&gt;GROUP BY&lt;/code&gt;, the error message will include the query snippet that caused the error. These diagnostics are meant for end users of applications built on DataFusion, providing clearer messages instead of generic errors. Here’s an example:&lt;/p&gt;
&lt;p&gt;&lt;img alt="diagnostic-example" class="img-fluid" src="/blog/images/datafusion-46.0.0/diagnostic-example.png" width="80%"/&gt;&lt;/p&gt;
&lt;p&gt;Currently, diagnostics cover unresolved table/column references, missing &lt;code&gt;GROUP BY&lt;/code&gt; columns, ambiguous references, mismatched column counts in &lt;code&gt;UNION&lt;/code&gt;, type mismatches, and a few others. Future releases will extend this to more error types. This feature should greatly ease debugging of complex SQL by pinpointing errors directly in the query text. We thank &lt;a href="https://github.com/eliaperantoni"&gt;@eliaperantoni&lt;/a&gt; for his contributions to this project.&lt;/p&gt;
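&lt;p&gt;Conceptually, a diagnostic pairs a message with a span in the original query text, and that is enough to underline the offending tokens. The Python sketch below illustrates only that idea. &lt;code&gt;render_diagnostic&lt;/code&gt; and its 1-based line and column arguments are hypothetical. They do not reflect the fields of DataFusion’s &lt;code&gt;Diagnostic&lt;/code&gt; or &lt;code&gt;DiagnosticEntry&lt;/code&gt; types:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;def render_diagnostic(sql, message, line, start_col, end_col):
    # line, start_col and end_col are 1-based; end_col is inclusive
    source_line = sql.splitlines()[line - 1]
    underline = ' ' * (start_col - 1) + '^' * (end_col - start_col + 1)
    return '\n'.join([f'error: {message}', source_line, underline])


query = 'SELECT name\nFROM cusotmers'
print(render_diagnostic(query, "table 'cusotmers' not found", 2, 6, 14))
# error: table 'cusotmers' not found
# FROM cusotmers
#      ^^^^^^^^^
&lt;/code&gt;&lt;/pre&gt;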
&lt;h3 id="unified-datasourceexec-for-table-providers"&gt;Unified &lt;code&gt;DataSourceExec&lt;/code&gt; for Table Providers&lt;a class="headerlink" href="#unified-datasourceexec-for-table-providers" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As mentioned, DataFusion now uses a unified &lt;code&gt;DataSourceExec&lt;/code&gt; for reading tables, which is both a breaking change and a feature. &lt;em&gt;Why is this important?&lt;/em&gt; The new approach simplifies how custom table providers are integrated and optimized. With a single execution plan handling all data sources, the optimizer can treat file scans uniformly and push down filters and limits more consistently. The new &lt;code&gt;DataSourceExec&lt;/code&gt; is paired with a &lt;code&gt;DataSource&lt;/code&gt; trait that encapsulates format-specific behaviors (Parquet, CSV, JSON, Avro, etc.) in a pluggable way.&lt;/p&gt;
&lt;p&gt;All built-in sources (Parquet, CSV, Avro, Arrow, JSON, etc.) have been migrated to this framework. This unification makes the codebase cleaner and sets the stage for future enhancements (like consistent metadata handling and limit pushdown across all formats). Check out PR &lt;a href="https://github.com/apache/datafusion/pull/14224"&gt;#14224&lt;/a&gt; for design details. We thank &lt;a href="https://github.com/mertak-synnada"&gt;@mertak-synnada&lt;/a&gt; and &lt;a href="https://github.com/ozankabak"&gt;@ozankabak&lt;/a&gt; for their contributions.&lt;/p&gt;
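&lt;p&gt;The design can be pictured with a small Python analogy. One generic scan operator owns the cross-format logic (here, just a row limit), while a pluggable source object owns the format-specific parsing. All class names below (&lt;code&gt;ToyFileSource&lt;/code&gt;, &lt;code&gt;ToyCsvSource&lt;/code&gt;, &lt;code&gt;ToyJsonLinesSource&lt;/code&gt;, &lt;code&gt;ToyDataSourceExec&lt;/code&gt;) are hypothetical. The real &lt;code&gt;DataSource&lt;/code&gt; and &lt;code&gt;FileSource&lt;/code&gt; traits are Rust traits with a different and much richer interface:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;import json
from abc import ABC, abstractmethod


class ToyFileSource(ABC):
    # Format-specific part: turn the text of one file into rows
    @abstractmethod
    def read_rows(self, text):
        ...


class ToyCsvSource(ToyFileSource):
    def read_rows(self, text):
        return [line.split(',') for line in text.splitlines() if line]


class ToyJsonLinesSource(ToyFileSource):
    def read_rows(self, text):
        return [list(json.loads(line).values()) for line in text.splitlines() if line]


class ToyDataSourceExec:
    # Format-agnostic part: one scan operator that applies the limit
    # the same way for every source type
    def __init__(self, source, files, limit=None):
        self.source = source
        self.files = files
        self.limit = limit

    def execute(self):
        rows = []
        for text in self.files:
            for row in self.source.read_rows(text):
                if len(rows) == self.limit:
                    return rows
                rows.append(row)
        return rows


csv_files = ['1,a\n2,b\n', '3,c\n']
json_files = ['{"id": 1, "v": "a"}\n{"id": 2, "v": "b"}\n', '{"id": 3, "v": "c"}\n']
print(ToyDataSourceExec(ToyCsvSource(), csv_files, limit=2).execute())
# [['1', 'a'], ['2', 'b']]
print(ToyDataSourceExec(ToyJsonLinesSource(), json_files, limit=2).execute())
# [[1, 'a'], [2, 'b']]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Adding a new format in this toy model only means writing another &lt;code&gt;read_rows&lt;/code&gt;. The limit handling is written once.&lt;/p&gt;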
&lt;h3 id="ffi-support-for-scalar-udfs"&gt;FFI Support for Scalar UDFs&lt;a class="headerlink" href="#ffi-support-for-scalar-udfs" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;DataFusion’s Foreign Function Interface (FFI) has been extended to support &lt;a href="https://github.com/apache/datafusion/pull/14579"&gt;&lt;strong&gt;user-defined scalar functions&lt;/strong&gt;&lt;/a&gt; defined in external languages. In 46.0.0, you can now expose a custom scalar UDF through the FFI layer and use it in DataFusion as if it were built-in. This is particularly exciting for the &lt;strong&gt;Python bindings&lt;/strong&gt; and other language integrations – it means you could define a function in Python (or C, etc.) and register it with DataFusion’s Rust core via the FFI crate. Thanks, &lt;a href="https://github.com/timsaucer"&gt;@timsaucer&lt;/a&gt;!&lt;/p&gt;
&lt;h3 id="new-statisticsdistribution-framework"&gt;New Statistics/Distribution Framework&lt;a class="headerlink" href="#new-statisticsdistribution-framework" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This release, thanks mainly to &lt;a href="https://github.com/Fly-Style"&gt;@Fly-Style&lt;/a&gt; with contributions from &lt;a href="https://github.com/ozankabak"&gt;@ozankabak&lt;/a&gt; and &lt;a href="https://github.com/berkaysynnada"&gt;@berkaysynnada&lt;/a&gt;, includes the initial pieces of a &lt;a href="https://github.com/apache/datafusion/pull/14699"&gt;&lt;strong&gt;redesigned statistics framework&lt;/strong&gt;&lt;/a&gt;. DataFusion’s optimizer can now represent column data distributions using a new &lt;code&gt;Distribution&lt;/code&gt; enum, instead of the old precision or range estimations. The supported distribution types currently include &lt;strong&gt;Uniform, Gaussian (normal), Exponential, Bernoulli&lt;/strong&gt;, and a &lt;strong&gt;Generic&lt;/strong&gt; catch-all.&lt;/p&gt;
&lt;p&gt;For example, if a filter expression is applied to a column with a known uniform distribution range, the optimizer can propagate that to estimate result selectivity more accurately. Similarly, comparisons (&lt;code&gt;=&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt;, etc.) on columns yield Bernoulli distributions (with true/false probabilities) in this model.&lt;/p&gt;
&lt;p&gt;This is a foundational change with many follow-on PRs underway. The immediate user-visible effect is limited (the optimizer didn't magically improve by an order of magnitude overnight), but it lays the groundwork for more advanced query planning. Over time, as the statistics encapsulated in &lt;code&gt;Distribution&lt;/code&gt;s are integrated, DataFusion will be able to use data distribution information to make smarter decisions, such as more aggressive Parquet pruning and better join orderings. The core framework is now in place and is being hooked up to column- and table-level statistics.&lt;/p&gt;
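&lt;p&gt;Here is a worked example of this kind of propagation. If a column is modeled as uniformly distributed on &lt;code&gt;[low, high]&lt;/code&gt;, then the comparison &lt;code&gt;col &amp;gt; c&lt;/code&gt; is a Bernoulli variable. Its probability of being true is &lt;code&gt;(high - c) / (high - low)&lt;/code&gt;, clamped to &lt;code&gt;[0, 1]&lt;/code&gt;. The Python sketch below computes just that number. &lt;code&gt;uniform_gt_probability&lt;/code&gt; is a hypothetical helper, not DataFusion’s &lt;code&gt;Distribution&lt;/code&gt; API, which also covers the other distribution types listed above:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;def uniform_gt_probability(low, high, c):
    # Probability that X is greater than c for X ~ Uniform(low, high),
    # clamped to [0, 1]; assumes low is strictly less than high
    if low == high:
        raise ValueError('degenerate interval')
    p = (high - c) / (high - low)
    return min(1.0, max(0.0, p))


# A column spread uniformly over [0, 100], filtered to values greater than 75
print(uniform_gt_probability(0, 100, 75))  # 0.25
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The resulting probability can serve as a selectivity estimate. In this model, roughly a quarter of the rows would pass the filter.&lt;/p&gt;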
&lt;h3 id="aggregate-monotonicity-and-window-ordering"&gt;Aggregate Monotonicity and Window Ordering&lt;a class="headerlink" href="#aggregate-monotonicity-and-window-ordering" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;DataFusion 46.0.0 adds a new concept of &lt;a href="https://github.com/apache/datafusion/pull/14271"&gt;&lt;strong&gt;set-monotonicity&lt;/strong&gt;&lt;/a&gt; for certain transformations, which helps avoid unnecessary sort operations. In particular, the planner now understands when a &lt;strong&gt;window function introduces new orderings of data&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;For example, DataFusion now recognizes that a window aggregate like &lt;code&gt;MAX&lt;/code&gt; on a column can produce a &lt;strong&gt;monotonically increasing&lt;/strong&gt; result even if the input column is unordered, depending on the window frame used.&lt;/p&gt;
&lt;p&gt;Consider the following query:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-sql"&gt;SELECT MAX(c1) OVER (
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS max_c1
FROM c1_table
ORDER BY max_c1;
&lt;/code&gt;&lt;/pre&gt;
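&lt;p&gt;In earlier versions of DataFusion, this query would require an additional &lt;code&gt;SortExec&lt;/code&gt; on &lt;code&gt;max_c1&lt;/code&gt; to satisfy the &lt;code&gt;ORDER BY&lt;/code&gt; clause. With the new set-monotonicity logic, the planner knows that &lt;code&gt;MAX(...) OVER (...)&lt;/code&gt; with this frame computes a running maximum, so each row’s value is never smaller than the previous row’s. The output is therefore already sorted on &lt;code&gt;max_c1&lt;/code&gt;, which makes the extra sort redundant and leads to more efficient query execution.&lt;/p&gt;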
&lt;p&gt;PR &lt;a href="https://github.com/apache/datafusion/pull/14271"&gt;#14271&lt;/a&gt; introduced the core monotonicity tracking for aggregates and window functions.
PR &lt;a href="https://github.com/apache/datafusion/pull/14813"&gt;#14813&lt;/a&gt; improved ordering preservation within various window frame types and added extensive test coverage.
Huge thanks to &lt;a href="https://github.com/berkaysynnada"&gt;@berkaysynnada&lt;/a&gt; and &lt;a href="https://github.com/mertak-synnada"&gt;@mertak-synnada&lt;/a&gt; for designing and implementing this optimizer enhancement!&lt;/p&gt;
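&lt;p&gt;The property the planner relies on is easy to check by hand. With a frame of &lt;code&gt;ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW&lt;/code&gt;, &lt;code&gt;MAX&lt;/code&gt; computes a running maximum, and a running maximum never decreases, whatever the input order. The plain-Python sketch below (not DataFusion code) shows that sorting such an output leaves it unchanged. That is why the extra sort can be dropped:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;from itertools import accumulate


def running_max(values):
    # Value of MAX(c1) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    # for each row, given the rows in their input order
    return list(accumulate(values, max))


c1 = [3, 1, 4, 1, 5, 9, 2, 6]  # unordered input column
max_c1 = running_max(c1)
print(max_c1)  # [3, 3, 4, 4, 5, 9, 9, 9]

# The output is already in ascending order, so ORDER BY max_c1 needs no sort
assert max_c1 == sorted(max_c1)
&lt;/code&gt;&lt;/pre&gt;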
&lt;h3 id="union-all-distinct-by-name-support"&gt;UNION [ALL | DISTINCT] BY NAME Support&lt;a class="headerlink" href="#union-all-distinct-by-name-support" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;DataFusion now supports &lt;code&gt;UNION BY NAME&lt;/code&gt; and &lt;code&gt;UNION ALL BY NAME&lt;/code&gt;, which align columns by name instead of by position. This matches functionality found in systems like Spark and DuckDB and simplifies combining result sets whose columns appear in different orders.&lt;/p&gt;
&lt;p&gt;You no longer need to reorder columns manually; just write:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-sql"&gt;SELECT col1, col2 FROM t1
UNION ALL BY NAME
SELECT col2, col1 FROM t2;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under the hood, this is supported by the new &lt;code&gt;union_by_name()&lt;/code&gt; and &lt;code&gt;union_by_name_distinct()&lt;/code&gt; plan builder methods.&lt;/p&gt;
&lt;p&gt;Thanks to &lt;a href="https://github.com/rkrishn7"&gt;@rkrishn7&lt;/a&gt; for PR &lt;a href="https://github.com/apache/datafusion/pull/14538"&gt;#14538&lt;/a&gt;.&lt;/p&gt;
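&lt;p&gt;The alignment step itself is simple to model. The Python sketch below reorders the right-hand input’s columns to match the left-hand input’s names before concatenating. For the &lt;code&gt;DISTINCT&lt;/code&gt; form, it then removes duplicate rows. The function names are hypothetical and are not the DataFusion plan builder API. To keep the sketch short, it assumes both inputs have exactly the same set of column names:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;def union_all_by_name(left_cols, left_rows, right_cols, right_rows):
    # Sketch assumption: both sides have the same set of column names
    if sorted(left_cols) != sorted(right_cols):
        raise ValueError('column names differ between inputs')
    # For each output column (in left order), find its position on the right
    positions = [right_cols.index(name) for name in left_cols]
    aligned = [tuple(row[i] for i in positions) for row in right_rows]
    return list(left_cols), [tuple(r) for r in left_rows] + aligned


def union_distinct_by_name(left_cols, left_rows, right_cols, right_rows):
    cols, rows = union_all_by_name(left_cols, left_rows, right_cols, right_rows)
    # dict.fromkeys keeps the first occurrence of each row, in order
    return cols, list(dict.fromkeys(rows))


cols, rows = union_all_by_name(
    ['col1', 'col2'], [(1, 'a')],
    ['col2', 'col1'], [('b', 2), ('a', 1)],
)
print(cols, rows)  # ['col1', 'col2'] [(1, 'a'), (2, 'b'), (1, 'a')]
&lt;/code&gt;&lt;/pre&gt;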
&lt;h3 id="new-range-table-function"&gt;New range() Table Function&lt;a class="headerlink" href="#new-range-table-function" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;A new table-valued function, &lt;code&gt;range(start, stop, step)&lt;/code&gt;, has been added to make it easy to generate integer sequences, similar to PostgreSQL’s &lt;code&gt;generate_series()&lt;/code&gt; or Spark’s &lt;code&gt;range()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-sql"&gt;SELECT * FROM range(1, 10, 2);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This returns: 1, 3, 5, 7, 9. It’s great for testing, cross joins, surrogate keys, and more.&lt;/p&gt;
&lt;p&gt;Thanks to &lt;a href="https://github.com/simonvandel"&gt;@simonvandel&lt;/a&gt; for PR &lt;a href="https://github.com/apache/datafusion/pull/14830"&gt;#14830&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="upgrade-guide-and-changelog"&gt;Upgrade Guide and Changelog&lt;a class="headerlink" href="#upgrade-guide-and-changelog" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Upgrading to 46.0.0 should be straightforward for most users, but do review the &lt;a href="https://datafusion.apache.org/library-user-guide/upgrading.html"&gt;Upgrade Guide for DataFusion 46.0.0&lt;/a&gt; for detailed steps and code changes. The upgrade guide covers the breaking changes mentioned above (like replacing old exec nodes with &lt;code&gt;DataSourceExec&lt;/code&gt;) as well as others, such as updating UDF invocation to &lt;code&gt;invoke_with_args&lt;/code&gt;. It also provides code snippets to help with the transition. For a comprehensive list of all changes, please refer to the &lt;strong&gt;changelog&lt;/strong&gt; for 46.0.0 (linked above and in the repository). The changelog enumerates every merged PR in this release, including many smaller fixes and improvements that we couldn’t cover in this post.&lt;/p&gt;
&lt;h2 id="get-involved"&gt;Get Involved&lt;a class="headerlink" href="#get-involved" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Apache DataFusion is an open-source project, and we welcome involvement from anyone interested. Now is a great time to take 46.0.0 for a spin: try it out on your workloads, and let us know if you encounter any issues or have suggestions. You can report bugs or request features on our GitHub issue tracker, or better yet, submit a pull request. Join our community discussions – whether you have questions, want to share how you’re using DataFusion, or are looking to contribute, we’d love to hear from you. A list of open issues suitable for beginners is &lt;a href="https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22"&gt;here&lt;/a&gt; and you can find how to reach us on the &lt;a href="https://datafusion.apache.org/contributor-guide/communication.html"&gt;communication doc&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Happy querying!&lt;/p&gt;</content><category term="blog"/></entry></feed>