<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - milenkovicm</title><link href="https://datafusion.apache.org/blog/" rel="alternate"/><link href="https://datafusion.apache.org/blog/feeds/milenkovicm.atom.xml" rel="self"/><id>https://datafusion.apache.org/blog/</id><updated>2025-02-02T00:00:00+00:00</updated><entry><title>Apache DataFusion Ballista 43.0.0 Released</title><link href="https://datafusion.apache.org/blog/2025/02/02/datafusion-ballista-43.0.0" rel="alternate"/><published>2025-02-02T00:00:00+00:00</published><updated>2025-02-02T00:00:00+00:00</updated><author><name>milenkovicm</name></author><id>tag:datafusion.apache.org,2025-02-02:/blog/2025/02/02/datafusion-ballista-43.0.0</id><summary type="html">&lt;!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements.  See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License.  You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
--&gt;

&lt;p&gt;We are pleased to announce version &lt;a href="https://github.com/apache/datafusion-ballista/blob/main/CHANGELOG.md#4300-2025-01-07"&gt;43.0.0&lt;/a&gt; of &lt;a href="https://datafusion.apache.org/ballista/"&gt;DataFusion Ballista&lt;/a&gt;. Ballista allows existing &lt;a href="https://datafusion.apache.org"&gt;DataFusion&lt;/a&gt; applications to be scaled out on a cluster for use cases that are not practical to run on a single node.&lt;/p&gt;
&lt;h2 id="highlights-of-this-release"&gt;Highlights of this release&lt;a class="headerlink" href="#highlights-of-this-release" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
&lt;h3 id="seamless-integration-with-datafusion"&gt;Seamless Integration with DataFusion&lt;a class="headerlink" href="#seamless-integration-with-datafusion" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The primary objective of …&lt;/p&gt;</summary><content type="html">&lt;!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements.  See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License.  You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
--&gt;

&lt;p&gt;We are pleased to announce version &lt;a href="https://github.com/apache/datafusion-ballista/blob/main/CHANGELOG.md#4300-2025-01-07"&gt;43.0.0&lt;/a&gt; of &lt;a href="https://datafusion.apache.org/ballista/"&gt;DataFusion Ballista&lt;/a&gt;. Ballista allows existing &lt;a href="https://datafusion.apache.org"&gt;DataFusion&lt;/a&gt; applications to be scaled out on a cluster for use cases that are not practical to run on a single node.&lt;/p&gt;
&lt;h2 id="highlights-of-this-release"&gt;Highlights of this release&lt;a class="headerlink" href="#highlights-of-this-release" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
&lt;h3 id="seamless-integration-with-datafusion"&gt;Seamless Integration with DataFusion&lt;a class="headerlink" href="#seamless-integration-with-datafusion" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The primary objective of this release has been to integrate more seamlessly with the DataFusion ecosystem and to offer the same level of flexibility as DataFusion.&lt;/p&gt;
&lt;p&gt;In recent months, our development efforts have been directed toward providing a robust and extensible Ballista API. This new API lets end users tailor Ballista's core functionality to their specific use cases. As a result, we have removed several experimental features from the Ballista core, and users can reintroduce them as custom extensions outside the core framework. This shift reduces the maintenance burden on Ballista's core maintainers and paves the way for optional features, such as &lt;a href="https://github.com/delta-io/delta-rs"&gt;delta-rs&lt;/a&gt; support, to be added externally when needed.&lt;/p&gt;
&lt;p&gt;The most significant enhancement in this release is the deprecation of &lt;code&gt;BallistaContext&lt;/code&gt;, which has been superseded by the DataFusion &lt;code&gt;SessionContext&lt;/code&gt;. This change enables DataFusion applications written in Rust to execute on a Ballista cluster with minimal modifications. Beyond simplifying migration and reducing maintenance overhead, this update introduces distributed write functionality to Ballista for the first time, significantly enhancing its capabilities.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-rust"&gt;use ballista::prelude::*;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -&amp;gt; datafusion::error::Result&amp;lt;()&amp;gt; {

  // Instead of creating classic SessionContext
  // let ctx = SessionContext::new();

  // create a DataFusion SessionContext backed by a standalone Ballista cluster
  // let ctx = SessionContext::standalone().await?;

  // create a DataFusion SessionContext connected to a remote Ballista cluster
  let ctx = SessionContext::remote("df://localhost:50050").await?;

  // register the table
  ctx.register_csv("example", "tests/data/example.csv", CsvReadOptions::new()).await?;

  // create a plan to run a SQL query
  let df = ctx.sql("SELECT a, MIN(b) FROM example WHERE a &amp;lt;= b GROUP BY a LIMIT 100").await?;

  // execute and print results
  df.show().await?;
  Ok(())
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Additionally, Ballista’s versioning scheme has been aligned with that of DataFusion, ensuring that Ballista's version number reflects the compatible DataFusion version.&lt;/p&gt;
&lt;p&gt;At the moment there is still a feature gap between DataFusion and Ballista, which we aim to close in future releases.&lt;/p&gt;
&lt;h3 id="removal-of-experimental-features"&gt;Removal of Experimental Features&lt;a class="headerlink" href="#removal-of-experimental-features" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Ballista had grown in scope to include several experimental features in various states of completeness. Some features have been removed from this release in an effort to strip Ballista back to its core and make it easier to maintain and extend.&lt;/p&gt;
&lt;p&gt;Specifically, the caching subsystem, predefined object store registry, plugin subsystem, key-value stores for persistent scheduler state, and the UI have been removed.&lt;/p&gt;
&lt;h3 id="performance-scalability"&gt;Performance &amp;amp; Scalability&lt;a class="headerlink" href="#performance-scalability" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Ballista has benefited significantly from the advancements made in the DataFusion project over the past year. Benchmark results show notable performance improvements:&lt;/p&gt;
&lt;p&gt;Per-query comparison:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Per query comparison" class="img-fluid" src="/blog/images/datafusion-ballista-43.0.0/tpch_queries_compare.png" width="100%"/&gt;&lt;/p&gt;
&lt;p&gt;Relative speedup:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Relative speedup graph" class="img-fluid" src="/blog/images/datafusion-ballista-43.0.0/tpch_queries_speedup_rel.png" width="100%"/&gt;&lt;/p&gt;
&lt;p&gt;The overall speedup is 2.9x.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Overall speedup" class="img-fluid" src="/blog/images/datafusion-ballista-43.0.0/tpch_allqueries.png" width="50%"/&gt;&lt;/p&gt;
&lt;h3 id="new-logo"&gt;New Logo&lt;a class="headerlink" href="#new-logo" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Ballista now has a new logo, which is visually consistent with those of other DataFusion projects.&lt;/p&gt;
&lt;p&gt;&lt;img alt="New logo" class="img-fluid" src="/blog/images/datafusion-ballista-43.0.0/ballista-logo.png" width="50%"/&gt;&lt;/p&gt;
&lt;h2 id="roadmap"&gt;Roadmap&lt;a class="headerlink" href="#roadmap" title="Permanent link"&gt;¶&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Moving forward, Ballista will adopt the same release cadence as DataFusion, providing synchronized updates across the ecosystem.
Currently, there is no established long-term roadmap for Ballista. A plan will be formulated in the coming months based on community feedback and the availability of additional maintainers.&lt;/p&gt;
&lt;p&gt;In the short term, development efforts will concentrate on closing the feature gap between DataFusion and Ballista. Key priorities include implementing support for &lt;code&gt;INSERT INTO&lt;/code&gt;, enabling table &lt;code&gt;URL&lt;/code&gt; functionality, and achieving deeper integration with the Python ecosystem.&lt;/p&gt;</content><category term="blog"/></entry></feed>