Welcome to the Apache DataFusion Blog!

Here you can find the latest updates from DataFusion and related projects.

Apache DataFusion Comet 0.4.0 Release

Posted on: Wed 20 November 2024 by pmc

The Apache DataFusion PMC is pleased to announce version 0.4.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

Comet runs on commodity hardware and aims to …

Apache DataFusion Comet 0.3.0 Release

Posted on: Fri 27 September 2024 by pmc

The Apache DataFusion PMC is pleased to announce version 0.3.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

Comet runs on commodity hardware and aims to …

Apache DataFusion Comet 0.2.0 Release

Posted on: Wed 28 August 2024 by pmc

The Apache DataFusion PMC is pleased to announce version 0.2.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

Comet runs on commodity hardware and aims to …

Apache DataFusion Comet 0.1.0 Release

Posted on: Sat 20 July 2024 by pmc

The Apache DataFusion PMC is pleased to announce the first official source release of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

Comet runs on commodity hardware and aims …

Aggregating Millions of Groups Fast in Apache Arrow DataFusion 28.0.0

Posted on: Sat 05 August 2023 by alamb, Dandandan, tustvold

Andrew Lamb, Daniël Heres, Raphael Taylor-Davies

Note: this article was originally published on the InfluxData Blog

TLDR

Grouped aggregations are a core part of any analytic tool, creating understandable summaries of huge data volumes. Apache Arrow DataFusion’s parallel aggregation capability …

Apache Arrow Ballista 0.9.0 Release

Posted on: Fri 28 October 2022 by pmc

Introduction

Ballista is an Arrow-native distributed SQL query engine implemented in Rust.

Ballista 0.9.0 is now available and is the most significant release since the project was donated to Apache Arrow in 2021.

This release represents 4 weeks of work, with 66 commits from 14 contributors:

    22  Andy …

Apache Arrow DataFusion 8.0.0 Release

Posted on: Mon 16 May 2022 by pmc

Introduction

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

When you want to extend your Rust project with SQL support, a DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV data, DataFusion is definitely worth …

Apache Arrow DataFusion 7.0.0 Release

Posted on: Mon 28 February 2022 by pmc

Introduction

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

When you want to extend your Rust project with SQL support, a DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV data, DataFusion is definitely worth …

Apache Arrow DataFusion 6.0.0 Release

Posted on: Fri 19 November 2021 by pmc

Introduction

DataFusion is an embedded query engine which leverages the unique features of Rust and Apache Arrow to provide a system that is high performance, easy to connect, easy to embed, and high quality.

The Apache Arrow team is pleased to announce the DataFusion 6.0.0 release. This covers …

Apache Arrow Ballista 0.5.0 Release

Posted on: Wed 18 August 2021 by pmc

Ballista extends DataFusion to provide support for distributed queries. This is the first release of Ballista since the project was donated to the Apache Arrow project and includes 80 commits from 11 contributors.

git shortlog -sn 4.0.0..5.0.0 ballista/rust/client ballista/rust/core ballista/rust …

Apache Arrow DataFusion 5.0.0 Release

Posted on: Wed 18 August 2021 by pmc

The Apache Arrow team is pleased to announce the DataFusion 5.0.0 release. This covers 4 months of development work and includes 211 commits from the following 31 distinct contributors:

$ git shortlog -sn 4.0.0..5.0.0 datafusion datafusion-cli datafusion-examples
    61  Jiayu Liu
    47  Andrew Lamb
    27 …

Ballista: A Distributed Scheduler for Apache Arrow

Posted on: Mon 12 April 2021 by agrove

We are excited to announce that Ballista has been donated to the Apache Arrow project.

Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow. It is built on an architecture that allows other programming languages (such as Python, C++, and Java) to be supported …

DataFusion: A Rust-native Query Engine for Apache Arrow

Posted on: Mon 04 February 2019 by agrove

We are excited to announce that DataFusion has been donated to the Apache Arrow project. DataFusion is an in-memory query engine for the Rust implementation of Apache Arrow.

Although DataFusion was started two years ago, it was recently re-implemented to be Arrow-native. It currently has limited capabilities but does support …

Copyright 2024, The Apache Software Foundation, Licensed under the Apache License, Version 2.0.
Apache® and the Apache feather logo are trademarks of The Apache Software Foundation.