Apache DataFusion Blog

Articles by pmc

Apache DataFusion Comet 0.16.0 Release

The Apache DataFusion PMC is pleased to announce version 0.16.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

This release covers approximately three weeks of development …

Published:

By pmc

Apache DataFusion Comet 0.15.0 Release

The Apache DataFusion PMC is pleased to announce version 0.15.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

This release covers approximately four weeks of development …

Published:

By pmc

Apache DataFusion 53.0.0 Released

We are proud to announce the release of DataFusion 53.0.0. This post highlights some of the major improvements since DataFusion 52.0.0. The complete list of changes is available in the changelog. Thanks to the 114 contributors for making this release possible.

Performance Improvements 🚀

Performance over time

Figure 1: Average …

Published:

By pmc

Apache DataFusion Comet 0.14.0 Release

The Apache DataFusion PMC is pleased to announce version 0.14.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

This release covers approximately eight weeks of development …

Published:

By pmc

Apache DataFusion Comet 0.13.0 Release

The Apache DataFusion PMC is pleased to announce version 0.13.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

This release covers approximately eight weeks of development …

Published:

By pmc

Apache DataFusion 52.0.0 Released

We are proud to announce the release of DataFusion 52.0.0. This post highlights some of the major improvements since DataFusion 51.0.0. The complete list of changes is available in the changelog. Thanks to the 121 contributors for making this release possible.

Performance Improvements 🚀

We continue to …

Published:

By pmc

Apache DataFusion Comet 0.12.0 Release

The Apache DataFusion PMC is pleased to announce version 0.12.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

This release covers approximately four weeks of development …

Published:

By pmc

Apache DataFusion 51.0.0 Released

Introduction

We are proud to announce the release of DataFusion 51.0.0. This post highlights some of the major improvements since DataFusion 50.0.0. The complete list of changes is available in the changelog. Thanks to the 128 contributors for making this release possible.

Performance Improvements 🚀

We continue …

Published:

By pmc

Apache DataFusion Comet 0.11.0 Release

The Apache DataFusion PMC is pleased to announce version 0.11.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

This release covers approximately five weeks of development …

Published:

By pmc

Apache DataFusion 50.0.0 Released

Introduction

We are proud to announce the release of DataFusion 50.0.0. This blog post highlights some of the major improvements since the release of DataFusion 49.0.0. The complete list of changes is available in the changelog. Thanks to numerous contributors for making this release possible!

Performance …

Published:

By pmc

Apache DataFusion Comet 0.10.0 Release

The Apache DataFusion PMC is pleased to announce version 0.10.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

This release covers approximately ten weeks of development …

Published:

By pmc

Apache DataFusion 49.0.0 Released

Introduction

We are proud to announce the release of DataFusion 49.0.0. This blog post highlights some of the major improvements since the release of DataFusion 48.0.0. The complete list of changes is available in the changelog.

Performance Improvements 🚀

DataFusion continues to focus on enhancing performance, as …

Published:

By pmc

Apache DataFusion 48.0.0 Released

We’re excited to announce the release of Apache DataFusion 48.0.0! As always, this version packs in a wide range of improvements and fixes. You can find the complete details in the full changelog. We’ll highlight the most important changes below and guide you through upgrading.

Breaking …

Published:

By pmc

Apache DataFusion 47.0.0 Released

We’re excited to announce the release of Apache DataFusion 47.0.0! This new version represents a significant milestone for the project, packing in a wide range of improvements and fixes. You can find the complete details in the full changelog. We’ll highlight the most important changes below …

Published:

By pmc

Apache DataFusion Comet 0.9.0 Release

The Apache DataFusion PMC is pleased to announce version 0.9.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

This release covers approximately ten weeks of development …

Published:

By pmc

Apache DataFusion Comet 0.8.0 Release

The Apache DataFusion PMC is pleased to announce version 0.8.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

This release covers approximately six weeks of development …

Published:

By pmc

Apache DataFusion Comet 0.7.0 Release

The Apache DataFusion PMC is pleased to announce version 0.7.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

Comet runs on commodity hardware and aims to …

Published:

By pmc

Apache DataFusion 45.0.0 Released

Introduction

We are very proud to announce DataFusion 45.0.0. This blog highlights some of the many major improvements since we released DataFusion 40.0.0 and a preview of what the community is thinking about in the next 6 months. It has been an exciting period of development …

Published:

By pmc

Apache DataFusion Comet 0.6.0 Release

The Apache DataFusion PMC is pleased to announce version 0.6.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

Comet runs on commodity hardware and aims to …

Published:

By pmc

Apache DataFusion Comet 0.5.0 Release

The Apache DataFusion PMC is pleased to announce version 0.5.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

Comet runs on commodity hardware and aims to …

Published:

By pmc

Apache DataFusion Comet 0.4.0 Release

The Apache DataFusion PMC is pleased to announce version 0.4.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

Comet runs on commodity hardware and aims to …

Published:

By pmc

Apache DataFusion Comet 0.3.0 Release

The Apache DataFusion PMC is pleased to announce version 0.3.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

Comet runs on commodity hardware and aims to …

Published:

By pmc

Apache DataFusion Comet 0.2.0 Release

The Apache DataFusion PMC is pleased to announce version 0.2.0 of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

Comet runs on commodity hardware and aims to …

Published:

By pmc

Apache DataFusion 40.0.0 Released

Introduction

We are proud to announce DataFusion 40.0.0. This blog highlights some of the many major improvements since we released DataFusion 34.0.0 and a preview of what the community is thinking about in the next 6 months. We are hoping to make more regular blog posts …

Published:

By pmc

Apache DataFusion Comet 0.1.0 Release

The Apache DataFusion PMC is pleased to announce the first official source release of the Comet subproject.

Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

Comet runs on commodity hardware and aims …

Published:

By pmc

Announcing Apache Arrow DataFusion is now Apache DataFusion

Introduction

TL;DR: Apache Arrow DataFusion --> Apache DataFusion

The Arrow PMC and the newly created DataFusion PMC are happy to announce that, as of April 16, 2024, the Apache Arrow DataFusion subproject is now a top-level Apache Software Foundation project.

Background

Apache DataFusion is a fast, extensible query engine for building …

Published:

By pmc

Announcing Apache Arrow DataFusion Comet

Introduction

The Apache Arrow PMC is pleased to announce the donation of the Comet project, a native Spark SQL Accelerator built on Apache Arrow DataFusion.

Comet is an Apache Spark plugin that uses Apache Arrow DataFusion to accelerate Spark workloads. It is designed as a drop-in replacement for Spark's JVM …

Published:

By pmc

Apache Arrow DataFusion 34.0.0 Released, Looking Forward to 2024

Introduction

We recently released DataFusion 34.0.0. This blog highlights some of the major improvements since we released DataFusion 26.0.0 (spoiler alert: there are many) and a preview of where the community plans to focus in the next 6 months.

Apache Arrow DataFusion is an extensible query …

Published:

By pmc

Apache Arrow DataFusion 26.0.0

It has been a whirlwind 6 months of DataFusion development since our last update: the community has grown, many features have been added, performance has improved, and we are discussing branching out to our own top-level Apache project.

Background

Apache Arrow DataFusion is an extensible query engine and database toolkit …

Published:

By pmc

Apache Arrow DataFusion 16.0.0 Project Update

Introduction

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format. It is targeted primarily at developers creating data intensive analytics, and offers mature SQL support, a DataFrame API, and many extension points.

Systems based on DataFusion perform very well in benchmarks …

Published:

By pmc

Apache Arrow Ballista 0.9.0 Release

Introduction

Ballista is an Arrow-native distributed SQL query engine implemented in Rust.

Ballista 0.9.0 is now available and is the most significant release since the project was donated to Apache Arrow in 2021.

This release represents 4 weeks of work, with 66 commits from 14 contributors:

    22  Andy …

Published:

By pmc

Apache Arrow DataFusion 13.0.0 Project Update

Introduction

Apache Arrow DataFusion 13.0.0 is released, and this blog contains an update on the project for the 5 months since our last update in May 2022.

DataFusion is an extensible and embeddable query engine, written in Rust, used to create modern, fast and efficient data pipelines, ETL …

Published:

By pmc

Apache Arrow DataFusion 8.0.0 Release

Introduction

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

When you want to extend your Rust project with SQL support, a DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV data, DataFusion is definitely worth …

Published:

By pmc

Introducing Apache Arrow DataFusion Contrib

Introduction

Apache Arrow DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

When you want to extend your Rust project with SQL support, a DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV data, DataFusion is …

Published:

By pmc

Apache Arrow DataFusion 7.0.0 Release

Introduction

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

When you want to extend your Rust project with SQL support, a DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV data, DataFusion is definitely worth …

Published:

By pmc

Apache Arrow DataFusion 6.0.0 Release

Introduction

DataFusion is an embedded query engine which leverages the unique features of Rust and Apache Arrow to provide a system that is high performance, easy to connect, easy to embed, and high quality.

The Apache Arrow team is pleased to announce the DataFusion 6.0.0 release. This covers …

Published:

By pmc

Apache Arrow Ballista 0.5.0 Release

Ballista extends DataFusion to provide support for distributed queries. This is the first release of Ballista since the project was donated to the Apache Arrow project and includes 80 commits from 11 contributors.

git shortlog -sn 4.0.0..5.0.0 ballista/rust/client ballista/rust/core ballista/rust …

Published:

By pmc

Apache Arrow DataFusion 5.0.0 Release

The Apache Arrow team is pleased to announce the DataFusion 5.0.0 release. This covers 4 months of development work and includes 211 commits from the following 31 distinct contributors.

$ git shortlog -sn 4.0.0..5.0.0 datafusion datafusion-cli datafusion-examples
    61  Jiayu Liu
    47  Andrew Lamb
    27 …