Articles in the blog category

  1. Apache DataFusion 48.0.0 Released

    By PMC

    We’re excited to announce the release of Apache DataFusion 48.0.0! As always, this version packs in a wide range of improvements and fixes. You can find the complete details in the full changelog. We’ll highlight the most important changes below and guide you through upgrading.

    Breaking …

  2. Apache DataFusion 47.0.0 Released

    By PMC

    We’re excited to announce the release of Apache DataFusion 47.0.0! This new version represents a significant milestone for the project, packing in a wide range of improvements and fixes. You can find the complete details in the full changelog. We’ll highlight the most important changes below …

  3. Apache DataFusion Comet 0.9.0 Release

    By pmc

    The Apache DataFusion PMC is pleased to announce version 0.9.0 of the Comet subproject.

    Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

    This release covers approximately ten weeks of development …

  4. Apache DataFusion Comet 0.8.0 Release

    By pmc

    The Apache DataFusion PMC is pleased to announce version 0.8.0 of the Comet subproject.

    Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

    This release covers approximately six weeks of development …

  5. User defined Window Functions in DataFusion

    Window functions are a powerful feature in SQL, allowing for complex analytical computations over a subset of data. However, efficiently implementing them, especially sliding windows, can be quite challenging. With Apache DataFusion's user-defined window functions, developers can easily take advantage of all the effort put into DataFusion's implementation.

    In …

  6. Apache DataFusion Comet 0.7.0 Release

    By pmc

    The Apache DataFusion PMC is pleased to announce version 0.7.0 of the Comet subproject.

    Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

    Comet runs on commodity hardware and aims to …

  7. Using Ordering for Better Plans in Apache DataFusion

    Introduction

    In this blog post, we explain when an ordering requirement of an operator is satisfied by its input data. This analysis is essential for order-based optimizations and is often more complex than one might initially think.

    Ordering Requirement for an operator describes how the input data to that operator …

  8. Apache DataFusion Comet 0.6.0 Release

    By pmc

    The Apache DataFusion PMC is pleased to announce version 0.6.0 of the Comet subproject.

    Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

    Comet runs on commodity hardware and aims to …

  9. Apache DataFusion Comet 0.5.0 Release

    By pmc

    The Apache DataFusion PMC is pleased to announce version 0.5.0 of the Comet subproject.

    Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

    Comet runs on commodity hardware and aims to …

  10. Apache DataFusion Comet 0.4.0 Release

    By pmc

    The Apache DataFusion PMC is pleased to announce version 0.4.0 of the Comet subproject.

    Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

    Comet runs on commodity hardware and aims to …

  11. Apache DataFusion Comet 0.3.0 Release

    By pmc

    The Apache DataFusion PMC is pleased to announce version 0.3.0 of the Comet subproject.

    Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

    Comet runs on commodity hardware and aims to …

  12. Apache DataFusion Comet 0.2.0 Release

    By pmc

    The Apache DataFusion PMC is pleased to announce version 0.2.0 of the Comet subproject.

    Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

    Comet runs on commodity hardware and aims to …

  13. Apache DataFusion Comet 0.1.0 Release

    By pmc

    The Apache DataFusion PMC is pleased to announce the first official source release of the Comet subproject.

    Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.

    Comet runs on commodity hardware and aims …

  14. Apache Arrow Ballista 0.9.0 Release

    By pmc

    Introduction

    Ballista is an Arrow-native distributed SQL query engine implemented in Rust.

    Ballista 0.9.0 is now available and is the most significant release since the project was donated to Apache Arrow in 2021.

    This release represents 4 weeks of work, with 66 commits from 14 contributors:

        22  Andy …

  15. Apache Arrow DataFusion 8.0.0 Release

    By pmc

    Introduction

    DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

    When you want to extend your Rust project with SQL support, a DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV data, DataFusion is definitely worth …

  16. Apache Arrow DataFusion 7.0.0 Release

    By pmc

    Introduction

    DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

    When you want to extend your Rust project with SQL support, a DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV data, DataFusion is definitely worth …

  17. Apache Arrow DataFusion 6.0.0 Release

    By pmc

    Introduction

    DataFusion is an embedded query engine which leverages the unique features of Rust and Apache Arrow to provide a system that is high performance, easy to connect, easy to embed, and high quality.

    The Apache Arrow team is pleased to announce the DataFusion 6.0.0 release. This covers …

  18. Apache Arrow Ballista 0.5.0 Release

    By pmc

    Ballista extends DataFusion to provide support for distributed queries. This is the first release of Ballista since the project was donated to the Apache Arrow project and includes 80 commits from 11 contributors.

    git shortlog -sn 4.0.0..5.0.0 ballista/rust/client ballista/rust/core ballista/rust …

  19. Apache Arrow DataFusion 5.0.0 Release

    By pmc

    The Apache Arrow team is pleased to announce the DataFusion 5.0.0 release. This covers 4 months of development work and includes 211 commits from the following 31 distinct contributors.

    $ git shortlog -sn 4.0.0..5.0.0 datafusion datafusion-cli datafusion-examples
        61  Jiayu Liu
        47  Andrew Lamb
        27 …

  20. Ballista: A Distributed Scheduler for Apache Arrow

    We are excited to announce that Ballista has been donated to the Apache Arrow project.

    Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow. It is built on an architecture that allows other programming languages (such as Python, C++, and Java) to be supported …

  21. DataFusion: A Rust-native Query Engine for Apache Arrow

    By agrove

    We are excited to announce that DataFusion has been donated to the Apache Arrow project. DataFusion is an in-memory query engine for the Rust implementation of Apache Arrow.

    Although DataFusion was started two years ago, it was recently re-implemented to be Arrow-native and currently has limited capabilities but does support …