Concepts, Readings, Events
🧭 Background Concepts
2024-06-13: 2024 ACM SIGMOD International Conference on Management of Data: Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (Download, Talk, Slides, Recording)
2024-06-07: Video: SIGMOD 2024 Practice: Apache Arrow DataFusion A Fast, Embeddable, Modular Analytic Query Engine
2023-04-05: Video: DataFusion Architecture Part 3: Physical Plan and Execution Slides
2023-04-04: Video: DataFusion Architecture Part 2: Logical Plans and Expressions Slides
2023-03-31: Video: DataFusion Architecture Part 1: Query Engines Slides
2020-02-27: Online Book: How Query Engines Work
✨ Good Reads
This is a list of DataFusion-related blog posts, articles, and other resources. Please open a PR to add any new resources you create or find.
2024-11-22 Blog: Apache Datafusion Comet and the story of my first contribution to it
2024-11-21 Blog: DataFusion is featured as one of the 10 coolest open source software tools by CRN
2024-11-20 Apache DataFusion Comet 0.4.0 Release
2024-11-19 Blog: Comparing approaches to User Defined Functions in Apache DataFusion using Python
2024-11-18 Blog: Apache DataFusion is now the fastest single node engine for querying Apache Parquet files
2024-11-18 Building Databases over a Weekend
2024-10-27 Caching in DataFusion: Don’t read twice
2024-10-24 Parquet pruning in DataFusion: Read no more than you need
2024-09-13 Blog: Using StringView / German Style Strings to make Queries Faster: Part 2 - String Operations (reposted on the DataFusion Blog)
2024-09-13 Blog: Using StringView / German Style Strings to Make Queries Faster: Part 1 - Reading Parquet (reposted on the DataFusion Blog)
2024-10-16 Blog: Candle Image Segmentation
2024-09-23 → 2024-12-02 Carnegie Mellon University: Database Building Blocks Seminar Series - Fall 2024
2024-11-04 Video: Synnada: Towards “Unified” Compute Engines: Opportunities and Challenges (Mehmet Ozan Kabak)
2024-10-28 Video: Exon: A Built for Purpose Bioinformatics Database (Trent Hauck)
2024-10-21 Video: Accelerating Data and AI with Spice.ai Open-Source Software (Luke Kim)
2024-10-07 Video: ParadeDB – Postgres for Search and Analytics (Philippe Noël)
2024-09-30 Video: Accelerating Apache Spark Workloads with Apache DataFusion Comet (Andy Grove)
2024-09-23 Video: Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine (Andrew Lamb)
2024-09-17 Video: Profiling Apache DataFusion using flamegraph
2024-08-15 Blog: Faster DataFusion with StringView - Xiangpeng Hao (Aug 15, 2024)
2024-08-14 Blog: DataFusion @ UWheel
2024-06-17 Blog: Columnar File Readers In-Depth: APIs and Fusion
2024-06-14 2024 Simplicity in Management of Data (SiMOD): DataFusion: The Case for Building Open Data Systems (Keynote) Slides
2024-06-26 Microsoft Gray Systems Lab: Building InfluxDB 3.0 (and other systems) Slides
2024-03-26 DataCouncil 2024: Building InfluxDB 3.0 with Apache Arrow, DataFusion, Flight, and Parquet Slides | Recording
2024-03-20 Video: Profiling DataFusion with Instruments (part of Xcode on macOS)
2024-03-18 Blog: Making Recent Value Queries Hundreds of Times Faster
2023-10-25 Blog: Flight, DataFusion, Arrow, and Parquet: Using the FDAP Architecture to build InfluxDB 3.0
2023-09-26 Blog: 100x Faster Ingest with DataFusion + Better Connectivity with FlightSQL
2023-08-05 Blog: Aggregating Millions of Groups Fast in Apache Arrow DataFusion | DataFusion Blog
2023-07-28 Blog: Sliding Window Hash Join (SWHJ)
2023-07-13 Blog: Probabilistic Data Structures in Streaming: Count-Min Sketch
2023-05-25 Video: D3L2: Discussing Rust, Ballista, Ray SQL, Data Fusion with Andy Grove
2023-02-20 Blog: General Purpose Stream Joins via Pruning Symmetric Hash Joins
2023-09-27 Slides: MIT Database Group: Implementing InfluxDB IOx
2023-06-02 Dutch Seminar on Database System Design: Implementing InfluxDB IOx Slides | Recording
2023-02-15 Slides: Invited Talk at Optum Labs: Building a New Time Series Database
2023-01-01 Blog: What I Want from DataFusion 2023
2022-06-27 Databricks Data+AI Summit: DataFusion and Arrow Slides | Recording
2022-05-23 Video: The Data Thread 2022: Apache Arrow and DataFusion Slides
2021-03-10 Video: InfluxData Tech Talk: Query Engine Design and Rust-Based DataFusion in Apache Arrow Slides
📅 Release Notes & Updates
2024-08-24 Apache DataFusion Python 40.1.0 Released, Significant usability updates
2024-07-24 DataFusion 40.0.0 Release
2024-01-19 DataFusion 34.0.0 Release
2023-06-24 DataFusion 25.0.0 Release
2023-01-19 DataFusion 16.0.0 Release
2022-10-25 DataFusion 13.0.0 Release
2022-05-16 DataFusion 8.0.0 Release
2022-02-28 DataFusion 7.0.0 Release
2021-11-19 DataFusion 6.0.0 Release
2021-08-18 DataFusion 5.0.0 Release
2019-09-22 DataFusion 0.15.0 Release Notes
🌎 Community Events
2025-01-25 (Upcoming) Amsterdam Apache DataFusion Meetup
2025-01-15 (Upcoming) Boston Apache DataFusion Meetup
2024-12-18 (Upcoming) Chicago Apache DataFusion Meetup
2024-10-14 Seattle Apache DataFusion Meetup
2024-09-27 Belgrade Apache DataFusion Meetup, recap, slides, recordings
2024-06-26 New York City Apache DataFusion Meetup. slides
2024-06-25 San Francisco Bay Area Apache DataFusion Meetup. slides
2024-03-27 Austin Apache DataFusion Meetup. slides, recording