Apache DataFusion in Python


User Guide

The user guide walks through installing DataFusion in Python, building queries with the DataFrame API or SQL, reading and writing data, and tuning execution.

  • Introduction
    • Installation
  • Concepts
    • Session Context
    • DataFrame
    • Expressions
  • Data Sources
    • Local file
    • Create in-memory
    • Object Store
    • Other DataFrame Libraries
    • Delta Lake
    • Apache Iceberg
    • Custom Table Provider
  • Catalog
    • User Defined Catalog and Schema
  • DataFrames
    • Overview
    • Creating DataFrames
    • Common DataFrame Operations
    • Column Names as Function Arguments
    • Terminal Operations
    • Zero-copy streaming to Arrow-based Python libraries
    • PyArrow
    • HTML Rendering
    • Core Classes
    • Expression Classes
    • Built-in Functions
    • Execution Metrics
  • Common Operations
    • Registering Views
    • Basic Operations
    • Column Selections
    • Expressions
    • Joins
    • Functions
    • Handling Missing Values
    • Spark-Compatible Functions
    • Aggregation
    • Window Functions
    • User-Defined Functions
  • IO
    • Arrow
    • Avro
    • CSV
    • JSON
    • Parquet
    • Custom Table Provider
  • Configuration
    • Maximizing CPU Usage
  • Distributing work
    • Expression-level distribution
    • Query-level distribution via datafusion-distributed
    • Query-level distribution via Apache Ballista
    • See also
  • SQL
    • Parameterized queries
  • Upgrade Guides
    • DataFusion 54.0.0
    • DataFusion 53.0.0
    • DataFusion 52.0.0
  • Using AI Coding Assistants
    • What is published
    • Installing the skill
    • What the skill covers
    • If you are an agent author


Apache Arrow DataFusion, Arrow DataFusion, Apache, the Apache feather logo, and the Apache Arrow DataFusion project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.