Extending SQL Syntax#

DataFusion provides a flexible extension system that allows you to customize SQL parsing and planning without modifying the core codebase. This is useful when you need to:

  • Support custom operators from other SQL dialects (e.g., PostgreSQL’s -> for JSON)

  • Add custom data types not natively supported

  • Implement SQL constructs like TABLESAMPLE, PIVOT/UNPIVOT, or MATCH_RECOGNIZE

Architecture Overview#

When DataFusion processes a SQL query, it goes through these stages:

┌─────────────┐    ┌─────────┐    ┌──────────────────────┐    ┌─────────────┐
│ SQL String  │───▶│ Parser  │───▶│      SqlToRel        │───▶│ LogicalPlan │
└─────────────┘    └─────────┘    │ (SQL to LogicalPlan) │    └─────────────┘
                                  └──────────────────────┘
                                              │
                                              │ uses
                                              ▼
                                  ┌───────────────────────┐
                                  │  Extension Planners   │
                                  │  • ExprPlanner        │
                                  │  • TypePlanner        │
                                  │  • RelationPlanner    │
                                  └───────────────────────┘

The extension planners intercept specific parts of the SQL AST during the SqlToRel phase and allow you to customize how they are converted to DataFusion’s logical plan.

Extension Points#

DataFusion provides three planner traits for extending SQL:

Trait

Purpose

Registration Method

ExprPlanner

Custom expressions and operators

ctx.register_expr_planner()

TypePlanner

Custom SQL data types

SessionStateBuilder::with_type_planner()

RelationPlanner

Custom FROM clause elements (relations)

ctx.register_relation_planner()

Planner Precedence: Multiple ExprPlanners and RelationPlanners can be registered; they are invoked in reverse registration order (last registered wins). Return Original(...) to delegate to the next planner. Only one TypePlanner can be active at a time.

ExprPlanner: Custom Expressions and Operators#

Use ExprPlanner to customize how SQL expressions are converted to DataFusion logical expressions. This is useful for:

  • Custom binary operators (e.g., ->, ->>, @>, ?)

  • Custom field access patterns

  • Custom aggregate or window function handling

Available Methods#

Category

Methods

Operators

plan_binary_op, plan_any

Literals

plan_array_literal, plan_dictionary_literal, plan_struct_literal

Functions

plan_extract, plan_substring, plan_overlay, plan_position, plan_make_map

Identifiers

plan_field_access, plan_compound_identifier

Aggregates/Windows

plan_aggregate, plan_window

See the ExprPlanner API documentation for full method signatures.

Example: Custom Arrow Operator#

This example maps the -> operator to string concatenation:

use datafusion_expr::planner::{ExprPlanner, PlannerResult, RawBinaryExpr};

#[derive(Debug)]
struct MyCustomPlanner;

impl ExprPlanner for MyCustomPlanner {
    fn plan_binary_op(
        &self,
        expr: RawBinaryExpr,
        _schema: &DFSchema,
    ) -> Result<PlannerResult<RawBinaryExpr>> {
        match &expr.op {
            // Map `->` to string concatenation
            BinaryOperator::Arrow => {
                Ok(PlannerResult::Planned(Expr::BinaryExpr(BinaryExpr {
                    left: Box::new(expr.left.clone()),
                    right: Box::new(expr.right.clone()),
                    op: Operator::StringConcat,
                })))
            }
            _ => Ok(PlannerResult::Original(expr)),
        }
    }
}

#[tokio::main]
async fn main() -> Result<()> {
    // Use postgres dialect to enable `->` operator parsing
    let config = SessionConfig::new()
        .set_str("datafusion.sql_parser.dialect", "postgres");
    let mut ctx = SessionContext::new_with_config(config);

    // Register the custom planner
    ctx.register_expr_planner(Arc::new(MyCustomPlanner))?;

    // Now `->` works as string concatenation
    let results = ctx.sql("SELECT 'hello'->'world'").await?.collect().await?;
    // Returns: "helloworld"
    Ok(())
}

For more details, see the ExprPlanner API documentation and the expr_planner test examples.

TypePlanner: Custom Data Types#

Use TypePlanner to map SQL data types to Arrow/DataFusion types. This is useful when you need to support SQL types that aren’t natively recognized.

Example: Custom DATETIME Type#

use datafusion_expr::planner::TypePlanner;

#[derive(Debug)]
struct MyTypePlanner;

impl TypePlanner for MyTypePlanner {
    fn plan_type(&self, sql_type: &ast::DataType) -> Result<Option<DataType>> {
        match sql_type {
            // Map DATETIME(precision) to Arrow Timestamp
            ast::DataType::Datetime(precision) => {
                let time_unit = match precision {
                    Some(0) => TimeUnit::Second,
                    Some(3) => TimeUnit::Millisecond,
                    Some(6) => TimeUnit::Microsecond,
                    None | Some(9) => TimeUnit::Nanosecond,
                    _ => return Ok(None), // Let default handling take over
                };
                Ok(Some(DataType::Timestamp(time_unit, None)))
            }
            _ => Ok(None), // Return None for types we don't handle
        }
    }
}

#[tokio::main]
async fn main() -> Result<()> {
    let state = SessionStateBuilder::new()
        .with_default_features()
        .with_type_planner(Arc::new(MyTypePlanner))
        .build();

    let ctx = SessionContext::new_with_state(state);

    // Now DATETIME type is recognized
    ctx.sql("CREATE TABLE events (ts DATETIME(3))").await?;
    Ok(())
}

For more details, see the TypePlanner API documentation.

RelationPlanner: Custom FROM Clause Elements#

Use RelationPlanner to handle custom relations in the FROM clause. This enables you to implement SQL constructs like:

  • TABLESAMPLE for sampling data

  • PIVOT / UNPIVOT for data reshaping

  • MATCH_RECOGNIZE for pattern matching

  • Any custom relation syntax parsed by sqlparser

The RelationPlannerContext#

When implementing RelationPlanner, you receive a RelationPlannerContext that provides utilities for planning:

Method

Purpose

plan(relation)

Recursively plan a nested relation

sql_to_expr(expr, schema)

Convert SQL expression to DataFusion Expr

context_provider()

Access session configuration, tables, functions

See the RelationPlanner API documentation for additional methods like normalize_ident() and object_name_to_table_reference().

Implementation Strategies#

There are two main approaches when implementing a RelationPlanner:

  1. Rewrite to Standard SQL: Transform custom syntax into equivalent standard operations that DataFusion already knows how to execute (e.g., PIVOT → GROUP BY with CASE expressions). This is the simplest approach when possible.

  2. Custom Logical and Physical Nodes: Create a UserDefinedLogicalNode to represent the operation in the logical plan, along with a custom ExecutionPlan to execute it. Both are required for end-to-end execution.

Example: Basic RelationPlanner Structure#

use datafusion_expr::planner::{
    PlannedRelation, RelationPlanner, RelationPlannerContext, RelationPlanning,
};
use datafusion_sql::sqlparser::ast::TableFactor;

#[derive(Debug)]
struct MyRelationPlanner;

impl RelationPlanner for MyRelationPlanner {
    fn plan_relation(
        &self,
        relation: TableFactor,
        ctx: &mut dyn RelationPlannerContext,
    ) -> Result<RelationPlanning> {
        match relation {
            // Handle your custom relation
            TableFactor::Pivot { table, alias, .. } => {
                // Plan the input table
                let input = ctx.plan(*table)?;

                // Transform or wrap the plan as needed
                // ...

                Ok(RelationPlanning::Planned(PlannedRelation::new(input, alias)))
            }

            // Return Original for relations you don't handle
            other => Ok(RelationPlanning::Original(other)),
        }
    }
}

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();

    // Register the custom planner
    ctx.register_relation_planner(Arc::new(MyRelationPlanner))?;

    Ok(())
}

Complete Examples#

The DataFusion repository includes comprehensive examples demonstrating each approach:

TABLESAMPLE (Custom Logical and Physical Nodes)#

The table_sample.rs example shows a complete end-to-end implementation of how to support queries such as:

SELECT * FROM table TABLESAMPLE BERNOULLI(10 PERCENT) REPEATABLE(42)

PIVOT/UNPIVOT (Rewrite Strategy)#

The pivot_unpivot.rs example demonstrates rewriting custom syntax to standard SQL for queries such as:

SELECT * FROM sales
  PIVOT (SUM(amount) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4'))

Recap#

  1. Use ExprPlanner for custom operators and expression handling

  2. Use [TypePlanner for custom SQL data types

  3. Use RelationPlanner for custom FROM clause syntax (TABLESAMPLE, PIVOT, etc.)

  4. Register planners via SessionContext or SessionStateBuilder

See Also#