Extending SQL Syntax#
DataFusion provides a flexible extension system that allows you to customize SQL parsing and planning without modifying the core codebase. This is useful when you need to:
Support custom operators from other SQL dialects (e.g., PostgreSQL’s
->for JSON)Add custom data types not natively supported
Implement SQL constructs like
TABLESAMPLE,PIVOT/UNPIVOT, orMATCH_RECOGNIZE
Architecture Overview#
When DataFusion processes a SQL query, it goes through these stages:
┌─────────────┐ ┌─────────┐ ┌──────────────────────┐ ┌─────────────┐
│ SQL String │───▶│ Parser │───▶│ SqlToRel │───▶│ LogicalPlan │
└─────────────┘ └─────────┘ │ (SQL to LogicalPlan) │ └─────────────┘
└──────────────────────┘
│
│ uses
▼
┌───────────────────────┐
│ Extension Planners │
│ • ExprPlanner │
│ • TypePlanner │
│ • RelationPlanner │
└───────────────────────┘
The extension planners intercept specific parts of the SQL AST during the
SqlToRel phase and allow you to customize how they are converted to DataFusion’s
logical plan.
Extension Points#
DataFusion provides three planner traits for extending SQL:
Trait |
Purpose |
Registration Method |
|---|---|---|
Custom expressions and operators |
|
|
Custom SQL data types |
|
|
Custom FROM clause elements (relations) |
|
Planner Precedence: Multiple ExprPlanners and RelationPlanners can be
registered; they are invoked in reverse registration order (last registered wins).
Return Original(...) to delegate to the next planner. Only one TypePlanner
can be active at a time.
ExprPlanner: Custom Expressions and Operators#
Use ExprPlanner to customize how SQL expressions are converted to DataFusion
logical expressions. This is useful for:
Custom binary operators (e.g.,
->,->>,@>,?)Custom field access patterns
Custom aggregate or window function handling
Available Methods#
Category |
Methods |
|---|---|
Operators |
|
Literals |
|
Functions |
|
Identifiers |
|
Aggregates/Windows |
|
See the ExprPlanner API documentation for full method signatures.
Example: Custom Arrow Operator#
This example maps the -> operator to string concatenation:
use datafusion_expr::planner::{ExprPlanner, PlannerResult, RawBinaryExpr};
#[derive(Debug)]
struct MyCustomPlanner;
impl ExprPlanner for MyCustomPlanner {
fn plan_binary_op(
&self,
expr: RawBinaryExpr,
_schema: &DFSchema,
) -> Result<PlannerResult<RawBinaryExpr>> {
match &expr.op {
// Map `->` to string concatenation
BinaryOperator::Arrow => {
Ok(PlannerResult::Planned(Expr::BinaryExpr(BinaryExpr {
left: Box::new(expr.left.clone()),
right: Box::new(expr.right.clone()),
op: Operator::StringConcat,
})))
}
_ => Ok(PlannerResult::Original(expr)),
}
}
}
#[tokio::main]
async fn main() -> Result<()> {
// Use postgres dialect to enable `->` operator parsing
let config = SessionConfig::new()
.set_str("datafusion.sql_parser.dialect", "postgres");
let mut ctx = SessionContext::new_with_config(config);
// Register the custom planner
ctx.register_expr_planner(Arc::new(MyCustomPlanner))?;
// Now `->` works as string concatenation
let results = ctx.sql("SELECT 'hello'->'world'").await?.collect().await?;
// Returns: "helloworld"
Ok(())
}
For more details, see the ExprPlanner API documentation and the expr_planner test examples.
TypePlanner: Custom Data Types#
Use TypePlanner to map SQL data types to Arrow/DataFusion types. This is useful
when you need to support SQL types that aren’t natively recognized.
Example: Custom DATETIME Type#
use datafusion_expr::planner::TypePlanner;
#[derive(Debug)]
struct MyTypePlanner;
impl TypePlanner for MyTypePlanner {
fn plan_type(&self, sql_type: &ast::DataType) -> Result<Option<DataType>> {
match sql_type {
// Map DATETIME(precision) to Arrow Timestamp
ast::DataType::Datetime(precision) => {
let time_unit = match precision {
Some(0) => TimeUnit::Second,
Some(3) => TimeUnit::Millisecond,
Some(6) => TimeUnit::Microsecond,
None | Some(9) => TimeUnit::Nanosecond,
_ => return Ok(None), // Let default handling take over
};
Ok(Some(DataType::Timestamp(time_unit, None)))
}
_ => Ok(None), // Return None for types we don't handle
}
}
}
#[tokio::main]
async fn main() -> Result<()> {
let state = SessionStateBuilder::new()
.with_default_features()
.with_type_planner(Arc::new(MyTypePlanner))
.build();
let ctx = SessionContext::new_with_state(state);
// Now DATETIME type is recognized
ctx.sql("CREATE TABLE events (ts DATETIME(3))").await?;
Ok(())
}
For more details, see the TypePlanner API documentation.
RelationPlanner: Custom FROM Clause Elements#
Use RelationPlanner to handle custom relations in the FROM clause. This
enables you to implement SQL constructs like:
TABLESAMPLEfor sampling dataPIVOT/UNPIVOTfor data reshapingMATCH_RECOGNIZEfor pattern matchingAny custom relation syntax parsed by sqlparser
The RelationPlannerContext#
When implementing RelationPlanner, you receive a RelationPlannerContext that
provides utilities for planning:
Method |
Purpose |
|---|---|
|
Recursively plan a nested relation |
|
Convert SQL expression to DataFusion Expr |
|
Access session configuration, tables, functions |
See the RelationPlanner API documentation for additional methods like
normalize_ident() and object_name_to_table_reference().
Implementation Strategies#
There are two main approaches when implementing a RelationPlanner:
Rewrite to Standard SQL: Transform custom syntax into equivalent standard operations that DataFusion already knows how to execute (e.g., PIVOT → GROUP BY with CASE expressions). This is the simplest approach when possible.
Custom Logical and Physical Nodes: Create a
UserDefinedLogicalNodeto represent the operation in the logical plan, along with a customExecutionPlanto execute it. Both are required for end-to-end execution.
Example: Basic RelationPlanner Structure#
use datafusion_expr::planner::{
PlannedRelation, RelationPlanner, RelationPlannerContext, RelationPlanning,
};
use datafusion_sql::sqlparser::ast::TableFactor;
#[derive(Debug)]
struct MyRelationPlanner;
impl RelationPlanner for MyRelationPlanner {
fn plan_relation(
&self,
relation: TableFactor,
ctx: &mut dyn RelationPlannerContext,
) -> Result<RelationPlanning> {
match relation {
// Handle your custom relation
TableFactor::Pivot { table, alias, .. } => {
// Plan the input table
let input = ctx.plan(*table)?;
// Transform or wrap the plan as needed
// ...
Ok(RelationPlanning::Planned(PlannedRelation::new(input, alias)))
}
// Return Original for relations you don't handle
other => Ok(RelationPlanning::Original(other)),
}
}
}
#[tokio::main]
async fn main() -> Result<()> {
let ctx = SessionContext::new();
// Register the custom planner
ctx.register_relation_planner(Arc::new(MyRelationPlanner))?;
Ok(())
}
Complete Examples#
The DataFusion repository includes comprehensive examples demonstrating each approach:
TABLESAMPLE (Custom Logical and Physical Nodes)#
The table_sample.rs example shows a complete end-to-end implementation of how to support queries such as:
SELECT * FROM table TABLESAMPLE BERNOULLI(10 PERCENT) REPEATABLE(42)
PIVOT/UNPIVOT (Rewrite Strategy)#
The pivot_unpivot.rs example demonstrates rewriting custom syntax to standard SQL for queries such as:
SELECT * FROM sales
PIVOT (SUM(amount) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4'))
Recap#
Use
ExprPlannerfor custom operators and expression handlingUse [
TypePlannerfor custom SQL data typesUse
RelationPlannerfor custom FROM clause syntax (TABLESAMPLE, PIVOT, etc.)Register planners via
SessionContextorSessionStateBuilder
See Also#
API Documentation:
ExprPlanner,TypePlanner,RelationPlannerrelation_planner examples - Complete TABLESAMPLE, PIVOT/UNPIVOT implementations
expr_planner test examples - Custom operator examples
Custom Expression Planning in the UDF guide