Upgrade Guides#

DataFusion 53.0.0#

Upgrade arrow/parquet to 58.0.0 and object_store to 0.13.0#

DataFusion 53.0.0 uses arrow and parquet 58.0.0, and object_store 0.13.0. This may require updates to your Cargo.toml if you have direct dependencies on these crates.
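
A hedged sketch of the corresponding Cargo.toml change (your version pins and feature flags may differ):

[dependencies]
arrow = "58.0.0"
parquet = "58.0.0"
object_store = "0.13.0"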

See the Arrow 58.0.0 release notes and the object_store 0.13.0 upgrade guide for details on breaking changes in those versions.

ExecutionPlan::statistics removed#

The deprecated ExecutionPlan::statistics() method has been removed. If you implement custom ExecutionPlans, remove that method from your impl and implement partition_statistics() instead.

Before:

impl ExecutionPlan for MyExec {
    // ...

    fn statistics(&self) -> Result<Statistics> {
        Ok(Statistics::new_unknown(&self.schema()))
    }
}

After:

impl ExecutionPlan for MyExec {
    // ...

    fn partition_statistics(&self, _partition: Option<usize>) -> Result<Statistics> {
        Ok(Statistics::new_unknown(&self.schema()))
    }
}

If you do not have partition-specific statistics, return the same value for None and for any partition index.
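
If you do keep per-partition statistics, a minimal sketch (the partition_stats and total_stats fields are hypothetical) might dispatch on the argument:

fn partition_statistics(&self, partition: Option<usize>) -> Result<Statistics> {
    match partition {
        // Statistics for a single output partition
        Some(i) => Ok(self.partition_stats[i].clone()),
        // `None` requests statistics for the plan as a whole
        None => Ok(self.total_stats.clone()),
    }
}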

ExecutionPlan::properties now returns &Arc<PlanProperties>#

ExecutionPlan::properties() now returns &Arc<PlanProperties> instead of &PlanProperties. This makes it possible to cheaply clone the properties and reuse them across multiple ExecutionPlans. It also makes it possible to optimize ExecutionPlan::with_new_children to reuse the existing properties when the children plans have not changed, which can significantly reduce planning time for complex queries.

To migrate, you will likely need to wrap the stored PlanProperties in an Arc in all ExecutionPlan implementations:

-    cache: PlanProperties,
+    cache: Arc<PlanProperties>,

...

-    fn properties(&self) -> &PlanProperties {
+    fn properties(&self) -> &Arc<PlanProperties> {
         &self.cache
     }

To improve the performance of with_new_children for custom ExecutionPlan implementations, you can use the new check_if_same_properties macro. For it to work, implement a with_new_children_and_same_properties function with the same semantics as with_new_children, but operating under the assumption that the properties of the children plans have not changed.

An example of supporting this optimization for ProjectionExec:

     impl ProjectionExec {
+       fn with_new_children_and_same_properties(
+           &self,
+           mut children: Vec<Arc<dyn ExecutionPlan>>,
+       ) -> Self {
+           Self {
+               input: children.swap_remove(0),
+               metrics: ExecutionPlanMetricsSet::new(),
+               ..Self::clone(self)
+           }
+       }
    }

    impl ExecutionPlan for ProjectionExec {
        fn with_new_children(
            self: Arc<Self>,
            mut children: Vec<Arc<dyn ExecutionPlan>>,
        ) -> Result<Arc<dyn ExecutionPlan>> {
+           check_if_same_properties!(self, children);
            ProjectionExec::try_new(
                self.projector.projection().into_iter().cloned(),
                children.swap_remove(0),
            )
            .map(|p| Arc::new(p) as _)
        }
    }

PlannerContext outer query schema API now uses a stack#

PlannerContext no longer stores a single outer_query_schema. It now tracks a stack of outer relation schemas so nested subqueries can access non-adjacent outer relations.

Before:

let old_outer_query_schema =
    planner_context.set_outer_query_schema(Some(input_schema.clone().into()));
let sub_plan = self.query_to_plan(subquery, planner_context)?;
planner_context.set_outer_query_schema(old_outer_query_schema);

After:

planner_context.append_outer_query_schema(input_schema.clone().into());
let sub_plan = self.query_to_plan(subquery, planner_context)?;
planner_context.pop_outer_query_schema();
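
Because the schemas now form a stack, nested subqueries can push one schema per nesting level, and the innermost subquery can resolve columns from any enclosing relation. A minimal sketch (the schema and subquery variable names are illustrative):

planner_context.append_outer_query_schema(outer_schema.clone().into());
planner_context.append_outer_query_schema(inner_schema.clone().into());
// The innermost subquery sees both outer relation schemas
let plan = self.query_to_plan(innermost_subquery, planner_context)?;
planner_context.pop_outer_query_schema();
planner_context.pop_outer_query_schema();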

HashJoinExec::try_new adds null_aware#

HashJoinExec::try_new now takes an extra null_aware: bool argument. This flag is used for null-aware anti joins, such as plans generated for NOT IN subqueries.

Most callers should pass false, which preserves the previous behavior; pass true only for null-aware JoinType::LeftAnti joins, as sketched below.
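
A hedged sketch of a typical call site (the surrounding arguments are shown schematically and may not match your code exactly); the only change is the new trailing argument:

 let join = HashJoinExec::try_new(
     left,
     right,
     on,
     filter,
     &join_type,
     projection,
     partition_mode,
     null_equality,
+    false, // null_aware: `false` preserves the pre-53.0.0 behavior
 )?;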

FileSinkConfig adds file_output_mode#

FileSinkConfig now includes a file_output_mode: FileOutputMode field to control single-file vs directory output behavior. Any code constructing FileSinkConfig via struct literals must initialize this field.

The FileOutputMode enum has three variants:

  • Automatic (default): Infer the output mode from the URL, using a file-extension / trailing-slash heuristic

  • SingleFile: Write to a single file at the exact output path

  • Directory: Write to a directory with generated filenames

Before:

FileSinkConfig {
    // ...
    file_extension: "parquet".into(),
}

After:

use datafusion_datasource::file_sink_config::FileOutputMode;

FileSinkConfig {
    // ...
    file_extension: "parquet".into(),
    file_output_mode: FileOutputMode::Automatic,
}

SimplifyInfo trait removed, SimplifyContext now uses builder-style API#

The SimplifyInfo trait has been removed and replaced with the concrete SimplifyContext struct. This simplifies the expression simplification API and removes the need for trait objects.

Who is affected:

  • Users who implemented custom SimplifyInfo implementations

  • Users who implemented ScalarUDFImpl::simplify() for custom scalar functions

  • Users who directly use SimplifyContext or ExprSimplifier

Breaking changes:

  1. The SimplifyInfo trait has been removed entirely

  2. SimplifyContext no longer takes &ExecutionProps - it now uses a builder-style API with direct fields

  3. ScalarUDFImpl::simplify() now takes &SimplifyContext instead of &dyn SimplifyInfo

  4. Time-dependent function simplification (e.g., now()) is now optional - if query_execution_start_time is None, these functions won’t be simplified

Migration guide:

If you implemented a custom SimplifyInfo:

Before:

impl SimplifyInfo for MySimplifyInfo {
    fn is_boolean_type(&self, expr: &Expr) -> Result<bool> { ... }
    fn nullable(&self, expr: &Expr) -> Result<bool> { ... }
    fn execution_props(&self) -> &ExecutionProps { ... }
    fn get_data_type(&self, expr: &Expr) -> Result<DataType> { ... }
}

After:

Use SimplifyContext directly with the builder-style API:

let context = SimplifyContext::default()
    .with_schema(schema)
    .with_config_options(config_options)
    .with_query_execution_start_time(Some(Utc::now())); // or use .with_current_time()

If you implemented ScalarUDFImpl::simplify():

Before:

fn simplify(
    &self,
    args: Vec<Expr>,
    info: &dyn SimplifyInfo,
) -> Result<ExprSimplifyResult> {
    let now_ts = info.execution_props().query_execution_start_time;
    // ...
}

After:

fn simplify(
    &self,
    args: Vec<Expr>,
    info: &SimplifyContext,
) -> Result<ExprSimplifyResult> {
    // query_execution_start_time is now Option<DateTime<Utc>>
    // Return Original if time is not set (simplification skipped)
    let Some(now_ts) = info.query_execution_start_time() else {
        return Ok(ExprSimplifyResult::Original(args));
    };
    // ...
}

If you created SimplifyContext from ExecutionProps:

Before:

let props = ExecutionProps::new();
let context = SimplifyContext::new(&props).with_schema(schema);

After:

let context = SimplifyContext::default()
    .with_schema(schema)
    .with_config_options(config_options)
    .with_current_time(); // Sets query_execution_start_time to Utc::now()

See the SimplifyContext documentation for more details.

Struct casting now requires field name overlap#

DataFusion’s struct casting mechanism previously allowed casting between structs with differing field names if the field counts matched. This “positional fallback” behavior could silently misalign fields and cause data corruption.

Breaking Change:

Starting with DataFusion 53.0.0, struct casts now require at least one overlapping field name between the source and target structs. Casts without field name overlap are rejected at plan time with a clear error message.

Who is affected:

  • Applications that cast between structs with no overlapping field names

  • Queries that rely on positional struct field mapping (e.g., casting struct(x, y) to struct(a, b) based solely on position)

  • Code that constructs or transforms struct columns programmatically

Migration guide:

If you encounter an error like:

Cannot cast struct with 2 fields to 2 fields because there is no field name overlap

You must explicitly rename or map fields to ensure at least one field name matches. Here are common patterns:

Example 1: Source and target field names already match (Name-based casting)

Success case (field names align):

-- source_col has schema: STRUCT<x INT, y INT>
-- Casting to the same field names succeeds (no-op or type validation only)
SELECT CAST(source_col AS STRUCT<x INT, y INT>) FROM table1;

Example 2: Source and target field names differ (Migration scenario)

What fails now (no field name overlap):

-- source_col has schema: STRUCT<a INT, b INT>
-- This FAILS because there is no field name overlap:
-- ❌ SELECT CAST(source_col AS STRUCT<x INT, y INT>) FROM table1;
-- Error: Cannot cast struct with 2 fields to 2 fields because there is no field name overlap

Migration options (must align names):

Option A: Use a struct constructor for explicit field mapping

-- source_col has schema: STRUCT<a INT, b INT>
-- Use named_struct with explicit field names
SELECT named_struct(
    'x', source_col.a,
    'y', source_col.b
) AS renamed_struct FROM table1;

Option B: Rename in the cast target to match source names

-- source_col has schema: STRUCT<a INT, b INT>
-- Cast to target with matching field names
SELECT CAST(source_col AS STRUCT<a INT, b INT>) FROM table1;

Example 3: Using struct constructors in the Rust API

If you need to map fields programmatically, build the target struct explicitly:

use arrow::datatypes::{DataType, Field, Fields};

// Build the target struct type with explicit field names
let target_struct_type = DataType::Struct(Fields::from(vec![
    Field::new("x", DataType::Int32, true),
    Field::new("y", DataType::Utf8, true),
]));

// Rather than relying on CAST for positional field mapping, construct the
// target struct explicitly (for example via named_struct in SQL, or by
// building a StructArray from the source columns) so the mapping is
// unambiguous.

Why this change:

  1. Safety: Field names are now the primary contract for struct compatibility

  2. Explicitness: Prevents silent data misalignment caused by positional assumptions

  3. Consistency: Matches DuckDB’s behavior and aligns with other SQL engines that enforce name-based matching

  4. Debuggability: Errors now appear at plan time rather than as silent data corruption

See Issue #19841 and PR #19955 for more details.

FilterExec builder methods deprecated#

The following methods on FilterExec have been deprecated in favor of using FilterExecBuilder:

  • with_projection()

  • with_batch_size()

Who is affected:

  • Users who create FilterExec instances and use these methods to configure them

Migration guide:

Use FilterExecBuilder instead of chaining method calls on FilterExec:

Before:

let filter = FilterExec::try_new(predicate, input)?
    .with_projection(Some(vec![0, 2]))?
    .with_batch_size(8192)?;

After:

let filter = FilterExecBuilder::new(predicate, input)
    .with_projection(Some(vec![0, 2]))
    .with_batch_size(8192)
    .build()?;

The builder pattern is more efficient as it computes properties once during build() rather than recomputing them for each method call.

Note: with_default_selectivity() is not deprecated as it simply updates a field value and does not require the overhead of the builder pattern.

Protobuf conversion trait added#

A new trait, PhysicalProtoConverterExtension, has been added to the datafusion-proto crate. It controls how physical plans and expressions are converted to and from their protobuf equivalents, and the conversion functions now take an additional converter parameter.

The primary APIs for interacting with this crate have not been modified, so most users should not need to make any changes. If you do require this trait, you can use the DefaultPhysicalProtoConverter implementation.

For example, to convert a sort expression protobuf node you can make the following updates:

Before:

let sort_expr = parse_physical_sort_expr(
    sort_proto,
    ctx,
    input_schema,
    codec,
);

After:

let converter = DefaultPhysicalProtoConverter {};
let sort_expr = parse_physical_sort_expr(
    sort_proto,
    ctx,
    input_schema,
    codec,
    &converter
);

Similarly, to convert from a physical sort expression into a protobuf node:

Before:

let sort_proto = serialize_physical_sort_expr(
    sort_expr,
    codec,
);

After:

let converter = DefaultPhysicalProtoConverter {};
let sort_proto = serialize_physical_sort_expr(
    sort_expr,
    codec,
    &converter,
);

generate_series and range table functions changed#

The generate_series and range table functions now return an empty set when the interval is invalid, instead of an error. This behavior is consistent with systems like PostgreSQL.

Before:

> select * from generate_series(0, -1);
Error during planning: Start is bigger than end, but increment is positive: Cannot generate infinite series

> select * from range(0, -1);
Error during planning: Start is bigger than end, but increment is positive: Cannot generate infinite series

Now:

> select * from generate_series(0, -1);
+-------+
| value |
+-------+
+-------+
0 row(s) fetched.

> select * from range(0, -1);
+-------+
| value |
+-------+
+-------+
0 row(s) fetched.

array_remove, array_remove_n, array_remove_all now return NULL when the element argument is NULL#

Previously, calling array_remove(array, NULL) attempted to match and remove NULL entries from the array (for array_remove, the first matching entry). Now, passing NULL as the element to remove causes the function to return NULL, which is consistent with standard SQL NULL propagation semantics.

The same change applies to array_remove_n (alias: list_remove_n) and array_remove_all (alias: list_remove_all).

Who is affected:

  • Queries that call array_remove, array_remove_n, or array_remove_all with a NULL element argument (literal or column-derived).

Behavioral changes:

Expression                                         | Old result      | New result
---------------------------------------------------|-----------------|-----------
array_remove(make_array(1, NULL, 2), NULL)         | [1, 2]          | NULL
array_remove(make_array(1, NULL, 2, NULL), NULL)   | [1, 2, NULL]    | NULL
array_remove_n(make_array(1, 2, 2, 1, 1), NULL, 2) | [1, 2, 2, 1, 1] | NULL
array_remove_all(make_array(1, 2, 2, 1, 1), NULL)  | [1, 2, 2, 1, 1] | NULL

Migration guide:

If your queries relied on the old behavior to strip NULLs from arrays, use array_remove_all with a non-NULL sentinel or use array_filter instead:

-- Before (removed NULLs from array):
SELECT array_remove(make_array(1, NULL, 2), NULL);
-- Old result: [1, 2]

-- After (returns NULL due to NULL propagation):
SELECT array_remove(make_array(1, NULL, 2), NULL);
-- New result: NULL

See #21011 and PR #21013 for details.