# Upgrade Guides

## DataFusion 53.0.0
**Note:** DataFusion 53.0.0 has not been released yet. The information provided
in this section pertains to features and changes that have already been merged
to the main branch and are awaiting release in this version. See #19692 for
more details.
### `FileSinkConfig` adds `file_output_mode`
`FileSinkConfig` now includes a `file_output_mode: FileOutputMode` field to
control single-file vs. directory output behavior. Any code constructing
`FileSinkConfig` via struct literals must initialize this field.

The `FileOutputMode` enum has three variants:

- `Automatic` (default): infer the output mode from the URL (extension / trailing slash / heuristic)
- `SingleFile`: write to a single file at the exact output path
- `Directory`: write to a directory with generated filenames
Before:
```rust
FileSinkConfig {
    // ...
    file_extension: "parquet".into(),
}
```
After:
```rust
use datafusion_datasource::file_sink_config::FileOutputMode;

FileSinkConfig {
    // ...
    file_extension: "parquet".into(),
    file_output_mode: FileOutputMode::Automatic,
}
```
### `SimplifyInfo` trait removed, `SimplifyContext` now uses builder-style API
The `SimplifyInfo` trait has been removed and replaced with the concrete `SimplifyContext` struct. This simplifies the expression simplification API and removes the need for trait objects.
Who is affected:

- Users who wrote custom `SimplifyInfo` implementations
- Users who implemented `ScalarUDFImpl::simplify()` for custom scalar functions
- Users who directly use `SimplifyContext` or `ExprSimplifier`
Breaking changes:

- The `SimplifyInfo` trait has been removed entirely
- `SimplifyContext` no longer takes `&ExecutionProps`; it now uses a builder-style API with direct fields
- `ScalarUDFImpl::simplify()` now takes `&SimplifyContext` instead of `&dyn SimplifyInfo`
- Time-dependent function simplification (e.g., `now()`) is now optional: if `query_execution_start_time` is `None`, these functions won't be simplified
Migration guide:
If you implemented a custom `SimplifyInfo`:

Before:

```rust
impl SimplifyInfo for MySimplifyInfo {
    fn is_boolean_type(&self, expr: &Expr) -> Result<bool> { ... }
    fn nullable(&self, expr: &Expr) -> Result<bool> { ... }
    fn execution_props(&self) -> &ExecutionProps { ... }
    fn get_data_type(&self, expr: &Expr) -> Result<DataType> { ... }
}
```
After:

Use `SimplifyContext` directly with the builder-style API:

```rust
let context = SimplifyContext::default()
    .with_schema(schema)
    .with_config_options(config_options)
    .with_query_execution_start_time(Some(Utc::now())); // or use .with_current_time()
```
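The context can then be handed to `ExprSimplifier` as before (a short sketch; `expr` is an `Expr` assumed to be in scope):

```rust
// Drive simplification with the context built above. ExprSimplifier
// previously accepted any SimplifyInfo; it now takes the concrete
// SimplifyContext.
let simplifier = ExprSimplifier::new(context);
let simplified = simplifier.simplify(expr)?;
```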
If you implemented `ScalarUDFImpl::simplify()`:

Before:

```rust
fn simplify(
    &self,
    args: Vec<Expr>,
    info: &dyn SimplifyInfo,
) -> Result<ExprSimplifyResult> {
    let now_ts = info.execution_props().query_execution_start_time;
    // ...
}
```
After:

```rust
fn simplify(
    &self,
    args: Vec<Expr>,
    info: &SimplifyContext,
) -> Result<ExprSimplifyResult> {
    // query_execution_start_time is now Option<DateTime<Utc>>.
    // Return Original if the time is not set (simplification skipped).
    let Some(now_ts) = info.query_execution_start_time() else {
        return Ok(ExprSimplifyResult::Original(args));
    };
    // ...
}
```
If you created `SimplifyContext` from `ExecutionProps`:

Before:

```rust
let props = ExecutionProps::new();
let context = SimplifyContext::new(&props).with_schema(schema);
```

After:

```rust
let context = SimplifyContext::default()
    .with_schema(schema)
    .with_config_options(config_options)
    .with_current_time(); // Sets query_execution_start_time to Utc::now()
```
See the `SimplifyContext` documentation for more details.
### Struct Casting Now Requires Field Name Overlap
DataFusion’s struct casting mechanism previously allowed casting between structs with differing field names if the field counts matched. This “positional fallback” behavior could silently misalign fields and cause data corruption.
Breaking Change:
Starting with DataFusion 53.0.0, struct casts now require at least one overlapping field name between the source and target structs. Casts without field name overlap are rejected at plan time with a clear error message.
Who is affected:

- Applications that cast between structs with no overlapping field names
- Queries that rely on positional struct field mapping (e.g., casting `struct(x, y)` to `struct(a, b)` based solely on position)
- Code that constructs or transforms struct columns programmatically
Migration guide:
If you encounter an error like:

```text
Cannot cast struct with 2 fields to 2 fields because there is no field name overlap
```
You must explicitly rename or map fields to ensure at least one field name matches. Here are common patterns:
Example 1: Source and target field names already match (name-based casting)

Success case (field names align):

```sql
-- source_col has schema: STRUCT<x INT, y INT>
-- Casting to the same field names succeeds (no-op or type validation only)
SELECT CAST(source_col AS STRUCT<x INT, y INT>) FROM table1;
```
Example 2: Source and target field names differ (migration scenario)

What fails now (no field name overlap):

```sql
-- source_col has schema: STRUCT<a INT, b INT>
-- This FAILS because there is no field name overlap:
-- ❌ SELECT CAST(source_col AS STRUCT<x INT, y INT>) FROM table1;
-- Error: Cannot cast struct with 2 fields to 2 fields because there is no field name overlap
```
Migration options (must align names):

Option A: Use a struct constructor for explicit field mapping

```sql
-- source_col has schema: STRUCT<a INT, b INT>
-- Use named_struct with explicit field names
SELECT named_struct(
    'x', source_col.a,
    'y', source_col.b
) AS renamed_struct FROM table1;
```
Option B: Rename in the cast target to match the source names

```sql
-- source_col has schema: STRUCT<a INT, b INT>
-- Cast to a target with matching field names
SELECT CAST(source_col AS STRUCT<a INT, b INT>) FROM table1;
```
Example 3: Using struct constructors in the Rust API

If you need to map fields programmatically, build the target struct explicitly:

```rust
use arrow::datatypes::{DataType, Field, Fields};

// Build the target struct type with explicit field names
let target_struct_type = DataType::Struct(Fields::from(vec![
    Field::new("x", DataType::Int32, true),
    Field::new("y", DataType::Utf8, true),
]));

// Prefer struct constructors over casting for field mapping: this makes
// the mapping explicit and unambiguous. Use struct builders or row
// constructors that preserve your mapping logic.
```
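The renaming can also be done in SQL from Rust (a hedged sketch, assuming a `SessionContext` named `ctx` with `table1` registered):

```rust
// Rename the struct fields with named_struct so that the target field
// names exist on the source side before any cast.
let df = ctx
    .sql("SELECT named_struct('x', source_col.a, 'y', source_col.b) AS renamed FROM table1")
    .await?;
```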
Why this change:

- Safety: field names are now the primary contract for struct compatibility
- Explicitness: prevents silent data misalignment caused by positional assumptions
- Consistency: matches DuckDB's behavior and aligns with other SQL engines that enforce name-based matching
- Debuggability: errors now appear at plan time rather than as silent data corruption
See Issue #19841 and PR #19955 for more details.
### `FilterExec` builder methods deprecated
The following methods on `FilterExec` have been deprecated in favor of using `FilterExecBuilder`:

- `with_projection()`
- `with_batch_size()`

Who is affected:

- Users who create `FilterExec` instances and use these methods to configure them
Migration guide:

Use `FilterExecBuilder` instead of chaining method calls on `FilterExec`:

Before:

```rust
let filter = FilterExec::try_new(predicate, input)?
    .with_projection(Some(vec![0, 2]))?
    .with_batch_size(8192)?;
```

After:

```rust
let filter = FilterExecBuilder::new(predicate, input)
    .with_projection(Some(vec![0, 2]))
    .with_batch_size(8192)
    .build()?;
```
The builder pattern is more efficient because it computes plan properties once during `build()` rather than recomputing them on each method call.

Note: `with_default_selectivity()` is not deprecated, as it simply updates a field value and does not require the overhead of the builder pattern.
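The builder and the remaining setter compose (a hedged sketch, assuming `with_default_selectivity` keeps its current fallible signature):

```rust
// Configure projection and batch size via the builder, then apply the
// still-supported selectivity setter on the built FilterExec.
let filter = FilterExecBuilder::new(predicate, input)
    .with_projection(Some(vec![0, 2]))
    .build()?
    .with_default_selectivity(20)?;
```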
### Protobuf conversion trait added
A new trait, `PhysicalProtoConverterExtension`, has been added to the `datafusion-proto`
crate. It controls how physical plans and expressions are converted to and from their
protobuf equivalents. The conversion methods now require an additional parameter.

The primary APIs for interacting with this crate have not been modified, so most users
should not need to make any changes. If you do require this trait, you can use the
`DefaultPhysicalProtoConverter` implementation.
For example, to convert a sort expression protobuf node you can make the following updates:

Before:

```rust
let sort_expr = parse_physical_sort_expr(
    sort_proto,
    ctx,
    input_schema,
    codec,
);
```

After:

```rust
let converter = DefaultPhysicalProtoConverter {};
let sort_expr = parse_physical_sort_expr(
    sort_proto,
    ctx,
    input_schema,
    codec,
    &converter,
);
```
Similarly, to convert from a physical sort expression into a protobuf node:

Before:

```rust
let sort_proto = serialize_physical_sort_expr(
    sort_expr,
    codec,
);
```

After:

```rust
let converter = DefaultPhysicalProtoConverter {};
let sort_proto = serialize_physical_sort_expr(
    sort_expr,
    codec,
    &converter,
);
```
### `generate_series` and `range` table functions changed
The `generate_series` and `range` table functions now return an empty result set when the interval is invalid, instead of returning an error.
This behavior is consistent with systems like PostgreSQL.
Before:

```text
> select * from generate_series(0, -1);
Error during planning: Start is bigger than end, but increment is positive: Cannot generate infinite series

> select * from range(0, -1);
Error during planning: Start is bigger than end, but increment is positive: Cannot generate infinite series
```

Now:

```text
> select * from generate_series(0, -1);
+-------+
| value |
+-------+
+-------+
0 row(s) fetched.

> select * from range(0, -1);
+-------+
| value |
+-------+
+-------+
0 row(s) fetched.
```
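Note that only a step whose sign contradicts the range yields an empty set; a step that agrees with the direction of the range still produces rows. A short illustration (a sketch; the exact rows follow from `generate_series`'s inclusive bounds):

```sql
-- Explicit negative step over a descending range: returns 2, 1, 0
-- rather than an error or an empty set
select * from generate_series(2, 0, -1);
```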