# Upgrade Guides

## DataFusion 53.0.0

**Note:** DataFusion `53.0.0` has not been released yet. The information provided
in this section pertains to features and changes that have already been merged
to the main branch and are awaiting release in this version. See [#19692] for
more details.

[#19692]: https://github.com/apache/datafusion/issues/19692

### `FileSinkConfig` adds `file_output_mode`

`FileSinkConfig` now includes a `file_output_mode: FileOutputMode` field to
control single-file vs directory output behavior. Any code constructing
`FileSinkConfig` via struct literals must initialize this field.

The `FileOutputMode` enum has three variants:

- `Automatic` (default): Infer the output mode from the URL (extension / trailing `/` heuristic)
- `SingleFile`: Write to a single file at the exact output path
- `Directory`: Write to a directory with generated filenames

**Before:**

```rust,ignore
FileSinkConfig {
    // ...
    file_extension: "parquet".into(),
}
```

**After:**

```rust,ignore
use datafusion_datasource::file_sink_config::FileOutputMode;

FileSinkConfig {
    // ...
    file_extension: "parquet".into(),
    file_output_mode: FileOutputMode::Automatic,
}
```

### `SimplifyInfo` trait removed, `SimplifyContext` now uses builder-style API

The `SimplifyInfo` trait has been removed and replaced with the concrete
`SimplifyContext` struct. This simplifies the expression simplification API and
removes the need for trait objects.

**Who is affected:**

- Users who implemented custom `SimplifyInfo` implementations
- Users who implemented `ScalarUDFImpl::simplify()` for custom scalar functions
- Users who directly use `SimplifyContext` or `ExprSimplifier`

**Breaking changes:**

1. The `SimplifyInfo` trait has been removed entirely
2. `SimplifyContext` no longer takes `&ExecutionProps` - it now uses a builder-style API with direct fields
3. `ScalarUDFImpl::simplify()` now takes `&SimplifyContext` instead of `&dyn SimplifyInfo`
4. Time-dependent function simplification (e.g., `now()`) is now optional - if `query_execution_start_time` is `None`, these functions won't be simplified

**Migration guide:**

If you implemented a custom `SimplifyInfo`:

**Before:**

```rust,ignore
impl SimplifyInfo for MySimplifyInfo {
    fn is_boolean_type(&self, expr: &Expr) -> Result<bool> { ... }
    fn nullable(&self, expr: &Expr) -> Result<bool> { ... }
    fn execution_props(&self) -> &ExecutionProps { ... }
    fn get_data_type(&self, expr: &Expr) -> Result<DataType> { ... }
}
```

**After:**

Use `SimplifyContext` directly with the builder-style API:

```rust,ignore
let context = SimplifyContext::default()
    .with_schema(schema)
    .with_config_options(config_options)
    .with_query_execution_start_time(Some(Utc::now())); // or use .with_current_time()
```

If you implemented `ScalarUDFImpl::simplify()`:

**Before:**

```rust,ignore
fn simplify(
    &self,
    args: Vec<Expr>,
    info: &dyn SimplifyInfo,
) -> Result<ExprSimplifyResult> {
    let now_ts = info.execution_props().query_execution_start_time;
    // ...
}
```

**After:**

```rust,ignore
fn simplify(
    &self,
    args: Vec<Expr>,
    info: &SimplifyContext,
) -> Result<ExprSimplifyResult> {
    // query_execution_start_time is now Option<DateTime<Utc>>.
    // Return Original if the time is not set (simplification skipped).
    let Some(now_ts) = info.query_execution_start_time() else {
        return Ok(ExprSimplifyResult::Original(args));
    };
    // ...
}
```
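If you drive `ExprSimplifier` directly, the same builder-style context applies.
The following is a minimal sketch, not an exhaustive example: it assumes you
already have a `DFSchemaRef` named `schema`, and the exact import paths may
vary depending on whether you use the `datafusion` facade crate or the
subcrates directly:

```rust,ignore
use datafusion_expr::{col, lit, simplify::SimplifyContext};
use datafusion_optimizer::simplify_expressions::ExprSimplifier;

// Build the context once, then reuse the simplifier for many expressions.
let context = SimplifyContext::default()
    .with_schema(schema)  // DFSchemaRef for the relevant plan node (assumed to exist)
    .with_current_time(); // enables folding of now()-style functions
let simplifier = ExprSimplifier::new(context);

// `a + 0` folds to `a`
let simplified = simplifier.simplify(col("a") + lit(0))?;
```

Without the `.with_current_time()` (or `.with_query_execution_start_time(..)`)
call, time-dependent functions are left unsimplified, per breaking change 4
above.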
If you created `SimplifyContext` from `ExecutionProps`:

**Before:**

```rust,ignore
let props = ExecutionProps::new();
let context = SimplifyContext::new(&props).with_schema(schema);
```

**After:**

```rust,ignore
let context = SimplifyContext::default()
    .with_schema(schema)
    .with_config_options(config_options)
    .with_current_time(); // sets query_execution_start_time to Utc::now()
```

See the [`SimplifyContext` documentation](https://docs.rs/datafusion-expr/latest/datafusion_expr/simplify/struct.SimplifyContext.html)
for more details.

### Struct Casting Now Requires Field Name Overlap

DataFusion's struct casting mechanism previously allowed casting between
structs with differing field names as long as the field counts matched. This
"positional fallback" behavior could silently misalign fields and cause data
corruption.

**Breaking change:** Starting with DataFusion 53.0.0, struct casts require
**at least one overlapping field name** between the source and target structs.
Casts without any field name overlap are rejected at plan time with a clear
error message.

**Who is affected:**

- Applications that cast between structs with no overlapping field names
- Queries that rely on positional struct field mapping (e.g., casting `struct(x, y)` to `struct(a, b)` based solely on position)
- Code that constructs or transforms struct columns programmatically

**Migration guide:**

If you encounter an error like:

```text
Cannot cast struct with 2 fields to 2 fields because there is no field name overlap
```

you must explicitly rename or map fields so that at least one field name
matches. Here are common patterns:

**Example 1: Source and target field names already match (name-based casting)**

**Success case (field names align):**

```sql
-- source_col has schema: STRUCT<a INT, b VARCHAR>
-- Casting to the same field names succeeds (no-op or type validation only)
SELECT CAST(source_col AS STRUCT<a INT, b VARCHAR>) FROM table1;
```

**Example 2: Source and target field names differ (migration scenario)**

**What fails now (no field name overlap):**

```sql
-- source_col has schema: STRUCT<a INT, b VARCHAR>
-- This FAILS because there is no field name overlap:
-- ❌ SELECT CAST(source_col AS STRUCT<x INT, y VARCHAR>) FROM table1;
-- Error: Cannot cast struct with 2 fields to 2 fields because there is no field name overlap
```

**Migration options (must align names):**

**Option A: Use a struct constructor for explicit field mapping**

```sql
-- source_col has schema: STRUCT<a INT, b VARCHAR>
-- Use named_struct with explicit field names
SELECT named_struct(
    'x', source_col.a,
    'y', source_col.b
) AS renamed_struct
FROM table1;
```

**Option B: Rename in the cast target to match source names**

```sql
-- source_col has schema: STRUCT<a INT, b VARCHAR>
-- Cast to a target with matching field names; only the value types change
SELECT CAST(source_col AS STRUCT<a BIGINT, b VARCHAR>) FROM table1;
```

**Example 3: Using struct constructors in the Rust API**

If you need to map fields programmatically, build the target struct explicitly:

```rust,ignore
use arrow::datatypes::{DataType, Field, Fields};

// Build the target struct type with explicit field names
let target_struct_type = DataType::Struct(Fields::from(vec![
    Field::new("x", DataType::Int32, true),
    Field::new("y", DataType::Utf8, true),
]));

// Use struct constructors rather than casting for field mapping.
// This makes the field mapping explicit and unambiguous.
```
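For a fully programmatic rename, one option is to rebuild the struct array
under the new field names with arrow-rs directly. A minimal sketch, following
the `a`/`b` to `x`/`y` mapping from the examples above (the column values are
illustrative only):

```rust,ignore
use std::sync::Arc;
use arrow::array::{ArrayRef, Int32Array, StringArray, StructArray};
use arrow::datatypes::{DataType, Field};

// Source columns that previously lived in a STRUCT<a INT, b VARCHAR>
let a: ArrayRef = Arc::new(Int32Array::from(vec![1, 2]));
let b: ArrayRef = Arc::new(StringArray::from(vec!["one", "two"]));

// Reassemble the children under the target names `x` and `y`.
// No cast is involved, so the field mapping is explicit rather
// than positional.
let renamed = StructArray::from(vec![
    (Arc::new(Field::new("x", DataType::Int32, true)), a),
    (Arc::new(Field::new("y", DataType::Utf8, true)), b),
]);
```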
**Why this change:**

1. **Safety:** Field names are now the primary contract for struct compatibility
2. **Explicitness:** Prevents silent data misalignment caused by positional assumptions
3. **Consistency:** Matches DuckDB's behavior and aligns with other SQL engines that enforce name-based matching
4. **Debuggability:** Errors now surface at plan time rather than as silent data corruption

See [Issue #19841](https://github.com/apache/datafusion/issues/19841) and
[PR #19955](https://github.com/apache/datafusion/pull/19955) for more details.

### `FilterExec` builder methods deprecated

The following methods on `FilterExec` have been deprecated in favor of
`FilterExecBuilder`:

- `with_projection()`
- `with_batch_size()`

**Who is affected:**

- Users who create `FilterExec` instances and use these methods to configure them

**Migration guide:**

Use `FilterExecBuilder` instead of chaining method calls on `FilterExec`:

**Before:**

```rust,ignore
let filter = FilterExec::try_new(predicate, input)?
    .with_projection(Some(vec![0, 2]))?
    .with_batch_size(8192)?;
```

**After:**

```rust,ignore
let filter = FilterExecBuilder::new(predicate, input)
    .with_projection(Some(vec![0, 2]))
    .with_batch_size(8192)
    .build()?;
```

The builder pattern is more efficient because it computes plan properties once
during `build()` rather than recomputing them after each method call.

Note: `with_default_selectivity()` is not deprecated, as it simply updates a
field value and does not require the overhead of the builder pattern.

### Protobuf conversion trait added

A new trait, `PhysicalProtoConverterExtension`, has been added to the
`datafusion-proto` crate. It controls how physical plans and expressions are
converted to and from their protobuf equivalents, and the low-level conversion
functions now require an additional converter parameter. The primary APIs for
interacting with this crate have not been modified, so most users should not
need to make any changes. If you do require this trait, you can use the
`DefaultPhysicalProtoConverter` implementation.

For example, to convert a sort expression protobuf node, make the following
updates:

**Before:**

```rust,ignore
let sort_expr = parse_physical_sort_expr(
    sort_proto,
    ctx,
    input_schema,
    codec,
);
```

**After:**

```rust,ignore
let converter = DefaultPhysicalProtoConverter {};
let sort_expr = parse_physical_sort_expr(
    sort_proto,
    ctx,
    input_schema,
    codec,
    &converter,
);
```

Similarly, to convert a physical sort expression into a protobuf node:

**Before:**

```rust,ignore
let sort_proto = serialize_physical_sort_expr(
    sort_expr,
    codec,
);
```

**After:**

```rust,ignore
let converter = DefaultPhysicalProtoConverter {};
let sort_proto = serialize_physical_sort_expr(
    sort_expr,
    codec,
    &converter,
);
```

### `generate_series` and `range` table functions changed

The `generate_series` and `range` table functions now return an empty set when
the interval is invalid, instead of returning an error. This behavior is
consistent with systems such as PostgreSQL.

Before:

```sql
> select * from generate_series(0, -1);
Error during planning: Start is bigger than end, but increment is positive: Cannot generate infinite series

> select * from range(0, -1);
Error during planning: Start is bigger than end, but increment is positive: Cannot generate infinite series
```

Now:

```sql
> select * from generate_series(0, -1);
+-------+
| value |
+-------+
+-------+
0 row(s) fetched.

> select * from range(0, -1);
+-------+
| value |
+-------+
+-------+
0 row(s) fetched.
```
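If you previously matched on this planning error from Rust, the same query now
succeeds with zero rows. A minimal sketch, assuming a default `SessionContext`
(which registers the `generate_series` table function) and a `tokio` runtime:

```rust,ignore
use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();

    // Previously this returned a planning error; it now yields zero rows.
    let batches = ctx
        .sql("SELECT * FROM generate_series(0, -1)")
        .await?
        .collect()
        .await?;

    let rows: usize = batches.iter().map(|b| b.num_rows()).sum();
    assert_eq!(rows, 0);
    Ok(())
}
```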