Upgrade Guides#
DataFusion 55.0.0#
Note: DataFusion 55.0.0 has not been released yet. The information provided
in this section pertains to features and changes that have already been merged
to the main branch and are awaiting release in this version.
Dialect::AVAILABLE replaced by Dialect::available()#
datafusion_common::config::Dialect::AVAILABLE has been removed. Use
Dialect::available() instead.
Decimal scalar formatting uses human-readable values#
Decimal scalar literals in EXPLAIN output, expression display strings, and
auto-generated column names now format the decimal value using its scale while
still showing the precision and scale. For example, a Decimal128 literal with
stored value 1, precision 1, and scale 1 is now rendered as
Decimal128(0.1,1,1) instead of Decimal128(Some(1),1,1). When formatting a
ScalarValue directly, it now appears as 0.1 instead of Some(1),1,1.
NULL decimal literals were previously shown as Decimal128(None,10,2); they
will now appear as Decimal128(NULL,10,2).
Query result values already used human-readable decimal formatting and are unchanged.
is_dynamic_physical_expr is deprecated#
datafusion_physical_expr_common::physical_expr::is_dynamic_physical_expr is
deprecated. It was a thin wrapper over snapshot_generation(expr) != 0 used to
ask “does this predicate contain a dynamic filter?”.
Prefer asking the question directly against the concrete type. For a one-off
check, downcast to DynamicFilterPhysicalExpr:
use datafusion_physical_expr::expressions::DynamicFilterPhysicalExpr;
use datafusion_common::tree_node::{TreeNode, TreeNodeRecursion};
let mut is_dynamic = false;
predicate.apply(|e| {
if e.downcast_ref::<DynamicFilterPhysicalExpr>().is_some() {
is_dynamic = true;
Ok(TreeNodeRecursion::Stop)
} else {
Ok(TreeNodeRecursion::Continue)
}
})?;
If you also need to know whether the dynamic filters can still change (and to be
notified when they do), use the new DynamicFilterTracking /
DynamicFilterTracker API in datafusion_physical_expr:
use datafusion_physical_expr::DynamicFilterTracking;
let tracking = DynamicFilterTracking::classify(&predicate);
if tracking.contains_dynamic_filter() {
// worth re-evaluating the predicate at runtime
}
FilePruner::try_new no longer builds a pruner for static predicates without statistics#
datafusion_pruning::FilePruner::try_new now returns None when the predicate
is purely static and the file carries no usable column statistics, because
such a pruner can never prune anything beyond what planning already did.
Previously it returned Some whenever a statistics struct was present (the
“is this worth pruning?” decision lived in the Parquet opener). Files with column
statistics, and predicates that carry a dynamic filter, are unaffected.
QueryPlanner adds Any as a supertrait#
To enable downcasting of dyn QueryPlanner to concrete query planner types (via
is::<T>() / downcast_ref::<T>()), the QueryPlanner trait now has Any
as a supertrait:
- pub trait QueryPlanner: Debug
+ pub trait QueryPlanner: Any + Debug
DdlStatement::CreateExternalTable and CreateFunction are now boxed#
The two largest variants of datafusion_expr::DdlStatement are now
Boxed:
// Before
pub enum DdlStatement {
CreateExternalTable(CreateExternalTable),
// ...
CreateFunction(CreateFunction),
// ...
}
// After
pub enum DdlStatement {
CreateExternalTable(Box<CreateExternalTable>),
// ...
CreateFunction(Box<CreateFunction>),
// ...
}
CreateExternalTable is 312 bytes and CreateFunction is 288 bytes, so
without boxing they forced the entire LogicalPlan enum to 320 bytes
even on SELECT-only query paths that never instantiate them. Boxing
shrinks LogicalPlan from 320 → 176 bytes (−45%), making every
mem::take / mem::swap / Arc<LogicalPlan> store on the planning
hot path move a smaller payload.
Who is affected:
Users who construct
DdlStatement::CreateExternalTable(...)orDdlStatement::CreateFunction(...)from an owned struct.Users who pattern-match these variants and destructure the inner struct in the same pattern (e.g.
DdlStatement::CreateExternalTable(CreateExternalTable { name, .. })).Code that consumes the inner struct out of these variants (e.g. to pass
CreateExternalTableby value to another function).
Migration guide:
When constructing the variants, wrap the inner struct in Box::new:
// Before
let stmt = DdlStatement::CreateFunction(CreateFunction { name, args, .. });
// After
let stmt = DdlStatement::CreateFunction(Box::new(CreateFunction {
name,
args,
..
}));
When pattern-matching, bind the boxed value and either access fields
through it (Rust auto-derefs the Box) or destructure via .as_ref():
// Before
match ddl {
DdlStatement::CreateExternalTable(CreateExternalTable {
name, location, ..
}) => { /* use name, location */ }
}
// After — access fields through the box
match ddl {
DdlStatement::CreateExternalTable(ce) => {
let name = &ce.name;
let location = &ce.location;
/* ... */
}
}
// After — destructure the dereferenced struct
match ddl {
DdlStatement::CreateExternalTable(ce) => {
let CreateExternalTable { name, location, .. } = ce.as_ref();
/* ... */
}
}
When you need an owned CreateExternalTable / CreateFunction out of
the variant, dereference the box with *:
// Before
match plan {
LogicalPlan::Ddl(DdlStatement::CreateExternalTable(cmd)) => Ok(cmd),
_ => { /* ... */ }
}
// After
match plan {
LogicalPlan::Ddl(DdlStatement::CreateExternalTable(cmd)) => Ok(*cmd),
_ => { /* ... */ }
}
See PR #22733 for details, including the per-variant size breakdown and benchmark results.
Spark map functions now reject duplicate keys by default#
The Spark-compatibility map-construction functions (map_from_arrays,
map_from_entries, str_to_map) now raise [DUPLICATED_MAP_KEY] at runtime
when constructing a map that contains duplicate keys. This matches the default
of Spark’s spark.sql.mapKeyDedupPolicy.
A new config option, datafusion.spark.map_key_dedup_policy, controls the
behavior:
EXCEPTION(default): raise on any duplicate key.LAST_WIN: keep the last occurrence of each duplicate key. The key stays at its first-seen position with the value from its last occurrence (matching Spark’sArrayBasedMapBuilder).
Who is affected:
Queries calling
map_from_arraysorstr_to_mapon data that contains duplicate keys. Previously these functions either tolerated duplicates silently or raised a non-configurable error.
Migration guide:
To restore lenient duplicate-key handling, set the policy to LAST_WIN:
SET datafusion.spark.map_key_dedup_policy = 'LAST_WIN';
See PR #21720 for details.