# Upgrade Guides

## DataFusion 46.0.0

### Use `invoke_with_args` instead of `invoke()` and `invoke_batch()`

DataFusion is moving to a consistent API for invoking ScalarUDFs,
[`ScalarUDFImpl::invoke_with_args()`], and deprecating
[`ScalarUDFImpl::invoke()`], [`ScalarUDFImpl::invoke_batch()`], and [`ScalarUDFImpl::invoke_no_args()`].

If you see errors such as the following, the older APIs are still being used:

```text
This feature is not implemented: Function concat does not implement invoke but called
```

To fix this error, use [`ScalarUDFImpl::invoke_with_args()`] instead, as shown
below. See [PR 14876] for an example.

Given existing code like this:

```rust
# /* comment to avoid running
impl ScalarUDFImpl for SparkConcat {
    ...
    fn invoke_batch(&self, args: &[ColumnarValue], number_rows: usize) -> Result<ColumnarValue> {
        if args
            .iter()
            .any(|arg| matches!(arg.data_type(), DataType::List(_)))
        {
            ArrayConcat::new().invoke_batch(args, number_rows)
        } else {
            ConcatFunc::new().invoke_batch(args, number_rows)
        }
    }
}
# */
```

Change it to:

```rust
# /* comment to avoid running
impl ScalarUDFImpl for SparkConcat {
    ...
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
        if args
            .args
            .iter()
            .any(|arg| matches!(arg.data_type(), DataType::List(_)))
        {
            ArrayConcat::new().invoke_with_args(args)
        } else {
            ConcatFunc::new().invoke_with_args(args)
        }
    }
}
# */
```

[`scalarudfimpl::invoke()`]: https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#method.invoke
[`scalarudfimpl::invoke_batch()`]: https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#method.invoke_batch
[`scalarudfimpl::invoke_no_args()`]: https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#method.invoke_no_args
[`scalarudfimpl::invoke_with_args()`]: https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.ScalarUDFImpl.html#method.invoke_with_args
[pr 14876]: https://github.com/apache/datafusion/pull/14876

### `ParquetExec`, `AvroExec`, `CsvExec`, `JsonExec` deprecated

DataFusion 46 has a major change to how the built-in DataSources are organized.
Instead of individual `ExecutionPlan`s for the different file formats, they now
all use `DataSourceExec`, and the format-specific information is embodied in the new
traits `DataSource` and `FileSource`.

More information is available here:

- [Design Ticket]
- Change PR [PR #14224]
- Example of an Upgrade [PR in delta-rs]

[design ticket]: https://github.com/apache/datafusion/issues/13838
[pr #14224]: https://github.com/apache/datafusion/pull/14224
[pr in delta-rs]: https://github.com/delta-io/delta-rs/pull/3261

### Cookbook: Changes to `ParquetExec`

Code that looks for `ParquetExec` like this will no longer work:

```rust
# /* comment to avoid running
    if let Some(parquet_exec) = plan.as_any().downcast_ref::<ParquetExec>() {
        // Do something with ParquetExec here
    }
# */
```

Instead, with `DataSourceExec`, the same information is now on `FileScanConfig` and
`ParquetSource`.
The equivalent code is:

```rust
# /* comment to avoid running
if let Some(datasource_exec) = plan.as_any().downcast_ref::<DataSourceExec>() {
    if let Some(scan_config) = datasource_exec.data_source().as_any().downcast_ref::<FileScanConfig>() {
        // FileGroups and other information are on the FileScanConfig
        // Parquet-specific information is on the ParquetSource
        if let Some(parquet_source) = scan_config.file_source.as_any().downcast_ref::<ParquetSource>() {
            // Information on PruningPredicates and parquet options is here
        }
    }
}
# */
```

### Cookbook: Changes to `ParquetExecBuilder`

Likewise, code that builds `ParquetExec` using the `ParquetExecBuilder`, such as
the following, must be changed:

```rust
# /* comment to avoid running
let mut exec_plan_builder = ParquetExecBuilder::new(
    FileScanConfig::new(self.log_store.object_store_url(), file_schema)
        .with_projection(self.projection.cloned())
        .with_limit(self.limit)
        .with_table_partition_cols(table_partition_cols),
)
.with_schema_adapter_factory(Arc::new(DeltaSchemaAdapterFactory {}))
.with_table_parquet_options(parquet_options);

// Add filter
if let Some(predicate) = logical_filter {
    if config.enable_parquet_pushdown {
        exec_plan_builder = exec_plan_builder.with_predicate(predicate);
    }
};
# */
```

New code should use `FileScanConfig` to build the appropriate `DataSourceExec`:

```rust
# /* comment to avoid running
let mut file_source = ParquetSource::new(parquet_options)
    .with_schema_adapter_factory(Arc::new(DeltaSchemaAdapterFactory {}));

// Add filter
if let Some(predicate) = logical_filter {
    if config.enable_parquet_pushdown {
        file_source = file_source.with_predicate(predicate);
    }
};

let file_scan_config = FileScanConfig::new(
    self.log_store.object_store_url(),
    file_schema,
    Arc::new(file_source),
)
.with_statistics(stats)
.with_projection(self.projection.cloned())
.with_limit(self.limit)
.with_table_partition_cols(table_partition_cols);

// Build the actual scan like this
parquet_scan: file_scan_config.build(),
# */
```

### `datafusion-cli` no longer automatically unescapes strings

`datafusion-cli` previously would
incorrectly unescape string literals (see [ticket] for more details).

To escape `'` in SQL literals, use `''`:

```sql
> select 'it''s escaped';
+----------------------+
| Utf8("it's escaped") |
+----------------------+
| it's escaped         |
+----------------------+
1 row(s) fetched.
```

To include special characters (such as newlines via `\n`), use an `E`
literal string such as `E'foo\nbar'`. Without the `E` prefix, the backslash sequence is left as-is:

```sql
> select 'foo\nbar';
+------------------+
| Utf8("foo\nbar") |
+------------------+
| foo\nbar         |
+------------------+
1 row(s) fetched.
Elapsed 0.005 seconds.
```

### Changes to array scalar function signatures

DataFusion 46 has changed the way scalar array function signatures are
declared. Previously, functions needed to select from a list of predefined
signatures within the `ArrayFunctionSignature` enum. Now the signatures
can be defined via a `Vec` of pseudo-types, each of which corresponds to a
single argument. Those pseudo-types are the variants of the
`ArrayFunctionArgument` enum and are as follows:

- `Array`: An argument of type List/LargeList/FixedSizeList. All Array
  arguments must be coercible to the same type.
- `Element`: An argument that is coercible to the inner type of the `Array`
  arguments.
- `Index`: An `Int64` argument.
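
To make the positional meaning of the three pseudo-types concrete, here is a small self-contained sketch. It is a toy model, not DataFusion code: `ToyType`, `ArgKind`, and `matches_signature` are hypothetical names invented for illustration, and exact type equality stands in for the real coercion rules.

```rust
// Toy model only: these enums are local stand-ins, not DataFusion's types.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int64,
    Utf8,
    List(Box<ToyType>),
}

// Mirrors the three pseudo-types described above (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArgKind {
    Array,
    Element,
    Index,
}

/// Returns true when `args` fits `sig` position by position.
/// Assumption: exact type equality stands in for real coercion.
fn matches_signature(sig: &[ArgKind], args: &[ToyType]) -> bool {
    if sig.len() != args.len() {
        return false;
    }
    // First pass: every Array argument must be a list, and all lists
    // must share one element type.
    let mut inner: Option<&ToyType> = None;
    for (kind, ty) in sig.iter().zip(args) {
        if *kind != ArgKind::Array {
            continue;
        }
        let ToyType::List(boxed) = ty else {
            return false;
        };
        let elem: &ToyType = boxed;
        match inner {
            None => inner = Some(elem),
            Some(prev) if prev == elem => {}
            Some(_) => return false,
        }
    }
    // Second pass: check the Element and Index positions.
    sig.iter().zip(args).all(|(kind, ty)| match kind {
        ArgKind::Array => true, // validated in the first pass
        ArgKind::Element => inner == Some(ty),
        ArgKind::Index => *ty == ToyType::Int64,
    })
}

fn main() {
    use ArgKind::*;
    let list_of_int = ToyType::List(Box::new(ToyType::Int64));
    // Array + Element: the element must match the list's inner type.
    println!("{}", matches_signature(&[Array, Element], &[list_of_int.clone(), ToyType::Int64])); // true
    println!("{}", matches_signature(&[Array, Element], &[list_of_int.clone(), ToyType::Utf8])); // false
    // Array + Index: the index must be Int64.
    println!("{}", matches_signature(&[Array, Index], &[list_of_int, ToyType::Int64])); // true
}
```

The real implementation coerces argument types rather than requiring equality, so treat this only as a picture of how each position in the `Vec` constrains one argument.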

Each of the old variants can be converted to the new format as follows:

`TypeSignature::ArraySignature(ArrayFunctionSignature::ArrayAndElement)`:

```rust
# use datafusion::common::utils::ListCoercion;
# use datafusion_expr_common::signature::{ArrayFunctionArgument, ArrayFunctionSignature, TypeSignature};

TypeSignature::ArraySignature(ArrayFunctionSignature::Array {
    arguments: vec![ArrayFunctionArgument::Array, ArrayFunctionArgument::Element],
    array_coercion: Some(ListCoercion::FixedSizedListToList),
});
```

`TypeSignature::ArraySignature(ArrayFunctionSignature::ElementAndArray)`:

```rust
# use datafusion::common::utils::ListCoercion;
# use datafusion_expr_common::signature::{ArrayFunctionArgument, ArrayFunctionSignature, TypeSignature};

TypeSignature::ArraySignature(ArrayFunctionSignature::Array {
    arguments: vec![ArrayFunctionArgument::Element, ArrayFunctionArgument::Array],
    array_coercion: Some(ListCoercion::FixedSizedListToList),
});
```

`TypeSignature::ArraySignature(ArrayFunctionSignature::ArrayAndIndex)`:

```rust
# use datafusion::common::utils::ListCoercion;
# use datafusion_expr_common::signature::{ArrayFunctionArgument, ArrayFunctionSignature, TypeSignature};

TypeSignature::ArraySignature(ArrayFunctionSignature::Array {
    arguments: vec![ArrayFunctionArgument::Array, ArrayFunctionArgument::Index],
    array_coercion: None,
});
```

`TypeSignature::ArraySignature(ArrayFunctionSignature::ArrayAndElementAndOptionalIndex)`:

```rust
# use datafusion::common::utils::ListCoercion;
# use datafusion_expr_common::signature::{ArrayFunctionArgument, ArrayFunctionSignature, TypeSignature};

TypeSignature::OneOf(vec![
    TypeSignature::ArraySignature(ArrayFunctionSignature::Array {
        arguments: vec![ArrayFunctionArgument::Array, ArrayFunctionArgument::Element],
        array_coercion: None,
    }),
    TypeSignature::ArraySignature(ArrayFunctionSignature::Array {
        arguments: vec![
            ArrayFunctionArgument::Array,
            ArrayFunctionArgument::Element,
            ArrayFunctionArgument::Index,
        ],
        array_coercion: None,
    }),
]);
```

`TypeSignature::ArraySignature(ArrayFunctionSignature::Array)`:

```rust
# use datafusion::common::utils::ListCoercion;
# use datafusion_expr_common::signature::{ArrayFunctionArgument, ArrayFunctionSignature, TypeSignature};

TypeSignature::ArraySignature(ArrayFunctionSignature::Array {
    arguments: vec![ArrayFunctionArgument::Array],
    array_coercion: None,
});
```

Alternatively, you can switch to using one of the following functions, which
take care of constructing the `TypeSignature` for you:

- `Signature::array_and_element`
- `Signature::array_and_element_and_optional_index`
- `Signature::array_and_index`
- `Signature::array`

[ticket]: https://github.com/apache/datafusion/issues/13286