datafusion.udf¶
Provides the user defined functions for evaluation of dataframes.
Attributes¶
Classes¶
Defines how an |
|
Class for performing scalar user defined functions (UDF). |
|
Class for performing scalar user defined functions (UDF). |
|
Defines how stable or volatile a function is. |
Module Contents¶
- class datafusion.udf.Accumulator¶
Defines how an
AggregateUDF
accumulates values.- abstract evaluate() pyarrow.Scalar ¶
Return the resultant value.
- abstract merge(states: List[pyarrow.Array]) None ¶
Merge a set of states.
- abstract state() List[pyarrow.Scalar] ¶
Return the current state.
- abstract update(values: pyarrow.Array) None ¶
Evaluate an array of values and update state.
- class datafusion.udf.AggregateUDF(name: str | None, accumulator: _A, input_types: list[pyarrow.DataType], return_type: _R, state_type: list[pyarrow.DataType], volatility: Volatility | str)¶
Class for performing scalar user defined functions (UDF).
Aggregate UDFs operate on a group of rows and return a single value. See also
ScalarUDF
for operating on a row by row basis.Instantiate a user defined aggregate function (UDAF).
See
udaf()
for a convenience function and argument descriptions.- __call__(*args: datafusion.expr.Expr) datafusion.expr.Expr ¶
Execute the UDAF.
This function is not typically called by an end user. These calls will occur during the evaluation of the dataframe.
- static udaf(accum: _A, input_types: list[pyarrow.DataType], return_type: _R, state_type: list[pyarrow.DataType], volatility: Volatility | str, name: str | None = None) AggregateUDF ¶
Create a new User Defined Aggregate Function.
The accumulator function must be callable and implement
Accumulator
.- Parameters:
accum – The accumulator python function.
input_types – The data types of the arguments to
accum
.return_type – The data type of the return value.
state_type – The data types of the intermediate accumulation.
volatility – See
Volatility
for allowed values.name – A descriptive name for the function.
- Returns:
A user defined aggregate function, which can be used in either data aggregation or window function calls.
- _udf¶
- class datafusion.udf.ScalarUDF(name: str | None, func: Callable[Ellipsis, _R], input_types: list[pyarrow.DataType], return_type: _R, volatility: Volatility | str)¶
Class for performing scalar user defined functions (UDF).
Scalar UDFs operate on a row by row basis. See also
AggregateUDF
for operating on a group of rows.Instantiate a scalar user defined function (UDF).
See helper method
udf()
for argument details.- __call__(*args: datafusion.expr.Expr) datafusion.expr.Expr ¶
Execute the UDF.
This function is not typically called by an end user. These calls will occur during the evaluation of the dataframe.
- static udf(func: Callable[Ellipsis, _R], input_types: list[pyarrow.DataType], return_type: _R, volatility: Volatility | str, name: str | None = None) ScalarUDF ¶
Create a new User Defined Function.
- Parameters:
func – A callable python function.
input_types – The data types of the arguments to
func
. This list must be of the same length as the number of arguments.return_type – The data type of the return value from the python function.
volatility – See
Volatility
for allowed values.name – A descriptive name for the function.
- Returns:
- A user defined aggregate function, which can be used in either data
aggregation or window function calls.
- _udf¶
- class datafusion.udf.Volatility(*args, **kwds)¶
Bases:
enum.Enum
Defines how stable or volatile a function is.
When setting the volatility of a function, you can either pass this enumeration or a
str
. Thestr
equivalent is the lower case value of the name (“immutable”, “stable”, or “volatile”).- __str__()¶
Returns the string equivalent.
- Immutable = 1¶
An immutable function will always return the same output when given the same input.
DataFusion will attempt to inline immutable functions during planning.
- Stable = 2¶
Returns the same value for a given input within a single queries.
A stable function may return different values given the same input across different queries but must return the same value for a given input within a query. An example of this is the
Now
function. DataFusion will attempt to inlineStable
functions during planning, when possible. For queryselect col1, now() from t1
, it might take a while to execute butnow()
column will be the same for each output row, which is evaluated during planning.
- Volatile = 3¶
A volatile function may change the return value from evaluation to evaluation.
Multiple invocations of a volatile function may return different results when used in the same query. An example of this is the random() function. DataFusion can not evaluate such functions during planning. In the query
select col1, random() from t1
,random()
function will be evaluated for each output row, resulting in a unique random value for each row.
- datafusion.udf._A¶
- datafusion.udf._R¶