datafusion.expr¶
This module supports expressions, one of the core concepts in DataFusion.
See Expressions in the online documentation for more details.
Attributes¶
Classes¶
- CaseBuilder: Builder class for constructing case statements.
- Expr: Expression object.
- SortExpr: Used to specify sorting on either a DataFrame or function.
- Window: Define reusable window parameters.
- WindowFrame: Defines a window frame for performing window operations.
- WindowFrameBound: Defines a single window frame bound.
Module Contents¶
- class datafusion.expr.CaseBuilder(case_builder: datafusion._internal.expr.CaseBuilder)¶
Builder class for constructing case statements.
An example usage would be as follows:
    import datafusion.functions as f
    from datafusion import lit, col

    df.select(
        f.case(col("column_a"))
        .when(lit(1), lit("One"))
        .when(lit(2), lit("Two"))
        .otherwise(lit("Unknown"))
    )
Constructs a case builder.
This is not typically called by the end user directly. See datafusion.functions.case() instead.
- when(when_expr: Expr, then_expr: Expr) CaseBuilder ¶
Add a case to match against.
- case_builder¶
- class datafusion.expr.Expr(expr: datafusion._internal.expr.Expr)¶
Expression object.
Expressions are one of the core concepts in DataFusion. See Expressions in the online documentation for more information.
This constructor should not be called by the end user.
- __add__(rhs: Any) Expr ¶
Addition operator.
Accepts either an expression or any valid PyArrow scalar literal value.
- __eq__(rhs: Any) Expr ¶
Equal to.
Accepts either an expression or any valid PyArrow scalar literal value.
- __ge__(rhs: Any) Expr ¶
Greater than or equal to.
Accepts either an expression or any valid PyArrow scalar literal value.
- __getitem__(key: str | int) Expr ¶
Retrieve sub-object.
If key is a string, returns the subfield of the struct. If key is an integer, retrieves the element in the array. Note that the element index begins at 0, unlike array_element which begins at 1.
- __gt__(rhs: Any) Expr ¶
Greater than.
Accepts either an expression or any valid PyArrow scalar literal value.
- __le__(rhs: Any) Expr ¶
Less than or equal to.
Accepts either an expression or any valid PyArrow scalar literal value.
- __lt__(rhs: Any) Expr ¶
Less than.
Accepts either an expression or any valid PyArrow scalar literal value.
- __mod__(rhs: Any) Expr ¶
Modulo operator (%).
Accepts either an expression or any valid PyArrow scalar literal value.
- __mul__(rhs: Any) Expr ¶
Multiplication operator.
Accepts either an expression or any valid PyArrow scalar literal value.
- __ne__(rhs: Any) Expr ¶
Not equal to.
Accepts either an expression or any valid PyArrow scalar literal value.
- __repr__() str ¶
Generate a string representation of this expression.
- __sub__(rhs: Any) Expr ¶
Subtraction operator.
Accepts either an expression or any valid PyArrow scalar literal value.
- __truediv__(rhs: Any) Expr ¶
Division operator.
Accepts either an expression or any valid PyArrow scalar literal value.
- between(low: Any, high: Any, negated: bool = False) Expr ¶
Returns True if this expression is between a given range.
- Parameters:
low – lower bound of the range (inclusive).
high – upper bound of the range (inclusive).
negated – if True, negates the check so that it returns True when the expression is outside the range.
- canonical_name() str ¶
Returns a complete string representation of this expression.
- cast(to: pyarrow.DataType[Any] | Type[float] | Type[int] | Type[str] | Type[bool]) Expr ¶
Cast to a new data type.
- column_name(plan: datafusion.plan.LogicalPlan) str ¶
Compute the output column name based on the provided logical plan.
- display_name() str ¶
Returns the name of this expression as it should appear in a schema.
This name will not include any CAST expressions.
- distinct() ExprFuncBuilder ¶
Only evaluate distinct values for an aggregate function.
This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.
- filter(filter: Expr) ExprFuncBuilder ¶
Filter an aggregate function.
This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.
- static literal(value: Any) Expr ¶
Creates a new expression representing a scalar value.
value must be a valid PyArrow scalar value or easily castable to one.
- null_treatment(null_treatment: datafusion.common.NullTreatment) ExprFuncBuilder ¶
Set the treatment for null values for a window or aggregate function.
This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.
- order_by(*exprs: Expr | SortExpr) ExprFuncBuilder ¶
Set the ordering for a window or aggregate function.
This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.
- over(window: Window) Expr ¶
Turn an aggregate function into a window function.
This function turns any aggregate function into a window function. With the exception of partition_by, how each of the parameters is used is determined by the underlying aggregate function.
- Parameters:
window – Window definition
- partition_by(*partition_by: Expr) ExprFuncBuilder ¶
Set the partitioning for a window function.
This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.
- python_value() Any ¶
Extracts the Expr value into a PyObject.
This is only valid for literal expressions.
- Returns:
Python object representing literal value of the expression.
- rex_call_operands() list[Expr] ¶
Return the operands of the expression based on its variant type.
Row expressions, Rex(s), operate on the concept of operands. Different variants of Expressions, Expr(s), store those operands in different data structures. This function examines the Expr variant and returns the operands to the calling logic.
- rex_call_operator() str ¶
Extracts the operator associated with a row expression type call.
- rex_type() datafusion.common.RexType ¶
Return the Rex Type of this expression.
A Rex (Row Expression) specifies a single row of data. That specification could include user defined functions or types. RexType identifies the row as one of the possible valid RexType values.
- schema_name() str ¶
Returns the name of this expression as it should appear in a schema.
This name will not include any CAST expressions.
- sort(ascending: bool = True, nulls_first: bool = True) SortExpr ¶
Creates a sort Expr from an existing Expr.
- Parameters:
ascending – If true, sort in ascending order.
nulls_first – Return null values first.
- to_variant() Any ¶
Convert this expression into a python object if possible.
- types() datafusion.common.DataTypeMap ¶
Return the DataTypeMap.
- Returns:
DataTypeMap which represents the PythonType, Arrow DataType, and SqlType Enum which this expression represents.
- variant_name() str ¶
Returns the name of the Expr variant.
Ex: IsNotNull, Literal, BinaryExpr, etc.
- window_frame(window_frame: WindowFrame) ExprFuncBuilder ¶
Set the frame for a window function.
This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.
- __radd__¶
- __rand__¶
- __rmod__¶
- __rmul__¶
- __ror__¶
- __rsub__¶
- __rtruediv__¶
- _to_pyarrow_types¶
- expr¶
- class datafusion.expr.SortExpr(expr: Expr, ascending: bool, nulls_first: bool)¶
Used to specify sorting on either a DataFrame or function.
This constructor should not be called by the end user.
- __repr__() str ¶
Generate a string representation of this expression.
- ascending() bool ¶
Return ascending property.
- nulls_first() bool ¶
Return nulls_first property.
- raw_sort¶
- class datafusion.expr.Window(partition_by: list[Expr] | None = None, window_frame: WindowFrame | None = None, order_by: list[SortExpr | Expr] | None = None, null_treatment: datafusion.common.NullTreatment | None = None)¶
Define reusable window parameters.
Construct a window definition.
- Parameters:
partition_by – Partitions for window operation
window_frame – Define the start and end bounds of the window frame
order_by – Set ordering
null_treatment – Indicate how nulls are to be treated
- _null_treatment = None¶
- _order_by = None¶
- _partition_by = None¶
- _window_frame = None¶
- class datafusion.expr.WindowFrame(units: str, start_bound: Any | None, end_bound: Any | None)¶
Defines a window frame for performing window operations.
Construct a window frame using the given parameters.
- Parameters:
units – Should be one of rows, range, or groups.
start_bound – Sets the preceding bound. Must be >= 0. If None, this will be set to unbounded. If unit type is groups, this parameter must be set.
end_bound – Sets the following bound. Must be >= 0. If None, this will be set to unbounded. If unit type is groups, this parameter must be set.
- get_frame_units() str ¶
Returns the window frame units for the bounds.
- get_lower_bound() WindowFrameBound ¶
Returns starting bound.
- get_upper_bound()¶
Returns end bound.
- window_frame¶
- class datafusion.expr.WindowFrameBound(frame_bound: datafusion._internal.expr.WindowFrameBound)¶
Defines a single window frame bound.
WindowFrame typically requires a start and end bound.
Constructs a window frame bound.
- get_offset() int | None ¶
Returns the offset of the window frame.
- is_current_row() bool ¶
Returns whether the frame bound is the current row.
- is_following() bool ¶
Returns whether the frame bound is following.
- is_preceding() bool ¶
Returns whether the frame bound is preceding.
- is_unbounded() bool ¶
Returns whether the frame bound is unbounded.
- frame_bound¶
- datafusion.expr.Aggregate¶
- datafusion.expr.AggregateFunction¶
- datafusion.expr.Alias¶
- datafusion.expr.Analyze¶
- datafusion.expr.Between¶
- datafusion.expr.BinaryExpr¶
- datafusion.expr.Case¶
- datafusion.expr.Cast¶
- datafusion.expr.Column¶
- datafusion.expr.CreateMemoryTable¶
- datafusion.expr.CreateView¶
- datafusion.expr.Distinct¶
- datafusion.expr.DropTable¶
- datafusion.expr.EmptyRelation¶
- datafusion.expr.Exists¶
- datafusion.expr.Explain¶
- datafusion.expr.Extension¶
- datafusion.expr.Filter¶
- datafusion.expr.GroupingSet¶
- datafusion.expr.ILike¶
- datafusion.expr.InList¶
- datafusion.expr.InSubquery¶
- datafusion.expr.IsFalse¶
- datafusion.expr.IsNotFalse¶
- datafusion.expr.IsNotNull¶
- datafusion.expr.IsNotTrue¶
- datafusion.expr.IsNotUnknown¶
- datafusion.expr.IsNull¶
- datafusion.expr.IsTrue¶
- datafusion.expr.IsUnknown¶
- datafusion.expr.Join¶
- datafusion.expr.JoinConstraint¶
- datafusion.expr.JoinType¶
- datafusion.expr.Like¶
- datafusion.expr.Limit¶
- datafusion.expr.Literal¶
- datafusion.expr.Negative¶
- datafusion.expr.Not¶
- datafusion.expr.Partitioning¶
- datafusion.expr.Placeholder¶
- datafusion.expr.Projection¶
- datafusion.expr.Repartition¶
- datafusion.expr.ScalarSubquery¶
- datafusion.expr.ScalarVariable¶
- datafusion.expr.SimilarTo¶
- datafusion.expr.Sort¶
- datafusion.expr.Subquery¶
- datafusion.expr.SubqueryAlias¶
- datafusion.expr.TableScan¶
- datafusion.expr.TryCast¶
- datafusion.expr.Union¶
- datafusion.expr.Unnest¶
- datafusion.expr.UnnestExpr¶
- datafusion.expr.WindowExpr¶