datafusion.expr¶

This module supports expressions, one of the core concepts in DataFusion.

See Expressions in the online documentation for more details.

Classes¶

`CaseBuilder`	Builder class for constructing case statements.
`Expr`	Expression object.
`SortExpr`	Used to specify sorting on either a DataFrame or function.
`Window`	Define reusable window parameters.
`WindowFrame`	Defines a window frame for performing window operations.
`WindowFrameBound`	Defines a single window frame bound.

Module Contents¶

class datafusion.expr.CaseBuilder(case_builder: datafusion._internal.expr.CaseBuilder)¶

Builder class for constructing case statements.

An example usage would be as follows:

import datafusion.functions as f
from datafusion import lit, col

# df is assumed to be an existing DataFrame with a column named "column_a".
df.select(
    f.case(col("column_a"))
    .when(lit(1), lit("One"))
    .when(lit(2), lit("Two"))
    .otherwise(lit("Unknown"))
)

Constructs a case builder.

This is not typically called by the end user directly. See datafusion.functions.case() instead.

end() → Expr¶

Finish building a case statement.

Any non-matching cases will end in a null value.
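
To make the null fallback concrete, here is a plain-Python model of how a simple case statement resolves one value. The helper `eval_case` is hypothetical and not part of datafusion; it only mirrors the behavior described here, with `None` standing in for null.

```python
from typing import Any, Optional


def eval_case(value: Any, whens: list, otherwise: Optional[Any] = None) -> Any:
    """Return the result of the first matching branch, else the default."""
    for when_value, then_value in whens:
        if value == when_value:
            return then_value
    # No branch matched: finishing with end() corresponds to otherwise=None,
    # while otherwise(...) supplies an explicit default.
    return otherwise


branches = [(1, "One"), (2, "Two")]
with_default = [eval_case(v, branches, "Unknown") for v in [1, 2, 3]]
without_default = [eval_case(v, branches) for v in [1, 2, 3]]
```

In this sketch the value 3 matches no branch, so it maps to "Unknown" when a default is given and to `None` when it is not.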

otherwise(else_expr: Expr) → Expr¶: Set a default value for the case statement.

when(when_expr: Expr, then_expr: Expr) → CaseBuilder¶: Add a case to match against.

case_builder¶

class datafusion.expr.Expr(expr: datafusion._internal.expr.Expr)¶

Expression object.

Expressions are one of the core concepts in DataFusion. See Expressions in the online documentation for more information.

This constructor should not be called by the end user.

__add__(rhs: Any) → Expr¶

Addition operator.

Accepts either an expression or any valid PyArrow scalar literal value.
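
The operator methods build a new expression rather than computing a value, and the reflected variants such as `__radd__` are what allow a plain Python value to appear on the left-hand side. The toy class below is a sketch of that pattern under stated assumptions: `ToyExpr` is hypothetical, records operations as text, and is not how datafusion represents expressions internally.

```python
class ToyExpr:
    """Hypothetical stand-in for Expr that records operations as text."""

    def __init__(self, text: str):
        self.text = text

    @staticmethod
    def _wrap(value):
        # Plain Python values are promoted to literal expressions.
        return value if isinstance(value, ToyExpr) else ToyExpr(repr(value))

    def __add__(self, rhs):
        return ToyExpr(f"({self.text} + {ToyExpr._wrap(rhs).text})")

    def __radd__(self, lhs):
        # Python calls this for `1 + expr`, where the left operand is not a ToyExpr.
        return ToyExpr(f"({ToyExpr._wrap(lhs).text} + {self.text})")


a = ToyExpr("a")
left = (a + 1).text
right = (1 + a).text
both = (a + ToyExpr("b")).text
```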

__and__(rhs: Expr) → Expr¶: Logical AND.

__eq__(rhs: Any) → Expr¶

Equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__ge__(rhs: Any) → Expr¶

Greater than or equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__getitem__(key: str | int) → Expr¶

Retrieve sub-object.

If key is a string, returns the subfield of the struct. If key is an integer, retrieves the element in the array. Note that the element index begins at 0, unlike array_element which begins at 1.
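
The difference in index base is easy to get wrong, so here is a small plain-Python illustration. Both helpers are hypothetical models operating on a Python list; they cover only positive, in-range positions and do not call datafusion.

```python
def get_item(values: list, index: int):
    """Model of expr[index]: positions start at 0."""
    return values[index]


def array_element(values: list, position: int):
    """Model of array_element: positions start at 1."""
    return values[position - 1]


row = ["a", "b", "c"]
first_by_getitem = get_item(row, 0)
first_by_array_element = array_element(row, 1)
```

Both calls select the same first element; only the index passed in differs.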

__gt__(rhs: Any) → Expr¶

Greater than.

Accepts either an expression or any valid PyArrow scalar literal value.

__invert__() → Expr¶: Binary not (~).

__le__(rhs: Any) → Expr¶

Less than or equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__lt__(rhs: Any) → Expr¶

Less than.

Accepts either an expression or any valid PyArrow scalar literal value.

__mod__(rhs: Any) → Expr¶

Modulo operator (%).

Accepts either an expression or any valid PyArrow scalar literal value.

__mul__(rhs: Any) → Expr¶

Multiplication operator.

Accepts either an expression or any valid PyArrow scalar literal value.

__ne__(rhs: Any) → Expr¶

Not equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__or__(rhs: Expr) → Expr¶: Logical OR.

__repr__() → str¶: Generate a string representation of this expression.

__richcmp__(other: Expr, op: int) → Expr¶: Comparison operator.

__sub__(rhs: Any) → Expr¶

Subtraction operator.

Accepts either an expression or any valid PyArrow scalar literal value.

__truediv__(rhs: Any) → Expr¶

Division operator.

Accepts either an expression or any valid PyArrow scalar literal value.

alias(name: str) → Expr¶: Assign a name to the expression.

between(low: Any, high: Any, negated: bool = False) → Expr¶

Returns True if this expression is between a given range.

Parameters:

low – lower bound of the range (inclusive).
high – higher bound of the range (inclusive).
negated – if True, inverts the result so that values outside the range return True.
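
As a rough illustration of the inclusive bounds and the negated flag, the following plain-Python function models the check for a single non-null value. It is a sketch with the same parameter names, not the datafusion implementation.

```python
def between(value, low, high, negated: bool = False) -> bool:
    """Inclusive range test; negated flips the result."""
    inside = low <= value <= high
    return not inside if negated else inside


on_lower_bound = between(1, 1, 5)
on_upper_bound = between(5, 1, 5)
outside = between(6, 1, 5)
outside_negated = between(6, 1, 5, negated=True)
```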

canonical_name() → str¶: Returns a complete string representation of this expression.

cast(to: pyarrow.DataType[Any] | Type[float] | Type[int] | Type[str] | Type[bool]) → Expr¶: Cast to a new data type.

static column(value: str) → Expr¶: Creates a new expression representing a column.

column_name(plan: datafusion.plan.LogicalPlan) → str¶: Compute the output column name based on the provided logical plan.

display_name() → str¶

Returns the name of this expression as it should appear in a schema.

This name will not include any CAST expressions.

distinct() → ExprFuncBuilder¶

Only evaluate distinct values for an aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

fill_nan(value: Any | Expr | None = None) → Expr¶: Fill NaN values with a provided value.

fill_null(value: Any | Expr | None = None) → Expr¶: Fill NULL values with a provided value.
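
NaN and NULL are different things: NaN is a floating-point value, while NULL marks a missing value. The sketch below models the two fills on a Python list, using `None` for NULL. It assumes each fill leaves the other kind of value untouched, which is the usual behavior but is not stated on this page.

```python
import math


def fill_null(values, replacement):
    """Replace missing entries (None) and leave NaN alone."""
    return [replacement if v is None else v for v in values]


def fill_nan(values, replacement):
    """Replace floating-point NaN and leave missing entries alone."""
    return [
        replacement if isinstance(v, float) and math.isnan(v) else v
        for v in values
    ]


data = [1.0, None, float("nan")]
nulls_filled = fill_null(data, 0.0)
nans_filled = fill_nan(data, 0.0)
```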

filter(filter: Expr) → ExprFuncBuilder¶

Filter an aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.
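
The effect of distinct() and filter() on an aggregate can be pictured with a plain-Python sum. The function `aggregate_sum` and its `keep` predicate are hypothetical names used only for this sketch; it models the semantics and does not use ExprFuncBuilder.

```python
def aggregate_sum(values, distinct: bool = False, keep=None):
    """Sum values, optionally filtering rows and dropping duplicates first."""
    rows = [v for v in values if keep is None or keep(v)]
    if distinct:
        rows = set(rows)
    return sum(rows)


values = [1, 1, 2, 3, 3]
plain = aggregate_sum(values)
distinct_only = aggregate_sum(values, distinct=True)
filtered_only = aggregate_sum(values, keep=lambda v: v > 1)
distinct_and_filtered = aggregate_sum(values, distinct=True, keep=lambda v: v > 1)
```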

is_not_null() → Expr¶: Returns True if this expression is not null.

is_null() → Expr¶: Returns True if this expression is null.

static literal(value: Any) → Expr¶

Creates a new expression representing a scalar value.

value must be a valid PyArrow scalar value or easily castable to one.

null_treatment(null_treatment: datafusion.common.NullTreatment) → ExprFuncBuilder¶

Set the treatment for null values for a window or aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

order_by(*exprs: Expr | SortExpr) → ExprFuncBuilder¶

Set the ordering for a window or aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

over(window: Window) → Expr¶

Turn an aggregate function into a window function.

This function turns any aggregate function into a window function. With the exception of partition_by, how each of the parameters is used is determined by the underlying aggregate function.

Parameters:

window – Window definition
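
Unlike a plain aggregate, which collapses each group to one row, a window function keeps every input row and attaches the aggregate of that row's partition. The sketch below models a sum over a partition in plain Python. It assumes no ordering and a frame that covers the whole partition, and `sum_over_partition` is a hypothetical helper rather than a datafusion call.

```python
from collections import defaultdict


def sum_over_partition(rows, key, value):
    """Return one total per input row, computed over that row's partition."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    # Every input row is kept; each one receives its partition's total.
    return [totals[row[key]] for row in rows]


rows = [
    {"team": "a", "pts": 1},
    {"team": "b", "pts": 10},
    {"team": "a", "pts": 2},
]
result = sum_over_partition(rows, "team", "pts")
```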

partition_by(*partition_by: Expr) → ExprFuncBuilder¶

Set the partitioning for a window function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

python_value() → Any¶

Extracts the Expr value into a PyObject.

This is only valid for literal expressions.

Returns: Python object representing the literal value of the expression.

rex_call_operands() → list[Expr]¶

Return the operands of the expression based on its variant type.

Row expressions, Rex(s), operate on the concept of operands. Different variants of Expressions, Expr(s), store those operands in different data structures. This function examines the Expr variant and returns the operands to the calling logic.

rex_call_operator() → str¶: Extracts the operator associated with a row expression type call.

rex_type() → datafusion.common.RexType¶

Return the Rex Type of this expression.

A Rex (Row Expression) specifies a single row of data. That specification could include user-defined functions or types. RexType identifies the row as one of the valid RexType values.

schema_name() → str¶

Returns the name of this expression as it should appear in a schema.

This name will not include any CAST expressions.

sort(ascending: bool = True, nulls_first: bool = True) → SortExpr¶

Creates a sort Expr from an existing Expr.

Parameters:

ascending – If true, sort in ascending order.
nulls_first – Return null values first.
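
The two flags are independent: ascending controls the order of the non-null values, and nulls_first controls where the nulls are placed. A plain-Python model of that behavior, with `None` standing in for null and `sort_values` as a hypothetical helper:

```python
def sort_values(values, ascending: bool = True, nulls_first: bool = True):
    """Sort non-null values, then place nulls at the front or the back."""
    nulls = [v for v in values if v is None]
    present = sorted((v for v in values if v is not None), reverse=not ascending)
    return nulls + present if nulls_first else present + nulls


defaults = sort_values([3, None, 1])
descending_nulls_last = sort_values([3, None, 1], ascending=False, nulls_first=False)
```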

to_variant() → Any¶: Convert this expression into a python object if possible.

types() → datafusion.common.DataTypeMap¶

Return the DataTypeMap.

Returns: DataTypeMap which represents the PythonType, Arrow DataType, and SqlType Enum which this expression represents.

variant_name() → str¶

Returns the name of the Expr variant.

Examples: IsNotNull, Literal, BinaryExpr, etc.

window_frame(window_frame: WindowFrame) → ExprFuncBuilder¶

Set the frame for a window function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

__radd__¶

__rand__¶

__rmod__¶

__rmul__¶

__ror__¶

__rsub__¶

__rtruediv__¶

_to_pyarrow_types¶

expr¶

class datafusion.expr.SortExpr(expr: Expr, ascending: bool, nulls_first: bool)¶

Used to specify sorting on either a DataFrame or function.

This constructor should not be called by the end user.

__repr__() → str¶: Generate a string representation of this expression.

ascending() → bool¶: Return ascending property.

expr() → Expr¶: Return the raw expr backing the SortExpr.

nulls_first() → bool¶: Return nulls_first property.

raw_sort¶

class datafusion.expr.Window(partition_by: list[Expr] | None = None, window_frame: WindowFrame | None = None, order_by: list[SortExpr | Expr] | None = None, null_treatment: datafusion.common.NullTreatment | None = None)¶

Define reusable window parameters.

Construct a window definition.

Parameters:

partition_by – Partitions for window operation
window_frame – Define the start and end bounds of the window frame
order_by – Set ordering
null_treatment – Indicate how nulls are to be treated

_null_treatment = None¶

_order_by = None¶

_partition_by = None¶

_window_frame = None¶

class datafusion.expr.WindowFrame(units: str, start_bound: Any | None, end_bound: Any | None)¶

Defines a window frame for performing window operations.

Construct a window frame using the given parameters.

Parameters:

units – Should be one of rows, range, or groups.
start_bound – Sets the preceding bound. Must be >= 0. If none, this will be set to unbounded. If unit type is groups, this parameter must be set.
end_bound – Sets the following bound. Must be >= 0. If none, this will be set to unbounded. If unit type is groups, this parameter must be set.
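
To show how the two bounds select rows, here is a plain-Python model of a sum over a frame measured in rows. It treats start_bound as a count of preceding rows and end_bound as a count of following rows, with `None` meaning unbounded. The helper `frame_sum` is hypothetical, and the range and groups units behave differently and are not modeled here.

```python
from typing import Optional


def frame_sum(values, start_bound: Optional[int], end_bound: Optional[int]):
    """Sum each row's frame: start_bound rows before through end_bound rows after."""
    n = len(values)
    out = []
    for i in range(n):
        # None means unbounded, so the frame extends to the edge of the data.
        lo = 0 if start_bound is None else max(0, i - start_bound)
        hi = n - 1 if end_bound is None else min(n - 1, i + end_bound)
        out.append(sum(values[lo:hi + 1]))
    return out


values = [1, 2, 3, 4]
running_total = frame_sum(values, None, 0)  # unbounded preceding to current row
centered = frame_sum(values, 1, 1)          # one row either side of the current row
```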

get_frame_units() → str¶: Returns the window frame units for the bounds.

get_lower_bound() → WindowFrameBound¶: Returns starting bound.

get_upper_bound()¶: Returns end bound.

window_frame¶

class datafusion.expr.WindowFrameBound(frame_bound: datafusion._internal.expr.WindowFrameBound)¶

Defines a single window frame bound.

WindowFrame typically requires a start and end bound.

Constructs a window frame bound.

get_offset() → int | None¶: Returns the offset of the window frame.

is_current_row() → bool¶: Returns True if the frame bound is the current row.

is_following() → bool¶: Returns True if the frame bound is following.

is_preceding() → bool¶: Returns True if the frame bound is preceding.

is_unbounded() → bool¶: Returns True if the frame bound is unbounded.

frame_bound¶

datafusion.expr.Aggregate¶

datafusion.expr.AggregateFunction¶

datafusion.expr.Alias¶

datafusion.expr.Analyze¶

datafusion.expr.Between¶

datafusion.expr.BinaryExpr¶

datafusion.expr.Case¶

datafusion.expr.Cast¶

datafusion.expr.Column¶

datafusion.expr.CreateMemoryTable¶

datafusion.expr.CreateView¶

datafusion.expr.Distinct¶

datafusion.expr.DropTable¶

datafusion.expr.EmptyRelation¶

datafusion.expr.Exists¶

datafusion.expr.Explain¶

datafusion.expr.Extension¶

datafusion.expr.Filter¶

datafusion.expr.GroupingSet¶

datafusion.expr.ILike¶

datafusion.expr.InList¶

datafusion.expr.InSubquery¶

datafusion.expr.IsFalse¶

datafusion.expr.IsNotFalse¶

datafusion.expr.IsNotNull¶

datafusion.expr.IsNotTrue¶

datafusion.expr.IsNotUnknown¶

datafusion.expr.IsNull¶

datafusion.expr.IsTrue¶

datafusion.expr.IsUnknown¶

datafusion.expr.Join¶

datafusion.expr.JoinConstraint¶

datafusion.expr.JoinType¶

datafusion.expr.Like¶

datafusion.expr.Limit¶

datafusion.expr.Literal¶

datafusion.expr.Negative¶

datafusion.expr.Not¶

datafusion.expr.Partitioning¶

datafusion.expr.Placeholder¶

datafusion.expr.Projection¶

datafusion.expr.Repartition¶

datafusion.expr.ScalarSubquery¶

datafusion.expr.ScalarVariable¶

datafusion.expr.SimilarTo¶

datafusion.expr.Sort¶

datafusion.expr.Subquery¶

datafusion.expr.SubqueryAlias¶

datafusion.expr.TableScan¶

datafusion.expr.TryCast¶

datafusion.expr.Union¶

datafusion.expr.Unnest¶

datafusion.expr.UnnestExpr¶

datafusion.expr.WindowExpr¶
