datafusion.expr

This module supports expressions, one of the core concepts in DataFusion.

See Expressions in the online documentation for more details.

Attributes

Aggregate

AggregateFunction

Alias

Analyze

Between

BinaryExpr

Case

Cast

Column

CreateMemoryTable

CreateView

CrossJoin

Distinct

DropTable

EmptyRelation

Exists

Explain

Extension

Filter

GroupingSet

ILike

InList

InSubquery

IsFalse

IsNotFalse

IsNotNull

IsNotTrue

IsNotUnknown

IsNull

IsTrue

IsUnknown

Join

JoinConstraint

JoinType

Like

Limit

Literal

Negative

Not

Partitioning

Placeholder

Projection

Repartition

ScalarSubquery

ScalarVariable

SimilarTo

Sort

SortExpr

Subquery

SubqueryAlias

TableScan

TryCast

Union

Unnest

UnnestExpr

Window

Classes

CaseBuilder

Builder class for constructing case statements.

Expr

Expression object.

WindowFrame

Defines a window frame for performing window operations.

WindowFrameBound

Defines a single window frame bound.

Module Contents

class datafusion.expr.CaseBuilder(case_builder: datafusion._internal.expr.CaseBuilder)

Builder class for constructing case statements.

An example usage would be as follows:

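import datafusion.functions as f
from datafusion import lit, col

# df is assumed to be an existing DataFrame with a column named "column_a".
df.select(
    f.case(col("column_a"))  # start a CASE on column_a
    .when(lit(1), lit("One"))
    .when(lit(2), lit("Two"))
    .otherwise(lit("Unknown"))  # default for all other values
)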

Constructs a case builder.

This is not typically called by the end user directly. See datafusion.functions.case() instead.

end() Expr

Finish building a case statement.

Any non-matching cases will end in a null value.

otherwise(else_expr: Expr) Expr

Set a default value for the case statement.

when(when_expr: Expr, then_expr: Expr) CaseBuilder

Add a case to match against.
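
To make the when(), otherwise(), and end() semantics concrete, the following plain-Python sketch resolves a single value the way a simple CASE expression would. It is only a toy model under stated assumptions, not DataFusion's implementation; `evaluate_case` and its parameters are hypothetical names introduced for illustration.

```python
from typing import Any, Optional, Sequence, Tuple


def evaluate_case(
    value: Any,
    branches: Sequence[Tuple[Any, Any]],
    default: Optional[Any] = None,
) -> Optional[Any]:
    """Toy model of CASE value WHEN ... THEN ... [ELSE ...] END (hypothetical helper)."""
    # Branches are checked in the order they were added; the first match wins.
    for when_value, then_value in branches:
        if value == when_value:
            return then_value
    # No branch matched: use the default (as with otherwise()), or None,
    # mirroring the null produced when the builder is finished with end().
    return default


branches = [(1, "One"), (2, "Two")]
with_default = [evaluate_case(v, branches, "Unknown") for v in (1, 2, 3)]
without_default = [evaluate_case(v, branches) for v in (1, 2, 3)]
```

The first list models finishing the builder with otherwise(); the second models finishing it with end(), where the unmatched value becomes a null.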

case_builder
class datafusion.expr.Expr(expr: datafusion._internal.expr.Expr)

Expression object.

Expressions are one of the core concepts in DataFusion. See Expressions in the online documentation for more information.

This constructor should not be called by the end user.

__add__(rhs: Any) Expr

Addition operator.

Accepts either an expression or any valid PyArrow scalar literal value.

__and__(rhs: Expr) Expr

Logical AND.

__eq__(rhs: Any) Expr

Equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__ge__(rhs: Any) Expr

Greater than or equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__getitem__(key: str | int) Expr

Retrieve sub-object.

If key is a string, returns the subfield of the struct. If key is an integer, retrieves the element in the array. Note that the element index begins at 0, unlike array_element which begins at 1.
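
The difference between the two indexing conventions is easy to get wrong, so here is a small plain-Python sketch of it. The helpers `get_zero_based` and `get_one_based` are hypothetical and only imitate the conventions described above; they do not call DataFusion.

```python
from typing import Any, Sequence


def get_zero_based(values: Sequence[Any], index: int) -> Any:
    """Bracket-style lookup: index 0 is the first element."""
    return values[index]


def get_one_based(values: Sequence[Any], position: int) -> Any:
    """array_element-style lookup: position 1 is the first element."""
    return values[position - 1]


row = {"name": "widget", "tags": ["red", "small", "metal"]}

name = row["name"]  # a string key selects a struct subfield
first_by_index = get_zero_based(row["tags"], 0)
first_by_position = get_one_based(row["tags"], 1)
```

Both lookups return the same first element; only the starting index differs.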

__gt__(rhs: Any) Expr

Greater than.

Accepts either an expression or any valid PyArrow scalar literal value.

__invert__() Expr

Binary not (~).

__le__(rhs: Any) Expr

Less than or equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__lt__(rhs: Any) Expr

Less than.

Accepts either an expression or any valid PyArrow scalar literal value.

__mod__(rhs: Any) Expr

Modulo operator (%).

Accepts either an expression or any valid PyArrow scalar literal value.

__mul__(rhs: Any) Expr

Multiplication operator.

Accepts either an expression or any valid PyArrow scalar literal value.

__ne__(rhs: Any) Expr

Not equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__or__(rhs: Expr) Expr

Logical OR.

__repr__() str

Generate a string representation of this expression.

__richcmp__(other: Expr, op: int) Expr

Comparison operator.

__sub__(rhs: Any) Expr

Subtraction operator.

Accepts either an expression or any valid PyArrow scalar literal value.

__truediv__(rhs: Any) Expr

Division operator.

Accepts either an expression or any valid PyArrow scalar literal value.
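
The operators above build a new expression rather than computing a value. The sketch below illustrates that idea with a toy class in plain Python. `Node` is a hypothetical stand-in, not DataFusion's Expr, and its rendering format is an assumption made for the example. It also shows why `&` and `~` are used for logical operations: Python does not allow `and`, `or`, and `not` to be overloaded.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(eq=False)
class Node:
    """Toy expression node (hypothetical); not DataFusion's Expr."""

    op: str
    args: tuple = ()

    @staticmethod
    def wrap(value: Any) -> "Node":
        # Plain Python values on the right-hand side become literal nodes.
        return value if isinstance(value, Node) else Node("lit", (value,))

    def __add__(self, rhs: Any) -> "Node":
        return Node("+", (self, Node.wrap(rhs)))

    def __gt__(self, rhs: Any) -> "Node":
        return Node(">", (self, Node.wrap(rhs)))

    def __eq__(self, rhs: Any) -> "Node":
        # Builds a comparison node instead of returning a bool.
        return Node("=", (self, Node.wrap(rhs)))

    def __and__(self, rhs: "Node") -> "Node":
        return Node("AND", (self, rhs))

    def __invert__(self) -> "Node":
        return Node("NOT", (self,))

    def render(self) -> str:
        if self.op == "col":
            return str(self.args[0])
        if self.op == "lit":
            return repr(self.args[0])
        if self.op == "NOT":
            return f"NOT {self.args[0].render()}"
        left, right = self.args
        return f"({left.render()} {self.op} {right.render()})"


a, b = Node("col", ("a",)), Node("col", ("b",))
predicate = ((a + 1) > b) & ~(b == 0)
rendered = predicate.render()
```

Note the parentheses around each comparison: `&` and `~` bind more tightly than `>` and `==` in Python, so comparisons combined with them need explicit grouping.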

alias(name: str) Expr

Assign a name to the expression.

canonical_name() str

Returns a complete string representation of this expression.

cast(to: pyarrow.DataType[Any] | Type[float] | Type[int] | Type[str] | Type[bool]) Expr

Cast to a new data type.

static column(value: str) Expr

Creates a new expression representing a column.

column_name(plan: datafusion._internal.LogicalPlan) str

Compute the output column name based on the provided logical plan.

display_name() str

Returns the name of this expression as it should appear in a schema.

This name will not include any CAST expressions.

distinct() ExprFuncBuilder

Only evaluate distinct values for an aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

filter(filter: Expr) ExprFuncBuilder

Filter an aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

is_not_null() Expr

Returns True if this expression is not null.

is_null() Expr

Returns True if this expression is null.

static literal(value: Any) Expr

Creates a new expression representing a scalar value.

value must be a valid PyArrow scalar value or easily castable to one.

null_treatment(null_treatment: datafusion.common.NullTreatment) ExprFuncBuilder

Set the treatment for null values for a window or aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

order_by(*exprs: Expr) ExprFuncBuilder

Set the ordering for a window or aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

partition_by(*partition_by: Expr) ExprFuncBuilder

Set the partitioning for a window function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.
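
The builder methods above share one behavior: setting an option never fails, and any error surfaces only when build() is called. The following plain-Python sketch shows that deferred-validation pattern. `ToyFuncBuilder`, its `kind` field, and the error message are all hypothetical; this is not the real ExprFuncBuilder and makes no claim about its internals.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToyFuncBuilder:
    """Hypothetical stand-in for a builder; not the real ExprFuncBuilder."""

    kind: str  # "aggregate", "window", or anything else
    options: Dict[str, Any] = field(default_factory=dict)

    def distinct(self) -> "ToyFuncBuilder":
        self.options["distinct"] = True
        return self  # returning the builder allows chaining

    def order_by(self, *keys: str) -> "ToyFuncBuilder":
        self.options["order_by"] = list(keys)
        return self

    def build(self) -> Dict[str, Any]:
        # Validation is deferred until build(), so the setters never fail.
        if self.kind not in ("aggregate", "window"):
            raise ValueError(f"cannot build options for a {self.kind} expression")
        return {"kind": self.kind, **self.options}


built = ToyFuncBuilder("aggregate").distinct().order_by("a").build()

try:
    ToyFuncBuilder("literal").distinct().build()
    error = None
except ValueError as exc:
    error = str(exc)
```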

python_value() Any

Extracts the Expr value into a PyObject.

This is only valid for literal expressions.

Returns:

Python object representing literal value of the expression.

rex_call_operands() list[Expr]

Return the operands of the expression based on its variant type.

Row expressions, Rex(s), operate on the concept of operands. Different variants of Expressions, Expr(s), store those operands in different data structures. This function examines the Expr variant and returns the operands to the calling logic.

rex_call_operator() str

Extracts the operator associated with a row expression type call.

rex_type() datafusion.common.RexType

Return the Rex Type of this expression.

A Rex (Row Expression) specifies a single row of data. That specification could include user-defined functions or types. RexType identifies the row expression as one of the valid RexType variants.

sort(ascending: bool = True, nulls_first: bool = True) Expr

Creates a sort Expr from an existing Expr.

Parameters:
  • ascending – If true, sort in ascending order.

  • nulls_first – Return null values first.
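
The two flags are independent: one controls the direction of non-null values and the other controls where nulls are placed. A plain-Python sketch of that behavior follows. `sort_values` is a hypothetical helper for illustration and does not call DataFusion.

```python
from typing import List, Optional


def sort_values(
    values: List[Optional[int]],
    ascending: bool = True,
    nulls_first: bool = True,
) -> List[Optional[int]]:
    """Toy model of the two sort flags (hypothetical helper)."""
    nulls = [v for v in values if v is None]
    present = sorted((v for v in values if v is not None), reverse=not ascending)
    # Null placement is independent of the sort direction.
    return nulls + present if nulls_first else present + nulls


data = [3, None, 1, 2]
default_order = sort_values(data)
desc_nulls_last = sort_values(data, ascending=False, nulls_first=False)
```

With the defaults shown in the signature above, nulls come first and the remaining values ascend.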

to_variant() Any

Convert this expression into a python object if possible.

types() datafusion.common.DataTypeMap

Return the DataTypeMap.

Returns:

DataTypeMap which represents the PythonType, Arrow DataType, and SqlType Enum which this expression represents.

variant_name() str

Returns the name of the Expr variant.

For example: IsNotNull, Literal, BinaryExpr.

window_frame(window_frame: WindowFrame) ExprFuncBuilder

Set the frame for a window function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

__radd__
__rand__
__rmod__
__rmul__
__ror__
__rsub__
__rtruediv__
_to_pyarrow_types
expr
class datafusion.expr.WindowFrame(units: str, start_bound: Any | None, end_bound: Any | None)

Defines a window frame for performing window operations.

Construct a window frame using the given parameters.

Parameters:
  • units – Should be one of rows, range, or groups.

  • start_bound – Sets the preceding bound. Must be >= 0. If None, the bound is unbounded. If the unit type is groups, this parameter must be set.

  • end_bound – Sets the following bound. Must be >= 0. If None, the bound is unbounded. If the unit type is groups, this parameter must be set.
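
To show how the two bounds select rows, here is a plain-Python sketch of a frame measured in rows: each row's frame runs from start_bound rows before it through end_bound rows after it, with None meaning unbounded on that side. `rows_frame_sums` is a hypothetical helper that models only the rows unit over a single partition; it does not call DataFusion.

```python
from typing import List, Optional


def rows_frame_sums(
    values: List[int],
    start_bound: Optional[int],
    end_bound: Optional[int],
) -> List[int]:
    """Sum each row's frame (hypothetical helper).

    The frame spans start_bound rows preceding through end_bound rows
    following the current row. None means unbounded on that side.
    """
    last = len(values) - 1
    sums = []
    for i in range(len(values)):
        lo = 0 if start_bound is None else max(0, i - start_bound)
        hi = last if end_bound is None else min(last, i + end_bound)
        sums.append(sum(values[lo : hi + 1]))
    return sums


values = [1, 2, 3, 4]
moving = rows_frame_sums(values, 1, 0)       # 1 preceding through current row
running = rows_frame_sums(values, None, 0)   # unbounded preceding through current row
total = rows_frame_sums(values, None, None)  # the whole partition
```

An end_bound of 0 means zero rows following, so the frame ends at the current row.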

get_frame_units() str

Returns the window frame units for the bounds.

get_lower_bound() WindowFrameBound

Returns starting bound.

get_upper_bound()

Returns end bound.

window_frame
class datafusion.expr.WindowFrameBound(frame_bound: datafusion._internal.expr.WindowFrameBound)

Defines a single window frame bound.

WindowFrame typically requires a start and end bound.

Constructs a window frame bound.

get_offset() int | None

Returns the offset of the window frame.

is_current_row() bool

Returns whether the frame bound is the current row.

is_following() bool

Returns whether the frame bound is following.

is_preceding() bool

Returns whether the frame bound is preceding.

is_unbounded() bool

Returns whether the frame bound is unbounded.

frame_bound
datafusion.expr.Aggregate
datafusion.expr.AggregateFunction
datafusion.expr.Alias
datafusion.expr.Analyze
datafusion.expr.Between
datafusion.expr.BinaryExpr
datafusion.expr.Case
datafusion.expr.Cast
datafusion.expr.Column
datafusion.expr.CreateMemoryTable
datafusion.expr.CreateView
datafusion.expr.CrossJoin
datafusion.expr.Distinct
datafusion.expr.DropTable
datafusion.expr.EmptyRelation
datafusion.expr.Exists
datafusion.expr.Explain
datafusion.expr.Extension
datafusion.expr.Filter
datafusion.expr.GroupingSet
datafusion.expr.ILike
datafusion.expr.InList
datafusion.expr.InSubquery
datafusion.expr.IsFalse
datafusion.expr.IsNotFalse
datafusion.expr.IsNotNull
datafusion.expr.IsNotTrue
datafusion.expr.IsNotUnknown
datafusion.expr.IsNull
datafusion.expr.IsTrue
datafusion.expr.IsUnknown
datafusion.expr.Join
datafusion.expr.JoinConstraint
datafusion.expr.JoinType
datafusion.expr.Like
datafusion.expr.Limit
datafusion.expr.Literal
datafusion.expr.Negative
datafusion.expr.Not
datafusion.expr.Partitioning
datafusion.expr.Placeholder
datafusion.expr.Projection
datafusion.expr.Repartition
datafusion.expr.ScalarSubquery
datafusion.expr.ScalarVariable
datafusion.expr.SimilarTo
datafusion.expr.Sort
datafusion.expr.SortExpr
datafusion.expr.Subquery
datafusion.expr.SubqueryAlias
datafusion.expr.TableScan
datafusion.expr.TryCast
datafusion.expr.Union
datafusion.expr.Unnest
datafusion.expr.UnnestExpr
datafusion.expr.Window