datafusion.expr

This module supports expressions, one of the core concepts in DataFusion.

See Expressions in the online documentation for more details.

Attributes

Aggregate

AggregateFunction

Alias

Analyze

Between

BinaryExpr

Case

Cast

Column

CopyTo

CreateCatalog

CreateCatalogSchema

CreateExternalTable

CreateFunction

CreateFunctionBody

CreateIndex

CreateMemoryTable

CreateView

Deallocate

DescribeTable

Distinct

DmlStatement

DropCatalogSchema

DropFunction

DropTable

DropView

EmptyRelation

Execute

Exists

Explain

Extension

FileType

Filter

GroupingSet

ILike

InList

InSubquery

IsFalse

IsNotFalse

IsNotNull

IsNotTrue

IsNotUnknown

IsNull

IsTrue

IsUnknown

Join

JoinConstraint

JoinType

Like

Limit

Literal

Negative

Not

OperateFunctionArg

Partitioning

Placeholder

Prepare

Projection

RecursiveQuery

Repartition

ScalarSubquery

ScalarVariable

SetVariable

SimilarTo

Sort

Subquery

SubqueryAlias

TableScan

TransactionAccessMode

TransactionConclusion

TransactionEnd

TransactionIsolationLevel

TransactionStart

TryCast

Union

Unnest

UnnestExpr

Values

WindowExpr

Classes

CaseBuilder

Builder class for constructing case statements.

Expr

Expression object.

SortExpr

Used to specify sorting on either a DataFrame or function.

Window

Define reusable window parameters.

WindowFrame

Defines a window frame for performing window operations.

WindowFrameBound

Defines a single window frame bound.

Module Contents

class datafusion.expr.CaseBuilder(case_builder: datafusion._internal.expr.CaseBuilder)

Builder class for constructing case statements.

An example usage would be as follows:

import datafusion.functions as f
from datafusion import lit, col
df.select(
    f.case(col("column_a"))
    .when(lit(1), lit("One"))
    .when(lit(2), lit("Two"))
    .otherwise(lit("Unknown"))
)

Constructs a case builder.

This is not typically called by the end user directly. See datafusion.functions.case() instead.

end() Expr

Finish building a case statement.

Any non-matching cases will end in a null value.

otherwise(else_expr: Expr) Expr

Set a default value for the case statement.

when(when_expr: Expr, then_expr: Expr) CaseBuilder

Add a case to match against.

case_builder
class datafusion.expr.Expr(expr: datafusion._internal.expr.RawExpr)

Expression object.

Expressions are one of the core concepts in DataFusion. See Expressions in the online documentation for more information.

This constructor should not be called by the end user.

__add__(rhs: Any) Expr

Addition operator.

Accepts either an expression or any valid PyArrow scalar literal value.

__and__(rhs: Expr) Expr

Logical AND.

__eq__(rhs: object) Expr

Equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__ge__(rhs: Any) Expr

Greater than or equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__getitem__(key: str | int) Expr

Retrieve sub-object.

If key is a string, returns the subfield of the struct. If key is an integer, retrieves the element in the array. Note that the element index begins at 0, unlike array_element which begins at 1.

__gt__(rhs: Any) Expr

Greater than.

Accepts either an expression or any valid PyArrow scalar literal value.

__invert__() Expr

Binary not (~).

__le__(rhs: Any) Expr

Less than or equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__lt__(rhs: Any) Expr

Less than.

Accepts either an expression or any valid PyArrow scalar literal value.

__mod__(rhs: Any) Expr

Modulo operator (%).

Accepts either an expression or any valid PyArrow scalar literal value.

__mul__(rhs: Any) Expr

Multiplication operator.

Accepts either an expression or any valid PyArrow scalar literal value.

__ne__(rhs: object) Expr

Not equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__or__(rhs: Expr) Expr

Logical OR.

__repr__() str

Generate a string representation of this expression.

__richcmp__(other: Expr, op: int) Expr

Comparison operator.

__sub__(rhs: Any) Expr

Subtraction operator.

Accepts either an expression or any valid PyArrow scalar literal value.

__truediv__(rhs: Any) Expr

Division operator.

Accepts either an expression or any valid PyArrow scalar literal value.

abs() Expr

Return the absolute value of a given number.

Returns:

Expr

A new expression representing the absolute value of the input expression.

acos() Expr

Returns the arc cosine or inverse cosine of a number.

Returns:

Expr

A new expression representing the arc cosine of the input expression.

acosh() Expr

Returns inverse hyperbolic cosine.

alias(name: str, metadata: dict[str, str] | None = None) Expr

Assign a name to the expression.

Parameters:
  • name – The name to assign to the expression.

  • metadata – Optional metadata to attach to the expression.

Returns:

A new expression with the assigned name.

array_dims() Expr

Returns an array of the array’s dimensions.

array_distinct() Expr

Returns distinct values from the array after removing duplicates.

array_empty() Expr

Returns a boolean indicating whether the array is empty.

array_length() Expr

Returns the length of the array.

array_ndims() Expr

Returns the number of dimensions of the array.

array_pop_back() Expr

Returns the array without the last element.

array_pop_front() Expr

Returns the array without the first element.

arrow_typeof() Expr

Returns the Arrow type of the expression.

ascii() Expr

Returns the numeric code of the first character of the argument.

asin() Expr

Returns the arc sine or inverse sine of a number.

asinh() Expr

Returns inverse hyperbolic sine.

atan() Expr

Returns inverse tangent of a number.

atanh() Expr

Returns inverse hyperbolic tangent.

between(low: Any, high: Any, negated: bool = False) Expr

Returns True if this expression is between a given range.

Parameters:
  • low – lower bound of the range (inclusive).

  • high – higher bound of the range (inclusive).

  • negated – If True, returns True when the expression is outside the range instead.

bit_length() Expr

Returns the number of bits in the string argument.

btrim() Expr

Removes all characters, spaces by default, from both sides of a string.

canonical_name() str

Returns a complete string representation of this expression.

cardinality() Expr

Returns the total number of elements in the array.

cast(to: pyarrow.DataType[Any] | type[float | int | str | bool]) Expr

Cast to a new data type.

cbrt() Expr

Returns the cube root of a number.

ceil() Expr

Returns the nearest integer greater than or equal to the argument.

char_length() Expr

The number of characters in the string.

character_length() Expr

Returns the number of characters in the argument.

chr() Expr

Converts the Unicode code point to a UTF8 character.

static column(value: str) Expr

Creates a new expression representing a column.

column_name(plan: datafusion.plan.LogicalPlan) str

Compute the output column name based on the provided logical plan.

cos() Expr

Returns the cosine of the argument.

cosh() Expr

Returns the hyperbolic cosine of the argument.

cot() Expr

Returns the cotangent of the argument.

degrees() Expr

Converts the argument from radians to degrees.

display_name() str

Returns the name of this expression as it should appear in a schema.

This name will not include any CAST expressions.

distinct() ExprFuncBuilder

Only evaluate distinct values for an aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

empty() Expr

This is an alias for array_empty().

exp() Expr

Returns the exponential of the argument.

factorial() Expr

Returns the factorial of the argument.

fill_nan(value: Any | Expr | None = None) Expr

Fill NaN values with a provided value.

fill_null(value: Any | Expr | None = None) Expr

Fill NULL values with a provided value.

filter(filter: Expr) ExprFuncBuilder

Filter an aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

flatten() Expr

Flattens an array of arrays into a single array.

floor() Expr

Returns the nearest integer less than or equal to the argument.

from_unixtime() Expr

Converts an integer to an RFC3339 timestamp format string.

initcap() Expr

Set the initial letter of each word to capital.

Converts the first letter of each word in string to uppercase and the remaining characters to lowercase.

is_not_null() Expr

Returns True if this expression is not null.

is_null() Expr

Returns True if this expression is null.

isnan() Expr

Returns true if a given number is +NaN or -NaN; otherwise returns false.

iszero() Expr

Returns true if a given number is +0.0 or -0.0; otherwise returns false.

length() Expr

The number of characters in the string.

list_dims() Expr

Returns an array of the array’s dimensions.

This is an alias for array_dims().

list_distinct() Expr

Returns distinct values from the array after removing duplicates.

This is an alias for array_distinct().

list_length() Expr

Returns the length of the array.

This is an alias for array_length().

list_ndims() Expr

Returns the number of dimensions of the array.

This is an alias for array_ndims().

static literal(value: Any) Expr

Creates a new expression representing a scalar value.

value must be a valid PyArrow scalar value or easily castable to one.

ln() Expr

Returns the natural logarithm (base e) of the argument.

log10() Expr

Base 10 logarithm of the argument.

log2() Expr

Base 2 logarithm of the argument.

lower() Expr

Converts a string to lowercase.

ltrim() Expr

Removes all characters, spaces by default, from the beginning of a string.

md5() Expr

Computes an MD5 128-bit checksum for a string expression.

null_treatment(null_treatment: datafusion.common.NullTreatment) ExprFuncBuilder

Set the treatment for null values for a window or aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

octet_length() Expr

Returns the number of bytes of a string.

order_by(*exprs: Expr | SortExpr) ExprFuncBuilder

Set the ordering for a window or aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

over(window: Window) Expr

Turn an aggregate function into a window function.

This function turns any aggregate function into a window function. With the exception of partition_by, how each of the parameters is used is determined by the underlying aggregate function.

Parameters:

window – Window definition

partition_by(*partition_by: Expr) ExprFuncBuilder

Set the partitioning for a window function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

python_value() Any

Extracts the Expr value into a PyObject.

This is only valid for literal expressions.

Returns:

Python object representing literal value of the expression.

radians() Expr

Converts the argument from degrees to radians.

reverse() Expr

Reverse the string argument.

rex_call_operands() list[Expr]

Return the operands of the expression based on its variant type.

Row expressions, Rex(s), operate on the concept of operands. Different variants of Expressions, Expr(s), store those operands in different data structures. This function examines the Expr variant and returns the operands to the calling logic.

rex_call_operator() str

Extracts the operator associated with a row expression type call.

rex_type() datafusion.common.RexType

Return the Rex Type of this expression.

A Rex (Row Expression) specifies a single row of data. That specification could include user-defined functions or types. RexType identifies which of the valid Rex kinds this expression is.

rtrim() Expr

Removes all characters, spaces by default, from the end of a string.

schema_name() str

Returns the name of this expression as it should appear in a schema.

This name will not include any CAST expressions.

sha224() Expr

Computes the SHA-224 hash of a binary string.

sha256() Expr

Computes the SHA-256 hash of a binary string.

sha384() Expr

Computes the SHA-384 hash of a binary string.

sha512() Expr

Computes the SHA-512 hash of a binary string.

signum() Expr

Returns the sign of the argument (-1, 0, +1).

sin() Expr

Returns the sine of the argument.

sinh() Expr

Returns the hyperbolic sine of the argument.

sort(ascending: bool = True, nulls_first: bool = True) SortExpr

Creates a sort Expr from an existing Expr.

Parameters:
  • ascending – If true, sort in ascending order.

  • nulls_first – Return null values first.

sqrt() Expr

Returns the square root of the argument.

static string_literal(value: str) Expr

Creates a new expression representing a UTF8 literal value.

It differs from literal in that it produces pa.string() instead of pa.string_view().

This is needed for cases where DataFusion is expecting a UTF8 instead of UTF8View literal, like in: https://github.com/apache/datafusion/blob/86740bfd3d9831d6b7c1d0e1bf4a21d91598a0ac/datafusion/functions/src/core/arrow_cast.rs#L179

tan() Expr

Returns the tangent of the argument.

tanh() Expr

Returns the hyperbolic tangent of the argument.

to_hex() Expr

Converts an integer to a hexadecimal string.

to_variant() Any

Convert this expression into a python object if possible.

trim() Expr

Removes all characters, spaces by default, from both sides of a string.

types() datafusion.common.DataTypeMap

Return the DataTypeMap.

Returns:

DataTypeMap which represents the PythonType, Arrow DataType, and SqlType Enum which this expression represents.

upper() Expr

Converts a string to uppercase.

variant_name() str

Returns the name of the Expr variant.

For example: IsNotNull, Literal, BinaryExpr.

window_frame(window_frame: WindowFrame) ExprFuncBuilder

Set the frame for a window function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

__radd__
__rand__
__rmod__
__rmul__
__ror__
__rsub__
__rtruediv__
_to_pyarrow_types: ClassVar[dict[type, pyarrow.DataType]]
expr
class datafusion.expr.SortExpr(expr: Expr, ascending: bool, nulls_first: bool)

Used to specify sorting on either a DataFrame or function.

This constructor should not be called by the end user.

__repr__() str

Generate a string representation of this expression.

ascending() bool

Return ascending property.

expr() Expr

Return the raw expr backing the SortExpr.

nulls_first() bool

Return nulls_first property.

raw_sort
class datafusion.expr.Window(partition_by: list[Expr] | None = None, window_frame: WindowFrame | None = None, order_by: list[SortExpr | Expr] | None = None, null_treatment: datafusion.common.NullTreatment | None = None)

Define reusable window parameters.

Construct a window definition.

Parameters:
  • partition_by – Partitions for window operation

  • window_frame – Define the start and end bounds of the window frame

  • order_by – Set ordering

  • null_treatment – Indicate how nulls are to be treated

_null_treatment = None
_order_by = None
_partition_by = None
_window_frame = None
class datafusion.expr.WindowFrame(units: str, start_bound: Any | None, end_bound: Any | None)

Defines a window frame for performing window operations.

Construct a window frame using the given parameters.

Parameters:
  • units – Should be one of rows, range, or groups.

  • start_bound – Sets the preceding bound. Must be >= 0. If None, this will be set to unbounded. If unit type is groups, this parameter must be set.

  • end_bound – Sets the following bound. Must be >= 0. If None, this will be set to unbounded. If unit type is groups, this parameter must be set.

get_frame_units() str

Returns the window frame units for the bounds.

get_lower_bound() WindowFrameBound

Returns starting bound.

get_upper_bound() WindowFrameBound

Returns end bound.

window_frame
class datafusion.expr.WindowFrameBound(frame_bound: datafusion._internal.expr.WindowFrameBound)

Defines a single window frame bound.

WindowFrame typically requires a start and end bound.

Constructs a window frame bound.

get_offset() int | None

Returns the offset of the window frame.

is_current_row() bool

Returns whether the frame bound is the current row.

is_following() bool

Returns whether the frame bound is following.

is_preceding() bool

Returns whether the frame bound is preceding.

is_unbounded() bool

Returns whether the frame bound is unbounded.

frame_bound
datafusion.expr.Aggregate
datafusion.expr.AggregateFunction
datafusion.expr.Alias
datafusion.expr.Analyze
datafusion.expr.Between
datafusion.expr.BinaryExpr
datafusion.expr.Case
datafusion.expr.Cast
datafusion.expr.Column
datafusion.expr.CopyTo
datafusion.expr.CreateCatalog
datafusion.expr.CreateCatalogSchema
datafusion.expr.CreateExternalTable
datafusion.expr.CreateFunction
datafusion.expr.CreateFunctionBody
datafusion.expr.CreateIndex
datafusion.expr.CreateMemoryTable
datafusion.expr.CreateView
datafusion.expr.Deallocate
datafusion.expr.DescribeTable
datafusion.expr.Distinct
datafusion.expr.DmlStatement
datafusion.expr.DropCatalogSchema
datafusion.expr.DropFunction
datafusion.expr.DropTable
datafusion.expr.DropView
datafusion.expr.EmptyRelation
datafusion.expr.Execute
datafusion.expr.Exists
datafusion.expr.Explain
datafusion.expr.Extension
datafusion.expr.FileType
datafusion.expr.Filter
datafusion.expr.GroupingSet
datafusion.expr.ILike
datafusion.expr.InList
datafusion.expr.InSubquery
datafusion.expr.IsFalse
datafusion.expr.IsNotFalse
datafusion.expr.IsNotNull
datafusion.expr.IsNotTrue
datafusion.expr.IsNotUnknown
datafusion.expr.IsNull
datafusion.expr.IsTrue
datafusion.expr.IsUnknown
datafusion.expr.Join
datafusion.expr.JoinConstraint
datafusion.expr.JoinType
datafusion.expr.Like
datafusion.expr.Limit
datafusion.expr.Literal
datafusion.expr.Negative
datafusion.expr.Not
datafusion.expr.OperateFunctionArg
datafusion.expr.Partitioning
datafusion.expr.Placeholder
datafusion.expr.Prepare
datafusion.expr.Projection
datafusion.expr.RecursiveQuery
datafusion.expr.Repartition
datafusion.expr.ScalarSubquery
datafusion.expr.ScalarVariable
datafusion.expr.SetVariable
datafusion.expr.SimilarTo
datafusion.expr.Sort
datafusion.expr.Subquery
datafusion.expr.SubqueryAlias
datafusion.expr.TableScan
datafusion.expr.TransactionAccessMode
datafusion.expr.TransactionConclusion
datafusion.expr.TransactionEnd
datafusion.expr.TransactionIsolationLevel
datafusion.expr.TransactionStart
datafusion.expr.TryCast
datafusion.expr.Union
datafusion.expr.Unnest
datafusion.expr.UnnestExpr
datafusion.expr.Values
datafusion.expr.WindowExpr