datafusion.expr#

Expr — the logical expression type used to build DataFusion queries.

An Expr represents a computation over columns or literals: a column reference (col("a")), a literal (lit(5)), an operator combination (col("a") + lit(1)), or the output of a function from datafusion.functions. Expressions are passed to DataFrame methods such as select(), filter(), aggregate(), and sort().

Convenience constructors are re-exported at the package level: datafusion.col() / datafusion.column() for column references and datafusion.lit() / datafusion.literal() for scalar literals.

Examples

>>> ctx = dfn.SessionContext()
>>> df = ctx.from_pydict({"a": [1, 2, 3]})
>>> df.select((col("a") * lit(10)).alias("ten_a")).to_pydict()
{'ten_a': [10, 20, 30]}

See Expressions in the online documentation for details on available operators and helpers.

Attributes#

Classes#

CaseBuilder

Builder class for constructing case statements.

Expr

Expression object.

GroupingSet

Factory for creating grouping set expressions.

SortExpr

Used to specify sorting on either a DataFrame or function.

Window

Define reusable window parameters.

WindowFrame

Defines a window frame for performing window operations.

WindowFrameBound

Defines a single window frame bound.

Functions#

coerce_to_expr(→ Expr)

Coerce a native Python value to an Expr literal, passing Expr through.

coerce_to_expr_list(→ list[Expr])

Coerce each item in an iterable to Expr via coerce_to_expr().

coerce_to_expr_or_none(→ Expr | None)

Coerce a value to Expr or pass None through unchanged.

ensure_expr(→ datafusion._internal.expr.Expr)

Return the internal expression from Expr or raise TypeError.

ensure_expr_list(→ list[datafusion._internal.expr.Expr])

Flatten an iterable of expressions, validating each via ensure_expr.

Module Contents#

class datafusion.expr.CaseBuilder(case_builder: datafusion._internal.expr.CaseBuilder)#

Builder class for constructing case statements.

Examples

>>> ctx = dfn.SessionContext()
>>> df = ctx.from_pydict({"a": [1, 2, 3]})
>>> result = df.select(
...     dfn.functions.case(dfn.col("a"))
...     .when(dfn.lit(1), dfn.lit("One"))
...     .when(dfn.lit(2), dfn.lit("Two"))
...     .otherwise(dfn.lit("Other"))
...     .alias("label")
... )
>>> result.to_pydict()
{'label': ['One', 'Two', 'Other']}

Constructs a case builder.

This is not typically called by the end user directly. See datafusion.functions.case() instead.

end() Expr#

Finish building a case statement.

Any non-matching cases will end in a null value.

otherwise(else_expr: Expr) Expr#

Set a default value for the case statement.

when(when_expr: Expr, then_expr: Expr) CaseBuilder#

Add a case to match against.

case_builder#

class datafusion.expr.Expr(expr: datafusion._internal.expr.RawExpr)#

Expression object.

Expressions are one of the core concepts in DataFusion. See Expressions in the online documentation for more information.

This constructor should not be called by the end user.

__add__(rhs: Any) Expr#

Addition operator.

Accepts either an expression or any valid PyArrow scalar literal value.

__and__(rhs: Expr) Expr#

Logical AND.

__eq__(rhs: object) Expr#

Equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__ge__(rhs: Any) Expr#

Greater than or equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__getitem__(key: str | int) Expr#

Retrieve sub-object.

If key is a string, returns the subfield of the struct. If key is an integer, retrieves the element at that position in the array; indexing is zero-based, unlike array_element(), which is one-based. If key is a slice, returns an array containing that slice of the original array; slices also follow the zero-based Python convention, unlike array_slice(), which is one-based.

__gt__(rhs: Any) Expr#

Greater than.

Accepts either an expression or any valid PyArrow scalar literal value.

__invert__() Expr#

Binary not (~).

__le__(rhs: Any) Expr#

Less than or equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__lt__(rhs: Any) Expr#

Less than.

Accepts either an expression or any valid PyArrow scalar literal value.

__mod__(rhs: Any) Expr#

Modulo operator (%).

Accepts either an expression or any valid PyArrow scalar literal value.

__mul__(rhs: Any) Expr#

Multiplication operator.

Accepts either an expression or any valid PyArrow scalar literal value.

__ne__(rhs: object) Expr#

Not equal to.

Accepts either an expression or any valid PyArrow scalar literal value.

__or__(rhs: Expr) Expr#

Logical OR.

__reduce__() tuple[collections.abc.Callable[[bytes], Expr], tuple[bytes]]#

Pickle protocol hook.

Lets expressions be shipped to worker processes via pickle.dumps() / pickle.loads(). Built-in functions and Python UDFs (scalar, aggregate, window) travel inside the pickle bytes; only FFI-capsule UDFs require pre-registration on the worker. The worker’s SessionContext for resolving those references is looked up via datafusion.ipc.set_worker_ctx(), falling back to the global SessionContext if none has been installed on the worker.

Warning

Security: pickle.loads() on the returned tuple executes arbitrary Python on the receiver, including any cloudpickled UDF callable embedded in the payload. Only unpickle expressions from trusted sources.

Warning

Portability: Sender and receiver must run the same Python (major, minor) version; cloudpickle bytecode is not portable across minor versions. See to_bytes() for details on what travels by value vs. by reference.

Examples

>>> import pickle
>>> from datafusion import col, lit
>>> e = col("a") * lit(2)
>>> pickle.loads(pickle.dumps(e)).canonical_name()
'a * Int64(2)'

The encoding side honors a driver-side sender context installed via datafusion.ipc.set_sender_ctx() — that is how SessionContext.with_python_udf_inlining() propagates through pickle.dumps. The sender context is read by __reduce__, so copy.copy() and copy.deepcopy() — which also go through __reduce__ — pick it up too.

__repr__() str#

Generate a string representation of this expression.

__richcmp__(other: Expr, op: int) Expr#

Comparison operator.

__sub__(rhs: Any) Expr#

Subtraction operator.

Accepts either an expression or any valid PyArrow scalar literal value.

__truediv__(rhs: Any) Expr#

Division operator.

Accepts either an expression or any valid PyArrow scalar literal value.

classmethod _reconstruct(proto_bytes: bytes) Expr#

Internal entry point used by __reduce__() on unpickle.

Examples

>>> from datafusion import Expr, col, lit
>>> blob = (col("a") + lit(1)).to_bytes()
>>> Expr._reconstruct(blob).canonical_name()
'a + Int64(1)'

abs() Expr#

Return the absolute value of a given number.

Returns:

Expr

A new expression representing the absolute value of the input expression.

acos() Expr#

Returns the arc cosine or inverse cosine of a number.

Returns:

Expr

A new expression representing the arc cosine of the input expression.

acosh() Expr#

Returns inverse hyperbolic cosine.

alias(name: str, metadata: dict[str, str] | None = None) Expr#

Assign a name to the expression.

Parameters:
  • name – The name to assign to the expression.

  • metadata – Optional metadata to attach to the expression.

Returns:

A new expression with the assigned name.

array_dims() Expr#

Returns an array of the array’s dimensions.

array_distinct() Expr#

Returns distinct values from the array after removing duplicates.

array_empty() Expr#

Returns a boolean indicating whether the array is empty.

array_length() Expr#

Returns the length of the array.

array_ndims() Expr#

Returns the number of dimensions of the array.

array_pop_back() Expr#

Returns the array without the last element.

array_pop_front() Expr#

Returns the array without the first element.

arrow_typeof() Expr#

Returns the Arrow type of the expression.

ascii() Expr#

Returns the numeric code of the first character of the argument.

asin() Expr#

Returns the arc sine or inverse sine of a number.

asinh() Expr#

Returns inverse hyperbolic sine.

atan() Expr#

Returns inverse tangent of a number.

atanh() Expr#

Returns inverse hyperbolic tangent.

between(low: Any, high: Any, negated: bool = False) Expr#

Returns True if this expression is between a given range.

Parameters:
  • low – lower bound of the range (inclusive).

  • high – higher bound of the range (inclusive).

  • negated – If True, invert the test and return True when the expression falls outside the range.

bit_length() Expr#

Returns the number of bits in the string argument.

btrim() Expr#

Removes all characters, spaces by default, from both sides of a string.

canonical_name() str#

Returns a complete string representation of this expression.

cardinality() Expr#

Returns the total number of elements in the array.

cast(to: pyarrow.DataType[Any] | type) Expr#

Cast to a new data type.

cbrt() Expr#

Returns the cube root of a number.

ceil() Expr#

Returns the nearest integer greater than or equal to the argument.

char_length() Expr#

The number of characters in the string.

character_length() Expr#

Returns the number of characters in the argument.

chr() Expr#

Converts the Unicode code point to a UTF8 character.

static column(value: str) Expr#

Creates a new expression representing a column.

column_name(plan: datafusion.plan.LogicalPlan) str#

Compute the output column name based on the provided logical plan.

cos() Expr#

Returns the cosine of the argument.

cosh() Expr#

Returns the hyperbolic cosine of the argument.

cot() Expr#

Returns the cotangent of the argument.

degrees() Expr#

Converts the argument from radians to degrees.

distinct() ExprFuncBuilder#

Only evaluate distinct values for an aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

empty() Expr#

This is an alias for array_empty().

exp() Expr#

Returns the exponential of the argument.

factorial() Expr#

Returns the factorial of the argument.

fill_nan(value: Any | Expr | None = None) Expr#

Fill NaN values with a provided value.

fill_null(value: Any | Expr | None = None) Expr#

Fill NULL values with a provided value.

filter(filter: Expr) ExprFuncBuilder#

Filter an aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

flatten() Expr#

Flattens an array of arrays into a single array.

floor() Expr#

Returns the nearest integer less than or equal to the argument.

classmethod from_bytes(buf: bytes, ctx: datafusion.context.SessionContext | None = None) Expr#

Reconstruct an expression from serialized bytes.

Accepts output of to_bytes() or pickle.dumps(). ctx is the SessionContext used to resolve any function references that travel by name (e.g. FFI UDFs, or Python UDFs sent with inlining disabled via SessionContext.with_python_udf_inlining()). When ctx is None the worker context installed via datafusion.ipc.set_worker_ctx() is consulted; if no worker context is installed, the global SessionContext is used (sufficient for built-ins and Python UDFs, plus any UDFs registered on the global context).

Warning

Security: Decoding may invoke cloudpickle.loads on bytes embedded in the payload, which executes arbitrary Python code. Treat buf as code, not data; only decode bytes you produced yourself or received from a trusted sender.

Warning

Portability: cloudpickle payloads are not portable across Python minor versions. The wire format stamps the sender’s (major, minor); if it does not match the current interpreter, this method raises ValueError naming both versions. Modules the UDF imports must also be importable on the receiver; see to_bytes() for by-value vs. by-reference details.

Examples

>>> from datafusion import Expr, col, lit
>>> blob = (col("a") + lit(1)).to_bytes()
>>> Expr.from_bytes(blob).canonical_name()
'a + Int64(1)'

from_unixtime() Expr#

Converts an integer to an RFC3339 timestamp format string.

initcap() Expr#

Set the initial letter of each word to capital.

Converts the first letter of each word in string to uppercase and the remaining characters to lowercase.

is_not_null() Expr#

Returns True if this expression is not null.

is_null() Expr#

Returns True if this expression is null.

isnan() Expr#

Returns true if a given number is +NaN or -NaN; otherwise returns false.

iszero() Expr#

Returns true if a given number is +0.0 or -0.0; otherwise returns false.

length() Expr#

The number of characters in the string.

list_dims() Expr#

Returns an array of the array’s dimensions.

This is an alias for array_dims().

list_distinct() Expr#

Returns distinct values from the array after removing duplicates.

This is an alias for array_distinct().

list_length() Expr#

Returns the length of the array.

This is an alias for array_length().

list_ndims() Expr#

Returns the number of dimensions of the array.

This is an alias for array_ndims().

static literal(value: Any) Expr#

Creates a new expression representing a scalar value.

value must be a valid PyArrow scalar value or easily castable to one.

static literal_with_metadata(value: Any, metadata: dict[str, str]) Expr#

Creates a new expression representing a scalar value with metadata.

Parameters:
  • value – A valid PyArrow scalar value or easily castable to one.

  • metadata – Metadata to attach to the expression.

ln() Expr#

Returns the natural logarithm (base e) of the argument.

log10() Expr#

Base 10 logarithm of the argument.

log2() Expr#

Base 2 logarithm of the argument.

lower() Expr#

Converts a string to lowercase.

ltrim() Expr#

Removes all characters, spaces by default, from the beginning of a string.

md5() Expr#

Computes an MD5 128-bit checksum for a string expression.

null_treatment(null_treatment: datafusion.common.NullTreatment) ExprFuncBuilder#

Set the treatment for null values for a window or aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

octet_length() Expr#

Returns the number of bytes of a string.

order_by(*exprs: Expr | SortExpr) ExprFuncBuilder#

Set the ordering for a window or aggregate function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

over(window: Window) Expr#

Turn an aggregate function into a window function.

This function turns any aggregate function into a window function. With the exception of partition_by, how each of the parameters is used is determined by the underlying aggregate function.

Parameters:

window – Window definition

partition_by(*partition_by: Expr) ExprFuncBuilder#

Set the partitioning for a window function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

python_value() Any#

Extracts the literal value of this expression as a Python object.

This is only valid for literal expressions.

Returns:

Python object representing literal value of the expression.

radians() Expr#

Converts the argument from degrees to radians.

reverse() Expr#

Reverse the string argument.

rex_call_operands() list[Expr]#

Return the operands of the expression based on its variant type.

Row expressions, Rex(s), operate on the concept of operands. Different variants of Expressions, Expr(s), store those operands in different data structures. This function examines the Expr variant and returns the operands to the calling logic.

rex_call_operator() str#

Extracts the operator associated with a row expression type call.

rex_type() datafusion.common.RexType#

Return the Rex Type of this expression.

A Rex (Row Expression) specifies a single row of data. That specification could include user-defined functions or types. RexType identifies the row as one of the possible valid RexType values.

rtrim() Expr#

Removes all characters, spaces by default, from the end of a string.

schema_name() str#

Returns the name of this expression as it should appear in a schema.

This name will not include any CAST expressions.

sha224() Expr#

Computes the SHA-224 hash of a binary string.

sha256() Expr#

Computes the SHA-256 hash of a binary string.

sha384() Expr#

Computes the SHA-384 hash of a binary string.

sha512() Expr#

Computes the SHA-512 hash of a binary string.

signum() Expr#

Returns the sign of the argument (-1, 0, +1).

sin() Expr#

Returns the sine of the argument.

sinh() Expr#

Returns the hyperbolic sine of the argument.

sort(ascending: bool = True, nulls_first: bool = True) SortExpr#

Creates a sort Expr from an existing Expr.

Parameters:
  • ascending – If true, sort in ascending order.

  • nulls_first – Return null values first.

sqrt() Expr#

Returns the square root of the argument.

static string_literal(value: str) Expr#

Creates a new expression representing a UTF8 literal value.

It differs from literal() in that the resulting type is pa.string() instead of pa.string_view().

This is needed for cases where DataFusion is expecting a UTF8 instead of UTF8View literal, like in: apache/datafusion

tan() Expr#

Returns the tangent of the argument.

tanh() Expr#

Returns the hyperbolic tangent of the argument.

to_bytes(ctx: datafusion.context.SessionContext | None = None) bytes#

Serialize this expression to bytes for shipping to another process.

Use this — or pickle.dumps() — to send an expression to a worker process for distributed evaluation.

When ctx is supplied, encoding routes through that session’s installed LogicalExtensionCodec (so settings like SessionContext.with_python_udf_inlining() take effect). When ctx is None, the default codec is used (Python UDF inlining on, no user-installed extension codec).

Built-in functions travel inside the returned bytes. Python UDFs (scalar, aggregate, window) also inline by default, so the worker does not need to pre-register them; when the encoding session has SessionContext.with_python_udf_inlining() set to False, Python UDFs travel by name only and must be registered on the worker. UDFs imported via the FFI capsule protocol always travel by name only and must be registered on the worker.

Warning

Security: Bytes returned here may embed a cloudpickled Python callable (when the expression carries a Python UDF). Reconstructing them via from_bytes() or pickle.loads() executes arbitrary Python on the receiver. Only accept payloads from trusted sources.

Warning

Portability: cloudpickle serializes Python bytecode, which is not stable across Python minor versions. A payload produced on Python 3.11 will fail to load on Python 3.12. The wire format stamps the sender’s (major, minor); from_bytes() raises a ValueError naming both versions on mismatch.

cloudpickle captures the UDF callable by value — bytecode and closure cells inlined — but names the callable resolves via import are captured by reference (module path only) and must be importable on the receiver.

Self-contained — works anywhere:

# Lambda: bytecode captured inline
udf(lambda x: x * 2, [pa.int64()], pa.int64(),
    volatility="immutable")

# Locally-defined function: bytecode captured inline
def double(x):
    return x * 2
udf(double, [pa.int64()], pa.int64(), volatility="immutable")

# Closure over a local variable: value captured inline
factor = 3
udf(lambda x: x * factor, [pa.int64()], pa.int64(),
    volatility="immutable")

Requires matching environment on receiver:

# Top-level import: `foo` must be installed on receiver
from foo import double
udf(double, [pa.int64()], pa.int64(), volatility="immutable")

# Bound method of an imported class: same caveat
from mylib import Transformer
t = Transformer()
udf(t.transform, [pa.int64()], pa.int64(),
    volatility="immutable")

Examples

>>> from datafusion import col, lit
>>> blob = (col("a") + lit(1)).to_bytes()
>>> isinstance(blob, bytes)
True

to_hex() Expr#

Converts an integer to a hexadecimal string.

to_variant() Any#

Convert this expression into a python object if possible.

trim() Expr#

Removes all characters, spaces by default, from both sides of a string.

try_cast(to: pyarrow.DataType[Any] | type) Expr#

Cast to a new data type, returning NULL on failure.

Like cast() but produces NULL instead of erroring when the cast cannot be performed for a given row.

Examples

>>> ctx = dfn.SessionContext()
>>> df = ctx.from_pydict({"a": ["oops"]})
>>> result = df.select(col("a").try_cast(pa.float64()).alias("c"))
>>> result.collect_column("c")[0].as_py() is None
True

types() datafusion.common.DataTypeMap#

Return the DataTypeMap.

Returns:

DataTypeMap which represents the PythonType, Arrow DataType, and SqlType Enum which this expression represents.

upper() Expr#

Converts a string to uppercase.

variant_name() str#

Returns the name of the Expr variant.

Ex: IsNotNull, Literal, BinaryExpr, etc

window_frame(window_frame: WindowFrame) ExprFuncBuilder#

Set the frame for a window function.

This function will create an ExprFuncBuilder that can be used to set parameters for either window or aggregate functions. If used on any other type of expression, an error will be generated when build() is called.

__radd__#
__rand__#
__rmod__#
__rmul__#
__ror__#
__rsub__#
__rtruediv__#
_to_pyarrow_types: ClassVar[dict[type, pyarrow.DataType]]#
expr#

class datafusion.expr.GroupingSet#

Factory for creating grouping set expressions.

Grouping sets control how aggregate() groups rows. Instead of a single GROUP BY, they produce multiple grouping levels in one pass — subtotals, cross-tabulations, or arbitrary column subsets.

Use grouping() in the aggregate list to tell which columns are aggregated across in each result row.

static cube(*exprs: Expr | str) Expr#

Create a CUBE grouping set for use with aggregate().

CUBE generates all possible subsets of the given column list as grouping sets. For example, cube(a, b) produces grouping sets (a, b), (a), (b), and () (grand total).

This is equivalent to GROUP BY CUBE(a, b) in SQL.

Parameters:

*exprs – Column expressions or column name strings to include in the cube.

Examples

With a single column, cube behaves identically to rollup():

>>> from datafusion.expr import GroupingSet
>>> ctx = dfn.SessionContext()
>>> df = ctx.from_pydict({"a": [1, 1, 2], "b": [10, 20, 30]})
>>> result = df.aggregate(
...     [GroupingSet.cube(dfn.col("a"))],
...     [dfn.functions.sum(dfn.col("b")).alias("s"),
...      dfn.functions.grouping(dfn.col("a"))],
... ).sort(dfn.col("a").sort(nulls_first=False))
>>> result.collect_column("s").to_pylist()
[30, 30, 60]

static grouping_sets(*expr_lists: list[Expr | str]) Expr#

Create explicit grouping sets for use with aggregate().

Each argument is a list of column expressions or column name strings representing one grouping set. For example, grouping_sets([a], [b]) groups by a alone and by b alone in a single query.

This is equivalent to GROUP BY GROUPING SETS ((a), (b)) in SQL.

Parameters:

*expr_lists – Each positional argument is a list of expressions or column name strings forming one grouping set.

Examples

>>> from datafusion.expr import GroupingSet
>>> ctx = dfn.SessionContext()
>>> df = ctx.from_pydict(
...     {"a": ["x", "x", "y"], "b": ["m", "n", "m"],
...      "c": [1, 2, 3]})
>>> result = df.aggregate(
...     [GroupingSet.grouping_sets(
...         [dfn.col("a")], [dfn.col("b")])],
...     [dfn.functions.sum(dfn.col("c")).alias("s"),
...      dfn.functions.grouping(dfn.col("a")),
...      dfn.functions.grouping(dfn.col("b"))],
... ).sort(
...     dfn.col("a").sort(nulls_first=False),
...     dfn.col("b").sort(nulls_first=False),
... )
>>> result.collect_column("s").to_pylist()
[3, 3, 4, 2]

static rollup(*exprs: Expr | str) Expr#

Create a ROLLUP grouping set for use with aggregate().

ROLLUP generates all prefixes of the given column list as grouping sets. For example, rollup(a, b) produces grouping sets (a, b), (a), and () (grand total).

This is equivalent to GROUP BY ROLLUP(a, b) in SQL.

Parameters:

*exprs – Column expressions or column name strings to include in the rollup.

Examples

>>> from datafusion.expr import GroupingSet
>>> ctx = dfn.SessionContext()
>>> df = ctx.from_pydict({"a": [1, 1, 2], "b": [10, 20, 30]})
>>> result = df.aggregate(
...     [GroupingSet.rollup(dfn.col("a"))],
...     [dfn.functions.sum(dfn.col("b")).alias("s"),
...      dfn.functions.grouping(dfn.col("a"))],
... ).sort(dfn.col("a").sort(nulls_first=False))
>>> result.collect_column("s").to_pylist()
[30, 30, 60]

class datafusion.expr.SortExpr(expr: Expr, ascending: bool, nulls_first: bool)#

Used to specify sorting on either a DataFrame or function.

This constructor should not be called by the end user.

__repr__() str#

Generate a string representation of this expression.

ascending() bool#

Return ascending property.

expr() Expr#

Return the raw expr backing the SortExpr.

nulls_first() bool#

Return nulls_first property.

raw_sort#

class datafusion.expr.Window(partition_by: list[Expr] | Expr | None = None, window_frame: WindowFrame | None = None, order_by: list[SortExpr | Expr | str] | Expr | SortExpr | str | None = None, null_treatment: datafusion.common.NullTreatment | None = None)#

Define reusable window parameters.

Construct a window definition.

Parameters:
  • partition_by – Partitions for window operation

  • window_frame – Define the start and end bounds of the window frame

  • order_by – Set ordering

  • null_treatment – Indicate how nulls are to be treated

_null_treatment = None#
_order_by = None#
_partition_by = None#
_window_frame = None#

class datafusion.expr.WindowFrame(units: str, start_bound: Any | None, end_bound: Any | None)#

Defines a window frame for performing window operations.

Construct a window frame using the given parameters.

Parameters:
  • units – Should be one of rows, range, or groups.

  • start_bound – Sets the preceding bound. Must be >= 0. If None, the bound is unbounded. If the unit type is groups, this parameter must be set.

  • end_bound – Sets the following bound. Must be >= 0. If None, the bound is unbounded. If the unit type is groups, this parameter must be set.

__repr__() str#

Print a string representation of the window frame.

get_frame_units() str#

Returns the window frame units for the bounds.

get_lower_bound() WindowFrameBound#

Returns starting bound.

get_upper_bound() WindowFrameBound#

Returns end bound.

window_frame#

class datafusion.expr.WindowFrameBound(frame_bound: datafusion._internal.expr.WindowFrameBound)#

Defines a single window frame bound.

WindowFrame typically requires a start and end bound.

Constructs a window frame bound.

get_offset() int | None#

Returns the offset of the window frame.

is_current_row() bool#

Returns whether the frame bound is the current row.

is_following() bool#

Returns whether the frame bound is following.

is_preceding() bool#

Returns whether the frame bound is preceding.

is_unbounded() bool#

Returns whether the frame bound is unbounded.

frame_bound#

datafusion.expr.coerce_to_expr(value: Any) Expr#

Coerce a native Python value to an Expr literal, passing Expr through.

This is the complement of ensure_expr(): where ensure_expr rejects non-Expr values, coerce_to_expr wraps them via Expr.literal() so that functions can accept native Python types (int, float, str, bool, etc.) alongside Expr.

Parameters:

value – An Expr instance (returned as-is) or a Python literal to wrap.

Returns:

An Expr representing the value.

datafusion.expr.coerce_to_expr_list(values: collections.abc.Iterable[Any]) list[Expr]#

Coerce each item in an iterable to Expr via coerce_to_expr().

Parameters:

values – Iterable of Expr instances or Python literals to wrap.

Returns:

A list of Expr instances.

datafusion.expr.coerce_to_expr_or_none(value: Any | None) Expr | None#

Coerce a value to Expr or pass None through unchanged.

Same as coerce_to_expr() but accepts None for optional parameters.

Parameters:

value – An Expr instance, a Python literal to wrap, or None.

Returns:

An Expr representing the value, or None.

datafusion.expr.ensure_expr(value: Expr | Any) datafusion._internal.expr.Expr#

Return the internal expression from Expr or raise TypeError.

This helper rejects plain strings and other non-Expr values so higher level APIs consistently require explicit col() or lit() expressions.

See also

coerce_to_expr() — the opposite behavior: wraps non-Expr values as literals instead of rejecting them.

Parameters:

value – Candidate expression or other object.

Returns:

The internal expression representation.

Raises:

TypeError – If value is not an instance of Expr.

datafusion.expr.ensure_expr_list(exprs: collections.abc.Iterable[Expr | collections.abc.Iterable[Expr]]) list[datafusion._internal.expr.Expr]#

Flatten an iterable of expressions, validating each via ensure_expr.

Parameters:

exprs – Possibly nested iterable containing expressions.

Returns:

A flat list of raw expressions.

Raises:

TypeError – If any item is not an instance of Expr.

datafusion.expr.Aggregate#
datafusion.expr.AggregateFunction#
datafusion.expr.Alias#
datafusion.expr.Analyze#
datafusion.expr.Between#
datafusion.expr.BinaryExpr#
datafusion.expr.Case#
datafusion.expr.Cast#
datafusion.expr.Column#
datafusion.expr.CopyTo#
datafusion.expr.CreateCatalog#
datafusion.expr.CreateCatalogSchema#
datafusion.expr.CreateExternalTable#
datafusion.expr.CreateFunction#
datafusion.expr.CreateFunctionBody#
datafusion.expr.CreateIndex#
datafusion.expr.CreateMemoryTable#
datafusion.expr.CreateView#
datafusion.expr.Deallocate#
datafusion.expr.DescribeTable#
datafusion.expr.Distinct#
datafusion.expr.DmlStatement#
datafusion.expr.DropCatalogSchema#
datafusion.expr.DropFunction#
datafusion.expr.DropTable#
datafusion.expr.DropView#
datafusion.expr.EXPR_TYPE_ERROR = 'Use col()/column() or lit()/literal() to construct expressions'#
datafusion.expr.EmptyRelation#
datafusion.expr.Execute#
datafusion.expr.Exists#
datafusion.expr.Explain#
datafusion.expr.Extension#
datafusion.expr.FileType#
datafusion.expr.Filter#
datafusion.expr.HigherOrderFunction#
datafusion.expr.ILike#
datafusion.expr.InList#
datafusion.expr.InSubquery#
datafusion.expr.IsFalse#
datafusion.expr.IsNotFalse#
datafusion.expr.IsNotNull#
datafusion.expr.IsNotTrue#
datafusion.expr.IsNotUnknown#
datafusion.expr.IsNull#
datafusion.expr.IsTrue#
datafusion.expr.IsUnknown#
datafusion.expr.Join#
datafusion.expr.JoinConstraint#
datafusion.expr.JoinType#
datafusion.expr.Lambda#
datafusion.expr.LambdaVariable#
datafusion.expr.Like#
datafusion.expr.Limit#
datafusion.expr.Literal#
datafusion.expr.Negative#
datafusion.expr.Not#
datafusion.expr.OperateFunctionArg#
datafusion.expr.Partitioning#
datafusion.expr.Placeholder#
datafusion.expr.Prepare#
datafusion.expr.Projection#
datafusion.expr.RecursiveQuery#
datafusion.expr.Repartition#
datafusion.expr.ScalarSubquery#
datafusion.expr.ScalarVariable#
datafusion.expr.SetVariable#
datafusion.expr.SimilarTo#
datafusion.expr.Sort#
datafusion.expr.SortKey#
datafusion.expr.Subquery#
datafusion.expr.SubqueryAlias#
datafusion.expr.TableScan#
datafusion.expr.TransactionAccessMode#
datafusion.expr.TransactionConclusion#
datafusion.expr.TransactionEnd#
datafusion.expr.TransactionIsolationLevel#
datafusion.expr.TransactionStart#
datafusion.expr.TryCast#
datafusion.expr.Union#
datafusion.expr.Unnest#
datafusion.expr.UnnestExpr#
datafusion.expr.Values#
datafusion.expr.WindowExpr#