datafusion.functions

User functions for operating on Expr.

Functions

`abs`(→ datafusion.expr.Expr)	Return the absolute value of a given number.
`acos`(→ datafusion.expr.Expr)	Returns the arc cosine or inverse cosine of a number.
`acosh`(→ datafusion.expr.Expr)	Returns inverse hyperbolic cosine.
`alias`(→ datafusion.expr.Expr)	Creates an alias expression with an optional metadata dictionary.
`approx_distinct`(→ datafusion.expr.Expr)	Returns the approximate number of distinct values.
`approx_median`(→ datafusion.expr.Expr)	Returns the approximate median value.
`approx_percentile_cont`(→ datafusion.expr.Expr)	Returns the value that is approximately at a given percentile of `expr`.
`approx_percentile_cont_with_weight`(→ datafusion.expr.Expr)	Returns the value of the weighted approximate percentile.
`array`(→ datafusion.expr.Expr)	Returns an array using the specified input expressions.
`array_agg`(→ datafusion.expr.Expr)	Aggregate values into an array.
`array_append`(→ datafusion.expr.Expr)	Appends an element to the end of an array.
`array_cat`(→ datafusion.expr.Expr)	Concatenates the input arrays.
`array_concat`(→ datafusion.expr.Expr)	Concatenates the input arrays.
`array_dims`(→ datafusion.expr.Expr)	Returns an array of the array's dimensions.
`array_distinct`(→ datafusion.expr.Expr)	Returns distinct values from the array after removing duplicates.
`array_element`(→ datafusion.expr.Expr)	Extracts the element with the index n from the array.
`array_empty`(→ datafusion.expr.Expr)	Returns a boolean indicating whether the array is empty.
`array_except`(→ datafusion.expr.Expr)	Returns the elements that appear in `array1` but not in `array2`.
`array_extract`(→ datafusion.expr.Expr)	Extracts the element with the index n from the array.
`array_has`(→ datafusion.expr.Expr)	Returns true if the element appears in the first array, otherwise false.
`array_has_all`(→ datafusion.expr.Expr)	Determines whether every element of `second_array` appears in `first_array`.
`array_has_any`(→ datafusion.expr.Expr)	Determine if there is an overlap between `first_array` and `second_array`.
`array_indexof`(→ datafusion.expr.Expr)	Return the position of the first occurrence of `element` in `array`.
`array_intersect`(→ datafusion.expr.Expr)	Returns the intersection of `array1` and `array2`.
`array_join`(→ datafusion.expr.Expr)	Converts each element to its text representation and joins them with `delimiter`.
`array_length`(→ datafusion.expr.Expr)	Returns the length of the array.
`array_ndims`(→ datafusion.expr.Expr)	Returns the number of dimensions of the array.
`array_pop_back`(→ datafusion.expr.Expr)	Returns the array without the last element.
`array_pop_front`(→ datafusion.expr.Expr)	Returns the array without the first element.
`array_position`(→ datafusion.expr.Expr)	Return the position of the first occurrence of `element` in `array`.
`array_positions`(→ datafusion.expr.Expr)	Searches for an element in the array and returns the positions of all occurrences.
`array_prepend`(→ datafusion.expr.Expr)	Prepends an element to the beginning of an array.
`array_push_back`(→ datafusion.expr.Expr)	Appends an element to the end of an array.
`array_push_front`(→ datafusion.expr.Expr)	Prepends an element to the beginning of an array.
`array_remove`(→ datafusion.expr.Expr)	Removes the first element from the array equal to the given value.
`array_remove_all`(→ datafusion.expr.Expr)	Removes all elements from the array equal to the given value.
`array_remove_n`(→ datafusion.expr.Expr)	Removes the first `max` elements from the array equal to the given value.
`array_repeat`(→ datafusion.expr.Expr)	Returns an array containing `element` `count` times.
`array_replace`(→ datafusion.expr.Expr)	Replaces the first occurrence of `from_val` with `to_val`.
`array_replace_all`(→ datafusion.expr.Expr)	Replaces all occurrences of `from_val` with `to_val`.
`array_replace_n`(→ datafusion.expr.Expr)	Replaces the first `max` occurrences of `from_val` with `to_val`.
`array_resize`(→ datafusion.expr.Expr)	Returns the array resized to `size`, filling any new entries with `value`.
`array_slice`(→ datafusion.expr.Expr)	Returns a slice of the array.
`array_sort`(→ datafusion.expr.Expr)	Sort an array.
`array_to_string`(→ datafusion.expr.Expr)	Converts each element to its text representation and joins them with `delimiter`.
`array_union`(→ datafusion.expr.Expr)	Returns an array of the elements in the union of `array1` and `array2`.
`arrow_cast`(→ datafusion.expr.Expr)	Casts an expression to a specified data type.
`arrow_typeof`(→ datafusion.expr.Expr)	Returns the Arrow type of the expression.
`ascii`(→ datafusion.expr.Expr)	Returns the numeric code of the first character of the argument.
`asin`(→ datafusion.expr.Expr)	Returns the arc sine or inverse sine of a number.
`asinh`(→ datafusion.expr.Expr)	Returns inverse hyperbolic sine.
`atan`(→ datafusion.expr.Expr)	Returns inverse tangent of a number.
`atan2`(→ datafusion.expr.Expr)	Returns the inverse tangent of `y / x`, using the signs of both arguments to pick the quadrant.
`atanh`(→ datafusion.expr.Expr)	Returns inverse hyperbolic tangent.
`avg`(→ datafusion.expr.Expr)	Returns the average value.
`bit_and`(→ datafusion.expr.Expr)	Computes the bitwise AND of the argument.
`bit_length`(→ datafusion.expr.Expr)	Returns the number of bits in the string argument.
`bit_or`(→ datafusion.expr.Expr)	Computes the bitwise OR of the argument.
`bit_xor`(→ datafusion.expr.Expr)	Computes the bitwise XOR of the argument.
`bool_and`(→ datafusion.expr.Expr)	Computes the boolean AND of the argument.
`bool_or`(→ datafusion.expr.Expr)	Computes the boolean OR of the argument.
`btrim`(→ datafusion.expr.Expr)	Removes all characters, spaces by default, from both sides of a string.
`cardinality`(→ datafusion.expr.Expr)	Returns the total number of elements in the array.
`case`(→ datafusion.expr.CaseBuilder)	Create a case expression.
`cbrt`(→ datafusion.expr.Expr)	Returns the cube root of a number.
`ceil`(→ datafusion.expr.Expr)	Returns the nearest integer greater than or equal to argument.
`char_length`(→ datafusion.expr.Expr)	The number of characters in the `string`.
`character_length`(→ datafusion.expr.Expr)	Returns the number of characters in the argument.
`chr`(→ datafusion.expr.Expr)	Converts the Unicode code point to a UTF8 character.
`coalesce`(→ datafusion.expr.Expr)	Returns the value of the first expr in `args` which is not NULL.
`col`(→ datafusion.expr.Expr)	Creates a column reference expression.
`concat`(→ datafusion.expr.Expr)	Concatenates the text representations of all the arguments.
`concat_ws`(→ datafusion.expr.Expr)	Concatenates the list `args` with the separator.
`corr`(→ datafusion.expr.Expr)	Returns the correlation coefficient between `value1` and `value2`.
`cos`(→ datafusion.expr.Expr)	Returns the cosine of the argument.
`cosh`(→ datafusion.expr.Expr)	Returns the hyperbolic cosine of the argument.
`cot`(→ datafusion.expr.Expr)	Returns the cotangent of the argument.
`count`(→ datafusion.expr.Expr)	Returns the number of rows that match the given arguments.
`count_star`(→ datafusion.expr.Expr)	Create a COUNT(1) aggregate expression.
`covar`(→ datafusion.expr.Expr)	Computes the sample covariance.
`covar_pop`(→ datafusion.expr.Expr)	Computes the population covariance.
`covar_samp`(→ datafusion.expr.Expr)	Computes the sample covariance.
`cume_dist`(→ datafusion.expr.Expr)	Create a cumulative distribution window function.
`current_date`(→ datafusion.expr.Expr)	Returns current UTC date as a Date32 value.
`current_time`(→ datafusion.expr.Expr)	Returns current UTC time as a Time64 value.
`date_bin`(→ datafusion.expr.Expr)	Coerces an arbitrary timestamp to the start of the nearest specified interval.
`date_part`(→ datafusion.expr.Expr)	Extracts a subfield from the date.
`date_trunc`(→ datafusion.expr.Expr)	Truncates the date to a specified level of precision.
`datepart`(→ datafusion.expr.Expr)	Return a specified part of a date.
`datetrunc`(→ datafusion.expr.Expr)	Truncates the date to a specified level of precision.
`decode`(→ datafusion.expr.Expr)	Decodes `input` using `encoding`, which can be base64 or hex.
`degrees`(→ datafusion.expr.Expr)	Converts the argument from radians to degrees.
`dense_rank`(→ datafusion.expr.Expr)	Create a dense_rank window function.
`digest`(→ datafusion.expr.Expr)	Computes the binary hash of an expression using the specified algorithm.
`empty`(→ datafusion.expr.Expr)	This is an alias for `array_empty()`.
`encode`(→ datafusion.expr.Expr)	Encodes `input` using `encoding`, which can be base64 or hex.
`ends_with`(→ datafusion.expr.Expr)	Returns true if the `string` ends with the `suffix`, false otherwise.
`exp`(→ datafusion.expr.Expr)	Returns the exponential of the argument.
`extract`(→ datafusion.expr.Expr)	Extracts a subfield from the date.
`factorial`(→ datafusion.expr.Expr)	Returns the factorial of the argument.
`find_in_set`(→ datafusion.expr.Expr)	Find a string in a list of strings.
`first_value`(→ datafusion.expr.Expr)	Returns the first value in a group of values.
`flatten`(→ datafusion.expr.Expr)	Flattens an array of arrays into a single array.
`floor`(→ datafusion.expr.Expr)	Returns the nearest integer less than or equal to the argument.
`from_unixtime`(→ datafusion.expr.Expr)	Converts an integer to an RFC 3339 timestamp format string.
`gcd`(→ datafusion.expr.Expr)	Returns the greatest common divisor.
`in_list`(→ datafusion.expr.Expr)	Returns whether the argument is contained within the list `values`.
`initcap`(→ datafusion.expr.Expr)	Capitalizes the first letter of each word.
`isnan`(→ datafusion.expr.Expr)	Returns true if a given number is +NaN or -NaN; otherwise returns false.
`iszero`(→ datafusion.expr.Expr)	Returns true if a given number is +0.0 or -0.0; otherwise returns false.
`lag`(→ datafusion.expr.Expr)	Create a lag window function.
`last_value`(→ datafusion.expr.Expr)	Returns the last value in a group of values.
`lcm`(→ datafusion.expr.Expr)	Returns the least common multiple.
`lead`(→ datafusion.expr.Expr)	Create a lead window function.
`left`(→ datafusion.expr.Expr)	Returns the first `n` characters in the `string`.
`length`(→ datafusion.expr.Expr)	The number of characters in the `string`.
`levenshtein`(→ datafusion.expr.Expr)	Returns the Levenshtein distance between the two given strings.
`list_append`(→ datafusion.expr.Expr)	Appends an element to the end of an array.
`list_cat`(→ datafusion.expr.Expr)	Concatenates the input arrays.
`list_concat`(→ datafusion.expr.Expr)	Concatenates the input arrays.
`list_dims`(→ datafusion.expr.Expr)	Returns an array of the array's dimensions.
`list_distinct`(→ datafusion.expr.Expr)	Returns distinct values from the array after removing duplicates.
`list_element`(→ datafusion.expr.Expr)	Extracts the element with the index n from the array.
`list_except`(→ datafusion.expr.Expr)	Returns the elements that appear in `array1` but not in `array2`.
`list_extract`(→ datafusion.expr.Expr)	Extracts the element with the index n from the array.
`list_indexof`(→ datafusion.expr.Expr)	Return the position of the first occurrence of `element` in `array`.
`list_intersect`(→ datafusion.expr.Expr)	Returns the intersection of `array1` and `array2`.
`list_join`(→ datafusion.expr.Expr)	Converts each element to its text representation and joins them with `delimiter`.
`list_length`(→ datafusion.expr.Expr)	Returns the length of the array.
`list_ndims`(→ datafusion.expr.Expr)	Returns the number of dimensions of the array.
`list_position`(→ datafusion.expr.Expr)	Return the position of the first occurrence of `element` in `array`.
`list_positions`(→ datafusion.expr.Expr)	Searches for an element in the array and returns the positions of all occurrences.
`list_prepend`(→ datafusion.expr.Expr)	Prepends an element to the beginning of an array.
`list_push_back`(→ datafusion.expr.Expr)	Appends an element to the end of an array.
`list_push_front`(→ datafusion.expr.Expr)	Prepends an element to the beginning of an array.
`list_remove`(→ datafusion.expr.Expr)	Removes the first element from the array equal to the given value.
`list_remove_all`(→ datafusion.expr.Expr)	Removes all elements from the array equal to the given value.
`list_remove_n`(→ datafusion.expr.Expr)	Removes the first `max` elements from the array equal to the given value.
`list_repeat`(→ datafusion.expr.Expr)	Returns an array containing `element` `count` times.
`list_replace`(→ datafusion.expr.Expr)	Replaces the first occurrence of `from_val` with `to_val`.
`list_replace_all`(→ datafusion.expr.Expr)	Replaces all occurrences of `from_val` with `to_val`.
`list_replace_n`(→ datafusion.expr.Expr)	Replaces the first `max` occurrences of `from_val` with `to_val`.
`list_resize`(→ datafusion.expr.Expr)	Returns the array resized to `size`, filling any new entries with `value`.
`list_slice`(→ datafusion.expr.Expr)	Returns a slice of the array.
`list_sort`(→ datafusion.expr.Expr)	This is an alias for `array_sort()`.
`list_to_string`(→ datafusion.expr.Expr)	Converts each element to its text representation and joins them with `delimiter`.
`list_union`(→ datafusion.expr.Expr)	Returns an array of the elements in the union of `array1` and `array2`.
`ln`(→ datafusion.expr.Expr)	Returns the natural logarithm (base e) of the argument.
`log`(→ datafusion.expr.Expr)	Returns the logarithm of a number for a particular `base`.
`log10`(→ datafusion.expr.Expr)	Base 10 logarithm of the argument.
`log2`(→ datafusion.expr.Expr)	Base 2 logarithm of the argument.
`lower`(→ datafusion.expr.Expr)	Converts a string to lowercase.
`lpad`(→ datafusion.expr.Expr)	Add left padding to a string.
`ltrim`(→ datafusion.expr.Expr)	Removes all characters, spaces by default, from the beginning of a string.
`make_array`(→ datafusion.expr.Expr)	Returns an array using the specified input expressions.
`make_date`(→ datafusion.expr.Expr)	Make a date from year, month and day component parts.
`make_list`(→ datafusion.expr.Expr)	Returns an array using the specified input expressions.
`max`(→ datafusion.expr.Expr)	Aggregate function that returns the maximum value of the argument.
`md5`(→ datafusion.expr.Expr)	Computes an MD5 128-bit checksum for a string expression.
`mean`(→ datafusion.expr.Expr)	Returns the average (mean) value of the argument.
`median`(→ datafusion.expr.Expr)	Computes the median of a set of numbers.
`min`(→ datafusion.expr.Expr)	Returns the minimum value of the argument.
`named_struct`(→ datafusion.expr.Expr)	Returns a struct with the given name and argument pairs.
`nanvl`(→ datafusion.expr.Expr)	Returns `x` if `x` is not `NaN`. Otherwise returns `y`.
`now`(→ datafusion.expr.Expr)	Returns the current timestamp in nanoseconds.
`nth_value`(→ datafusion.expr.Expr)	Returns the n-th value in a group of values.
`ntile`(→ datafusion.expr.Expr)	Create an n-tile window function.
`nullif`(→ datafusion.expr.Expr)	Returns NULL if expr1 equals expr2; otherwise it returns expr1.
`nvl`(→ datafusion.expr.Expr)	Returns `x` if `x` is not `NULL`. Otherwise returns `y`.
`octet_length`(→ datafusion.expr.Expr)	Returns the number of bytes of a string.
`order_by`(→ datafusion.expr.SortExpr)	Creates a new sort expression.
`overlay`(→ datafusion.expr.Expr)	Replace a substring with a new substring.
`percent_rank`(→ datafusion.expr.Expr)	Create a percent_rank window function.
`pi`(→ datafusion.expr.Expr)	Returns an approximate value of π.
`pow`(→ datafusion.expr.Expr)	Returns `base` raised to the power of `exponent`.
`power`(→ datafusion.expr.Expr)	Returns `base` raised to the power of `exponent`.
`radians`(→ datafusion.expr.Expr)	Converts the argument from degrees to radians.
`random`(→ datafusion.expr.Expr)	Returns a random value in the range `0.0 <= x < 1.0`.
`range`(→ datafusion.expr.Expr)	Create a list of values in the range between start and stop.
`rank`(→ datafusion.expr.Expr)	Create a rank window function.
`regexp_count`(→ datafusion.expr.Expr)	Returns the number of matches in a string.
`regexp_like`(→ datafusion.expr.Expr)	Find if any regular expression (regex) matches exist.
`regexp_match`(→ datafusion.expr.Expr)	Perform regular expression (regex) matching.
`regexp_replace`(→ datafusion.expr.Expr)	Replaces substring(s) matching a PCRE-like regular expression.
`regr_avgx`(→ datafusion.expr.Expr)	Computes the average of the independent variable `x`.
`regr_avgy`(→ datafusion.expr.Expr)	Computes the average of the dependent variable `y`.
`regr_count`(→ datafusion.expr.Expr)	Counts the number of rows in which both expressions are not null.
`regr_intercept`(→ datafusion.expr.Expr)	Computes the intercept from the linear regression.
`regr_r2`(→ datafusion.expr.Expr)	Computes the R-squared value from linear regression.
`regr_slope`(→ datafusion.expr.Expr)	Computes the slope from linear regression.
`regr_sxx`(→ datafusion.expr.Expr)	Computes the sum of squares of the independent variable `x`.
`regr_sxy`(→ datafusion.expr.Expr)	Computes the sum of products of pairs of numbers.
`regr_syy`(→ datafusion.expr.Expr)	Computes the sum of squares of the dependent variable `y`.
`repeat`(→ datafusion.expr.Expr)	Repeats the `string` `n` times.
`replace`(→ datafusion.expr.Expr)	Replaces all occurrences of `from_val` with `to_val` in the `string`.
`reverse`(→ datafusion.expr.Expr)	Reverse the string argument.
`right`(→ datafusion.expr.Expr)	Returns the last `n` characters in the `string`.
`round`(→ datafusion.expr.Expr)	Round the argument to the nearest integer.
`row_number`(→ datafusion.expr.Expr)	Create a row number window function.
`rpad`(→ datafusion.expr.Expr)	Add right padding to a string.
`rtrim`(→ datafusion.expr.Expr)	Removes all characters, spaces by default, from the end of a string.
`sha224`(→ datafusion.expr.Expr)	Computes the SHA-224 hash of a binary string.
`sha256`(→ datafusion.expr.Expr)	Computes the SHA-256 hash of a binary string.
`sha384`(→ datafusion.expr.Expr)	Computes the SHA-384 hash of a binary string.
`sha512`(→ datafusion.expr.Expr)	Computes the SHA-512 hash of a binary string.
`signum`(→ datafusion.expr.Expr)	Returns the sign of the argument (-1, 0, +1).
`sin`(→ datafusion.expr.Expr)	Returns the sine of the argument.
`sinh`(→ datafusion.expr.Expr)	Returns the hyperbolic sine of the argument.
`split_part`(→ datafusion.expr.Expr)	Split a string and return one part.
`sqrt`(→ datafusion.expr.Expr)	Returns the square root of the argument.
`starts_with`(→ datafusion.expr.Expr)	Returns true if string starts with prefix.
`stddev`(→ datafusion.expr.Expr)	Computes the standard deviation of the argument.
`stddev_pop`(→ datafusion.expr.Expr)	Computes the population standard deviation of the argument.
`stddev_samp`(→ datafusion.expr.Expr)	Computes the sample standard deviation of the argument.
`string_agg`(→ datafusion.expr.Expr)	Concatenates the input strings.
`strpos`(→ datafusion.expr.Expr)	Finds the position at which `substring` first occurs in `string`.
`struct`(→ datafusion.expr.Expr)	Returns a struct with the given arguments.
`substr`(→ datafusion.expr.Expr)	Substring from the `position` to the end.
`substr_index`(→ datafusion.expr.Expr)	Returns an indexed substring.
`substring`(→ datafusion.expr.Expr)	Substring from the `position` with `length` characters.
`sum`(→ datafusion.expr.Expr)	Computes the sum of a set of numbers.
`tan`(→ datafusion.expr.Expr)	Returns the tangent of the argument.
`tanh`(→ datafusion.expr.Expr)	Returns the hyperbolic tangent of the argument.
`to_hex`(→ datafusion.expr.Expr)	Converts an integer to a hexadecimal string.
`to_timestamp`(→ datafusion.expr.Expr)	Converts a string and optional formats to a `Timestamp` in nanoseconds.
`to_timestamp_micros`(→ datafusion.expr.Expr)	Converts a string and optional formats to a `Timestamp` in microseconds.
`to_timestamp_millis`(→ datafusion.expr.Expr)	Converts a string and optional formats to a `Timestamp` in milliseconds.
`to_timestamp_nanos`(→ datafusion.expr.Expr)	Converts a string and optional formats to a `Timestamp` in nanoseconds.
`to_timestamp_seconds`(→ datafusion.expr.Expr)	Converts a string and optional formats to a `Timestamp` in seconds.
`to_unixtime`(→ datafusion.expr.Expr)	Converts a string and optional formats to a Unix timestamp.
`translate`(→ datafusion.expr.Expr)	Replaces the characters in `from_val` with the counterpart in `to_val`.
`trim`(→ datafusion.expr.Expr)	Removes all characters, spaces by default, from both sides of a string.
`trunc`(→ datafusion.expr.Expr)	Truncate the number toward zero with optional precision.
`upper`(→ datafusion.expr.Expr)	Converts a string to uppercase.
`uuid`(→ datafusion.expr.Expr)	Returns a UUID v4 as a string value.
`var`(→ datafusion.expr.Expr)	Computes the sample variance of the argument.
`var_pop`(→ datafusion.expr.Expr)	Computes the population variance of the argument.
`var_samp`(→ datafusion.expr.Expr)	Computes the sample variance of the argument.
`var_sample`(→ datafusion.expr.Expr)	Computes the sample variance of the argument.
`when`(→ datafusion.expr.CaseBuilder)	Create a case expression that has no base expression.
`window`(→ datafusion.expr.Expr)	Creates a new Window function expression.

Module Contents

datafusion.functions.abs(arg: datafusion.expr.Expr) → datafusion.expr.Expr

Return the absolute value of a given number.

Returns:

Expr: A new expression representing the absolute value of the input expression.

datafusion.functions.acos(arg: datafusion.expr.Expr) → datafusion.expr.Expr

Returns the arc cosine or inverse cosine of a number.

Returns:

Expr: A new expression representing the arc cosine of the input expression.

datafusion.functions.acosh(arg: datafusion.expr.Expr) → datafusion.expr.Expr

Returns inverse hyperbolic cosine.

datafusion.functions.alias(expr: datafusion.expr.Expr, name: str, metadata: dict[str, str] | None = None) → datafusion.expr.Expr

Creates an alias expression with an optional metadata dictionary.

Parameters:

expr – The expression to alias
name – The alias name
metadata – Optional metadata to attach to the column

Returns:

An expression with the given alias

datafusion.functions.approx_distinct(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr

Returns the approximate number of distinct values.

This aggregate function is similar to count() with distinct set to True, but it approximates the number of distinct entries. It may be significantly faster than count() for some DataFrames.

If you use the builder functions described in the aggregation user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – Values to check for distinct entries
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.approx_median(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr

Returns the approximate median value.

This aggregate function is similar to median(), but it only approximates the median. It may be significantly faster for some DataFrames.

If you use the builder functions described in the aggregation user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – Values to find the median for
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.approx_percentile_cont(expression: datafusion.expr.Expr, percentile: float, num_centroids: int | None = None, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr

Returns the value that is approximately at a given percentile of expr.

This aggregate function assumes the input values form a continuous distribution. Suppose you have a DataFrame containing 100 different test scores. If you call this function with a percentile of 0.9, it returns approximately the test score that is above 90% of the other test scores. The returned value may fall between two of the input values.

This function uses the [t-digest](https://arxiv.org/abs/1902.04023) algorithm to compute the percentile. You can limit the number of bins used by this algorithm by setting the num_centroids parameter.

If you use the builder functions described in the aggregation user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – Values for which to find the approximate percentile
percentile – This must be between 0.0 and 1.0, inclusive
num_centroids – Maximum number of centroids (bins) used by the t-digest algorithm
filter – If provided, only compute against rows for which the filter is True
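The sketch below makes "the returned value may fall between two of the input values" concrete. It computes the *exact* continuous percentile in plain Python by linearly interpolating between neighbouring sorted values. The interpolation convention is an assumption, and the sketch does not call DataFusion. approx_percentile_cont estimates a percentile with a t-digest, so its result can differ from this exact value.

```python
import math


def percentile_cont(values: list[float], percentile: float) -> float:
    """Exact continuous percentile using linear interpolation.

    approx_percentile_cont estimates a value like this one with a t-digest
    instead of sorting every input value.
    """
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must be between 0.0 and 1.0, inclusive")
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Fractional rank of the requested percentile within the sorted values.
    rank = percentile * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    # Interpolate between the two neighbouring values; the result may lie
    # between two inputs instead of equalling one of them.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


print(percentile_cont([10, 20, 30, 40], 0.5))  # 25.0
```

The median of four scores lands on 25.0, which does not appear in the input.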

datafusion.functions.approx_percentile_cont_with_weight(expression: datafusion.expr.Expr, weight: datafusion.expr.Expr, percentile: float, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr

Returns the value of the weighted approximate percentile.

This aggregate function is similar to approx_percentile_cont(), except that it weights each value by the corresponding entry in weight.

If you use the builder functions described in the aggregation user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – Values for which to find the approximate percentile
weight – Relative weight for each of the values in expression
percentile – This must be between 0.0 and 1.0, inclusive
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.array(*args: datafusion.expr.Expr) → datafusion.expr.Expr

Returns an array using the specified input expressions.

This is an alias for make_array().

datafusion.functions.array_agg(expression: datafusion.expr.Expr, distinct: bool = False, filter: datafusion.expr.Expr | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) → datafusion.expr.Expr

Aggregate values into an array.

Currently, distinct and order_by cannot be used together. As a workaround, consider applying array_sort() after aggregation. [Issue Tracker](https://github.com/apache/datafusion/issues/12371)

If you use the builder functions described in the aggregation user guide, this function ignores the option null_treatment.

Parameters:

expression – Values to combine into an array
distinct – If True, a single entry for each distinct value will be in the result
filter – If provided, only compute against rows for which the filter is True
order_by – Order the resultant array values

datafusion.functions.array_append(array: datafusion.expr.Expr, element: datafusion.expr.Expr) → datafusion.expr.Expr

Appends an element to the end of an array.

datafusion.functions.array_cat(*args: datafusion.expr.Expr) → datafusion.expr.Expr

Concatenates the input arrays.

This is an alias for array_concat().

datafusion.functions.array_concat(*args: datafusion.expr.Expr) → datafusion.expr.Expr

Concatenates the input arrays.

datafusion.functions.array_dims(array: datafusion.expr.Expr) → datafusion.expr.Expr

Returns an array of the array’s dimensions.

datafusion.functions.array_distinct(array: datafusion.expr.Expr) → datafusion.expr.Expr

Returns distinct values from the array after removing duplicates.

datafusion.functions.array_element(array: datafusion.expr.Expr, n: datafusion.expr.Expr) → datafusion.expr.Expr

Extracts the element with the index n from the array.

datafusion.functions.array_empty(array: datafusion.expr.Expr) → datafusion.expr.Expr

Returns a boolean indicating whether the array is empty.

datafusion.functions.array_except(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) → datafusion.expr.Expr

Returns the elements that appear in array1 but not in array2.

datafusion.functions.array_extract(array: datafusion.expr.Expr, n: datafusion.expr.Expr) → datafusion.expr.Expr

Extracts the element with the index n from the array.

This is an alias for array_element().

datafusion.functions.array_has(first_array: datafusion.expr.Expr, second_array: datafusion.expr.Expr) → datafusion.expr.Expr

Returns true if the element appears in the first array, otherwise false.

datafusion.functions.array_has_all(first_array: datafusion.expr.Expr, second_array: datafusion.expr.Expr) → datafusion.expr.Expr

Determines whether second_array is completely contained in first_array.

Returns true if each element of the second array appears in the first array. Otherwise, it returns false.

datafusion.functions.array_has_any(first_array: datafusion.expr.Expr, second_array: datafusion.expr.Expr) → datafusion.expr.Expr

Determines whether there is an overlap between first_array and second_array.

Returns true if at least one element of the second array appears in the first array. Otherwise, it returns false.
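The difference between array_has_all and array_has_any comes down to "every" versus "at least one". The plain-Python model below shows this with Python lists standing in for arrays. It does not call DataFusion and does not model NULL handling.

```python
def array_has_all(first_array: list, second_array: list) -> bool:
    # True when every element of second_array occurs in first_array.
    return all(element in first_array for element in second_array)


def array_has_any(first_array: list, second_array: list) -> bool:
    # True when at least one element of second_array occurs in first_array.
    return any(element in first_array for element in second_array)


print(array_has_all([1, 2, 3, 4], [2, 4]))  # True
print(array_has_any([1, 2, 3, 4], [5, 6]))  # False
```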

datafusion.functions.array_indexof(array: datafusion.expr.Expr, element: datafusion.expr.Expr, index: int | None = 1) → datafusion.expr.Expr

Return the position of the first occurrence of element in array.

This is an alias for array_position().

datafusion.functions.array_intersect(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) → datafusion.expr.Expr

Returns the intersection of array1 and array2.

datafusion.functions.array_join(expr: datafusion.expr.Expr, delimiter: datafusion.expr.Expr) → datafusion.expr.Expr

Converts each element to its text representation and joins the results with delimiter.

This is an alias for array_to_string().

datafusion.functions.array_length(array: datafusion.expr.Expr) → datafusion.expr.Expr

Returns the length of the array.

datafusion.functions.array_ndims(array: datafusion.expr.Expr) → datafusion.expr.Expr

Returns the number of dimensions of the array.

datafusion.functions.array_pop_back(array: datafusion.expr.Expr) → datafusion.expr.Expr

Returns the array without the last element.

datafusion.functions.array_pop_front(array: datafusion.expr.Expr) → datafusion.expr.Expr

Returns the array without the first element.

datafusion.functions.array_position(array: datafusion.expr.Expr, element: datafusion.expr.Expr, index: int | None = 1) → datafusion.expr.Expr

Return the position of the first occurrence of element in array.
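Positions in array_position (and its alias array_indexof) are 1-based, and the index argument sets the 1-based position where the search starts. The plain-Python model below illustrates this. It uses None to stand in for the NULL returned when there is no match, and it does not call DataFusion.

```python
def array_position(array: list, element, index: int = 1):
    """Return the 1-based position of the first match at or after `index`.

    Returns None (standing in for SQL NULL) when nothing matches.
    """
    if index < 1:
        raise ValueError("index is 1-based and must be >= 1")
    # Convert the 1-based start index to a 0-based Python offset.
    for offset in range(index - 1, len(array)):
        if array[offset] == element:
            return offset + 1  # report the position as 1-based again
    return None


print(array_position([10, 20, 30, 20], 20))     # 2
print(array_position([10, 20, 30, 20], 20, 3))  # 4
print(array_position([10, 20, 30, 20], 99))     # None
```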

datafusion.functions.array_positions(array: datafusion.expr.Expr, element: datafusion.expr.Expr) → datafusion.expr.Expr

Searches for an element in the array and returns the positions of all occurrences.

datafusion.functions.array_prepend(element: datafusion.expr.Expr, array: datafusion.expr.Expr) → datafusion.expr.Expr

Prepends an element to the beginning of an array.

datafusion.functions.array_push_back(array: datafusion.expr.Expr, element: datafusion.expr.Expr) → datafusion.expr.Expr

Appends an element to the end of an array.

This is an alias for array_append().

datafusion.functions.array_push_front(element: datafusion.expr.Expr, array: datafusion.expr.Expr) → datafusion.expr.Expr

Prepends an element to the beginning of an array.

This is an alias for array_prepend().

datafusion.functions.array_remove(array: datafusion.expr.Expr, element: datafusion.expr.Expr) → datafusion.expr.Expr

Removes the first element from the array equal to the given value.

datafusion.functions.array_remove_all(array: datafusion.expr.Expr, element: datafusion.expr.Expr) → datafusion.expr.Expr

Removes all elements from the array equal to the given value.

datafusion.functions.array_remove_n(array: datafusion.expr.Expr, element: datafusion.expr.Expr, max: datafusion.expr.Expr) → datafusion.expr.Expr

Removes the first max elements from the array equal to the given value.
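The three removal functions differ only in how many matches they drop. In the plain-Python model below, array_remove behaves like array_remove_n with a limit of 1, and array_remove_all drops every match. The parameter is named `max_count` here so that it does not shadow Python's built-in `max`; it corresponds to the `max` argument above. This is an illustration, not DataFusion code, and NULL handling is not modeled.

```python
def array_remove_n(array: list, element, max_count: int) -> list:
    """Remove the first `max_count` elements equal to `element`."""
    result = []
    removed = 0
    for item in array:
        if item == element and removed < max_count:
            removed += 1  # drop this match and keep counting
            continue
        result.append(item)
    return result


def array_remove(array: list, element) -> list:
    # Only the first matching element is removed.
    return array_remove_n(array, element, 1)


def array_remove_all(array: list, element) -> list:
    # Every matching element is removed.
    return [item for item in array if item != element]


arr = [1, 2, 2, 3, 2, 1, 4]
print(array_remove(arr, 2))       # [1, 2, 3, 2, 1, 4]
print(array_remove_n(arr, 2, 2))  # [1, 3, 2, 1, 4]
print(array_remove_all(arr, 2))   # [1, 3, 1, 4]
```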

datafusion.functions.array_repeat(element: datafusion.expr.Expr, count: datafusion.expr.Expr) → datafusion.expr.Expr

Returns an array containing element count times.

datafusion.functions.array_replace(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) → datafusion.expr.Expr

Replaces the first occurrence of from_val with to_val.

datafusion.functions.array_replace_all(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) → datafusion.expr.Expr

Replaces all occurrences of from_val with to_val.

datafusion.functions.array_replace_n(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr, max: datafusion.expr.Expr) → datafusion.expr.Expr

Replaces up to max occurrences of from_val with to_val.

Replaces the first max occurrences of the specified element with another specified element.
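array_replace and array_replace_all can be read as array_replace_n with a limit of 1 and with an unlimited limit, respectively. The plain-Python model below shows all three through one helper. `max_count` stands in for the `max` argument. This is an illustration of the described behavior, not DataFusion's implementation.

```python
def array_replace_n(array: list, from_val, to_val, max_count: int) -> list:
    """Replace the first `max_count` occurrences of from_val with to_val."""
    result = []
    replaced = 0
    for item in array:
        if item == from_val and replaced < max_count:
            result.append(to_val)
            replaced += 1
        else:
            result.append(item)  # later matches and non-matches are kept
    return result


arr = [1, 2, 2, 3, 2, 1, 4]
print(array_replace_n(arr, 2, 5, 1))         # [1, 5, 2, 3, 2, 1, 4]  like array_replace
print(array_replace_n(arr, 2, 5, 2))         # [1, 5, 5, 3, 2, 1, 4]
print(array_replace_n(arr, 2, 5, len(arr)))  # [1, 5, 5, 3, 5, 1, 4]  like array_replace_all
```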

datafusion.functions.array_resize(array: datafusion.expr.Expr, size: datafusion.expr.Expr, value: datafusion.expr.Expr) → datafusion.expr.Expr¶

Resizes the array to contain size elements.

If size is greater than the array length, the additional entries will be filled with the given value.
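The sketch below models array_resize in plain Python. The growing branch follows the description above. The shrinking branch assumes that extra elements are dropped from the end, which this page does not state. `resize` is a hypothetical helper name.

```python
from typing import Any


def resize(array: list[Any], size: int, value: Any) -> list[Any]:
    """Hypothetical model of array_resize on a single list value."""
    if size >= len(array):
        # Grow: pad the tail with `value`, as documented.
        return array + [value] * (size - len(array))
    # Shrink: ASSUMED to keep only the first `size` elements.
    return array[:size]


print(resize([1, 2, 3], 5, 0))  # [1, 2, 3, 0, 0]
print(resize([1, 2, 3], 2, 0))  # [1, 2]
```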

datafusion.functions.array_slice(array: datafusion.expr.Expr, begin: datafusion.expr.Expr, end: datafusion.expr.Expr, stride: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶: Returns a slice of the array.
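This page does not state the indexing convention for array_slice. The sketch below assumes DataFusion's SQL convention of 1-based positions with an inclusive end, and a stride that defaults to 1. Negative indices are not modeled, and `array_slice_model` is a hypothetical name.

```python
from typing import Any, Optional


def array_slice_model(array: list[Any], begin: int, end: int,
                      stride: Optional[int] = None) -> list[Any]:
    """Hypothetical model: positive 1-based indices, inclusive end."""
    step = 1 if stride is None else stride
    # Convert 1-based inclusive [begin, end] to Python's 0-based half-open slice.
    return array[begin - 1:end:step]


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(array_slice_model(data, 3, 6))     # [3, 4, 5, 6]
print(array_slice_model(data, 1, 6, 2))  # [1, 3, 5]
```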

datafusion.functions.array_sort(array: datafusion.expr.Expr, descending: bool = False, null_first: bool = False) → datafusion.expr.Expr¶

Sort an array.

Parameters:

array – The input array to sort.
descending – If True, sorts in descending order.
null_first – If True, nulls will be returned at the beginning of the array.
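The plain-Python sketch below shows how the descending and null_first flags combine, with None standing in for NULL. It assumes that null placement depends only on null_first and not on descending, which is how the parameter descriptions above read. `sort_model` is a hypothetical name.

```python
from typing import Optional


def sort_model(array: list[Optional[int]], descending: bool = False,
               null_first: bool = False) -> list[Optional[int]]:
    """Hypothetical model of array_sort's flags."""
    non_null = sorted((v for v in array if v is not None), reverse=descending)
    nulls = [v for v in array if v is None]
    # NULLs form one block at either end, controlled only by null_first.
    return nulls + non_null if null_first else non_null + nulls


print(sort_model([3, None, 1, 2]))                                    # [1, 2, 3, None]
print(sort_model([3, None, 1, 2], descending=True, null_first=True))  # [None, 3, 2, 1]
```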

datafusion.functions.array_to_string(expr: datafusion.expr.Expr, delimiter: datafusion.expr.Expr) → datafusion.expr.Expr¶: Converts each element to its text representation and joins the results using delimiter.

datafusion.functions.array_union(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) → datafusion.expr.Expr¶

Returns an array of the elements in the union of array1 and array2.

Duplicate elements are not returned.

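The plain-Python sketch below models array_union on two list values: duplicates are removed across both inputs. The output order (each element at its first occurrence, array1 before array2) is an assumption for illustration. `union_model` is a hypothetical name.

```python
from typing import Any


def union_model(array1: list[Any], array2: list[Any]) -> list[Any]:
    """Hypothetical model: union of two arrays without duplicates."""
    seen = set()
    result = []
    for item in array1 + array2:
        if item not in seen:  # keep only the first occurrence of each element
            seen.add(item)
            result.append(item)
    return result


print(union_model([1, 2, 3, 4], [5, 6, 3, 4]))  # [1, 2, 3, 4, 5, 6]
```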
datafusion.functions.arrow_cast(expr: datafusion.expr.Expr, data_type: datafusion.expr.Expr) → datafusion.expr.Expr¶: Casts an expression to a specified data type.

datafusion.functions.arrow_typeof(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the Arrow type of the expression.

datafusion.functions.ascii(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the numeric code of the first character of the argument.

datafusion.functions.asin(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the arc sine or inverse sine of a number.

datafusion.functions.asinh(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the inverse hyperbolic sine of the argument.

datafusion.functions.atan(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the inverse tangent of the argument.

datafusion.functions.atan2(y: datafusion.expr.Expr, x: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the inverse tangent of y / x, using the signs of both arguments to determine the quadrant.

datafusion.functions.atanh(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the inverse hyperbolic tangent of the argument.

datafusion.functions.avg(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Returns the average value.

This aggregate function expects a numeric expression and will return a float.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – The numeric values to average
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.bit_and(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the bitwise AND of the argument.

This aggregate function computes the bitwise AND of every value in the input partition.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – Argument to perform bitwise calculation on
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.bit_length(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the number of bits in the string argument.

datafusion.functions.bit_or(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the bitwise OR of the argument.

This aggregate function computes the bitwise OR of every value in the input partition.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – Argument to perform bitwise calculation on
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.bit_xor(expression: datafusion.expr.Expr, distinct: bool = False, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the bitwise XOR of the argument.

This aggregate function computes the bitwise XOR of every value in the input partition.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by and null_treatment.

Parameters:

expression – Argument to perform bitwise calculation on
distinct – If True, evaluate each unique value of expression only once
filter – If provided, only compute against rows for which the filter is True
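The plain-Python sketch below shows what bit_and, bit_or, and bit_xor compute over one group of non-NULL integers, including the effect of distinct=True on bit_xor. It folds the values with the standard `operator` bitwise functions. It is not the DataFusion implementation, and NULL handling is not modeled.

```python
import operator
from functools import reduce

# Hypothetical sample group: 0b1100, 0b1010, 0b1010
values = [12, 10, 10]

bit_and = reduce(operator.and_, values)  # 0b1000 -> 8
bit_or = reduce(operator.or_, values)    # 0b1110 -> 14
bit_xor = reduce(operator.xor, values)   # 0b1100 -> 12
# distinct=True: each unique value is used once, 0b1100 ^ 0b1010 = 0b0110 -> 6
bit_xor_distinct = reduce(operator.xor, set(values))

print(bit_and, bit_or, bit_xor, bit_xor_distinct)  # 8 14 12 6
```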

datafusion.functions.bool_and(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the boolean AND of the argument.

This aggregate function will compare every value in the input partition. These are expected to be boolean values.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – Argument to perform calculation on
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.bool_or(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the boolean OR of the argument.

This aggregate function will compare every value in the input partition. These are expected to be boolean values.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – Argument to perform calculation on
filter – If provided, only compute against rows for which the filter is True
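For a grouped aggregation, bool_and and bool_or reduce each group's booleans the same way Python's `all` and `any` do. The sketch below uses hypothetical sample rows and ignores NULLs.

```python
from collections import defaultdict

# Hypothetical sample rows: (group key, boolean value).
rows = [("a", True), ("a", True), ("b", True), ("b", False)]

groups = defaultdict(list)
for key, flag in rows:
    groups[key].append(flag)

# bool_and: True only if every value in the group is True.
bool_and_by_group = {key: all(flags) for key, flags in groups.items()}
# bool_or: True if at least one value in the group is True.
bool_or_by_group = {key: any(flags) for key, flags in groups.items()}

print(bool_and_by_group)  # {'a': True, 'b': False}
print(bool_or_by_group)   # {'a': True, 'b': True}
```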

datafusion.functions.btrim(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Removes all characters, spaces by default, from both sides of a string.

datafusion.functions.cardinality(array: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the total number of elements in the array.

datafusion.functions.case(expr: datafusion.expr.Expr) → datafusion.expr.CaseBuilder¶

Create a case expression.

Create a CaseBuilder to match cases for the expression expr. See CaseBuilder for detailed usage.

datafusion.functions.cbrt(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the cube root of a number.

datafusion.functions.ceil(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the nearest integer greater than or equal to the argument.

datafusion.functions.char_length(string: datafusion.expr.Expr) → datafusion.expr.Expr¶: The number of characters in the string.

datafusion.functions.character_length(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the number of characters in the argument.

datafusion.functions.chr(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Converts the Unicode code point to a UTF8 character.

datafusion.functions.coalesce(*args: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the value of the first expr in args which is not NULL.

datafusion.functions.col(name: str) → datafusion.expr.Expr¶: Creates a column reference expression.

datafusion.functions.concat(*args: datafusion.expr.Expr) → datafusion.expr.Expr¶

Concatenates the text representations of all the arguments.

NULL arguments are ignored.

datafusion.functions.concat_ws(separator: str, *args: datafusion.expr.Expr) → datafusion.expr.Expr¶

Concatenates the list args with the separator.

NULL arguments are ignored. separator should not be NULL.
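Because NULL arguments are ignored, concat_ws does not produce doubled separators where a value is missing. The plain-Python sketch below models both functions, with None standing in for NULL. The helper names are hypothetical. Non-string values are converted with Python's `str`, which may not match DataFusion's text formatting for every type.

```python
from typing import Optional


def concat_model(*args: Optional[object]) -> str:
    # NULL (None) arguments are skipped; everything else is stringified.
    return "".join(str(a) for a in args if a is not None)


def concat_ws_model(separator: str, *args: Optional[object]) -> str:
    # NULLs are skipped entirely, so no empty slot appears between separators.
    return separator.join(str(a) for a in args if a is not None)


print(concat_model("data", None, "fusion", 42))       # datafusion42
print(concat_ws_model("-", "2024", None, "01", "31"))  # 2024-01-31
```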

datafusion.functions.corr(value_y: datafusion.expr.Expr, value_x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Returns the correlation coefficient between value_y and value_x.

This aggregate function expects both values to be numeric and will return a float.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

value_y – The dependent variable for correlation
value_x – The independent variable for correlation
filter – If provided, only compute against rows for which the filter is True
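The sketch below assumes that corr computes the Pearson correlation coefficient, the usual meaning of a correlation aggregate. It writes out that formula in plain Python for one group of paired, non-NULL values. `corr_model` is a hypothetical name.

```python
import math


def corr_model(value_y: list[float], value_x: list[float]) -> float:
    """Hypothetical model: Pearson correlation coefficient of paired samples."""
    n = len(value_x)
    mean_x = sum(value_x) / n
    mean_y = sum(value_y) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(value_x, value_y))
    var_x = sum((x - mean_x) ** 2 for x in value_x)
    var_y = sum((y - mean_y) ** 2 for y in value_y)
    return cov / math.sqrt(var_x * var_y)


print(corr_model([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))  # 1.0
print(corr_model([6.0, 4.0, 2.0], [1.0, 2.0, 3.0]))  # -1.0
```

The coefficient is symmetric, so swapping value_y and value_x gives the same result.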

datafusion.functions.cos(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the cosine of the argument.

datafusion.functions.cosh(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the hyperbolic cosine of the argument.

datafusion.functions.cot(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the cotangent of the argument.

datafusion.functions.count(expressions: datafusion.expr.Expr | list[datafusion.expr.Expr] | None = None, distinct: bool = False, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Returns the number of rows that match the given arguments.

This aggregate function will count the non-null rows provided in the expression.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by and null_treatment.

Parameters:

expressions – The expression(s) whose non-null rows are counted
distinct – If True, each distinct value is counted only once
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.count_star(filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Create a COUNT(1) aggregate expression.

This aggregate function will count all of the rows in the partition.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, distinct, and null_treatment.

Parameters:

filter – If provided, only count rows for which the filter is True

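count and count_star differ in how they treat NULLs. The plain-Python sketch below contrasts them on one hypothetical column, with None standing in for NULL. It also shows the effect of the distinct and filter options.

```python
# Hypothetical column values for one group; None stands in for NULL.
rows = [1, None, 3, 3]

count_star = len(rows)                                    # every row: 4
count = sum(1 for v in rows if v is not None)             # non-null rows: 3
count_distinct = len({v for v in rows if v is not None})  # distinct non-null values: 2
# filter: only rows where the (hypothetical) predicate v > 1 holds are counted.
count_filtered = sum(1 for v in rows if v is not None and v > 1)  # 2

print(count_star, count, count_distinct, count_filtered)  # 4 3 2 2
```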
datafusion.functions.covar(value_y: datafusion.expr.Expr, value_x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the sample covariance.

This is an alias for covar_samp().

datafusion.functions.covar_pop(value_y: datafusion.expr.Expr, value_x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the population covariance.

This aggregate function expects both values to be numeric and will return a float.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

value_y – The dependent variable for covariance
value_x – The independent variable for covariance
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.covar_samp(value_y: datafusion.expr.Expr, value_x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the sample covariance.

This aggregate function expects both values to be numeric and will return a float.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

value_y – The dependent variable for covariance
value_x – The independent variable for covariance
filter – If provided, only compute against rows for which the filter is True
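covar_pop and covar_samp differ only in their divisor. The sketch below uses the standard definitions: population covariance divides by n, and sample covariance divides by n - 1. The helper names are hypothetical, and NULL handling is not modeled.

```python
def _centered_product_sum(value_y: list[float], value_x: list[float]) -> float:
    # Sum of (x - mean_x) * (y - mean_y) over the paired values.
    mean_x = sum(value_x) / len(value_x)
    mean_y = sum(value_y) / len(value_y)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(value_x, value_y))


def covar_pop_model(value_y: list[float], value_x: list[float]) -> float:
    # Population covariance: divide by n.
    return _centered_product_sum(value_y, value_x) / len(value_x)


def covar_samp_model(value_y: list[float], value_x: list[float]) -> float:
    # Sample covariance: divide by n - 1 (covar() is documented as an alias of covar_samp()).
    return _centered_product_sum(value_y, value_x) / (len(value_x) - 1)


ys, xs = [2.0, 4.0, 6.0], [1.0, 2.0, 3.0]
print(covar_pop_model(ys, xs))   # 1.3333333333333333
print(covar_samp_model(ys, xs))  # 2.0
```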

datafusion.functions.cume_dist(partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) → datafusion.expr.Expr¶

Create a cumulative distribution window function.

This window function is similar to rank() except that it returns a fraction: the number of rows ordered before or tied with the current row, divided by the total number of rows in the partition. Here is an example of a dataframe with a window ordered by descending points and the associated cumulative distribution:

+--------+-----------+
| points | cume_dist |
+--------+-----------+
| 100    | 0.5       |
| 100    | 0.5       |
| 50     | 0.75      |
| 25     | 1.0       |
+--------+-----------+

Parameters:

partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.
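The table above can be reproduced with a short plain-Python model of cumulative distribution over one partition. Tied rows (peers) share the same value because each row counts every row ordered at or before it. `cume_dist_model` is a hypothetical name, and partitioning is not modeled.

```python
def cume_dist_model(points: list[int], descending: bool = True) -> list[float]:
    """Hypothetical model: fraction of rows ordered at or before each row."""
    n = len(points)
    result = []
    for p in points:
        if descending:
            at_or_before = sum(1 for q in points if q >= p)  # includes ties
        else:
            at_or_before = sum(1 for q in points if q <= p)
        result.append(at_or_before / n)
    return result


print(cume_dist_model([100, 100, 50, 25]))  # [0.5, 0.5, 0.75, 1.0]
```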

datafusion.functions.current_date() → datafusion.expr.Expr¶: Returns current UTC date as a Date32 value.

datafusion.functions.current_time() → datafusion.expr.Expr¶: Returns current UTC time as a Time64 value.

datafusion.functions.date_bin(stride: datafusion.expr.Expr, source: datafusion.expr.Expr, origin: datafusion.expr.Expr) → datafusion.expr.Expr¶: Coerces an arbitrary timestamp to the start of the nearest specified interval.
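The sketch below shows the binning arithmetic. It assumes date_bin computes origin + floor((source - origin) / stride) * stride for fixed-length strides. It works on Python `datetime` values rather than DataFusion expressions. Month- or year-based intervals are not modeled, and `date_bin_model` is a hypothetical name.

```python
from datetime import datetime, timedelta


def date_bin_model(stride: timedelta, source: datetime, origin: datetime) -> datetime:
    """Hypothetical model: start of the stride-sized bin containing source."""
    # timedelta // timedelta floors, so sources before origin also land on bin starts.
    bins = (source - origin) // stride
    return origin + bins * stride


origin = datetime(1970, 1, 1)
print(date_bin_model(timedelta(minutes=15), datetime(2024, 3, 5, 10, 37, 12), origin))
# 2024-03-05 10:30:00
```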

datafusion.functions.date_part(part: datafusion.expr.Expr, date: datafusion.expr.Expr) → datafusion.expr.Expr¶: Extracts a subfield from the date.

datafusion.functions.date_trunc(part: datafusion.expr.Expr, date: datafusion.expr.Expr) → datafusion.expr.Expr¶: Truncates the date to a specified level of precision.

datafusion.functions.datepart(part: datafusion.expr.Expr, date: datafusion.expr.Expr) → datafusion.expr.Expr¶

Return a specified part of a date.

This is an alias for date_part().

datafusion.functions.datetrunc(part: datafusion.expr.Expr, date: datafusion.expr.Expr) → datafusion.expr.Expr¶

Truncates the date to a specified level of precision.

This is an alias for date_trunc().

datafusion.functions.decode(expr: datafusion.expr.Expr, encoding: datafusion.expr.Expr) → datafusion.expr.Expr¶: Decode the input, using the encoding. encoding can be base64 or hex.

datafusion.functions.degrees(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Converts the argument from radians to degrees.

datafusion.functions.dense_rank(partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) → datafusion.expr.Expr¶

Create a dense_rank window function.

This window function is similar to rank() except that the returned values will be consecutive. Here is an example of a dataframe with a window ordered by descending points and the associated dense rank:

+--------+------------+
| points | dense_rank |
+--------+------------+
| 100    | 1          |
| 100    | 1          |
| 50     | 2          |
| 25     | 3          |
+--------+------------+

Parameters:

partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.
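The plain-Python models below show how dense_rank differs from rank on the table above. Both give tied rows the same rank. After a tie, rank skips ahead by the number of tied rows, while dense_rank moves to the next consecutive value. Both helper names are hypothetical, and partitioning is not modeled.

```python
def dense_rank_model(points: list[int], descending: bool = True) -> list[int]:
    # Each distinct value gets the next consecutive rank.
    distinct = sorted(set(points), reverse=descending)
    rank_of = {value: i + 1 for i, value in enumerate(distinct)}
    return [rank_of[p] for p in points]


def rank_model(points: list[int], descending: bool = True) -> list[int]:
    # 1 + number of rows strictly ahead of the current row, so gaps follow ties.
    return [1 + sum(1 for q in points if (q > p if descending else q < p)) for p in points]


points = [100, 100, 50, 25]
print(dense_rank_model(points))  # [1, 1, 2, 3]
print(rank_model(points))        # [1, 1, 3, 4]
```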

datafusion.functions.digest(value: datafusion.expr.Expr, method: datafusion.expr.Expr) → datafusion.expr.Expr¶

Computes the binary hash of an expression using the specified algorithm.

Standard algorithms are md5, sha224, sha256, sha384, sha512, blake2s, blake2b, and blake3.

datafusion.functions.empty(array: datafusion.expr.Expr) → datafusion.expr.Expr¶: This is an alias for array_empty().

datafusion.functions.encode(expr: datafusion.expr.Expr, encoding: datafusion.expr.Expr) → datafusion.expr.Expr¶: Encode the input, using the encoding. encoding can be base64 or hex.

datafusion.functions.ends_with(arg: datafusion.expr.Expr, suffix: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns true if the string ends with the suffix, false otherwise.

datafusion.functions.exp(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the exponential of the argument.

datafusion.functions.extract(part: datafusion.expr.Expr, date: datafusion.expr.Expr) → datafusion.expr.Expr¶

Extracts a subfield from the date.

This is an alias for date_part().

datafusion.functions.factorial(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the factorial of the argument.

datafusion.functions.find_in_set(string: datafusion.expr.Expr, string_list: datafusion.expr.Expr) → datafusion.expr.Expr¶

Find a string in a list of strings.

Returns a value in the range of 1 to N if the string is in the string list string_list consisting of N substrings.

The string list is a string composed of substrings separated by , characters.
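The plain-Python sketch below models find_in_set: split the list on commas and return the 1-based position of the first exact match. Returning 0 when the string is absent is an assumption borrowed from the MySQL function of the same name, and NULL handling is not modeled. `find_in_set_model` is a hypothetical name.

```python
def find_in_set_model(string: str, string_list: str) -> int:
    """Hypothetical model: 1-based position of string in a comma-separated list."""
    for position, item in enumerate(string_list.split(","), start=1):
        if item == string:
            return position
    return 0  # ASSUMED result when the string is not in the list


print(find_in_set_model("b", "a,b,c"))  # 2
print(find_in_set_model("z", "a,b,c"))  # 0
```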

datafusion.functions.first_value(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None, null_treatment: datafusion.common.NullTreatment = NullTreatment.RESPECT_NULLS) → datafusion.expr.Expr¶

Returns the first value in a group of values.

This aggregate function will return the first value in the partition.

If using the builder functions described in the aggregation documentation, this function ignores the option distinct.

Parameters:

expression – The expression whose first value is returned
filter – If provided, only compute against rows for which the filter is True
order_by – Set the ordering of the expression to evaluate
null_treatment – Assign whether to respect or ignore null values.

datafusion.functions.flatten(array: datafusion.expr.Expr) → datafusion.expr.Expr¶: Flattens an array of arrays into a single array.

datafusion.functions.floor(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the nearest integer less than or equal to the argument.

datafusion.functions.from_unixtime(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Converts an integer to RFC3339 timestamp format string.

datafusion.functions.gcd(x: datafusion.expr.Expr, y: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the greatest common divisor.

datafusion.functions.in_list(arg: datafusion.expr.Expr, values: list[datafusion.expr.Expr], negated: bool = False) → datafusion.expr.Expr¶: Returns whether the argument is contained within the list values.

datafusion.functions.initcap(string: datafusion.expr.Expr) → datafusion.expr.Expr¶

Set the initial letter of each word to capital.

Converts the first letter of each word in string to uppercase and the remaining characters to lowercase.
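The plain-Python sketch below models initcap. It assumes words are delimited by any non-alphanumeric character, which this page does not state. `initcap_model` is a hypothetical name.

```python
def initcap_model(string: str) -> str:
    """Hypothetical model: uppercase the first letter of each word, lowercase the rest."""
    out = []
    prev_alnum = False  # whether the previous character was part of a word
    for ch in string:
        out.append(ch.lower() if prev_alnum else ch.upper())
        prev_alnum = ch.isalnum()
    return "".join(out)


print(initcap_model("hello WORLD-wide web"))  # Hello World-Wide Web
```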

datafusion.functions.isnan(expr: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns true if a given number is +NaN or -NaN otherwise returns false.

datafusion.functions.iszero(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns true if a given number is +0.0 or -0.0 otherwise returns false.

datafusion.functions.lag(arg: datafusion.expr.Expr, shift_offset: int = 1, default_value: Any | None = None, partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) → datafusion.expr.Expr¶

Create a lag window function.

The lag operation returns the value of the argument from the row shift_offset rows before the current row in the partition. For example, lag(col("b"), shift_offset=3, default_value=5) returns the value of column b from three rows earlier. At the beginning of the partition, where no such row exists, it returns the default value of 5.

Here is an example of both the lag and datafusion.functions.lead() functions on a simple DataFrame:

+--------+------+-----+
| points | lead | lag |
+--------+------+-----+
| 100    | 100  |     |
| 100    | 50   | 100 |
| 50     | 25   | 100 |
| 25     |      | 50  |
+--------+------+-----+

Parameters:

arg – Value to return
shift_offset – Number of rows before the current row.
default_value – Value to return if the shift_offset row does not exist.
partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.

datafusion.functions.last_value(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None, null_treatment: datafusion.common.NullTreatment = NullTreatment.RESPECT_NULLS) → datafusion.expr.Expr¶

Returns the last value in a group of values.

This aggregate function will return the last value in the partition.

If using the builder functions described in the aggregation documentation, this function ignores the option distinct.

Parameters:

expression – The expression whose last value is returned
filter – If provided, only compute against rows for which the filter is True
order_by – Set the ordering of the expression to evaluate
null_treatment – Assign whether to respect or ignore null values.

datafusion.functions.lcm(x: datafusion.expr.Expr, y: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the least common multiple.

datafusion.functions.lead(arg: datafusion.expr.Expr, shift_offset: int = 1, default_value: Any | None = None, partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) → datafusion.expr.Expr¶

Create a lead window function.

The lead operation returns the value of the argument from the row shift_offset rows after the current row in the partition. For example, lead(col("b"), shift_offset=3, default_value=5) returns the value of column b from three rows later. At the end of the partition, where no further rows exist, it returns the default value of 5.

Here is an example of both the lead and datafusion.functions.lag() functions on a simple DataFrame:

+--------+------+-----+
| points | lead | lag |
+--------+------+-----+
| 100    | 100  |     |
| 100    | 50   | 100 |
| 50     | 25   | 100 |
| 25     |      | 50  |
+--------+------+-----+

To set window function parameters, use the window builder approach described in the window functions online documentation.

Parameters:

arg – Value to return
shift_offset – Number of rows following the current row.
default_value – Value to return if the shift_offset row does not exist.
partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.
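lead and lag are mirror images. Both read a value a fixed number of rows away and fall back to the default past the partition edge. The plain-Python model below reproduces the lead/lag table above and the lag example with default_value=5. The helper names are hypothetical, and the whole list is treated as one ordered partition.

```python
from typing import Any, Optional


def shift_model(values: list[Any], offset: int, default: Optional[Any] = None) -> list[Any]:
    """Hypothetical helper: offset > 0 looks back (lag), offset < 0 looks ahead (lead)."""
    out = []
    for i in range(len(values)):
        j = i - offset
        out.append(values[j] if 0 <= j < len(values) else default)
    return out


def lag_model(values: list[Any], shift_offset: int = 1, default_value: Any = None) -> list[Any]:
    return shift_model(values, shift_offset, default_value)


def lead_model(values: list[Any], shift_offset: int = 1, default_value: Any = None) -> list[Any]:
    return shift_model(values, -shift_offset, default_value)


points = [100, 100, 50, 25]
print(lead_model(points))  # [100, 50, 25, None]
print(lag_model(points))   # [None, 100, 100, 50]
print(lag_model([1, 2, 3, 4], shift_offset=3, default_value=5))  # [5, 5, 5, 1]
```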

datafusion.functions.left(string: datafusion.expr.Expr, n: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the first n characters in the string.

datafusion.functions.length(string: datafusion.expr.Expr) → datafusion.expr.Expr¶: The number of characters in the string.

datafusion.functions.levenshtein(string1: datafusion.expr.Expr, string2: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the Levenshtein distance between the two given strings.
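The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The sketch below uses the classic dynamic-programming formulation in plain Python, keeping only the previous row of the table. It illustrates the metric and is not DataFusion's implementation. `levenshtein_model` is a hypothetical name.

```python
def levenshtein_model(a: str, b: str) -> int:
    """Hypothetical model: edit distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


print(levenshtein_model("kitten", "sitting"))  # 3
```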

datafusion.functions.list_append(array: datafusion.expr.Expr, element: datafusion.expr.Expr) → datafusion.expr.Expr¶

Appends an element to the end of an array.

This is an alias for array_append().

datafusion.functions.list_cat(*args: datafusion.expr.Expr) → datafusion.expr.Expr¶

Concatenates the input arrays.

This is an alias for array_concat(), array_cat().

datafusion.functions.list_concat(*args: datafusion.expr.Expr) → datafusion.expr.Expr¶

Concatenates the input arrays.

This is an alias for array_concat(), array_cat().

datafusion.functions.list_dims(array: datafusion.expr.Expr) → datafusion.expr.Expr¶

Returns an array of the array’s dimensions.

This is an alias for array_dims().

datafusion.functions.list_distinct(array: datafusion.expr.Expr) → datafusion.expr.Expr¶

Returns distinct values from the array after removing duplicates.

This is an alias for array_distinct().

datafusion.functions.list_element(array: datafusion.expr.Expr, n: datafusion.expr.Expr) → datafusion.expr.Expr¶

Extracts the element with the index n from the array.

This is an alias for array_element().

datafusion.functions.list_except(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) → datafusion.expr.Expr¶

Returns the elements that appear in array1 but not in array2.

This is an alias for array_except().

datafusion.functions.list_extract(array: datafusion.expr.Expr, n: datafusion.expr.Expr) → datafusion.expr.Expr¶

Extracts the element with the index n from the array.

This is an alias for array_element().

datafusion.functions.list_indexof(array: datafusion.expr.Expr, element: datafusion.expr.Expr, index: int | None = 1) → datafusion.expr.Expr¶

Return the position of the first occurrence of element in array.

This is an alias for array_position().

datafusion.functions.list_intersect(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) → datafusion.expr.Expr¶

Returns the intersection of array1 and array2.

This is an alias for array_intersect().

datafusion.functions.list_join(expr: datafusion.expr.Expr, delimiter: datafusion.expr.Expr) → datafusion.expr.Expr¶

Converts each element to its text representation and joins the results using delimiter.

This is an alias for array_to_string().

datafusion.functions.list_length(array: datafusion.expr.Expr) → datafusion.expr.Expr¶

Returns the length of the array.

This is an alias for array_length().

datafusion.functions.list_ndims(array: datafusion.expr.Expr) → datafusion.expr.Expr¶

Returns the number of dimensions of the array.

This is an alias for array_ndims().

datafusion.functions.list_position(array: datafusion.expr.Expr, element: datafusion.expr.Expr, index: int | None = 1) → datafusion.expr.Expr¶

Return the position of the first occurrence of element in array.

This is an alias for array_position().

datafusion.functions.list_positions(array: datafusion.expr.Expr, element: datafusion.expr.Expr) → datafusion.expr.Expr¶

Searches for an element in the array and returns all occurrences.

This is an alias for array_positions().

datafusion.functions.list_prepend(element: datafusion.expr.Expr, array: datafusion.expr.Expr) → datafusion.expr.Expr¶

Prepends an element to the beginning of an array.

This is an alias for array_prepend().

datafusion.functions.list_push_back(array: datafusion.expr.Expr, element: datafusion.expr.Expr) → datafusion.expr.Expr¶

Appends an element to the end of an array.

This is an alias for array_append().

datafusion.functions.list_push_front(element: datafusion.expr.Expr, array: datafusion.expr.Expr) → datafusion.expr.Expr¶

Prepends an element to the beginning of an array.

This is an alias for array_prepend().

datafusion.functions.list_remove(array: datafusion.expr.Expr, element: datafusion.expr.Expr) → datafusion.expr.Expr¶

Removes the first element from the array equal to the given value.

This is an alias for array_remove().

datafusion.functions.list_remove_all(array: datafusion.expr.Expr, element: datafusion.expr.Expr) → datafusion.expr.Expr¶

Removes all elements from the array equal to the given value.

This is an alias for array_remove_all().

datafusion.functions.list_remove_n(array: datafusion.expr.Expr, element: datafusion.expr.Expr, max: datafusion.expr.Expr) → datafusion.expr.Expr¶

Removes the first max elements from the array equal to the given value.

This is an alias for array_remove_n().

datafusion.functions.list_repeat(element: datafusion.expr.Expr, count: datafusion.expr.Expr) → datafusion.expr.Expr¶

Returns an array containing element repeated count times.

This is an alias for array_repeat().

datafusion.functions.list_replace(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) → datafusion.expr.Expr¶

Replaces the first occurrence of from_val with to_val.

This is an alias for array_replace().

datafusion.functions.list_replace_all(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) → datafusion.expr.Expr¶

Replaces all occurrences of from_val with to_val.

This is an alias for array_replace_all().

datafusion.functions.list_replace_n(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr, max: datafusion.expr.Expr) → datafusion.expr.Expr¶

Replaces up to max occurrences of from_val with to_val.

Replaces the first max occurrences of the specified element with another specified element.

This is an alias for array_replace_n().

datafusion.functions.list_resize(array: datafusion.expr.Expr, size: datafusion.expr.Expr, value: datafusion.expr.Expr) → datafusion.expr.Expr¶

Resizes the array to contain size elements.

If size is greater than the array length, the additional entries will be filled with the given value. This is an alias for array_resize().

datafusion.functions.list_slice(array: datafusion.expr.Expr, begin: datafusion.expr.Expr, end: datafusion.expr.Expr, stride: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Returns a slice of the array.

This is an alias for array_slice().

datafusion.functions.list_sort(array: datafusion.expr.Expr, descending: bool = False, null_first: bool = False) → datafusion.expr.Expr¶: This is an alias for array_sort().

datafusion.functions.list_to_string(expr: datafusion.expr.Expr, delimiter: datafusion.expr.Expr) → datafusion.expr.Expr¶

Converts each element to its text representation and joins the results using delimiter.

This is an alias for array_to_string().

datafusion.functions.list_union(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) → datafusion.expr.Expr¶

Returns an array of the elements in the union of array1 and array2.

Duplicate elements are not returned.

This is an alias for array_union().

datafusion.functions.ln(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the natural logarithm (base e) of the argument.

datafusion.functions.log(base: datafusion.expr.Expr, num: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the logarithm of a number for a particular base.

datafusion.functions.log10(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Base 10 logarithm of the argument.

datafusion.functions.log2(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Base 2 logarithm of the argument.

datafusion.functions.lower(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Converts a string to lowercase.

datafusion.functions.lpad(string: datafusion.expr.Expr, count: datafusion.expr.Expr, characters: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Add left padding to a string.

Extends string to a length of count characters by prepending the fill string characters (a space by default). If string is already longer than count, it is truncated on the right.

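The plain-Python sketch below models the padding rule described above. The fill string is repeated as needed and cut to the exact pad length, and strings longer than count are truncated on the right. `lpad_model` is a hypothetical name, and NULL handling is not modeled.

```python
def lpad_model(string: str, count: int, characters: str = " ") -> str:
    """Hypothetical model of lpad on a single string value."""
    if len(string) >= count:
        return string[:count]  # too long: truncated on the right
    pad_len = count - len(string)
    # Repeat the fill characters, then cut to exactly pad_len characters.
    padding = (characters * pad_len)[:pad_len]
    return padding + string


print(lpad_model("hi", 5, "xy"))  # xyxhi
print(repr(lpad_model("hi", 4)))  # '  hi'
print(lpad_model("hello", 3))     # hel
```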
datafusion.functions.ltrim(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Removes all characters, spaces by default, from the beginning of a string.

datafusion.functions.make_array(*args: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns an array using the specified input expressions.

datafusion.functions.make_date(year: datafusion.expr.Expr, month: datafusion.expr.Expr, day: datafusion.expr.Expr) → datafusion.expr.Expr¶: Make a date from year, month and day component parts.

datafusion.functions.make_list(*args: datafusion.expr.Expr) → datafusion.expr.Expr¶

Returns an array using the specified input expressions.

This is an alias for make_array().

datafusion.functions.max(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Aggregate function that returns the maximum value of the argument.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – The value to find the maximum of
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.md5(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Computes an MD5 128-bit checksum for a string expression.

datafusion.functions.mean(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Returns the average (mean) value of the argument.

This is an alias for avg().

datafusion.functions.median(expression: datafusion.expr.Expr, distinct: bool = False, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the median of a set of numbers.

This aggregate function returns the median value of the expression over the rows in each group.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by and null_treatment.

Parameters:

expression – The value to compute the median of
distinct – If True, each distinct value of expression is considered only once
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.min(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Returns the minimum value of the argument.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – The value to find the minimum of
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.named_struct(name_pairs: list[tuple[str, datafusion.expr.Expr]]) → datafusion.expr.Expr¶: Returns a struct built from the given (name, argument) pairs.

datafusion.functions.nanvl(x: datafusion.expr.Expr, y: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns x if x is not NaN. Otherwise returns y.

datafusion.functions.now() → datafusion.expr.Expr¶

Returns the current timestamp in nanoseconds.

This uses the same value for all instances of now() in the same statement.

datafusion.functions.nth_value(expression: datafusion.expr.Expr, n: int, filter: datafusion.expr.Expr | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None, null_treatment: datafusion.common.NullTreatment = NullTreatment.RESPECT_NULLS) → datafusion.expr.Expr¶

Returns the n-th value in a group of values.

This aggregate function will return the n-th value in the partition.

If using the builder functions described in the aggregation section of the user guide, this function ignores the option distinct.

Parameters:

expression – The expression whose n-th value is returned
n – Index of value to return. Starts at 1.
filter – If provided, only compute against rows for which the filter is True
order_by – Set the ordering of the expression to evaluate
null_treatment – Assign whether to respect or ignore null values.
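To illustrate how n and null_treatment interact, the following plain-Python model works on a group that is already ordered. It is a sketch of the documented behavior and not DataFusion's implementation. The name nth_value_model is hypothetical. The model only handles n >= 1 and, following standard SQL NTH_VALUE, returns None when the group holds fewer than n candidate values; that last part is an assumption.

```python
from typing import Any, Optional, Sequence

# Hypothetical plain-Python model of nth_value; not DataFusion's implementation.
def nth_value_model(values: Sequence[Any], n: int, ignore_nulls: bool = False) -> Optional[Any]:
    """Return the n-th (1-based) value of an already-ordered group.

    ignore_nulls=False mirrors NullTreatment.RESPECT_NULLS (the default);
    ignore_nulls=True mirrors NullTreatment.IGNORE_NULLS.
    """
    if n < 1:
        raise ValueError("this model only supports n >= 1")
    candidates = [v for v in values if v is not None] if ignore_nulls else list(values)
    # Assumed: a group with fewer than n candidates yields NULL (None).
    return candidates[n - 1] if len(candidates) >= n else None

ordered = [10, None, 30, 40]
print(nth_value_model(ordered, 2))                     # None
print(nth_value_model(ordered, 2, ignore_nulls=True))  # 30
```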

datafusion.functions.ntile(groups: int, partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) → datafusion.expr.Expr¶

Create an n-tile window function.

This window function divides the rows of the window frame into a given number of groups based on the ordering criteria. It then returns the group to which the current row is assigned. Here is an example of a dataframe with a window ordered by descending points and the associated n-tile value:

+--------+-------+
| points | ntile |
+--------+-------+
| 120    | 1     |
| 100    | 1     |
| 80     | 2     |
| 60     | 2     |
| 40     | 3     |
| 20     | 3     |
+--------+-------+

Parameters:

groups – Number of groups for the n-tile to be divided into.
partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.
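The table above is consistent with the usual SQL NTILE rule. Under that rule, rows are split into groups whose sizes differ by at most one, and any larger groups come first. The sketch below implements that rule in plain Python. ntile_model is a hypothetical helper. When the row count is not a multiple of groups, DataFusion may assign the remainder rows differently, so treat this as an approximation.

```python
# Hypothetical model of standard SQL NTILE; DataFusion's remainder handling may differ.
def ntile_model(num_rows: int, groups: int) -> list[int]:
    """Assign 1-based n-tile numbers to num_rows already-ordered rows."""
    if groups < 1:
        raise ValueError("groups must be positive")
    base, extra = divmod(num_rows, groups)
    result: list[int] = []
    for group in range(1, groups + 1):
        # The first `extra` groups receive one additional row.
        size = base + (1 if group <= extra else 0)
        result.extend([group] * size)
    return result

points = [120, 100, 80, 60, 40, 20]  # already ordered by descending points
print(list(zip(points, ntile_model(len(points), 3))))
```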

datafusion.functions.nullif(expr1: datafusion.expr.Expr, expr2: datafusion.expr.Expr) → datafusion.expr.Expr¶

Returns NULL if expr1 equals expr2; otherwise it returns expr1.

This can be used to perform the inverse operation of the COALESCE expression.

datafusion.functions.nvl(x: datafusion.expr.Expr, y: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns x if x is not NULL. Otherwise returns y.

datafusion.functions.octet_length(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the number of bytes of a string.

datafusion.functions.order_by(expr: datafusion.expr.Expr, ascending: bool = True, nulls_first: bool = True) → datafusion.expr.SortExpr¶: Creates a new sort expression.

datafusion.functions.overlay(string: datafusion.expr.Expr, substring: datafusion.expr.Expr, start: datafusion.expr.Expr, length: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Replace a substring with a new substring.

Replaces the part of string that starts at the start-th character and extends for length characters with substring.
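Python string slicing is enough to sketch the replacement rule. The sketch assumes 1-based character positions. It also assumes, following PostgreSQL's overlay, that an omitted length defaults to the length of substring. overlay_model is a hypothetical name.

```python
from typing import Optional

# Hypothetical plain-Python model of overlay; not DataFusion's implementation.
def overlay_model(string: str, substring: str, start: int, length: Optional[int] = None) -> str:
    """Replace `length` characters of `string`, starting at 1-based `start`, with `substring`."""
    if length is None:
        length = len(substring)  # assumed default when length is omitted
    prefix = string[: start - 1]
    suffix = string[start - 1 + length :]
    return prefix + substring + suffix

print(overlay_model("Txxxxas", "hom", 2, 4))  # Thomas
```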

datafusion.functions.percent_rank(partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) → datafusion.expr.Expr¶

Create a percent_rank window function.

This window function is similar to rank() except that it returns each row's relative rank as a value from 0.0 for the first row to 1.0 for the last, computed as (rank - 1) / (number of rows in the partition - 1). Here is an example of a dataframe with a window ordered by descending points and the associated percent rank:

+--------+--------------+
| points | percent_rank |
+--------+--------------+
| 100    | 0.0          |
| 100    | 0.0          |
| 50     | 0.666667     |
| 25     | 1.0          |
+--------+--------------+

Parameters:

partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.
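The (rank - 1) / (rows - 1) relationship can be checked against the table above with a short plain-Python model. It is a sketch over rows that are already in window order, not DataFusion's implementation. percent_rank_model is hypothetical. Returning 0.0 for a single-row partition is assumed from the SQL-standard definition.

```python
# Hypothetical plain-Python model of percent_rank over pre-sorted rows.
def percent_rank_model(ordered_keys: list) -> list[float]:
    """Compute percent_rank for rows already sorted by the window order."""
    n = len(ordered_keys)
    ranks: list[int] = []
    for i, key in enumerate(ordered_keys):
        # Ties share the rank of the first row in the tie (Olympic-style rank).
        if i > 0 and key == ordered_keys[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(i + 1)
    if n <= 1:
        return [0.0] * n  # assumed: a single-row partition yields 0.0
    return [(r - 1) / (n - 1) for r in ranks]

print(percent_rank_model([100, 100, 50, 25]))
```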

datafusion.functions.pi() → datafusion.expr.Expr¶: Returns an approximate value of π.

datafusion.functions.pow(base: datafusion.expr.Expr, exponent: datafusion.expr.Expr) → datafusion.expr.Expr¶

Returns base raised to the power of exponent.

This is an alias of power().

datafusion.functions.power(base: datafusion.expr.Expr, exponent: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns base raised to the power of exponent.

datafusion.functions.radians(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Converts the argument from degrees to radians.

datafusion.functions.random() → datafusion.expr.Expr¶: Returns a random value in the range 0.0 <= x < 1.0.

datafusion.functions.range(start: datafusion.expr.Expr, stop: datafusion.expr.Expr, step: datafusion.expr.Expr) → datafusion.expr.Expr¶: Create a list of values in the range between start and stop.

datafusion.functions.rank(partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) → datafusion.expr.Expr¶

Create a rank window function.

Returns the rank based upon the window order. Consecutive equal values receive the same rank, but the next different value does not receive the next consecutive rank; instead, its rank is the number of rows that precede it plus one. This is similar to Olympic medals: if two people tie for gold, the next place is bronze and no silver medal is awarded. You should set order_by to produce meaningful results.

Here is an example of a dataframe with a window ordered by descending points and the associated rank:

+--------+------+
| points | rank |
+--------+------+
| 100    | 1    |
| 100    | 1    |
| 50     | 3    |
| 25     | 4    |
+--------+------+

Parameters:

partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.
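The Olympic-style gaps described above can be reproduced with a small plain-Python model that works on rows already in window order. rank_model is a hypothetical helper for illustration, not DataFusion's implementation.

```python
# Hypothetical plain-Python model of rank over pre-sorted rows.
def rank_model(ordered_keys: list) -> list[int]:
    """Olympic-style rank for rows already sorted by the window order."""
    ranks: list[int] = []
    for i, key in enumerate(ordered_keys):
        if i > 0 and key == ordered_keys[i - 1]:
            ranks.append(ranks[-1])  # ties share the earlier rank
        else:
            ranks.append(i + 1)      # rows that precede this one, plus one
    return ranks

print(rank_model([100, 100, 50, 25]))  # [1, 1, 3, 4]
```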

datafusion.functions.regexp_count(string: datafusion.expr.Expr, pattern: datafusion.expr.Expr, start: datafusion.expr.Expr, flags: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Returns the number of matches in a string.

The optional start position (the first position is 1) sets where the search for the regular expression begins.
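The counting behavior, including the 1-based start position and a case-insensitive flag, can be approximated with Python's re module. This is only an approximation: Python's regex syntax overlaps with the Rust regex crate but is not identical. Only the 'i' flag is modeled, and regexp_count_model is a hypothetical name.

```python
import re

# Hypothetical approximation of regexp_count using Python's re module.
def regexp_count_model(string: str, pattern: str, start: int = 1, flags: str = "") -> int:
    """Count non-overlapping matches of pattern in string, beginning at 1-based start."""
    re_flags = re.IGNORECASE if "i" in flags else 0  # only 'i' is modeled here
    return sum(1 for _ in re.finditer(pattern, string[start - 1 :], re_flags))

print(regexp_count_model("abcAbAbc", "abc", 1, "i"))  # 2
```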

datafusion.functions.regexp_like(string: datafusion.expr.Expr, regex: datafusion.expr.Expr, flags: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Find if any regular expression (regex) matches exist.

Tests a string against a regular expression, returning true if there is at least one match and false otherwise.

datafusion.functions.regexp_match(string: datafusion.expr.Expr, regex: datafusion.expr.Expr, flags: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Perform regular expression (regex) matching.

Returns an array built from the leftmost-first match of regex in string: one element per capture group, or the whole match if regex has no capture groups. If there is no match, the result is NULL.
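The shape of the result can be illustrated with a Python re sketch. It assumes semantics like PostgreSQL's regexp_match: capture groups of the first match, otherwise the whole match, and None when nothing matches. regexp_match_model is a hypothetical name. Skipping groups that did not take part in the match is an assumption of this model.

```python
import re
from typing import Optional

# Hypothetical approximation of regexp_match using Python's re module.
def regexp_match_model(string: str, regex: str) -> Optional[list[str]]:
    """Return the capture groups of the first match, or the whole match if there are no groups."""
    m = re.search(regex, string)
    if m is None:
        return None  # no match yields NULL
    if m.re.groups == 0:
        return [m.group(0)]
    # Assumed: groups that did not participate in the match are skipped.
    return [g for g in m.groups() if g is not None]

print(regexp_match_model("foo123bar456", r"(\d+)"))   # ['123']
print(regexp_match_model("foo123bar456", r"[a-z]+"))  # ['foo']
```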

datafusion.functions.regexp_replace(string: datafusion.expr.Expr, pattern: datafusion.expr.Expr, replacement: datafusion.expr.Expr, flags: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Replaces substring(s) matching a PCRE-like regular expression.

The full list of supported features and syntax can be found at <https://docs.rs/regex/latest/regex/#syntax>

The supported flags, plus the additional flag ‘g’ (replace every match rather than only the first), are listed at <https://docs.rs/regex/latest/regex/#grouping-and-flags>
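The effect of the 'g' flag can be sketched with re.sub. Without 'g', only the first match is replaced; with it, every match is replaced. This is a Python approximation, and regexp_replace_model is a hypothetical name. One difference to watch: group references in the replacement use Python's \1 syntax here, whereas the Rust regex crate uses $1.

```python
import re

# Hypothetical approximation of regexp_replace flag handling with Python's re.
def regexp_replace_model(string: str, pattern: str, replacement: str, flags: str = "") -> str:
    """Replace the first match of pattern, or every match when flags contains 'g'."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    count = 0 if "g" in flags else 1  # re.sub treats count=0 as "replace all"
    return re.sub(pattern, replacement, string, count=count, flags=re_flags)

print(regexp_replace_model("aaa", "a", "b"))       # baa
print(regexp_replace_model("aaa", "a", "b", "g"))  # bbb
```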

datafusion.functions.regr_avgx(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the average of the independent variable x.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation section of the user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_avgy(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the average of the dependent variable y.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation section of the user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_count(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Counts the number of rows in which both expressions are not null.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation section of the user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_intercept(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the intercept from the linear regression.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation section of the user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_r2(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the R-squared value from linear regression.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation section of the user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_slope(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the slope from linear regression.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation section of the user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_sxx(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the sum of squared deviations of the independent variable x from its mean.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation section of the user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_sxy(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the sum of products of the deviations of x and y from their respective means.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation section of the user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_syy(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the sum of squared deviations of the dependent variable y from its mean.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation section of the user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True
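The regr_* aggregates above all derive from the same non-null (y, x) pairs. The plain-Python sketch below uses the textbook least-squares definitions: slope = sxy / sxx, intercept = avgy - slope * avgx, and r2 = sxy² / (sxx · syy). regr_model is hypothetical. Returning None whenever a variance is zero is a simplification; the SQL standard and DataFusion may treat some of those edge cases differently.

```python
from typing import Optional, Sequence

# Hypothetical plain-Python model of the regr_* family; not DataFusion's implementation.
def regr_model(ys: Sequence[Optional[float]], xs: Sequence[Optional[float]]) -> dict:
    """Compute the regr_* statistics over the pairs where both y and x are non-null."""
    pairs = [(y, x) for y, x in zip(ys, xs) if y is not None and x is not None]
    n = len(pairs)
    if n == 0:
        return {"regr_count": 0}
    avgx = sum(x for _, x in pairs) / n
    avgy = sum(y for y, _ in pairs) / n
    sxx = sum((x - avgx) ** 2 for _, x in pairs)
    syy = sum((y - avgy) ** 2 for y, _ in pairs)
    sxy = sum((x - avgx) * (y - avgy) for y, x in pairs)
    # Simplified edge cases: a zero variance yields None.
    slope = sxy / sxx if sxx != 0 else None
    intercept = avgy - slope * avgx if slope is not None else None
    r2 = sxy ** 2 / (sxx * syy) if sxx != 0 and syy != 0 else None
    return {
        "regr_count": n, "regr_avgx": avgx, "regr_avgy": avgy,
        "regr_sxx": sxx, "regr_syy": syy, "regr_sxy": sxy,
        "regr_slope": slope, "regr_intercept": intercept, "regr_r2": r2,
    }

# y = 2x + 1, plus one pair that is skipped because y is NULL
stats = regr_model([3.0, 5.0, 7.0, None], [1.0, 2.0, 3.0, 4.0])
print(stats["regr_slope"], stats["regr_intercept"])  # 2.0 1.0
```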

datafusion.functions.repeat(string: datafusion.expr.Expr, n: datafusion.expr.Expr) → datafusion.expr.Expr¶: Repeats the string n times.

datafusion.functions.replace(string: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) → datafusion.expr.Expr¶: Replaces all occurrences of from_val with to_val in the string.

datafusion.functions.reverse(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Reverse the string argument.

datafusion.functions.right(string: datafusion.expr.Expr, n: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the last n characters in the string.

datafusion.functions.round(value: datafusion.expr.Expr, decimal_places: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Round the argument to the nearest integer.

If the optional decimal_places is specified, round to that many decimal places instead. A negative value rounds to the left of the decimal point; for example, round(lit(125.2345), lit(-2)) yields 100.0.
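Python users should be aware that the built-in round uses banker's rounding, so it can disagree with SQL-style rounding on ties. The sketch below rounds half away from zero, which is what Rust's f64::round does. Whether DataFusion relies on that method is an assumption. DataFusion works on binary floating-point values, so results may also differ from this decimal-based sketch at representational edge cases. round_model is a hypothetical name.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical model of round(value, decimal_places) with ties rounded away from zero.
def round_model(value: float, decimal_places: int = 0) -> float:
    """Round half away from zero to the given (possibly negative) number of decimal places."""
    quantum = Decimal(1).scaleb(-decimal_places)  # e.g. -2 -> 1E+2, 2 -> 0.01
    # Decimal(str(value)) uses the shortest decimal repr, avoiding binary noise.
    rounded = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    return float(rounded)

print(round_model(125.2345, -2))  # 100.0
print(round_model(2.5))           # 3.0, whereas Python's round(2.5) == 2
```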

datafusion.functions.row_number(partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) → datafusion.expr.Expr¶

Create a row number window function.

Returns the number of the current row within its window partition, starting at 1.

Here is an example of row_number on a simple DataFrame:

+--------+------------+
| points | row number |
+--------+------------+
| 100    | 1          |
| 100    | 2          |
| 50     | 3          |
| 25     | 4          |
+--------+------------+

Parameters:

partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.
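To show how partition_by restarts the numbering, here is a plain-Python model. The column names team and points, and the helper row_number_model, are hypothetical. Unlike rank(), tied rows still receive distinct numbers.

```python
from collections import defaultdict

# Hypothetical plain-Python model of row_number with partition_by and order_by.
def row_number_model(rows: list[dict], partition_key: str, order_key: str,
                     descending: bool = True) -> list[tuple]:
    """Number rows 1, 2, 3, ... within each partition after sorting by order_key."""
    partitions: dict = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    numbered = []
    for key, members in partitions.items():
        ordered = sorted(members, key=lambda r: r[order_key], reverse=descending)
        # Ties still get distinct numbers; their relative order is not meaningful.
        for number, row in enumerate(ordered, start=1):
            numbered.append((key, row[order_key], number))
    return numbered

rows = [
    {"team": "a", "points": 50}, {"team": "a", "points": 100},
    {"team": "b", "points": 25}, {"team": "b", "points": 100},
]
for entry in row_number_model(rows, "team", "points"):
    print(entry)
```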

datafusion.functions.rpad(string: datafusion.expr.Expr, count: datafusion.expr.Expr, characters: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Add right padding to a string.

Extends string to count characters by appending characters (a space by default). If string is already longer than count, it is truncated.
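The padding and truncation rules can be sketched in plain Python. A multi-character fill repeats cyclically, as PostgreSQL's rpad does. That DataFusion behaves identically is an assumption. rpad_model is hypothetical, and it rejects an empty fill rather than guessing its behavior.

```python
# Hypothetical plain-Python model of rpad; not DataFusion's implementation.
def rpad_model(string: str, count: int, characters: str = " ") -> str:
    """Right-pad string to count characters, truncating if it is already longer."""
    count = max(count, 0)
    if len(string) >= count:
        return string[:count]
    if not characters:
        raise ValueError("characters must be non-empty in this sketch")
    needed = count - len(string)
    # Repeat the fill enough times, then cut it to the exact length needed.
    fill = (characters * (needed // len(characters) + 1))[:needed]
    return string + fill

print(repr(rpad_model("hi", 5)))  # 'hi   '
print(rpad_model("hi", 5, "xy"))  # hixyx
print(rpad_model("hello", 3))     # hel
```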

datafusion.functions.rtrim(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Removes trailing characters, spaces by default, from a string.

datafusion.functions.sha224(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Computes the SHA-224 hash of a binary string.

datafusion.functions.sha256(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Computes the SHA-256 hash of a binary string.

datafusion.functions.sha384(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Computes the SHA-384 hash of a binary string.

datafusion.functions.sha512(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Computes the SHA-512 hash of a binary string.

datafusion.functions.signum(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the sign of the argument (-1, 0, +1).

datafusion.functions.sin(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the sine of the argument.

datafusion.functions.sinh(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the hyperbolic sine of the argument.

datafusion.functions.split_part(string: datafusion.expr.Expr, delimiter: datafusion.expr.Expr, index: datafusion.expr.Expr) → datafusion.expr.Expr¶

Split a string and return one part.

Splits string on delimiter and returns the field at the given 1-based index.
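The following plain-Python sketch shows how the index selects a field. Only positive indexes are modeled. Returning an empty string for an index past the last field is assumed, as in PostgreSQL. split_part_model is a hypothetical name.

```python
# Hypothetical plain-Python model of split_part for positive indexes.
def split_part_model(string: str, delimiter: str, index: int) -> str:
    """Return the 1-based index-th field of string split on delimiter."""
    if index < 1:
        raise ValueError("this sketch models only positive indexes")
    fields = string.split(delimiter)
    # Assumed: an index past the last field yields an empty string.
    return fields[index - 1] if index <= len(fields) else ""

print(split_part_model("1.2.3.4.5", ".", 3))  # 3
```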

datafusion.functions.sqrt(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the square root of the argument.

datafusion.functions.starts_with(string: datafusion.expr.Expr, prefix: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns true if string starts with prefix.

datafusion.functions.stddev(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the standard deviation of the argument.

If using the builder functions described in the aggregation section of the user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – The value to compute the standard deviation of
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.stddev_pop(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the population standard deviation of the argument.

If using the builder functions described in the aggregation section of the user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – The value to compute the population standard deviation of
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.stddev_samp(arg: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the sample standard deviation of the argument.

This is an alias for stddev().
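Sample and population standard deviation differ only in their divisor: n - 1 for stddev/stddev_samp and n for stddev_pop. The plain-Python model below makes this visible. NULLs are skipped, as SQL aggregates conventionally do. stddev_model is hypothetical. Returning None when there are too few rows is an assumption of this model.

```python
import math
from typing import Optional, Sequence

# Hypothetical plain-Python model of stddev / stddev_samp / stddev_pop.
def stddev_model(values: Sequence[Optional[float]], population: bool = False) -> Optional[float]:
    """Sample (default) or population standard deviation, skipping NULLs."""
    xs = [v for v in values if v is not None]
    n = len(xs)
    divisor = n if population else n - 1
    if divisor <= 0:
        return None  # assumed: too few rows yields NULL
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / divisor)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, None]
print(stddev_model(data, population=True))  # 2.0
print(stddev_model(data))                   # sqrt(32 / 7), about 2.138
```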

datafusion.functions.string_agg(expression: datafusion.expr.Expr, delimiter: str, filter: datafusion.expr.Expr | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) → datafusion.expr.Expr¶

Concatenates the input strings.

This aggregate function concatenates input strings, ignoring null values and separating them with the specified delimiter. Non-string values are converted to their string equivalents.

If using the builder functions described in the aggregation section of the user guide, this function ignores the options distinct and null_treatment.

Parameters:

expression – The values to concatenate
delimiter – Text to place between each value of expression
filter – If provided, only compute against rows for which the filter is True
order_by – Set the ordering of the expression to evaluate
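Here is a plain-Python model of how delimiter, filter, and order_by combine. string_agg_model and its filter_mask and sort_key parameters are hypothetical stand-ins for the real arguments. In this model, sorting happens on the values themselves. Returning None when no rows remain is assumed.

```python
from typing import Any, Optional, Sequence

# Hypothetical plain-Python model of string_agg; not DataFusion's implementation.
def string_agg_model(values: Sequence[Any], delimiter: str,
                     filter_mask: Optional[Sequence[bool]] = None,
                     sort_key=None) -> Optional[str]:
    """Concatenate non-null values with delimiter, optionally filtered and ordered."""
    rows = list(values)
    if filter_mask is not None:
        rows = [v for v, keep in zip(rows, filter_mask) if keep]
    rows = [v for v in rows if v is not None]  # nulls are ignored
    if sort_key is not None:
        rows = sorted(rows, key=sort_key)
    if not rows:
        return None  # assumed: no remaining rows yields NULL
    return delimiter.join(str(v) for v in rows)

print(string_agg_model(["b", None, "a", "c"], ", "))                # b, a, c
print(string_agg_model(["b", None, "a", "c"], ", ", sort_key=str))  # a, b, c
```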

datafusion.functions.strpos(string: datafusion.expr.Expr, substring: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the 1-based position of the first occurrence of substring in string, or 0 if substring is not found.
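In Python, the usual SQL convention for strpos (1-based positions, 0 when the substring is absent) maps onto str.find plus one. The model below assumes that convention. strpos_model is a hypothetical name.

```python
# Hypothetical plain-Python model of strpos.
def strpos_model(string: str, substring: str) -> int:
    """1-based position of the first occurrence of substring, or 0 when absent."""
    return string.find(substring) + 1  # str.find returns -1 when not found

print(strpos_model("datafusion", "fus"))  # 5
print(strpos_model("datafusion", "xyz"))  # 0
```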

datafusion.functions.struct(*args: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns a struct with the given arguments.

datafusion.functions.substr(string: datafusion.expr.Expr, position: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the substring of string from position (1-based) to the end.

datafusion.functions.substr_index(string: datafusion.expr.Expr, delimiter: datafusion.expr.Expr, count: datafusion.expr.Expr) → datafusion.expr.Expr¶

Returns an indexed substring.

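Returns the part of string before count occurrences of delimiter. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned.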
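The counting rule for positive and negative count can be modeled in plain Python. substr_index_model is hypothetical. In this model, a count of 0 returns an empty string, and an empty delimiter raises ValueError (from str.split).

```python
# Hypothetical plain-Python model of substr_index.
def substr_index_model(string: str, delimiter: str, count: int) -> str:
    """Substring before `count` occurrences of delimiter (from the right if count < 0)."""
    if count == 0:
        return ""
    parts = string.split(delimiter)
    kept = parts[:count] if count > 0 else parts[count:]
    return delimiter.join(kept)

print(substr_index_model("www.apache.org", ".", 1))   # www
print(substr_index_model("www.apache.org", ".", -1))  # org
print(substr_index_model("www.apache.org", ".", 2))   # www.apache
```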

datafusion.functions.substring(string: datafusion.expr.Expr, position: datafusion.expr.Expr, length: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns length characters of string, starting at position (1-based).

datafusion.functions.sum(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the sum of a set of numbers.

This aggregate function expects a numeric expression.

If using the builder functions described in the aggregation section of the user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – The values to sum
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.tan(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the tangent of the argument.

datafusion.functions.tanh(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Returns the hyperbolic tangent of the argument.

datafusion.functions.to_hex(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Converts an integer to a hexadecimal string.

datafusion.functions.to_timestamp(arg: datafusion.expr.Expr, *formatters: datafusion.expr.Expr) → datafusion.expr.Expr¶

Converts a string and optional formats to a Timestamp in nanoseconds.

For the formatter syntax, see the strftime documentation of the Rust chrono crate: <https://docs.rs/chrono/latest/chrono/format/strftime/index.html>

datafusion.functions.to_timestamp_micros(arg: datafusion.expr.Expr, *formatters: datafusion.expr.Expr) → datafusion.expr.Expr¶

Converts a string and optional formats to a Timestamp in microseconds.

See to_timestamp() for a description of how to use formatters.

datafusion.functions.to_timestamp_millis(arg: datafusion.expr.Expr, *formatters: datafusion.expr.Expr) → datafusion.expr.Expr¶

Converts a string and optional formats to a Timestamp in milliseconds.

See to_timestamp() for a description of how to use formatters.

datafusion.functions.to_timestamp_nanos(arg: datafusion.expr.Expr, *formatters: datafusion.expr.Expr) → datafusion.expr.Expr¶

Converts a string and optional formats to a Timestamp in nanoseconds.

See to_timestamp() for a description of how to use formatters.

datafusion.functions.to_timestamp_seconds(arg: datafusion.expr.Expr, *formatters: datafusion.expr.Expr) → datafusion.expr.Expr¶

Converts a string and optional formats to a Timestamp in seconds.

See to_timestamp() for a description of how to use formatters.
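The to_timestamp variants differ only in the unit of the epoch count they produce. to_timestamp and to_timestamp_nanos both yield nanoseconds. The sketch below parses with Python's strptime, whose common directives (%Y, %m, %d, %H, %M, %S) match chrono's; other directives may not. It assumes a naive input interpreted as UTC. epoch_ticks and TICKS_PER_SECOND are hypothetical names.

```python
from datetime import datetime, timezone

# Hypothetical sketch: ticks per second for each to_timestamp_* variant.
TICKS_PER_SECOND = {"seconds": 1, "millis": 10**3, "micros": 10**6, "nanos": 10**9}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_ticks(text: str, fmt: str, unit: str = "nanos") -> int:
    """Parse text with a strftime-style format and count ticks since the Unix epoch (assumed UTC)."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    delta = parsed - EPOCH
    # Work in integer microseconds (Python's finest resolution) to avoid float error.
    micros = (delta.days * 86_400 + delta.seconds) * 10**6 + delta.microseconds
    return micros * TICKS_PER_SECOND[unit] // 10**6

for unit in TICKS_PER_SECOND:
    print(unit, epoch_ticks("1970-01-02 00:00:00", "%Y-%m-%d %H:%M:%S", unit))
```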

datafusion.functions.to_unixtime(string: datafusion.expr.Expr, *format_arguments: datafusion.expr.Expr) → datafusion.expr.Expr¶: Converts a string and optional formats to a Unix time (seconds since the Unix epoch).

datafusion.functions.translate(string: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) → datafusion.expr.Expr¶: Replaces the characters in from_val with the counterpart in to_val.

datafusion.functions.trim(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Removes leading and trailing characters, spaces by default, from a string.

datafusion.functions.trunc(num: datafusion.expr.Expr, precision: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶: Truncate the number toward zero with optional precision.

datafusion.functions.upper(arg: datafusion.expr.Expr) → datafusion.expr.Expr¶: Converts a string to uppercase.

datafusion.functions.uuid() → datafusion.expr.Expr¶: Returns uuid v4 as a string value.

datafusion.functions.var(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the sample variance of the argument.

This is an alias for var_samp().

datafusion.functions.var_pop(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the population variance of the argument.

If using the builder functions described in the aggregation section of the user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – The variable to compute the variance for
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.var_samp(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the sample variance of the argument.

If using the builder functions described in the aggregation section of the user guide, this function ignores the options order_by, null_treatment, and distinct.

Parameters:

expression – The variable to compute the variance for
filter – If provided, only compute against rows for which the filter is True

datafusion.functions.var_sample(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) → datafusion.expr.Expr¶

Computes the sample variance of the argument.

This is an alias for var_samp().
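This plain-Python model of the variance functions shows two things: var/var_samp/var_sample divide by n - 1 while var_pop divides by n, and the filter argument excludes rows before aggregation. variance_model and filter_mask are hypothetical. Returning None when there are too few rows is an assumption.

```python
from typing import Optional, Sequence

# Hypothetical plain-Python model of var_samp / var_pop with a row filter.
def variance_model(values: Sequence[Optional[float]], population: bool = False,
                   filter_mask: Optional[Sequence[bool]] = None) -> Optional[float]:
    """var_samp (default) or var_pop over non-null values that pass the filter."""
    if filter_mask is not None:
        values = [v for v, keep in zip(values, filter_mask) if keep]
    xs = [v for v in values if v is not None]
    n = len(xs)
    divisor = n if population else n - 1
    if divisor <= 0:
        return None  # assumed: too few rows yields NULL
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / divisor

data = [1.0, 2.0, 3.0, 4.0, 100.0]
keep = [True, True, True, True, False]  # e.g. rows where col("x") < lit(50)
print(variance_model(data, filter_mask=keep))                   # 5/3, about 1.667
print(variance_model(data, population=True, filter_mask=keep))  # 1.25
```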

datafusion.functions.when(when: datafusion.expr.Expr, then: datafusion.expr.Expr) → datafusion.expr.CaseBuilder¶

Create a case expression that has no base expression.

Returns a CaseBuilder whose first branch yields then for rows where the condition when is true. See CaseBuilder for how to add further branches and a default value.

datafusion.functions.window(name: str, args: list[datafusion.expr.Expr], partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None, window_frame: datafusion.expr.WindowFrame | None = None, ctx: datafusion.context.SessionContext | None = None) → datafusion.expr.Expr¶

Creates a new Window function expression.

This interface will soon be deprecated. Instead of using this interface, users should call the window functions directly. For example, to perform a lag use:

df.select(functions.lag(col("a")).partition_by(col("b")).build())
