datafusion.functions

User functions for operating on Expr.

Functions

abs(→ datafusion.expr.Expr)

Return the absolute value of a given number.

acos(→ datafusion.expr.Expr)

Returns the arc cosine or inverse cosine of a number.

acosh(→ datafusion.expr.Expr)

Returns inverse hyperbolic cosine.

alias(→ datafusion.expr.Expr)

Creates an alias expression.

approx_distinct(→ datafusion.expr.Expr)

Returns the approximate number of distinct values.

approx_median(→ datafusion.expr.Expr)

Returns the approximate median value.

approx_percentile_cont(→ datafusion.expr.Expr)

Returns the value that is approximately at a given percentile of expr.

approx_percentile_cont_with_weight(→ datafusion.expr.Expr)

Returns the value of the weighted approximate percentile.

array(→ datafusion.expr.Expr)

Returns an array using the specified input expressions.

array_agg(→ datafusion.expr.Expr)

Aggregate values into an array.

array_append(→ datafusion.expr.Expr)

Appends an element to the end of an array.

array_cat(→ datafusion.expr.Expr)

Concatenates the input arrays.

array_concat(→ datafusion.expr.Expr)

Concatenates the input arrays.

array_dims(→ datafusion.expr.Expr)

Returns an array of the array's dimensions.

array_distinct(→ datafusion.expr.Expr)

Returns distinct values from the array after removing duplicates.

array_element(→ datafusion.expr.Expr)

Extracts the element with the index n from the array.

array_except(→ datafusion.expr.Expr)

Returns the elements that appear in array1 but not in array2.

array_extract(→ datafusion.expr.Expr)

Extracts the element with the index n from the array.

array_has(→ datafusion.expr.Expr)

Returns true if the element appears in the first array, otherwise false.

array_has_all(→ datafusion.expr.Expr)

Determines whether second_array is completely contained in first_array.

array_has_any(→ datafusion.expr.Expr)

Determine if there is an overlap between first_array and second_array.

array_indexof(→ datafusion.expr.Expr)

Return the position of the first occurrence of element in array.

array_intersect(→ datafusion.expr.Expr)

Returns the intersection of array1 and array2.

array_join(→ datafusion.expr.Expr)

Converts each element to its text representation.

array_length(→ datafusion.expr.Expr)

Returns the length of the array.

array_ndims(→ datafusion.expr.Expr)

Returns the number of dimensions of the array.

array_pop_back(→ datafusion.expr.Expr)

Returns the array without the last element.

array_pop_front(→ datafusion.expr.Expr)

Returns the array without the first element.

array_position(→ datafusion.expr.Expr)

Return the position of the first occurrence of element in array.

array_positions(→ datafusion.expr.Expr)

Searches for an element in the array and returns all occurrences.

array_prepend(→ datafusion.expr.Expr)

Prepends an element to the beginning of an array.

array_push_back(→ datafusion.expr.Expr)

Appends an element to the end of an array.

array_push_front(→ datafusion.expr.Expr)

Prepends an element to the beginning of an array.

array_remove(→ datafusion.expr.Expr)

Removes the first element from the array equal to the given value.

array_remove_all(→ datafusion.expr.Expr)

Removes all elements from the array equal to the given value.

array_remove_n(→ datafusion.expr.Expr)

Removes the first max elements from the array equal to the given value.

array_repeat(→ datafusion.expr.Expr)

Returns an array containing element count times.

array_replace(→ datafusion.expr.Expr)

Replaces the first occurrence of from_val with to_val.

array_replace_all(→ datafusion.expr.Expr)

Replaces all occurrences of from_val with to_val.

array_replace_n(→ datafusion.expr.Expr)

Replace n occurrences of from_val with to_val.

array_resize(→ datafusion.expr.Expr)

Returns an array with the specified size filled.

array_slice(→ datafusion.expr.Expr)

Returns a slice of the array.

array_sort(→ datafusion.expr.Expr)

Sort an array.

array_to_string(→ datafusion.expr.Expr)

Converts each element to its text representation.

array_union(→ datafusion.expr.Expr)

Returns an array of the elements in the union of array1 and array2.

arrow_typeof(→ datafusion.expr.Expr)

Returns the Arrow type of the expression.

ascii(→ datafusion.expr.Expr)

Returns the numeric code of the first character of the argument.

asin(→ datafusion.expr.Expr)

Returns the arc sine or inverse sine of a number.

asinh(→ datafusion.expr.Expr)

Returns inverse hyperbolic sine.

atan(→ datafusion.expr.Expr)

Returns inverse tangent of a number.

atan2(→ datafusion.expr.Expr)

Returns inverse tangent of a division given in the argument.

atanh(→ datafusion.expr.Expr)

Returns inverse hyperbolic tangent.

avg(→ datafusion.expr.Expr)

Returns the average value.

bit_and(→ datafusion.expr.Expr)

Computes the bitwise AND of the argument.

bit_length(→ datafusion.expr.Expr)

Returns the number of bits in the string argument.

bit_or(→ datafusion.expr.Expr)

Computes the bitwise OR of the argument.

bit_xor(→ datafusion.expr.Expr)

Computes the bitwise XOR of the argument.

bool_and(→ datafusion.expr.Expr)

Computes the boolean AND of the argument.

bool_or(→ datafusion.expr.Expr)

Computes the boolean OR of the argument.

btrim(→ datafusion.expr.Expr)

Removes all characters, spaces by default, from both sides of a string.

case(→ datafusion.expr.CaseBuilder)

Create a case expression.

cbrt(→ datafusion.expr.Expr)

Returns the cube root of a number.

ceil(→ datafusion.expr.Expr)

Returns the nearest integer greater than or equal to argument.

char_length(→ datafusion.expr.Expr)

The number of characters in the string.

character_length(→ datafusion.expr.Expr)

Returns the number of characters in the argument.

chr(→ datafusion.expr.Expr)

Converts the Unicode code point to a UTF8 character.

coalesce(→ datafusion.expr.Expr)

Returns the value of the first expr in args which is not NULL.

col(→ datafusion.expr.Expr)

Creates a column reference expression.

concat(→ datafusion.expr.Expr)

Concatenates the text representations of all the arguments.

concat_ws(→ datafusion.expr.Expr)

Concatenates the list args with the separator.

corr(→ datafusion.expr.Expr)

Returns the correlation coefficient between value1 and value2.

cos(→ datafusion.expr.Expr)

Returns the cosine of the argument.

cosh(→ datafusion.expr.Expr)

Returns the hyperbolic cosine of the argument.

cot(→ datafusion.expr.Expr)

Returns the cotangent of the argument.

count(→ datafusion.expr.Expr)

Returns the number of rows that match the given arguments.

count_star(→ datafusion.expr.Expr)

Create a COUNT(1) aggregate expression.

covar(→ datafusion.expr.Expr)

Computes the sample covariance.

covar_pop(→ datafusion.expr.Expr)

Computes the population covariance.

covar_samp(→ datafusion.expr.Expr)

Computes the sample covariance.

cume_dist(→ datafusion.expr.Expr)

Create a cumulative distribution window function.

current_date(→ datafusion.expr.Expr)

Returns current UTC date as a Date32 value.

current_time(→ datafusion.expr.Expr)

Returns current UTC time as a Time64 value.

date_bin(→ datafusion.expr.Expr)

Coerces an arbitrary timestamp to the start of the nearest specified interval.

date_part(→ datafusion.expr.Expr)

Extracts a subfield from the date.

date_trunc(→ datafusion.expr.Expr)

Truncates the date to a specified level of precision.

datepart(→ datafusion.expr.Expr)

Return a specified part of a date.

datetrunc(→ datafusion.expr.Expr)

Truncates the date to a specified level of precision.

decode(→ datafusion.expr.Expr)

Decode the input, using the encoding. encoding can be base64 or hex.

degrees(→ datafusion.expr.Expr)

Converts the argument from radians to degrees.

dense_rank(→ datafusion.expr.Expr)

Create a dense_rank window function.

digest(→ datafusion.expr.Expr)

Computes the binary hash of an expression using the specified algorithm.

encode(→ datafusion.expr.Expr)

Encode the input, using the encoding. encoding can be base64 or hex.

ends_with(→ datafusion.expr.Expr)

Returns true if the string ends with the suffix, false otherwise.

exp(→ datafusion.expr.Expr)

Returns the exponential of the argument.

factorial(→ datafusion.expr.Expr)

Returns the factorial of the argument.

find_in_set(→ datafusion.expr.Expr)

Find a string in a list of strings.

first_value(→ datafusion.expr.Expr)

Returns the first value in a group of values.

flatten(→ datafusion.expr.Expr)

Flattens an array of arrays into a single array.

floor(→ datafusion.expr.Expr)

Returns the nearest integer less than or equal to the argument.

from_unixtime(→ datafusion.expr.Expr)

Converts an integer to RFC3339 timestamp format string.

gcd(→ datafusion.expr.Expr)

Returns the greatest common divisor.

in_list(→ datafusion.expr.Expr)

Returns whether the argument is contained within the list values.

initcap(→ datafusion.expr.Expr)

Set the initial letter of each word to capital.

isnan(→ datafusion.expr.Expr)

Returns true if a given number is +NaN or -NaN; otherwise returns false.

iszero(→ datafusion.expr.Expr)

Returns true if a given number is +0.0 or -0.0; otherwise returns false.

lag(→ datafusion.expr.Expr)

Create a lag window function.

last_value(→ datafusion.expr.Expr)

Returns the last value in a group of values.

lcm(→ datafusion.expr.Expr)

Returns the least common multiple.

lead(→ datafusion.expr.Expr)

Create a lead window function.

left(→ datafusion.expr.Expr)

Returns the first n characters in the string.

length(→ datafusion.expr.Expr)

The number of characters in the string.

levenshtein(→ datafusion.expr.Expr)

Returns the Levenshtein distance between the two given strings.

list_append(→ datafusion.expr.Expr)

Appends an element to the end of an array.

list_dims(→ datafusion.expr.Expr)

Returns an array of the array's dimensions.

list_distinct(→ datafusion.expr.Expr)

Returns distinct values from the array after removing duplicates.

list_element(→ datafusion.expr.Expr)

Extracts the element with the index n from the array.

list_except(→ datafusion.expr.Expr)

Returns the elements that appear in array1 but not in array2.

list_extract(→ datafusion.expr.Expr)

Extracts the element with the index n from the array.

list_indexof(→ datafusion.expr.Expr)

Return the position of the first occurrence of element in array.

list_intersect(→ datafusion.expr.Expr)

Returns the intersection of array1 and array2.

list_join(→ datafusion.expr.Expr)

Converts each element to its text representation.

list_length(→ datafusion.expr.Expr)

Returns the length of the array.

list_ndims(→ datafusion.expr.Expr)

Returns the number of dimensions of the array.

list_position(→ datafusion.expr.Expr)

Return the position of the first occurrence of element in array.

list_positions(→ datafusion.expr.Expr)

Searches for an element in the array and returns all occurrences.

list_prepend(→ datafusion.expr.Expr)

Prepends an element to the beginning of an array.

list_push_back(→ datafusion.expr.Expr)

Appends an element to the end of an array.

list_push_front(→ datafusion.expr.Expr)

Prepends an element to the beginning of an array.

list_remove(→ datafusion.expr.Expr)

Removes the first element from the array equal to the given value.

list_remove_all(→ datafusion.expr.Expr)

Removes all elements from the array equal to the given value.

list_remove_n(→ datafusion.expr.Expr)

Removes the first max elements from the array equal to the given value.

list_replace(→ datafusion.expr.Expr)

Replaces the first occurrence of from_val with to_val.

list_replace_all(→ datafusion.expr.Expr)

Replaces all occurrences of from_val with to_val.

list_replace_n(→ datafusion.expr.Expr)

Replace n occurrences of from_val with to_val.

list_resize(→ datafusion.expr.Expr)

Returns an array with the specified size filled.

list_slice(→ datafusion.expr.Expr)

Returns a slice of the array.

list_sort(→ datafusion.expr.Expr)

This is an alias for array_sort().

list_to_string(→ datafusion.expr.Expr)

Converts each element to its text representation.

list_union(→ datafusion.expr.Expr)

Returns an array of the elements in the union of array1 and array2.

ln(→ datafusion.expr.Expr)

Returns the natural logarithm (base e) of the argument.

log(→ datafusion.expr.Expr)

Returns the logarithm of a number for a particular base.

log10(→ datafusion.expr.Expr)

Base 10 logarithm of the argument.

log2(→ datafusion.expr.Expr)

Base 2 logarithm of the argument.

lower(→ datafusion.expr.Expr)

Converts a string to lowercase.

lpad(→ datafusion.expr.Expr)

Add left padding to a string.

ltrim(→ datafusion.expr.Expr)

Removes all characters, spaces by default, from the beginning of a string.

make_array(→ datafusion.expr.Expr)

Returns an array using the specified input expressions.

make_date(→ datafusion.expr.Expr)

Make a date from year, month and day component parts.

max(→ datafusion.expr.Expr)

Aggregate function that returns the maximum value of the argument.

md5(→ datafusion.expr.Expr)

Computes an MD5 128-bit checksum for a string expression.

mean(→ datafusion.expr.Expr)

Returns the average (mean) value of the argument.

median(→ datafusion.expr.Expr)

Computes the median of a set of numbers.

min(→ datafusion.expr.Expr)

Returns the minimum value of the argument.

named_struct(→ datafusion.expr.Expr)

Returns a struct with the given names and arguments pairs.

nanvl(→ datafusion.expr.Expr)

Returns x if x is not NaN. Otherwise returns y.

now(→ datafusion.expr.Expr)

Returns the current timestamp in nanoseconds.

nth_value(→ datafusion.expr.Expr)

Returns the n-th value in a group of values.

ntile(→ datafusion.expr.Expr)

Create an n-tile window function.

nullif(→ datafusion.expr.Expr)

Returns NULL if expr1 equals expr2; otherwise it returns expr1.

octet_length(→ datafusion.expr.Expr)

Returns the number of bytes of a string.

order_by(→ datafusion.expr.SortExpr)

Creates a new sort expression.

overlay(→ datafusion.expr.Expr)

Replace a substring with a new substring.

percent_rank(→ datafusion.expr.Expr)

Create a percent_rank window function.

pi(→ datafusion.expr.Expr)

Returns an approximate value of π.

pow(→ datafusion.expr.Expr)

Returns base raised to the power of exponent.

power(→ datafusion.expr.Expr)

Returns base raised to the power of exponent.

radians(→ datafusion.expr.Expr)

Converts the argument from degrees to radians.

random(→ datafusion.expr.Expr)

Returns a random value in the range 0.0 <= x < 1.0.

range(→ datafusion.expr.Expr)

Create a list of values in the range between start and stop.

rank(→ datafusion.expr.Expr)

Create a rank window function.

regexp_like(→ datafusion.expr.Expr)

Find if any regular expression (regex) matches exist.

regexp_match(→ datafusion.expr.Expr)

Perform regular expression (regex) matching.

regexp_replace(→ datafusion.expr.Expr)

Replaces substring(s) matching a PCRE-like regular expression.

regr_avgx(→ datafusion.expr.Expr)

Computes the average of the independent variable x.

regr_avgy(→ datafusion.expr.Expr)

Computes the average of the dependent variable y.

regr_count(→ datafusion.expr.Expr)

Counts the number of rows in which both expressions are not null.

regr_intercept(→ datafusion.expr.Expr)

Computes the intercept from the linear regression.

regr_r2(→ datafusion.expr.Expr)

Computes the R-squared value from linear regression.

regr_slope(→ datafusion.expr.Expr)

Computes the slope from linear regression.

regr_sxx(→ datafusion.expr.Expr)

Computes the sum of squares of the independent variable x.

regr_sxy(→ datafusion.expr.Expr)

Computes the sum of products of pairs of numbers.

regr_syy(→ datafusion.expr.Expr)

Computes the sum of squares of the dependent variable y.

repeat(→ datafusion.expr.Expr)

Repeats the string n times.

replace(→ datafusion.expr.Expr)

Replaces all occurrences of from_val with to_val in the string.

reverse(→ datafusion.expr.Expr)

Reverse the string argument.

right(→ datafusion.expr.Expr)

Returns the last n characters in the string.

round(→ datafusion.expr.Expr)

Round the argument to the nearest integer.

row_number(→ datafusion.expr.Expr)

Create a row number window function.

rpad(→ datafusion.expr.Expr)

Add right padding to a string.

rtrim(→ datafusion.expr.Expr)

Removes all characters, spaces by default, from the end of a string.

sha224(→ datafusion.expr.Expr)

Computes the SHA-224 hash of a binary string.

sha256(→ datafusion.expr.Expr)

Computes the SHA-256 hash of a binary string.

sha384(→ datafusion.expr.Expr)

Computes the SHA-384 hash of a binary string.

sha512(→ datafusion.expr.Expr)

Computes the SHA-512 hash of a binary string.

signum(→ datafusion.expr.Expr)

Returns the sign of the argument (-1, 0, +1).

sin(→ datafusion.expr.Expr)

Returns the sine of the argument.

sinh(→ datafusion.expr.Expr)

Returns the hyperbolic sine of the argument.

split_part(→ datafusion.expr.Expr)

Split a string and return one part.

sqrt(→ datafusion.expr.Expr)

Returns the square root of the argument.

starts_with(→ datafusion.expr.Expr)

Returns true if string starts with prefix.

stddev(→ datafusion.expr.Expr)

Computes the standard deviation of the argument.

stddev_pop(→ datafusion.expr.Expr)

Computes the population standard deviation of the argument.

stddev_samp(→ datafusion.expr.Expr)

Computes the sample standard deviation of the argument.

string_agg(→ datafusion.expr.Expr)

Concatenates the input strings.

strpos(→ datafusion.expr.Expr)

Finds the position from where the substring matches the string.

struct(→ datafusion.expr.Expr)

Returns a struct with the given arguments.

substr(→ datafusion.expr.Expr)

Substring from the position to the end.

substr_index(→ datafusion.expr.Expr)

Returns an indexed substring.

substring(→ datafusion.expr.Expr)

Substring from the position with length characters.

sum(→ datafusion.expr.Expr)

Computes the sum of a set of numbers.

tan(→ datafusion.expr.Expr)

Returns the tangent of the argument.

tanh(→ datafusion.expr.Expr)

Returns the hyperbolic tangent of the argument.

to_hex(→ datafusion.expr.Expr)

Converts an integer to a hexadecimal string.

to_timestamp(→ datafusion.expr.Expr)

Converts a string and optional formats to a Timestamp in nanoseconds.

to_timestamp_micros(→ datafusion.expr.Expr)

Converts a string and optional formats to a Timestamp in microseconds.

to_timestamp_millis(→ datafusion.expr.Expr)

Converts a string and optional formats to a Timestamp in milliseconds.

to_timestamp_seconds(→ datafusion.expr.Expr)

Converts a string and optional formats to a Timestamp in seconds.

to_unixtime(→ datafusion.expr.Expr)

Converts a string and optional formats to a Unixtime.

translate(→ datafusion.expr.Expr)

Replaces the characters in from_val with the counterpart in to_val.

trim(→ datafusion.expr.Expr)

Removes all characters, spaces by default, from both sides of a string.

trunc(→ datafusion.expr.Expr)

Truncate the number toward zero with optional precision.

upper(→ datafusion.expr.Expr)

Converts a string to uppercase.

uuid(→ datafusion.expr.Expr)

Returns uuid v4 as a string value.

var(→ datafusion.expr.Expr)

Computes the sample variance of the argument.

var_pop(→ datafusion.expr.Expr)

Computes the population variance of the argument.

var_samp(→ datafusion.expr.Expr)

Computes the sample variance of the argument.

var_sample(→ datafusion.expr.Expr)

Computes the sample variance of the argument.

when(→ datafusion.expr.CaseBuilder)

Create a case expression that has no base expression.

window(→ datafusion.expr.Expr)

Creates a new Window function expression.

Module Contents

datafusion.functions.abs(arg: datafusion.expr.Expr) datafusion.expr.Expr

Return the absolute value of a given number.

Returns:

Expr

A new expression representing the absolute value of the input expression.

datafusion.functions.acos(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the arc cosine or inverse cosine of a number.

Returns:

Expr

A new expression representing the arc cosine of the input expression.

datafusion.functions.acosh(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns inverse hyperbolic cosine.

datafusion.functions.alias(expr: datafusion.expr.Expr, name: str) datafusion.expr.Expr

Creates an alias expression.

datafusion.functions.approx_distinct(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Returns the approximate number of distinct values.

This aggregate function is similar to count() with distinct set, but it will approximate the number of distinct entries. It may return significantly faster than count() for some DataFrames.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – Values to check for distinct entries

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.approx_median(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Returns the approximate median value.

This aggregate function is similar to median(), but it will only approximate the median. It may return significantly faster for some DataFrames.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – Values to find the median for

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.approx_percentile_cont(expression: datafusion.expr.Expr, percentile: float, num_centroids: int | None = None, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Returns the value that is approximately at a given percentile of expr.

This aggregate function assumes the input values form a continuous distribution. Suppose you have a DataFrame which consists of 100 different test scores. If you called this function with a percentile of 0.9, it would return the value of the test score that is above 90% of the other test scores. The returned value may be between two of the values.

This function uses the [t-digest](https://arxiv.org/abs/1902.04023) algorithm to compute the percentile. You can limit the number of bins used in this algorithm by setting the num_centroids parameter.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – Values for which to find the approximate percentile

  • percentile – This must be between 0.0 and 1.0, inclusive

  • num_centroids – Max bin size for the t-digest algorithm

  • filter – If provided, only compute against rows for which the filter is True
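The remark that the returned value "may be between two of the values" can be illustrated with an exact continuous percentile in plain Python. This is only a sketch for intuition: the helper `percentile_cont` is hypothetical, it does not call DataFusion, and it assumes linear interpolation between neighbouring sorted values. The t-digest algorithm returns an approximation of such a value rather than this exact result.

```python
def percentile_cont(values, percentile):
    """Exact continuous percentile via linear interpolation (illustrative model only)."""
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must be between 0.0 and 1.0, inclusive")
    ordered = sorted(values)
    if not ordered:
        return None
    # Fractional rank into the sorted values.
    rank = percentile * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    # Interpolate between the two neighbouring values.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


scores = [10, 20, 30, 40]
median_score = percentile_cont(scores, 0.5)  # 25.0, which lies between 20 and 30
```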

datafusion.functions.approx_percentile_cont_with_weight(expression: datafusion.expr.Expr, weight: datafusion.expr.Expr, percentile: float, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Returns the value of the weighted approximate percentile.

This aggregate function is similar to approx_percentile_cont() except that it uses the associated weights.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – Values for which to find the approximate percentile

  • weight – Relative weight for each of the values in expression

  • percentile – This must be between 0.0 and 1.0, inclusive

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.array(*args: datafusion.expr.Expr) datafusion.expr.Expr

Returns an array using the specified input expressions.

This is an alias for make_array().

datafusion.functions.array_agg(expression: datafusion.expr.Expr, distinct: bool = False, filter: datafusion.expr.Expr | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) datafusion.expr.Expr

Aggregate values into an array.

Currently distinct and order_by cannot be used together. As a workaround, consider array_sort() after aggregation. [Issue Tracker](https://github.com/apache/datafusion/issues/12371)

If using the builder functions described in the aggregation documentation, this function ignores the option null_treatment.

Parameters:
  • expression – Values to combine into an array

  • distinct – If True, a single entry for each distinct value will be in the result

  • filter – If provided, only compute against rows for which the filter is True

  • order_by – Order the resultant array values
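The suggested workaround (aggregate distinct values first, then sort the resulting array) can be modelled in plain Python. The helper name `distinct_then_sort` is hypothetical and the code does not call DataFusion; it only mirrors the two steps, with `sorted` standing in for array_sort().

```python
def distinct_then_sort(values, descending=False):
    """Model of array_agg(distinct=True) followed by array_sort()."""
    # Step 1: keep a single entry per distinct value (order not guaranteed here).
    distinct_values = list(set(values))
    # Step 2: sort the aggregated array afterwards.
    return sorted(distinct_values, reverse=descending)


ordered = distinct_then_sort([3, 1, 3, 2, 1])  # [1, 2, 3]
```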

datafusion.functions.array_append(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr

Appends an element to the end of an array.

datafusion.functions.array_cat(*args: datafusion.expr.Expr) datafusion.expr.Expr

Concatenates the input arrays.

This is an alias for array_concat().

datafusion.functions.array_concat(*args: datafusion.expr.Expr) datafusion.expr.Expr

Concatenates the input arrays.

datafusion.functions.array_dims(array: datafusion.expr.Expr) datafusion.expr.Expr

Returns an array of the array’s dimensions.

datafusion.functions.array_distinct(array: datafusion.expr.Expr) datafusion.expr.Expr

Returns distinct values from the array after removing duplicates.

datafusion.functions.array_element(array: datafusion.expr.Expr, n: datafusion.expr.Expr) datafusion.expr.Expr

Extracts the element with the index n from the array.

datafusion.functions.array_except(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) datafusion.expr.Expr

Returns the elements that appear in array1 but not in array2.

datafusion.functions.array_extract(array: datafusion.expr.Expr, n: datafusion.expr.Expr) datafusion.expr.Expr

Extracts the element with the index n from the array.

This is an alias for array_element().

datafusion.functions.array_has(first_array: datafusion.expr.Expr, second_array: datafusion.expr.Expr) datafusion.expr.Expr

Returns true if the element appears in the first array, otherwise false.

datafusion.functions.array_has_all(first_array: datafusion.expr.Expr, second_array: datafusion.expr.Expr) datafusion.expr.Expr

Determines whether second_array is completely contained in first_array.

Returns true if each element of the second array appears in the first array. Otherwise, it returns false.

datafusion.functions.array_has_any(first_array: datafusion.expr.Expr, second_array: datafusion.expr.Expr) datafusion.expr.Expr

Determine if there is an overlap between first_array and second_array.

Returns true if at least one element of the second array appears in the first array. Otherwise, it returns false.
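The difference between array_has_all() and array_has_any() can be sketched with Python's built-in `all` and `any`. The helpers `has_all_model` and `has_any_model` are hypothetical stand-ins operating on plain lists; they do not call DataFusion and make no claim about its handling of nulls or empty arrays.

```python
def has_all_model(first_array, second_array):
    """True if every element of second_array appears in first_array."""
    return all(element in first_array for element in second_array)


def has_any_model(first_array, second_array):
    """True if at least one element of second_array appears in first_array."""
    return any(element in first_array for element in second_array)


complete = has_all_model([1, 2, 3], [1, 3])  # True: 1 and 3 are both present
partial = has_all_model([1, 2, 3], [1, 4])   # False: 4 is missing
overlap = has_any_model([1, 2, 3], [4, 3])   # True: 3 is shared
```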

datafusion.functions.array_indexof(array: datafusion.expr.Expr, element: datafusion.expr.Expr, index: int | None = 1) datafusion.expr.Expr

Return the position of the first occurrence of element in array.

This is an alias for array_position().

datafusion.functions.array_intersect(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) datafusion.expr.Expr

Returns the intersection of array1 and array2.

datafusion.functions.array_join(expr: datafusion.expr.Expr, delimiter: datafusion.expr.Expr) datafusion.expr.Expr

Converts each element to its text representation.

This is an alias for array_to_string().

datafusion.functions.array_length(array: datafusion.expr.Expr) datafusion.expr.Expr

Returns the length of the array.

datafusion.functions.array_ndims(array: datafusion.expr.Expr) datafusion.expr.Expr

Returns the number of dimensions of the array.

datafusion.functions.array_pop_back(array: datafusion.expr.Expr) datafusion.expr.Expr

Returns the array without the last element.

datafusion.functions.array_pop_front(array: datafusion.expr.Expr) datafusion.expr.Expr

Returns the array without the first element.

datafusion.functions.array_position(array: datafusion.expr.Expr, element: datafusion.expr.Expr, index: int | None = 1) datafusion.expr.Expr

Return the position of the first occurrence of element in array.

datafusion.functions.array_positions(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr

Searches for an element in the array and returns all occurrences.

datafusion.functions.array_prepend(element: datafusion.expr.Expr, array: datafusion.expr.Expr) datafusion.expr.Expr

Prepends an element to the beginning of an array.

datafusion.functions.array_push_back(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr

Appends an element to the end of an array.

This is an alias for array_append().

datafusion.functions.array_push_front(element: datafusion.expr.Expr, array: datafusion.expr.Expr) datafusion.expr.Expr

Prepends an element to the beginning of an array.

This is an alias for array_prepend().

datafusion.functions.array_remove(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr

Removes the first element from the array equal to the given value.

datafusion.functions.array_remove_all(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr

Removes all elements from the array equal to the given value.

datafusion.functions.array_remove_n(array: datafusion.expr.Expr, element: datafusion.expr.Expr, max: datafusion.expr.Expr) datafusion.expr.Expr

Removes the first max elements from the array equal to the given value.

datafusion.functions.array_repeat(element: datafusion.expr.Expr, count: datafusion.expr.Expr) datafusion.expr.Expr

Returns an array containing element count times.

datafusion.functions.array_replace(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) datafusion.expr.Expr

Replaces the first occurrence of from_val with to_val.

datafusion.functions.array_replace_all(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) datafusion.expr.Expr

Replaces all occurrences of from_val with to_val.

datafusion.functions.array_replace_n(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr, max: datafusion.expr.Expr) datafusion.expr.Expr

Replace n occurrences of from_val with to_val.

Replaces the first max occurrences of the specified element with another specified element.
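A plain-Python model of the "first max occurrences" behaviour is shown below. The helper `replace_n_model` is hypothetical and works on ordinary lists rather than DataFusion expressions; it is meant only to make the counting rule concrete.

```python
def replace_n_model(array, from_val, to_val, max_count):
    """Replace at most max_count occurrences of from_val, scanning left to right."""
    result = []
    replaced = 0
    for item in array:
        if item == from_val and replaced < max_count:
            result.append(to_val)
            replaced += 1
        else:
            result.append(item)
    return result


replaced_two = replace_n_model([1, 2, 2, 3, 2], 2, 9, 2)  # [1, 9, 9, 3, 2]
```

Under this model, array_replace() corresponds to a max_count of 1, and array_replace_all() to a max_count at least as large as the array length.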

datafusion.functions.array_resize(array: datafusion.expr.Expr, size: datafusion.expr.Expr, value: datafusion.expr.Expr) datafusion.expr.Expr

Returns an array with the specified size filled.

If size is greater than the array length, the additional entries will be filled with the given value.
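The padding rule can be sketched in plain Python. The helper `resize_model` is hypothetical and does not call DataFusion. The text above only describes the case where size exceeds the array length; truncation for a smaller size is an assumption of this sketch.

```python
def resize_model(array, size, value=None):
    """Return a list of exactly `size` entries, padding with `value` if needed."""
    if size <= len(array):
        # Assumption: a smaller size keeps only the leading entries.
        return array[:size]
    # Fill the additional entries with the given value.
    return array + [value] * (size - len(array))


padded = resize_model([1, 2], 4, 0)  # [1, 2, 0, 0]
```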

datafusion.functions.array_slice(array: datafusion.expr.Expr, begin: datafusion.expr.Expr, end: datafusion.expr.Expr, stride: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Returns a slice of the array.

datafusion.functions.array_sort(array: datafusion.expr.Expr, descending: bool = False, null_first: bool = False) datafusion.expr.Expr

Sort an array.

Parameters:
  • array – The input array to sort.

  • descending – If True, sorts in descending order.

  • null_first – If True, nulls will be returned at the beginning of the array.
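How descending and null_first interact can be modelled with ordinary lists, using `None` for nulls. The helper `sort_model` is a hypothetical illustration and does not call DataFusion; it treats the two options as independent, which is how the parameter descriptions above read.

```python
def sort_model(array, descending=False, null_first=False):
    """Sort non-null values, then place nulls at the start or the end."""
    nulls = [item for item in array if item is None]
    values = sorted((item for item in array if item is not None), reverse=descending)
    return nulls + values if null_first else values + nulls


default_order = sort_model([3, None, 1, 2])                # [1, 2, 3, None]
nulls_leading = sort_model([3, None, 1, 2], null_first=True)  # [None, 1, 2, 3]
```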

datafusion.functions.array_to_string(expr: datafusion.expr.Expr, delimiter: datafusion.expr.Expr) datafusion.expr.Expr

Converts each element to its text representation.

datafusion.functions.array_union(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) datafusion.expr.Expr

Returns an array of the elements in the union of array1 and array2.

Duplicate elements will not be returned.
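A plain-Python model of a duplicate-free union is sketched below. The helper `union_model` is hypothetical and does not call DataFusion. Keeping elements in first-seen order is an assumption of this sketch; the text above does not specify the order of the result.

```python
def union_model(array1, array2):
    """Combine two lists, keeping each distinct element once."""
    seen = set()
    result = []
    for item in array1 + array2:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


combined = union_model([1, 2, 2, 3], [3, 4])  # [1, 2, 3, 4]
```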

datafusion.functions.arrow_typeof(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the Arrow type of the expression.

datafusion.functions.ascii(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the numeric code of the first character of the argument.

datafusion.functions.asin(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the arc sine or inverse sine of a number.

datafusion.functions.asinh(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns inverse hyperbolic sine.

datafusion.functions.atan(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns inverse tangent of a number.

datafusion.functions.atan2(y: datafusion.expr.Expr, x: datafusion.expr.Expr) datafusion.expr.Expr

Returns the inverse tangent of y / x.

datafusion.functions.atanh(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns inverse hyperbolic tangent.

datafusion.functions.avg(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Returns the average value.

This aggregate function expects a numeric expression and will return a float.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – Values to compute the average of

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.bit_and(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the bitwise AND of the argument.

This aggregate function will bitwise compare every value in the input partition.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – Argument to perform bitwise calculation on

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.bit_length(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the number of bits in the string argument.

datafusion.functions.bit_or(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the bitwise OR of the argument.

This aggregate function will bitwise compare every value in the input partition.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – Argument to perform bitwise calculation on

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.bit_xor(expression: datafusion.expr.Expr, distinct: bool = False, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the bitwise XOR of the argument.

This aggregate function will bitwise compare every value in the input partition.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the options order_by and null_treatment.

Parameters:
  • expression – Argument to perform bitwise calculation on

  • distinct – If True, evaluate each unique value of expression only once

  • filter – If provided, only compute against rows for which the filter is True
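
The effect of distinct on a bitwise XOR can be illustrated with a small pure-Python model. This is a sketch rather than the library's implementation; the name bit_xor_model is hypothetical, and skipping None values and returning None for empty input are assumptions.

```python
from functools import reduce
from operator import xor


def bit_xor_model(values, distinct=False):
    """XOR all non-null values together, optionally using each value once."""
    non_null = [v for v in values if v is not None]  # assumption: nulls are skipped
    if distinct:
        non_null = list(set(non_null))
    if not non_null:
        return None
    return reduce(xor, non_null)


print(bit_xor_model([5, 5, 3]))                 # 3, because 5 ^ 5 cancels out
print(bit_xor_model([5, 5, 3], distinct=True))  # 6, because 5 ^ 3 == 6
```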

datafusion.functions.bool_and(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the boolean AND of the argument.

This aggregate function will compare every value in the input partition. These are expected to be boolean values.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – Argument to perform calculation on

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.bool_or(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the boolean OR of the argument.

This aggregate function will compare every value in the input partition. These are expected to be boolean values.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – Argument to perform calculation on

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.btrim(arg: datafusion.expr.Expr) datafusion.expr.Expr

Removes all characters, spaces by default, from both sides of a string.

datafusion.functions.case(expr: datafusion.expr.Expr) datafusion.expr.CaseBuilder

Create a case expression.

Create a CaseBuilder to match cases for the expression expr. See CaseBuilder for detailed usage.

datafusion.functions.cbrt(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the cube root of a number.

datafusion.functions.ceil(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the nearest integer greater than or equal to argument.

datafusion.functions.char_length(string: datafusion.expr.Expr) datafusion.expr.Expr

The number of characters in the string.

datafusion.functions.character_length(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the number of characters in the argument.

datafusion.functions.chr(arg: datafusion.expr.Expr) datafusion.expr.Expr

Converts the Unicode code point to a UTF8 character.

datafusion.functions.coalesce(*args: datafusion.expr.Expr) datafusion.expr.Expr

Returns the value of the first expr in args which is not NULL.

datafusion.functions.col(name: str) datafusion.expr.Expr

Creates a column reference expression.

datafusion.functions.concat(*args: datafusion.expr.Expr) datafusion.expr.Expr

Concatenates the text representations of all the arguments.

NULL arguments are ignored.

datafusion.functions.concat_ws(separator: str, *args: datafusion.expr.Expr) datafusion.expr.Expr

Concatenates the list args with the separator.

NULL arguments are ignored. separator should not be NULL.
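
The NULL handling can be sketched in plain Python, with None standing in for NULL. The function below is a hypothetical model of the described behaviour, not the library's code.

```python
def concat_ws_model(separator, *args):
    """Join the non-null arguments with separator."""
    return separator.join(str(a) for a in args if a is not None)


print(concat_ws_model("-", "a", None, "b"))  # a-b
```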

datafusion.functions.corr(value_y: datafusion.expr.Expr, value_x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Returns the correlation coefficient between value_y and value_x.

This aggregate function expects both values to be numeric and will return a float.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • value_y – The dependent variable for correlation

  • value_x – The independent variable for correlation

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.cos(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the cosine of the argument.

datafusion.functions.cosh(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the hyperbolic cosine of the argument.

datafusion.functions.cot(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the cotangent of the argument.

datafusion.functions.count(expressions: datafusion.expr.Expr | list[datafusion.expr.Expr] | None = None, distinct: bool = False, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Returns the number of rows that match the given arguments.

This aggregate function will count the non-null rows provided in the expression.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the options order_by and null_treatment.

Parameters:
  • expressions – Expression or list of expressions to count

  • distinct – If True, a single entry for each distinct value will be in the result

  • filter – If provided, only compute against rows for which the filter is True
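
A minimal pure-Python model of the counting rules (non-null rows only, optionally distinct) might look like this. None stands in for NULL and the helper name count_model is hypothetical.

```python
def count_model(values, distinct=False):
    """Count non-null values; with distinct=True count each value once."""
    non_null = [v for v in values if v is not None]
    return len(set(non_null)) if distinct else len(non_null)


print(count_model([1, None, 2, 2]))                 # 3
print(count_model([1, None, 2, 2], distinct=True))  # 2
```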

datafusion.functions.count_star(filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Create a COUNT(1) aggregate expression.

This aggregate function will count all of the rows in the partition.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the options order_by, distinct, and null_treatment.

Parameters:
  • filter – If provided, only count rows for which the filter is True

datafusion.functions.covar(value_y: datafusion.expr.Expr, value_x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the sample covariance.

This is an alias for covar_samp().

datafusion.functions.covar_pop(value_y: datafusion.expr.Expr, value_x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the population covariance.

This aggregate function expects both values to be numeric and will return a float.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • value_y – The dependent variable for covariance

  • value_x – The independent variable for covariance

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.covar_samp(value_y: datafusion.expr.Expr, value_x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the sample covariance.

This aggregate function expects both values to be numeric and will return a float.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • value_y – The dependent variable for covariance

  • value_x – The independent variable for covariance

  • filter – If provided, only compute against rows for which the filter is True
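
The difference between the sample and population covariance is only the divisor (n - 1 versus n). The sketch below uses the textbook formulas in plain Python; the function name covariance is hypothetical and the code makes no claim about how DataFusion computes the value internally.

```python
def covariance(ys, xs, sample=True):
    """Textbook covariance of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sample covariance divides by n - 1, population covariance by n.
    return total / (n - 1) if sample else total / n


ys = [2.0, 4.0, 6.0, 8.0]
xs = [1.0, 2.0, 3.0, 4.0]
print(covariance(ys, xs, sample=True))   # sample covariance, 10 / 3
print(covariance(ys, xs, sample=False))  # population covariance, 2.5
```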

datafusion.functions.cume_dist(partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) datafusion.expr.Expr

Create a cumulative distribution window function.

This window function is similar to rank() except that the returned values are the ratio of the row number to the total number of rows. Here is an example of a dataframe with a window ordered by descending points and the associated cumulative distribution:

+--------+-----------+
| points | cume_dist |
+--------+-----------+
| 100    | 0.5       |
| 100    | 0.5       |
| 50     | 0.75      |
| 25     | 1.0       |
+--------+-----------+

Parameters:
  • partition_by – Expressions to partition the window frame on.

  • order_by – Set ordering within the window frame.
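
The values in the table can be reproduced with a short pure-Python model. It assumes the input list is already in window order, so that tied rows are adjacent; the helper name is hypothetical and this is not the engine's implementation.

```python
def cume_dist_model(sorted_values):
    """For each row: (rows up to and including its last peer) / total rows."""
    n = len(sorted_values)
    last_position = {}
    for position, value in enumerate(sorted_values, start=1):
        last_position[value] = position  # later peers overwrite earlier ones
    return [last_position[value] / n for value in sorted_values]


print(cume_dist_model([100, 100, 50, 25]))  # [0.5, 0.5, 0.75, 1.0]
```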

datafusion.functions.current_date() datafusion.expr.Expr

Returns current UTC date as a Date32 value.

datafusion.functions.current_time() datafusion.expr.Expr

Returns current UTC time as a Time64 value.

datafusion.functions.date_bin(stride: datafusion.expr.Expr, source: datafusion.expr.Expr, origin: datafusion.expr.Expr) datafusion.expr.Expr

Coerces an arbitrary timestamp to the start of the nearest specified interval.
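
As an illustration, binning can be modelled with the standard library. The sketch assumes that the result is the start of the interval containing the timestamp (a floor relative to origin), which is how SQL date_bin is commonly defined; the helper name date_bin_model is hypothetical.

```python
from datetime import datetime, timedelta


def date_bin_model(stride, source, origin):
    """Return the start of the stride-wide bin, aligned to origin, holding source."""
    bins = (source - origin) // stride  # whole strides since origin (floored)
    return origin + bins * stride


result = date_bin_model(
    timedelta(minutes=15),
    datetime(2024, 1, 1, 10, 37, 12),
    datetime(2024, 1, 1),
)
print(result)  # 2024-01-01 10:30:00
```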

datafusion.functions.date_part(part: datafusion.expr.Expr, date: datafusion.expr.Expr) datafusion.expr.Expr

Extracts a subfield from the date.

datafusion.functions.date_trunc(part: datafusion.expr.Expr, date: datafusion.expr.Expr) datafusion.expr.Expr

Truncates the date to a specified level of precision.

datafusion.functions.datepart(part: datafusion.expr.Expr, date: datafusion.expr.Expr) datafusion.expr.Expr

Return a specified part of a date.

This is an alias for date_part().

datafusion.functions.datetrunc(part: datafusion.expr.Expr, date: datafusion.expr.Expr) datafusion.expr.Expr

Truncates the date to a specified level of precision.

This is an alias for date_trunc().

datafusion.functions.decode(input: datafusion.expr.Expr, encoding: datafusion.expr.Expr) datafusion.expr.Expr

Decode the input, using the encoding. encoding can be base64 or hex.

datafusion.functions.degrees(arg: datafusion.expr.Expr) datafusion.expr.Expr

Converts the argument from radians to degrees.

datafusion.functions.dense_rank(partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) datafusion.expr.Expr

Create a dense_rank window function.

This window function is similar to rank() except that the returned values will be consecutive. Here is an example of a dataframe with a window ordered by descending points and the associated dense rank:

+--------+------------+
| points | dense_rank |
+--------+------------+
| 100    | 1          |
| 100    | 1          |
| 50     | 2          |
| 25     | 3          |
+--------+------------+

Parameters:
  • partition_by – Expressions to partition the window frame on.

  • order_by – Set ordering within the window frame.
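
The dense rank column above can be reproduced with a few lines of plain Python. The sketch assumes the input is already in window order and uses a hypothetical helper name; it is not the engine's implementation.

```python
def dense_rank_model(sorted_values):
    """Assign consecutive ranks; tied values share a rank."""
    ranks = []
    current_rank = 0
    previous = object()  # sentinel that never equals a real value
    for value in sorted_values:
        if value != previous:
            current_rank += 1
            previous = value
        ranks.append(current_rank)
    return ranks


print(dense_rank_model([100, 100, 50, 25]))  # [1, 1, 2, 3]
```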

datafusion.functions.digest(value: datafusion.expr.Expr, method: datafusion.expr.Expr) datafusion.expr.Expr

Computes the binary hash of an expression using the specified algorithm.

Standard algorithms are md5, sha224, sha256, sha384, sha512, blake2s, blake2b, and blake3.
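
For comparison, the same kind of binary hash can be computed with Python's hashlib. This is a sketch of the concept only: the helper name digest_model is hypothetical, and hashlib does not ship blake3, so that algorithm is not covered here.

```python
import hashlib


def digest_model(value: str, method: str) -> bytes:
    """Hash the UTF-8 bytes of value with the named algorithm."""
    return hashlib.new(method, value.encode("utf-8")).digest()


raw = digest_model("abc", "sha256")
print(len(raw))   # 32 bytes for SHA-256
print(raw.hex())  # hexadecimal rendering of the binary hash
```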

datafusion.functions.encode(input: datafusion.expr.Expr, encoding: datafusion.expr.Expr) datafusion.expr.Expr

Encode the input, using the encoding. encoding can be base64 or hex.
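
The two encodings can be illustrated with the Python standard library. The helper names are hypothetical and the sketch only shows the general idea; details such as base64 padding may differ from what DataFusion produces.

```python
import base64


def encode_model(data: bytes, encoding: str) -> str:
    """Encode bytes as base64 or hex text."""
    if encoding == "base64":
        return base64.b64encode(data).decode("ascii")
    if encoding == "hex":
        return data.hex()
    raise ValueError(f"unsupported encoding: {encoding}")


def decode_model(text: str, encoding: str) -> bytes:
    """Reverse encode_model."""
    if encoding == "base64":
        return base64.b64decode(text)
    if encoding == "hex":
        return bytes.fromhex(text)
    raise ValueError(f"unsupported encoding: {encoding}")


print(encode_model(b"hi", "hex"))  # 6869
```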

datafusion.functions.ends_with(arg: datafusion.expr.Expr, suffix: datafusion.expr.Expr) datafusion.expr.Expr

Returns true if the string ends with the suffix, false otherwise.

datafusion.functions.exp(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the exponential of the argument.

datafusion.functions.factorial(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the factorial of the argument.

datafusion.functions.find_in_set(string: datafusion.expr.Expr, string_list: datafusion.expr.Expr) datafusion.expr.Expr

Find a string in a list of strings.

Returns a value in the range of 1 to N if the string is in the string list string_list consisting of N substrings.

The string list is a string composed of substrings separated by , characters.
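
A small pure-Python model makes the 1-based result concrete. Returning 0 when the string is not found is an assumption borrowed from the usual SQL convention, since the text above does not state it; the helper name is hypothetical.

```python
def find_in_set_model(string, string_list):
    """Return the 1-based position of string in a comma-separated list."""
    items = string_list.split(",")
    if string in items:
        return items.index(string) + 1
    return 0  # assumption: 0 signals "not found"


print(find_in_set_model("b", "a,b,c"))  # 2
```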

datafusion.functions.first_value(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None, null_treatment: datafusion.common.NullTreatment = NullTreatment.RESPECT_NULLS) datafusion.expr.Expr

Returns the first value in a group of values.

This aggregate function will return the first value in the partition.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the option distinct.

Parameters:
  • expression – Argument to return the first value of

  • filter – If provided, only compute against rows for which the filter is True

  • order_by – Set the ordering of the expression to evaluate

  • null_treatment – Assign whether to respect or ignore null values.

datafusion.functions.flatten(array: datafusion.expr.Expr) datafusion.expr.Expr

Flattens an array of arrays into a single array.

datafusion.functions.floor(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the nearest integer less than or equal to the argument.

datafusion.functions.from_unixtime(arg: datafusion.expr.Expr) datafusion.expr.Expr

Converts an integer to an RFC3339 timestamp format string.

datafusion.functions.gcd(x: datafusion.expr.Expr, y: datafusion.expr.Expr) datafusion.expr.Expr

Returns the greatest common divisor.

datafusion.functions.in_list(arg: datafusion.expr.Expr, values: list[datafusion.expr.Expr], negated: bool = False) datafusion.expr.Expr

Returns whether the argument is contained within the list values.

datafusion.functions.initcap(string: datafusion.expr.Expr) datafusion.expr.Expr

Set the initial letter of each word to capital.

Converts the first letter of each word in string to uppercase and the remaining characters to lowercase.
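
The transformation can be sketched with a regular expression. The model below assumes that a word is a run of ASCII letters and digits, which may not match the engine's exact word-boundary rules; the helper name is hypothetical.

```python
import re


def initcap_model(text):
    """Uppercase the first character of each word and lowercase the rest."""
    # Assumption: words are runs of ASCII letters and digits.
    return re.sub(r"[A-Za-z0-9]+", lambda m: m.group(0).capitalize(), text)


print(initcap_model("hello wORLD foo-bar"))  # Hello World Foo-Bar
```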

datafusion.functions.isnan(expr: datafusion.expr.Expr) datafusion.expr.Expr

Returns true if a given number is +NaN or -NaN, otherwise returns false.

datafusion.functions.iszero(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns true if a given number is +0.0 or -0.0, otherwise returns false.

datafusion.functions.lag(arg: datafusion.expr.Expr, shift_offset: int = 1, default_value: Any | None = None, partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) datafusion.expr.Expr

Create a lag window function.

The lag operation will return the argument that is in the previous shift_offset-th row in the partition. For example, lag(col("b"), shift_offset=3, default_value=5) will return the 3rd previous value in column b. At the beginning of the partition, where no values can be returned, it will return the default value of 5.

Here is an example of both the lag and datafusion.functions.lead() functions on a simple DataFrame:

+--------+------+-----+
| points | lead | lag |
+--------+------+-----+
| 100    | 100  |     |
| 100    | 50   | 100 |
| 50     | 25   | 100 |
| 25     |      | 50  |
+--------+------+-----+

Parameters:
  • arg – Value to return

  • shift_offset – Number of rows before the current row.

  • default_value – Value to return if the shift_offset row does not exist.

  • partition_by – Expressions to partition the window frame on.

  • order_by – Set ordering within the window frame.

datafusion.functions.last_value(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None, null_treatment: datafusion.common.NullTreatment = NullTreatment.RESPECT_NULLS) datafusion.expr.Expr

Returns the last value in a group of values.

This aggregate function will return the last value in the partition.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the option distinct.

Parameters:
  • expression – Argument to return the last value of

  • filter – If provided, only compute against rows for which the filter is True

  • order_by – Set the ordering of the expression to evaluate

  • null_treatment – Assign whether to respect or ignore null values.

datafusion.functions.lcm(x: datafusion.expr.Expr, y: datafusion.expr.Expr) datafusion.expr.Expr

Returns the least common multiple.

datafusion.functions.lead(arg: datafusion.expr.Expr, shift_offset: int = 1, default_value: Any | None = None, partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) datafusion.expr.Expr

Create a lead window function.

The lead operation will return the argument that is in the next shift_offset-th row in the partition. For example, lead(col("b"), shift_offset=3, default_value=5) will return the 3rd following value in column b. At the end of the partition, where no further values can be returned, it will return the default value of 5.

Here is an example of both the lead and datafusion.functions.lag() functions on a simple DataFrame:

+--------+------+-----+
| points | lead | lag |
+--------+------+-----+
| 100    | 100  |     |
| 100    | 50   | 100 |
| 50     | 25   | 100 |
| 25     |      | 50  |
+--------+------+-----+

To set window function parameters use the window builder approach described in the window functions section of the online documentation.

Parameters:
  • arg – Value to return

  • shift_offset – Number of rows following the current row.

  • default_value – Value to return if the shift_offset row does not exist.

  • partition_by – Expressions to partition the window frame on.

  • order_by – Set ordering within the window frame.
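
Both columns of the lead and lag table can be reproduced with a single shifting helper in plain Python. The helper name shift is hypothetical, None stands in for NULL, and the input is assumed to be a single partition already in window order.

```python
def shift(values, offset, default=None):
    """Return values[i + offset] for each row i, or default when out of range."""
    n = len(values)
    return [
        values[i + offset] if 0 <= i + offset < n else default
        for i in range(n)
    ]


points = [100, 100, 50, 25]
print(shift(points, 1))   # lead with shift_offset=1: [100, 50, 25, None]
print(shift(points, -1))  # lag with shift_offset=1:  [None, 100, 100, 50]
```

Passing a default (for example shift(points, 3, default=5)) fills the rows for which no shifted value exists.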

datafusion.functions.left(string: datafusion.expr.Expr, n: datafusion.expr.Expr) datafusion.expr.Expr

Returns the first n characters in the string.

datafusion.functions.length(string: datafusion.expr.Expr) datafusion.expr.Expr

The number of characters in the string.

datafusion.functions.levenshtein(string1: datafusion.expr.Expr, string2: datafusion.expr.Expr) datafusion.expr.Expr

Returns the Levenshtein distance between the two given strings.
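
For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The following is a standard dynamic-programming sketch in plain Python, not the library's code; the helper name is hypothetical.

```python
def levenshtein_model(a, b):
    """Classic row-by-row dynamic programming edit distance."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


print(levenshtein_model("kitten", "sitting"))  # 3
```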

datafusion.functions.list_append(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr

Appends an element to the end of an array.

This is an alias for array_append().

datafusion.functions.list_dims(array: datafusion.expr.Expr) datafusion.expr.Expr

Returns an array of the array’s dimensions.

This is an alias for array_dims().

datafusion.functions.list_distinct(array: datafusion.expr.Expr) datafusion.expr.Expr

Returns distinct values from the array after removing duplicates.

This is an alias for array_distinct().

datafusion.functions.list_element(array: datafusion.expr.Expr, n: datafusion.expr.Expr) datafusion.expr.Expr

Extracts the element with the index n from the array.

This is an alias for array_element().

datafusion.functions.list_except(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) datafusion.expr.Expr

Returns the elements that appear in array1 but not in the array2.

This is an alias for array_except().

datafusion.functions.list_extract(array: datafusion.expr.Expr, n: datafusion.expr.Expr) datafusion.expr.Expr

Extracts the element with the index n from the array.

This is an alias for array_element().

datafusion.functions.list_indexof(array: datafusion.expr.Expr, element: datafusion.expr.Expr, index: int | None = 1) datafusion.expr.Expr

Return the position of the first occurrence of element in array.

This is an alias for array_position().

datafusion.functions.list_intersect(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) datafusion.expr.Expr

Returns the intersection of array1 and array2.

This is an alias for array_intersect().

datafusion.functions.list_join(expr: datafusion.expr.Expr, delimiter: datafusion.expr.Expr) datafusion.expr.Expr

Converts each element to its text representation.

This is an alias for array_to_string().

datafusion.functions.list_length(array: datafusion.expr.Expr) datafusion.expr.Expr

Returns the length of the array.

This is an alias for array_length().

datafusion.functions.list_ndims(array: datafusion.expr.Expr) datafusion.expr.Expr

Returns the number of dimensions of the array.

This is an alias for array_ndims().

datafusion.functions.list_position(array: datafusion.expr.Expr, element: datafusion.expr.Expr, index: int | None = 1) datafusion.expr.Expr

Return the position of the first occurrence of element in array.

This is an alias for array_position().

datafusion.functions.list_positions(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr

Searches for an element in the array and returns all occurrences.

This is an alias for array_positions().

datafusion.functions.list_prepend(element: datafusion.expr.Expr, array: datafusion.expr.Expr) datafusion.expr.Expr

Prepends an element to the beginning of an array.

This is an alias for array_prepend().

datafusion.functions.list_push_back(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr

Appends an element to the end of an array.

This is an alias for array_append().

datafusion.functions.list_push_front(element: datafusion.expr.Expr, array: datafusion.expr.Expr) datafusion.expr.Expr

Prepends an element to the beginning of an array.

This is an alias for array_prepend().

datafusion.functions.list_remove(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr

Removes the first element from the array equal to the given value.

This is an alias for array_remove().

datafusion.functions.list_remove_all(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr

Removes all elements from the array equal to the given value.

This is an alias for array_remove_all().

datafusion.functions.list_remove_n(array: datafusion.expr.Expr, element: datafusion.expr.Expr, max: datafusion.expr.Expr) datafusion.expr.Expr

Removes the first max elements from the array equal to the given value.

This is an alias for array_remove_n().

datafusion.functions.list_replace(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) datafusion.expr.Expr

Replaces the first occurrence of from_val with to_val.

This is an alias for array_replace().

datafusion.functions.list_replace_all(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) datafusion.expr.Expr

Replaces all occurrences of from_val with to_val.

This is an alias for array_replace_all().

datafusion.functions.list_replace_n(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr, max: datafusion.expr.Expr) datafusion.expr.Expr

Replace n occurrences of from_val with to_val.

Replaces the first max occurrences of the specified element with another specified element.

This is an alias for array_replace_n().

datafusion.functions.list_resize(array: datafusion.expr.Expr, size: datafusion.expr.Expr, value: datafusion.expr.Expr) datafusion.expr.Expr

Returns an array with the specified size filled.

If size is greater than the array length, the additional entries will be filled with the given value.

This is an alias for array_resize().

datafusion.functions.list_slice(array: datafusion.expr.Expr, begin: datafusion.expr.Expr, end: datafusion.expr.Expr, stride: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Returns a slice of the array.

This is an alias for array_slice().

datafusion.functions.list_sort(array: datafusion.expr.Expr, descending: bool = False, null_first: bool = False) datafusion.expr.Expr

Sort an array.

This is an alias for array_sort().

datafusion.functions.list_to_string(expr: datafusion.expr.Expr, delimiter: datafusion.expr.Expr) datafusion.expr.Expr

Converts each element to its text representation.

This is an alias for array_to_string().

datafusion.functions.list_union(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) datafusion.expr.Expr

Returns an array of the elements in the union of array1 and array2.

Duplicate rows will not be returned.

This is an alias for array_union().

datafusion.functions.ln(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the natural logarithm (base e) of the argument.

datafusion.functions.log(base: datafusion.expr.Expr, num: datafusion.expr.Expr) datafusion.expr.Expr

Returns the logarithm of a number for a particular base.

datafusion.functions.log10(arg: datafusion.expr.Expr) datafusion.expr.Expr

Base 10 logarithm of the argument.

datafusion.functions.log2(arg: datafusion.expr.Expr) datafusion.expr.Expr

Base 2 logarithm of the argument.

datafusion.functions.lower(arg: datafusion.expr.Expr) datafusion.expr.Expr

Converts a string to lowercase.

datafusion.functions.lpad(string: datafusion.expr.Expr, count: datafusion.expr.Expr, characters: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Add left padding to a string.

Extends the string to length count by prepending the characters given in characters (a space by default). If the string is already longer than count then it is truncated (on the right).
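
The padding and truncation rules can be modelled in plain Python as follows. The helper name lpad_model is hypothetical, and the handling of a multi-character fill (repeated, then cut to fit) is an assumption based on the usual SQL behaviour.

```python
def lpad_model(string, count, characters=" "):
    """Left-pad string to count characters, truncating on the right if too long."""
    if len(string) >= count:
        return string[:count]
    if not characters:
        return string  # nothing to pad with
    pad_length = count - len(string)
    # Repeat the fill and cut it to exactly the missing length.
    padding = (characters * pad_length)[:pad_length]
    return padding + string


print(lpad_model("hi", 5, "xy"))  # xyxhi
print(lpad_model("hello", 3))     # hel
```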

datafusion.functions.ltrim(arg: datafusion.expr.Expr) datafusion.expr.Expr

Removes all characters, spaces by default, from the beginning of a string.

datafusion.functions.make_array(*args: datafusion.expr.Expr) datafusion.expr.Expr

Returns an array using the specified input expressions.

datafusion.functions.make_date(year: datafusion.expr.Expr, month: datafusion.expr.Expr, day: datafusion.expr.Expr) datafusion.expr.Expr

Make a date from year, month and day component parts.

datafusion.functions.max(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Aggregate function that returns the maximum value of the argument.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – The value to find the maximum of

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.md5(arg: datafusion.expr.Expr) datafusion.expr.Expr

Computes an MD5 128-bit checksum for a string expression.

datafusion.functions.mean(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Returns the average (mean) value of the argument.

This is an alias for avg().

datafusion.functions.median(expression: datafusion.expr.Expr, distinct: bool = False, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the median of a set of numbers.

This aggregate function returns the median value of the expression across the rows being aggregated.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the options order_by and null_treatment.

Parameters:
  • expression – The value to compute the median of

  • distinct – If True, a single entry for each distinct value will be in the result

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.min(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Returns the minimum value of the argument.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – The value to find the minimum of

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.named_struct(name_pairs: list[tuple[str, datafusion.expr.Expr]]) datafusion.expr.Expr

Returns a struct with the given names and arguments pairs.

datafusion.functions.nanvl(x: datafusion.expr.Expr, y: datafusion.expr.Expr) datafusion.expr.Expr

Returns x if x is not NaN. Otherwise returns y.

datafusion.functions.now() datafusion.expr.Expr

Returns the current timestamp in nanoseconds.

This will use the same value for all instances of now() in same statement.

datafusion.functions.nth_value(expression: datafusion.expr.Expr, n: int, filter: datafusion.expr.Expr | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None, null_treatment: datafusion.common.NullTreatment = NullTreatment.RESPECT_NULLS) datafusion.expr.Expr

Returns the n-th value in a group of values.

This aggregate function will return the n-th value in the partition.

If using the builder functions described in the aggregation section of the online documentation, this function ignores the option distinct.

Parameters:
  • expression – Argument to return the n-th value of

  • n – Index of value to return. Starts at 1.

  • filter – If provided, only compute against rows for which the filter is True

  • order_by – Set the ordering of the expression to evaluate

  • null_treatment – Assign whether to respect or ignore null values.

datafusion.functions.ntile(groups: int, partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) datafusion.expr.Expr

Create a n-tile window function.

This window function divides the ordered window frame into a given number of groups based on the ordering criteria. It then returns which group the current row is assigned to. Here is an example of a dataframe with a window ordered by descending points and the associated n-tile function:

+--------+-------+
| points | ntile |
+--------+-------+
| 120    | 1     |
| 100    | 1     |
| 80     | 2     |
| 60     | 2     |
| 40     | 3     |
| 20     | 3     |
+--------+-------+

Parameters:
  • groups – Number of groups for the n-tile to be divided into.

  • partition_by – Expressions to partition the window frame on.

  • order_by – Set ordering within the window frame.
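
The group assignment in the table can be reproduced with a short pure-Python model. When the rows do not divide evenly, the sketch gives the extra rows to the earliest groups; that follows the usual SQL convention and is an assumption here, as is the helper name.

```python
def ntile_model(row_count, groups):
    """Return the group number (1-based) for each of row_count ordered rows."""
    base, extra = divmod(row_count, groups)
    assignment = []
    for group in range(1, groups + 1):
        # Assumption: the first `extra` groups receive one additional row.
        size = base + (1 if group <= extra else 0)
        assignment.extend([group] * size)
    return assignment


print(ntile_model(6, 3))  # [1, 1, 2, 2, 3, 3]
```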

datafusion.functions.nullif(expr1: datafusion.expr.Expr, expr2: datafusion.expr.Expr) datafusion.expr.Expr

Returns NULL if expr1 equals expr2; otherwise it returns expr1.

This can be used to perform the inverse operation of the COALESCE expression.
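
The relationship between the two operations can be sketched in plain Python, with None standing in for NULL. Both helpers are hypothetical models, not the library functions.

```python
def nullif_model(expr1, expr2):
    """Return None when the two values are equal, otherwise expr1."""
    return None if expr1 == expr2 else expr1


def coalesce_model(*args):
    """Return the first argument that is not None, or None if all are."""
    return next((arg for arg in args if arg is not None), None)


# nullif turns a sentinel value (here 0) into NULL ...
print(nullif_model(0, 0))                      # None
# ... and coalesce turns NULL back into a concrete value.
print(coalesce_model(nullif_model(0, 0), -1))  # -1
```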

datafusion.functions.octet_length(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the number of bytes of a string.

datafusion.functions.order_by(expr: datafusion.expr.Expr, ascending: bool = True, nulls_first: bool = True) datafusion.expr.SortExpr

Creates a new sort expression.

datafusion.functions.overlay(string: datafusion.expr.Expr, substring: datafusion.expr.Expr, start: datafusion.expr.Expr, length: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Replace a substring with a new substring.

Replace the substring of string that starts at the start’th character and extends for length characters with new substring.

datafusion.functions.percent_rank(partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) datafusion.expr.Expr

Create a percent_rank window function.

This window function is similar to rank() except that the returned values are the percentage from 0.0 to 1.0 from first to last. Here is an example of a dataframe with a window ordered by descending points and the associated percent rank:

+--------+--------------+
| points | percent_rank |
+--------+--------------+
| 100    | 0.0          |
| 100    | 0.0          |
| 50     | 0.666667     |
| 25     | 1.0          |
+--------+--------------+
Parameters:
  • partition_by – Expressions to partition the window frame on.

  • order_by – Set ordering within the window frame.
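The values in the table above are consistent with the usual SQL definition (rank - 1) / (rows - 1). A plain-Python sketch of that formula follows; the helper names are hypothetical and the input is assumed to be ordered already.

```python
def _rank_model(ordered_values):
    """Rank with gaps: tied values share the rank of their first row."""
    ranks = []
    for i, value in enumerate(ordered_values):
        if i > 0 and value == ordered_values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(i + 1)
    return ranks


def percent_rank_model(ordered_values):
    """Compute (rank - 1) / (rows - 1) for every row."""
    n = len(ordered_values)
    if n <= 1:
        return [0.0] * n
    return [(r - 1) / (n - 1) for r in _rank_model(ordered_values)]


percentages = percent_rank_model([100, 100, 50, 25])
print([round(p, 6) for p in percentages])
```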

datafusion.functions.pi() datafusion.expr.Expr

Returns an approximate value of π.

datafusion.functions.pow(base: datafusion.expr.Expr, exponent: datafusion.expr.Expr) datafusion.expr.Expr

Returns base raised to the power of exponent.

This is an alias of power().

datafusion.functions.power(base: datafusion.expr.Expr, exponent: datafusion.expr.Expr) datafusion.expr.Expr

Returns base raised to the power of exponent.

datafusion.functions.radians(arg: datafusion.expr.Expr) datafusion.expr.Expr

Converts the argument from degrees to radians.

datafusion.functions.random() datafusion.expr.Expr

Returns a random value in the range 0.0 <= x < 1.0.

datafusion.functions.range(start: datafusion.expr.Expr, stop: datafusion.expr.Expr, step: datafusion.expr.Expr) datafusion.expr.Expr

Create a list of values in the range between start and stop.

datafusion.functions.rank(partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) datafusion.expr.Expr

Create a rank window function.

Returns the rank based upon the window order. Consecutive equal values receive the same rank, and the next different value receives a rank equal to the number of rows that precede it plus one, so ranks are not necessarily consecutive. This is similar to Olympic medals: if two people tie for gold, the next place is bronze and no silver medal is awarded. You should set order_by to produce meaningful results.

Here is an example of a dataframe with a window ordered by descending points and the associated rank:

+--------+------+
| points | rank |
+--------+------+
| 100    | 1    |
| 100    | 1    |
| 50     | 3    |
| 25     | 4    |
+--------+------+
Parameters:
  • partition_by – Expressions to partition the window frame on.

  • order_by – Set ordering within the window frame.
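The "number of preceding rows plus one" rule can be sketched in plain Python. The helper rank_model is hypothetical and assumes the values are already in window order, so that ties are adjacent.

```python
def rank_model(ordered_values):
    """Each row gets the 1-based position of the first row in its tie group."""
    first_position = {}
    ranks = []
    for position, value in enumerate(ordered_values, start=1):
        # Remember where each distinct value first appears.
        first_position.setdefault(value, position)
        ranks.append(first_position[value])
    return ranks


points = [100, 100, 50, 25]
print(rank_model(points))
```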

datafusion.functions.regexp_like(string: datafusion.expr.Expr, regex: datafusion.expr.Expr, flags: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Find if any regular expression (regex) matches exist.

Tests a string using a regular expression returning true if at least one match, false otherwise.

datafusion.functions.regexp_match(string: datafusion.expr.Expr, regex: datafusion.expr.Expr, flags: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Perform regular expression (regex) matching.

Returns an array with each element containing the leftmost-first match of the corresponding index in regex to string in string.

datafusion.functions.regexp_replace(string: datafusion.expr.Expr, pattern: datafusion.expr.Expr, replacement: datafusion.expr.Expr, flags: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Replaces substring(s) matching a PCRE-like regular expression.

The full list of supported features and syntax can be found at <https://docs.rs/regex/latest/regex/#syntax>

Supported flags with the addition of ‘g’ can be found at <https://docs.rs/regex/latest/regex/#grouping-and-flags>

datafusion.functions.regr_avgx(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the average of the independent variable x.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • y – The linear regression dependent variable

  • x – The linear regression independent variable

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_avgy(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the average of the dependent variable y.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • y – The linear regression dependent variable

  • x – The linear regression independent variable

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_count(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Counts the number of rows in which both expressions are not null.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • y – The linear regression dependent variable

  • x – The linear regression independent variable

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_intercept(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the intercept from the linear regression.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • y – The linear regression dependent variable

  • x – The linear regression independent variable

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_r2(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the R-squared value from linear regression.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • y – The linear regression dependent variable

  • x – The linear regression independent variable

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_slope(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the slope from linear regression.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • y – The linear regression dependent variable

  • x – The linear regression independent variable

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_sxx(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the sum of squares of the independent variable x.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • y – The linear regression dependent variable

  • x – The linear regression independent variable

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_sxy(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the sum of products of pairs of numbers.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • y – The linear regression dependent variable

  • x – The linear regression independent variable

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.regr_syy(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the sum of squares of the dependent variable y.

This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • y – The linear regression dependent variable

  • x – The linear regression independent variable

  • filter – If provided, only compute against rows for which the filter is True
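The regr_* aggregates are related through the standard least-squares formulas. The plain-Python sketch below shows those relationships under the assumption that DataFusion follows the usual SQL definitions; regr_model is a hypothetical helper, None stands in for NULL, and degenerate inputs (no pairs, or zero variance) are not handled.

```python
def regr_model(ys, xs):
    """Compute the regr_* statistics over pairs where both values are present."""
    pairs = [(y, x) for y, x in zip(ys, xs) if y is not None and x is not None]
    n = len(pairs)
    avgy = sum(y for y, _ in pairs) / n
    avgx = sum(x for _, x in pairs) / n
    sxx = sum((x - avgx) ** 2 for _, x in pairs)
    syy = sum((y - avgy) ** 2 for y, _ in pairs)
    sxy = sum((x - avgx) * (y - avgy) for y, x in pairs)
    slope = sxy / sxx
    return {
        "count": n,
        "avgx": avgx,
        "avgy": avgy,
        "sxx": sxx,
        "syy": syy,
        "sxy": sxy,
        "slope": slope,
        "intercept": avgy - slope * avgx,
        "r2": sxy ** 2 / (sxx * syy),
    }


# The third pair has a missing y and is skipped entirely.
stats = regr_model([2.0, 4.0, None, 6.0, 8.0], [1.0, 2.0, 3.0, 3.0, 4.0])
print(stats)
```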

datafusion.functions.repeat(string: datafusion.expr.Expr, n: datafusion.expr.Expr) datafusion.expr.Expr

Repeats the string n times.

datafusion.functions.replace(string: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) datafusion.expr.Expr

Replaces all occurrences of from_val with to_val in the string.

datafusion.functions.reverse(arg: datafusion.expr.Expr) datafusion.expr.Expr

Reverse the string argument.

datafusion.functions.right(string: datafusion.expr.Expr, n: datafusion.expr.Expr) datafusion.expr.Expr

Returns the last n characters in the string.

datafusion.functions.round(value: datafusion.expr.Expr, decimal_places: datafusion.expr.Expr = Expr.literal(0)) datafusion.expr.Expr

Round the argument to the nearest integer.

If the optional decimal_places is specified, round to the nearest number of decimal places. You can specify a negative number of decimal places. For example round(lit(125.2345), lit(-2)) would yield a value of 100.0.
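Python's built-in round accepts the same kind of positive and negative digit counts, so it can illustrate the arithmetic. It is only an analogy: how DataFusion breaks exact ties may differ from Python, and that is not modeled here.

```python
value = 125.2345

examples = {
    "no_places": round(value),          # nearest integer
    "two_places": round(value, 2),      # two digits after the decimal point
    "negative_two": round(value, -2),   # nearest multiple of 100
}
print(examples)
```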

datafusion.functions.row_number(partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) datafusion.expr.Expr

Create a row number window function.

Returns the row number of the window function.

Here is an example of the row_number on a simple DataFrame:

+--------+------------+
| points | row number |
+--------+------------+
| 100    | 1          |
| 100    | 2          |
| 50     | 3          |
| 25     | 4          |
+--------+------------+
Parameters:
  • partition_by – Expressions to partition the window frame on.

  • order_by – Set ordering within the window frame.

datafusion.functions.rpad(string: datafusion.expr.Expr, count: datafusion.expr.Expr, characters: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Add right padding to a string.

Extends the string to a length of count by appending characters (a space by default). If the string is already longer than count then it is truncated.
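A plain-Python sketch of the padding and truncation behavior described above. The helper rpad_model is hypothetical, and cycling through a multi-character fill is an assumption borrowed from PostgreSQL.

```python
def rpad_model(string: str, count: int, characters: str = " ") -> str:
    """Pad on the right up to `count` characters, or truncate if longer."""
    if len(string) >= count:
        return string[:count]
    missing = count - len(string)
    # Repeat the fill text and cut it to exactly the missing length.
    fill = (characters * count)[:missing]
    return string + fill


print(repr(rpad_model("hi", 5)))
print(repr(rpad_model("hi", 5, "xy")))
print(repr(rpad_model("hello", 3)))
```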

datafusion.functions.rtrim(arg: datafusion.expr.Expr) datafusion.expr.Expr

Removes all characters, spaces by default, from the end of a string.

datafusion.functions.sha224(arg: datafusion.expr.Expr) datafusion.expr.Expr

Computes the SHA-224 hash of a binary string.

datafusion.functions.sha256(arg: datafusion.expr.Expr) datafusion.expr.Expr

Computes the SHA-256 hash of a binary string.

datafusion.functions.sha384(arg: datafusion.expr.Expr) datafusion.expr.Expr

Computes the SHA-384 hash of a binary string.

datafusion.functions.sha512(arg: datafusion.expr.Expr) datafusion.expr.Expr

Computes the SHA-512 hash of a binary string.

datafusion.functions.signum(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the sign of the argument (-1, 0, +1).

datafusion.functions.sin(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the sine of the argument.

datafusion.functions.sinh(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the hyperbolic sine of the argument.

datafusion.functions.split_part(string: datafusion.expr.Expr, delimiter: datafusion.expr.Expr, index: datafusion.expr.Expr) datafusion.expr.Expr

Split a string and return one part.

Splits a string based on a delimiter and picks out the desired field based on the index.
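A plain-Python model of the field selection. The helper split_part_model is hypothetical; the 1-based index and the empty-string result for an out-of-range index are assumptions based on common SQL behavior. Python's str.split rejects an empty delimiter, which this sketch does not handle.

```python
def split_part_model(string: str, delimiter: str, index: int) -> str:
    """Return the 1-based `index`-th field of `string` split on `delimiter`."""
    parts = string.split(delimiter)
    if 1 <= index <= len(parts):
        return parts[index - 1]
    return ""


print(split_part_model("a,b,c", ",", 2))
```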

datafusion.functions.sqrt(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the square root of the argument.

datafusion.functions.starts_with(string: datafusion.expr.Expr, prefix: datafusion.expr.Expr) datafusion.expr.Expr

Returns true if string starts with prefix.

datafusion.functions.stddev(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the standard deviation of the argument.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – The value to compute the standard deviation of

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.stddev_pop(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the population standard deviation of the argument.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – The value to compute the population standard deviation of

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.stddev_samp(arg: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the sample standard deviation of the argument.

This is an alias for stddev().
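The difference between the sample and population variants is the divisor: n - 1 versus n. Python's statistics module makes the same distinction, so it can serve as an illustration; the data values are made up.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample standard deviation divides the squared deviations by n - 1.
sample_sd = statistics.stdev(values)
# Population standard deviation divides by n.
population_sd = statistics.pstdev(values)

print(sample_sd, population_sd)
```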

datafusion.functions.string_agg(expression: datafusion.expr.Expr, delimiter: str, filter: datafusion.expr.Expr | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None) datafusion.expr.Expr

Concatenates the input strings.

This aggregate function concatenates the input strings, ignoring null values and separating them with the specified delimiter. Non-string values are converted to their string equivalents.

If using the builder functions described in the aggregation documentation, this function ignores the options distinct and null_treatment.

Parameters:
  • expression – The values to concatenate

  • delimiter – Text to place between each value of expression

  • filter – If provided, only compute against rows for which the filter is True

  • order_by – Set the ordering of the expression to evaluate
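The null-skipping and string conversion described above can be modeled in plain Python. The helper string_agg_model is hypothetical, with None standing in for NULL; the result for an all-null input is not modeled.

```python
def string_agg_model(values, delimiter: str) -> str:
    """Join the non-null values, converting each one to a string first."""
    return delimiter.join(str(v) for v in values if v is not None)


print(string_agg_model(["a", None, "b", 3], ", "))
```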

datafusion.functions.strpos(string: datafusion.expr.Expr, substring: datafusion.expr.Expr) datafusion.expr.Expr

Finds the position from where the substring matches the string.

datafusion.functions.struct(*args: datafusion.expr.Expr) datafusion.expr.Expr

Returns a struct with the given arguments.

datafusion.functions.substr(string: datafusion.expr.Expr, position: datafusion.expr.Expr) datafusion.expr.Expr

Substring from the position to the end.

datafusion.functions.substr_index(string: datafusion.expr.Expr, delimiter: datafusion.expr.Expr, count: datafusion.expr.Expr) datafusion.expr.Expr

Returns an indexed substring.

The return will be the string from before count occurrences of delimiter.
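A plain-Python sketch of the indexing rule. The helper substr_index_model is hypothetical; the treatment of a negative count (counting delimiters from the right and returning what follows) and of a zero count are assumptions based on the equivalent SQL function.

```python
def substr_index_model(string: str, delimiter: str, count: int) -> str:
    """Return the text before (count > 0) or after (count < 0) the delimiter."""
    if count == 0 or delimiter == "":
        return ""
    parts = string.split(delimiter)
    if count > 0:
        return delimiter.join(parts[:count])
    return delimiter.join(parts[count:])


print(substr_index_model("www.apache.org", ".", 2))
print(substr_index_model("www.apache.org", ".", -1))
```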

datafusion.functions.substring(string: datafusion.expr.Expr, position: datafusion.expr.Expr, length: datafusion.expr.Expr) datafusion.expr.Expr

Substring from the position with length characters.

datafusion.functions.sum(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the sum of a set of numbers.

This aggregate function expects a numeric expression.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – The numeric values to sum

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.tan(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the tangent of the argument.

datafusion.functions.tanh(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns the hyperbolic tangent of the argument.

datafusion.functions.to_hex(arg: datafusion.expr.Expr) datafusion.expr.Expr

Converts an integer to a hexadecimal string.

datafusion.functions.to_timestamp(arg: datafusion.expr.Expr, *formatters: datafusion.expr.Expr) datafusion.expr.Expr

Converts a string and optional formats to a Timestamp in nanoseconds.

For usage of formatters see the strftime documentation of the Rust chrono crate:

<https://docs.rs/chrono/latest/chrono/format/strftime/index.html>
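The idea of supplying several formats can be illustrated with Python's datetime.strptime, trying each format in turn until one parses. This is an analogy only: the helper parse_with_formats is hypothetical, the try-in-order behavior is an assumption about how multiple formatters are applied, and chrono's specifiers overlap with Python's only in part.

```python
from datetime import datetime


def parse_with_formats(text: str, formats: list[str]) -> datetime:
    """Return the result of the first format that parses `text`."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"no format matched {text!r}")


parsed = parse_with_formats("2023-01-31 09:26", ["%d/%m/%Y", "%Y-%m-%d %H:%M"])
print(parsed)
```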

datafusion.functions.to_timestamp_micros(arg: datafusion.expr.Expr, *formatters: datafusion.expr.Expr) datafusion.expr.Expr

Converts a string and optional formats to a Timestamp in microseconds.

See to_timestamp() for a description on how to use formatters.

datafusion.functions.to_timestamp_millis(arg: datafusion.expr.Expr, *formatters: datafusion.expr.Expr) datafusion.expr.Expr

Converts a string and optional formats to a Timestamp in milliseconds.

See to_timestamp() for a description on how to use formatters.

datafusion.functions.to_timestamp_seconds(arg: datafusion.expr.Expr, *formatters: datafusion.expr.Expr) datafusion.expr.Expr

Converts a string and optional formats to a Timestamp in seconds.

See to_timestamp() for a description on how to use formatters.

datafusion.functions.to_unixtime(string: datafusion.expr.Expr, *format_arguments: datafusion.expr.Expr) datafusion.expr.Expr

Converts a string and optional formats to a Unixtime.

datafusion.functions.translate(string: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) datafusion.expr.Expr

Replaces the characters in from_val with the counterpart in to_val.
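A plain-Python sketch of the character-by-character mapping. The helper translate_model is hypothetical; removing characters of from_val that have no counterpart in to_val is an assumption borrowed from PostgreSQL.

```python
def translate_model(string: str, from_val: str, to_val: str) -> str:
    """Map each character of from_val to the character at the same position in to_val."""
    mapping = {}
    for i, ch in enumerate(from_val):
        if ch not in mapping:
            # Characters without a counterpart are marked for removal.
            mapping[ch] = to_val[i] if i < len(to_val) else None
    result = []
    for ch in string:
        replacement = mapping.get(ch, ch)
        if replacement is not None:
            result.append(replacement)
    return "".join(result)


print(translate_model("12345", "143", "ax"))
```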

datafusion.functions.trim(arg: datafusion.expr.Expr) datafusion.expr.Expr

Removes all characters, spaces by default, from both sides of a string.

datafusion.functions.trunc(num: datafusion.expr.Expr, precision: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Truncate the number toward zero with optional precision.

datafusion.functions.upper(arg: datafusion.expr.Expr) datafusion.expr.Expr

Converts a string to uppercase.

datafusion.functions.uuid(arg: datafusion.expr.Expr) datafusion.expr.Expr

Returns uuid v4 as a string value.

datafusion.functions.var(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the sample variance of the argument.

This is an alias for var_samp().

datafusion.functions.var_pop(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the population variance of the argument.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – The variable to compute the variance for

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.var_samp(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the sample variance of the argument.

If using the builder functions described in the aggregation documentation, this function ignores the options order_by, null_treatment, and distinct.

Parameters:
  • expression – The variable to compute the variance for

  • filter – If provided, only compute against rows for which the filter is True

datafusion.functions.var_sample(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr

Computes the sample variance of the argument.

This is an alias for var_samp().

datafusion.functions.when(when: datafusion.expr.Expr, then: datafusion.expr.Expr) datafusion.expr.CaseBuilder

Create a case expression that has no base expression.

Create a CaseBuilder whose first branch returns then when the condition when is true. See CaseBuilder for detailed usage.

datafusion.functions.window(name: str, args: list[datafusion.expr.Expr], partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr | datafusion.expr.SortExpr] | None = None, window_frame: datafusion.expr.WindowFrame | None = None, ctx: datafusion.context.SessionContext | None = None) datafusion.expr.Expr

Creates a new Window function expression.

This interface will soon be deprecated. Instead of using this interface, users should call the window functions directly. For example, to perform a lag use:

df.select(functions.lag(col("a")).partition_by(col("b")).build())