datafusion.functions
User functions for operating on Expr.
Functions
|
Return the absolute value of a given number. |
|
Returns the arc cosine or inverse cosine of a number. |
|
Returns inverse hyperbolic cosine. |
|
Creates an alias expression. |
|
Returns the approximate number of distinct values. |
|
Returns the approximate median value. |
|
Returns the value that is approximately at a given percentile of expr. |
|
Returns the value of the weighted approximate percentile. |
|
Returns an array using the specified input expressions. |
|
Aggregate values into an array. |
|
Appends an element to the end of an array. |
|
Concatenates the input arrays. |
|
Concatenates the input arrays. |
|
Returns an array of the array's dimensions. |
|
Returns distinct values from the array after removing duplicates. |
|
Extracts the element with the index n from the array. |
|
Returns the elements that appear in array1 but not in array2. |
|
Extracts the element with the index n from the array. |
|
Returns true if the element appears in the first array, otherwise false. |
|
Determines if there is complete overlap of second_array in first_array. |
|
Determine if there is an overlap between first_array and second_array. |
|
Return the position of the first occurrence of element in array. |
|
Returns the intersection of array1 and array2. |
|
Converts each element to its text representation. |
|
Returns the length of the array. |
|
Returns the number of dimensions of the array. |
|
Returns the array without the last element. |
|
Returns the array without the first element. |
|
Return the position of the first occurrence of element in array. |
|
Searches for an element in the array and returns all occurrences. |
|
Prepends an element to the beginning of an array. |
|
Appends an element to the end of an array. |
|
Prepends an element to the beginning of an array. |
|
Removes the first element from the array equal to the given value. |
|
Removes all elements from the array equal to the given value. |
|
Removes the first max elements from the array equal to the given value. |
|
Returns an array containing element count times. |
|
Replaces the first occurrence of from_val with to_val. |
|
Replaces all occurrences of from_val with to_val. |
|
Replace max occurrences of from_val with to_val. |
|
Returns an array with the specified size filled. |
|
Returns a slice of the array. |
|
Sort an array. |
|
Converts each element to its text representation. |
|
Returns an array of the elements in the union of array1 and array2. |
|
Returns the Arrow type of the expression. |
|
Returns the numeric code of the first character of the argument. |
|
Returns the arc sine or inverse sine of a number. |
|
Returns inverse hyperbolic sine. |
|
Returns inverse tangent of a number. |
|
Returns inverse tangent of a division given in the argument. |
|
Returns inverse hyperbolic tangent. |
|
Returns the average value. |
|
Computes the bitwise AND of the argument. |
|
Returns the number of bits in the string argument. |
|
Computes the bitwise OR of the argument. |
|
Computes the bitwise XOR of the argument. |
|
Computes the boolean AND of the argument. |
|
Computes the boolean OR of the argument. |
|
Removes all characters, spaces by default, from both sides of a string. |
|
Create a case expression. |
|
Returns the cube root of a number. |
|
Returns the nearest integer greater than or equal to argument. |
|
The number of characters in the string. |
|
Returns the number of characters in the argument. |
|
Converts the Unicode code point to a UTF8 character. |
|
Returns the value of the first expr in args which is not NULL. |
|
Creates a column reference expression. |
|
Concatenates the text representations of all the arguments. |
|
Concatenates the list args with the separator. |
|
Returns the correlation coefficient between value_y and value_x. |
|
Returns the cosine of the argument. |
|
Returns the hyperbolic cosine of the argument. |
|
Returns the cotangent of the argument. |
|
Returns the number of rows that match the given arguments. |
|
Create a COUNT(1) aggregate expression. |
|
Computes the sample covariance. |
|
Computes the population covariance. |
|
Computes the sample covariance. |
|
Create a cumulative distribution window function. |
|
Returns current UTC date as a Date32 value. |
|
Returns current UTC time as a Time64 value. |
|
Coerces an arbitrary timestamp to the start of the nearest specified interval. |
|
Extracts a subfield from the date. |
|
Truncates the date to a specified level of precision. |
|
Return a specified part of a date. |
|
Truncates the date to a specified level of precision. |
|
Decode the |
|
Converts the argument from radians to degrees. |
|
Create a dense_rank window function. |
|
Computes the binary hash of an expression using the specified algorithm. |
|
Encode the |
|
Returns true if the |
|
Returns the exponential of the argument. |
|
Returns the factorial of the argument. |
|
Find a string in a list of strings. |
|
Returns the first value in a group of values. |
|
Flattens an array of arrays into a single array. |
|
Returns the nearest integer less than or equal to the argument. |
|
Converts an integer to RFC3339 timestamp format string. |
|
Returns the greatest common divisor. |
|
Returns whether the argument is contained within the list |
|
Set the initial letter of each word to capital. |
|
Returns true if a given number is +NaN or -NaN otherwise returns false. |
|
Returns true if a given number is +0.0 or -0.0 otherwise returns false. |
|
Create a lag window function. |
|
Returns the last value in a group of values. |
|
Returns the least common multiple. |
|
Create a lead window function. |
|
Returns the first |
|
The number of characters in the |
|
Returns the Levenshtein distance between the two given strings. |
|
Appends an element to the end of an array. |
|
Returns an array of the array's dimensions. |
|
Returns distinct values from the array after removing duplicates. |
|
Extracts the element with the index n from the array. |
|
Returns the elements that appear in |
|
Extracts the element with the index n from the array. |
|
Return the position of the first occurrence of |
|
Returns the intersection of |
|
Converts each element to its text representation. |
|
Returns the length of the array. |
|
Returns the number of dimensions of the array. |
|
Return the position of the first occurrence of |
|
Searches for an element in the array and returns all occurrences. |
|
Prepends an element to the beginning of an array. |
|
Appends an element to the end of an array. |
|
Prepends an element to the beginning of an array. |
|
Removes the first element from the array equal to the given value. |
|
Removes all elements from the array equal to the given value. |
|
Removes the first |
|
Replaces the first occurrence of |
|
Replaces all occurrences of |
|
Replace |
|
Returns an array with the specified size filled. |
|
Returns a slice of the array. |
|
This is an alias for |
|
Converts each element to its text representation. |
|
Returns an array of the elements in the union of array1 and array2. |
|
Returns the natural logarithm (base e) of the argument. |
|
Returns the logarithm of a number for a particular |
|
Base 10 logarithm of the argument. |
|
Base 2 logarithm of the argument. |
|
Converts a string to lowercase. |
|
Add left padding to a string. |
|
Removes all characters, spaces by default, from the beginning of a string. |
|
Returns an array using the specified input expressions. |
|
Make a date from year, month and day component parts. |
|
Aggregate function that returns the maximum value of the argument. |
|
Computes an MD5 128-bit checksum for a string expression. |
|
Returns the average (mean) value of the argument. |
|
Computes the median of a set of numbers. |
|
Returns the minimum value of the argument. |
|
Returns a struct with the given names and arguments pairs. |
|
Returns |
|
Returns the current timestamp in nanoseconds. |
|
Returns the n-th value in a group of values. |
|
Create a n-tile window function. |
|
Returns NULL if expr1 equals expr2; otherwise it returns expr1. |
|
Returns the number of bytes of a string. |
|
Creates a new sort expression. |
|
Replace a substring with a new substring. |
|
Create a percent_rank window function. |
|
Returns an approximate value of π. |
|
Returns |
|
Returns |
|
Converts the argument from degrees to radians. |
|
Returns a random value in the range |
|
Create a list of values in the range between start and stop. |
|
Create a rank window function. |
|
Find if any regular expression (regex) matches exist. |
|
Perform regular expression (regex) matching. |
|
Replaces substring(s) matching a PCRE-like regular expression. |
|
Computes the average of the independent variable |
|
Computes the average of the dependent variable |
|
Counts the number of rows in which both expressions are not null. |
|
Computes the intercept from the linear regression. |
|
Computes the R-squared value from linear regression. |
|
Computes the slope from linear regression. |
|
Computes the sum of squares of the independent variable |
|
Computes the sum of products of pairs of numbers. |
|
Computes the sum of squares of the dependent variable |
|
Repeats the |
|
Replaces all occurrences of |
|
Reverse the string argument. |
|
Returns the last |
|
Round the argument to the nearest integer. |
|
Create a row number window function. |
|
Add right padding to a string. |
|
Removes all characters, spaces by default, from the end of a string. |
|
Computes the SHA-224 hash of a binary string. |
|
Computes the SHA-256 hash of a binary string. |
|
Computes the SHA-384 hash of a binary string. |
|
Computes the SHA-512 hash of a binary string. |
|
Returns the sign of the argument (-1, 0, +1). |
|
Returns the sine of the argument. |
|
Returns the hyperbolic sine of the argument. |
|
Split a string and return one part. |
|
Returns the square root of the argument. |
|
Returns true if string starts with prefix. |
|
Computes the standard deviation of the argument. |
|
Computes the population standard deviation of the argument. |
|
Computes the sample standard deviation of the argument. |
|
Concatenates the input strings. |
|
Finds the position from where the |
|
Returns a struct with the given arguments. |
|
Substring from the |
|
Returns an indexed substring. |
|
Substring from the |
|
Computes the sum of a set of numbers. |
|
Returns the tangent of the argument. |
|
Returns the hyperbolic tangent of the argument. |
|
Converts an integer to a hexadecimal string. |
|
Converts a string and optional formats to a |
|
Converts a string and optional formats to a |
|
Converts a string and optional formats to a |
|
Converts a string and optional formats to a |
|
Converts a string and optional formats to a Unixtime. |
|
Replaces the characters in |
|
Removes all characters, spaces by default, from both sides of a string. |
|
Truncate the number toward zero with optional precision. |
|
Converts a string to uppercase. |
|
Returns uuid v4 as a string value. |
|
Computes the sample variance of the argument. |
|
Computes the population variance of the argument. |
|
Computes the sample variance of the argument. |
|
Computes the sample variance of the argument. |
|
Create a case expression that has no base expression. |
|
Creates a new Window function expression. |
Module Contents
- datafusion.functions.abs(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Return the absolute value of a given number.
Returns:
- Expr
A new expression representing the absolute value of the input expression.
- datafusion.functions.acos(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the arc cosine or inverse cosine of a number.
Returns:
- Expr
A new expression representing the arc cosine of the input expression.
- datafusion.functions.acosh(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns inverse hyperbolic cosine.
- datafusion.functions.alias(expr: datafusion.expr.Expr, name: str) datafusion.expr.Expr ¶
Creates an alias expression.
- datafusion.functions.approx_distinct(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Returns the approximate number of distinct values.
This aggregate function is similar to count() with distinct set, but it approximates the number of distinct entries. It may return significantly faster than count() for some DataFrames.
If using the builder functions described in the Aggregation section, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – Values to check for distinct entries
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.approx_median(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Returns the approximate median value.
This aggregate function is similar to median(), but it only approximates the median. It may return significantly faster for some DataFrames.
If using the builder functions described in the Aggregation section, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – Values to find the median for
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.approx_percentile_cont(expression: datafusion.expr.Expr, percentile: float, num_centroids: int | None = None, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Returns the value that is approximately at a given percentile of expr.
This aggregate function assumes the input values form a continuous distribution. Suppose you have a DataFrame that consists of 100 different test scores. If you call this function with a percentile of 0.9, it returns the value of the test score that is above 90% of the other test scores. The returned value may lie between two of the input values.
This function uses the [t-digest](https://arxiv.org/abs/1902.04023) algorithm to compute the percentile. You can limit the number of bins used in this algorithm by setting the num_centroids parameter.
If using the builder functions described in the Aggregation section, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – Values for which to find the approximate percentile
percentile – This must be between 0.0 and 1.0, inclusive
num_centroids – Max bin size for the t-digest algorithm
filter – If provided, only compute against rows for which the filter is True
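To make the "value between two of the values" behaviour concrete, here is a minimal plain-Python sketch of an exact continuous percentile. It is not the t-digest algorithm and not the library's implementation; the function name `exact_percentile_cont` and the linear-interpolation convention are assumptions made for illustration, and the approximate result returned by approx_percentile_cont() may differ slightly.

```python
def exact_percentile_cont(values, percentile):
    """Exact continuous percentile via linear interpolation (reference only)."""
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must be between 0.0 and 1.0, inclusive")
    ordered = sorted(values)
    if not ordered:
        return None
    # Fractional position of the requested percentile in the sorted data.
    position = percentile * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    # Interpolate between the two neighbouring values.
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


scores = list(range(1, 101))  # 100 different test scores
p90 = exact_percentile_cont(scores, 0.9)
print(p90)  # a value between the scores 90 and 91
```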
- datafusion.functions.approx_percentile_cont_with_weight(expression: datafusion.expr.Expr, weight: datafusion.expr.Expr, percentile: float, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Returns the value of the weighted approximate percentile.
This aggregate function is similar to approx_percentile_cont() except that it uses the associated weights.
If using the builder functions described in the Aggregation section, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – Values for which to find the approximate percentile
weight – Relative weight for each of the values in expression
percentile – This must be between 0.0 and 1.0, inclusive
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.array(*args: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns an array using the specified input expressions.
This is an alias for make_array().
- datafusion.functions.array_agg(expression: datafusion.expr.Expr, distinct: bool = False, filter: datafusion.expr.Expr | None = None, order_by: list[datafusion.expr.Expr] | None = None) datafusion.expr.Expr ¶
Aggregate values into an array.
Currently distinct and order_by cannot be used together. As a workaround, consider array_sort() after aggregation. [Issue Tracker](https://github.com/apache/datafusion/issues/12371)
If using the builder functions described in the Aggregation section, this function ignores the option null_treatment.
- Parameters:
expression – Values to combine into an array
distinct – If True, a single entry for each distinct value will be in the result
filter – If provided, only compute against rows for which the filter is True
order_by – Order the resultant array values
- datafusion.functions.array_append(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr ¶
Appends an element to the end of an array.
- datafusion.functions.array_cat(*args: datafusion.expr.Expr) datafusion.expr.Expr ¶
Concatenates the input arrays.
This is an alias for array_concat().
- datafusion.functions.array_concat(*args: datafusion.expr.Expr) datafusion.expr.Expr ¶
Concatenates the input arrays.
- datafusion.functions.array_dims(array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns an array of the array’s dimensions.
- datafusion.functions.array_distinct(array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns distinct values from the array after removing duplicates.
- datafusion.functions.array_element(array: datafusion.expr.Expr, n: datafusion.expr.Expr) datafusion.expr.Expr ¶
Extracts the element with the index n from the array.
- datafusion.functions.array_except(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the elements that appear in array1 but not in array2.
- datafusion.functions.array_extract(array: datafusion.expr.Expr, n: datafusion.expr.Expr) datafusion.expr.Expr ¶
Extracts the element with the index n from the array.
This is an alias for array_element().
- datafusion.functions.array_has(first_array: datafusion.expr.Expr, second_array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns true if the element appears in the first array, otherwise false.
- datafusion.functions.array_has_all(first_array: datafusion.expr.Expr, second_array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Determines if there is complete overlap of second_array in first_array.
Returns true if each element of the second array appears in the first array. Otherwise, it returns false.
- datafusion.functions.array_has_any(first_array: datafusion.expr.Expr, second_array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Determine if there is an overlap between first_array and second_array.
Returns true if at least one element of the second array appears in the first array. Otherwise, it returns false.
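The difference between array_has_all() and array_has_any() can be sketched with plain Python lists. The helper names `has_all` and `has_any` are hypothetical and only mirror the documented semantics; they are not the library's implementation.

```python
def has_all(first_array, second_array):
    """True if every element of second_array appears in first_array."""
    return all(item in first_array for item in second_array)


def has_any(first_array, second_array):
    """True if at least one element of second_array appears in first_array."""
    return any(item in first_array for item in second_array)


print(has_all([1, 2, 3], [1, 3]))  # complete overlap
print(has_all([1, 2, 3], [1, 4]))  # 4 is missing from the first array
print(has_any([1, 2, 3], [4, 3]))  # 3 is shared
```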
- datafusion.functions.array_indexof(array: datafusion.expr.Expr, element: datafusion.expr.Expr, index: int | None = 1) datafusion.expr.Expr ¶
Return the position of the first occurrence of element in array.
This is an alias for array_position().
- datafusion.functions.array_intersect(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the intersection of array1 and array2.
- datafusion.functions.array_join(expr: datafusion.expr.Expr, delimiter: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts each element to its text representation.
This is an alias for array_to_string().
- datafusion.functions.array_length(array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the length of the array.
- datafusion.functions.array_ndims(array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the number of dimensions of the array.
- datafusion.functions.array_pop_back(array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the array without the last element.
- datafusion.functions.array_pop_front(array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the array without the first element.
- datafusion.functions.array_position(array: datafusion.expr.Expr, element: datafusion.expr.Expr, index: int | None = 1) datafusion.expr.Expr ¶
Return the position of the first occurrence of element in array.
- datafusion.functions.array_positions(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr ¶
Searches for an element in the array and returns all occurrences.
- datafusion.functions.array_prepend(element: datafusion.expr.Expr, array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Prepends an element to the beginning of an array.
- datafusion.functions.array_push_back(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr ¶
Appends an element to the end of an array.
This is an alias for array_append().
- datafusion.functions.array_push_front(element: datafusion.expr.Expr, array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Prepends an element to the beginning of an array.
This is an alias for array_prepend().
- datafusion.functions.array_remove(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr ¶
Removes the first element from the array equal to the given value.
- datafusion.functions.array_remove_all(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr ¶
Removes all elements from the array equal to the given value.
- datafusion.functions.array_remove_n(array: datafusion.expr.Expr, element: datafusion.expr.Expr, max: datafusion.expr.Expr) datafusion.expr.Expr ¶
Removes the first max elements from the array equal to the given value.
- datafusion.functions.array_repeat(element: datafusion.expr.Expr, count: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns an array containing element count times.
- datafusion.functions.array_replace(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) datafusion.expr.Expr ¶
Replaces the first occurrence of from_val with to_val.
- datafusion.functions.array_replace_all(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) datafusion.expr.Expr ¶
Replaces all occurrences of from_val with to_val.
- datafusion.functions.array_replace_n(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr, max: datafusion.expr.Expr) datafusion.expr.Expr ¶
Replace max occurrences of from_val with to_val.
Replaces the first max occurrences of the specified element with another specified element.
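The three replace variants differ only in how many matches they touch. A minimal plain-Python sketch of the "first max occurrences" rule follows; the helper name `replace_n` is hypothetical and this is an illustration of the documented behaviour, not the library's code.

```python
def replace_n(array, from_val, to_val, max_count):
    """Replace the first max_count occurrences of from_val with to_val."""
    result = []
    replaced = 0
    for item in array:
        if item == from_val and replaced < max_count:
            result.append(to_val)
            replaced += 1
        else:
            # Either not a match, or the replacement budget is used up.
            result.append(item)
    return result


data = [1, 2, 1, 1]
print(replace_n(data, 1, 9, 1))          # like array_replace
print(replace_n(data, 1, 9, 2))          # like array_replace_n with max=2
print(replace_n(data, 1, 9, len(data)))  # like array_replace_all
```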
- datafusion.functions.array_resize(array: datafusion.expr.Expr, size: datafusion.expr.Expr, value: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns an array with the specified size filled.
If size is greater than the array length, the additional entries will be filled with the given value.
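As a rough illustration of the padding rule, the sketch below resizes a plain Python list. The helper name `resize` is hypothetical, and the truncation branch for a smaller size is an assumption: the text above only describes the case where size exceeds the array length.

```python
def resize(array, size, value):
    """Return a copy of array with exactly `size` entries."""
    if size > len(array):
        # Documented case: pad the extra entries with the given value.
        return list(array) + [value] * (size - len(array))
    # Assumption (not stated above): a smaller size truncates the array.
    return list(array[:size])


print(resize([1, 2], 4, 0))
```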
- datafusion.functions.array_slice(array: datafusion.expr.Expr, begin: datafusion.expr.Expr, end: datafusion.expr.Expr, stride: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Returns a slice of the array.
- datafusion.functions.array_sort(array: datafusion.expr.Expr, descending: bool = False, null_first: bool = False) datafusion.expr.Expr ¶
Sort an array.
- Parameters:
array – The input array to sort.
descending – If True, sorts in descending order.
null_first – If True, nulls will be returned at the beginning of the array.
- datafusion.functions.array_to_string(expr: datafusion.expr.Expr, delimiter: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts each element to its text representation.
- datafusion.functions.array_union(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns an array of the elements in the union of array1 and array2.
Duplicate rows will not be returned.
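The "no duplicates" behaviour can be sketched on plain Python lists as follows. The helper name `union` is hypothetical, it assumes hashable elements, and keeping first-seen order is an assumption made for readability; the text above does not specify the order of the result.

```python
def union(array1, array2):
    """Elements of array1 and array2 with duplicates removed."""
    seen = set()
    result = []
    for item in list(array1) + list(array2):
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(union([1, 2, 2], [2, 3]))
```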
- datafusion.functions.arrow_typeof(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the Arrow type of the expression.
- datafusion.functions.ascii(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the numeric code of the first character of the argument.
- datafusion.functions.asin(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the arc sine or inverse sine of a number.
- datafusion.functions.asinh(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns inverse hyperbolic sine.
- datafusion.functions.atan(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns inverse tangent of a number.
- datafusion.functions.atan2(y: datafusion.expr.Expr, x: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns inverse tangent of a division given in the argument.
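The reason for a two-argument arctangent is that dividing y by x first discards the signs of the individual arguments. The sketch below uses Python's standard `math` module to show the difference; it assumes datafusion's atan2(y, x) follows the same conventional definition as `math.atan2`.

```python
import math

y, x = 1.0, -1.0

# Dividing first loses the quadrant: the point (-1, 1) looks like (1, -1).
naive = math.atan(y / x)

# The two-argument form keeps the quadrant of the point (x, y).
full = math.atan2(y, x)

print(math.degrees(naive), math.degrees(full))
```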
- datafusion.functions.atanh(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns inverse hyperbolic tangent.
- datafusion.functions.avg(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Returns the average value.
This aggregate function expects a numeric expression and will return a float.
If using the builder functions described in the Aggregation section, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – Values to compute the average of
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.bit_and(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the bitwise AND of the argument.
This aggregate function will bitwise compare every value in the input partition.
If using the builder functions described in the Aggregation section, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – Argument to perform bitwise calculation on
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.bit_length(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the number of bits in the string argument.
- datafusion.functions.bit_or(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the bitwise OR of the argument.
This aggregate function will bitwise compare every value in the input partition.
If using the builder functions described in the Aggregation section, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – Argument to perform bitwise calculation on
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.bit_xor(expression: datafusion.expr.Expr, distinct: bool = False, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the bitwise XOR of the argument.
This aggregate function will bitwise compare every value in the input partition.
If using the builder functions described in the Aggregation section, this function ignores the options order_by and null_treatment.
- Parameters:
expression – Argument to perform bitwise calculation on
distinct – If True, evaluate each unique value of expression only once
filter – If provided, only compute against rows for which the filter is True
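The bitwise aggregates bit_and(), bit_or(), and bit_xor() each fold every value of a column into one integer. A plain-Python sketch of that folding, including the effect of distinct on XOR, is shown below. The helper names are hypothetical, the input is assumed to be a non-empty list of integers without nulls, and this is not the library's implementation.

```python
from functools import reduce
from operator import and_, or_, xor


def bit_and_agg(values):
    return reduce(and_, values)


def bit_or_agg(values):
    return reduce(or_, values)


def bit_xor_agg(values, distinct=False):
    # With distinct=True each unique value contributes only once.
    if distinct:
        values = set(values)
    return reduce(xor, values)


values = [0b1100, 0b1010, 0b1110]
print(bin(bit_and_agg(values)), bin(bit_or_agg(values)), bin(bit_xor_agg(values)))

# XOR is the only one of the three where duplicates change the result.
print(bit_xor_agg([5, 5, 3]), bit_xor_agg([5, 5, 3], distinct=True))
```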
- datafusion.functions.bool_and(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the boolean AND of the argument.
This aggregate function will compare every value in the input partition. These are expected to be boolean values.
If using the builder functions described in the Aggregation section, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – Argument to perform calculation on
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.bool_or(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the boolean OR of the argument.
This aggregate function will compare every value in the input partition. These are expected to be boolean values.
If using the builder functions described in the Aggregation section, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – Argument to perform calculation on
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.btrim(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Removes all characters, spaces by default, from both sides of a string.
- datafusion.functions.case(expr: datafusion.expr.Expr) datafusion.expr.CaseBuilder ¶
Create a case expression.
Create a CaseBuilder to match cases for the expression expr. See CaseBuilder for detailed usage.
- datafusion.functions.cbrt(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the cube root of a number.
- datafusion.functions.ceil(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the nearest integer greater than or equal to argument.
- datafusion.functions.char_length(string: datafusion.expr.Expr) datafusion.expr.Expr ¶
The number of characters in the string.
- datafusion.functions.character_length(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the number of characters in the argument.
- datafusion.functions.chr(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts the Unicode code point to a UTF8 character.
- datafusion.functions.coalesce(*args: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the value of the first expr in args which is not NULL.
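The "first non-NULL argument" rule can be sketched in plain Python, with `None` standing in for NULL. The helper below is an illustration of the semantics only and operates on scalar values rather than on Expr objects.

```python
def coalesce(*args):
    """Return the first argument that is not None, or None if all are."""
    for value in args:
        if value is not None:
            return value
    return None


print(coalesce(None, None, 3, 4))
print(coalesce(0, 5))  # 0 is a real value, so it is returned
```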
- datafusion.functions.col(name: str) datafusion.expr.Expr ¶
Creates a column reference expression.
- datafusion.functions.concat(*args: datafusion.expr.Expr) datafusion.expr.Expr ¶
Concatenates the text representations of all the arguments.
NULL arguments are ignored.
- datafusion.functions.concat_ws(separator: str, *args: datafusion.expr.Expr) datafusion.expr.Expr ¶
Concatenates the list args with the separator.
NULL arguments are ignored. separator should not be NULL.
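A minimal sketch of the NULL handling described for concat_ws(), using `None` for NULL and plain Python values instead of Expr objects. Raising an error for a `None` separator is an assumption made for this illustration; the text above only says the separator should not be NULL.

```python
def concat_ws(separator, *args):
    """Join the non-None arguments with the separator."""
    if separator is None:
        # Assumption: treat a missing separator as a caller error.
        raise ValueError("separator should not be None")
    return separator.join(str(arg) for arg in args if arg is not None)


print(concat_ws("-", "a", None, "b"))
```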
- datafusion.functions.corr(value_y: datafusion.expr.Expr, value_x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Returns the correlation coefficient between value_y and value_x.
This aggregate function expects both values to be numeric and will return a float.
If using the builder functions described in the Aggregation section, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
value_y – The dependent variable for correlation
value_x – The independent variable for correlation
filter – If provided, only compute against rows for which the filter is True
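For reference, the Pearson correlation coefficient can be computed by hand as below. This is a plain-Python sketch that assumes corr() returns the standard Pearson coefficient and that the inputs are equal-length numeric lists without nulls; the function name `pearson_corr` is hypothetical.

```python
import math


def pearson_corr(ys, xs):
    """Pearson correlation of the dependent (ys) and independent (xs) values."""
    n = len(xs)
    mean_y = sum(ys) / n
    mean_x = sum(xs) / n
    # Sum of co-deviations and of squared deviations from the means.
    co_dev = sum((y - mean_y) * (x - mean_x) for y, x in zip(ys, xs))
    dev_y = sum((y - mean_y) ** 2 for y in ys)
    dev_x = sum((x - mean_x) ** 2 for x in xs)
    return co_dev / math.sqrt(dev_y * dev_x)


print(pearson_corr([2, 4, 6], [1, 2, 3]))  # perfectly increasing relationship
print(pearson_corr([3, 2, 1], [1, 2, 3]))  # perfectly decreasing relationship
```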
- datafusion.functions.cos(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the cosine of the argument.
- datafusion.functions.cosh(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the hyperbolic cosine of the argument.
- datafusion.functions.cot(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the cotangent of the argument.
- datafusion.functions.count(expressions: datafusion.expr.Expr | list[datafusion.expr.Expr] | None = None, distinct: bool = False, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Returns the number of rows that match the given arguments.
This aggregate function will count the non-null rows provided in the expression.
If using the builder functions described in the Aggregation section, this function ignores the options order_by and null_treatment.
- Parameters:
expressions – Expression or list of expressions whose non-null rows are counted
distinct – If True, a single entry for each distinct value will be in the result
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.count_star(filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Create a COUNT(1) aggregate expression.
This aggregate function will count all of the rows in the partition.
If using the builder functions described in the Aggregation section, this function ignores the options order_by, distinct, and null_treatment.
- Parameters:
filter – If provided, only count rows for which the filter is True
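The difference between count(), count() with distinct set, and count_star() comes down to how nulls and duplicates are treated. The sketch below reproduces the three results for a single column held in a plain Python list, with `None` standing in for null; it illustrates the semantics and is not a use of the library itself.

```python
rows = [10, None, 30, 30, None]

# count(): only non-null rows are counted.
count_non_null = sum(1 for value in rows if value is not None)

# count(distinct=True): one entry per distinct non-null value.
count_distinct = len({value for value in rows if value is not None})

# count_star(): every row in the partition is counted.
count_all = len(rows)

print(count_non_null, count_distinct, count_all)
```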
- datafusion.functions.covar(value_y: datafusion.expr.Expr, value_x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the sample covariance.
This is an alias for covar_samp().
- datafusion.functions.covar_pop(value_y: datafusion.expr.Expr, value_x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the population covariance.
This aggregate function expects both values to be numeric and will return a float.
If using the builder functions described in the Aggregation section, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
value_y – The dependent variable for covariance
value_x – The independent variable for covariance
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.covar_samp(value_y: datafusion.expr.Expr, value_x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the sample covariance.
This aggregate function expects both values to be numeric and will return a float.
If using the builder functions described in ref:_aggregation, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
value_y – The dependent variable for covariance
value_x – The independent variable for covariance
filter – If provided, only compute against rows for which the filter is True
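The difference between the population and sample covariance can be sketched in plain Python. This illustrates the standard formulas only and is not DataFusion's implementation; the helper names `covar_pop_sketch` and `covar_samp_sketch` are hypothetical, and the inputs are assumed to be equal-length lists of non-null numbers.

```python
def covar_pop_sketch(ys, xs):
    """Population covariance: divide the summed products by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def covar_samp_sketch(ys, xs):
    """Sample covariance: divide the summed products by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
print(covar_pop_sketch(ys, xs))
print(covar_samp_sketch(ys, xs))
```

The sample version is always larger in magnitude than the population version for the same data, because it divides by a smaller number.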
- datafusion.functions.cume_dist(partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr] | None = None) datafusion.expr.Expr ¶
Create a cumulative distribution window function.
This window function is similar to rank() except that the returned values are the ratio of the row number to the total number of rows. Here is an example of a dataframe with a window ordered by descending points and the associated cumulative distribution:
+--------+-----------+
| points | cume_dist |
+--------+-----------+
| 100    | 0.5       |
| 100    | 0.5       |
| 50     | 0.75      |
| 25     | 1.0       |
+--------+-----------+
- Parameters:
partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.
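The cumulative distribution values in the example table can be reproduced with a small plain-Python sketch. It is an illustration of the semantics, not DataFusion's implementation; the helper name `cume_dist_desc` is hypothetical and it assumes a single partition ordered by descending value.

```python
def cume_dist_desc(values):
    """Cumulative distribution for one partition ordered by descending value."""
    total = len(values)
    # For each row, count the rows that sort at or before it (ties included),
    # then divide by the number of rows in the partition.
    return [sum(1 for other in values if other >= v) / total for v in values]


points = [100, 100, 50, 25]
print(cume_dist_desc(points))
```

Tied rows share the same value because each of them counts all of its peers.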
- datafusion.functions.current_date() datafusion.expr.Expr ¶
Returns current UTC date as a Date32 value.
- datafusion.functions.current_time() datafusion.expr.Expr ¶
Returns current UTC time as a Time64 value.
- datafusion.functions.date_bin(stride: datafusion.expr.Expr, source: datafusion.expr.Expr, origin: datafusion.expr.Expr) datafusion.expr.Expr ¶
Coerces an arbitrary timestamp to the start of the nearest specified interval.
- datafusion.functions.date_part(part: datafusion.expr.Expr, date: datafusion.expr.Expr) datafusion.expr.Expr ¶
Extracts a subfield from the date.
- datafusion.functions.date_trunc(part: datafusion.expr.Expr, date: datafusion.expr.Expr) datafusion.expr.Expr ¶
Truncates the date to a specified level of precision.
- datafusion.functions.datepart(part: datafusion.expr.Expr, date: datafusion.expr.Expr) datafusion.expr.Expr ¶
Return a specified part of a date.
This is an alias for
date_part()
.
- datafusion.functions.datetrunc(part: datafusion.expr.Expr, date: datafusion.expr.Expr) datafusion.expr.Expr ¶
Truncates the date to a specified level of precision.
This is an alias for
date_trunc()
.
- datafusion.functions.decode(input: datafusion.expr.Expr, encoding: datafusion.expr.Expr) datafusion.expr.Expr ¶
Decode the input, using the encoding. encoding can be base64 or hex.
- datafusion.functions.degrees(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts the argument from radians to degrees.
- datafusion.functions.dense_rank(partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr] | None = None) datafusion.expr.Expr ¶
Create a dense_rank window function.
This window function is similar to rank() except that the returned values will be consecutive. Here is an example of a dataframe with a window ordered by descending points and the associated dense rank:
+--------+------------+
| points | dense_rank |
+--------+------------+
| 100    | 1          |
| 100    | 1          |
| 50     | 2          |
| 25     | 3          |
+--------+------------+
- Parameters:
partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.
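A plain-Python sketch of the dense rank semantics, assuming the values of a single partition are already in window order. The helper name `dense_rank_sorted` is hypothetical and this is not DataFusion's implementation.

```python
def dense_rank_sorted(values):
    """Dense rank for values that are already sorted in window order."""
    ranks = []
    current = 0
    previous = object()  # sentinel that compares unequal to any real value
    for v in values:
        if v != previous:
            # A new distinct value always takes the next consecutive rank.
            current += 1
            previous = v
        ranks.append(current)
    return ranks


print(dense_rank_sorted([100, 100, 50, 25]))
```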
- datafusion.functions.digest(value: datafusion.expr.Expr, method: datafusion.expr.Expr) datafusion.expr.Expr ¶
Computes the binary hash of an expression using the specified algorithm.
Standard algorithms are md5, sha224, sha256, sha384, sha512, blake2s, blake2b, and blake3.
- datafusion.functions.encode(input: datafusion.expr.Expr, encoding: datafusion.expr.Expr) datafusion.expr.Expr ¶
Encode the input, using the encoding. encoding can be base64 or hex.
- datafusion.functions.ends_with(arg: datafusion.expr.Expr, suffix: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns true if the string ends with the suffix, false otherwise.
- datafusion.functions.exp(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the exponential of the argument.
- datafusion.functions.factorial(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the factorial of the argument.
- datafusion.functions.find_in_set(string: datafusion.expr.Expr, string_list: datafusion.expr.Expr) datafusion.expr.Expr ¶
Find a string in a list of strings.
Returns a value in the range of 1 to N if the string is in the string list string_list consisting of N substrings.
The string list is a string composed of substrings separated by , characters.
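The lookup described above can be sketched in plain Python. The helper name `find_in_set_sketch` is hypothetical, and returning 0 when the string is absent is an assumption borrowed from the usual SQL convention; the text above does not state it.

```python
def find_in_set_sketch(string, string_list):
    """Return the 1-based position of string within a comma-separated list."""
    for position, part in enumerate(string_list.split(","), start=1):
        if part == string:
            return position
    return 0  # assumption: 0 signals "not found"


print(find_in_set_sketch("b", "a,b,c"))
print(find_in_set_sketch("z", "a,b,c"))
```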
- datafusion.functions.first_value(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None, order_by: list[datafusion.expr.Expr] | None = None, null_treatment: datafusion.common.NullTreatment = NullTreatment.RESPECT_NULLS) datafusion.expr.Expr ¶
Returns the first value in a group of values.
This aggregate function will return the first value in the partition.
If using the builder functions described in ref:_aggregation, this function ignores the option distinct.
- Parameters:
expression – Expression to return the first value of
filter – If provided, only compute against rows for which the filter is True
order_by – Set the ordering of the expression to evaluate
null_treatment – Assign whether to respect or ignore null values.
- datafusion.functions.flatten(array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Flattens an array of arrays into a single array.
- datafusion.functions.floor(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the nearest integer less than or equal to the argument.
- datafusion.functions.from_unixtime(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts an integer to RFC3339 timestamp format string.
- datafusion.functions.gcd(x: datafusion.expr.Expr, y: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the greatest common divisor.
- datafusion.functions.in_list(arg: datafusion.expr.Expr, values: list[datafusion.expr.Expr], negated: bool = False) datafusion.expr.Expr ¶
Returns whether the argument is contained within the list
values
.
- datafusion.functions.initcap(string: datafusion.expr.Expr) datafusion.expr.Expr ¶
Set the initial letter of each word to capital.
Converts the first letter of each word in
string
to uppercase and the remaining characters to lowercase.
- datafusion.functions.isnan(expr: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns true if a given number is +NaN or -NaN; otherwise returns false.
- datafusion.functions.iszero(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns true if a given number is +0.0 or -0.0; otherwise returns false.
- datafusion.functions.lag(arg: datafusion.expr.Expr, shift_offset: int = 1, default_value: Any | None = None, partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr] | None = None) datafusion.expr.Expr ¶
Create a lag window function.
Lag operation will return the argument that is in the previous shift_offset-th row in the partition. For example lag(col("b"), shift_offset=3, default_value=5) will return the 3rd previous value in column b. At the beginning of the partition, where no values can be returned, it will return the default value of 5.
Here is an example of both the lag and datafusion.functions.lead() functions on a simple DataFrame:
+--------+------+-----+
| points | lead | lag |
+--------+------+-----+
| 100    | 100  |     |
| 100    | 50   | 100 |
| 50     | 25   | 100 |
| 25     |      | 50  |
+--------+------+-----+
- Parameters:
arg – Value to return
shift_offset – Number of rows before the current row.
default_value – Value to return if shift_offset row does not exist.
partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.
- datafusion.functions.last_value(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None, order_by: list[datafusion.expr.Expr] | None = None, null_treatment: datafusion.common.NullTreatment = NullTreatment.RESPECT_NULLS) datafusion.expr.Expr ¶
Returns the last value in a group of values.
This aggregate function will return the last value in the partition.
If using the builder functions described in ref:_aggregation, this function ignores the option distinct.
- Parameters:
expression – Expression to return the last value of
filter – If provided, only compute against rows for which the filter is True
order_by – Set the ordering of the expression to evaluate
null_treatment – Assign whether to respect or ignore null values.
- datafusion.functions.lcm(x: datafusion.expr.Expr, y: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the least common multiple.
- datafusion.functions.lead(arg: datafusion.expr.Expr, shift_offset: int = 1, default_value: Any | None = None, partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr] | None = None) datafusion.expr.Expr ¶
Create a lead window function.
Lead operation will return the argument that is in the next shift_offset-th row in the partition. For example lead(col("b"), shift_offset=3, default_value=5) will return the 3rd following value in column b. At the end of the partition, where no further values can be returned, it will return the default value of 5.
Here is an example of both the lead and datafusion.functions.lag() functions on a simple DataFrame:
+--------+------+-----+
| points | lead | lag |
+--------+------+-----+
| 100    | 100  |     |
| 100    | 50   | 100 |
| 50     | 25   | 100 |
| 25     |      | 50  |
+--------+------+-----+
To set window function parameters use the window builder approach described in the ref:_window_functions online documentation.
- Parameters:
arg – Value to return
shift_offset – Number of rows following the current row.
default_value – Value to return if shift_offset row does not exist.
partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.
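The lead and lag columns in the example table can be reproduced with a plain-Python sketch over a single, already ordered partition. The helper names `lag_sketch` and `lead_sketch` are hypothetical, a non-negative shift_offset is assumed, and this is not DataFusion's implementation.

```python
def lag_sketch(values, shift_offset=1, default_value=None):
    """Value from shift_offset rows before each row, or default_value."""
    return [
        values[i - shift_offset] if i - shift_offset >= 0 else default_value
        for i in range(len(values))
    ]


def lead_sketch(values, shift_offset=1, default_value=None):
    """Value from shift_offset rows after each row, or default_value."""
    n = len(values)
    return [
        values[i + shift_offset] if i + shift_offset < n else default_value
        for i in range(n)
    ]


points = [100, 100, 50, 25]
print(lead_sketch(points))  # last row has no following row
print(lag_sketch(points))   # first row has no preceding row
```

With `shift_offset=3, default_value=5`, the first three rows of a lag receive 5 because no row exists three positions earlier.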
- datafusion.functions.left(string: datafusion.expr.Expr, n: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the first n characters in the string.
- datafusion.functions.length(string: datafusion.expr.Expr) datafusion.expr.Expr ¶
The number of characters in the
string
.
- datafusion.functions.levenshtein(string1: datafusion.expr.Expr, string2: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the Levenshtein distance between the two given strings.
- datafusion.functions.list_append(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr ¶
Appends an element to the end of an array.
This is an alias for
array_append()
.
- datafusion.functions.list_dims(array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns an array of the array’s dimensions.
This is an alias for
array_dims()
.
- datafusion.functions.list_distinct(array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns distinct values from the array after removing duplicates.
This is an alias for
array_distinct()
.
- datafusion.functions.list_element(array: datafusion.expr.Expr, n: datafusion.expr.Expr) datafusion.expr.Expr ¶
Extracts the element with the index n from the array.
This is an alias for
array_element()
.
- datafusion.functions.list_except(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the elements that appear in array1 but not in array2.
This is an alias for array_except().
- datafusion.functions.list_extract(array: datafusion.expr.Expr, n: datafusion.expr.Expr) datafusion.expr.Expr ¶
Extracts the element with the index n from the array.
This is an alias for
array_element()
.
- datafusion.functions.list_indexof(array: datafusion.expr.Expr, element: datafusion.expr.Expr, index: int | None = 1) datafusion.expr.Expr ¶
Return the position of the first occurrence of element in array.
This is an alias for array_position().
- datafusion.functions.list_intersect(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the intersection of array1 and array2.
This is an alias for array_intersect().
- datafusion.functions.list_join(expr: datafusion.expr.Expr, delimiter: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts each element to its text representation.
This is an alias for
array_to_string()
.
- datafusion.functions.list_length(array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the length of the array.
This is an alias for
array_length()
.
- datafusion.functions.list_ndims(array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the number of dimensions of the array.
This is an alias for
array_ndims()
.
- datafusion.functions.list_position(array: datafusion.expr.Expr, element: datafusion.expr.Expr, index: int | None = 1) datafusion.expr.Expr ¶
Return the position of the first occurrence of element in array.
This is an alias for array_position().
- datafusion.functions.list_positions(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr ¶
Searches for an element in the array and returns all occurrences.
This is an alias for
array_positions()
.
- datafusion.functions.list_prepend(element: datafusion.expr.Expr, array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Prepends an element to the beginning of an array.
This is an alias for
array_prepend()
.
- datafusion.functions.list_push_back(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr ¶
Appends an element to the end of an array.
This is an alias for
array_append()
.
- datafusion.functions.list_push_front(element: datafusion.expr.Expr, array: datafusion.expr.Expr) datafusion.expr.Expr ¶
Prepends an element to the beginning of an array.
This is an alias for
array_prepend()
.
- datafusion.functions.list_remove(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr ¶
Removes the first element from the array equal to the given value.
This is an alias for
array_remove()
.
- datafusion.functions.list_remove_all(array: datafusion.expr.Expr, element: datafusion.expr.Expr) datafusion.expr.Expr ¶
Removes all elements from the array equal to the given value.
This is an alias for
array_remove_all()
.
- datafusion.functions.list_remove_n(array: datafusion.expr.Expr, element: datafusion.expr.Expr, max: datafusion.expr.Expr) datafusion.expr.Expr ¶
Removes the first max elements from the array equal to the given value.
This is an alias for array_remove_n().
- datafusion.functions.list_replace(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) datafusion.expr.Expr ¶
Replaces the first occurrence of from_val with to_val.
This is an alias for array_replace().
- datafusion.functions.list_replace_all(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) datafusion.expr.Expr ¶
Replaces all occurrences of from_val with to_val.
This is an alias for array_replace_all().
- datafusion.functions.list_replace_n(array: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr, max: datafusion.expr.Expr) datafusion.expr.Expr ¶
Replace max occurrences of from_val with to_val.
Replaces the first max occurrences of the specified element with another specified element.
This is an alias for array_replace_n().
- datafusion.functions.list_resize(array: datafusion.expr.Expr, size: datafusion.expr.Expr, value: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns an array with the specified size filled.
If size is greater than the array length, the additional entries will be filled with the given value.
This is an alias for array_resize().
- datafusion.functions.list_slice(array: datafusion.expr.Expr, begin: datafusion.expr.Expr, end: datafusion.expr.Expr, stride: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Returns a slice of the array.
This is an alias for
array_slice()
.
- datafusion.functions.list_sort(array: datafusion.expr.Expr, descending: bool = False, null_first: bool = False) datafusion.expr.Expr ¶
This is an alias for
array_sort()
.
- datafusion.functions.list_to_string(expr: datafusion.expr.Expr, delimiter: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts each element to its text representation.
This is an alias for
array_to_string()
.
- datafusion.functions.list_union(array1: datafusion.expr.Expr, array2: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns an array of the elements in the union of array1 and array2.
Duplicate rows will not be returned.
This is an alias for
array_union()
.
- datafusion.functions.ln(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the natural logarithm (base e) of the argument.
- datafusion.functions.log(base: datafusion.expr.Expr, num: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the logarithm of a number for a particular
base
.
- datafusion.functions.log10(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Base 10 logarithm of the argument.
- datafusion.functions.log2(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Base 2 logarithm of the argument.
- datafusion.functions.lower(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts a string to lowercase.
- datafusion.functions.lpad(string: datafusion.expr.Expr, count: datafusion.expr.Expr, characters: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Add left padding to a string.
Extends the string to length count by prepending the characters given in characters (a space by default). If the string is already longer than count then it is truncated (on the right).
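The padding and truncation rules can be sketched in plain Python. The helper name `lpad_sketch` is hypothetical, a non-empty fill string is assumed, and repeating a multi-character fill cyclically is an assumption based on the common SQL behavior rather than something stated above.

```python
def lpad_sketch(string, count, characters=" "):
    """Left-pad string to exactly count characters, truncating on the right."""
    if len(string) >= count:
        return string[:count]
    missing = count - len(string)
    # Repeat the fill enough times to cover the gap, then cut it to size.
    repeats = -(-missing // len(characters))  # ceiling division
    return (characters * repeats)[:missing] + string


print(lpad_sketch("hi", 5, "xy"))
print(lpad_sketch("hello", 3))
```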
- datafusion.functions.ltrim(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Removes all characters, spaces by default, from the beginning of a string.
- datafusion.functions.make_array(*args: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns an array using the specified input expressions.
- datafusion.functions.make_date(year: datafusion.expr.Expr, month: datafusion.expr.Expr, day: datafusion.expr.Expr) datafusion.expr.Expr ¶
Make a date from year, month and day component parts.
- datafusion.functions.max(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Aggregate function that returns the maximum value of the argument.
If using the builder functions described in ref:_aggregation, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – The value to find the maximum of
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.md5(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Computes an MD5 128-bit checksum for a string expression.
- datafusion.functions.mean(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Returns the average (mean) value of the argument.
This is an alias for
avg()
.
- datafusion.functions.median(expression: datafusion.expr.Expr, distinct: bool = False, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the median of a set of numbers.
This aggregate function returns the median value of the expression for the given aggregate function.
If using the builder functions described in ref:_aggregation, this function ignores the options order_by and null_treatment.
- Parameters:
expression – The value to compute the median of
distinct – If True, a single entry for each distinct value will be in the result
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.min(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Returns the minimum value of the argument.
If using the builder functions described in ref:_aggregation, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – The value to find the minimum of
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.named_struct(name_pairs: list[tuple[str, datafusion.expr.Expr]]) datafusion.expr.Expr ¶
Returns a struct built from the given name and argument pairs.
- datafusion.functions.nanvl(x: datafusion.expr.Expr, y: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns x if x is not NaN. Otherwise returns y.
- datafusion.functions.now() datafusion.expr.Expr ¶
Returns the current timestamp in nanoseconds.
This will use the same value for all instances of now() in the same statement.
- datafusion.functions.nth_value(expression: datafusion.expr.Expr, n: int, filter: datafusion.expr.Expr | None = None, order_by: list[datafusion.expr.Expr] | None = None, null_treatment: datafusion.common.NullTreatment = NullTreatment.RESPECT_NULLS) datafusion.expr.Expr ¶
Returns the n-th value in a group of values.
This aggregate function will return the n-th value in the partition.
If using the builder functions described in ref:_aggregation, this function ignores the option distinct.
- Parameters:
expression – Expression to return the n-th value of
n – Index of value to return. Starts at 1.
filter – If provided, only compute against rows for which the filter is True
order_by – Set the ordering of the expression to evaluate
null_treatment – Assign whether to respect or ignore null values.
- datafusion.functions.ntile(groups: int, partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr] | None = None) datafusion.expr.Expr ¶
Create an n-tile window function.
This window function orders the window frame into a given number of groups based on the ordering criteria. It then returns which group the current row is assigned to. Here is an example of a dataframe with a window ordered by descending points and the associated n-tile function:
+--------+-------+
| points | ntile |
+--------+-------+
| 120    | 1     |
| 100    | 1     |
| 80     | 2     |
| 60     | 2     |
| 40     | 3     |
| 20     | 3     |
+--------+-------+
- Parameters:
groups – Number of groups for the n-tile to be divided into.
partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.
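The group assignment in the example table can be sketched in plain Python. The helper name `ntile_sketch` is hypothetical, and giving the earlier groups one extra row when the rows do not divide evenly is an assumption taken from the usual SQL convention, not from the text above.

```python
def ntile_sketch(row_count, groups):
    """Group number for each of row_count ordered rows split into groups."""
    base, extra = divmod(row_count, groups)
    assignment = []
    for group in range(1, groups + 1):
        # Assumption: the first `extra` groups absorb the leftover rows.
        size = base + (1 if group <= extra else 0)
        assignment.extend([group] * size)
    return assignment


print(ntile_sketch(6, 3))
print(ntile_sketch(7, 3))
```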
- datafusion.functions.nullif(expr1: datafusion.expr.Expr, expr2: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns NULL if expr1 equals expr2; otherwise it returns expr1.
This can be used to perform the inverse operation of the COALESCE expression.
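A plain-Python sketch of how nullif and COALESCE mirror each other, using None in place of NULL. Both helper names are hypothetical stand-ins for the SQL semantics and are not DataFusion functions.

```python
def nullif_sketch(expr1, expr2):
    """Turn a specific value into None; pass everything else through."""
    return None if expr1 == expr2 else expr1


def coalesce_sketch(*values):
    """Return the first value that is not None, or None if all are."""
    for value in values:
        if value is not None:
            return value
    return None


# A sentinel of -1 is converted to None, then replaced by a default of 0.
print(coalesce_sketch(nullif_sketch(-1, -1), 0))
print(coalesce_sketch(nullif_sketch(5, -1), 0))
```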
- datafusion.functions.octet_length(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the number of bytes of a string.
- datafusion.functions.order_by(expr: datafusion.expr.Expr, ascending: bool = True, nulls_first: bool = True) datafusion.expr.Expr ¶
Creates a new sort expression.
- datafusion.functions.overlay(string: datafusion.expr.Expr, substring: datafusion.expr.Expr, start: datafusion.expr.Expr, length: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Replace a substring with a new substring.
Replace the substring of string that starts at the start’th character and extends for length characters with new substring.
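The replacement can be sketched with plain Python slicing. The helper name `overlay_sketch` is hypothetical; a 1-based start and a default length equal to the length of the new substring are assumptions based on the usual SQL convention.

```python
def overlay_sketch(string, substring, start, length=None):
    """Replace length characters of string, beginning at 1-based start."""
    if length is None:
        length = len(substring)  # assumption: default to the substring length
    prefix = string[: start - 1]
    suffix = string[start - 1 + length:]
    return prefix + substring + suffix


print(overlay_sketch("Txxxxas", "hom", 2, 4))
```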
- datafusion.functions.percent_rank(partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr] | None = None) datafusion.expr.Expr ¶
Create a percent_rank window function.
This window function is similar to rank() except that the returned values are the percentage from 0.0 to 1.0 from first to last. Here is an example of a dataframe with a window ordered by descending points and the associated percent rank:
+--------+--------------+
| points | percent_rank |
+--------+--------------+
| 100    | 0.0          |
| 100    | 0.0          |
| 50     | 0.666667     |
| 25     | 1.0          |
+--------+--------------+
- Parameters:
partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.
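The values in the example table are consistent with the formula (rank - 1) / (rows - 1). A plain-Python sketch under that assumption follows, for a single partition ordered by descending value; the helper name `percent_rank_desc` is hypothetical.

```python
def percent_rank_desc(values):
    """Percent rank for one partition ordered by descending value."""
    n = len(values)
    if n <= 1:
        return [0.0] * n
    result = []
    for v in values:
        # Rank is one plus the number of rows that sort strictly before v.
        rank = 1 + sum(1 for other in values if other > v)
        result.append((rank - 1) / (n - 1))
    return result


print(percent_rank_desc([100, 100, 50, 25]))
```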
- datafusion.functions.pi() datafusion.expr.Expr ¶
Returns an approximate value of π.
- datafusion.functions.pow(base: datafusion.expr.Expr, exponent: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns base raised to the power of exponent.
This is an alias of power().
- datafusion.functions.power(base: datafusion.expr.Expr, exponent: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns base raised to the power of exponent.
- datafusion.functions.radians(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts the argument from degrees to radians.
- datafusion.functions.random() datafusion.expr.Expr ¶
Returns a random value in the range
0.0 <= x < 1.0
.
- datafusion.functions.range(start: datafusion.expr.Expr, stop: datafusion.expr.Expr, step: datafusion.expr.Expr) datafusion.expr.Expr ¶
Create a list of values in the range between start and stop.
- datafusion.functions.rank(partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr] | None = None) datafusion.expr.Expr ¶
Create a rank window function.
Returns the rank based upon the window order. Consecutive equal values will receive the same rank, but the next different value will not be consecutive but rather the number of rows that precede it plus one. This is similar to Olympic medals. If two people tie for gold, the next place is bronze. There would be no silver medal. Here is an example of a dataframe with a window ordered by descending points and the associated rank.
You should set order_by to produce meaningful results:
+--------+------+
| points | rank |
+--------+------+
| 100    | 1    |
| 100    | 1    |
| 50     | 3    |
| 25     | 4    |
+--------+------+
- Parameters:
partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.
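The "number of preceding rows plus one" rule can be sketched in plain Python for a single partition ordered by descending value. The helper name `rank_desc` is hypothetical and this is not DataFusion's implementation.

```python
def rank_desc(values):
    """Rank for one partition ordered by descending value, with gaps after ties."""
    # A row's rank is one plus the number of rows that sort strictly before it.
    return [1 + sum(1 for other in values if other > v) for v in values]


print(rank_desc([100, 100, 50, 25]))
```

The two tied rows both receive rank 1, and the next row jumps to 3 because two rows precede it.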
- datafusion.functions.regexp_like(string: datafusion.expr.Expr, regex: datafusion.expr.Expr, flags: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Find if any regular expression (regex) matches exist.
Tests a string using a regular expression returning true if at least one match, false otherwise.
- datafusion.functions.regexp_match(string: datafusion.expr.Expr, regex: datafusion.expr.Expr, flags: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Perform regular expression (regex) matching.
Returns an array with each element containing the leftmost-first match of the corresponding index in regex to string in string.
- datafusion.functions.regexp_replace(string: datafusion.expr.Expr, pattern: datafusion.expr.Expr, replacement: datafusion.expr.Expr, flags: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Replaces substring(s) matching a PCRE-like regular expression.
The full list of supported features and syntax can be found at <https://docs.rs/regex/latest/regex/#syntax>
Supported flags with the addition of ‘g’ can be found at <https://docs.rs/regex/latest/regex/#grouping-and-flags>
- datafusion.functions.regr_avgx(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the average of the independent variable x.
This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.
If using the builder functions described in ref:_aggregation, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.regr_avgy(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the average of the dependent variable y.
This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.
If using the builder functions described in ref:_aggregation, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.regr_count(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Counts the number of rows in which both expressions are not null.
This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.
If using the builder functions described in ref:_aggregation, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.regr_intercept(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the intercept from the linear regression.
This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.
If using the builder functions described in ref:_aggregation, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.regr_r2(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the R-squared value from linear regression.
This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.
If using the builder functions described in ref:_aggregation, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.regr_slope(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the slope from linear regression.
This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.
If using the builder functions described in ref:_aggregation, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.regr_sxx(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the sum of squares of the independent variable x.
This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.
If using the builder functions described in ref:_aggregation, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.regr_sxy(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the sum of products of pairs of numbers.
This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.
If using the builder functions described in ref:_aggregation, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.regr_syy(y: datafusion.expr.Expr, x: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the sum of squares of the dependent variable y.
This is a linear regression aggregate function. Only non-null pairs of the inputs are evaluated.
If using the builder functions described in ref:_aggregation, this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
y – The linear regression dependent variable
x – The linear regression independent variable
filter – If provided, only compute against rows for which the filter is True
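How the regr_* aggregates relate to one another can be sketched in plain Python. The formulas below are the standard least-squares definitions and are assumed rather than taken from DataFusion's source; the helper name `regr_sketch` is hypothetical, and None stands in for NULL. It assumes at least two non-null pairs with distinct x and y values.

```python
def regr_sketch(ys, xs):
    """Linear regression aggregates over the non-null (y, x) pairs."""
    # Only pairs where both inputs are present are evaluated.
    pairs = [(y, x) for y, x in zip(ys, xs) if y is not None and x is not None]
    count = len(pairs)
    avgy = sum(y for y, _ in pairs) / count
    avgx = sum(x for _, x in pairs) / count
    sxx = sum((x - avgx) ** 2 for _, x in pairs)
    syy = sum((y - avgy) ** 2 for y, _ in pairs)
    sxy = sum((x - avgx) * (y - avgy) for y, x in pairs)
    slope = sxy / sxx
    return {
        "count": count,
        "avgx": avgx,
        "avgy": avgy,
        "sxx": sxx,
        "syy": syy,
        "sxy": sxy,
        "slope": slope,
        "intercept": avgy - slope * avgx,
        "r2": sxy ** 2 / (sxx * syy),
    }


xs = [1, 2, None, 3, 4]
ys = [3, 5, 100, 7, None]
print(regr_sketch(ys, xs))
```

In this data the third and fifth rows are dropped because one side is missing, leaving three points that lie exactly on y = 2x + 1.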
- datafusion.functions.repeat(string: datafusion.expr.Expr, n: datafusion.expr.Expr) datafusion.expr.Expr ¶
Repeats the string n times.
- datafusion.functions.replace(string: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) datafusion.expr.Expr ¶
Replaces all occurrences of from_val with to_val in the string.
- datafusion.functions.reverse(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Reverse the string argument.
- datafusion.functions.right(string: datafusion.expr.Expr, n: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the last n characters in the string.
- datafusion.functions.round(value: datafusion.expr.Expr, decimal_places: datafusion.expr.Expr = Expr.literal(0)) datafusion.expr.Expr ¶
Round the argument to the nearest integer.
If the optional decimal_places is specified, round to that number of decimal places. You can specify a negative number of decimal places. For example, round(lit(125.2345), lit(-2)) would yield a value of 100.0.
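As a quick analogy for how a negative number of decimal places behaves, Python's built-in round accepts a negative digit count in the same spirit. This is not the DataFusion function itself, and tie-breaking rules (Python rounds halves to even) may differ between the two.

```python
# Built-in round, used here only as an analogy for decimal_places.
rounded_default = round(125.2345)        # no places given: nearest integer
rounded_two = round(125.2345, 2)         # two decimal places
rounded_hundreds = round(125.2345, -2)   # negative places: nearest hundred
```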
- datafusion.functions.row_number(partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr] | None = None) datafusion.expr.Expr ¶
Create a row number window function.
Returns the row number of the window function.
Here is an example of the row_number on a simple DataFrame:
+--------+------------+
| points | row number |
+--------+------------+
| 100    | 1          |
| 100    | 2          |
| 50     | 3          |
| 25     | 4          |
+--------+------------+
- Parameters:
partition_by – Expressions to partition the window frame on.
order_by – Set ordering within the window frame.
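The numbering in the table above can be reproduced with a small plain-Python sketch. It is not the library's implementation; the helper below and its partition_by, order_by, and descending arguments are hypothetical stand-ins that take ordinary Python callables rather than Expr objects.

```python
from collections import defaultdict


def row_number(rows, partition_by=None, order_by=None, descending=False):
    """Return (row, number) pairs, numbering rows from 1 within each partition."""
    partitions = defaultdict(list)
    for row in rows:
        key = partition_by(row) if partition_by else None
        partitions[key].append(row)

    numbered = []
    for part in partitions.values():
        # sorted() is stable, so tied rows keep their original relative order.
        ordered = sorted(part, key=order_by, reverse=descending) if order_by else part
        numbered.extend((row, i) for i, row in enumerate(ordered, start=1))
    return numbered


points = [100, 100, 50, 25]
result = row_number(points, order_by=lambda p: p, descending=True)
```

Note that the two rows with 100 points receive different numbers (1 and 2): row numbering never produces ties.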
- datafusion.functions.rpad(string: datafusion.expr.Expr, count: datafusion.expr.Expr, characters: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Add right padding to a string.
Extends the string to length count by appending the fill characters (a space by default). If the string is already longer than count then it is truncated.
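The padding and truncation rule can be sketched in plain Python. This is an illustration of the described behavior only, not the library's code, and it assumes the fill characters are cycled PostgreSQL-style when more than one is given.

```python
def rpad(string, count, characters=" "):
    """Pad `string` on the right to `count` characters, or truncate it."""
    if count <= len(string):
        # Already long enough: truncate to the requested length.
        return string[:count]
    # Repeat the fill characters and cut to exactly the missing length.
    padding = (characters * count)[: count - len(string)]
    return string + padding


padded_default = rpad("hi", 5)
padded_custom = rpad("hi", 5, "xy")
truncated = rpad("hello", 3)
```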
- datafusion.functions.rtrim(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Removes all characters, spaces by default, from the end of a string.
- datafusion.functions.sha224(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Computes the SHA-224 hash of a binary string.
- datafusion.functions.sha256(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Computes the SHA-256 hash of a binary string.
- datafusion.functions.sha384(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Computes the SHA-384 hash of a binary string.
- datafusion.functions.sha512(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Computes the SHA-512 hash of a binary string.
- datafusion.functions.signum(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the sign of the argument (-1, 0, +1).
- datafusion.functions.sin(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the sine of the argument.
- datafusion.functions.sinh(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the hyperbolic sine of the argument.
- datafusion.functions.split_part(string: datafusion.expr.Expr, delimiter: datafusion.expr.Expr, index: datafusion.expr.Expr) datafusion.expr.Expr ¶
Split a string and return one part.
Splits a string based on a delimiter and picks out the desired field based on the index.
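A plain-Python sketch of this behavior follows. It is not the library's implementation, and it assumes a 1-based index with an empty string returned when the index is past the last field, as in PostgreSQL's split_part.

```python
def split_part(string, delimiter, index):
    """Return the field at 1-based `index` after splitting on `delimiter`."""
    parts = string.split(delimiter)
    if 1 <= index <= len(parts):
        return parts[index - 1]
    return ""  # assumption: out-of-range index yields an empty string


second = split_part("a,b,c", ",", 2)
missing = split_part("a,b,c", ",", 5)
```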
- datafusion.functions.sqrt(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the square root of the argument.
- datafusion.functions.starts_with(string: datafusion.expr.Expr, prefix: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns true if string starts with prefix.
- datafusion.functions.stddev(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the standard deviation of the argument.
If using the builder functions described in ref:_aggregation this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – The value to compute the standard deviation of
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.stddev_pop(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the population standard deviation of the argument.
If using the builder functions described in ref:_aggregation this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – The value to compute the population standard deviation of
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.stddev_samp(arg: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the sample standard deviation of the argument.
This is an alias for stddev().
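The difference between the sample and population variants can be seen with Python's statistics module. This is only an analogy for what stddev() (sample) and stddev_pop() (population) compute, not a call into DataFusion.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample standard deviation divides by n - 1 (compare stddev / stddev_samp).
sample_sd = statistics.stdev(values)

# Population standard deviation divides by n (compare stddev_pop).
population_sd = statistics.pstdev(values)
```

For these eight values the squared deviations from the mean sum to 32, so the population result is sqrt(32 / 8) = 2 while the sample result is sqrt(32 / 7), which is slightly larger.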
- datafusion.functions.string_agg(expression: datafusion.expr.Expr, delimiter: str, filter: datafusion.expr.Expr | None = None, order_by: list[datafusion.expr.Expr] | None = None) datafusion.expr.Expr ¶
Concatenates the input strings.
This aggregate function will concatenate input strings, ignoring null values, and separating them with the specified delimiter. Non-string values will be converted to their string equivalents.
If using the builder functions described in ref:_aggregation this function ignores the options distinct and null_treatment.
- Parameters:
expression – The values to concatenate
delimiter – Text to place between each value of expression
filter – If provided, only compute against rows for which the filter is True
order_by – Set the ordering of the expression to evaluate
- datafusion.functions.strpos(string: datafusion.expr.Expr, substring: datafusion.expr.Expr) datafusion.expr.Expr ¶
Finds the position at which the substring matches the string.
- datafusion.functions.struct(*args: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns a struct with the given arguments.
- datafusion.functions.substr(string: datafusion.expr.Expr, position: datafusion.expr.Expr) datafusion.expr.Expr ¶
Substring from the position to the end.
- datafusion.functions.substr_index(string: datafusion.expr.Expr, delimiter: datafusion.expr.Expr, count: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns an indexed substring.
Returns the portion of string that comes before count occurrences of delimiter.
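A plain-Python sketch of this rule is shown below. It is not the library's implementation. The handling of a negative count (counting delimiters from the right) and of a zero count or empty delimiter (returning an empty string) are assumptions borrowed from MySQL's SUBSTRING_INDEX.

```python
def substr_index(string, delimiter, count):
    """Return the part of `string` before `count` occurrences of `delimiter`."""
    if count == 0 or not delimiter:
        return ""  # assumption: degenerate inputs give an empty string
    parts = string.split(delimiter)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delimiter.join(parts[:count])
    # Assumption: a negative count keeps everything to the right of the
    # count-th delimiter counted from the end.
    return delimiter.join(parts[count:])


first = substr_index("www.apache.org", ".", 1)
first_two = substr_index("www.apache.org", ".", 2)
last = substr_index("www.apache.org", ".", -1)
```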
- datafusion.functions.substring(string: datafusion.expr.Expr, position: datafusion.expr.Expr, length: datafusion.expr.Expr) datafusion.expr.Expr ¶
Substring from the position with length characters.
- datafusion.functions.sum(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the sum of a set of numbers.
This aggregate function expects a numeric expression.
If using the builder functions described in ref:_aggregation this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – The values to sum
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.tan(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the tangent of the argument.
- datafusion.functions.tanh(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns the hyperbolic tangent of the argument.
- datafusion.functions.to_hex(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts an integer to a hexadecimal string.
- datafusion.functions.to_timestamp(arg: datafusion.expr.Expr, *formatters: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts a string and optional formats to a Timestamp in nanoseconds.
For usage of formatters see the strftime module of the Rust chrono package. [Documentation here.](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)
- datafusion.functions.to_timestamp_micros(arg: datafusion.expr.Expr, *formatters: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts a string and optional formats to a Timestamp in microseconds.
See to_timestamp() for a description on how to use formatters.
- datafusion.functions.to_timestamp_millis(arg: datafusion.expr.Expr, *formatters: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts a string and optional formats to a Timestamp in milliseconds.
See to_timestamp() for a description on how to use formatters.
- datafusion.functions.to_timestamp_seconds(arg: datafusion.expr.Expr, *formatters: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts a string and optional formats to a Timestamp in seconds.
See to_timestamp() for a description on how to use formatters.
- datafusion.functions.to_unixtime(string: datafusion.expr.Expr, *format_arguments: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts a string and optional formats to a Unixtime.
- datafusion.functions.translate(string: datafusion.expr.Expr, from_val: datafusion.expr.Expr, to_val: datafusion.expr.Expr) datafusion.expr.Expr ¶
Replaces the characters in from_val with the counterpart in to_val.
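The character-by-character mapping can be sketched in plain Python. This is an illustration rather than the library's code. It assumes PostgreSQL-style behavior, where characters in from_val without a counterpart in to_val are deleted and the first mapping wins when a character is repeated in from_val.

```python
def translate(string, from_val, to_val):
    """Map each character of `from_val` to the one at the same position in `to_val`."""
    table = {}
    for i, ch in enumerate(from_val):
        if ord(ch) in table:
            continue  # assumption: the first mapping for a character wins
        # None tells str.translate to delete the character (no counterpart).
        table[ord(ch)] = to_val[i] if i < len(to_val) else None
    return string.translate(table)


translated = translate("12345", "143", "ax")
```

Here 1 maps to a, 4 maps to x, and 3 has no counterpart, so it is dropped under the stated assumption.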
- datafusion.functions.trim(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Removes all characters, spaces by default, from both sides of a string.
- datafusion.functions.trunc(num: datafusion.expr.Expr, precision: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Truncate the number toward zero with optional precision.
- datafusion.functions.upper(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Converts a string to uppercase.
- datafusion.functions.uuid(arg: datafusion.expr.Expr) datafusion.expr.Expr ¶
Returns uuid v4 as a string value.
- datafusion.functions.var(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the sample variance of the argument.
This is an alias for var_samp().
- datafusion.functions.var_pop(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the population variance of the argument.
If using the builder functions described in ref:_aggregation this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – The variable to compute the variance for
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.var_samp(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the sample variance of the argument.
If using the builder functions described in ref:_aggregation this function ignores the options order_by, null_treatment, and distinct.
- Parameters:
expression – The variable to compute the variance for
filter – If provided, only compute against rows for which the filter is True
- datafusion.functions.var_sample(expression: datafusion.expr.Expr, filter: datafusion.expr.Expr | None = None) datafusion.expr.Expr ¶
Computes the sample variance of the argument.
This is an alias for var_samp().
- datafusion.functions.when(when: datafusion.expr.Expr, then: datafusion.expr.Expr) datafusion.expr.CaseBuilder ¶
Create a case expression that has no base expression.
Create a CaseBuilder to match cases for the expression expr. See CaseBuilder for detailed usage.
- datafusion.functions.window(name: str, args: list[datafusion.expr.Expr], partition_by: list[datafusion.expr.Expr] | None = None, order_by: list[datafusion.expr.Expr] | None = None, window_frame: datafusion.expr.WindowFrame | None = None, ctx: datafusion.context.SessionContext | None = None) datafusion.expr.Expr ¶
Creates a new Window function expression.
This interface will soon be deprecated. Instead of using this interface, users should call the window functions directly. For example, to perform a lag use:
df.select(functions.lag(col("a")).partition_by(col("b")).build())