# Expression API DataFrame methods such as `select` and `filter` accept one or more logical expressions and there are many functions available for creating logical expressions. These are documented below. :::{tip} Most functions and methods may receive and return an `Expr`, which can be chained together using a fluent-style API: ```rust use datafusion::prelude::*; // create the expression `(a > 6) AND (b < 7)` col("a").gt(lit(6)).and(col("b").lt(lit(7))); ``` ::: ## Identifiers | Syntax | Description | | ---------- | -------------------------------------------- | | col(ident) | Reference a column in a dataframe `col("a")` | :::{note} ident : A type which implement `Into` trait ::: ## Literal Values | Syntax | Description | | ---------- | -------------------------------------------------- | | lit(value) | Literal value such as `lit(123)` or `lit("hello")` | :::{note} value : A type which implement `Literal` ::: ## Boolean Expressions | Syntax | Description | | ------------------- | ----------- | | and(x, y), x.and(y) | Logical AND | | or(x, y), x.or(y) | Logical OR | | !x, not(x), x.not() | Logical NOT | :::{note} `!` is a bitwise or logical complement operator in Rust, but it only works as a logical NOT in expression API. ::: :::{note} Since `&&` and `||` are logical operators in Rust and cannot be overloaded these are not available in the expression API. ::: ## Bitwise Expressions | Syntax | Description | | ------------------------------------------- | ----------- | | x & y, bitwise_and(x, y), x.bitand(y) | AND | | x \| y, bitwise_or(x, y), x.bitor(y) | OR | | x ^ y, bitwise_xor(x, y), x.bitxor(y) | XOR | | x << y, bitwise_shift_left(x, y), x.shl(y) | Left shift | | x >> y, bitwise_shift_right(x, y), x.shr(y) | Right shift | ## Comparison Expressions | Syntax | Description | | ----------- | --------------------- | | x.eq(y) | Equal | | x.not_eq(y) | Not Equal | | x.gt(y) | Greater Than | | x.gt_eq(y) | Greater Than or Equal | | x.lt(y) | Less Than | | x.lt_eq(y) | Less Than or Equal | :::{note} Comparison operators (`<`, `<=`, `==`, `>=`, `>`) could be overloaded by the `PartialOrd` and `PartialEq` trait in Rust, but these operators always return a `bool` which makes them not work with the expression API. ::: ## Arithmetic Expressions | Syntax | Description | | ---------------- | -------------- | | x + y, x.add(y) | Addition | | x - y, x.sub(y) | Subtraction | | x \* y, x.mul(y) | Multiplication | | x / y, x.div(y) | Division | | x % y, x.rem(y) | Remainder | | -x, x.neg() | Negation | ## Math Functions | Syntax | Description | | --------------------- | ------------------------------------------------- | | abs(x) | absolute value | | acos(x) | inverse cosine | | acosh(x) | inverse hyperbolic cosine | | asin(x) | inverse sine | | asinh(x) | inverse hyperbolic sine | | atan(x) | inverse tangent | | atanh(x) | inverse hyperbolic tangent | | atan2(y, x) | inverse tangent of y / x | | cbrt(x) | cube root | | ceil(x) | nearest integer greater than or equal to argument | | cos(x) | cosine | | cosh(x) | hyperbolic cosine | | degrees(x) | converts radians to degrees | | exp(x) | exponential | | factorial(x) | factorial | | floor(x) | nearest integer less than or equal to argument | | gcd(x, y) | greatest common divisor | | isnan(x) | predicate determining whether NaN/-NaN or not | | iszero(x) | predicate determining whether 0.0/-0.0 or not | | lcm(x, y) | least common multiple | | ln(x) | natural logarithm | | log(base, x) | logarithm of x for a particular base | | log10(x) | base 10 logarithm | | log2(x) | base 2 logarithm | | nanvl(x, y) | returns x if x is not NaN otherwise returns y | | pi() | approximate value of π | | power(base, exponent) | base raised to the power of exponent | | radians(x) | converts degrees to radians | | round(x) | round to nearest integer | | signum(x) | sign of the argument (-1, 0, +1) | | sin(x) | sine | | sinh(x) | hyperbolic sine | | sqrt(x) | square root | | tan(x) | tangent | | tanh(x) | hyperbolic tangent | | trunc(x) | truncate toward zero | :::{note} Unlike to some databases the math functions in Datafusion works the same way as Rust math functions, avoiding failing on corner cases e.g. ```sql select log(-1), log(0), sqrt(-1); +----------------+---------------+-----------------+ | log(Int64(-1)) | log(Int64(0)) | sqrt(Int64(-1)) | +----------------+---------------+-----------------+ | NaN | -inf | NaN | +----------------+---------------+-----------------+ ``` ::: ## Conditional Expressions | Syntax | Description | | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | coalesce([value, ...]) | Returns the first of its arguments that is not null. Null is returned only if all arguments are null. It is often used to substitute a default value for null values when data is retrieved for display. | | case(expr)
    .when(expr)
    .end(),
case(expr)
    .when(expr)
    .otherwise(expr) | CASE expression. The expression may chain multiple `when` expressions and end with an `end` or `otherwise` expression. Example:
case(col("a") % lit(3))
    .when(lit(0), lit("A"))
    .when(lit(1), lit("B"))
    .when(lit(2), lit("C"))
    .end()
or, end with `otherwise` to match any other conditions:
case(col("b").gt(lit(100)))
    .when(lit(true), lit("value > 100"))
    .otherwise(lit("value <= 100"))
| | nullif(value1, value2) | Returns a null value if `value1` equals `value2`; otherwise it returns `value1`. This can be used to perform the inverse operation of the `coalesce` expression. | ## String Expressions | Syntax | Description | | ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ascii(character) | Returns a numeric representation of the character (`character`). Example: `ascii('a') -> 97` | | bit_length(text) | Returns the length of the string (`text`) in bits. Example: `bit_length('spider') -> 48` | | btrim(text, characters) | Removes all specified characters (`characters`) from both the beginning and the end of the string (`text`). Example: `btrim('aabchelloccb', 'abc') -> hello` | | char_length(text) | Returns number of characters in the string (`text`). The same as `character_length` and `length`. Example: `character_length('lion') -> 4` | | character_length(text) | Returns number of characters in the string (`text`). The same as `char_length` and `length`. Example: `char_length('lion') -> 4` | | concat(value1, [value2 [, ...]]) | Concatenates the text representations (`value1, [value2 [, ...]]`) of all the arguments. NULL arguments are ignored. Example: `concat('aaa', 'bbc', NULL, 321) -> aaabbc321` | | concat_ws(separator, value1, [value2 [, ...]]) | Concatenates the text representations (`value1, [value2 [, ...]]`) of all the arguments with the separator (`separator`). NULL arguments are ignored. `concat_ws('/', 'path', 'to', NULL, 'my', 'folder', 123) -> path/to/my/folder/123` | | chr(integer) | Returns a character by its numeric representation (`integer`). Example: `chr(90) -> 8` | | initcap | Converts the first letter of each word to upper case and the rest to lower case. Example: `initcap('hi TOM') -> Hi Tom` | | left(text, number) | Returns a certain number (`number`) of first characters (`text`). Example: `left('like', 2) -> li` | | length(text) | Returns number of characters in the string (`text`). The same as `character_length` and `char_length`. Example: `length('lion') -> 4` | | lower(text) | Converts all characters in the string (`text`) into lower case. Example: `lower('HELLO') -> hello` | | lpad(text, length, [, fill]) | Extends the string to length (`length`) by prepending the characters (`fill`) (a space by default). Example: `lpad('bb', 5, 'a') → aaabb` | | ltrim(text, text) | Removes all specified characters (`characters`) from the beginning of the string (`text`). Example: `ltrim('aabchelloccb', 'abc') -> helloccb` | | md5(text) | Computes the MD5 hash of the argument (`text`). | | octet_length(text) | Returns number of bytes in the string (`text`). | | repeat(text, number) | Repeats the string the specified number of times. Example: `repeat('1', 4) -> 1111` | | replace(string, from, to) | Replaces a specified string (`from`) with another specified string (`to`) in the string (`string`). Example: `replace('Hello', 'replace', 'el') -> Hola` | | reverse(text) | Reverses the order of the characters in the string (`text`). Example: `reverse('hello') -> olleh` | | right(text, number) | Returns a certain number (`number`) of last characters (`text`). Example: `right('like', 2) -> ke` | | rpad(text, length, [, fill]) | Extends the string to length (`length`) by prepending the characters (`fill`) (a space by default). Example: `rpad('bb', 5, 'a') → bbaaa` | | rtrim | Removes all specified characters (`characters`) from the end of the string (`text`). Example: `rtrim('aabchelloccb', 'abc') -> aabchello` | | digest(input, algorithm) | Computes the binary hash of `input`, using the `algorithm`. | | split_part(string, delimiter, index) | Splits the string (`string`) based on a delimiter (`delimiter`) and picks out the desired field based on the index (`index`). | | starts_with(string, prefix) | Returns `true` if the string (`string`) starts with the specified prefix (`prefix`). If not, it returns `false`. Example: `starts_with('Hi Tom', 'Hi') -> true` | | strpos | Finds the position from where the `substring` matches the `string` | | substr(string, position, [, length]) | Returns substring from the position (`position`) with length (`length`) characters in the string (`string`). | | translate(string, from, to) | Replaces the characters in `from` with the counterpart in `to`. Example: `translate('abcde', 'acd', '15') -> 1b5e` | | trim(string) | Removes all characters, space by default from the string (`string`) | | upper | Converts all characters in the string into upper case. Example: `upper('hello') -> HELLO` | ## Array Expressions | Syntax | Description | | ---------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | array_any_value(array) | Returns the first non-null element in the array. `array_any_value([NULL, 1, 2, 3]) -> 1` | | array_append(array, element) | Appends an element to the end of an array. `array_append([1, 2, 3], 4) -> [1, 2, 3, 4]` | | array_concat(array[, ..., array_n]) | Concatenates arrays. `array_concat([1, 2, 3], [4, 5, 6]) -> [1, 2, 3, 4, 5, 6]` | | array_has(array, element) | Returns true if the array contains the element `array_has([1,2,3], 1) -> true` | | array_has_all(array, sub-array) | Returns true if all elements of sub-array exist in array `array_has_all([1,2,3], [1,3]) -> true` | | array_has_any(array, sub-array) | Returns true if any elements exist in both arrays `array_has_any([1,2,3], [1,4]) -> true` | | array_dims(array) | Returns an array of the array's dimensions. `array_dims([[1, 2, 3], [4, 5, 6]]) -> [2, 3]` | | array_distinct(array) | Returns distinct values from the array after removing duplicates. `array_distinct([1, 3, 2, 3, 1, 2, 4]) -> [1, 2, 3, 4]` | | array_element(array, index) | Extracts the element with the index n from the array `array_element([1, 2, 3, 4], 3) -> 3` | | empty(array) | Returns true for an empty array or false for a non-empty array. `empty([1]) -> false` | | flatten(array) | Converts an array of arrays to a flat array `flatten([[1], [2, 3], [4, 5, 6]]) -> [1, 2, 3, 4, 5, 6]` | | array_length(array, dimension) | Returns the length of the array dimension. `array_length([1, 2, 3, 4, 5]) -> 5` | | array_ndims(array) | Returns the number of dimensions of the array. `array_ndims([[1, 2, 3], [4, 5, 6]]) -> 2` | | array_pop_front(array) | Returns the array without the first element. `array_pop_front([1, 2, 3]) -> [2, 3]` | | array_pop_back(array) | Returns the array without the last element. `array_pop_back([1, 2, 3]) -> [1, 2]` | | array_position(array, element) | Searches for an element in the array, returns first occurrence. `array_position([1, 2, 2, 3, 4], 2) -> 2` | | array_positions(array, element) | Searches for an element in the array, returns all occurrences. `array_positions([1, 2, 2, 3, 4], 2) -> [2, 3]` | | array_prepend(element, array) | Prepends an element to the beginning of an array. `array_prepend(1, [2, 3, 4]) -> [1, 2, 3, 4]` | | array_repeat(element, count) | Returns an array containing element `count` times. `array_repeat(1, 3) -> [1, 1, 1]` | | array_remove(array, element) | Removes the first element from the array equal to the given value. `array_remove([1, 2, 2, 3, 2, 1, 4], 2) -> [1, 2, 3, 2, 1, 4]` | | array_remove_n(array, element, max) | Removes the first `max` elements from the array equal to the given value. `array_remove_n([1, 2, 2, 3, 2, 1, 4], 2, 2) -> [1, 3, 2, 1, 4]` | | array_remove_all(array, element) | Removes all elements from the array equal to the given value. `array_remove_all([1, 2, 2, 3, 2, 1, 4], 2) -> [1, 3, 1, 4]` | | array_replace(array, from, to) | Replaces the first occurrence of the specified element with another specified element. `array_replace([1, 2, 2, 3, 2, 1, 4], 2, 5) -> [1, 5, 2, 3, 2, 1, 4]` | | array_replace_n(array, from, to, max) | Replaces the first `max` occurrences of the specified element with another specified element. `array_replace_n([1, 2, 2, 3, 2, 1, 4], 2, 5, 2) -> [1, 5, 5, 3, 2, 1, 4]` | | array_replace_all(array, from, to) | Replaces all occurrences of the specified element with another specified element. `array_replace_all([1, 2, 2, 3, 2, 1, 4], 2, 5) -> [1, 5, 5, 3, 5, 1, 4]` | | array_slice(array, begin,end) | Returns a slice of the array. `array_slice([1, 2, 3, 4, 5, 6, 7, 8], 3, 6) -> [3, 4, 5, 6]` | | array_slice(array, begin, end, stride) | Returns a slice of the array with added stride feature. `array_slice([1, 2, 3, 4, 5, 6, 7, 8], 3, 6, 2) -> [3, 5, 6]` | | array_to_string(array, delimiter) | Converts each element to its text representation. `array_to_string([1, 2, 3, 4], ',') -> 1,2,3,4` | | array_intersect(array1, array2) | Returns an array of the elements in the intersection of array1 and array2. `array_intersect([1, 2, 3, 4], [5, 6, 3, 4]) -> [3, 4]` | | array_union(array1, array2) | Returns an array of the elements in the union of array1 and array2 without duplicates. `array_union([1, 2, 3, 4], [5, 6, 3, 4]) -> [1, 2, 3, 4, 5, 6]` | | array_except(array1, array2) | Returns an array of the elements that appear in the first array but not in the second. `array_except([1, 2, 3, 4], [5, 6, 3, 4]) -> [1, 2]` | | array_resize(array, size, value) | Resizes the list to contain size elements. Initializes new elements with value or empty if value is not set. `array_resize([1, 2, 3], 5, 0) -> [1, 2, 3, 0, 0]` | | array_sort(array, desc, null_first) | Returns sorted array. `array_sort([3, 1, 2, 5, 4]) -> [1, 2, 3, 4, 5]` | | cardinality(array/map) | Returns the total number of elements in the array or map. `cardinality([[1, 2, 3], [4, 5, 6]]) -> 6` | | make_array(value1, [value2 [, ...]]) | Returns an Arrow array using the specified input expressions. `make_array(1, 2, 3) -> [1, 2, 3]` | | range(start [, stop, step]) | Returns an Arrow array between start and stop with step. `SELECT range(2, 10, 3) -> [2, 5, 8]` | | string_to_array(array, delimiter, null_string) | Splits a `string` based on a `delimiter` and returns an array of parts. Any parts matching the optional `null_string` will be replaced with `NULL`. `string_to_array('abc#def#ghi', '#', ' ') -> ['abc', 'def', 'ghi']` | | trim_array(array, n) | Deprecated | ## Regular Expressions | Syntax | Description | | -------------- | ----------------------------------------------------------------------------- | | regexp_match | Matches a regular expression against a string and returns matched substrings. | | regexp_replace | Replaces strings that match a regular expression | ## Temporal Expressions | Syntax | Description | | -------------------- | ------------------------------------------------------ | | date_part | Extracts a subfield from the date. | | date_trunc | Truncates the date to a specified level of precision. | | from_unixtime | Returns the unix time in format. | | to_timestamp | Converts a string to a `Timestamp(_, _)` | | to_timestamp_millis | Converts a string to a `Timestamp(Milliseconds, None)` | | to_timestamp_micros | Converts a string to a `Timestamp(Microseconds, None)` | | to_timestamp_seconds | Converts a string to a `Timestamp(Seconds, None)` | | now() | Returns current time. | ## Other Expressions | Syntax | Description | | ---------------------------- | ---------------------------------------------------------------------------------------------------------- | | array([value1, ...]) | Returns an array of fixed size with each argument (`[value1, ...]`) on it. | | in_list(expr, list, negated) | Returns `true` if (`expr`) belongs or not belongs (`negated`) to a list (`list`), otherwise returns false. | | random() | Returns a random value from 0 (inclusive) to 1 (exclusive). | | sha224(text) | Computes the SHA224 hash of the argument (`text`). | | sha256(text) | Computes the SHA256 hash of the argument (`text`). | | sha384(text) | Computes the SHA384 hash of the argument (`text`). | | sha512(text) | Computes the SHA512 hash of the argument (`text`). | | to_hex(integer) | Converts the integer (`integer`) to the corresponding hexadecimal string. | ## Aggregate Functions | Syntax | Description | | ------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | | avg(expr) | Сalculates the average value for `expr`. | | avg_distinct(expr) | Creates an expression to represent the avg(distinct) aggregate function | | approx_distinct(expr) | Calculates an approximate count of the number of distinct values for `expr`. | | approx_median(expr) | Calculates an approximation of the median for `expr`. | | approx_percentile_cont(expr, percentile [, centroids]) | Calculates an approximation of the specified `percentile` for `expr`. Optional `centroids` parameter controls accuracy (default: 100). | | approx_percentile_cont_with_weight(expr, weight_expr, percentile [, centroids]) | Calculates an approximation of the specified `percentile` for `expr` and `weight_expr`. Optional `centroids` parameter controls accuracy (default: 100). | | bit_and(expr) | Computes the bitwise AND of all non-null input values for `expr`. | | bit_or(expr) | Computes the bitwise OR of all non-null input values for `expr`. | | bit_xor(expr) | Computes the bitwise exclusive OR of all non-null input values for `expr`. | | bool_and(expr) | Returns true if all non-null input values (`expr`) are true, otherwise false. | | bool_or(expr) | Returns true if any non-null input value (`expr`) is true, otherwise false. | | count(expr) | Returns the number of rows for `expr`. | | count_distinct(expr) | Creates an expression to represent the count(distinct) aggregate function | | cube(exprs) | Creates a grouping set for all combination of `exprs` | | grouping_set(exprs) | Create a grouping set. | | max(expr) | Finds the maximum value of `expr`. | | median(expr) | Сalculates the median of `expr`. | | min(expr) | Finds the minimum value of `expr`. | | rollup(exprs) | Creates a grouping set for rollup sets. | | sum(expr) | Сalculates the sum of `expr`. | | sum_distinct(expr) | Creates an expression to represent the sum(distinct) aggregate function | ## Aggregate Function Builder You can also use the `ExprFunctionExt` trait to more easily build Aggregate arguments `Expr`. See `datafusion-examples/examples/expr_api.rs` for example usage. | Syntax | Equivalent to | | ----------------------------------------------------------------------- | ----------------------------------- | | first_value_udaf.call(vec![expr]).order_by(vec![expr]).build().unwrap() | first_value(expr, Some(vec![expr])) | ## Subquery Expressions | Syntax | Description | | --------------- | --------------------------------------------------------------------------------------------- | | exists | Creates an `EXISTS` subquery expression | | in_subquery | `df1.filter(in_subquery(col("foo"), df2))?` is the equivalent of the SQL `WHERE foo IN ` | | not_exists | Creates a `NOT EXISTS` subquery expression | | not_in_subquery | Creates a `NOT IN` subquery expression | | scalar_subquery | Creates a scalar subquery expression | ## User-Defined Function Expressions | Syntax | Description | | ----------- | ------------------------------------------------------------------------- | | create_udf | Creates a new UDF with a specific signature and specific return type. | | create_udaf | Creates a new UDAF with a specific signature, state type and return type. |