Scalar Functions (NEW)

This page is a WIP and will replace the Scalar Functions page once completed.

Math Functions

log

Returns the base-x logarithm of a number. Can either provide a specified base, or if omitted then takes the base-10 of a number.

log(base, numeric_expression)
log(numeric_expression)

Arguments

  • base: Base numeric expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

  • numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

Conditional Functions

coalesce

Returns the first of its arguments that is not null. Returns null if all arguments are null. This function is often used to substitute a default value for null values.

coalesce(expression1[, ..., expression_n])

Arguments

  • expression1, expression_n: Expression to use if previous expressions are null. Can be a constant, column, or function, and any combination of arithmetic operators. Pass as many expression arguments as necessary.

String Functions

ascii

Returns the ASCII value of the first character in a string.

ascii(str)

Arguments

  • str: String expression to operate on. Can be a constant, column, or function that evaluates to or can be coerced to a Utf8, LargeUtf8 or a Utf8View.

Related functions:

rpad

Pads the right side of a string with another string to a specified string length.

rpad(str, n[, padding_str])

Arguments

  • str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.

  • n: String length to pad to.

  • padding_str: String expression to pad with. Can be a constant, column, or function, and any combination of string operators. Default is a space.

Related functions:

Binary String Functions

decode

Decode binary data from textual representation in string.

decode(expression, format)

Arguments

  • expression: Expression containing encoded string data

  • format: Same arguments as encode

Related functions:

encode

Encode binary data into a textual representation.

encode(expression, format)

Arguments

  • expression: Expression containing string or binary data

  • format: Supported formats are: base64, hex

Related functions:

Regular Expression Functions

Apache DataFusion uses a PCRE-like regular expression syntax (minus support for several features including look-around and backreferences). The following regular expression functions are supported:

regexp_like

Returns true if a regular expression has at least one match in a string, false otherwise.

regexp_like(str, regexp[, flags])

Arguments

  • str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.

  • regexp: Regular expression to test against the string expression. Can be a constant, column, or function.

  • flags: Optional regular expression flags that control the behavior of the regular expression. The following flags are supported:

    • i: case-insensitive: letters match both upper and lower case

    • m: multi-line mode: ^ and $ match begin/end of line

    • s: allow . to match \n

    • R: enables CRLF mode: when multi-line mode is enabled, \r\n is used

    • U: swap the meaning of x* and x*?

Example

select regexp_like('Köln', '[a-zA-Z]ö[a-zA-Z]{2}');
+--------------------------------------------------------+
| regexp_like(Utf8("Köln"),Utf8("[a-zA-Z]ö[a-zA-Z]{2}")) |
+--------------------------------------------------------+
| true                                                   |
+--------------------------------------------------------+
SELECT regexp_like('aBc', '(b|d)', 'i');
+--------------------------------------------------+
| regexp_like(Utf8("aBc"),Utf8("(b|d)"),Utf8("i")) |
+--------------------------------------------------+
| true                                             |
+--------------------------------------------------+

Additional examples can be found here

Time and Date Functions

to_date

Converts a value to a date (YYYY-MM-DD). Supports strings, integer and double types as input. Strings are parsed as YYYY-MM-DD (e.g. ‘2023-07-20’) if no Chrono formats are provided. Integers and doubles are interpreted as days since the unix epoch (1970-01-01T00:00:00Z). Returns the corresponding date.

Note: to_date returns Date32, which represents its values as the number of days since unix epoch(1970-01-01) stored as signed 32 bit value. The largest supported date value is 9999-12-31.

to_date('2017-05-31', '%Y-%m-%d')

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

  • format_n: Optional Chrono format strings to use to parse the expression. Formats will be tried in the order they appear with the first successful one being returned. If none of the formats successfully parse the expression an error will be returned.

Example

> select to_date('2023-01-31');
+-----------------------------+
| to_date(Utf8("2023-01-31")) |
+-----------------------------+
| 2023-01-31                  |
+-----------------------------+
> select to_date('2023/01/31', '%Y-%m-%d', '%Y/%m/%d');
+---------------------------------------------------------------+
| to_date(Utf8("2023/01/31"),Utf8("%Y-%m-%d"),Utf8("%Y/%m/%d")) |
+---------------------------------------------------------------+
| 2023-01-31                                                    |
+---------------------------------------------------------------+

Additional examples can be found here

Hashing Functions

sha224

Computes the SHA-224 hash of a binary string.

sha224(expression)

Arguments

  • expression: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.