Scalar Functions (NEW)¶
This page is a WIP and will replace the Scalar Functions page once completed.
Math Functions¶
log
¶
Returns the base-x logarithm of a number. Can either provide a specified base, or if omitted then takes the base-10 of a number.
log(base, numeric_expression)
log(numeric_expression)
Arguments¶
base: Base numeric expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
numeric_expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
Conditional Functions¶
coalesce
¶
Returns the first of its arguments that is not null. Returns null if all arguments are null. This function is often used to substitute a default value for null values.
coalesce(expression1[, ..., expression_n])
Arguments¶
expression1, expression_n: Expression to use if previous expressions are null. Can be a constant, column, or function, and any combination of arithmetic operators. Pass as many expression arguments as necessary.
String Functions¶
ascii
¶
Returns the ASCII value of the first character in a string.
ascii(str)
Arguments¶
str: String expression to operate on. Can be a constant, column, or function that evaluates to or can be coerced to a Utf8, LargeUtf8 or a Utf8View.
Related functions:
rpad
¶
Pads the right side of a string with another string to a specified string length.
rpad(str, n[, padding_str])
Arguments¶
str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
n: String length to pad to.
padding_str: String expression to pad with. Can be a constant, column, or function, and any combination of string operators. Default is a space.
Related functions:
Binary String Functions¶
decode
¶
Decode binary data from textual representation in string.
decode(expression, format)
Arguments¶
expression: Expression containing encoded string data
format: Same arguments as encode
Related functions:
encode
¶
Encode binary data into a textual representation.
encode(expression, format)
Arguments¶
expression: Expression containing string or binary data
format: Supported formats are:
base64
,hex
Related functions:
Regular Expression Functions¶
Apache DataFusion uses a PCRE-like regular expression syntax (minus support for several features including look-around and backreferences). The following regular expression functions are supported:
regexp_like
¶
Returns true if a regular expression has at least one match in a string, false otherwise.
regexp_like(str, regexp[, flags])
Arguments¶
str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
regexp: Regular expression to test against the string expression. Can be a constant, column, or function.
flags: Optional regular expression flags that control the behavior of the regular expression. The following flags are supported:
i: case-insensitive: letters match both upper and lower case
m: multi-line mode: ^ and $ match begin/end of line
s: allow . to match \n
R: enables CRLF mode: when multi-line mode is enabled, \r\n is used
U: swap the meaning of x* and x*?
Example¶
select regexp_like('Köln', '[a-zA-Z]ö[a-zA-Z]{2}');
+--------------------------------------------------------+
| regexp_like(Utf8("Köln"),Utf8("[a-zA-Z]ö[a-zA-Z]{2}")) |
+--------------------------------------------------------+
| true |
+--------------------------------------------------------+
SELECT regexp_like('aBc', '(b|d)', 'i');
+--------------------------------------------------+
| regexp_like(Utf8("aBc"),Utf8("(b|d)"),Utf8("i")) |
+--------------------------------------------------+
| true |
+--------------------------------------------------+
Additional examples can be found here
Time and Date Functions¶
to_date
¶
Converts a value to a date (YYYY-MM-DD
).
Supports strings, integer and double types as input.
Strings are parsed as YYYY-MM-DD (e.g. ‘2023-07-20’) if no Chrono formats are provided.
Integers and doubles are interpreted as days since the unix epoch (1970-01-01T00:00:00Z
).
Returns the corresponding date.
Note: to_date
returns Date32, which represents its values as the number of days since unix epoch(1970-01-01
) stored as signed 32 bit value. The largest supported date value is 9999-12-31
.
to_date('2017-05-31', '%Y-%m-%d')
Arguments¶
expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
format_n: Optional Chrono format strings to use to parse the expression. Formats will be tried in the order they appear with the first successful one being returned. If none of the formats successfully parse the expression an error will be returned.
Example¶
> select to_date('2023-01-31');
+-----------------------------+
| to_date(Utf8("2023-01-31")) |
+-----------------------------+
| 2023-01-31 |
+-----------------------------+
> select to_date('2023/01/31', '%Y-%m-%d', '%Y/%m/%d');
+---------------------------------------------------------------+
| to_date(Utf8("2023/01/31"),Utf8("%Y-%m-%d"),Utf8("%Y/%m/%d")) |
+---------------------------------------------------------------+
| 2023-01-31 |
+---------------------------------------------------------------+
Additional examples can be found here
Hashing Functions¶
sha224
¶
Computes the SHA-224 hash of a binary string.
sha224(expression)
Arguments¶
expression: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.