Aggregate Functions¶
Aggregate functions operate on a set of values to compute a single result.
General Functions¶
array_agg
¶
Returns an array created from the expression elements. If ordering is required, elements are inserted in the specified order.
array_agg(expression [ORDER BY expression])
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT array_agg(column_name ORDER BY other_column) FROM table_name;
+-----------------------------------------------+
| array_agg(column_name ORDER BY other_column) |
+-----------------------------------------------+
| [element1, element2, element3] |
+-----------------------------------------------+
avg
¶
Returns the average of numeric values in the specified column.
avg(expression)
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT avg(column_name) FROM table_name;
+---------------------------+
| avg(column_name) |
+---------------------------+
| 42.75 |
+---------------------------+
Aliases¶
mean
bit_and
¶
Computes the bitwise AND of all non-null input values.
bit_and(expression)
Arguments¶
expression: Integer expression to operate on. Can be a constant, column, or function, and any combination of operators.
bit_or
¶
Computes the bitwise OR of all non-null input values.
bit_or(expression)
Arguments¶
expression: Integer expression to operate on. Can be a constant, column, or function, and any combination of operators.
bit_xor
¶
Computes the bitwise exclusive OR of all non-null input values.
bit_xor(expression)
Arguments¶
expression: Integer expression to operate on. Can be a constant, column, or function, and any combination of operators.
bool_and
¶
Returns true if all non-null input values are true, otherwise false.
bool_and(expression)
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT bool_and(column_name) FROM table_name;
+----------------------------+
| bool_and(column_name) |
+----------------------------+
| true |
+----------------------------+
bool_or
¶
Returns true if all non-null input values are true, otherwise false.
bool_and(expression)
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT bool_and(column_name) FROM table_name;
+----------------------------+
| bool_and(column_name) |
+----------------------------+
| true |
+----------------------------+
count
¶
Returns the number of non-null values in the specified column. To include null values in the total count, use count(*)
.
count(expression)
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT count(column_name) FROM table_name;
+-----------------------+
| count(column_name) |
+-----------------------+
| 100 |
+-----------------------+
> SELECT count(*) FROM table_name;
+------------------+
| count(*) |
+------------------+
| 120 |
+------------------+
first_value
¶
Returns the first element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.
first_value(expression [ORDER BY expression])
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT first_value(column_name ORDER BY other_column) FROM table_name;
+-----------------------------------------------+
| first_value(column_name ORDER BY other_column)|
+-----------------------------------------------+
| first_element |
+-----------------------------------------------+
grouping
¶
Returns 1 if the data is aggregated across the specified column, or 0 if it is not aggregated in the result set.
grouping(expression)
Arguments¶
expression: Expression to evaluate whether data is aggregated across the specified column. Can be a constant, column, or function.
Example¶
> SELECT column_name, GROUPING(column_name) AS group_column
FROM table_name
GROUP BY GROUPING SETS ((column_name), ());
+-------------+-------------+
| column_name | group_column |
+-------------+-------------+
| value1 | 0 |
| value2 | 0 |
| NULL | 1 |
+-------------+-------------+
last_value
¶
Returns the last element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.
last_value(expression [ORDER BY expression])
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT last_value(column_name ORDER BY other_column) FROM table_name;
+-----------------------------------------------+
| last_value(column_name ORDER BY other_column) |
+-----------------------------------------------+
| last_element |
+-----------------------------------------------+
max
¶
Returns the maximum value in the specified column.
max(expression)
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT max(column_name) FROM table_name;
+----------------------+
| max(column_name) |
+----------------------+
| 150 |
+----------------------+
mean
¶
Alias of avg.
median
¶
Returns the median value in the specified column.
median(expression)
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT median(column_name) FROM table_name;
+----------------------+
| median(column_name) |
+----------------------+
| 45.5 |
+----------------------+
min
¶
Returns the minimum value in the specified column.
min(expression)
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT min(column_name) FROM table_name;
+----------------------+
| min(column_name) |
+----------------------+
| 12 |
+----------------------+
string_agg
¶
Concatenates the values of string expressions and places separator values between them.
string_agg(expression, delimiter)
Arguments¶
expression: The string expression to concatenate. Can be a column or any valid string expression.
delimiter: A literal string used as a separator between the concatenated values.
Example¶
> SELECT string_agg(name, ', ') AS names_list
FROM employee;
+--------------------------+
| names_list |
+--------------------------+
| Alice, Bob, Charlie |
+--------------------------+
sum
¶
Returns the sum of all values in the specified column.
sum(expression)
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT sum(column_name) FROM table_name;
+-----------------------+
| sum(column_name) |
+-----------------------+
| 12345 |
+-----------------------+
var
¶
Returns the statistical sample variance of a set of numbers.
var(expression)
Arguments¶
expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
Aliases¶
var_sample
var_samp
var_pop
¶
Returns the statistical population variance of a set of numbers.
var_pop(expression)
Arguments¶
expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
Aliases¶
var_population
var_population
¶
Alias of var_pop.
var_samp
¶
Alias of var.
var_sample
¶
Alias of var.
Statistical Functions¶
corr
¶
Returns the coefficient of correlation between two numeric values.
corr(expression1, expression2)
Arguments¶
expression1: First expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT corr(column1, column2) FROM table_name;
+--------------------------------+
| corr(column1, column2) |
+--------------------------------+
| 0.85 |
+--------------------------------+
covar
¶
Alias of covar_samp.
covar_pop
¶
Returns the sample covariance of a set of number pairs.
covar_samp(expression1, expression2)
Arguments¶
expression1: First expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT covar_samp(column1, column2) FROM table_name;
+-----------------------------------+
| covar_samp(column1, column2) |
+-----------------------------------+
| 8.25 |
+-----------------------------------+
covar_samp
¶
Returns the sample covariance of a set of number pairs.
covar_samp(expression1, expression2)
Arguments¶
expression1: First expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT covar_samp(column1, column2) FROM table_name;
+-----------------------------------+
| covar_samp(column1, column2) |
+-----------------------------------+
| 8.25 |
+-----------------------------------+
Aliases¶
covar
nth_value
¶
Returns the nth value in a group of values.
nth_value(expression, n ORDER BY expression)
Arguments¶
expression: The column or expression to retrieve the nth value from.
n: The position (nth) of the value to retrieve, based on the ordering.
Example¶
> SELECT dept_id, salary, NTH_VALUE(salary, 2) OVER (PARTITION BY dept_id ORDER BY salary ASC) AS second_salary_by_dept
FROM employee;
+---------+--------+-------------------------+
| dept_id | salary | second_salary_by_dept |
+---------+--------+-------------------------+
| 1 | 30000 | NULL |
| 1 | 40000 | 40000 |
| 1 | 50000 | 40000 |
| 2 | 35000 | NULL |
| 2 | 45000 | 45000 |
+---------+--------+-------------------------+
regr_avgx
¶
Computes the average of the independent variable (input) expression_x for the non-null paired data points.
regr_avgx(expression_y, expression_x)
Arguments¶
expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
regr_avgy
¶
Computes the average of the dependent variable (output) expression_y for the non-null paired data points.
regr_avgy(expression_y, expression_x)
Arguments¶
expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
regr_count
¶
Counts the number of non-null paired data points.
regr_count(expression_y, expression_x)
Arguments¶
expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
regr_intercept
¶
Computes the y-intercept of the linear regression line. For the equation (y = kx + b), this function returns b.
regr_intercept(expression_y, expression_x)
Arguments¶
expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
regr_r2
¶
Computes the square of the correlation coefficient between the independent and dependent variables.
regr_r2(expression_y, expression_x)
Arguments¶
expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
regr_slope
¶
Returns the slope of the linear regression line for non-null pairs in aggregate columns. Given input column Y and X: regr_slope(Y, X) returns the slope (k in Y = k*X + b) using minimal RSS fitting.
regr_slope(expression_y, expression_x)
Arguments¶
expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
regr_sxx
¶
Computes the sum of squares of the independent variable.
regr_sxx(expression_y, expression_x)
Arguments¶
expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
regr_sxy
¶
Computes the sum of products of paired data points.
regr_sxy(expression_y, expression_x)
Arguments¶
expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
regr_syy
¶
Computes the sum of squares of the dependent variable.
regr_syy(expression_y, expression_x)
Arguments¶
expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
stddev
¶
Returns the standard deviation of a set of numbers.
stddev(expression)
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT stddev(column_name) FROM table_name;
+----------------------+
| stddev(column_name) |
+----------------------+
| 12.34 |
+----------------------+
Aliases¶
stddev_samp
stddev_pop
¶
Returns the population standard deviation of a set of numbers.
stddev_pop(expression)
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT stddev_pop(column_name) FROM table_name;
+--------------------------+
| stddev_pop(column_name) |
+--------------------------+
| 10.56 |
+--------------------------+
stddev_samp
¶
Alias of stddev.
Approximate Functions¶
approx_distinct
¶
Returns the approximate number of distinct input values calculated using the HyperLogLog algorithm.
approx_distinct(expression)
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT approx_distinct(column_name) FROM table_name;
+-----------------------------------+
| approx_distinct(column_name) |
+-----------------------------------+
| 42 |
+-----------------------------------+
approx_median
¶
Returns the approximate median (50th percentile) of input values. It is an alias of approx_percentile_cont(x, 0.5)
.
approx_median(expression)
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example¶
> SELECT approx_median(column_name) FROM table_name;
+-----------------------------------+
| approx_median(column_name) |
+-----------------------------------+
| 23.5 |
+-----------------------------------+
approx_percentile_cont
¶
Returns the approximate percentile of input values using the t-digest algorithm.
approx_percentile_cont(expression, percentile, centroids)
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).
centroids: Number of centroids to use in the t-digest algorithm. Default is 100. A higher number results in more accurate approximation but requires more memory.
Example¶
> SELECT approx_percentile_cont(column_name, 0.75, 100) FROM table_name;
+-------------------------------------------------+
| approx_percentile_cont(column_name, 0.75, 100) |
+-------------------------------------------------+
| 65.0 |
+-------------------------------------------------+
approx_percentile_cont_with_weight
¶
Returns the weighted approximate percentile of input values using the t-digest algorithm.
approx_percentile_cont_with_weight(expression, weight, percentile)
Arguments¶
expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
weight: Expression to use as weight. Can be a constant, column, or function, and any combination of arithmetic operators.
percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).
Example¶
> SELECT approx_percentile_cont_with_weight(column_name, weight_column, 0.90) FROM table_name;
+----------------------------------------------------------------------+
| approx_percentile_cont_with_weight(column_name, weight_column, 0.90) |
+----------------------------------------------------------------------+
| 78.5 |
+----------------------------------------------------------------------+