Aggregate Functions

Aggregate functions operate on a set of values to compute a single result.

General Functions

array_agg

Returns an array created from the expression elements. If ordering is required, elements are inserted in the specified order.

array_agg(expression [ORDER BY expression])

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT array_agg(column_name ORDER BY other_column) FROM table_name;
+-----------------------------------------------+
| array_agg(column_name ORDER BY other_column)  |
+-----------------------------------------------+
| [element1, element2, element3]                |
+-----------------------------------------------+

avg

Returns the average of numeric values in the specified column.

avg(expression)

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT avg(column_name) FROM table_name;
+---------------------------+
| avg(column_name)           |
+---------------------------+
| 42.75                      |
+---------------------------+

Aliases

  • mean

bit_and

Computes the bitwise AND of all non-null input values.

bit_and(expression)

Arguments

  • expression: Integer expression to operate on. Can be a constant, column, or function, and any combination of operators.

bit_or

Computes the bitwise OR of all non-null input values.

bit_or(expression)

Arguments

  • expression: Integer expression to operate on. Can be a constant, column, or function, and any combination of operators.

bit_xor

Computes the bitwise exclusive OR of all non-null input values.

bit_xor(expression)

Arguments

  • expression: Integer expression to operate on. Can be a constant, column, or function, and any combination of operators.

bool_and

Returns true if all non-null input values are true, otherwise false.

bool_and(expression)

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT bool_and(column_name) FROM table_name;
+----------------------------+
| bool_and(column_name)       |
+----------------------------+
| true                        |
+----------------------------+

bool_or

Returns true if all non-null input values are true, otherwise false.

bool_and(expression)

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT bool_and(column_name) FROM table_name;
+----------------------------+
| bool_and(column_name)       |
+----------------------------+
| true                        |
+----------------------------+

count

Returns the number of non-null values in the specified column. To include null values in the total count, use count(*).

count(expression)

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT count(column_name) FROM table_name;
+-----------------------+
| count(column_name)     |
+-----------------------+
| 100                   |
+-----------------------+

> SELECT count(*) FROM table_name;
+------------------+
| count(*)         |
+------------------+
| 120              |
+------------------+

first_value

Returns the first element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.

first_value(expression [ORDER BY expression])

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT first_value(column_name ORDER BY other_column) FROM table_name;
+-----------------------------------------------+
| first_value(column_name ORDER BY other_column)|
+-----------------------------------------------+
| first_element                                 |
+-----------------------------------------------+

grouping

Returns 1 if the data is aggregated across the specified column, or 0 if it is not aggregated in the result set.

grouping(expression)

Arguments

  • expression: Expression to evaluate whether data is aggregated across the specified column. Can be a constant, column, or function.

Example

> SELECT column_name, GROUPING(column_name) AS group_column
  FROM table_name
  GROUP BY GROUPING SETS ((column_name), ());
+-------------+-------------+
| column_name | group_column |
+-------------+-------------+
| value1      | 0           |
| value2      | 0           |
| NULL        | 1           |
+-------------+-------------+

last_value

Returns the last element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.

last_value(expression [ORDER BY expression])

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT last_value(column_name ORDER BY other_column) FROM table_name;
+-----------------------------------------------+
| last_value(column_name ORDER BY other_column) |
+-----------------------------------------------+
| last_element                                  |
+-----------------------------------------------+

max

Returns the maximum value in the specified column.

max(expression)

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT max(column_name) FROM table_name;
+----------------------+
| max(column_name)      |
+----------------------+
| 150                  |
+----------------------+

mean

Alias of avg.

median

Returns the median value in the specified column.

median(expression)

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT median(column_name) FROM table_name;
+----------------------+
| median(column_name)   |
+----------------------+
| 45.5                 |
+----------------------+

min

Returns the minimum value in the specified column.

min(expression)

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT min(column_name) FROM table_name;
+----------------------+
| min(column_name)      |
+----------------------+
| 12                   |
+----------------------+

string_agg

Concatenates the values of string expressions and places separator values between them.

string_agg(expression, delimiter)

Arguments

  • expression: The string expression to concatenate. Can be a column or any valid string expression.

  • delimiter: A literal string used as a separator between the concatenated values.

Example

> SELECT string_agg(name, ', ') AS names_list
  FROM employee;
+--------------------------+
| names_list               |
+--------------------------+
| Alice, Bob, Charlie      |
+--------------------------+

sum

Returns the sum of all values in the specified column.

sum(expression)

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT sum(column_name) FROM table_name;
+-----------------------+
| sum(column_name)       |
+-----------------------+
| 12345                 |
+-----------------------+

var

Returns the statistical sample variance of a set of numbers.

var(expression)

Arguments

  • expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.

Aliases

  • var_sample

  • var_samp

var_pop

Returns the statistical population variance of a set of numbers.

var_pop(expression)

Arguments

  • expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.

Aliases

  • var_population

var_population

Alias of var_pop.

var_samp

Alias of var.

var_sample

Alias of var.

Statistical Functions

corr

Returns the coefficient of correlation between two numeric values.

corr(expression1, expression2)

Arguments

  • expression1: First expression to operate on. Can be a constant, column, or function, and any combination of operators.

  • expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT corr(column1, column2) FROM table_name;
+--------------------------------+
| corr(column1, column2)         |
+--------------------------------+
| 0.85                           |
+--------------------------------+

covar

Alias of covar_samp.

covar_pop

Returns the sample covariance of a set of number pairs.

covar_samp(expression1, expression2)

Arguments

  • expression1: First expression to operate on. Can be a constant, column, or function, and any combination of operators.

  • expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT covar_samp(column1, column2) FROM table_name;
+-----------------------------------+
| covar_samp(column1, column2)      |
+-----------------------------------+
| 8.25                              |
+-----------------------------------+

covar_samp

Returns the sample covariance of a set of number pairs.

covar_samp(expression1, expression2)

Arguments

  • expression1: First expression to operate on. Can be a constant, column, or function, and any combination of operators.

  • expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT covar_samp(column1, column2) FROM table_name;
+-----------------------------------+
| covar_samp(column1, column2)      |
+-----------------------------------+
| 8.25                              |
+-----------------------------------+

Aliases

  • covar

nth_value

Returns the nth value in a group of values.

nth_value(expression, n ORDER BY expression)

Arguments

  • expression: The column or expression to retrieve the nth value from.

  • n: The position (nth) of the value to retrieve, based on the ordering.

Example

> SELECT dept_id, salary, NTH_VALUE(salary, 2) OVER (PARTITION BY dept_id ORDER BY salary ASC) AS second_salary_by_dept
  FROM employee;
+---------+--------+-------------------------+
| dept_id | salary | second_salary_by_dept   |
+---------+--------+-------------------------+
| 1       | 30000  | NULL                    |
| 1       | 40000  | 40000                   |
| 1       | 50000  | 40000                   |
| 2       | 35000  | NULL                    |
| 2       | 45000  | 45000                   |
+---------+--------+-------------------------+

regr_avgx

Computes the average of the independent variable (input) expression_x for the non-null paired data points.

regr_avgx(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

  • expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

regr_avgy

Computes the average of the dependent variable (output) expression_y for the non-null paired data points.

regr_avgy(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

  • expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

regr_count

Counts the number of non-null paired data points.

regr_count(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

  • expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

regr_intercept

Computes the y-intercept of the linear regression line. For the equation (y = kx + b), this function returns b.

regr_intercept(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

  • expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

regr_r2

Computes the square of the correlation coefficient between the independent and dependent variables.

regr_r2(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

  • expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

regr_slope

Returns the slope of the linear regression line for non-null pairs in aggregate columns. Given input column Y and X: regr_slope(Y, X) returns the slope (k in Y = k*X + b) using minimal RSS fitting.

regr_slope(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

  • expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

regr_sxx

Computes the sum of squares of the independent variable.

regr_sxx(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

  • expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

regr_sxy

Computes the sum of products of paired data points.

regr_sxy(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

  • expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

regr_syy

Computes the sum of squares of the dependent variable.

regr_syy(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

  • expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

stddev

Returns the standard deviation of a set of numbers.

stddev(expression)

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT stddev(column_name) FROM table_name;
+----------------------+
| stddev(column_name)   |
+----------------------+
| 12.34                |
+----------------------+

Aliases

  • stddev_samp

stddev_pop

Returns the population standard deviation of a set of numbers.

stddev_pop(expression)

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT stddev_pop(column_name) FROM table_name;
+--------------------------+
| stddev_pop(column_name)   |
+--------------------------+
| 10.56                    |
+--------------------------+

stddev_samp

Alias of stddev.

Approximate Functions

approx_distinct

Returns the approximate number of distinct input values calculated using the HyperLogLog algorithm.

approx_distinct(expression)

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT approx_distinct(column_name) FROM table_name;
+-----------------------------------+
| approx_distinct(column_name)      |
+-----------------------------------+
| 42                                |
+-----------------------------------+

approx_median

Returns the approximate median (50th percentile) of input values. It is an alias of approx_percentile_cont(x, 0.5).

approx_median(expression)

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example

> SELECT approx_median(column_name) FROM table_name;
+-----------------------------------+
| approx_median(column_name)        |
+-----------------------------------+
| 23.5                              |
+-----------------------------------+

approx_percentile_cont

Returns the approximate percentile of input values using the t-digest algorithm.

approx_percentile_cont(expression, percentile, centroids)

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

  • percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).

  • centroids: Number of centroids to use in the t-digest algorithm. Default is 100. A higher number results in more accurate approximation but requires more memory.

Example

> SELECT approx_percentile_cont(column_name, 0.75, 100) FROM table_name;
+-------------------------------------------------+
| approx_percentile_cont(column_name, 0.75, 100)  |
+-------------------------------------------------+
| 65.0                                            |
+-------------------------------------------------+

approx_percentile_cont_with_weight

Returns the weighted approximate percentile of input values using the t-digest algorithm.

approx_percentile_cont_with_weight(expression, weight, percentile)

Arguments

  • expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

  • weight: Expression to use as weight. Can be a constant, column, or function, and any combination of arithmetic operators.

  • percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).

Example

> SELECT approx_percentile_cont_with_weight(column_name, weight_column, 0.90) FROM table_name;
+----------------------------------------------------------------------+
| approx_percentile_cont_with_weight(column_name, weight_column, 0.90) |
+----------------------------------------------------------------------+
| 78.5                                                                 |
+----------------------------------------------------------------------+