Aggregate Functions¶

Aggregate functions operate on a set of values to compute a single result.

General Functions¶

array_agg
avg
bit_and
bit_or
bit_xor
bool_and
bool_or
count
first_value
grouping
last_value
max
mean
median
min
string_agg
sum
var
var_pop
var_population
var_samp
var_sample

`array_agg`¶

Returns an array created from the expression elements. If ordering is required, elements are inserted in the specified order.

array_agg(expression [ORDER BY expression])

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT array_agg(column_name ORDER BY other_column) FROM table_name;
+-----------------------------------------------+
| array_agg(column_name ORDER BY other_column)  |
+-----------------------------------------------+
| [element1, element2, element3]                |
+-----------------------------------------------+

`avg`¶

Returns the average of numeric values in the specified column.

avg(expression)

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT avg(column_name) FROM table_name;
+---------------------------+
| avg(column_name)           |
+---------------------------+
| 42.75                      |
+---------------------------+

Aliases¶

mean

`bit_and`¶

Computes the bitwise AND of all non-null input values.

bit_and(expression)

Arguments¶

expression: Integer expression to operate on. Can be a constant, column, or function, and any combination of operators.

`bit_or`¶

Computes the bitwise OR of all non-null input values.

bit_or(expression)

Arguments¶

expression: Integer expression to operate on. Can be a constant, column, or function, and any combination of operators.

`bit_xor`¶

Computes the bitwise exclusive OR of all non-null input values.

bit_xor(expression)

Arguments¶

expression: Integer expression to operate on. Can be a constant, column, or function, and any combination of operators.

`bool_and`¶

Returns true if all non-null input values are true, otherwise false.

bool_and(expression)

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT bool_and(column_name) FROM table_name;
+----------------------------+
| bool_and(column_name)       |
+----------------------------+
| true                        |
+----------------------------+

`bool_or`¶

Returns true if all non-null input values are true, otherwise false.

bool_and(expression)

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT bool_and(column_name) FROM table_name;
+----------------------------+
| bool_and(column_name)       |
+----------------------------+
| true                        |
+----------------------------+

`count`¶

Returns the number of non-null values in the specified column. To include null values in the total count, use count(*).

count(expression)

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT count(column_name) FROM table_name;
+-----------------------+
| count(column_name)     |
+-----------------------+
| 100                   |
+-----------------------+

> SELECT count(*) FROM table_name;
+------------------+
| count(*)         |
+------------------+
| 120              |
+------------------+

`first_value`¶

Returns the first element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.

first_value(expression [ORDER BY expression])

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT first_value(column_name ORDER BY other_column) FROM table_name;
+-----------------------------------------------+
| first_value(column_name ORDER BY other_column)|
+-----------------------------------------------+
| first_element                                 |
+-----------------------------------------------+

`grouping`¶

Returns 1 if the data is aggregated across the specified column, or 0 if it is not aggregated in the result set.

grouping(expression)

Arguments¶

expression: Expression to evaluate whether data is aggregated across the specified column. Can be a constant, column, or function.

Example¶

> SELECT column_name, GROUPING(column_name) AS group_column
  FROM table_name
  GROUP BY GROUPING SETS ((column_name), ());
+-------------+-------------+
| column_name | group_column |
+-------------+-------------+
| value1      | 0           |
| value2      | 0           |
| NULL        | 1           |
+-------------+-------------+

`last_value`¶

Returns the first element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.

first_value(expression [ORDER BY expression])

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT first_value(column_name ORDER BY other_column) FROM table_name;
+-----------------------------------------------+
| first_value(column_name ORDER BY other_column)|
+-----------------------------------------------+
| first_element                                 |
+-----------------------------------------------+

`max`¶

Returns the maximum value in the specified column.

max(expression)

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT max(column_name) FROM table_name;
+----------------------+
| max(column_name)      |
+----------------------+
| 150                  |
+----------------------+

`mean`¶

Alias of avg.

`median`¶

Returns the median value in the specified column.

median(expression)

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT median(column_name) FROM table_name;
+----------------------+
| median(column_name)   |
+----------------------+
| 45.5                 |
+----------------------+

`min`¶

Returns the maximum value in the specified column.

max(expression)

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT max(column_name) FROM table_name;
+----------------------+
| max(column_name)      |
+----------------------+
| 150                  |
+----------------------+

`string_agg`¶

Concatenates the values of string expressions and places separator values between them.

string_agg(expression, delimiter)

Arguments¶

expression: The string expression to concatenate. Can be a column or any valid string expression.
delimiter: A literal string used as a separator between the concatenated values.

Example¶

> SELECT string_agg(name, ', ') AS names_list
  FROM employee;
+--------------------------+
| names_list               |
+--------------------------+
| Alice, Bob, Charlie      |
+--------------------------+

`sum`¶

Returns the sum of all values in the specified column.

sum(expression)

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT sum(column_name) FROM table_name;
+-----------------------+
| sum(column_name)       |
+-----------------------+
| 12345                 |
+-----------------------+

`var`¶

Returns the statistical sample variance of a set of numbers.

var(expression)

Arguments¶

expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.

Aliases¶

var_sample
var_samp

`var_pop`¶

Returns the statistical population variance of a set of numbers.

var_pop(expression)

Arguments¶

expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.

Aliases¶

var_population

`var_population`¶

Alias of var_pop.

`var_samp`¶

Alias of var.

`var_sample`¶

Alias of var.

Statistical Functions¶

corr
covar
covar_pop
covar_samp
nth_value
regr_avgx
regr_avgy
regr_count
regr_intercept
regr_r2
regr_slope
regr_sxx
regr_sxy
regr_syy
stddev
stddev_pop
stddev_samp

`corr`¶

Returns the coefficient of correlation between two numeric values.

corr(expression1, expression2)

Arguments¶

expression1: First expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT corr(column1, column2) FROM table_name;
+--------------------------------+
| corr(column1, column2)         |
+--------------------------------+
| 0.85                           |
+--------------------------------+

`covar`¶

Alias of covar_samp.

`covar_pop`¶

Returns the sample covariance of a set of number pairs.

covar_samp(expression1, expression2)

Arguments¶

expression1: First expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT covar_samp(column1, column2) FROM table_name;
+-----------------------------------+
| covar_samp(column1, column2)      |
+-----------------------------------+
| 8.25                              |
+-----------------------------------+

`covar_samp`¶

Returns the sample covariance of a set of number pairs.

covar_samp(expression1, expression2)

Arguments¶

expression1: First expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT covar_samp(column1, column2) FROM table_name;
+-----------------------------------+
| covar_samp(column1, column2)      |
+-----------------------------------+
| 8.25                              |
+-----------------------------------+

Aliases¶

covar

`nth_value`¶

Returns the nth value in a group of values.

nth_value(expression, n ORDER BY expression)

Arguments¶

expression: The column or expression to retrieve the nth value from.
n: The position (nth) of the value to retrieve, based on the ordering.

Example¶

> SELECT dept_id, salary, NTH_VALUE(salary, 2) OVER (PARTITION BY dept_id ORDER BY salary ASC) AS second_salary_by_dept
  FROM employee;
+---------+--------+-------------------------+
| dept_id | salary | second_salary_by_dept   |
+---------+--------+-------------------------+
| 1       | 30000  | NULL                    |
| 1       | 40000  | 40000                   |
| 1       | 50000  | 40000                   |
| 2       | 35000  | NULL                    |
| 2       | 45000  | 45000                   |
+---------+--------+-------------------------+

`regr_avgx`¶

Computes the average of the independent variable (input) expression_x for the non-null paired data points.

regr_avgx(expression_y, expression_x)

Arguments¶

expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

`regr_avgy`¶

Computes the average of the dependent variable (output) expression_y for the non-null paired data points.

regr_avgy(expression_y, expression_x)

Arguments¶

expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

`regr_count`¶

Counts the number of non-null paired data points.

regr_count(expression_y, expression_x)

Arguments¶

expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

`regr_intercept`¶

Computes the y-intercept of the linear regression line. For the equation (y = kx + b), this function returns b.

regr_intercept(expression_y, expression_x)

Arguments¶

expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

`regr_r2`¶

Computes the square of the correlation coefficient between the independent and dependent variables.

regr_r2(expression_y, expression_x)

Arguments¶

expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

`regr_slope`¶

Returns the slope of the linear regression line for non-null pairs in aggregate columns. Given input column Y and X: regr_slope(Y, X) returns the slope (k in Y = k*X + b) using minimal RSS fitting.

regr_slope(expression_y, expression_x)

Arguments¶

expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

`regr_sxx`¶

Computes the sum of squares of the independent variable.

regr_sxx(expression_y, expression_x)

Arguments¶

expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

`regr_sxy`¶

Computes the sum of products of paired data points.

regr_sxy(expression_y, expression_x)

Arguments¶

expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

`regr_syy`¶

Computes the sum of squares of the dependent variable.

regr_syy(expression_y, expression_x)

Arguments¶

expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

`stddev`¶

Returns the standard deviation of a set of numbers.

stddev(expression)

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT stddev(column_name) FROM table_name;
+----------------------+
| stddev(column_name)   |
+----------------------+
| 12.34                |
+----------------------+

Aliases¶

stddev_samp

`stddev_pop`¶

Returns the standard deviation of a set of numbers.

stddev(expression)

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT stddev(column_name) FROM table_name;
+----------------------+
| stddev(column_name)   |
+----------------------+
| 12.34                |
+----------------------+

`stddev_samp`¶

Alias of stddev.

`approx_distinct`¶

Returns the approximate number of distinct input values calculated using the HyperLogLog algorithm.

approx_distinct(expression)

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT approx_distinct(column_name) FROM table_name;
+-----------------------------------+
| approx_distinct(column_name)      |
+-----------------------------------+
| 42                                |
+-----------------------------------+

`approx_median`¶

Returns the approximate median (50th percentile) of input values. It is an alias of approx_percentile_cont(x, 0.5).

approx_median(expression)

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.

Example¶

> SELECT approx_median(column_name) FROM table_name;
+-----------------------------------+
| approx_median(column_name)        |
+-----------------------------------+
| 23.5                              |
+-----------------------------------+

`approx_percentile_cont`¶

Returns the approximate percentile of input values using the t-digest algorithm.

approx_percentile_cont(expression, percentile, centroids)

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).
centroids: Number of centroids to use in the t-digest algorithm. Default is 100. A higher number results in more accurate approximation but requires more memory.

Example¶

> SELECT approx_percentile_cont(column_name, 0.75, 100) FROM table_name;
+-------------------------------------------------+
| approx_percentile_cont(column_name, 0.75, 100)  |
+-------------------------------------------------+
| 65.0                                            |
+-------------------------------------------------+

`approx_percentile_cont_with_weight`¶

Returns the weighted approximate percentile of input values using the t-digest algorithm.

approx_percentile_cont_with_weight(expression, weight, percentile)

Arguments¶

expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
weight: Expression to use as weight. Can be a constant, column, or function, and any combination of arithmetic operators.
percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).

Example¶

> SELECT approx_percentile_cont_with_weight(column_name, weight_column, 0.90) FROM table_name;
+----------------------------------------------------------------------+
| approx_percentile_cont_with_weight(column_name, weight_column, 0.90) |
+----------------------------------------------------------------------+
| 78.5                                                                 |
+----------------------------------------------------------------------+

Operators

Window Functions

Aggregate Functions¶

General Functions¶

array_agg¶

Arguments¶

Example¶

avg¶

Arguments¶

Example¶

Aliases¶

bit_and¶

Arguments¶

bit_or¶

Arguments¶

bit_xor¶

Arguments¶

bool_and¶

Arguments¶

Example¶

bool_or¶

Arguments¶

Example¶

count¶

Arguments¶

Example¶

first_value¶

Arguments¶

Example¶

grouping¶

Arguments¶

Example¶

last_value¶

Arguments¶

Example¶

max¶

Arguments¶

Example¶

mean¶

median¶

Arguments¶

Example¶

min¶

Arguments¶

Example¶

string_agg¶

Arguments¶

Example¶

sum¶

Arguments¶

Example¶

var¶

Arguments¶

Aliases¶

var_pop¶

Arguments¶

Aliases¶

var_population¶

var_samp¶

var_sample¶

Statistical Functions¶

corr¶

Arguments¶

Example¶

covar¶

covar_pop¶

Arguments¶

Example¶

covar_samp¶

Arguments¶

Example¶

Aliases¶

nth_value¶

Arguments¶

Example¶

regr_avgx¶

Arguments¶

regr_avgy¶

Arguments¶

regr_count¶

Arguments¶

regr_intercept¶

`array_agg`¶

`avg`¶

`bit_and`¶

`bit_or`¶

`bit_xor`¶

`bool_and`¶

`bool_or`¶

`count`¶

`first_value`¶

`grouping`¶

`last_value`¶

`max`¶

`mean`¶

`median`¶

`min`¶

`string_agg`¶

`sum`¶

`var`¶

`var_pop`¶

`var_population`¶

`var_samp`¶

`var_sample`¶

`corr`¶

`covar`¶

`covar_pop`¶

`covar_samp`¶

`nth_value`¶

`regr_avgx`¶

`regr_avgy`¶

`regr_count`¶

`regr_intercept`¶

`regr_r2`¶

`regr_slope`¶

`regr_sxx`¶

`regr_sxy`¶

`regr_syy`¶

`stddev`¶

`stddev_pop`¶

`stddev_samp`¶

`approx_distinct`¶

`approx_median`¶

`approx_percentile_cont`¶

`approx_percentile_cont_with_weight`¶