Aggregate Functions

Aggregate functions operate on a set of values to compute a single result.

General

avg

Returns the average of numeric values in the specified column.

avg(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

Aliases

  • mean

bit_and

Computes the bitwise AND of all non-null input values.

bit_and(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

bit_or

Computes the bitwise OR of all non-null input values.

bit_or(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

bit_xor

Computes the bitwise exclusive OR of all non-null input values.

bit_xor(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

bool_and

Returns true if all non-null input values are true, otherwise false.

bool_and(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

bool_or

Returns true if any non-null input value is true, otherwise false.

bool_or(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

count

Returns the number of non-null values in the specified column.

To include null values in the total count, use count(*).

count(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

max

Returns the maximum value in the specified column.

max(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

mean

Alias of avg.

median

Returns the median value in the specified column.

median(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

min

Returns the minimum value in the specified column.

min(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

sum

Returns the sum of all values in the specified column.

sum(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

array_agg

Returns an array created from the expression elements. If ordering requirement is given, elements are inserted in the order of required ordering.

array_agg(expression [ORDER BY expression])

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

first_value

Returns the first element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.

first_value(expression [ORDER BY expression])

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

last_value

Returns the last element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.

last_value(expression [ORDER BY expression])

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

Statistical

corr

Returns the coefficient of correlation between two numeric values.

corr(expression1, expression2)

Arguments

  • expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

  • expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

covar

Returns the covariance of a set of number pairs.

covar(expression1, expression2)

Arguments

  • expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

  • expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

covar_pop

Returns the population covariance of a set of number pairs.

covar_pop(expression1, expression2)

Arguments

  • expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

  • expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

covar_samp

Returns the sample covariance of a set of number pairs.

covar_samp(expression1, expression2)

Arguments

  • expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

  • expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

stddev

Returns the standard deviation of a set of numbers.

stddev(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

stddev_pop

Returns the population standard deviation of a set of numbers.

stddev_pop(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

stddev_samp

Returns the sample standard deviation of a set of numbers.

stddev_samp(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

var

Returns the statistical variance of a set of numbers.

var(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

var_pop

Returns the statistical population variance of a set of numbers.

var_pop(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

var_samp

Returns the statistical sample variance of a set of numbers.

var_samp(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_slope

Returns the slope of the linear regression line for non-null pairs in aggregate columns. Given input column Y and X: regr_slope(Y, X) returns the slope (k in Y = k*X + b) using minimal RSS fitting.

regr_slope(expression1, expression2)

Arguments

  • expression_y: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

  • expression_x: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_avgx

Computes the average of the independent variable (input) expression_x for the non-null paired data points.

regr_avgx(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_avgy

Computes the average of the dependent variable (output) expression_y for the non-null paired data points.

regr_avgy(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_count

Counts the number of non-null paired data points.

regr_count(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_intercept

Computes the y-intercept of the linear regression line. For the equation (y = kx + b), this function returns b.

regr_intercept(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_r2

Computes the square of the correlation coefficient between the independent and dependent variables.

regr_r2(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_sxx

Computes the sum of squares of the independent variable.

regr_sxx(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_syy

Computes the sum of squares of the dependent variable.

regr_syy(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

regr_sxy

Computes the sum of products of paired data points.

regr_sxy(expression_y, expression_x)

Arguments

  • expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

  • expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.

Approximate

approx_distinct

Returns the approximate number of distinct input values calculated using the HyperLogLog algorithm.

approx_distinct(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

approx_median

Returns the approximate median (50th percentile) of input values. It is an alias of approx_percentile_cont(x, 0.5).

approx_median(expression)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

approx_percentile_cont

Returns the approximate percentile of input values using the t-digest algorithm.

approx_percentile_cont(expression, percentile, centroids)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

  • percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).

  • centroids: Number of centroids to use in the t-digest algorithm. Default is 100.

    If there are this number or fewer unique values, you can expect an exact result. A higher number of centroids results in a more accurate approximation, but requires more memory to compute.

approx_percentile_cont_with_weight

Returns the weighted approximate percentile of input values using the t-digest algorithm.

approx_percentile_cont_with_weight(expression, weight, percentile)

Arguments

  • expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

  • weight: Expression to use as weight. Can be a constant, column, or function, and any combination of arithmetic operators.

  • percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).