Window functions¶
Window functions are functions which perform a computation across a set of rows which are related to the current row. This is comparable to aggregation functions, but window functions do not cause multiple rows to be grouped into a single row.
Table of contents
Window function call¶
Synopsis¶
The synopsis of a window function call is one of the following
function_name ( { * | [ expression [, expression ... ] ] } )
[ FILTER ( WHERE condition ) ]
[ { RESPECT | IGNORE } NULLS ]
over_clause
where function_name
is a name of a general-purpose window or aggregate function and expression
is a column reference, scalar
function or literal.
If FILTER
is specified, then only the rows that met the WHERE condition are supplied to the window function. Only window
functions that are aggregates accept the FILTER
clause.
If IGNORE NULLS
option is specified, then the null values are excluded from
the window function executions. The window functions that support this option
are: lead(arg [, offset [, default] ]), lag(arg [, offset [, default] ]),
first_value(arg), last_value(arg),
and nth_value(arg, number). If a function supports this option and
it is not specified, then RESPECT NULLS
is set by default.
The OVER clause is what declares a function to be a window function.
The window function call that uses a wildcard
instead of an expression
as a function argument is supported only by the count(*)
aggregate
function.
Window definition¶
OVER¶
Synopsis¶
OVER { window_name | ( [ window_definition ] ) }
where window_definition
has the syntax
window_definition:
[ window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ { RANGE | ROWS } BETWEEN frame_start AND frame_end ]
The window_name
refers to window_definition
defined in the
WINDOW clause.
The frame_start
and frame_end
can be one of
UNBOUNDED PRECEDING
offset PRECEDING
CURRENT ROW
offset FOLLOWING
UNBOUNDED FOLLOWING
The default frame definition is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT
ROW
. If frame_end
is omitted it defaults to CURRENT ROW
.
frame_start
cannot be FOLLOWING
or UNBOUNDED FOLLOWING
and
frame_end
cannot be PRECEDING
or UNBOUNDED PRECEDING
.
In RANGE
mode if the frame_start
is CURRENT ROW
the frame starts
with the current row’s first peer (a row that the window’s ORDER BY
expression sorts as equal to the current row), while
a frame_end
of CURRENT ROW
means the frame will end with the current’s
row last peer row.
In ROWS
mode CURRENT_ROW
means the current row.
The offset PRECEDING
and offset FOLLOWING
options vary in meaning
depending on the frame mode. In ROWS
mode, the offset
is an integer
indicating that the frame start or end is offsetted by that many rows before or
after the current row. In RANGE
mode, the use of a custom offset
option
requires that there is exactly one ORDER BY
column in the window
definition. The frame contains those rows whose ordering column value is no
more than offset
minus (for PRECEDING
) or plus (for FOLLOWING
) the
current row’s ordering column value. Because the value of offset
is
subtracted/added to the values of the ordering column, only type combinations
that support addition/subtraction operations are allowed. For instance, when
the ordering column is of type timestamp, the
offset
expression can be an interval.
The OVER clause defines the window
containing the
appropriate rows which will take part in the window function
computation.
An empty OVER clause defines a window
containing
all the rows in the result set.
Example:
cr> SELECT dept_id, COUNT(*) OVER() AS cnt FROM employees ORDER BY 1, 2;
+---------+-----+
| dept_id | cnt |
+---------+-----+
| 4001 | 18 |
| 4001 | 18 |
| 4001 | 18 |
| 4002 | 18 |
| 4002 | 18 |
| 4002 | 18 |
| 4002 | 18 |
| 4003 | 18 |
| 4003 | 18 |
| 4003 | 18 |
| 4003 | 18 |
| 4003 | 18 |
| 4004 | 18 |
| 4004 | 18 |
| 4004 | 18 |
| 4006 | 18 |
| 4006 | 18 |
| 4006 | 18 |
+---------+-----+
SELECT 18 rows in set (... sec)
The PARTITION BY
clause groups the rows within a window into
partitions which are processed separately by the window function, each
partition in turn becoming a window. If PARTITION BY
is not specified, all
the rows are considered a single partition.
Example:
cr> SELECT dept_id, ROW_NUMBER() OVER(PARTITION BY dept_id) AS row_num
... FROM employees ORDER BY 1, 2;
+---------+---------+
| dept_id | row_num |
+---------+---------+
| 4001 | 1 |
| 4001 | 2 |
| 4001 | 3 |
| 4002 | 1 |
| 4002 | 2 |
| 4002 | 3 |
| 4002 | 4 |
| 4003 | 1 |
| 4003 | 2 |
| 4003 | 3 |
| 4003 | 4 |
| 4003 | 5 |
| 4004 | 1 |
| 4004 | 2 |
| 4004 | 3 |
| 4006 | 1 |
| 4006 | 2 |
| 4006 | 3 |
+---------+---------+
SELECT 18 rows in set (... sec)
If ORDER BY
is supplied the window
definition consists of a range of
rows starting with the first row in the partition
and ending with the
current row, plus any subsequent rows that are equal to the current row, which
are the current row’s peers
.
Example:
cr> SELECT
... dept_id,
... sex,
... COUNT(*) OVER(PARTITION BY dept_id ORDER BY sex) AS cnt
... FROM employees
... ORDER BY 1, 2, 3
+---------+-----+-----+
| dept_id | sex | cnt |
+---------+-----+-----+
| 4001 | M | 3 |
| 4001 | M | 3 |
| 4001 | M | 3 |
| 4002 | F | 1 |
| 4002 | M | 4 |
| 4002 | M | 4 |
| 4002 | M | 4 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4004 | F | 1 |
| 4004 | M | 3 |
| 4004 | M | 3 |
| 4006 | F | 1 |
| 4006 | M | 3 |
| 4006 | M | 3 |
+---------+-----+-----+
SELECT 18 rows in set (... sec)
Note
Taking into account the peers
concept mentioned above, for an empty
OVER clause all the rows in the result set are
peers
.
Note
Aggregation functions will be treated as window
functions
when used in conjunction with the OVER
clause.
Note
Window definitions order or partitioned by an array column type are currently not supported.
In the UNBOUNDED FOLLOWING
case the window
for each row starts with
each row and ends with the last row in the current partition
. If the
current row
has peers
the window
will include (or start with) all
the current row
peers and end at the upper bound of the partition
.
Example:
cr> SELECT
... dept_id,
... sex,
... COUNT(*) OVER(
... PARTITION BY dept_id
... ORDER BY
... sex RANGE BETWEEN CURRENT ROW
... AND UNBOUNDED FOLLOWING
... ) partitionByDeptOrderBySex
... FROM employees
... ORDER BY 1, 2, 3
+---------+-----+---------------------------+
| dept_id | sex | partitionbydeptorderbysex |
+---------+-----+---------------------------+
| 4001 | M | 3 |
| 4001 | M | 3 |
| 4001 | M | 3 |
| 4002 | F | 4 |
| 4002 | M | 3 |
| 4002 | M | 3 |
| 4002 | M | 3 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4003 | M | 5 |
| 4004 | F | 3 |
| 4004 | M | 2 |
| 4004 | M | 2 |
| 4006 | F | 3 |
| 4006 | M | 2 |
| 4006 | M | 2 |
+---------+-----+---------------------------+
SELECT 18 rows in set (... sec)
Named windows¶
It is possible to define a list of named window definitions that can be referenced in OVER clauses. To do this, use the WINDOW clause in the SELECT clause.
Named windows are particularly useful when the same window definition could be used in multiple OVER clauses. For instance
cr> SELECT
... x,
... FIRST_VALUE(x) OVER (w) AS "first",
... LAST_VALUE(x) OVER (w) AS "last"
... FROM (VALUES (1), (2), (3), (4)) AS t(x)
... WINDOW w AS (ORDER BY x)
+---+-------+------+
| x | first | last |
+---+-------+------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 1 | 4 |
+---+-------+------+
SELECT 4 rows in set (... sec)
If a window_name
is specified in the window definition of the
OVER clause, then there must be a named window entry
that matches the window_name
in the window definition list of the
WINDOW clause.
If the OVER clause has its own non-empty window definition and references a window definition from the WINDOW clause, then it can only add clauses from the referenced window, but not overwrite them.
cr> SELECT
... x,
... LAST_VALUE(x) OVER (w ORDER BY x) AS y
... FROM (VALUES
... (1, 1),
... (2, 1),
... (3, 2),
... (4, 2) ) AS t(x, y)
... WINDOW w AS (PARTITION BY y)
+---+---+
| x | y |
+---+---+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
+---+---+
SELECT 4 rows in set (... sec)
Otherwise, an attempt to override the clauses of the referenced window by the window definition of the OVER clause will result in failure.
cr> SELECT
... FIRST_VALUE(x) OVER (w ORDER BY x)
... FROM (VALUES(1), (2), (3), (4)) as t(x)
... WINDOW w AS (ORDER BY x)
SQLParseException[Cannot override ORDER BY clause of window w]
It is not possible to define the PARTITION BY
clause in the window
definition of the OVER clause if it references a
window definition from the WINDOW clause.
The window definitions in the WINDOW clause cannot define its own window frames, if they are referenced by non-empty window definitions of the OVER clauses.
The definition of the named window can itself begin with a window_name
. In
this case all the elements of interconnected named windows will be copied to
the window definition of the OVER clause if it
references the named window definition that has subsequent window
references. The window definitions in the WINDOW
clause permits only
backward references.
cr> SELECT
... x,
... ROW_NUMBER() OVER (w) AS y
... FROM (VALUES
... (1, 1),
... (3, 2),
... (2, 1)) AS t (x, y)
... WINDOW p AS (PARTITION BY y),
... w AS (p ORDER BY x)
+---+---+
| x | y |
+---+---+
| 1 | 1 |
| 2 | 2 |
| 3 | 1 |
+---+---+
SELECT 3 rows in set (... sec)
General-purpose window functions¶
row_number()
¶
Returns the number of the current row within its window.
Example:
cr> SELECT
... col1,
... ROW_NUMBER() OVER(ORDER BY col1) as row_num
... FROM (VALUES('x'), ('y'), ('z')) AS t;
+------+---------+
| col1 | row_num |
+------+---------+
| x | 1 |
| y | 2 |
| z | 3 |
+------+---------+
SELECT 3 rows in set (... sec)
first_value(arg)
¶
Returns the argument value evaluated at the first row within the window.
Its return type is the type of its argument.
Example:
cr> SELECT
... col1,
... FIRST_VALUE(col1) OVER (ORDER BY col1) AS value
... FROM (VALUES('x'), ('y'), ('y'), ('z')) AS t;
+------+-------+
| col1 | value |
+------+-------+
| x | x |
| y | x |
| y | x |
| z | x |
+------+-------+
SELECT 4 rows in set (... sec)
last_value(arg)
¶
Returns the argument value evaluated at the last row within the window.
Its return type is the type of its argument.
Example:
cr> SELECT
... col1,
... LAST_VALUE(col1) OVER(ORDER BY col1) AS value
... FROM (VALUES('x'), ('y'), ('y'), ('z')) AS t;
+------+-------+
| col1 | value |
+------+-------+
| x | x |
| y | y |
| y | y |
| z | z |
+------+-------+
SELECT 4 rows in set (... sec)
nth_value(arg, number)
¶
Returns the argument value evaluated at row that is
the nth row within the window. NULL
is returned if the nth row doesn’t
exist in the window.
Its return type is the type of its first argument.
Example:
cr> SELECT
... col1,
... NTH_VALUE(col1, 3) OVER(ORDER BY col1) AS val
... FROM (VALUES ('x'), ('y'), ('y'), ('z')) AS t;
+------+------+
| col1 | val |
+------+------+
| x | NULL |
| y | y |
| y | y |
| z | y |
+------+------+
SELECT 4 rows in set (... sec)
lag(arg [, offset [, default] ])
¶
Synopsis¶
lag(argument any [, offset integer [, default any]])
Returns the argument value evaluated at the row that
precedes the current row by the offset within the partition. If there is no
such row, the return value is default
. If offset
or default
arguments are missing, they default to 1
and null
, respectively.
Both offset
and default
are evaluated with respect to the current row.
If offset
is 0
, then argument value is evaluated for the current row.
The default
and argument
data types must match.
Example:
cr> SELECT
... dept_id,
... year,
... budget,
... LAG(budget) OVER(
... PARTITION BY dept_id) prev_budget
... FROM (VALUES
... (1, 2017, 45000),
... (1, 2018, 35000),
... (2, 2017, 15000),
... (2, 2018, 65000),
... (2, 2019, 12000))
... as t (dept_id, year, budget);
+---------+------+--------+-------------+
| dept_id | year | budget | prev_budget |
+---------+------+--------+-------------+
| 1 | 2017 | 45000 | NULL |
| 1 | 2018 | 35000 | 45000 |
| 2 | 2017 | 15000 | NULL |
| 2 | 2018 | 65000 | 15000 |
| 2 | 2019 | 12000 | 65000 |
+---------+------+--------+-------------+
SELECT 5 rows in set (... sec)
lead(arg [, offset [, default] ])
¶
Synopsis¶
lead(argument any [, offset integer [, default any]])
The lead
function is the counterpart of the lag window function as it allows the evaluation
of the argument at rows that follow the current row. lead
returns the
argument value evaluated at the row that follows the current row by the offset
within the partition. If there is no such row, the return value is default
.
If offset
or default
arguments are missing, they default to 1
or
null
, respectively.
Both offset
and default
are evaluated with respect to the current row.
If offset
is 0
, then argument value is evaluated for the current row.
The default
and argument
data types must match.
Example:
cr> SELECT
... dept_id,
... year,
... budget,
... LEAD(budget) OVER(
... PARTITION BY dept_id) next_budget
... FROM (VALUES
... (1, 2017, 45000),
... (1, 2018, 35000),
... (2, 2017, 15000),
... (2, 2018, 65000),
... (2, 2019, 12000))
... as t (dept_id, year, budget);
+---------+------+--------+-------------+
| dept_id | year | budget | next_budget |
+---------+------+--------+-------------+
| 1 | 2017 | 45000 | 35000 |
| 1 | 2018 | 35000 | NULL |
| 2 | 2017 | 15000 | 65000 |
| 2 | 2018 | 65000 | 12000 |
| 2 | 2019 | 12000 | NULL |
+---------+------+--------+-------------+
SELECT 5 rows in set (... sec)
rank()
¶
Synopsis¶
rank()
Returns the rank of every row within a partition of a result set.
Within each partition, the rank of the first row is 1
. Subsequent tied
rows are given the same rank, and the potential rank of the next row
is incremented. Because of this, ranks may not be sequential.
Example:
cr> SELECT
... name,
... department_id,
... salary,
... RANK() OVER (ORDER BY salary desc) as salary_rank
... FROM (VALUES
... ('Bobson Dugnutt', 1, 2000),
... ('Todd Bonzalez', 2, 2500),
... ('Jess Brewer', 1, 2500),
... ('Safwan Buchanan', 1, 1900),
... ('Hal Dodd', 1, 2500),
... ('Gillian Hawes', 2, 2000))
... as t (name, department_id, salary);
+-----------------+---------------+--------+-------------+
| name | department_id | salary | salary_rank |
+-----------------+---------------+--------+-------------+
| Todd Bonzalez | 2 | 2500 | 1 |
| Jess Brewer | 1 | 2500 | 1 |
| Hal Dodd | 1 | 2500 | 1 |
| Bobson Dugnutt | 1 | 2000 | 4 |
| Gillian Hawes | 2 | 2000 | 4 |
| Safwan Buchanan | 1 | 1900 | 6 |
+-----------------+---------------+--------+-------------+
SELECT 6 rows in set (... sec)
dense_rank()
¶
Synopsis¶
dense_rank()
Returns the rank of every row within a partition of a result set, similar to
rank
. However, unlike rank
, dense_rank
always returns sequential
rank values.
Within each partition, the rank of the first row is 1
. Subsequent tied
rows are given the same rank.
Example:
cr> SELECT
... name,
... department_id,
... salary,
... DENSE_RANK() OVER (ORDER BY salary desc) as salary_rank
... FROM (VALUES
... ('Bobson Dugnutt', 1, 2000),
... ('Todd Bonzalez', 2, 2500),
... ('Jess Brewer', 1, 2500),
... ('Safwan Buchanan', 1, 1900),
... ('Hal Dodd', 1, 2500),
... ('Gillian Hawes', 2, 2000))
... as t (name, department_id, salary);
+-----------------+---------------+--------+-------------+
| name | department_id | salary | salary_rank |
+-----------------+---------------+--------+-------------+
| Todd Bonzalez | 2 | 2500 | 1 |
| Jess Brewer | 1 | 2500 | 1 |
| Hal Dodd | 1 | 2500 | 1 |
| Bobson Dugnutt | 1 | 2000 | 2 |
| Gillian Hawes | 2 | 2000 | 2 |
| Safwan Buchanan | 1 | 1900 | 3 |
+-----------------+---------------+--------+-------------+
SELECT 6 rows in set (... sec)