Presto 0.100 Documentation

5.13. 窗口函数

5.13. 窗口函数

窗口函数主要用于在查询结果的所有行之间进行计算。窗口函数运行在HAVING语句之后,但是运行在ORDER BY语句之前。如果想要调用窗口函数,需要使用OVER语句来指定窗口。一个窗口有3个组成部分(这里就不做汉化了,因为我感觉直接看英文更清楚一点):

例如:下面的查询将orders表中的信息按照每个出纳员营业额的大小进行排序:

SELECT orderkey, clerk, totalprice,
       rank() OVER (PARTITION BY clerk
                    ORDER BY totalprice DESC) AS rnk
FROM orders
ORDER BY clerk, rnk

Aggregate Functions

All Aggregate Functions can be used as window functions by adding the OVER clause. The aggregate function is computed for each row over the rows within the current row’s window frame.

For example, the following query produces a rolling sum of order prices by day for each clerk:

SELECT clerk, orderdate, orderkey, totalprice,
       sum(totalprice) OVER (PARTITION BY clerk
                             ORDER BY orderdate) AS rolling_sum
FROM orders
ORDER BY clerk, orderdate, orderkey

排序函数

cume_dist() → bigint

Returns the cumulative distribution of a value in a group of values. The result is the number of rows preceding or peer with the row in the window ordering of the window partition divided by the total number of rows in the window partition. Thus, any tie values in the ordering will evaluate to the same distribution value.

dense_rank() → bigint

Returns the rank of a value in a group of values. This is similar to rank(), except that tie values do not produce gaps in the sequence.

ntile(n) → bigint

Divides the rows for each window partition into n buckets ranging from 1 to at most n. Bucket values will differ by at most 1. If the number of rows in the partition does not divide evenly into the number of buckets, then the remainder values are distributed one per bucket, starting with the first bucket.

For example, with 6 rows and 4 buckets, the bucket values would be as follows: 1 1 2 2 3 4

percent_rank() → bigint

Returns the percentage ranking of a value in group of values. The result is (r - 1) / (n - 1) where r is the rank() of the row and n is the total number of rows in the window partition.

rank() → bigint

Returns the rank of a value in a group of values. The rank is one plus the number of rows preceding the row that are not peer with the row. Thus, tie values in the ordering will produce gaps in the sequence. The ranking is performed for each window partition.

row_number() → bigint

Returns a unique, sequential number for each row, starting with one, according to the ordering of rows within the window partition.

Value Functions

first_value(x) → [same as input]

Returns the first value of the window.

last_value(x) → [same as input]

Returns the last value of the window.

nth_value(x, offset) → [same as input]

Returns the value at the specified offset from beginning the window. Offsets start at 1. The offset can be any scalar expression. If the offset is null or greater than the number of values in the window, null is returned. It is an error for the offset to be zero or negative.

lead(x[, offset[, default_value]]) → [same as input]

Returns the value at offset rows after the current row in the window. Offsets start at 0, which is the current row. The offset can be any scalar expression. The default offset is 1. If the offset is null or larger than the window, the default_value is returned, or if it is not specified null is returned.

lag(x[, offset[, default_value]]) → [same as input]

Returns the value at offset rows before the current row in the window Offsets start at 0, which is the current row. The offset can be any scalar expression. The default offset is 1. If the offset is null or larger than the window, the default_value is returned, or if it is not specified null is returned.