Window functions
Window functions add another tool to our data analysis tool belt, enabling us to perform aggregations while retaining the original data in the result set: so, they are like aggregation functions, but more flexible in that we can also return columns unrelated to the aggregation!
Let’s jump straight into a practical example, where we want to see the sum of all invoice amounts next to the other columns of our (simplified) invoices
table shown below, on the left side:
As you can see, the resultset on the right contains the very same columns and values as the original, plus one new column that always contains the same value: the sum of all invoice amounts.
A query to produce such a result would look like this:
SELECT
id,
country,
total,
sum(total) OVER ()
FROM learnsql.invoices;
Note: To keep examples as simple as possible, we’re now back to the initial database with a single table!
This query looks remarkably similar to the aggregation queries that we have seen previously, but there is an important difference: instead of the GROUP BY
clause, there is an OVER ()
clause next to the aggregation function, which is our windowing clause and turns our function into a window function.
OVER ()
is the simplest form of windowing clause and essentially means: apply the aggregation over all rows in the table. You can think of this query as the equivalent of running the following two queries and then combining their results:
SELECT id, country, total FROM learnsql.invoices
SELECT sum(total) from learnsql.invoices
Partitioning
Let’s now say that we want to get the sum of invoice amounts per country, which would produce a result like this:
The query would now look as follows:
SELECT
id,
country,
total,
sum(total) OVER (PARTITION BY country)
FROM learnsql.invoices;
PARTITION BY
brings grouping to window functions and the amount we now get is the total per country rather than the overall total.
Retaining access to the unaggregated data also enables us to see how it compares to aggregated data. For example, this query compares the total amount of each invoice with the overall amount per country, returning a value that indicates how much the invoice contributes, in percentage, to the total:
SELECT
id,
total,
sum(total) OVER (PARTITION BY country) AS total_by_country,
round(total / sum(total) OVER (PARTITION BY country), 3) * 100 AS percentage
FROM learnsql.invoices
ORDER BY 4 DESC;
We calculate the percentage by dividing the row total
by the country total (= sum(total) OVER (PARTITION BY country)
), then rounding the result to 3 decimal digits (for readability), and finally multiplying it by 100 to make it a proper percentage.
Ranking
Window functions can also help us assign a rank to our query results, based on their values. For example, the following query ranks business customers so that, in each country, the customer with the highest invoice will be ranked first, the one with the second highest invoice will be ranked second, and so on:
SELECT
id,
business_name,
country,
total,
dense_rank() OVER (PARTITION BY country ORDER BY total DESC) AS ranking
FROM learnsql.invoices
WHERE business_name IS NOT NULL
ORDER BY 3, 5;
This time we’re using the dense_rank
function and, in addition to partitioning by country, we’re sorting by total
in descending order within each group, to have the highest value first.
References:
Practice
Are you ready to practice what you’ve just learned? Head over to the Notebook and try to solve the exercises you find there!
Note: the solution to each exercise is in the collapsed block right below it: you can expand it by clicking the arrow icon that appears on the far left when you hover it.