Window functions perform aggregation operations on a set of query rows. However, aggregation operations group query rows into a single result row, whereas window functions produce results for each query row. In this lesson, we will see what windows functions in SQL are and the most common windows functions.
Windows functions are built-in functions that allow you to perform calculations across rows that are related to the current row. These functions are commonly used in analytical queries and data processing operations to perform various aggregations, ranking, and grouping functions on data sets.
In general, a Windows function involves defining a window or subset of rows within the dataframe or group and applying a function to that window. The syntax usually involves specifying the window using a set of conditions or criteria, such as the range of rows or the partition key, and then specifying the function to apply.
The syntax of a Windows Function in PostgreSQL is as follows:
<window function>([argument1 [, argument2, ...]])
OVER ([PARTITION BYpartition_expression, ... ]
[ORDER BY sort_expression [ASC | DESC], ... ]
[frame_clause] )
Explanation:
The OVER clause is required for a window function to operate as a window function. It defines the window specification, which includes the data's partitioning, ordering, and framing.
Example:
Suppose we have a table of sales that contains the following data:
region | month | sales |
---|---|---|
East | Jan | 100 |
East | Feb | 200 |
East | Mar | 300 |
West | Jan | 50 |
West | Feb | 75 |
West | Mar | 100 |
We can use the AVG window function to calculate the average sales for each region and show the result alongside each row of the table. Here's the SQL query to do that:
SELECT region, month, sales,
AVG(sales) OVER (PARTITION BY region ORDER BY month) AS avg_sales
FROM sales;
The result of the query would be:
region | month | sales | avg_sales |
---|---|---|---|
East | Jan | 100 | 100 |
East | Feb | 200 | 150 |
East | Mar | 300 | 200 |
West | Jan | 50 | 50 |
West | Feb | 75 | 62.5 |
West | Mar | 100 | 75 |
In this example, we're using the AVG window function to calculate the average sales for each region. The OVER clause is used to specify the partitioning of the data by region and the ordering of the data by month. The AVG(sales) function is the window function that is applied to each partition, and it calculates the average sales for the rows within that partition. The result is a new column, avg_sales, that shows the average sales for each region in each month.
Before discussing windows functions, we will first define a table to understand how we use windows functions. Suppose we have a table student that contains the following data:
id | name | grade |
---|---|---|
1 | Alice | 80 |
2 | Bob | 90 |
3 | Charlie | 85 |
4 | Dave | 95 |
5 | Eve | 75 |
6 | Frank | 85 |
Here are some of the commonly used Windows functions in SQL:
The ROW_NUMBER() window function assigns a unique sequential integer to each row within a partition of a result set. It is often used to generate a unique identifier for each row or to rank the rows based on a specific order.
Here's an example of using the ROW_NUMBER() window function in PostgreSQL:
We can use the ROW_NUMBER() window function to assign a unique number to each row based on the grade in descending order. Here's the SQL query to do that:
SELECT id, name, grade,
ROW_NUMBER() OVER (ORDER BY grade DESC) as row_num
FROM students;
The result of the query would be:
id | name | grade | row_num |
---|---|---|---|
4 | Dave | 95 | 1 |
2 | Bob | 90 | 2 |
3 | Charlie | 85 | 3 |
6 | Frank | 85 | 4 |
1 | Alice | 80 | 5 |
5 | Eve | 75 | 6 |
In this example, we're using the ROW_NUMBER() window function to assign a unique sequential number to each row based on the grade in descending order. The OVER clause is used to specify the ordering of the data by grade. The result is a new column, row_num, that shows the ranking of each student based on their grade.
The RANK() window function is used to assign a rank to each row within a partition of a result set based on the values in one or more columns. It is similar to the ROW_NUMBER() function but can result in tied rankings.
Here's an example of using the RANK() window function in PostgreSQL:
We can use the RANK() window function to assign a rank to each row based on the grade in descending order. Here's the SQL query to do that:
SELECT id, name, grade,
RANK() OVER (ORDER BY grade DESC) as rank
FROM students;
The result of the query would be:
id | name | grade | rank |
---|---|---|---|
4 | Dave | 95 | 1 |
2 | Bob | 90 | 2 |
3 | Charlie | 85 | 3 |
6 | Frank | 85 | 3 |
1 | Alice | 80 | 5 |
5 | Eve | 75 | 6 |
In this example, we're using the RANK() window function to assign a rank to each row based on the grade in descending order. The OVER clause is used to specify the ordering of the data by grade. The result is a new column rank that shows the rank of each student based on their grade. Note that Frank and Charlie have the same grade of 85, so they have been assigned the same rank of 3, resulting in Alice's rank being 5 instead of 4.
The DENSE_RANK() window function assigns a rank to each row within a partition of a result set. It is similar to the ROW_NUMBER() function but does not leave gaps in the ranking when there are ties. If there are ties in the ranking, the next rank is assigned based on the number of tied rows.
Here's an example of using the DENSE_RANK():
SELECT id, name, grade,
DENSE_RANK() OVER (ORDER BY grade DESC) as dense_rank
FROM students;
We can use the DENSE_RANK() window function to assign a rank to each row based on the grade in descending order. Here's the SQL query to do that:
The result of the query would be:
id | name | grade | dense_rank |
---|---|---|---|
4 | Dave | 95 | 1 |
2 | Bob | 90 | 2 |
3 | Charlie | 85 | 3 |
6 | Frank | 85 | 3 |
1 | Alice | 80 | 4 |
5 | Eve | 75 | 5 |
In this example, we're using the DENSE_RANK() window function to assign a rank to each row based on the grade in descending order. The OVER clause is used to specify the ordering of the data by grade. The result is a new column, dense_rank, that shows the dense rank of each student based on their grade. Charlie and Frank have the same grade, so they both get rank 3. The next rank is 4, not 5, as it would be with ROW_NUMBER().
The NTILE() function is a window function in PostgreSQL that is used to divide a result set into a specified number of groups, or "buckets", based on a specified expression. It assigns a bucket number to each row within a partition of a result set.
Here's an example of using the NTILE() window function :
SELECT id, name, grade,
NTILE(3) OVER (ORDER BY grade DESC) as bucket_num
FROM students;
We want to divide the students into three buckets based on their grades. We can use the NTILE() function to assign each student to one of the three buckets. Here's the SQL query to do that:
The result of the query would be:
id | name | grade | bucket_num |
---|---|---|---|
4 | Dave | 95 | 1 |
2 | Bob | 90 | 1 |
3 | Charlie | 85 | 2 |
6 | Frank | 85 | 2 |
1 | Alice | 80 | 3 |
5 | Eve | 75 | 3 |
n this example, we're using the NTILE() window function to divide the students into three buckets based on their grade. The OVER clause is used to specify the ordering of the data by grade in descending order. The result is a new column, bucket_num, that shows the bucket number assigned to each student.
The LAG() function is a window function in PostgreSQL that enables access to a previous row in a result set. It can be used to compare the values of the current row with the previous row or to calculate the difference between two consecutive rows.
Here's an example of using the LAG() window function:
We can use the LAG() window function to calculate the difference in grade between each student and the previous student based on their ordering by ID. Here's the SQL query to do that:
SELECT id, name, grade,
grade - LAG(grade, 1, 0) OVER (ORDER BY id) as grade_diff
FROM students;
The result of the query would be:
id | name | grade | grade_diff |
---|---|---|---|
1 | Alice | 80 | 0 |
2 | Bob | 90 | 10 |
3 | Charlie | 85 | -5 |
4 | Dave | 95 | 10 |
5 | Eve | 75 | -20 |
6 | Frank | 85 | 10 |
In this example, we're using the LAG() window function to calculate the difference in grade between each student and the previous student based on their ordering by ID. The OVER a clause is used to specify the ordering of the data by ID. The result is a new column grade_diff that shows the difference in grade between each student and the previous student, with a default value of 0 for the first row.
The LEAD() function is a window function that allows you to access the value of a subsequent row in the same result set. It is often used to compare the current row with the next row or to calculate the change or difference between consecutive rows.
Here's an example of using the LEAD() function:
Suppose we calculate the difference in grades between each student and the next student on the list. We can use the LEAD() function to retrieve the grade of the next student and then subtract it from the current student's grade. Here's the SQL query to do that:
SELECT name, grade, LEAD(grade) OVER (ORDER BY grade DESC) - grade as grade_diff
FROM students;
The result of the query would be:
name | grade | grade_diff |
---|---|---|
Dave | 95 | 5 |
Bob | 90 | 5 |
Charlie | 85 | 0 |
Frank | 85 | 5 |
Alice | 80 | 5 |
Eve | 75 | null |
In this example, we're using the LEAD() function to retrieve the grade of the next student in descending order. The OVER clause is used to specify the ordering of the data by grade. The result is a new column, grade_diff, that shows the difference in grades between each student and the next student in the list. Note that the last row has a null value for grade_diff because there is no subsequent row to compare it.
Aggregate functions that compute a single result for a group of rows within a partition.
We can use the window functions to calculate various student grade metrics. Here's an example SQL query that calculates the sum, average, minimum, maximum, and count of grades for each row:
SELECT id, name, grade,
SUM(grade) OVER () AS sum_grades,
AVG(grade) OVER () AS avg_grades,
MIN(grade) OVER () AS min_grade,
MAX(grade) OVER () AS max_grade,
COUNT(grade) OVER () AS count_grades
FROM students;
The result of the query would be:
id | name | grade | sum_grades | avg_grades | min_grade | max_grade | count_grades |
---|---|---|---|---|---|---|---|
1 | Alice | 80 | 510 | 85 | 75 | 95 | 6 |
2 | Bob | 90 | 510 | 85 | 75 | 95 | 6 |
3 | Charlie | 85 | 510 | 85 | 75 | 95 | 6 |
4 | Dave | 95 | 510 | 85 | 75 | 95 | 6 |
5 | Eve | 75 | 510 | 85 | 75 | 95 | 6 |
6 | Frank | 85 | 510 | 85 | 75 | 95 | 6 |
In this example, we're using various window functions to calculate different metrics for the students' grades. The OVER clause is used to specify that these calculations should be performed over the entire result set (i.e. no partitioning is necessary). The result is a new set of columns sum_grades, avg_grades, min_grade, max_grade, and count_grades, which show the sum, average, minimum, maximum, and count of grades, respectively, for each row.
Window functions operate over a window of rows defined by the OVER() clause.
Here are the primary types of window functions, along with a list of common window functions under each type:
These functions assign a rank to each row within a partition.
These functions return a scalar value calculated from values in a set of rows. When used as window functions, they calculate values across a defined window.
These functions return values relative to the current row in the window.
These functions provide access to a row at a specific offset from the current row.
These functions calculate values based on a set of rows relative to the current row.
These functions perform numerical and statistical analysis on a set of rows.
These are standard SQL functions with windowed versions that apply over a specific window of rows.
Advanced windowing syntax in SQL includes powerful features such as frame specifications, the EXCLUDE clause, the FILTER clause, and window chaining. These features offer greater control over how window functions are applied, especially when working with complex analytical queries. Below is a detailed explanation of each.
Frame specifications allow you to define the range of rows over which the window function is applied within the partition. The two most common types of frame specifications are ROWS and RANGE:
ROWS | RANGE BETWEEN <start_frame> AND <end_frame> |
SELECT transaction_id, transaction_date, amount, SUM(amount) OVER (ORDER BY transaction_date ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS moving_sum FROM transactions; |
This calculates the sum of the current row, the three preceding rows, and the next row.
The EXCLUDE clause (part of SQL:2011 standard) is used to explicitly exclude certain rows from the frame while calculating a window function. This is useful when you want more control over which rows are considered in your window function.
<function> OVER (PARTITION BY <expr> ORDER BY <expr> <frame_clause> EXCLUDE <exclusion_option>) |
SELECT employee_id, salary, SUM(salary) OVER (ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW) AS sum_excluding_current FROM employees; |
This calculates the sum of all previous salaries, excluding the current row.
The FILTER clause allows you to apply a condition to aggregate functions so that only rows that meet the condition are included in the calculation.
<aggregate_function> FILTER (WHERE <condition>) |
SELECT department_id, employee_id, salary, SUM(salary) FILTER (WHERE salary > 50000) OVER (PARTITION BY department_id) AS sum_high_salaries FROM employees; |
In this example, only salaries greater than 50,000 are summed, partitioned by department.
You can use the FILTER clause with any aggregate function such as SUM(), COUNT(), AVG(), MIN(), or MAX().
Window chaining allows you to apply multiple window functions on the same result set by reusing a window definition or extending an existing window specification.
<function_1> OVER <window_name>, <function_2> OVER (window_name [modifications]) |
SELECT employee_id, salary, department_id, SUM(salary) OVER w AS department_total_salary, AVG(salary) OVER (w ORDER BY hire_date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_avg_salary FROM employees WINDOW w AS (PARTITION BY department_id); |
In this example, a window named w is defined, which partitions the result set by department_id. The SUM() function uses the window as defined, while the AVG() function reuses w and adds an ORDER BY clause with a frame specification for running averages.
These advanced features can be combined to create highly specific and powerful window queries.
SELECT transaction_id, transaction_date, amount, SUM(amount) OVER (ORDER BY transaction_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW) AS sum_excluding_current, AVG(amount) FILTER (WHERE amount > 100) OVER (PARTITION BY customer_id ORDER BY transaction_date) AS avg_high_value, COUNT(transaction_id) OVER (PARTITION BY customer_id) AS total_transactions FROM transactions; |
In this query:
Let’s start by creating a simple table to illustrate the use of window functions. We'll use a sales table with columns for salesperson, region, month, and amount.
CREATE TABLE sales ( id INT PRIMARY KEY, salesperson VARCHAR(50), region VARCHAR(50), month INT, amount DECIMAL(10, 2) ); |
Now, insert sample data into the sales table.
INSERT INTO sales (id, salesperson, region, month, amount) VALUES (1, 'Alice', 'North', 1, 500.00), (2, 'Bob', 'South', 1, 400.00), (3, 'Charlie', 'North', 1, 300.00), (4, 'Alice', 'North', 2, 700.00), (5, 'Bob', 'South', 2, 600.00), (6, 'Charlie', 'North', 2, 500.00), (7, 'Alice', 'North', 3, 900.00), (8, 'Bob', 'South', 3, 800.00), (9, 'Charlie', 'North', 3, 700.00); |
Create and insert data
This function assigns a unique sequential number to rows within a partition of a result set, starting at 1 for the first row in each partition.
SELECT salesperson, region, month, amount, ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS row_num FROM sales; |
Explanation:
Output:
Row number
This function assigns a rank to rows within a partition of a result set. The rank of a row is one plus the number of ranks that come before it.
SELECT salesperson, region, month, amount, RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank FROM sales; |
Output:
Rank
Notice that ranks can be tied.
Similar to RANK(), but it does not leave gaps in ranking when there are ties.
SELECT salesperson, region, month, amount, DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS dense_rank FROM sales; |
Output:
Dense rank
This function divides rows in the result set into a specified number of approximately equal parts (buckets).
SELECT salesperson, region, month, amount, NTILE(3) OVER (ORDER BY amount DESC) AS bucket FROM sales; |
Explanation: This divides the rows into three buckets based on the amount.
Output:
ntile
Window functions can be combined with aggregate functions. The following example calculates the cumulative sum of amount by salesperson:
SELECT salesperson, region, month, amount, SUM(amount) OVER (PARTITION BY salesperson ORDER BY month) AS running_total FROM sales; |
Output:
Sum
Similarly, we can calculate a running average:
SELECT salesperson, region, month, amount, AVG(amount) OVER (PARTITION BY salesperson ORDER BY month) AS running_avg FROM sales; |
Output:
Avg
Example: Comparing the current month’s amount with the previous month’s value using LAG().
SELECT salesperson, month, amount, LAG(amount) OVER (PARTITION BY salesperson ORDER BY month) AS prev_month_amount FROM sales; |
Output:
Lag
These functions return the first or last value in the window.
Example: Retrieve the first sales amount in each region.
SELECT salesperson, region, month, amount, FIRST_VALUE(amount) OVER (PARTITION BY region ORDER BY month) AS first_sales FROM sales; |
Output:
First Value
SELECT salesperson, month, amount, SUM(amount) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total FROM sales; |
Output:
Running Total with ROWS Frame
This query calculates a cumulative running total of sales amounts from the first row up to the current row.
SELECT salesperson, month, amount, SUM(CASE WHEN amount >= 500 THEN amount ELSE 0 END) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS filtered_running_total FROM sales; |
Output:
Running Total with Filtering Rows
This query calculates a running total but only includes sales amounts of 500 or more, treating lesser amounts as 0
SELECT salesperson, month, amount, SUM(amount) OVER (ORDER BY month ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS two_month_sum FROM sales; |
Output:
Running Total with Filtering Rows
This query calculates the sum of sales for the current month and the previous month, creating a two-month rolling total.
SELECT salesperson, month, amount,
SUM(amount) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS sum_excluding_current
FROM sales;
Output:
Sum of Previous Rows
This query calculates the sum of all previous sales amounts but excludes the current row’s amount.
SELECT salesperson, month, amount, LAG(amount) OVER (ORDER BY month) AS prev_month_amount FROM sales; |
Output:
Using LAG() to Get Previous Month’s Amount
This query retrieves the sales amount from the previous month for each row, returning NULL for the first row as there is no previous value.
By applying these window functions, you can solve complex analytical queries without needing to resort to self-joins or subqueries.
In conclusion, window functions in SQL Server are used to calculate calculations across related rows of data sets. Unlike aggregate operations that group query rows into a single result row, a window function produces a result for each query row.
Answer:a. A function that performs operations on a set of rows and produces a result for each row
Answer:b. PARTITION BY
Answer:c. ROW_NUMBER()
Answer:a. RANK()
Answer:c. AVG()
Top Tutorials
Related Articles