Placements
About Us
Overview
Window functions perform aggregation operations on a set of query rows. However, aggregation operations group query rows into a single result row, whereas window functions produce results for each query row. In this lesson, we will see what windows functions are and the most common windows functions.
What is Windows Function?
Windows functions are built-in functions that allow you to perform calculations across rows that are related to the current row. These functions are commonly used in analytical queries and data processing operations to perform various aggregations, ranking, and grouping functions on data sets.
In general, a Windows function involves defining a window or subset of rows within the dataframe or group and applying a function to that window. The syntax usually involves specifying the window using a set of conditions or criteria, such as the range of rows or the partition key, and then specifying the function to apply.
Syntax of Windows Function
The syntax of a Windows Function in PostgreSQL is as follows:
<window function>([argument1 [, argument2, ...]])
OVER ([PARTITION BYpartition_expression, ... ]
[ORDER BY sort_expression [ASC | DESC], ... ]
[frame_clause] )
Explanation:
The OVER clause is required for a window function to operate as a window function. It defines the window specification, which includes the data's partitioning, ordering, and framing.
Example:
Suppose we have a table of sales that contains the following data:
region | month | sales |
---|---|---|
East | Jan | 100 |
East | Feb | 200 |
East | Mar | 300 |
West | Jan | 50 |
West | Feb | 75 |
West | Mar | 100 |
We can use the AVG window function to calculate the average sales for each region and show the result alongside each row of the table. Here's the SQL query to do that:
SELECT region, month, sales,
AVG(sales) OVER (PARTITION BY region ORDER BY month) AS avg_sales
FROM sales;
The result of the query would be:
region | month | sales | avg_sales |
---|---|---|---|
East | Jan | 100 | 100 |
East | Feb | 200 | 150 |
East | Mar | 300 | 200 |
West | Jan | 50 | 50 |
West | Feb | 75 | 62.5 |
West | Mar | 100 | 75 |
In this example, we're using the AVG window function to calculate the average sales for each region. The OVER clause is used to specify the partitioning of the data by region and the ordering of the data by month. The AVG(sales) function is the window function that is applied to each partition, and it calculates the average sales for the rows within that partition. The result is a new column, avg_sales, that shows the average sales for each region in each month.
How Windows function differ from the GroupBy function?
Commonly used Windows Function
Before discussing windows functions, we will first define a table to understand how we use windows functions. Suppose we have a table student that contains the following data:
id | name | grade |
---|---|---|
1 | Alice | 80 |
2 | Bob | 90 |
3 | Charlie | 85 |
4 | Dave | 95 |
5 | Eve | 75 |
6 | Frank | 85 |
Here are some of the commonly used Windows functions in SQL:
The ROW_NUMBER() window function assigns a unique sequential integer to each row within a partition of a result set. It is often used to generate a unique identifier for each row or to rank the rows based on a specific order.
Here's an example of using the ROW_NUMBER() window function in PostgreSQL:
We can use the ROW_NUMBER() window function to assign a unique number to each row based on the grade in descending order. Here's the SQL query to do that:
SELECT id, name, grade,
ROW_NUMBER() OVER (ORDER BY grade DESC) as row_num
FROM students;
The result of the query would be:
id | name | grade | row_num |
---|---|---|---|
4 | Dave | 95 | 1 |
2 | Bob | 90 | 2 |
3 | Charlie | 85 | 3 |
6 | Frank | 85 | 4 |
1 | Alice | 80 | 5 |
5 | Eve | 75 | 6 |
In this example, we're using the ROW_NUMBER() window function to assign a unique sequential number to each row based on the grade in descending order. The OVER clause is used to specify the ordering of the data by grade. The result is a new column, row_num, that shows the ranking of each student based on their grade.
The RANK() window function is used to assign a rank to each row within a partition of a result set based on the values in one or more columns. It is similar to the ROW_NUMBER() function but can result in tied rankings.
Here's an example of using the RANK() window function in PostgreSQL:
We can use the RANK() window function to assign a rank to each row based on the grade in descending order. Here's the SQL query to do that:
SELECT id, name, grade,
RANK() OVER (ORDER BY grade DESC) as rank
FROM students;
The result of the query would be:
id | name | grade | rank |
---|---|---|---|
4 | Dave | 95 | 1 |
2 | Bob | 90 | 2 |
3 | Charlie | 85 | 3 |
6 | Frank | 85 | 3 |
1 | Alice | 80 | 5 |
5 | Eve | 75 | 6 |
In this example, we're using the RANK() window function to assign a rank to each row based on the grade in descending order. The OVER clause is used to specify the ordering of the data by grade. The result is a new column rank that shows the rank of each student based on their grade. Note that Frank and Charlie have the same grade of 85, so they have been assigned the same rank of 3, resulting in Alice's rank being 5 instead of 4.
The DENSE_RANK() window function assigns a rank to each row within a partition of a result set. It is similar to the ROW_NUMBER() function but does not leave gaps in the ranking when there are ties. If there are ties in the ranking, the next rank is assigned based on the number of tied rows.
Here's an example of using the DENSE_RANK():
SELECT id, name, grade,
DENSE_RANK() OVER (ORDER BY grade DESC) as dense_rank
FROM students;
We can use the DENSE_RANK() window function to assign a rank to each row based on the grade in descending order. Here's the SQL query to do that:
The result of the query would be:
id | name | grade | dense_rank |
---|---|---|---|
4 | Dave | 95 | 1 |
2 | Bob | 90 | 2 |
3 | Charlie | 85 | 3 |
6 | Frank | 85 | 3 |
1 | Alice | 80 | 4 |
5 | Eve | 75 | 5 |
In this example, we're using the DENSE_RANK() window function to assign a rank to each row based on the grade in descending order. The OVER clause is used to specify the ordering of the data by grade. The result is a new column, dense_rank, that shows the dense rank of each student based on their grade. Charlie and Frank have the same grade, so they both get rank 3. The next rank is 4, not 5, as it would be with ROW_NUMBER().
The NTILE() function is a window function in PostgreSQL that is used to divide a result set into a specified number of groups, or "buckets", based on a specified expression. It assigns a bucket number to each row within a partition of a result set.
Here's an example of using the NTILE() window function :
SELECT id, name, grade,
NTILE(3) OVER (ORDER BY grade DESC) as bucket_num
FROM students;
We want to divide the students into three buckets based on their grades. We can use the NTILE() function to assign each student to one of the three buckets. Here's the SQL query to do that:
The result of the query would be:
id | name | grade | bucket_num |
---|---|---|---|
4 | Dave | 95 | 1 |
2 | Bob | 90 | 1 |
3 | Charlie | 85 | 2 |
6 | Frank | 85 | 2 |
1 | Alice | 80 | 3 |
5 | Eve | 75 | 3 |
n this example, we're using the NTILE() window function to divide the students into three buckets based on their grade. The OVER clause is used to specify the ordering of the data by grade in descending order. The result is a new column, bucket_num, that shows the bucket number assigned to each student.
The LAG() function is a window function in PostgreSQL that enables access to a previous row in a result set. It can be used to compare the values of the current row with the previous row or to calculate the difference between two consecutive rows.
Here's an example of using the LAG() window function:
We can use the LAG() window function to calculate the difference in grade between each student and the previous student based on their ordering by ID. Here's the SQL query to do that:
SELECT id, name, grade,
grade - LAG(grade, 1, 0) OVER (ORDER BY id) as grade_diff
FROM students;
The result of the query would be:
id | name | grade | grade_diff |
---|---|---|---|
1 | Alice | 80 | 0 |
2 | Bob | 90 | 10 |
3 | Charlie | 85 | -5 |
4 | Dave | 95 | 10 |
5 | Eve | 75 | -20 |
6 | Frank | 85 | 10 |
In this example, we're using the LAG() window function to calculate the difference in grade between each student and the previous student based on their ordering by ID. The OVER a clause is used to specify the ordering of the data by ID. The result is a new column grade_diff that shows the difference in grade between each student and the previous student, with a default value of 0 for the first row.
The LEAD() function is a window function that allows you to access the value of a subsequent row in the same result set. It is often used to compare the current row with the next row or to calculate the change or difference between consecutive rows.
Here's an example of using the LEAD() function:
Suppose we calculate the difference in grades between each student and the next student on the list. We can use the LEAD() function to retrieve the grade of the next student and then subtract it from the current student's grade. Here's the SQL query to do that:
SELECT name, grade, LEAD(grade) OVER (ORDER BY grade DESC) - grade as grade_diff
FROM students;
The result of the query would be:
name | grade | grade_diff |
---|---|---|
Dave | 95 | 5 |
Bob | 90 | 5 |
Charlie | 85 | 0 |
Frank | 85 | 5 |
Alice | 80 | 5 |
Eve | 75 | null |
In this example, we're using the LEAD() function to retrieve the grade of the next student in descending order. The OVER clause is used to specify the ordering of the data by grade. The result is a new column, grade_diff, that shows the difference in grades between each student and the next student in the list. Note that the last row has a null value for grade_diff because there is no subsequent row to compare it.
Aggregate functions that compute a single result for a group of rows within a partition.
We can use the window functions to calculate various student grade metrics. Here's an example SQL query that calculates the sum, average, minimum, maximum, and count of grades for each row:
SELECT id, name, grade,
SUM(grade) OVER () AS sum_grades,
AVG(grade) OVER () AS avg_grades,
MIN(grade) OVER () AS min_grade,
MAX(grade) OVER () AS max_grade,
COUNT(grade) OVER () AS count_grades
FROM students;
The result of the query would be:
id | name | grade | sum_grades | avg_grades | min_grade | max_grade | count_grades |
---|---|---|---|---|---|---|---|
1 | Alice | 80 | 510 | 85 | 75 | 95 | 6 |
2 | Bob | 90 | 510 | 85 | 75 | 95 | 6 |
3 | Charlie | 85 | 510 | 85 | 75 | 95 | 6 |
4 | Dave | 95 | 510 | 85 | 75 | 95 | 6 |
5 | Eve | 75 | 510 | 85 | 75 | 95 | 6 |
6 | Frank | 85 | 510 | 85 | 75 | 95 | 6 |
In this example, we're using various window functions to calculate different metrics for the students' grades. The OVER clause is used to specify that these calculations should be performed over the entire result set (i.e. no partitioning is necessary). The result is a new set of columns sum_grades, avg_grades, min_grade, max_grade, and count_grades, which show the sum, average, minimum, maximum, and count of grades, respectively, for each row
Conclusion:
In conclusion, window functions in SQL Server are used to calculate calculations across related rows of data sets. Unlike aggregate operations that group query rows into a single result row, a window function produces a result for each query row.
Key Takeaways
Quiz
Answer:a. A function that performs operations on a set of rows and produces a result for each row
Answer:b. PARTITION BY
Answer:c. ROW_NUMBER()
Answer:a. RANK()
Answer:c. AVG()
Top Tutorials
Related Articles