Window Function in SQL

Overview

Window functions perform calculations, including aggregations, across a set of query rows. Unlike ordinary aggregation, which collapses the grouped rows into a single result row, a window function produces a result for every query row. In this lesson, we will see what window functions are and look at the most common ones.

What is a Window Function?

Window functions are built-in functions that perform calculations across a set of rows related to the current row. They are commonly used in analytical queries and data processing to compute aggregates, rankings, and comparisons between rows.

In general, a window function involves defining a window, that is, a subset of rows within the result set or within a partition of it, and applying a function to that window. The syntax specifies the window (the partition key, the ordering, and optionally a range of rows) and then the function to apply to it.

Syntax of a Window Function

The syntax of a window function in PostgreSQL is as follows:

<window function>([argument1 [, argument2, ...]])
    OVER ([PARTITION BY partition_expression, ... ]
          [ORDER BY sort_expression [ASC | DESC], ... ]
          [frame_clause])

Explanation:

  • <window function>: the name of the window function to be used, such as SUM, AVG, MAX, MIN, ROW_NUMBER, or RANK.
  • argument1, argument2, …: the arguments to the window function (if any).
  • PARTITION BY: a clause used to group rows into partitions based on the values of one or more columns.
  • ORDER BY: a clause used to order the rows within each partition based on one or more columns.
  • ASC | DESC: specifies the order in which the rows should be sorted (ascending or descending).
  • frame_clause: specifies the range of rows to be included in the window frame, such as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW or RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING.

The OVER clause is required for a window function to operate as a window function. It defines the window specification, which includes the data's partitioning, ordering, and framing.
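To make the frame clause more concrete, here is a minimal sketch that runs two different frames over the same ordered rows. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made so the example is self-contained; SQLite 3.25 or later is needed for window functions). The table readings and its values are hypothetical.

```python
import sqlite3

# In-memory SQLite database; "readings" and its values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (n INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?)", [(1,), (2,), (3,), (4,), (5,)])

frame_rows = conn.execute("""
    SELECT n,
           -- frame: from the first row up to the current row (running total)
           SUM(n) OVER (ORDER BY n
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum,
           -- frame: the previous row, the current row, and the next row
           SUM(n) OVER (ORDER BY n
                        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS neighbor_sum
    FROM readings
    ORDER BY n
""").fetchall()

for row in frame_rows:
    print(row)
```

The function and the ordering are identical in both columns; only the frame differs, which is why the two sums diverge from the second row onward.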

Example:

Suppose we have a table of sales that contains the following data:

region | month | sales
-------+-------+------
East   | Jan   | 100
East   | Feb   | 200
East   | Mar   | 300
West   | Jan   | 50
West   | Feb   | 75
West   | Mar   | 100

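We can use the AVG window function to calculate a running average of sales within each region and show the result alongside each row of the table. Because month holds text abbreviations, which would sort alphabetically (Feb before Jan), the query orders by a CASE expression that maps each month to its position in the year. Here's the SQL query to do that:

SELECT region, month, sales,
       AVG(sales) OVER (
           PARTITION BY region
           ORDER BY CASE month WHEN 'Jan' THEN 1 WHEN 'Feb' THEN 2 WHEN 'Mar' THEN 3 END
       ) AS avg_sales
FROM sales;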

The result of the query would be:

region | month | sales | avg_sales
-------+-------+-------+----------
East   | Jan   | 100   | 100
East   | Feb   | 200   | 150
East   | Mar   | 300   | 200
West   | Jan   | 50    | 50
West   | Feb   | 75    | 62.5
West   | Mar   | 100   | 75

In this example, we're using the AVG window function to calculate a running average of sales for each region. The OVER clause partitions the data by region and orders the rows within each partition by month. Because an ORDER BY is present and no frame_clause is given, the default frame runs from the first row of the partition up to the current row, so AVG(sales) averages the sales from the start of the region's data through the current month. The result is a new column, avg_sales, that shows this running average for each region in each month. Omitting ORDER BY from the OVER clause would instead give every row the overall average for its region (200 for East and 75 for West).
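The effect of ORDER BY inside OVER can be checked with a small sketch that computes both the running average and the whole-partition average side by side. It uses Python's built-in sqlite3 module in place of PostgreSQL (an assumption for portability; SQLite 3.25 or later is required), and month_no is a hypothetical helper column that puts the month names in calendar order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, sales INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("East", "Jan", 100), ("East", "Feb", 200), ("East", "Mar", 300),
    ("West", "Jan", 50), ("West", "Feb", 75), ("West", "Mar", 100),
])

avg_rows = conn.execute("""
    WITH numbered AS (
        -- month_no is a hypothetical helper so months sort chronologically
        SELECT *,
               CASE month WHEN 'Jan' THEN 1 WHEN 'Feb' THEN 2 WHEN 'Mar' THEN 3 END AS month_no
        FROM sales
    )
    SELECT region, month, sales,
           AVG(sales) OVER (PARTITION BY region ORDER BY month_no) AS running_avg,
           AVG(sales) OVER (PARTITION BY region)                   AS region_avg
    FROM numbered
    ORDER BY region, month_no
""").fetchall()

for row in avg_rows:
    print(row)
```

The running_avg column should reproduce the avg_sales values in the table above, while region_avg stays constant within each region.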

How do Window Functions Differ from GROUP BY?

  • Window functions and GROUP BY are both used in data analysis and manipulation, but they serve different purposes.
  • The GROUP BY clause groups the rows of a table by one or more columns.
  • It combines all the rows that share the same value in the specified column(s) and applies aggregate functions such as SUM, COUNT, or AVG to each group.
  • The result of a GROUP BY query is a new result set with one row per group, holding the grouping columns and the calculated aggregate values.
  • A window function, on the other hand, performs its calculation over a subset of rows (the window) within a partition or within the entire result set.
  • It enables the calculation of metrics such as moving averages, cumulative sums, ranks, and percentiles for each row relative to other rows in the same partition or in the entire result set.
  • Unlike GROUP BY, a window function does not collapse rows. Every input row stays in the output, and the function's result is attached to it as an additional column.
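The difference in output shape can be seen by running both forms against the sales table from the earlier example. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; SQLite 3.25 or later is required); the column alias region_total is chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, sales INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("East", "Jan", 100), ("East", "Feb", 200), ("East", "Mar", 300),
    ("West", "Jan", 50), ("West", "Feb", 75), ("West", "Mar", 100),
])

# GROUP BY collapses each region into a single row.
grouped = conn.execute("""
    SELECT region, SUM(sales) AS region_total
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()

# The window version keeps all six rows and attaches the total to each one.
windowed = conn.execute("""
    SELECT region, month, sales,
           SUM(sales) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, sales
""").fetchall()

print(len(grouped), "rows from GROUP BY")
print(len(windowed), "rows from the window function")
```

Both queries compute the same totals; only the window version lets each row be compared with the total of its own region.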

Commonly Used Window Functions

Before discussing individual window functions, we will first define a table to use in the examples. Suppose we have a table students that contains the following data:

id | name    | grade
---+---------+------
1  | Alice   | 80
2  | Bob     | 90
3  | Charlie | 85
4  | Dave    | 95
5  | Eve     | 75
6  | Frank   | 85

Here are some of the commonly used window functions in SQL:

  1. ROW_NUMBER():

The ROW_NUMBER() window function assigns a unique sequential integer to each row within a partition of a result set. It is often used to generate a unique identifier for each row or to rank the rows based on a specific order.

Here's an example of using the ROW_NUMBER() window function in PostgreSQL. We can use it to assign a unique number to each row based on the grade in descending order. Because Charlie and Frank share the same grade, the query adds id as a tiebreaker so that the numbering is deterministic. Here's the SQL query to do that:

SELECT id, name, grade,
       ROW_NUMBER() OVER (ORDER BY grade DESC, id) AS row_num
FROM students;

The result of the query would be:

id | name    | grade | row_num
---+---------+-------+--------
4  | Dave    | 95    | 1
2  | Bob     | 90    | 2
3  | Charlie | 85    | 3
6  | Frank   | 85    | 4
1  | Alice   | 80    | 5
5  | Eve     | 75    | 6

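In this example, we're using the ROW_NUMBER() window function to assign a unique sequential number to each row based on the grade in descending order. The OVER clause specifies the ordering of the data by grade. The result is a new column, row_num, that shows the position of each student by grade. Note that Charlie and Frank both have a grade of 85 but still receive different numbers (3 and 4), because ROW_NUMBER() never assigns the same number twice; which of two tied rows comes first is arbitrary unless the ORDER BY includes a tiebreaker column.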

  2. RANK():

The RANK() window function assigns a rank to each row within a partition of a result set based on the values in one or more columns. It is similar to the ROW_NUMBER() function, but rows with equal values receive the same rank, and a gap follows each group of tied rows.

Here's an example of using the RANK() window function in PostgreSQL. We can use it to assign a rank to each row based on the grade in descending order. Here's the SQL query to do that:

SELECT id, name, grade, 
       RANK() OVER (ORDER BY grade DESC) as rank
FROM students;

The result of the query would be:

id | name    | grade | rank
---+---------+-------+-----
4  | Dave    | 95    | 1
2  | Bob     | 90    | 2
3  | Charlie | 85    | 3
6  | Frank   | 85    | 3
1  | Alice   | 80    | 5
5  | Eve     | 75    | 6

In this example, we're using the RANK() window function to assign a rank to each row based on the grade in descending order. The OVER clause is used to specify the ordering of the data by grade. The result is a new column rank that shows the rank of each student based on their grade. Note that Frank and Charlie have the same grade of 85, so they have been assigned the same rank of 3, resulting in Alice's rank being 5 instead of 4.

  3. DENSE_RANK():

The DENSE_RANK() window function assigns a rank to each row within a partition of a result set. It is similar to the RANK() function but does not leave gaps in the ranking when there are ties: tied rows share a rank, and the next distinct value receives the next consecutive rank.

Here's an example of using DENSE_RANK(). We can use it to assign a rank to each row based on the grade in descending order. Here's the SQL query to do that:

SELECT id, name, grade,
       DENSE_RANK() OVER (ORDER BY grade DESC) AS dense_rank
FROM students;

The result of the query would be:

id | name    | grade | dense_rank
---+---------+-------+-----------
4  | Dave    | 95    | 1
2  | Bob     | 90    | 2
3  | Charlie | 85    | 3
6  | Frank   | 85    | 3
1  | Alice   | 80    | 4
5  | Eve     | 75    | 5

In this example, we're using the DENSE_RANK() window function to assign a rank to each row based on the grade in descending order. The OVER clause specifies the ordering of the data by grade. The result is a new column, dense_rank, that shows the dense rank of each student based on their grade. Charlie and Frank have the same grade, so they both get rank 3. The next rank is 4, not 5, as it would be with RANK().
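Since the three ranking functions differ only in how they treat ties, it can help to see them in one result set. The sketch below runs all three over the students table using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; SQLite 3.25 or later is required). The aliases rnk and dense_rnk are illustrative names, and id is added as a tiebreaker for ROW_NUMBER() so its output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT, grade INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", [
    (1, "Alice", 80), (2, "Bob", 90), (3, "Charlie", 85),
    (4, "Dave", 95), (5, "Eve", 75), (6, "Frank", 85),
])

rank_rows = conn.execute("""
    SELECT name, grade,
           ROW_NUMBER() OVER (ORDER BY grade DESC, id) AS row_num,   -- always unique
           RANK()       OVER (ORDER BY grade DESC)     AS rnk,       -- ties share, gap follows
           DENSE_RANK() OVER (ORDER BY grade DESC)     AS dense_rnk  -- ties share, no gap
    FROM students
    ORDER BY grade DESC, id
""").fetchall()

for row in rank_rows:
    print(row)
```

The rows for Charlie, Frank, and Alice are the ones to watch: they are where the three columns stop agreeing.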

  4. NTILE():

The NTILE() function is a window function in PostgreSQL that divides the ordered rows of a partition into a specified number of groups, or "buckets", that are as equal in size as possible. It assigns a bucket number to each row within a partition of a result set.

Here's an example of using the NTILE() window function. We want to divide the students into three buckets based on their grades. We can use the NTILE() function to assign each student to one of the three buckets. Here's the SQL query to do that:

SELECT id, name, grade,
       NTILE(3) OVER (ORDER BY grade DESC) AS bucket_num
FROM students;

The result of the query would be:

id | name    | grade | bucket_num
---+---------+-------+-----------
4  | Dave    | 95    | 1
2  | Bob     | 90    | 1
3  | Charlie | 85    | 2
6  | Frank   | 85    | 2
1  | Alice   | 80    | 3
5  | Eve     | 75    | 3

In this example, we're using the NTILE() window function to divide the students into three buckets based on their grade. The OVER clause specifies the ordering of the data by grade in descending order. The result is a new column, bucket_num, that shows the bucket number assigned to each student. With six rows and three buckets, each bucket holds exactly two students.
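When the row count does not divide evenly by the number of buckets, the extra rows go to the lowest-numbered buckets. The sketch below illustrates this with NTILE(4) over the same six students, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; SQLite 3.25 or later is required). The id tiebreaker is added here only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT, grade INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", [
    (1, "Alice", 80), (2, "Bob", 90), (3, "Charlie", 85),
    (4, "Dave", 95), (5, "Eve", 75), (6, "Frank", 85),
])

# Six rows into four buckets: the first two buckets get two rows, the last two get one.
bucket_rows = conn.execute("""
    SELECT name, grade,
           NTILE(4) OVER (ORDER BY grade DESC, id) AS bucket_num
    FROM students
    ORDER BY grade DESC, id
""").fetchall()

for row in bucket_rows:
    print(row)
```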

  5. LAG():

The LAG() function is a window function in PostgreSQL that gives access to a previous row in a result set. It can be used to compare the values of the current row with the previous row or to calculate the difference between two consecutive rows.

Here's an example of using the LAG() window function. We can use it to calculate the difference in grade between each student and the previous student based on their ordering by ID. The third argument to LAG() is the value returned when there is no previous row; passing grade itself makes the difference for the first row come out as 0. Here's the SQL query to do that:

SELECT id, name, grade,
       grade - LAG(grade, 1, grade) OVER (ORDER BY id) AS grade_diff
FROM students;

The result of the query would be:

id | name    | grade | grade_diff
---+---------+-------+-----------
1  | Alice   | 80    | 0
2  | Bob     | 90    | 10
3  | Charlie | 85    | -5
4  | Dave    | 95    | 10
5  | Eve     | 75    | -20
6  | Frank   | 85    | 10

In this example, we're using the LAG() window function to calculate the difference in grade between each student and the previous student based on their ordering by ID. The OVER clause specifies the ordering of the data by ID. The result is a new column, grade_diff, that shows the difference in grade between each student and the previous student. Alice is the first row and has no previous student, so LAG() falls back to its default (her own grade) and her grade_diff is 0. With a default of 0 instead, her grade_diff would be 80 - 0 = 80.

  6. LEAD():

The LEAD() function is a window function that allows you to access the value of a subsequent row in the same result set. It is often used to compare the current row with the next row or to calculate the change or difference between consecutive rows.

Here's an example of using the LEAD() function. Suppose we want to calculate the difference in grades between each student and the next student in the list. We can use the LEAD() function to retrieve the grade of the next student and then subtract it from the current student's grade. Here's the SQL query to do that:

SELECT name, grade,
       grade - LEAD(grade) OVER (ORDER BY grade DESC, id) AS grade_diff
FROM students;

The result of the query would be:

name    | grade | grade_diff
--------+-------+-----------
Dave    | 95    | 5
Bob     | 90    | 5
Charlie | 85    | 0
Frank   | 85    | 5
Alice   | 80    | 5
Eve     | 75    | null

In this example, we're using the LEAD() function to retrieve the grade of the next student when the rows are ordered by grade in descending order. The OVER clause specifies the ordering of the data by grade. The result is a new column, grade_diff, that shows the difference in grades between each student and the next student in the list. Note that the last row has a null value for grade_diff because there is no subsequent row to compare it with.
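LAG() and LEAD() mirror each other, which is easiest to see when both are applied over the same ordering with no default value. The following sketch uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; SQLite 3.25 or later is required). The aliases prev_grade and next_grade are illustrative, and SQL NULL appears as None in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT, grade INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", [
    (1, "Alice", 80), (2, "Bob", 90), (3, "Charlie", 85),
    (4, "Dave", 95), (5, "Eve", 75), (6, "Frank", 85),
])

neighbor_rows = conn.execute("""
    SELECT id, name, grade,
           LAG(grade)  OVER (ORDER BY id) AS prev_grade,  -- NULL on the first row
           LEAD(grade) OVER (ORDER BY id) AS next_grade   -- NULL on the last row
    FROM students
    ORDER BY id
""").fetchall()

for row in neighbor_rows:
    print(row)
```

Without a third argument, both functions return NULL at the edge of the window, so any arithmetic on those rows also yields NULL unless a default or COALESCE is supplied.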

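  7. SUM(), AVG(), MIN(), MAX(), COUNT():

These are the ordinary aggregate functions. When used with an OVER clause, they act as window functions: each one computes its result over the rows of the window and returns that value on every row instead of collapsing the rows into one.

We can use these window functions to calculate various student grade metrics. Here's an example SQL query that calculates the sum, average, minimum, maximum, and count of grades and shows them on each row: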

SELECT id, name, grade,
       SUM(grade) OVER () AS sum_grades,
       AVG(grade) OVER () AS avg_grades,
       MIN(grade) OVER () AS min_grade,
       MAX(grade) OVER () AS max_grade,
       COUNT(grade) OVER () AS count_grades
FROM students;

The result of the query would be:

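id | name    | grade | sum_grades | avg_grades | min_grade | max_grade | count_grades
---+---------+-------+------------+------------+-----------+-----------+-------------
1  | Alice   | 80    | 510        | 85         | 75        | 95        | 6
2  | Bob     | 90    | 510        | 85         | 75        | 95        | 6
3  | Charlie | 85    | 510        | 85         | 75        | 95        | 6
4  | Dave    | 95    | 510        | 85         | 75        | 95        | 6
5  | Eve     | 75    | 510        | 85         | 75        | 95        | 6
6  | Frank   | 85    | 510        | 85         | 75        | 95        | 6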

In this example, we're using various window functions to calculate different metrics for the students' grades. The empty OVER () clause specifies that these calculations are performed over the entire result set, with no partitioning or ordering. The result is a new set of columns, sum_grades, avg_grades, min_grade, max_grade, and count_grades, which show the sum, average, minimum, maximum, and count of grades, respectively, for each row. Because the window is the same for every row, every row shows the same values.
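A common reason to put aggregates next to each row is to compare the row with the whole set. As a minimal sketch of that idea, the query below subtracts the overall average from each grade and measures the distance to the top grade. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; SQLite 3.25 or later is required), and the aliases diff_from_avg and gap_to_top are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT, grade INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", [
    (1, "Alice", 80), (2, "Bob", 90), (3, "Charlie", 85),
    (4, "Dave", 95), (5, "Eve", 75), (6, "Frank", 85),
])

stats_rows = conn.execute("""
    SELECT name, grade,
           grade - AVG(grade) OVER () AS diff_from_avg,  -- row value vs. overall average
           MAX(grade) OVER () - grade AS gap_to_top      -- distance to the highest grade
    FROM students
    ORDER BY id
""").fetchall()

for row in stats_rows:
    print(row)
```

A plain GROUP BY query could not express this in a single pass, because the per-row grade and the aggregate would not be available in the same row.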

Conclusion:

In conclusion, window functions in SQL perform calculations across sets of related rows. Unlike aggregate operations that group query rows into a single result row, a window function produces a result for each query row.

Key Takeaways

  • This lesson explains what window functions are and how they differ from GROUP BY in SQL.
  • Window functions let you perform calculations across rows that are related to the current row while keeping every row in the output, whereas GROUP BY collapses each group of rows into a single result row.
  • The lesson also provides the syntax of a window function in PostgreSQL and an example.
  • Additionally, it covers some of the commonly used window functions in SQL, including ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), LAG(), LEAD(), and the aggregates SUM(), AVG(), MIN(), MAX(), and COUNT() used with OVER.

Quiz

  1. What is a Window Function in SQL?
    a. A function that performs operations on a set of rows and produces a result for each row
    b. A function that groups rows into a single result row
    c. A function that calculates aggregate functions on a dataset
    d. A function that sorts rows within a dataset

Answer: a. A function that performs operations on a set of rows and produces a result for each row

  2. Which clause is required for a Window Function to operate in SQL?
    a. ORDER BY
    b. OVER
    c. GROUP BY
    d. HAVING

Answer: b. OVER

  3. Which Window Function in SQL assigns a unique sequential integer to each row within a partition of a result set?
    a. RANK()
    b. DENSE_RANK()
    c. ROW_NUMBER()
    d. NTILE()

Answer: c. ROW_NUMBER()

  4. Which Window Function in SQL returns the rank of each row within a result set, with ties receiving the same rank and leaving gaps?
    a. RANK()
    b. DENSE_RANK()
    c. ROW_NUMBER()
    d. NTILE()

Answer: a. RANK()

  5. Which Window Function in SQL returns the average value of a specified column over a partition of a result set?
    a. SUM()
    b. COUNT()
    c. AVG()
    d. MAX()

Answer: c. AVG()


© 2024 AlmaBetter