Window Functions – PART 2: Aggregate Functions

As part of my preparation for the 70-761 exam I’ve been trying to get a full understanding of the different types of window functions which are as follows:

  1. Ranking
  2. Aggregate
  3. Offset

I know the ranking functions pretty well but haven’t used the aggregate and offset functions at all really.

This second post demonstrates the aggregate functions.

I’ve been going through the official exam guidebook by Itzik Ben-Gan and he explains how the window functions work very well and the whole book is certainly worth reading if you use T-SQL at all.

With every window function you use the OVER clause. This is the part that specifies the “window” you are ranking or aggregating over.

The Aggregate Functions

For the demonstrations I’m going to use the same data set I used in the ranking function demo and create a table with 11 rows that contain purchases for 3 customers as follows…

CREATE TABLE dbo.WindowFunctionDemo
(
   Id INT NOT NULL IDENTITY(1,1),
   CustomerID INT NOT NULL,
   PurchaseDate DATETIME2(0) NOT NULL,
   PurchaseItem VARCHAR(100) NOT NULL,
   Amount DECIMAL(10,2) NOT NULL,
);
GO

INSERT INTO dbo.WindowFunctionDemo
(
   CustomerID,
   PurchaseDate,
   PurchaseItem,
   Amount
)
VALUES
 (1, '20180306 12:34', 'Rugby Ball', 14.99),
 (1, '20180307 09:54', 'Frying Pan', 21.50),
 (1, '20180307 10:02', 'Vacuum Cleaner', 259.990),
 (2, '20180225 15:26', 'Orchid', 13.62),
 (2, '20180228 18:47', 'Book', 7.99),
 (2, '20180301 08:02', 'Snakes and Ladders', 8.00),
 (2, '20180306 14:32', 'Pillows', 20.00),
 (2, '20180306 14:32', 'Washing Machine', 359.99),
 (2, '20180307 23:15', 'Jeans', 59.99),
 (3, '20180301 09:27', 'Rucksack', 13.99),
 (3, '20180301 09:27', 'Scarf', 12.50);
GO

SELECT *
FROM dbo.WindowFunctionDemo;
GO

Windows Ranking 1

So what if we now wanted to see what percentage each purchase is of the total amount. Firstly, we’d need to sum all the amounts so let’s just add that to our column list as follows….

SELECT *, SUM(Amount) AS TotalSales
FROM dbo.WindowFunctionDemo;
GO

Msg 8120, Level 16, State 1, Line 42
Column 'dbo.WindowFunctionDemo.Id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

As I’m sure you’re already aware, we can’t add an aggregate function without a GROUP BY clause if non-aggregated columns also exist.

We could remove all the other columns from the SELECT list and just return the total amount and perhaps store that value in a variable…

SELECT SUM(Amount) AS TotalSales
FROM dbo.WindowFunctionDemo;
GO

Windows Ranking 10

We can include some columns along with the aggregate, for example, including the CustomerID will give the total per customer but without using something like GROUPING SETS or ROLLUP we can’t see the grand total of all purchases in this query.

SELECT CustomerID, SUM(Amount) AS TotalSales
FROM dbo.WindowFunctionDemo
GROUP BY CustomerID
GO

Windows Ranking 11

However, the OVER clause does allow us to do this. The great thing about this is it allows us to return all of the detail available in all the other columns along with the aggregated amount. With this query we can see the total per customer included as an additional column in the full data set…

SELECT
   *, 
   SUM(Amount) OVER (PARTITION BY CustomerID) AS TotalSalesByCustomer
FROM dbo.WindowFunctionDemo;
GO

Windows Ranking 12

Unlike the ranking functions the ORDER BY clause isn’t compulsory and you can see we’ve just partitioned by the CustomerID to give the sum per customer. In fact we don’t even need to do this and can aggregate over all rows by not supplying anything in the brackets as follows…

SELECT
   *,
   SUM(Amount) OVER (PARTITION BY CustomerID) AS TotalSalesByCustomer,
   SUM(Amount) OVER () AS GrandTotal
FROM dbo.WindowFunctionDemo;
GO

Windows Ranking 13

So now we have a way if getting the grand total at the row level we can use this to calculate the percentage of total amount for each purchase as follows…

SELECT
   *,
   SUM(Amount) OVER (PARTITION BY CustomerID) AS TotalSalesByCustomer,
   SUM(Amount) OVER () AS GrandTotal,
   CAST(Amount / SUM(Amount) OVER () * 100 AS DECIMAL(10,2)) AS GrandTotalPercent
FROM dbo.WindowFunctionDemo;
GO

Windows Ranking 14

As well as optionally including the ORDER BY and/or PARTITION BY clauses we can include something called a window frame. This allows us to specify a frame of rows or values that the aggregation is taken over relative the current row. It can only be used if the data set is ordered, i.e. the ORDER BY clause must be used.

We can use this to do things like generate running totals as follows…

SELECT
   *,
   SUM(Amount) OVER (ORDER BY Amount 
                     ROWS BETWEEN UNBOUNDED PRECEDING 
                     AND CURRENT ROW) AS RunningTotal
FROM dbo.WindowFunctionDemo;
GO

Windows Ranking 15

In this example we’ve ordered the data by the Amount and for each row we’re saying to sum the Amount values from the first row (UNBOUNDED PRECEDING) to the CURRENT ROW ordered by the Amount. For example, for the row for the “Scarf” purchase we’re saying to sum the Amount for the “Book”, “Snakes and Ladders”, and “Scarf”.

As well as UNBOUNDED PRECEDING meaning the first row we can also use UNBOUNDED FOLLOWING to mean the last row. For example…

SELECT
   *,
   SUM(Amount) OVER (ORDER BY Amount 
                     ROWS BETWEEN CURRENT ROW 
                     AND UNBOUNDED FOLLOWING) AS RunningTotal
FROM dbo.WindowFunctionDemo
ORDER BY Amount;
GO

Windows Ranking 17

This has basically just reversed how the running total is calculated.

We can set the window frame to include all rows as follows which is the same as giving the grand total over all rows.

SELECT
   *,
   SUM(Amount) OVER (ORDER BY Amount 
                     ROWS BETWEEN UNBOUNDED PRECEDING 
                     AND UNBOUNDED FOLLOWING) AS RunningTotal
FROM dbo.WindowFunctionDemo;
GO

Windows Ranking 16

As well as using UNBOUNDED to mean the ends of the data set we can specify a number of rows the window frame should cover. This next example shows the average of each purchase and the previous 2 for that customer based on the PurchaseDate

SELECT
   *,
   AVG(Amount) OVER (PARTITION BY CustomerID 
                     ORDER BY PurchaseDate 
                     ROWS BETWEEN 2 PRECEDING 
                     AND CURRENT ROW) AS RunningTotal
FROM dbo.WindowFunctionDemo
ORDER BY PurchaseDate;
GO

Windows Ranking 18

There is also an option to have the window frame over the values in the rows and not the rows themselves. To do this we just replace ROWS with RANGE. In order to show the difference I’m going to insert two new rows with an Amount of 7.99. This means we’ve now got 3 rows with the same Amount

INSERT INTO dbo.WindowFunctionDemo (CustomerID, PurchaseDate, PurchaseItem, Amount)
VALUES (2, '20180313 18:41', 'The BFG', 7.99),
(2, '20180313 18:41', 'Matilda', 7.99);

If we run our Running Total query from before we can still see it works as expected…

SELECT
   *,
   SUM(Amount) OVER (ORDER BY Amount
                     ROWS BETWEEN UNBOUNDED PRECEDING
                     AND CURRENT ROW) AS RunningTotal
FROM dbo.WindowFunctionDemo;
GO

Windows Ranking 19

However, if we use RANGE instead of ROWS we can see the difference…

SELECT
   *,
   SUM(Amount) OVER (ORDER BY Amount 
                     RANGE BETWEEN UNBOUNDED PRECEDING
                     AND CURRENT ROW) AS RunningTotal
FROM dbo.WindowFunctionDemo;
GO

Windows Ranking 20

Instead of treating each of the three rows separately it’s now treating each of the distinct values as the “current row” and all three get grouped together.

When using an aggregate window function with an ORDER BY clause the default window frame is actually:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

For example the following returns the same as the previous query…

SELECT
   *,
   SUM(Amount) OVER (ORDER BY Amount) AS RunningTotal
FROM dbo.WindowFunctionDemo;
GO

Windows Ranking 21

If you’re getting unexpected results in your aggregations it might be because you need to specify ROWS in your OVER clause instead of the default RANGE.

Advertisements

3 thoughts on “Window Functions – PART 2: Aggregate Functions

  1. Pingback: Window Functions – PART 1: Ranking Functions | Simon Learning SQL Server
  2. Pingback: WINDOW FUNCTIONS – PART 3: OFFSET FUNCTIONS | Simon Learning SQL Server
  3. Pingback: Exam prep for 70-761 | Simon Learning SQL Server

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s