As part of my preparation for the 70-761 exam I’ve been trying to get a full understanding of the different types of window functions which are as follows:
I know the ranking functions pretty well but haven’t used the aggregate and offset functions at all really.
This first post is just going to give a description of how the ranking functions work and I’ll try and go into more detail on the other two in further posts.
I’ve been going through the official exam guidebook by Itzik Ben-Gan and he explains how the window functions work very well and the whole book is certainly worth reading if you use T-SQL at all.
With every window function you use the OVER clause. This is the part that specifies the “window” you are ranking or aggregating over.
The Ranking Functions
There are four ranking functions and to demonstrate how they work I’m going to create a table with 11 rows in that contain purchases for 3 customers as follows…
CREATE TABLE dbo.WindowFunctionDemo ( Id INT NOT NULL IDENTITY(1,1), CustomerID INT NOT NULL, PurchaseDate DATETIME2(0) NOT NULL, PurchaseItem VARCHAR(100) NOT NULL, Amount DECIMAL(10,2) NOT NULL, ); GO INSERT INTO dbo.WindowFunctionDemo ( CustomerID, PurchaseDate, PurchaseItem, Amount ) VALUES (1, '20180306 12:34', 'Rugby Ball', 14.99), (1, '20180307 09:54', 'Frying Pan', 21.50), (1, '20180307 10:02', 'Vacuum Cleaner', 259.990), (2, '20180225 15:26', 'Orchid', 13.62), (2, '20180228 18:47', 'Book', 7.99), (2, '20180301 08:02', 'Snakes and Ladders', 8.00), (2, '20180306 14:32', 'Pillows', 20.00), (2, '20180306 14:32', 'Washing Machine', 359.99), (2, '20180307 23:15', 'Jeans', 59.99), (3, '20180301 09:27', 'Rucksack', 13.99), (3, '20180301 09:27', 'Scarf', 12.50); GO SELECT * FROM dbo.WindowFunctionDemo; GO
All the ranking functions require an ORDER BY clause as part of the OVER clause: this is because the rows must have an order to allow a ranking to be applied.
The first one I’m going to show is the ROW_NUMBER function that just returns a sequential BIGINT number in the order specified in the OVER clause.
This first example just orders by the Id column so we end up with the ROW_NUMBER function returning exactly the same value that is in the Id column.
SELECT ROW_NUMBER() OVER (ORDER BY Id) AS RowNumber, * FROM dbo.WindowFunctionDemo; GO
We can obviously order by any column or columns: this example orders by Amount descending…
SELECT ROW_NUMBER() OVER (ORDER BY Amount DESC) AS RowNumber, * FROM dbo.WindowFunctionDemo; GO
We can also include a PARTITION BY clause to the OVER clause which means that the ROW_NUMBER function is applied across each of the partitions independently. For example, we can add a partition on the CustomerID column to the statement above so it now ranks each customer by the amount as follows…
SELECT ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Amount DESC) AS RowNumber, * FROM dbo.WindowFunctionDemo; GO
We can see from the data set above that the row with a RowNumber of 1 is the most expensive purchase each customer has made. So what if we just want to return those rows. Let’s trying filtering on these ones…
SELECT ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Amount DESC) AS RowNumber, * FROM dbo.WindowFunctionDemo WHERE ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Amount DESC) = 1; GO Msg 4108, Level 15, State 1, Line 58 Windowed functions can only appear in the SELECT or ORDER BY clauses.
As we can see from the error above we can only use the window functions in the SELECT and ORDER BY clauses. This is because of the order SQL Server processes each part of a query. The order is:
- GROUP BY
- WITH CUBE or WITH ROLLUP
- ORDER BY
The window functions are calculated as part of the SELECT phase so they are not available for use in any of the preceding phases. One way to get the required results would be to use a derived table like a CTE as follows…
WITH Orders AS ( SELECT ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Amount DESC) AS RowNumber, * FROM dbo.WindowFunctionDemo ) SELECT * FROM Orders WHERE RowNumber = 1;
The second ranking function I’m going to demo is the RANK function. The syntax is the same as ROW_NUMBER and again we can use the ORDER BY and PARTITION BY clauses in the OVER clauses. The difference between RANK and ROW_NUMBER is that RANK assigns a rank based on the value in the row and give rows with the same value the same rank. For example, ordering by PurchaseDate with no partitioning we can see the two sets of the duplicate PurchaseDates have been given the same rank: these are rows with ranks 4 and 7.
SELECT RANK() OVER (ORDER BY PurchaseDate) AS Ranker, * FROM dbo.WindowFunctionDemo; GO
As we can see there are two rows with a rank of 4 and they both have the same PurchaseDate value. We can also see that the next value in the ranking sequence is 6. If there were 3 values then the next one would have been 7.
We can use the third ranking function, DENSE_RANK to not skip over the ranking values as follows…
SELECT DENSE_RANK() OVER (ORDER BY PurchaseDate) AS Ranker, * FROM dbo.WindowFunctionDemo; GO
The fourth and final ranking function is the NTILE function which just splits a data set into n parts as equally as possible. For example, we can split our purchases data set into 5 parts as follows:
SELECT NTILE(5) OVER (ORDER BY Amount) AS Ranker, * FROM dbo.WindowFunctionDemo; GO
Because the 11 rows in the data set can’t be split into 5 equal parts the first bucket ends up with 3 rows and the rest with 2 rows. It always puts the extra remainder rows starting from the first bucket upwards so if there 12 rows we’d have ended up with 3 row in the first two buckets.
For the final example, here’s everything together…
SELECT ROW_NUMBER() OVER (ORDER BY PurchaseDate) AS RowNumber, RANK() OVER (ORDER BY PurchaseDate) AS Ranker, DENSE_RANK() OVER (ORDER BY PurchaseDate) AS DenseRanker, NTILE(5) OVER (ORDER BY PurchaseDate) AS NTil, * FROM dbo.WindowFunctionDemo; GO