Modifying Partitions

In a previous post I talked about how to partition a table and touched on partition elimination which allows the optimiser to create a query plan where a much smaller amount of data is read.

A lot of the partitioning functions I’ve used in my working life are based on month, usually on reporting tables, where people need to be able to see all historic data but usually are only concerned with the current or last few months. When querying this data, partition elimination allows SQL Server to just read the data on the required partition(s) instead of the whole table. This is very useful if you have a lot of data going back a long time and usually query this data using a date type column.

However, when you first create your partition function based on a date you probably don’t go too far into the future, and you might find that all of a sudden two years have passed and all your new data is being put on a single large latest partition that just keeps growing as the months go by.

Even worse than that, you might find the situation I had today where a new database had recently been released to production using an old database schema as a rough template. This database only contains data from March 2018 onward but the partition function for a certain large table was split by month from 1st Jan 2012 to 1st Jan 2016. So there were 48 monthly partitions covering the 4 years from 2012 to 2015 inclusive, but all the data was just sitting in the last partition.

What I needed to do was get rid of all the old partitions and just have a single one up to March 2018 and then create new partitions every month for the next few months.

I tried changing the partition function in SSDT and letting that sort things out, but SSDT attempts to rebuild the table and drop and recreate everything.

There is a way to do this using the ALTER PARTITION FUNCTION and ALTER PARTITION SCHEME statements and this is how I did it…

To recreate this issue I’ve created a new partition function named PF_MonthlyFinance which has monthly partitions for 2012 to 2015, and a partition scheme named PS_MonthlyFinance for this function with all partitions on the PRIMARY filegroup.
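The original creation script isn’t shown in this post, but here’s a minimal sketch of what it might look like. The RANGE RIGHT option and the DATE parameter type are my assumptions, and the boundary list is abbreviated (the real function has a boundary for the 1st of every month from Jan 2012 to Jan 2016)…

-- Sketch only: abbreviated boundary list.
-- RANGE RIGHT means each boundary date belongs to the partition on its right.
CREATE PARTITION FUNCTION PF_MonthlyFinance (DATE)
AS RANGE RIGHT FOR VALUES ('20120101', '20120201', '20120301', '20160101');
GO

-- All partitions on the PRIMARY filegroup.
CREATE PARTITION SCHEME PS_MonthlyFinance
AS PARTITION PF_MonthlyFinance ALL TO ([PRIMARY]);
GO

-- Inspect the boundary values.
SELECT PF.name, PRV.boundary_id, PRV.value
FROM sys.partition_functions PF
JOIN sys.partition_range_values PRV ON PRV.function_id = PF.function_id
WHERE PF.name = 'PF_MonthlyFinance'
ORDER BY PRV.boundary_id;

This is the first few rows of the sys.partition_range_values view…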

[Screenshot: 01]

… and this is the last few rows…

[Screenshot: 02]

First I thought I’d create the new partitions. We can do this using the SPLIT RANGE clause of ALTER PARTITION FUNCTION to add the first new boundary for March 2018.

ALTER PARTITION FUNCTION PF_MonthlyFinance () 
SPLIT RANGE ('20180301');

Now if we look at the last few rows of sys.partition_range_values we can see the new partition…

[Screenshot: 03]

So now let’s add the new one for April 2018…

ALTER PARTITION FUNCTION PF_MonthlyFinance () 
SPLIT RANGE ('20180401');

However, when we run this we get the following error…

Msg 7710, Level 16, State 1, Line 16
Warning: The partition scheme 'PS_MonthlyFinance' does not have any next used filegroup. Partition scheme has not been changed.

You might have noticed that when we created the PARTITION SCHEME the following message was returned…

Partition scheme 'PS_MonthlyFinance' has been created successfully. 'PRIMARY' is marked as the next used filegroup in partition scheme 'PS_MonthlyFinance'.

The next used filegroup was consumed when we split to create the partition for March 2018, so now we need to say which filegroup the next partition should be created on. We can do this with the ALTER PARTITION SCHEME statement as follows…

ALTER PARTITION SCHEME PS_MonthlyFinance 
NEXT USED [PRIMARY];
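As an aside (this check isn’t from the original post), we can see which filegroup each destination of the scheme maps to by querying the catalog views; when a next used filegroup is marked, the scheme has one more destination than it has boundary-defined partitions…

-- List the filegroup behind each destination of the partition scheme.
SELECT DDS.destination_id, DS.name AS filegroup_name
FROM sys.partition_schemes PS
JOIN sys.destination_data_spaces DDS ON DDS.partition_scheme_id = PS.data_space_id
JOIN sys.data_spaces DS ON DS.data_space_id = DDS.data_space_id
WHERE PS.name = 'PS_MonthlyFinance'
ORDER BY DDS.destination_id;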

We can re-run the following and now it works fine…

ALTER PARTITION FUNCTION PF_MonthlyFinance () 
SPLIT RANGE ('20180401');

[Screenshot: 04]

Rather than adding each partition manually we can script it out to create a few in one go. I’ve added this code at the bottom of this post.

Before that I want to show how we can get rid of the old redundant partitions. Again we can use ALTER PARTITION FUNCTION, but this time with MERGE RANGE instead of SPLIT RANGE as follows…

ALTER PARTITION FUNCTION PF_MonthlyFinance () 
MERGE RANGE ('20120101');

Now if we look at the first few rows of sys.partition_range_values we can see the partition for Jan 2012 is now gone…

[Screenshot: 05]

Here’s a single piece of code that can be used to merge away all these old partitions…

DECLARE @StartDate DATE = '20120101';
DECLARE @EndDate DATE = '20160101';

WHILE @StartDate < @EndDate
BEGIN

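  -- Only merge if a boundary value actually exists for this date.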
  IF EXISTS (
    SELECT 1
    FROM sys.partition_functions PF
    JOIN sys.partition_range_values PRV ON PRV.function_id = PF.function_id
    WHERE PF.name = 'PF_MonthlyFinance'
    AND PRV.value = @StartDate
  )
  BEGIN

    ALTER PARTITION FUNCTION PF_MonthlyFinance () 
    MERGE RANGE (@StartDate);

  END
 
  SELECT @StartDate = DATEADD(MONTH, 1, @StartDate);

END

We can now see only the two new partitions exist…

[Screenshot: 06]

Now we can use the following to create the next batch of partitions going forward…

DECLARE @StartDate DATE = '20180301';
DECLARE @NumParts INT = 24;
DECLARE @Counter INT = 1;

WHILE @Counter <= @NumParts
BEGIN

  -- Only split if a boundary doesn't already exist for this date.
  IF NOT EXISTS (
    SELECT 1
    FROM sys.partition_functions PF
    JOIN sys.partition_range_values PRV ON PRV.function_id = PF.function_id
    WHERE PF.name = 'PF_MonthlyFinance'
    AND PRV.value = @StartDate
  )
  BEGIN

    -- Mark the next used filegroup before splitting: SPLIT fails
    -- with Msg 7710 if no next used filegroup is marked.
    ALTER PARTITION SCHEME PS_MonthlyFinance 
    NEXT USED [PRIMARY];

    ALTER PARTITION FUNCTION PF_MonthlyFinance () 
    SPLIT RANGE (@StartDate);

  END
 
  SELECT @StartDate = DATEADD(MONTH, 1, @StartDate);
  SELECT @Counter += 1;

END

This ensures monthly partitions exist for the 24 months starting from March 2018 (boundaries that already exist are skipped) and now we can see all the new partitions exist…

[Screenshot: 07]


JSON – Part 3: OPENJSON

In previous posts I demonstrated how to use the FOR JSON clause and how some of the built in JSON scalar functions work. In this final post in this series I’m going to demonstrate how the OPENJSON function works.

There are two ways we can use the OPENJSON function. The first is in discovery mode where the schema of the JSON we are dealing with is unknown. The second is to shred data into usable sets when we know the schema.

Firstly, I’m going to demonstrate how to use it in discovery mode to work out what data is in a supplied piece of JSON.

In this example the JSON is short and formatted so it’s easy to read, and we know what’s in it just by looking at it; when that’s not the case, the OPENJSON function can be very useful for seeing what properties exist within a particular JSON object.

The following JSON has data of a few different types and we can just pass it straight into OPENJSON to return some useful details. The OPENJSON function is a table valued function and always returns a data set. For example…

DECLARE @json NVARCHAR(MAX) =
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
  "Children" : null
}';

SELECT *
FROM OPENJSON(@json);

[Screenshot: JSON3 1]

… we can see details about each key-value pair in the JSON object along with a number indicating the type.

The type value corresponds to the following:

Value of the Type column    JSON data type
0                           null
1                           string
2                           number
3                           true/false
4                           array
5                           object

That example used a single JSON object but if our JSON is an array of JSON objects then the OPENJSON function will shred them into individual rows and the key column contains the ordinal position of the JSON object in the array as follows…

DECLARE @json_array NVARCHAR(MAX) = 
'[
  {
    "CharacterID" : 12,
    "CharacterName" : "Egwene Al''Vere",
    "IsFemale" : true,
    "Address" : {
      "AddressLine1" : "Emonds Field",
      "AddressLine2" : "The Two Rivers"
    },
    "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
    "Children" : null
  },
  {
    "CharacterID" : 13,
    "ChacterName" : "Perrin Aybara",
    "IsFemale" : false,
    "Address" : {
      "AddressLine1" : "Emonds Field",
      "AddressLine2" : "The Two Rivers"
    },
    "Friends" : ["Rand Al''Thor", "Egwene Al''Vere", "Matrim Cauthon", "Nynaeve Al''Merea"],
    "Children" : null
  }
]';

SELECT *
FROM OPENJSON(@json_array);

[Screenshot: JSON3 2]

If we want to access a single element from the array we can use the optional second parameter to drill straight down to it as follows…

SELECT *
FROM OPENJSON(@json_array, '$[1]');

[Screenshot: JSON3 3]

This returns the shredded data from the second JSON object in the array (ordinal positions are zero based).
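The path can also drill further into a single element; for example (this extra query is my own, not from the original post) we can go straight to the embedded Address object of the second element…

SELECT *
FROM OPENJSON(@json_array, '$[1].Address');

This returns one key-value-type row for each of the two address lines.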

If the second parameter (the path to start shredding) is not supplied then the entire object supplied as the first parameter is used. This is the same as specifying “$” as the path. The following two queries are the same…

SELECT *
FROM OPENJSON(@json_array);

SELECT *
FROM OPENJSON(@json_array, '$');

[Screenshot: JSON3 4]

That covers a few examples of using OPENJSON in discovery mode and now I’m going to show a few examples of how to use it if you know a bit about the schema of the JSON supplied.

We’ve used OPENJSON to discover the schema of our @json_array object so now we can use this and some of the available scalar functions to shred our object into one or more tables.

In this example we’ve got an array and we want to shred that data into individual rows in a data set linked to a header data set containing the rest of the data. One way to do this is as follows…

DECLARE @json NVARCHAR(MAX) =
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
  "Children" : null
}';

SELECT
  JSON_VALUE(@json, '$.CharacterID') AS CharacterID,
  JSON_VALUE(@json, '$.CharacterName') AS CharacterName,
  JSON_VALUE(@json, '$.IsFemale') AS IsFemale,
  JSON_VALUE(@json, '$.Address.AddressLine1') AS AddressLine1,
  JSON_VALUE(@json, '$.Address.AddressLine2') AS AddressLine2,
  JSON_VALUE(@json, '$.Children') AS Children;

SELECT
  JSON_VALUE(@json, '$.CharacterID') AS CharacterID,
  OJ.value AS Friend
FROM OPENJSON(@json, '$.Friends') OJ;

[Screenshot: JSON3 5]

Here’s another example where the array is an array of JSON objects rather than individual values…

DECLARE @json_friend NVARCHAR(MAX) = 
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : [
    {
      "CharacterName" : "Rand Al''Thor",
      "CharacterID" : 7,
      "IsFemale" : false
    },
    {
      "CharacterName" : "Perrin Aybara",
      "CharacterID" : 13,
      "IsFemale" : false
    }, 
    {
      "CharacterName" : "Matrim Cauthon",
      "CharacterID" : 16,
      "IsFemale" : false
    }
  ],
  "Children" : null
}';

SELECT
  JSON_VALUE(@json_friend, '$.CharacterID') AS CharacterID,
  JSON_VALUE(OJ.Value, '$.CharacterName') AS FriendCharacterName,
  JSON_VALUE(OJ.Value, '$.CharacterID') AS FriendCharacterID,
  JSON_VALUE(OJ.Value, '$.IsFemale') AS FriendIsFemale
FROM OPENJSON(@json_friend, '$.Friends') OJ;

[Screenshot: JSON3 6]

As you can see we can use JSON_VALUE to pull out the individual values from the JSON objects in the Friends key.
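As a sketch of an alternative (using the WITH clause that’s introduced properly below), we could let OPENJSON shred each Friends object straight into typed columns instead of calling JSON_VALUE per column…

SELECT
  JSON_VALUE(@json_friend, '$.CharacterID') AS CharacterID,
  F.CharacterName AS FriendCharacterName,
  F.CharacterID AS FriendCharacterID,
  F.IsFemale AS FriendIsFemale
FROM OPENJSON(@json_friend, '$.Friends')
WITH
(
  CharacterName VARCHAR(200) '$.CharacterName',
  CharacterID INT '$.CharacterID',
  IsFemale BIT '$.IsFemale'
) AS F;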

There is another way we can shred the JSON using OPENJSON and this is by specifying a schema for the data set returned. For example…

DECLARE @json_known NVARCHAR(MAX) =
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
  "Children" : null
}';

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName VARCHAR(200) '$.CharacterName',
  IsFemale BIT '$.IsFemale',
  Address NVARCHAR(MAX) '$.Address' AS JSON,
  Friends NVARCHAR(MAX) '$.Friends' AS JSON,
  Children VARCHAR(500) '$.Children'
);

[Screenshot: JSON3 7]

We have to specify the data type for each of the values we want to return in the data set. If the value we are returning is a JSON object or an array then we need to include the AS JSON clause at the end of the column declaration. When we use AS JSON the data type must be NVARCHAR(MAX). If we try to use anything else we get the following error…

AS JSON option can be specified only for column of nvarchar(max) type in WITH clause.

We don’t need to include every column in the data set so the following is fine…

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName VARCHAR(200) '$.CharacterName'
);

[Screenshot: JSON3 8]

We have to make sure the data types are correct. For example if we try to return a string value into an INT data type we get the following error as expected…

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName INT '$.CharacterName'
);

Conversion failed when converting the nvarchar value 'Egwene Al'Vere' to data type int.

JSON is semi-structured so not every property in the table declaration needs to exist. For example, there is no CharacterAge property in the JSON but the following will still run correctly with NULL returned as the CharacterAge…

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterAge INT '$.CharacterAge'
);

[Screenshot: JSON3 9]

If the JSON object contains embedded JSON or an array we are able to access values inside these by navigating down through the embedded objects or by using the ordinal position of the values in the arrays. We can pull all the values out of the @json_known object as follows…

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName VARCHAR(200) '$.CharacterName',
  IsFemale BIT '$.IsFemale',
  AddressLine1 VARCHAR(500) '$.Address.AddressLine1',
  AddressLine2 VARCHAR(500) '$.Address.AddressLine2',
  AddressLine3 VARCHAR(500) '$.Address.AddressLine3',
  Friends0 VARCHAR(100) '$.Friends[0]',
  Friends1 VARCHAR(100) '$.Friends[1]',
  Friends2 VARCHAR(100) '$.Friends[2]',
  Friends3 VARCHAR(100) '$.Friends[3]',
  Children VARCHAR(500) '$.Children'
);

[Screenshot: JSON3 10]

Lax and Strict

The last thing I want to demonstrate is a couple of additional options we can use in the path parameter of the OPENJSON function. These options are lax and strict.

Strict means that the path specified in the second parameter of the OPENJSON function must exist and lax means that it doesn’t have to. Lax is the default if nothing is supplied so all the examples above use the lax option. Here are some examples using these options…

DECLARE @json VARCHAR(500) = '{"Hello": {"Name" : "Lax Strict Test"} }';

SELECT *
FROM OPENJSON(@json, '$.Goodbye');

SELECT *
FROM OPENJSON(@json, 'lax $.Goodbye');

SELECT *
FROM OPENJSON(@json, 'strict $.Hello');

SELECT *
FROM OPENJSON(@json, 'strict $.Goodbye');

[Screenshot: JSON3 11]

Property cannot be found on the specified JSON path.

As we can see, the first two queries return exactly the same result, which shows that lax is the default option. The third query (the first using the strict option) returns the data correctly because the Hello key exists. The fourth and final query returns an error because the Goodbye key does not exist and the strict option is used.
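As a quick aside of my own, the lax and strict prefixes can also be used in the path parameter of JSON_VALUE and JSON_QUERY. Using the same @json variable…

SELECT JSON_VALUE(@json, 'lax $.Goodbye.Name') AS LaxValue;     -- returns NULL
SELECT JSON_VALUE(@json, 'strict $.Hello.Name') AS StrictValue; -- returns 'Lax Strict Test'

With strict, a path that doesn’t exist raises an error instead of returning NULL.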

JSON – Part 2: Built in Functions

In a previous post I demonstrated how to use the FOR JSON clause and in this post I’m going to show some of the built-in JSON functions. In a future post I’ll show how the OPENJSON function works.

JSON in SQL Server 2016

Built-in support for JSON was introduced in SQL Server 2016. There is no native JSON data type in the same way that there is an XML data type, but there are built-in functions that allow us to manipulate JSON and convert data sets into JSON. As mentioned above, in a previous post I covered converting data sets into JSON using the FOR JSON clause, so let’s now look at a few demos extracting values out of, and storing, JSON data.

Validating JSON

When we use JSON in SQL Server we would usually have it in a text string.

Here’s a simple example…

DECLARE @json AS NVARCHAR(MAX) = N'{
   "Customer" : {
       "Id" : 100,
       "Name" : "Lionel Teddington",
       "Address" : {
           "AddrType" : "Home",
           "AddrLine1" : "1 The Road",
           "AddrLine2" : "My Village",
           "Postcode" : "AA1 2BB"
       }
    }
}';

So now we’ve got a JSON object stored in the variable @json.

One of the built-in functions, ISJSON, returns 1 or 0 indicating whether or not the JSON is correctly formatted.

SELECT ISJSON(@json) AS IsValidJson;

The 1 value returned by this function tells us that the JSON is valid.

If we try the following…

DECLARE @json AS NVARCHAR(MAX) = N'{
    "Customer" : {
        "Id" : 100,
        "Name" : "Lionel Teddington",
        "Address" : {
            "AddrType" : "Home",
            "AddrLine1" : "1 The Road",
            "AddrLine2" : "My Village,
            "Postcode" : "AA1 2BB"
        }
    }
}';
 
SELECT ISJSON(@json) AS IsValidJson;

… we can see the My Village value is missing the closing double quote which makes the JSON invalid, hence the value of 0 being returned.

Extracting values from JSON

There are two built-in functions that we can use to extract values from a JSON object. The first is JSON_VALUE which can be used to extract a scalar value for a particular key in the JSON object. The second is JSON_QUERY which returns a JSON object or array (as a text string) that is embedded within the JSON.

If we use the valid JSON object from above we can extract the customer name as follows…

SELECT JSON_VALUE(@json, '$.Customer.Name') AS CustomerName;

As we can see the JSON_VALUE function accepts two parameters. The first is the JSON object itself and the second is the path to the key corresponding to the value we want to extract. The $ (dollar sign) is used to represent the JSON object specified in the first parameter and from this point we can navigate down through the key-value pairs to get the required value.
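For example (an extra query of mine, not from the original post), we can go a level further down into the embedded Address object…

SELECT JSON_VALUE(@json, '$.Customer.Address.AddrLine1') AS AddrLine1;

This returns ‘1 The Road’.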

As mentioned above, the JSON is just a text string and the JSON_VALUE function doesn’t perform any validation. The same query will work if we use the invalid JSON object from the example above…

DECLARE @json AS NVARCHAR(MAX) = N'{
    "Customer" : {
        "Id" : 100,
        "Name" : "Lionel Teddington",
        "Address" : {
            "AddrType" : "Home",
            "AddrLine1" : "1 The Road",
            "AddrLine2" : "My Village,
            "Postcode" : "AA1 2BB"
        }
    }
}';

SELECT ISJSON(@json) AS IsValidJson;

-- if the JSON is invalid you can still query it
SELECT JSON_VALUE(@json, '$.Customer.Name') AS CustomerName;

… but we can only extract values up to the point where the JSON object becomes invalid. In this example the JSON becomes invalid after the My Village value as there is a missing double quote. So if we try to extract the value for the Postcode which appears after the My Village value we see the following error…

SELECT ISJSON(@json) AS IsValidJson;

-- if the JSON is invalid you can still query it
SELECT JSON_VALUE(@json, '$.Customer.Name') AS CustomerName;
-- but only up to the point where the JSON becomes invalid
SELECT JSON_VALUE(@json, '$.Customer.Address.Postcode') AS Postcode;

(1 row affected)

(1 row affected)
Msg 13609, Level 16, State 2, Line 41
JSON text is not properly formatted. Unexpected character '"' is found at position 160.

The JSON path is treated as case sensitive, so the following returns NULL on the valid JSON example because the path uses a lower case n for the Name key…

SELECT JSON_VALUE(@json, '$.Customer.name') AS CustomerName;

If we look at the valid JSON example we can see that the Address key has a JSON object as its value. We can’t return this with the JSON_VALUE function (it would just return NULL), but we can use JSON_QUERY as follows…

SELECT JSON_QUERY(@json, '$.Customer.Address') AS AddressJson;

Like JSON_VALUE this function accepts a JSON object as the first parameter and then a path to the embedded JSON as the second parameter.

Modifying JSON objects

The fourth and final function I’m going to demonstrate is the JSON_MODIFY function which allows us to modify values for certain keys inside the supplied JSON.

For example, we can run the following to change the CustomerID value from 100 to 101…

SELECT JSON_MODIFY(@json, '$.Customer.Id', '101');

[Screenshot: JSON2 8]

As you can see the JSON_MODIFY function returns the modified JSON object. It’s worth noting that this function doesn’t modify the value in the existing JSON object, it just returns the modified JSON.

We can run the following to show this is the case…

SELECT JSON_MODIFY(@json, '$.Customer.Id', '101');
-- we don't actually see the data changed in the variable. modify just returns the amended JSON
SELECT JSON_VALUE(@json, '$.Customer.Id') AS NewCustomerId;

[Screenshot: JSON2 9]

… and we can see the CustomerID in @json is still 100.

What we can do is declare another variable and store the results of the JSON_MODIFY function there and use that variable in any further processing…

DECLARE @modifiedJson NVARCHAR(MAX);
SELECT @modifiedJson = JSON_MODIFY(@json, '$.Customer.Id', '101');
SELECT JSON_VALUE(@modifiedJson, '$.Customer.Id') AS NewCustomerId;

[Screenshot: JSON2 10]

Combining JSON_VALUE and JSON_QUERY

Now let’s have a look at a more complicated piece of JSON. In this example the Customer key has an array of JSON objects as its value. If we try to pull out the Customer Name value using the JSON_VALUE function we just get a NULL returned…

DECLARE @json AS NVARCHAR(MAX) = N'{
  "Customer" : [
    {
      "Id" : 100,
      "Name" : "Lionel Teddington",
      "Address" : {
        "AddrType" : "Home", 
        "AddrLine1" : "1 The Road",
        "AddrLine2" : "My Village",
        "Postcode" : "AA1 2BB"
      }
    },
    {
      "Id" : 101,
      "Name" : "Carolina Sweetcorn",
      "Address" : {
        "AddrType" : "Home", 
        "AddrLine1" : "1 The Road",
        "AddrLine2" : "My Village",
        "Postcode" : "AA1 2BB"
      }
    }
  ],
  "Shop" : {
    "ShopName" : "The best shop in the world",
    "ShopCode" : "0385920",
    "ShopAddress" : {
      "ShopPostCode" : "AA1 2BC"
    }
  }
}';

SELECT ISJSON(@json) AS IsValidJson;

SELECT JSON_VALUE(@json, '$.Customer.Name') AS CustomerName;

[Screenshot: JSON2 11]

What we need to do is get the array of JSON objects using JSON_QUERY and then use JSON_VALUE to extract the required values from there…

-- must use JSON_QUERY for arrays
SELECT JSON_QUERY(@json, '$.Customer') AS CustomerJSON;

SELECT JSON_VALUE(
  JSON_QUERY(@json, '$.Customer'),
  '$.Name'
) AS CustomerName;

[Screenshot: JSON2 12]

The first SELECT statement shows the array being returned by the JSON_QUERY function but the second SELECT statement is still returning NULL.

This is because with arrays we need to specify which ordinal in the array we want to return the value for.

We can return the name for both customers in our array of JSON objects as follows…

SELECT
  JSON_VALUE(
    JSON_QUERY(@json, '$.Customer'),
    '$[0].Name'
  ) AS FirstCustomerName,
  JSON_VALUE(
    JSON_QUERY(@json, '$.Customer'),
    '$[1].Name'
  ) AS SecondCustomerName;

[Screenshot: JSON2 13]

Please note that the first ordinal is 0 and not 1.

We have two JSON objects in our example but we still need to include the ordinal even if there is only one.

Storing Valid JSON

As previously mentioned there is no JSON data type but we may want to have some validation on any JSON objects that we store in our database.

Let’s create a new table that has just one column that we want to store some JSON in…

CREATE TABLE dbo.JSONDemo
(
  JsonDoc NVARCHAR(MAX)
);

… and now let’s insert some invalid JSON – note the missing double quote after the World value…

INSERT INTO dbo.JSONDemo
(
  JsonDoc
)
VALUES
(N'{ "Hello" : "World}');

SELECT *
FROM dbo.JSONDemo JD

[Screenshot: JSON2 14]

This insert works without any errors, and we can see the missing double quote in the results.

Now let’s clear out the table and add a check constraint using the ISJSON function…

TRUNCATE TABLE dbo.JSONDemo;

ALTER TABLE dbo.JSONDemo ADD CONSTRAINT CK_ValidJsonOnly CHECK (ISJSON(JsonDoc) = 1);

This means that now only valid JSON can be added to our table. If we try to insert the invalid JSON we get the following…

INSERT INTO dbo.JSONDemo
(
  JsonDoc
)
VALUES
(N'{ "Hello" : "World}');

Msg 547, Level 16, State 0, Line 146
The INSERT statement conflicted with the CHECK constraint "CK_ValidJsonOnly". The conflict occurred in database "SimonBlog", table "dbo.JSONDemo", column 'JsonDoc'.
The statement has been terminated.

However, if we correct the JSON and add the missing closing double quote we can insert successfully…

INSERT INTO dbo.JSONDemo
(
  JsonDoc
)
VALUES
(N'{ "Hello" : "World"}');

SELECT *
FROM dbo.JSONDemo JD

[Screenshot: JSON2 15]

The insert was successful and all the double quotes are now there.

Exam prep for 70-761

It’s been a little while since I took an exam so I thought I’d have a go at 70-761.

It covers a lot of the same material that was covered in 70-461 which was based on SQL Server 2012. 70-761 is based on SQL Server 2016 so I’m going to write about a few of the newer features in 2016 as part of my exam prep.

There is a good video on Pluralsight named “SQL Server 2016 New Features for Developers” by Leonard Lobel that covers the new features well.

Below are some details on some of the new features.

I’ve also been working through the 70-761 guidebook by Itzik Ben-Gan, and there are a few bits I’m less familiar with that I’m going to write about to make sure I fully understand them.

GROUPING SETS (and ROLLUP & CUBE)

As part of my preparation for the 70-761 exam I’ve been trying to get a full understanding of some of the T-SQL I’m less familiar with. One example of this is the GROUPING SETS clause. There are also ROLLUP and CUBE clauses, but the WITH ROLLUP and WITH CUBE syntax shown later is deprecated, and GROUPING SETS can do everything ROLLUP and CUBE can do but with much more control.

The best way to explain how GROUPING SETS works is to show some example code. I’ll show how it compares to ROLLUP and CUBE in case you need to update any code to convert it to using GROUPING SETS.

So let’s create some test data. I’m just creating a simple table containing a few purchases made by a few customers…

DROP TABLE IF EXISTS dbo.GroupingSetsDemo;
GO

CREATE TABLE dbo.GroupingSetsDemo
(
  Id INT NOT NULL IDENTITY(1,1),
  CustomerID INT NOT NULL,
  PurchaseDate DATETIME2(0) NOT NULL,
  PurchaseItem VARCHAR(100) NOT NULL,
  Amount DECIMAL(10,2) NOT NULL
);
GO

INSERT INTO dbo.GroupingSetsDemo
(
  CustomerID,
  PurchaseDate,
  PurchaseItem,
  Amount
)
VALUES
  (1, '20171106 12:34', 'Rugby Ball', 14.99),
  (1, '20180307 09:54', 'Frying Pan', 21.50),
  (1, '20180307 10:02', 'Vacuum Cleaner', 259.990),
  (2, '20171106 15:26', 'Orchid', 13.62),
  (2, '20180228 18:47', 'Book', 7.99),
  (2, '20180301 08:02', 'Snakes and Ladders', 8.00),
  (2, '20180306 14:32', 'Pillows', 20.00),
  (2, '20180306 14:32', 'Washing Machine', 359.99),
  (2, '20180307 23:15', 'Jeans', 59.99),
  (3, '20180301 09:27', 'Rucksack', 13.99),
  (3, '20180301 09:27', 'Scarf', 12.50);
GO

SELECT *
FROM dbo.GroupingSetsDemo;
GO

[Screenshot: GS1]

If we want to see the total value of the purchases for each customer we can just group by the CustomerID and sum up the Amount values as follows…

SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY CustomerID

[Screenshot: GS2]

As well as grouping by columns we can group by what are called GROUPING SETS. This example just shows how we can use GROUPING SETS to mimic the previous query to get total sales by customer…

SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY
  GROUPING SETS ((CustomerID));

[Screenshot: GS2]

So in this case we’ve just said to group by a single grouping set for CustomerID.

The really useful stuff starts to happen when we have more than one grouping set. One useful example is where you want to create a total value in the same data set. To do this we just need to add an additional grouping set telling SQL Server to also group over all columns to give us the total amount for all customers…

SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS ((CustomerID), ());

[Screenshot: GS3]

Please note that you still need to make sure all non-aggregated columns in the SELECT clause appear in at least one grouping set. So we can’t just include the empty (grand total) grouping set from the above query on its own. If we try, we get the following error as expected…

SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS (());

Msg 8120, Level 16, State 1, Line 56
Column 'dbo.GroupingSetsDemo.CustomerID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

More interesting things start to happen when we want to group over multiple columns. Let’s bring the purchase year into our query as follows…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS ((CustomerID, YEAR(WFD.PurchaseDate)));

[Screenshot: GS4]

We can now add more grouping sets to this query, for example, to give total amount for each customer…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS (
    (CustomerID, YEAR(WFD.PurchaseDate)),
    (CustomerID)
 );

[Screenshot: GS5]

… or for each year…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS (
    (CustomerID, YEAR(WFD.PurchaseDate)),
    (YEAR(WFD.PurchaseDate))
 );

[Screenshot: GS6]

We can also combine everything to give us all possible total values…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS (
    (CustomerID, YEAR(WFD.PurchaseDate)),
    (CustomerID),
    (YEAR(WFD.PurchaseDate)),
    ()
 );

[Screenshot: GS7]

As we can see in the previous example, whenever we add a grouping set and a total row appears, one or more of the values returned can be NULL. However, what if one of the values being grouped on is itself NULL? How do we differentiate between an original NULL and a NULL created as part of the grouping set?

In the following example one of the PurchaseItem values is NULL…

DROP TABLE IF EXISTS dbo.GroupingSetsDemoNULL;
GO

CREATE TABLE dbo.GroupingSetsDemoNULL
(
  Id INT NOT NULL IDENTITY(1,1),
  CustomerID INT NOT NULL,
  PurchaseItem VARCHAR(100) NULL,
  Amount DECIMAL(10,2) NOT NULL
);
GO

INSERT INTO dbo.GroupingSetsDemoNULL
(
  CustomerID,
  PurchaseItem,
  Amount
)
VALUES
  (1, 'Rugby Ball', 10),
  (1, 'Frying Pan', 20),
  (1, NULL, 30);
GO

SELECT *
FROM dbo.GroupingSetsDemoNULL;

[Screenshot: GS11]

Let’s say we want to give a total per customer and per item, and a grand total.  We need to create three grouping sets as follows…

SELECT CustomerID, PurchaseItem, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemoNULL WFD
GROUP BY 
  GROUPING SETS (
    (CustomerID),
    (PurchaseItem),
    ()
  );

[Screenshot: GS12]

As we can see, we’ve got two rows where both CustomerID and PurchaseItem are NULL. How do we know which of these is the aggregation of the NULL PurchaseItem across all customers and which is the grand total? In this example, looking at the amounts we can see that row 4 with the value of 60 is the grand total, but we can also use the GROUPING_ID function to show this explicitly.

The GROUPING_ID function is a scalar function that returns an INT (not a BIT). If we pass in a single column it returns a 1 or 0 flag indicating whether or not each row is aggregated over that column.

In this example we’ve got two columns we are aggregating over so let’s use the GROUPING_ID function on both of these as follows…

SELECT CustomerID, PurchaseItem, SUM(Amount) AS TotalAmount,
     GROUPING_ID(CustomerID) AS CustomerIDGrouped, GROUPING_ID(PurchaseItem) AS PurchaseItemGrouped
FROM dbo.GroupingSetsDemoNULL WFD
GROUP BY 
  GROUPING SETS (
    (CustomerID),
    (PurchaseItem),
    ()
  );

[Screenshot: GS13]

As we can see, row 1 has CustomerIDGrouped = 1 and PurchaseItemGrouped = 0, so it is the aggregation of the NULL PurchaseItem across all customers, while row 4 has both flags set to 1, so it is the grand total aggregated across all customers and items.
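GROUPING_ID can also be passed multiple columns in a single call (this variation is my own addition, not from the original post), in which case it builds an integer bitmask from the individual flags, with the first column as the most significant bit…

SELECT CustomerID, PurchaseItem, SUM(Amount) AS TotalAmount,
    GROUPING_ID(CustomerID, PurchaseItem) AS GroupingLevel -- 1 = (CustomerID) set, 2 = (PurchaseItem) set, 3 = grand total
FROM dbo.GroupingSetsDemoNULL WFD
GROUP BY
  GROUPING SETS (
    (CustomerID),
    (PurchaseItem),
    ()
  );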

As mentioned above, the only requirement is that each non-aggregated column appears in at least one grouping set so we can return just the total for each customer and for each year as follows…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS (
    (CustomerID),
    (YEAR(WFD.PurchaseDate))
 );

[Screenshot: GS8]

So what about ROLLUP and CUBE (but please remember these are deprecated)?

CUBE just means that the query is grouped by every possible grouping set, i.e. every combination of the non-aggregated columns is used as a grouping set. The syntax for using CUBE is as follows…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY YEAR(WFD.PurchaseDate), WFD.CustomerID
WITH CUBE
ORDER BY WFD.CustomerID, PurchaseYear

[Screenshot: GS]

This is the same as using the following grouping sets…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY
 GROUPING SETS (
    (YEAR(WFD.PurchaseDate), WFD.CustomerID),
    (WFD.CustomerID),
    (YEAR(WFD.PurchaseDate)),
    ()
 )
ORDER BY WFD.CustomerID, PurchaseYear

ROLLUP is slightly different and it depends on the order of the columns in the GROUP BY clause. It works by creating a grouping set across all columns in the GROUP BY clause, then creating additional sets across all columns except the last and each time dropping the last column off the end until it’s just left with the empty group. So if we’re grouping over three columns it will create a set for (1,2,3) then for (1,2) then for (1) and finally for (). Here’s an example…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY WFD.CustomerID, YEAR(WFD.PurchaseDate)
WITH ROLLUP

[Screenshot: GS9]

So this is grouping over both customer and year, then by just customer and finally over all columns. This is the same as creating the following grouping sets…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY
 GROUPING SETS (
    (CustomerID, YEAR(WFD.PurchaseDate)),
    (CustomerID),
    ()
 )

As mentioned, the order of the columns in the GROUP BY clause makes a difference and switching them from WFD.CustomerID, YEAR(WFD.PurchaseDate) to YEAR(WFD.PurchaseDate), WFD.CustomerID gives different results…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY YEAR(WFD.PurchaseDate), WFD.CustomerID
WITH ROLLUP

[Screenshot: GS10]

This time we’re grouping by year and customer, then by year, and finally by all columns.

70-761 Exam Prep: Miscellaneous T-SQL – Part 1

I’m currently preparing to take the exam 70-761 – Querying Data with Transact-SQL and have been going through various resources including Itzik Ben-Gan’s exam guidebook, which is certainly worth reading if you’re planning on taking this exam.

There are quite a few smaller bits of T-SQL functionality that I’m not that familiar with that I’m going to demonstrate in this and further posts. It’s a bit of a random selection but hopefully it will help me remember them and might be of use to others as well.

Contents

  1. ALL ANY SOME with subqueries
  2. TOP (x) WITH TIES
  3. Truncate partition
  4. TRY_CAST, TRY_CONVERT, TRY_PARSE

1. ALL ANY SOME with subqueries

This is something that I’ve never seen anyone use anywhere but I’ve had questions come up on this in two different exams!

We use the ALL, ANY and SOME keywords when comparing data against rows returned by a subquery. For example, if we have two tables, the first containing numbers 1 to 5 and the second containing 3 to 5, we can use ALL to return all rows from the first table that are less than ALL the values in the second table. The only values that are less than all the values in the second table are 1 and 2. We can also use ANY or SOME (they both mean the same thing) to return all numbers less than any of the numbers in the second table: 1 and 2 are less than all of them so they get returned, 3 is less than 4 and 5 so it gets returned, and 4 is less than 5 so that gets returned.

Here’s the code, so it should make more sense looking at this…

DROP TABLE IF EXISTS dbo.AllAnySome;
GO

CREATE TABLE dbo.AllAnySome (SomeNumber INT);
GO

INSERT INTO dbo.AllAnySome (SomeNumber)
VALUES (1), (2), (3), (4), (5);
GO

DROP TABLE IF EXISTS dbo.AllAnySomeLookup;
GO

CREATE TABLE dbo.AllAnySomeLookup (SomeNumber INT);
GO

INSERT INTO dbo.AllAnySomeLookup (SomeNumber)
VALUES (3), (4), (5);
GO

SELECT *
FROM dbo.AllAnySome
WHERE SomeNumber < ALL (SELECT * FROM dbo.AllAnySomeLookup);
GO

[Screenshot: AAS1]

SELECT *
FROM dbo.AllAnySome
WHERE SomeNumber < SOME (SELECT * FROM dbo.AllAnySomeLookup);
GO

[Screenshot: AAS2]

SELECT *
FROM dbo.AllAnySome
WHERE SomeNumber < ANY (SELECT * FROM dbo.AllAnySomeLookup);
GO

[Screenshot: AAS2]

The query above using ALL returns all the rows from dbo.AllAnySome that are less than the minimum value in dbo.AllAnySomeLookup, so it’s equivalent to…

SELECT *
FROM dbo.AllAnySome
WHERE SomeNumber < (SELECT MIN(SomeNumber) FROM dbo.AllAnySomeLookup);
GO
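Similarly, the query using SOME (or ANY) returns every row that is less than the maximum value in the lookup table, so it’s equivalent to…

SELECT *
FROM dbo.AllAnySome
WHERE SomeNumber < (SELECT MAX(SomeNumber) FROM dbo.AllAnySomeLookup);
GO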

We can also use the equality operator with these keywords. Using = ANY (or = SOME) is the same as using the IN keyword…

SELECT *
FROM dbo.AllAnySome
WHERE SomeNumber = ANY (SELECT * FROM dbo.AllAnySomeLookup);
GO

[Screenshot: AAS3]

SELECT *
FROM dbo.AllAnySome
WHERE SomeNumber IN (SELECT * FROM dbo.AllAnySomeLookup);
GO

[Screenshot: AAS3]

Note that = ALL behaves differently: it’s only true when the value equals every single value returned by the subquery, so with our lookup table containing 3, 4 and 5 it returns no rows.

2. TOP (x) WITH TIES

You can specify TOP (x) immediately after a SELECT clause to restrict the returned result set to only x rows.  Here’s an example where we’ve got a table with 12 rows and we’re just asking for the top 10 rows…

DROP TABLE IF EXISTS dbo.TiesTest;

CREATE TABLE dbo.TiesTest (Id INT);

INSERT INTO dbo.TiesTest (Id)
VALUES (1), (1), (1),
 (2), (2), (2),
 (3), (3), (3),
 (4), (4), (4);

SELECT TOP (10) *
FROM dbo.TiesTest
ORDER BY Id;

[Screenshot: TopTies1]

As we can see this query has returned all the rows with values 1, 2 and 3 but only one of the three rows with value 4.  If we wanted the TOP 10 rows but also wanted to include any other rows outside of the top 10 with the same values we can use the WITH TIES clause, meaning any subsequent rows where the value ties with the last value will be included.

SELECT TOP (10) WITH TIES *
FROM dbo.TiesTest
ORDER BY Id

[Screenshot: TopTies2]

As we can see we’ve now got 12 rows returned instead of the 10 as the values in rows 11 and 12 tie with the value in row 10.

One last thing to note is we must use ORDER BY if we’re using WITH TIES.
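If we leave the ORDER BY out we get an error (a quick check of my own)…

-- Fails: WITH TIES is not allowed without a corresponding ORDER BY clause
SELECT TOP (10) WITH TIES *
FROM dbo.TiesTest;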

3. TRUNCATE PARTITION

In SQL Server 2016 we now have the ability to truncate a single partition.  Prior to this we could only truncate the entire table.  Here’s an example of how to truncate a single partition…

CREATE PARTITION FUNCTION PF1 (DATETIME2(0))
AS RANGE RIGHT FOR VALUES ('20171001', '20171101', '20171201', '20180101', 
 '20180201', '20180301', '20180401');
GO

CREATE PARTITION SCHEME PS1
AS PARTITION PF1 ALL TO ([PRIMARY]);
GO

CREATE TABLE dbo.PartitionedTable
(
 Id INT IDENTITY(1,1) NOT NULL,
 FirstName VARCHAR(100) NOT NULL,
 LastName VARCHAR(100) NOT NULL,
 DateOfBirth DATETIME2(0) NOT NULL,
 Address VARCHAR(400) NOT NULL,
 RegisteredDate DATETIME2(0) NOT NULL
) ON PS1(RegisteredDate);

INSERT INTO dbo.PartitionedTable (FirstName, LastName, DateOfBirth, Address, RegisteredDate)
VALUES
('Angela', 'Bennett', '1985-06-03', '1 The Road, This Town, AA1 2BB', '20171015'),
('Colin', 'Denton', '1952-12-05', '12 The Road, This Town, AA1 2BB', '20171125'),
('Eleanor', 'Francis', '1988-07-23', '9 The Road, This Town, AA1 2BB', '20171214'),
('Gerald', 'Harris', '1985-07-25', '102 The Road, This Town, AA1 2BB', '20180109'),
('Iris', 'Jones', '1947-10-18', '93 The Road, This Town, AA1 2BB', '20180206'),
('Kirk', 'Lennox', '2001-04-19', '2 The Road, This Town, AA1 2BB', '20180328');


SELECT *
FROM dbo.PartitionedTable;

[Screenshot: TruncPart1]

So now we’ve got a partitioned table with a single row on each of several partitions.
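As a side note (not in the original post), the $PARTITION function lets us check which partition number each row maps to…

SELECT Id, FirstName, LastName, RegisteredDate,
    $PARTITION.PF1(RegisteredDate) AS PartitionNumber
FROM dbo.PartitionedTable;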

SELECT *
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.PartitionedTable');

[Screenshot: TruncPart2]

As we can see there is a row on partitions 2 to 7.

We can now truncate partitions 2, 3 and 4, which contain the rows for Angela Bennett, Colin Denton and Eleanor Francis, as follows…

TRUNCATE TABLE dbo.PartitionedTable WITH (PARTITIONS (2 TO 4));

We can now see that the data has been removed from the table and the partitions now have no rows…

SELECT *
FROM dbo.PartitionedTable;

[Screenshot: TruncPart3]

SELECT *
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.PartitionedTable')

[Screenshot: TruncPart4]

4. TRY_CAST, TRY_CONVERT, TRY_PARSE

Let’s say, for example, that we have a fairly simple process that loads data from one table to another.  We get a feed of the source data into a table from a third party and it contains date values stored as text strings.  Our target table needs to store the date as a DATE data type rather than a string so we explicitly convert it into a DATE when loading into the target.  We’re just using an insert statement to load the data but the data from the supplier is sometimes invalid and causes the whole batch to fail.  We want a way to allow the valid data to go through (and possibly store the failed data somewhere else).  Let’s set up some test data to demo this…

CREATE TABLE T_Source
(
 ItsADate VARCHAR(20)
);

INSERT INTO T_Source (ItsADate)
VALUES 
('20180203'),
('20180204'),
('ThisIsntADate'),
('20180205');

SELECT *
FROM T_Source;

[Screenshot: misc1]

As we can see the third row does not contain a valid date.

Let’s now create our target table and try and load the source data in…

CREATE TABLE T_Target
(
 DefinitelyADate DATE
);

INSERT INTO dbo.T_Target (DefinitelyADate)
SELECT CAST(ItsADate AS DATE)
FROM dbo.T_Source;

Msg 241, Level 16, State 1, Line 32
Conversion failed when converting date and/or time from character string.

So because of the one invalid row, the other valid rows do not get loaded.  Instead of using CAST we can use TRY_CAST, which returns NULL if the value can’t be cast rather than raising an error.

SELECT TRY_CAST(ItsADate AS DATE)
FROM dbo.T_Source

[Screenshot: misc2]

We probably wouldn’t want to load NULL in but we could do some filtering or maybe just use ISDATE() = 1.
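For example, here’s a sketch of a filtered load that only takes the rows that cast successfully (the failing rows could be routed to an error table with the opposite predicate)…

INSERT INTO dbo.T_Target (DefinitelyADate)
SELECT TRY_CAST(ItsADate AS DATE)
FROM dbo.T_Source
WHERE TRY_CAST(ItsADate AS DATE) IS NOT NULL;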

CONVERT and PARSE have “TRY” equivalents…

SELECT TRY_CONVERT(DATE, 'simon')
SELECT TRY_PARSE('99 March 2018' AS DATE USING 'en-gb')

These two queries just return NULL without throwing an error.


WINDOW FUNCTIONS – PART 3: Offset Functions

As part of my preparation for the 70-761 exam I’ve been trying to get a full understanding of the different types of window functions which are as follows:

  1. Ranking
  2. Aggregate
  3. Offset

I know the ranking functions pretty well but haven’t used the aggregate and offset functions at all really.

This third and final post demonstrates the offset functions.

I’ve been going through the official exam guidebook by Itzik Ben-Gan and he explains how the window functions work very well and the whole book is certainly worth reading if you use T-SQL at all.

With every window function you use the OVER clause. This is the part that specifies the “window” you are ranking or aggregating over.

The Offset Functions

For the demonstrations I’m going to use the same data set I used in the ranking function demo and create a table with 11 rows that contain purchases for 3 customers as follows…

CREATE TABLE dbo.WindowFunctionDemo
(
   Id INT NOT NULL IDENTITY(1,1),
   CustomerID INT NOT NULL,
   PurchaseDate DATETIME2(0) NOT NULL,
   PurchaseItem VARCHAR(100) NOT NULL,
   Amount DECIMAL(10,2) NOT NULL
);
GO

INSERT INTO dbo.WindowFunctionDemo
(
   CustomerID,
   PurchaseDate,
   PurchaseItem,
   Amount
)
VALUES
 (1, '20180306 12:34', 'Rugby Ball', 14.99),
 (1, '20180307 09:54', 'Frying Pan', 21.50),
 (1, '20180307 10:02', 'Vacuum Cleaner', 259.990),
 (2, '20180225 15:26', 'Orchid', 13.62),
 (2, '20180228 18:47', 'Book', 7.99),
 (2, '20180301 08:02', 'Snakes and Ladders', 8.00),
 (2, '20180306 14:32', 'Pillows', 20.00),
 (2, '20180306 14:32', 'Washing Machine', 359.99),
 (2, '20180307 23:15', 'Jeans', 59.99),
 (3, '20180301 09:27', 'Rucksack', 13.99),
 (3, '20180301 09:27', 'Scarf', 12.50);
GO

SELECT *
FROM dbo.WindowFunctionDemo;
GO

[Screenshot: Windows Ranking 1]

The offset window functions allow you to return data from other rows based on their distance away from the current row.  For example, we might want to return all of the purchases made today but in the same row return the value of the previous purchase that the customer made.

The LAG and LEAD window functions allow you to do this: LAG allows you to access previous rows and LEAD allows you to access following rows.

The data set must be ordered so the functions can find previous and following rows correctly.

The following example returns all the data in the table along with the amount of the previous and next purchases each customer has made…

SELECT *,
 LAG(Amount) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS PreviousAmount,
 LEAD(Amount) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS NextAmount
FROM dbo.WindowFunctionDemo;
GO

[Screenshot: LAG LEAD 1]

You can pass a second parameter into the LAG and LEAD functions to access other rows and not just the ones immediately either side of the current row.  This example returns the Amount value from the two previous orders…

SELECT *,
   LAG(Amount) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS PreviousAmount,
   LAG(Amount, 2) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS SecondPreviousAmount
FROM dbo.WindowFunctionDemo;
GO

[Screenshot: LAG LEAD 2]

If not supplied, a default value of 1 is used for this parameter, so

LAG(Amount)

is the same as

LAG(Amount,1)

There is also a third parameter that can be passed in that will convert NULLs to a default value…

SELECT *,
   LAG(Amount, 1, -100) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS PreviousAmount,
   LAG(Amount, 2, -200) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS SecondPreviousAmount
FROM dbo.WindowFunctionDemo;
GO

[Screenshot: LAG LEAD 3]

There are also FIRST_VALUE and LAST_VALUE offset functions that return the first and last values in the window frame.

This example returns the Amount value of the first purchase for each customer…

SELECT *,
 FIRST_VALUE(Amount) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS FirstPurchaseAmount
FROM dbo.WindowFunctionDemo;
GO

[Screenshot: LAG LEAD 4]

We can do something similar to find the last value in the window frame… (however there is something you need to be careful with here!!!)

SELECT *,
   LAST_VALUE(Amount) OVER (PARTITION BY CustomerID ORDER BY PurchaseDate) AS LatestPurchaseAmount
FROM dbo.WindowFunctionDemo;
GO

[Screenshot: LAG LEAD 5]

As you can see this is not showing the results we might expect.  We might expect to see 259.99 in the LatestPurchaseAmount for all rows for CustomerID = 1.  If you read my post on the aggregate window functions you might remember that the default window frame is…

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

This means that for each row the window frame, i.e. the data available to the offset function, is all previous rows up to and including the current row.  As we’re ordering by PurchaseDate, the window frame for the first purchase by customer 1 of the Rugby Ball on 2018-03-06 12:34:00 only contains that single row.  The window frame is restricting access to the rows for later purchases.

All we need to do to get the required result is make sure we’re using the correct window frame, i.e. make sure the frame includes the latest purchase for each customer.  We can do this as follows…

SELECT *,
   LAST_VALUE(Amount) OVER (
                     PARTITION BY CustomerID 
                     ORDER BY PurchaseDate
                     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 ) AS LatestPurchaseAmount
FROM dbo.WindowFunctionDemo;
GO

[Screenshot: LAG LEAD 6]

We could also use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING which spreads the window frame across the whole partition.
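That variant looks like this and gives the same LatestPurchaseAmount for every row in the partition…

SELECT *,
   LAST_VALUE(Amount) OVER (
                     PARTITION BY CustomerID
                     ORDER BY PurchaseDate
                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
 ) AS LatestPurchaseAmount
FROM dbo.WindowFunctionDemo;
GO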