Partitioning Series

I’ve now written a few different posts about various aspects of partitioning, so I thought I would put them all together here as a single series.

  1. Introduction to Partitioning – this post covers how to set up a table with multiple partitions using a partition scheme and function and briefly touches on partition elimination to help reduce the amount of data being read in a query.
  2. More Partitioning – this covers partition elimination in more detail and includes tips to make sure the optimiser knows it can eliminate certain partitions. It also covers the $PARTITION function.
  3. Using $PARTITION to find the last time an event occurred – this post demonstrates a real-life scenario where I’ve been able to use the $PARTITION function to speed up a query.
  4. 70-761 Exam Prep: Miscellaneous T-SQL – Part 1 – section 3 of this post covers truncating partitions which is new functionality available in SQL Server 2016.
  5. Modifying Partitions – this post shows how to add and remove partitions from an existing partition function.
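As a taster for item 4, SQL Server 2016 lets us truncate individual partitions rather than the whole table. A minimal sketch (dbo.MyPartitionedTable is a hypothetical partitioned table, not one from these posts):

```sql
-- Truncate only partitions 1, 2 and 10 through 12 of a partitioned table
-- (dbo.MyPartitionedTable is a placeholder name)
TRUNCATE TABLE dbo.MyPartitionedTable
WITH (PARTITIONS (1, 2, 10 TO 12));
```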

SSDT: Target Platform and Compatibility Level

I have just seen an issue in one of my database release scripts where I’m getting a load of statements like the following…

PRINT N'Altering [dbo].[TestQueue]...';

GO

PRINT N'No script could be generated for operation 2 on object ''[dbo].[TestQueue]''.';

GO

This is because of the “Target Platform” setting on the properties of the project.  For this database it was set to 2008 as follows:

This needs to be set to the version of the SQL Server instance that the database sits on in production. In this case it was SQL Server 2014.

When I changed this it automatically updated the compatibility level of the database to 2014. However, although it sits on a SQL Server 2014 instance, this database is still at the 2008 R2 compatibility level, so I had to click on “Database Settings”, then on the “Miscellaneous” tab, and change it back to 2008 as follows.

Then when I generated the scripts again all the service broker queue alter statements were gone.

In summary, the “Target Platform” needs to be the version of the SQL Server Instance and the “Compatibility level” is the compatibility level of the database itself.
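Both settings can also be checked from T-SQL, which is a quick sanity check before generating a release script. A rough sketch:

```sql
-- Version of the SQL Server instance
SELECT SERVERPROPERTY('ProductVersion') AS InstanceVersion;

-- Compatibility level of the current database (100 = 2008/2008 R2, 120 = 2014)
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();
```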

Modifying Partitions

In a previous post I talked about how to partition a table and touched on partition elimination which allows the optimiser to create a query plan where a much smaller amount of data is read.

A lot of the partitioning functions I’ve used in my working life are based on month, usually on reporting tables, where people need to be able to see all historic data but usually are only concerned with the current or last few months. When querying this data, partition elimination allows SQL Server to just read the data on the required partition(s) instead of the whole table. This is very useful if you have a lot of data going back a long time and usually query this data using a date type column.

However, when you first create your partition function based on a date you probably don’t extend it too far into the future, and you might find that all of a sudden two years have passed and all your new data is being put into a single large final partition that just keeps growing as the months go by.

Even worse than that you might find the situation I had today where a new database had been recently released to production using an old database schema as a rough template. This database only contains data from March 2018 onward but the partition function for a certain large table was split by month from 1st Jan 2012 to 1st Jan 2016. So there were 48 partitions covering the 4 years from 2012 to 2015 inclusive but all the data is just sitting in the last partition.

What I needed to do was get rid of all the old partitions and just have a single one up to March 2018 and then create new partitions every month for the next few months.

I tried changing the partition function in SSDT and letting that sort things out but it attempts to rebuild the table and drop and recreate everything.

There is a way to do this using the ALTER PARTITION FUNCTION and ALTER PARTITION SCHEME statements and this is how I did it…

To recreate this issue I’ve created a new partition function named PF_MonthlyFinance which has monthly partitions for 2012 to 2015. Then I created a new partition scheme named PS_MonthlyFinance for this function with all partitions on the PRIMARY filegroup. This is the first few rows of the sys.partition_range_values view…


… and this is the last few rows…

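The setup described above can be sketched as follows. This is abbreviated: the real function has a boundary value for every month from January 2012 through December 2015, and RANGE RIGHT is an assumption (a RANGE LEFT function would put rows on the other side of each boundary)…

```sql
-- Partition function with monthly boundaries (only the first few shown)
CREATE PARTITION FUNCTION PF_MonthlyFinance (DATE)
AS RANGE RIGHT FOR VALUES ('20120101', '20120201', '20120301' /* ...through 20151201 */);

-- Partition scheme mapping every partition to PRIMARY
CREATE PARTITION SCHEME PS_MonthlyFinance
AS PARTITION PF_MonthlyFinance
ALL TO ([PRIMARY]);

-- The query used throughout this post to view the boundary values
SELECT PRV.boundary_id, PRV.value
FROM sys.partition_functions PF
JOIN sys.partition_range_values PRV ON PRV.function_id = PF.function_id
WHERE PF.name = 'PF_MonthlyFinance'
ORDER BY PRV.boundary_id;
```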

First I thought I’d create the new partitions. We can do this using the SPLIT RANGE clause of ALTER PARTITION FUNCTION to add the first new boundary for March 2018.

ALTER PARTITION FUNCTION PF_MonthlyFinance () 
SPLIT RANGE ('20180301');

Now if we look at the last few rows of sys.partition_range_values we can see the new partition…


So now let’s add the new one for April 2018…

ALTER PARTITION FUNCTION PF_MonthlyFinance () 
SPLIT RANGE ('20180401');

However, when we run this we get the following error…

Msg 7710, Level 16, State 1, Line 16
Warning: The partition scheme 'PS_MonthlyFinance' does not have any next used filegroup. Partition scheme has not been changed.

You might have noticed that when we created the PARTITION SCHEME the following message was returned…

Partition scheme 'PS_MonthlyFinance' has been created successfully. 'PRIMARY' is marked as the next used filegroup in partition scheme 'PS_MonthlyFinance'.

That next used filegroup was consumed when we created the partition for March 2018, so now we need to tell SQL Server which filegroup the next partition should go on. We can do this with the ALTER PARTITION SCHEME statement as follows…

ALTER PARTITION SCHEME PS_MonthlyFinance 
NEXT USED [PRIMARY];

We can re-run the following and now it works fine…

ALTER PARTITION FUNCTION PF_MonthlyFinance () 
SPLIT RANGE ('20180401');


Rather than adding each partition manually we can script it out to create a few in one go. I’ve added this code at the bottom of this post.

Before that I want to show how we can get rid of the old redundant partitions. Again we can use ALTER PARTITION FUNCTION, but this time with the MERGE RANGE clause instead of SPLIT RANGE, as follows…

ALTER PARTITION FUNCTION PF_MonthlyFinance () 
MERGE RANGE ('20120101');

Now if we look at the first few rows of sys.partition_range_values we can see the partition for Jan 2012 is now gone…


Here’s a single piece of code that can be used to remove all of these old partitions…

DECLARE @StartDate DATE = '20120101';
DECLARE @EndDate DATE = '20160101';

WHILE @StartDate < @EndDate
BEGIN

  IF EXISTS (
    SELECT 1
    FROM sys.partition_functions PF
    JOIN sys.partition_range_values PRV ON PRV.function_id = PF.function_id
    WHERE PF.name = 'PF_MonthlyFinance'
    AND PRV.value = @StartDate
  )
  BEGIN

    ALTER PARTITION FUNCTION PF_MonthlyFinance () 
    MERGE RANGE (@StartDate);

  END
 
  SELECT @StartDate = DATEADD(MONTH, 1, @StartDate);

END

We can now see only the two new partitions exist…


Now we can use the following to create the next batch of partitions going forward…

DECLARE @StartDate DATE = '20180301';
DECLARE @NumParts INT = 24;
DECLARE @Counter INT = 1;

WHILE @Counter <= @NumParts
BEGIN

  IF NOT EXISTS (
    SELECT 1
    FROM sys.partition_functions PF
    JOIN sys.partition_range_values PRV ON PRV.function_id = PF.function_id
    WHERE PF.name = 'PF_MonthlyFinance'
    AND PRV.value = @StartDate
  )
  BEGIN

    ALTER PARTITION FUNCTION PF_MonthlyFinance () 
    SPLIT RANGE (@StartDate);

    ALTER PARTITION SCHEME PS_MonthlyFinance 
    NEXT USED [PRIMARY];

  END
 
  SELECT @StartDate = DATEADD(MONTH, 1, @StartDate);
  SELECT @Counter += 1;

END

This will create 24 monthly partitions starting from March 2018 and now we can see all the new partitions exist…


JSON – Part 3: OPENJSON

In previous posts I demonstrated how to use the FOR JSON clause and how some of the built-in JSON scalar functions work. In this final post in the series I’m going to demonstrate how the OPENJSON function works.

There are two ways we can use the OPENJSON function. The first is in discovery mode where the schema of the JSON we are dealing with is unknown. The second is to shred data into usable sets when we know the schema.

Firstly, I’m going to demonstrate how to use it in discovery mode to work out what data is in a supplied piece of JSON.

In this example the JSON is short and formatted so that it’s easy to read, and we can tell what’s in it just by looking at it; when that’s not the case, the OPENJSON function can be very useful for seeing which properties exist within a particular JSON object.

The following JSON has data of a few different types and we can pass it straight into OPENJSON to return some useful details. OPENJSON is a table-valued function and always returns a data set. For example…

DECLARE @json NVARCHAR(MAX) =
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
  "Children" : null
}';

SELECT *
FROM OPENJSON(@json);


… we can see details about each key-value pair in the JSON object along with a number indicating the type.

The type value corresponds to the following:

Type value   JSON data type
----------   --------------
0            null
1            string
2            int
3            true/false
4            array
5            object
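The type column can be used to filter the discovery output. For example, to pick out just the embedded arrays and objects from a JSON document (a smaller version of the one above)…

```sql
DECLARE @json NVARCHAR(MAX) = N'{
  "CharacterID" : 12,
  "Friends" : ["Rand Al''Thor", "Perrin Aybara"],
  "Address" : { "AddressLine1" : "Emonds Field" }
}';

-- type 4 = array, type 5 = object
SELECT [key], value, type
FROM OPENJSON(@json)
WHERE type IN (4, 5);
```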

That example used a single JSON object but if our JSON is an array of JSON objects then the OPENJSON function will shred them into individual rows and the key column contains the ordinal position of the JSON object in the array as follows…

DECLARE @json_array NVARCHAR(MAX) = 
'[
  {
    "CharacterID" : 12,
    "CharacterName" : "Egwene Al''Vere",
    "IsFemale" : true,
    "Address" : {
      "AddressLine1" : "Emonds Field",
      "AddressLine2" : "The Two Rivers"
    },
    "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
    "Children" : null
  },
  {
    "CharacterID" : 13,
    "CharacterName" : "Perrin Aybara",
    "IsFemale" : false,
    "Address" : {
      "AddressLine1" : "Emonds Field",
      "AddressLine2" : "The Two Rivers"
    },
    "Friends" : ["Rand Al''Thor", "Egwene Al''Vere", "Matrim Cauthon", "Nynaeve Al''Merea"],
    "Children" : null
  }
]';

SELECT *
FROM OPENJSON(@json_array);


If we want to access a single value from an array we can use the second optional parameter to drill down straight to that value as follows…

SELECT *
FROM OPENJSON(@json_array, '$[1]');


This returns the shredded data from the second JSON object in the array (ordinal positions are zero based).

If the second parameter (the path to start shredding) is not supplied then the entire object supplied in the first parameter is used. This is the same as specifying “$” as the path. The following two queries are the same…

SELECT *
FROM OPENJSON(@json_array);

SELECT *
FROM OPENJSON(@json_array, '$');


That covers a few examples of using OPENJSON in discovery mode and now I’m going to show a few examples of how to use it if you know a bit about the schema of the JSON supplied.

We’ve used OPENJSON to discover the schema of our @json_array object so now we can use this and some of the available scalar functions to shred our object into one or more tables.

In this example we’ve got an array and we want to shred that data into individual rows in a data set linked to a header data set containing the rest of the data. One way to do this is as follows…

DECLARE @json NVARCHAR(MAX) =
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
  "Children" : null
}';

SELECT
 JSON_VALUE(@json, '$.CharacterID') AS CharacterID,
 JSON_VALUE(@json, '$.CharacterName') AS CharacterName,
 JSON_VALUE(@json, '$.IsFemale') AS IsFemale,
 JSON_VALUE(@json, '$.Address.AddressLine1') AS AddressLine1,
 JSON_VALUE(@json, '$.Address.AddressLine2') AS AddressLine2,
 JSON_VALUE(@json, '$.Children') AS Children

SELECT
  JSON_VALUE(@json, '$.CharacterID') AS CharacterID,
  OJ.value AS Friend
FROM OPENJSON(@json, '$.Friends') OJ;


Here’s another example where the array is an array of JSON objects rather than individual values…

DECLARE @json_friend NVARCHAR(MAX) = 
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : [
    {
      "CharacterName" : "Rand Al''Thor",
      "CharacterID" : 7,
      "IsFemale" : false
    },
    {
      "CharacterName" : "Perrin Aybara",
      "CharacterID" : 13,
      "IsFemale" : false
    }, 
    {
      "CharacterName" : "Matrim Cauthon",
      "CharacterID" : 16,
      "IsFemale" : false
    }
  ],
  "Children" : null
}';

SELECT
  JSON_VALUE(@json_friend, '$.CharacterID') AS CharacterID,
  JSON_VALUE(OJ.Value, '$.CharacterName') AS FriendCharacterName,
  JSON_VALUE(OJ.Value, '$.CharacterID') AS FriendCharacterID,
  JSON_VALUE(OJ.Value, '$.IsFemale') AS FriendIsFemale
FROM OPENJSON(@json_friend, '$.Friends') OJ;


As you can see we can use JSON_VALUE to pull out the individual values from the JSON objects in the Friends key.

There is another way we can shred the JSON using OPENJSON and this is by specifying a schema for the data set returned. For example…

DECLARE @json_known NVARCHAR(MAX) =
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
  "Children" : null
}';

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName VARCHAR(200) '$.CharacterName',
  IsFemale BIT '$.IsFemale',
  Address NVARCHAR(MAX) '$.Address' AS JSON,
  Friends NVARCHAR(MAX) '$.Friends' AS JSON,
  Children VARCHAR(500) '$.Children'
);


We have to specify the data type for each of the values we want to return in the data set. If the value we are returning is a JSON object or an array then we need to include the AS JSON clause at the end of the column declaration. When we use AS JSON the data type must be NVARCHAR(MAX). If we try to use anything else we get the following error…

AS JSON option can be specified only for column of nvarchar(max) type in WITH clause.

We don’t need to include every column in the data set so the following is fine…

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName VARCHAR(200) '$.CharacterName'
);


We have to make sure the data types are correct. For example if we try to return a string value into an INT data type we get the following error as expected…

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName INT '$.CharacterName'
);

Conversion failed when converting the nvarchar value 'Egwene Al'Vere' to data type int.

JSON is semi-structured, so not every column declared in the WITH clause needs a matching property in the JSON. For example, there is no CharacterAge property in the JSON but the following still runs correctly, with NULL returned as the CharacterAge…

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterAge INT '$.CharacterAge'
);


If the JSON object contains embedded JSON or an array we are able to access values inside these by navigating down through the embedded objects or by using the ordinal position of the values in the arrays. We can pull all the values out of the @json_known object as follows…

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName VARCHAR(200) '$.CharacterName',
  IsFemale BIT '$.IsFemale',
  AddressLine1 VARCHAR(500) '$.Address.AddressLine1',
  AddressLine2 VARCHAR(500) '$.Address.AddressLine2',
  AddressLine3 VARCHAR(500) '$.Address.AddressLine3',
  Friends0 VARCHAR(100) '$.Friends[0]',
  Friends1 VARCHAR(100) '$.Friends[1]',
  Friends2 VARCHAR(100) '$.Friends[2]',
  Friends3 VARCHAR(100) '$.Friends[3]',
  Children VARCHAR(500) '$.Children'
);


Lax and Strict

The last thing I want to demonstrate is a couple of additional options we can use in the path parameter of the OPENJSON function. These options are lax and strict.

Strict means that the path specified in the second parameter of the OPENJSON function must exist and lax means that it doesn’t have to. Lax is the default if nothing is supplied so all the examples above use the lax option. Here are some examples using these options…

DECLARE @json VARCHAR(500) = '{"Hello": {"Name" : "Lax Strict Test"} }';

SELECT *
FROM OPENJSON(@json, '$.Goodbye');

SELECT *
FROM OPENJSON(@json, 'lax $.Goodbye');

SELECT *
FROM OPENJSON(@json, 'strict $.Hello');

SELECT *
FROM OPENJSON(@json, 'strict $.Goodbye');


Property cannot be found on the specified JSON path.

As we can see the first two queries are exactly the same: this shows the lax option is the default. The third query (the first using the strict option) returns the data correctly because the Hello key exists. The fourth and final query returns an error because the Goodbye key does not exist and the strict option is used.

JSON – Part 2: Built in Functions

In a previous post I demonstrated how to use the FOR JSON clause and in this post I’m going to show some of the built-in JSON functions. In a future post I’ll show how the OPENJSON function works.

JSON in SQL Server 2016

Built-in support for JSON was introduced in SQL Server 2016. There is no native JSON data type in the way that there is an XML data type, but there are built-in functions that allow us to manipulate JSON and convert data sets into JSON. As mentioned above, in a previous post I covered converting data sets into JSON using the FOR JSON clause, so let’s now look at a few demos extracting values out of and storing JSON data.

Validating JSON

When we use JSON in SQL Server we would usually have it in a text string.

Here’s a simple example…

DECLARE @json AS NVARCHAR(MAX) = N'{
   "Customer" : {
       "Id" : 100,
       "Name" : "Lionel Teddington",
       "Address" : {
           "AddrType" : "Home",
           "AddrLine1" : "1 The Road",
           "AddrLine2" : "My Village",
           "Postcode" : "AA1 2BB"
       }
    }
}';

So now we’ve got a JSON object stored in the variable @json.

One of the built-in functions, ISJSON, returns 1 or 0 indicating whether the JSON is correctly formatted or not.

SELECT ISJSON(@json) AS IsValidJson;

The 1 value returned by this function tells us that the JSON is valid.

If we try the following…

DECLARE @json AS NVARCHAR(MAX) = N'{
    "Customer" : {
        "Id" : 100,
        "Name" : "Lionel Teddington",
        "Address" : {
            "AddrType" : "Home",
            "AddrLine1" : "1 The Road",
            "AddrLine2" : "My Village,
            "Postcode" : "AA1 2BB"
        }
    }
}';
 
SELECT ISJSON(@json) AS IsValidJson;

… we can see the My Village value is missing the closing double quote which makes the JSON invalid, hence the value of 0 being returned.

Extracting values from JSON

There are two built-in functions that we can use to extract values from a JSON object. The first is JSON_VALUE, which can be used to extract a scalar value for a particular key in the JSON object. The second is JSON_QUERY, which returns a JSON object or array (as a text string) embedded within the JSON.

If we use the valid JSON object from above we can extract the customer name as follows…

SELECT JSON_VALUE(@json, '$.Customer.Name') AS CustomerName;

As we can see the JSON_VALUE function accepts two parameters. The first is the JSON object itself and the second is the path to the key corresponding to the value we want to extract. The $ (dollar sign) is used to represent the JSON object specified in the first parameter and from this point we can navigate down through the key-value pairs to get the required value.
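One thing about the path syntax not shown in the example above: keys that contain spaces or other special characters can be wrapped in double quotes inside the path…

```sql
DECLARE @j NVARCHAR(MAX) = N'{"Customer Name" : "Lionel Teddington"}';

-- The key is quoted in the path because it contains a space
SELECT JSON_VALUE(@j, '$."Customer Name"') AS CustomerName;
```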

As mentioned above, the JSON is just a text string and the JSON_VALUE function doesn’t perform any validation. The same query will work if we use the invalid JSON object from the example above…

DECLARE @json AS NVARCHAR(MAX) = N'{
    "Customer" : {
        "Id" : 100,
        "Name" : "Lionel Teddington",
        "Address" : {
            "AddrType" : "Home",
            "AddrLine1" : "1 The Road",
            "AddrLine2" : "My Village,
            "Postcode" : "AA1 2BB"
        }
    }
}';

SELECT ISJSON(@json) AS IsValidJson;

-- if the JSON is invalid you can still query it
 SELECT JSON_VALUE(@json, '$.Customer.Name') AS CustomerName;

… but we can only extract values up to the point where the JSON object becomes invalid. In this example the JSON becomes invalid after the My Village value as there is a missing double quote. So if we try to extract the value for the Postcode which appears after the My Village value we see the following error…

SELECT ISJSON(@json) AS IsValidJson;

-- if the JSON is invalid you can still query it
 SELECT JSON_VALUE(@json, '$.Customer.Name') AS CustomerName;
 -- but only up to the point where the JSON becomes invalid
 SELECT JSON_VALUE(@json, '$.Customer.Address.Postcode') AS Postcode;

(1 row affected)

(1 row affected)
 Msg 13609, Level 16, State 2, Line 41
 JSON text is not properly formatted. Unexpected character '"' is found at position 160.

The JSON is treated as case sensitive, so the following returns NULL on the valid JSON example because the path uses name with a lower-case n while the key is Name…

SELECT JSON_VALUE(@json, '$.Customer.name') AS CustomerName;

If we look at the valid JSON example we can see that the Address key has a JSON object as its value. We can’t return this with the JSON_VALUE function, which would just return NULL, but we can use JSON_QUERY as follows…

SELECT JSON_QUERY(@json, '$.Customer.Address') AS AddressJson;

Like JSON_VALUE this function accepts a JSON object as the first parameter and then a path to the embedded JSON as the second parameter.

Modifying JSON objects

The fourth and final function I’m going to demonstrate is the JSON_MODIFY function which allows us to modify values for certain keys inside the supplied JSON.

For example, we can run the following to change the CustomerID value from 100 to 101…

SELECT JSON_MODIFY(@json, '$.Customer.Id', '101');


As you can see the JSON_MODIFY function returns the modified JSON object. It’s worth noting that this function doesn’t modify the value in the existing JSON object, it just returns the modified JSON.

We can run the following to show this is the case…

SELECT JSON_MODIFY(@json, '$.Customer.Id', '101');
-- we don't actually see the data changed in the variable. modify just returns the amended JSON
SELECT JSON_VALUE(@json, '$.Customer.Id') AS NewCustomerId;


… and we can see the CustomerID in @json is still 100.

What we can do is declare another variable and store the results of the JSON_MODIFY function there and use that variable in any further processing…

DECLARE @modifiedJson NVARCHAR(MAX);
SELECT @modifiedJson = JSON_MODIFY(@json, '$.Customer.Id', '101');
SELECT JSON_VALUE(@modifiedJson, '$.Customer.Id') AS NewCustomerId;

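Since JSON_MODIFY just returns the amended JSON, the usual way to persist a change to JSON stored in a table is to assign the result back to the column in an UPDATE. A sketch, using the dbo.JSONDemo table that’s created later in this post…

```sql
-- Write the modified JSON back to the column it came from
UPDATE dbo.JSONDemo
SET JsonDoc = JSON_MODIFY(JsonDoc, '$.Customer.Id', '101')
WHERE JSON_VALUE(JsonDoc, '$.Customer.Id') = '100';
```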

Combining JSON_VALUE and JSON_QUERY

Now let’s have a look at a more complicated piece of JSON. In this example the Customer key has an array of JSON objects as its value. If we try to pull out the customer Name value using the JSON_VALUE function we just get NULL returned…

DECLARE @json AS NVARCHAR(MAX) = N'{
  "Customer" : [
    {
      "Id" : 100,
      "Name" : "Lionel Teddington",
      "Address" : {
        "AddrType" : "Home", 
        "AddrLine1" : "1 The Road",
        "AddrLine2" : "My Village",
        "Postcode" : "AA1 2BB"
      }
    },
    {
      "Id" : 101,
      "Name" : "Carolina Sweetcorn",
      "Address" : {
        "AddrType" : "Home", 
        "AddrLine1" : "1 The Road",
        "AddrLine2" : "My Village",
        "Postcode" : "AA1 2BB"
      }
    }
  ],
  "Shop" : {
    "ShopName" : "The best shop in the world",
    "ShopCode" : "0385920",
    "ShopAddress" : {
      "ShopPostCode" : "AA1 2BC"
    }
  }
}';

SELECT ISJSON(@json)

SELECT JSON_VALUE(@json, '$.Customer.Name') AS CustomerName;


What we need to do is get the array of JSON objects using JSON_QUERY and then use JSON_VALUE to extract the required values from there…

-- must use JSON_QUERY for arrays
SELECT JSON_QUERY(@json, '$.Customer') AS CustomerJSON;

SELECT JSON_VALUE(
  JSON_QUERY(@json, '$.Customer'),
  '$.Name'
) AS CustomerName;


The first SELECT statement shows the array being returned by the JSON_QUERY function but the second SELECT statement is still returning NULL.

This is because with arrays we need to specify which ordinal in the array we want to return the value for.

We can return the name for both customers in our array of JSON objects as follows…

SELECT
  JSON_VALUE(
    JSON_QUERY(@json, '$.Customer'),
    '$[0].Name'
  ) AS FirstCustomerName,
  JSON_VALUE(
    JSON_QUERY(@json, '$.Customer'),
    '$[1].Name'
  ) AS SecondCustomerName;


Please note that the first ordinal position is 0, not 1.

We have two JSON objects in our example, but we would still need to include the ordinal even if there were only one.
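For example, even with a single object in the array the [0] is still needed…

```sql
DECLARE @j NVARCHAR(MAX) = N'{"Customer" : [ { "Name" : "Lionel Teddington" } ]}';

-- Without the ordinal JSON_VALUE returns NULL; with [0] it returns the name
SELECT
  JSON_VALUE(JSON_QUERY(@j, '$.Customer'), '$.Name') AS WithoutOrdinal,
  JSON_VALUE(JSON_QUERY(@j, '$.Customer'), '$[0].Name') AS WithOrdinal;
```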

Storing Valid JSON

As previously mentioned there is no JSON data type but we may want to have some validation on any JSON objects that we store in our database.

Let’s create a new table that has just one column that we want to store some JSON in…

CREATE TABLE dbo.JSONDemo
(
  JsonDoc NVARCHAR(MAX)
);

… and now let’s insert some invalid JSON – note the missing double quote after the World value…

INSERT INTO dbo.JSONDemo
(
  JsonDoc
)
VALUES
(N'{ "Hello" : "World}');

SELECT *
FROM dbo.JSONDemo JD


This insert works without any errors; note the missing double quote in the results.

Now let’s clear out the table and add a check constraint using the ISJSON function…

TRUNCATE TABLE dbo.JSONDemo;

ALTER TABLE dbo.JSONDemo ADD CONSTRAINT CK_ValidJsonOnly CHECK (ISJSON(JsonDoc) = 1);

This means that now only valid JSON can be added to our table. If we try to insert the invalid JSON we get the following…

INSERT INTO dbo.JSONDemo
(
  JsonDoc
)
VALUES
(N'{ "Hello" : "World}');

Msg 547, Level 16, State 0, Line 146
The INSERT statement conflicted with the CHECK constraint "CK_ValidJsonOnly". The conflict occurred in database "SimonBlog", table "dbo.JSONDemo", column 'JsonDoc'.
The statement has been terminated.

However, if we correct the JSON and add the missing closing double quote we can insert successfully…

INSERT INTO dbo.JSONDemo
(
  JsonDoc
)
VALUES
(N'{ "Hello" : "World"}');

SELECT *
FROM dbo.JSONDemo JD


The insert was successful, and this time all the double quotes are there.

Exam prep for 70-761

It’s been a little while since I took an exam so I thought I’d have a go at 70-761.

It covers a lot of the same material that was covered in 70-461 which was based on SQL Server 2012. 70-761 is based on SQL Server 2016 so I’m going to write about a few of the newer features in 2016 as part of my exam prep.

There is a good video on Pluralsight named “SQL Server 2016 New Features for Developers” by Leonard Lobel that covers the new features well.

Below are some details on some of the new features.

I’ve also been working through the 70-761 guidebook by Itzik Ben-Gan, and there are a few bits I’m less familiar with that I’m going to write about to make sure I fully understand them.

GROUPING SETS (and ROLLUP & CUBE)

As part of my preparation for the 70-761 exam I’ve been trying to get a full understanding of some of the T-SQL I’m less familiar with. One example of this is the GROUPING SETS clause. There are also ROLLUP and CUBE clauses, but the older WITH ROLLUP and WITH CUBE syntax is deprecated, and GROUPING SETS can do everything ROLLUP and CUBE can do but with much more control.

The best way to explain how GROUPING SETS works is to show some example code. I’ll show how it compares to ROLLUP and CUBE in case you need to update any code to convert it to using GROUPING SETS.

So let’s create some test data. I’m just creating a simple table containing a few purchases made by a few customers…

DROP TABLE IF EXISTS dbo.GroupingSetsDemo;
GO

CREATE TABLE dbo.GroupingSetsDemo
(
  Id INT NOT NULL IDENTITY(1,1),
  CustomerID INT NOT NULL,
  PurchaseDate DATETIME2(0) NOT NULL,
  PurchaseItem VARCHAR(100) NOT NULL,
  Amount DECIMAL(10,2) NOT NULL
);
GO

INSERT INTO dbo.GroupingSetsDemo
(
  CustomerID,
  PurchaseDate,
  PurchaseItem,
  Amount
)
VALUES
  (1, '20171106 12:34', 'Rugby Ball', 14.99),
  (1, '20180307 09:54', 'Frying Pan', 21.50),
  (1, '20180307 10:02', 'Vacuum Cleaner', 259.99),
  (2, '20171106 15:26', 'Orchid', 13.62),
  (2, '20180228 18:47', 'Book', 7.99),
  (2, '20180301 08:02', 'Snakes and Ladders', 8.00),
  (2, '20180306 14:32', 'Pillows', 20.00),
  (2, '20180306 14:32', 'Washing Machine', 359.99),
  (2, '20180307 23:15', 'Jeans', 59.99),
  (3, '20180301 09:27', 'Rucksack', 13.99),
  (3, '20180301 09:27', 'Scarf', 12.50);
GO

SELECT *
FROM dbo.GroupingSetsDemo;
GO


If we want to see the total value of the purchases for each customer we can just group by the CustomerID and sum up the Amount values as follows…

SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY CustomerID


As well as grouping by columns we can group by what are called GROUPING SETS. This example just shows how we can use GROUPING SETS to mimic the previous query to get total sales by customer…

SELECT CustomerID, SUM(Amount) AS TotalAmount
 FROM dbo.GroupingSetsDemo WFD
 GROUP BY
 GROUPING SETS ((CustomerID));


So in this case we’ve just said to group by a single grouping set for CustomerID.

The really useful stuff starts to happen when we have more than one grouping set. One useful example is where you want to include a grand total in the same data set. To do this we just need to add an additional empty grouping set, telling SQL Server to also aggregate over the whole data set to give us the total amount for all customers…

SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS ((CustomerID), ());


Please note that you still need to make sure all non-aggregated columns in the SELECT clause appear in at least one grouping set. So we can’t just include the empty total grouping set from the above query on its own. If we try, we get the following error as expected…

SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS (());
Msg 8120, Level 16, State 1, Line 56
 Column 'dbo.GroupingSetsDemo.CustomerID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

More interesting things start to happen when we want to group over multiple columns. Let’s bring the purchase year into our query as follows…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS ((CustomerID, YEAR(WFD.PurchaseDate)));


We can now add more grouping sets to this query, for example, to give total amount for each customer…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS (
    (CustomerID, YEAR(WFD.PurchaseDate)),
    (CustomerID)
 );


… or for each year…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS (
    (CustomerID, YEAR(WFD.PurchaseDate)),
    (YEAR(WFD.PurchaseDate))
 );


We can also combine everything to give us all possible total values…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS (
    (CustomerID, YEAR(WFD.PurchaseDate)),
    (CustomerID),
    (YEAR(WFD.PurchaseDate)),
    ()
 );


As mentioned above, the only requirement is that each non-aggregated column appears in at least one grouping set so we can return just the total for each customer and for each year as follows…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY 
 GROUPING SETS (
    (CustomerID),
    (YEAR(WFD.PurchaseDate))
 );

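One thing that goes hand in hand with grouping sets, although it isn’t part of the examples above, is telling a subtotal row’s NULL apart from a genuine NULL in the data. The GROUPING function returns 1 when a column has been aggregated away in the current grouping set, and 0 otherwise…

```sql
SELECT
  CustomerID,
  YEAR(WFD.PurchaseDate) AS PurchaseYear,
  SUM(Amount) AS TotalAmount,
  -- 1 when this row is a total across all customers, 0 otherwise
  GROUPING(CustomerID) AS IsCustomerTotal,
  -- 1 when this row is a total across all years, 0 otherwise
  GROUPING(YEAR(WFD.PurchaseDate)) AS IsYearTotal
FROM dbo.GroupingSetsDemo WFD
GROUP BY
 GROUPING SETS (
    (CustomerID),
    (YEAR(WFD.PurchaseDate))
 );
```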

So what about ROLLUP and CUBE (but please remember these are deprecated)?

CUBE just means that the query is grouped by every possible grouping set, i.e. every combination of the non-aggregated columns is used as a grouping set. The syntax for using CUBE is as follows…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY YEAR(WFD.PurchaseDate), WFD.CustomerID
WITH CUBE
ORDER BY WFD.CustomerID, PurchaseYear


This is the same as using the following grouping sets…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY
 GROUPING SETS (
    (YEAR(WFD.PurchaseDate), WFD.CustomerID),
    (WFD.CustomerID),
    (YEAR(WFD.PurchaseDate)),
    ()
 )
ORDER BY WFD.CustomerID, PurchaseYear

ROLLUP is slightly different and it depends on the order of the columns in the GROUP BY clause. It works by creating a grouping set across all columns in the GROUP BY clause, then creating additional sets across all columns except the last and each time dropping the last column off the end until it’s just left with the empty group. So if we’re grouping over three columns it will create a set for (1,2,3) then for (1,2) then for (1) and finally for (). Here’s an example…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY WFD.CustomerID, YEAR(WFD.PurchaseDate)
WITH ROLLUP


So this is grouping over both customer and year, then by just customer, and finally the empty set, which gives the grand total. This is the same as creating the following grouping sets…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY
 GROUPING SETS (
    (CustomerID, YEAR(WFD.PurchaseDate)),
    (CustomerID),
    ()
 )

As mentioned, the order of the columns in the GROUP BY clause makes a difference and switching them from WFD.CustomerID, YEAR(WFD.PurchaseDate) to YEAR(WFD.PurchaseDate), WFD.CustomerID gives different results…

SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY YEAR(WFD.PurchaseDate), WFD.CustomerID
WITH ROLLUP


This time we’re grouping by year and customer, then by year, and finally the empty set for the grand total.
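Following the same pattern as before, this second ordering is equivalent to these grouping sets…

```sql
SELECT CustomerID, YEAR(WFD.PurchaseDate) AS PurchaseYear, SUM(Amount) AS TotalAmount
FROM dbo.GroupingSetsDemo WFD
GROUP BY
 GROUPING SETS (
    (YEAR(WFD.PurchaseDate), CustomerID),
    (YEAR(WFD.PurchaseDate)),
    ()
 );
```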