70-761 Exam

Exam passed last night with a score of 870 out of 1000. On to 70-762 – Developing SQL Databases next…

There are a nice lot of resources listed on the following site for 70-762…

https://www.mssqltips.com/sqlservertip/4690/study-materials-for-developing-sql-databases-exam-70762/

Advertisements

70-761 Exam Prep: Miscellaneous T-SQL – Part 2

I’m currently preparing to take the exam 70-761 – Querying Data with Transact-SQL and have been going through various resources including Itzit Ben-Gan’s exam guidebook which is certainly worth reading if you’re planning on taking this exam.

There are quite a few smaller bits of T-SQL functionality that I’m not that familiar with that I’m going to demonstrate in this and further posts. It’s a bit of a random selection but hopefully it will help me remember them and might be of use to others as well.

Contents

  1. Date and Time Functions
  2. IIF (Inline IF Function)
  3. CHOOSE Function
  4. String Functions

1. Date and Time Functions

There are quite a few different Date and Time functions available in SQL Server. There are a few I used regularly such as GETUTCDATE, DATEPART and DATEADD but some I hardly use or have never used at all. I can never remember the syntax for most of them so I’m going to give an example of each one along with a bit of detail about each.

Returning current date and time

The following functions all return the current date and time.
The timezone of my PC is BST.

This function is specific to SQL Server and returns a DATETIME value containing the current date and time.

SELECT GETDATE(); 
2018-04-30 22:13:18.403

This function is specific to SQL Server and returns a DATETIME value containing the current UTC date and time.

SELECT GETUTCDATE();
2018-04-30 21:13:18.403

This function is SQL standard and returns a DATETIME value containing the current date and time.

SELECT CURRENT_TIMESTAMP;
2018-04-30 22:13:18.403

This function is SQL standard and returns a DATETIME2(7) value containing the current date and time.

SELECT SYSDATETIME(); 
2018-04-30 22:13:18.4048719

This function is SQL standard and returns a DATETIME2(7) value containing the current UTC date and time.

SELECT SYSUTCDATETIME(); 
2018-04-30 21:13:18.4048719

This function is SQL standard and returns a DATETIMEOFFSET value containing the current date, time and offset.
As mentioned above this is BST which is 1 hour ahead of UTC.

SELECT SYSDATETIMEOFFSET();
2018-04-30 22:13:18.4048719 +01:00

Date Parts

These following two functions can be used to extract different parts of a supplied date and time.
For example, the following will return the month number from the supplied date or time parameter (please note I’m passing in a VARCHAR value but this is implicitly converted to a DATETIME2 data type).

SELECT DATEPART(MONTH, '20180430');
4

The following will return the minute value from the supplied date time value.

SELECT DATEPART(MINUTE, '20180430 03:23:17');
23

If no time part is included in the date value then it’s assumed to be midnight which is why this returns 0.

SELECT DATEPART(MINUTE, '20180430');
0

The following returns the number of days since the 1st Jan.

SELECT DATEPART(DAYOFYEAR, '20180430');
120

The following returns any offset in minutes. The offset is three hours so this returns 180 minutes.

SELECT DATEPART(TZoffset, '20180430 13:30:00 +03:00')
180

DATEPART always returns an INT value.

We can use the DATENAME function to return textual values for date parts.
For example, the following returns the month name for the supplied date.

SELECT DATENAME(MONTH, '20180430')
April

The DATENAME function is language dependent and will return the value in the language of the user running the query.

We can use the following functions to return date and time values from integer date parts. Each function requires a different number parameters and all are mandatory in order to build up the returned value correctly.

The following returns a DATE value.

SELECT DATEFROMPARTS(2018,04,30)
2018-04-30

The following returns a DATETIME value.

SELECT DATETIMEFROMPARTS(2018,04,30,19,42,34,13)
2018-04-30 19:42:34.013

All 7 parameters in this one are required but we can obviously use zeroes if we don’t have the required level of precision.

SELECT DATETIMEFROMPARTS(2018,04,30,19,42,0,0)
2018-04-30 19:42:00.000

The following returns a DATETIME2 type with an eighth parameter used to specify the precision of the data type.
The “fraction” parameter (the seventh one) is related to the precision which determines how many digits are allowed in the “fraction” parameter.
For example,

SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,5,1);
2018-04-30 19:42:34.5


SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,5,2);
2018-04-30 19:42:34.05


SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,5,3);
2018-04-30 19:42:34.005


SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,5,7);
2018-04-30 19:42:34.0000005

If we’re only allowing a precision of 1 we can’t specify 2 digits in the “fraction” parameter.

SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,15,1);
Msg 289, Level 16, State 5, Line 89
Cannot construct data type datetime2, some of the arguments have values which are not valid.

If we are using a precision of 0, which is just to the nearest second, the “fraction” parameter must be 0 otherwise we get the same error as above.

SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,5,0);
Msg 289, Level 16, State 5, Line 53
Cannot construct data type datetime2, some of the arguments have values which are not valid.

SELECT DATETIME2FROMPARTS(2018,04,30,19,42,34,0,0);
2018-04-30 19:42:34

The following returns a DATETIMEOFFSET value with the offset hour and minute parameters before the precision.

SELECT DATETIMEOFFSETFROMPARTS(2018,4,30,19,42,34,0,2,30,1);
2018-04-30 19:42:34.0 +02:30

The following function returns a DATE value containing the last date in the month of the supplied date.

SELECT EOMONTH('20180315');
2018-03-31

There is an optional offset parameter that can move forward or back the specified number of months

SELECT EOMONTH('20180315', -1);
2018-02-28
SELECT EOMONTH('20180315', 2);
2018-05-31

There isn’t a “start of month” function but further down I show how to do this using the DATEADD and DATEDIFF functions.

DATEADD

The DATEADD function can be used to add a specified number to the specified date part of the supplied date time value and returns a DATETIME value.
For example, the following adds 10 days to 10th April 2018.

SELECT DATEADD(DAY, 10, '20180410');
2018-04-20 00:00:00.000

The following one takes 1 month from 10th April 2018.

SELECT DATEADD(MONTH, -1, '20180410');
2018-03-10 00:00:00.000

DATEDIFF

This function compares a certain date part value between two datetime values and returns an INT value.
For example, the following calculates that there are 2 days difference between 1st April and 3rd April 2018.

SELECT DATEDIFF(DAY, '20180401', '20180403');
2

If the number returned is very large we can use this function to return the value as a BIGINT.

SELECT DATEDIFF_BIG(MILLISECOND, '19000101', GETUTCDATE());
3734111598403

Above we saw we can use the built in EOMONTH function to get the last day of the month for the supplied date.
Here’s a nice little trick to get the first date of the supplied month, in this case the current month. I’m running this on 30th April 2018

SELECT DATEADD(MONTH, DATEDIFF(MONTH, 0, GETUTCDATE()), 0);
2018-04-01 00:00:00.000

To explain how this works let’s first look at the DATEDIFF part that is passed in as the second parameter in the DATEADD function.
The start date parameter is supplied as zero. This defaults to 1st Jan 1900.
So the following two statements return the same value which is the number of months since 1st Jan 1900.
I’m running this in April 2018 so this returns 1419

SELECT DATEDIFF(MONTH, 0, GETUTCDATE())
SELECT DATEDIFF(MONTH, '19000101', GETUTCDATE())

The DATEADD function also uses 0 as the date parameter which again defaults to 1st Jan 1900.
So this function is saying “add 1419 months to 1st Jan 1900” and this gives us 1st April 2018.

Offest functions

We can use the TODATETIMEOFFSET function to add an offset to a date time value. This returns a DATETIMEOFFSET value.

SELECT TODATETIMEOFFSET('20180430 10:00', '+02:00')
2018-04-30 10:00:00.0000000 +02:00

We can use the SWITCHOFFSET function to switch from one time zone to another.
The first parameter has to be a DATETIMEOFFSET value and the second is the offset from UTC.  I’ve run this at 21:16 BST which is UTC + 1.

SELECT GETDATE();
2018-04-30 22:13:18.403


SELECT SWITCHOFFSET(SYSDATETIMEOFFSET(), '+02:00');
2018-04-30 23:13:18.4048719 +02:00

We can also add an offset using “AT TIME ZONE” and specifically specify the time zone as follows.

SELECT CAST('20180415 17:32' AS DATETIME) AT TIME ZONE 'Central European Standard Time'
2018-04-15 17:32:00.000 +02:00

2. IIF (Inline IF function)

Most programming languages provide some sort of ternary or condition operator and SQL Server is no different.  It provides the Inline IF function (IIF)
This function accepts three parameters. The first is a boolean, or an expression that returns a boolean.  If the boolean value is true then the value supplied in the second parameter is the value returned by the function.  If the boolean value is false then the value supplied in the third parameter is the value returned by the function.

SELECT IIF(1=1, 1, 0)
1

SELECT IIF(1=0, 1, 0)
0

If you are more familiar with the CASE statement, the queries above are equivalent to the following…

SELECT CASE WHEN 1=1 THEN 1 ELSE 0 END
SELECT CASE WHEN 1=0 THEN 1 ELSE 0 END

3. CHOOSE Function

The CHOOSE function accepts at least 2 parameters.
The first parameter is the “index” and has to be an INT that says which value from the preceding parameters should be returned.
The following examples all use four parameters meaning there are three values available to chose from the parameters supplied after the “index” parameter.

The following will return the value in the first parameter after the “index” value

SELECT CHOOSE(1,'a','b','c')
a

The following will return the value in the second parameter after the “index” value

SELECT CHOOSE(2,'a','b','c')
b

The following will return the value in the third parameter after the “index” value

SELECT CHOOSE(3,'a','b','c')
c

If we pass an “index” value greater than the number of values to choose from then the function returns NULL.

SELECT CHOOSE(4,'a','b','c')
NULL

4. String Functions

There are several string functions provided in SQL Server and I’m going to demonstrate some of the ones I find most useful…

 

The following repeats the string “ab” 5 times…

SELECT REPLICATE('ab', 5);
ababababab

The following reverses the supplied string…

SELECT REVERSE('abcd');
dcba

The SPLIT_STRING table valued function will split a string based on a supplied separator. This is available from SQL Server 2016 onwards.
The following splits the string into a data set of rows using the underscore character as the separator…

SELECT *
FROM STRING_SPLIT('My_Name_Is_Simon', '_')

SPLIT_STRING

The STUFF function is one of the most powerful string manipulation tools.
It can be used to insert (or stuff) a string at any point inside another string.
It can be used to replace a certain number of characters at any point within a string and replace them with another string.
The scalar function returns a VARCHAR and accepts four parameters…
1. This is the original string supplied for modification
2. This is the starting point for the string manipulation
3. This is the number of characters in the original string that are going to be deleted starting at the character supplied in parameter 2
4. This is the string that we want to “stuff” into the original string at the character supplied in parameter 2

In the following example the original string is “abcde”. Parameter 2 says we’re going to perform the “stuff” at character 1, i.e. at character “a”.  Parmeter 3 says we’re deleting 0 characters from the original string and the final parameter is the the string we want to “stuff” in at character 1.
This will insert the character “x” at the beginning of the original string…

SELECT STUFF('abcde',1,0,'x')
xabcde

We can’t use STUFF to add “x” to the end of the string.  We would want to place it as the 6th character in this example but if we try this is just returns NULL.

SELECT STUFF('abcde',6,0,'x')
NULL

However, add a string to the end of another is just simple concatenation.

SELECT 'abcde' + 'x'
SELECT CONCAT('abcde', 'x')
abcdex

Now let’s try and replace the “c” character with the “x” character using the STUFF function…

SELECT STUFF('abcde', 3, 1, 'x')
abxde

Obviously this is not specifically looking for the “c” character as the REPLACE function would, it’s just replacing the third character in the original string with the string in the fourth parameter.

If we want to replace the second and third characters of the string we can do the following…

SELECT STUFF('abcde', 2, 2, 'x')
axde

This is saying delete the two characters starting at character 2, i.e. “bc”, and insert the string in the fourth parameter, i.e. “x”.

The fourth parameter doesn’t have to be a single character…

SELECT STUFF('abcde', 2, 2, 'alongerstring')
aalongerstringde

The statement won’t error if the third parameter is a bigger number than the number of characters in the original string.
For example, if we know the original string is less than 100 characters long we can run the following to replace everything but the first character…

SELECT STUFF('abcde', 2, 100, 'XXXXXXX')
aXXXXXXX
SELECT STUFF('fghijklmnopq', 2, 100, 'XXXXXXX')
fXXXXXXX

SQL Server now supplies dynamic data masking functions but we could use STUFF to do something similar such as mask all but the last four digits of a credit card.
Card numbers are typically 16 digits long so we could delete the first 12 digits by starting at digit 1 supplying 12 as the third parameter.  We could then replace that deleted part of the string with 12 asterisk characters…

SELECT STUFF('1234567812345678',1 , 12, REPLICATE('*', 12));
************5678

One final thing on STUFF is that it can be used along with FOR XML to concatenate rows of string data into a single string.  Details of this can be found here.

The following functions extract a certain number of characters from a supplied string
This returns the first 2 characters on the left…

SELECT LEFT('abcde', 2);
ab

This returns the last 2 characters…

SELECT RIGHT('abcde', 2);
de

This returns the 3 character starting from character 2…

SELECT SUBSTRING('abcde', 2, 3);
bcd

I’m only going to show one example of the FORMAT function as there is quite a lot to this one. For full information see the MS documentation.
This example will format a supplied UK phone number into a more readable format…

SELECT FORMAT(07777123678, '+44 (0) #### ### ###');

Please note as the value supplied in the first parameter is a number the leading zero is ignored.

There are several other string functions available and please see the MS documentation for full details.

 

 

Forcing default values on a nullable columns

When we add a new column to an existing table that contains data we need to think about if we need to populate the data in this column for the existing rows.

If the column should not allow NULL then we need to supply a default for the new column (or set it to allow NULL, backfill it, then change to not allow NULL).

We can add a default constraint to a column that allows NULL but the default is not applied unless we specifically say to set it

Here’s a demonstration of working with defaults on nullable columns.

First let’s create a table as follows…

DROP TABLE IF EXISTS dbo.WithValuesTest;
GO

CREATE TABLE dbo.WithValuesTest
(
    Id INT NOT NULL IDENTITY PRIMARY KEY,
    FullName VARCHAR(100) NOT NULL
);
GO

INSERT INTO dbo.WithValuesTest
(
    FullName
)
VALUES
    ('Leslie Tambourine'),
    ('Linda Pencilcase'),
    ('Sally Carpets'),
    ('Andy Anchovie'),
    ('Ronnie Clotheshorse');
GO

SELECT *
FROM dbo.WithValuesTest

This gives us the following data…
DefaultNULL01

If we try to add a new not nullable column to this table this errors because there is no data to add into the new column

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NOT NULL;
Msg 4901, Level 16, State 1, Line 86

ALTER TABLE only allows columns to be added that…

  1. can contain nulls, or
  2. have a DEFAULT definition specified, or
  3. the column being added is an identity or timestamp column, or
  4. the table must be empty

The column DateOfBirth cannot be added to the non-empty table WithValuesTest because it does not satisfy these conditions.

However, it we specify a default it will work…

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NOT NULL CONSTRAINT DF_DOB DEFAULT '19000101';

SELECT *
FROM dbo.WithValuesTest

DefaultNULL02

To continue let’s drop this new column

ALTER TABLE dbo.WithValuesTest DROP CONSTRAINT IF EXISTS DF_DOB;
ALTER TABLE dbo.WithValuesTest DROP COLUMN IF EXISTS DateOfBirth;

Now let’s try to create the same column with the same default but now allow the column to be nullable

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NULL CONSTRAINT DF_DOB DEFAULT '19000101';

SELECT *
FROM dbo.WithValuesTest;

DefaultNULL03

As you can see, unlike when the column didn’t allow NULL, the default value has not been used.

Let’s drop the column again and this time create it with the WITH VALUES clause…

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NULL DEFAULT '19000101' WITH VALUES;

SELECT *
FROM dbo.WithValuesTest;

DefaultNULL04

To complete the demo let’s now try and add some new rows.

If we don’t include the DateOfBirth column in the insert list then the default is used…

INSERT INTO dbo.WithValuesTest (FullName)
VALUES ('Bernard Seesaw');

DefaultNULL05

Suppose our code needs to insert the date of birth if it’s supplied but should use the default value if it’s not supplied, i.e. if it’s NULL.

If we just try a straight insert using the NULL value then NULL is what is put into the table.

INSERT INTO dbo.WithValuesTest (FullName, DateOfBirth)
VALUES ('Lizzie Onion', '19830402'),
('Terence Artichoke', NULL);

DefaultNULL06

We can specify DEFAULT instead of NULL and it will use the default value on the insert

INSERT INTO dbo.WithValuesTest (FullName, DateOfBirth)
VALUES ('Mavis Breadbin', DEFAULT);

DefaultNULL07

The DEFAULT value can be used in an UPDATE statement as well…

UPDATE dbo.WithValuesTest
SET DateOfBirth = DEFAULT
WHERE FullName = 'Terence Artichoke'

DefaultNULL08

 

Partitioning Series

I’ve now written a few different posts about various parts of partitioning so thought I would put them all together here in a single series.

  1. Introduction to Partitioning – this post covers how to set up a table with multiple partitions using a partition scheme and function and briefly touches on partition elimination to help reduce the amount of data being read in a query.
  2. More Partitioning – this covers partition elimination in more detail and includes tips to make sure the optimiser knows it can eliminate certain partitions. It also covers the $PARTITION function.
  3. Using $PARTITION to find the last time an event occured – this post demonstrates a real life scenario where I’ve been able to use the $PARTITION function to speed up a query.
  4. 70-761 Exam Prep: Miscellaneous T-SQL – Part 1 – section 3 of this post covers truncating partitions which is new functionality available in SQL Server 2016.
  5. Modifying Partitions – this post shows how to add and remove partitions from an existing partition function.

SSDT: Target Platform and Compatibility Level

I have just seen an issue in one of my database release scripts where I’m getting a load of statements like the following….

PRINT N'Altering [dbo].[TestQueue]...';

GO

PRINT N'No script could be generated for operation 2 on object ''[dbo].[TestQueue]''.';

GO

This is because of the “Target Platform” setting on the properties of the project.  For this database it was set to 2008 as follows:

This needs to be set to the version of the SQL Server that the database sits on in live.  In this case it was SQL Server 2014.

When I changed this it automatically updated the compatibility level of the database to 2014.  However, although sitting on a SQL Server 2014 instance this database is still on 2008 R2 compatibility level so I had to click on “Database Settings” then click on the “Miscellaneous” tab and change it back to 2008 as follows.

Then when I generated the scripts again all the service broker queue alter statements were gone.

In summary, the “Target Platform” needs to be the version of the SQL Server Instance and the “Compatibility level” is the compatibility level of the database itself.

Modifying Partitions

In a previous post I talked about how to partition a table and touched on partition elimination which allows the optimiser to create a query plan where a much smaller amount of data is read.

A lot of the partitioning functions I’ve used in my working life are based on month, usually on reporting tables, where people need to be able to see all historic data but usually are only concerned with the current or last few months. When querying this data, partition elimination allows SQL Server to just read the data on the required partition(s) instead of the whole table. This is very useful if you have a lot of data going back a long time and usually query this data using a date type column.

However, when you first create your partition function based on a date you probably don’t go too far in the future and you might find that all of a sudden two years have passed and all your new data is just being put on a single large latest partition that just keeps growing as the months go by.

Even worse than that you might find the situation I had today where a new database had been recently released to production using an old database schema as a rough template. This database only contains data from March 2018 onward but the partition function for a certain large table was split by month from 1st Jan 2012 to 1st Jan 2016. So there were 48 partitions covering the 4 years from 2012 to 2015 inclusive but all the data is just sitting in the last partition.

What I needed to do was get rid of all the old partitions and just have a single one up to March 2018 and then create new partitions every month for the next few months.

I tried changing the partition function in SSDT and letting that sort things out but it attempts to rebuild the table and drop and recreate everything.

There is a way to do this using the ALTER PARTITION FUNCTION and ALTER PARTITION SCHEME statements and this is how I did it…

To recreate this issue I’ve created a new partition function named PF_MonthlyFinance which has monthly partitions for 2012 to 2015. Then I created a new partition scheme named PS_MonthlyFinance for this function with all partitions on the PRIMARY filegroup. This is the first few rows of the sys.partition_range_values view…

01

… and this is the last few rows…

02

First I thought I’d create the new partitions. We can do this using the split method of the ALTER PARTITION FUNCTION to add the first partition for March 2018.

ALTER PARTITION FUNCTION PF_MonthlyFinance () 
SPLIT RANGE ('20180301');

Now if we look at the last few rows of sys.partition_range_values we can see the new partition…

03

So now let’s add the new one for April 2018…

ALTER PARTITION FUNCTION PF_MonthlyFinance () 
SPLIT RANGE ('20180401');

However, when we run this we get the following error…

Msg 7710, Level 16, State 1, Line 16
Warning: The partition scheme 'PS_MonthlyFinance' does not have any next used filegroup. Partition scheme has not been changed.

You might have noticed that when we created the PARTITION SCHEME the following message was returned…

Partition scheme 'PS_MonthlyFinance' has been created successfully. 'PRIMARY' is marked as the next used filegroup in partition scheme 'PS_MonthlyFinance'.

The next filegroup has already been used when creating the partition for March 2018 so now we need to say what filegroup the next partition should be on. We can do this with the ALTER PARTITION SCHEME statement as follows…

ALTER PARTITION SCHEME PS_MonthlyFinance 
NEXT USED [PRIMARY];

We can re-run the following and now it works fine…

ALTER PARTITION FUNCTION PF_MonthlyFinance () 
SPLIT RANGE ('20180401');

04

Rather than adding each partition manually we can script it out to create a few in one go. I’ve added this code at the bottom of this post.

Before that I want to show how we can get rid of the old redundant partitions. Again we can use the ALTER PARTITION FUNCTION but this time we use the merge method instead of split as follows…

ALTER PARTITION FUNCTION PF_MonthlyFinance () 
MERGE RANGE ('20120101');

Now if we look at the first few rows of sys.partition_range_values we can see the partition for Jan 2012 is now gone…

05

Here’s a single piece of code that can be used to delete all these old partitions…

DECLARE @StartDate DATE = '20120101';
DECLARE @EndDate DATE = '20160101';

WHILE @StartDate < @EndDate
BEGIN

  IF EXISTS (
    SELECT 1
    FROM sys.partition_functions PF
    JOIN sys.partition_range_values PRV ON PRV.function_id = PF.function_id
    WHERE PF.name = 'PF_MonthlyFinance'
    AND PRV.value = @StartDate
  )
  BEGIN

    ALTER PARTITION FUNCTION PF_MonthlyFinance () 
    MERGE RANGE (@StartDate);

  END
 
  SELECT @StartDate = DATEADD(MONTH, 1, @StartDate);

END

We can now see only the two new partitions exist…

06

Now we can use the following to create the next batch of partitions going forward…

DECLARE @StartDate DATE = '20180301';
DECLARE @NumParts INT = 24;
DECLARE @Counter INT = 1;

WHILE @Counter <= @NumParts
BEGIN

  IF NOT EXISTS (
    SELECT 1
    FROM sys.partition_functions PF
    JOIN sys.partition_range_values PRV ON PRV.function_id = PF.function_id
    WHERE PF.name = 'PF_MonthlyFinance'
    AND PRV.value = @StartDate
  )
  BEGIN

    ALTER PARTITION FUNCTION PF_MonthlyFinance () 
    SPLIT RANGE (@StartDate);

    ALTER PARTITION SCHEME PS_MonthlyFinance 
    NEXT USED [PRIMARY];

  END
 
  SELECT @StartDate = DATEADD(MONTH, 1, @StartDate);
  SELECT @Counter += 1;

END

This will create 24 monthly partitions starting from March 2018 and now we can see all the new partitions exist…

07

JSON – Part 3: OPENJSON

In previous posts I demonstrated how to use the FOR JSON clause and how some of the built in JSON scalar functions work. In this final post in this series I’m going to demonstrate how the OPENJSON function works.

There are two ways we can use the OPENJSON function. The first is in discovery mode where the schema of the JSON we are dealing with is unknown. The second is to shred data into usable sets when we know the schema.

Firstly, I’m going to demonstrate how to use it in discovery mode to work out what data is in a supplied piece of JSON.

In this example the JSON is short and formatted so it’s easy to read so we know what’s in it just by looking at it, but when that’s not the case the OPENJSON function can be very useful to see what properties exist within a particular JSON object.

The following JSON has data of a few different types and we can just pass it straight into OPENJSON to return some useful details. The OPENJSON function is a table valued function and always returns a data set. For example…

DECLARE @json NVARCHAR(MAX) =
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
  "Children" : null
}';

SELECT *
FROM OPENJSON(@json);

JSON3 1

… we can see details about each key-value pair in the JSON object along with a number indicating the type.

The type value corresponds to the following:

Value of the Type column JSON data type
0 null
1 string
2 int
3 true/false
4 array
5 object

That example used a single JSON object but if our JSON is an array of JSON objects then the OPENJSON function will shred them into individual rows and the key column contains the ordinal position of the JSON object in the array as follows…

DECLARE @json_array NVARCHAR(MAX) = 
'[
  {
    "CharacterID" : 12,
    "CharacterName" : "Egwene Al''Vere",
    "IsFemale" : true,
    "Address" : {
      "AddressLine1" : "Emonds Field",
      "AddressLine2" : "The Two Rivers"
    },
    "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
    "Children" : null
  },
  {
    "CharacterID" : 13,
    "ChacterName" : "Perrin Aybara",
    "IsFemale" : false,
    "Address" : {
      "AddressLine1" : "Emonds Field",
      "AddressLine2" : "The Two Rivers"
    },
    "Friends" : ["Rand Al''Thor", "Egwene Al''Vere", "Matrim Cauthon", "Nynaeve Al''Merea"],
    "Children" : null
  }
]';

SELECT *
FROM OPENJSON(@json_array);

JSON3 2

If we want to access a single value from an array we can use the second optional parameter to drill down straight to that value as follows…

SELECT *
FROM OPENJSON(@json_array, '$[1]');

JSON3 3

This returns the shredded data from the second JSON object in the array (ordinal positions are zero based).

If the second parameter (the path to start shredding) is not supplied then the entire object specified in the first variable is used. This is the same a specifying “$” as the path. The following two queries are the same…

SELECT *
FROM OPENJSON(@json_array);

SELECT *
FROM OPENJSON(@json_array, '$');

JSON3 4

That covers a few examples of using OPENJSON in discovery mode and now I’m going to show a few examples of how to use it if you know a bit about the schema of the JSON supplied.

We’ve used OPENJSON to discover the schema of our @json_array object so now we can use this and some of the available scalar functions to shred our object into one or more tables.

In this example we’ve got an array and we want to shred that data into individual rows in a data set linked to a header data set containing the rest of the data. One way to do this is as follows…

DECLARE @json NVARCHAR(MAX) =
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
  "Children" : null
}';

SELECT
 JSON_VALUE(@json, '$.CharacterID') AS CharacterID,
 JSON_VALUE(@json, '$.CharacterName') AS CharacterName,
 JSON_VALUE(@json, '$.IsFemale') AS IsFemale,
 JSON_VALUE(@json, '$.Address.AddressLine1') AS AddressLine1,
 JSON_VALUE(@json, '$.Address.AddressLine2') AS AddressLine2,
 JSON_VALUE(@json, '$.Children') AS Children

SELECT
  JSON_VALUE(@json, '$.CharacterID') AS CharacterID,
  OJ.value AS Friend
FROM OPENJSON(@json, '$.Friends') OJ;

JSON3 5

Here’s another example where the array is an array of JSON objects rather than individual values…

DECLARE @json_friend NVARCHAR(MAX) = 
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : [
    {
      "CharacterName" : "Rand Al''Thor",
      "CharacterID" : 7,
      "IsFemale" : false
    },
    {
      "CharacterName" : "Perrin Aybara",
      "CharacterID" : 13,
      "IsFemale" : false
    }, 
    {
      "CharacterName" : "Matrim Cauthon",
      "CharacterID" : 16,
      "IsFemale" : false
    }
  ],
  "Children" : null
}';

SELECT
  JSON_VALUE(@json_friend, '$.CharacterID') AS CharacterID,
  JSON_VALUE(OJ.Value, '$.CharacterName') AS FriendCharacterName,
  JSON_VALUE(OJ.Value, '$.CharacterID') AS FriendCharacterID,
  JSON_VALUE(OJ.Value, '$.IsFemale') AS FriendIsFemale
FROM OPENJSON(@json_friend, '$.Friends') OJ;

JSON3 6

As you can see we can use JSON_VALUE to pull out the individual values from the JSON objects in the Friends key.

There is another way we can shred the JSON using OPENJSON and this is by specifying a schema for the data set returned. For example…

DECLARE @json_known NVARCHAR(MAX) =
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
  "Children" : null
}';

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName VARCHAR(200) '$.CharacterName',
  IsFemale BIT '$.IsFemale',
  Address NVARCHAR(MAX) '$.Address' AS JSON,
  Friends NVARCHAR(MAX) '$.Friends' AS JSON,
  Children VARCHAR(500) '$.Children'
);

JSON3 7

We have to specify the data type for each of the values we want to return in the data set. If the value we are returning is a JSON object or an array then we need to include the AS JSON clause at the end of the column declaration. When we use AS JSON the data type must be NVARCHAR(MAX). If we try to use anything else we get the following error…

AS JSON option can be specified only for column of nvarchar(max) type in WITH clause.

We don’t need to include every column in the data set so the following is fine…

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName VARCHAR(200) '$.CharacterName'
);

JSON3 8

We have to make sure the data types are correct. For example if we try to return a string value into an INT data type we get the following error as expected…

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName INT '$.CharacterName'
);

Conversion failed when converting the nvarchar value 'Egwene Al'Vere' to data type int.

JSON is semi-structured so not every property in the table declaration needs to exist. For example, there is no CharacterAge property in the JSON but the following will still run correctly with NULL returned as the CharacterAge

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterAge INT '$.CharacterAge'
);

JSON3 9

If the JSON object contains embedded JSON or an array we are able to access values inside these by navigating down through the embedded objects or by using the ordinal position of the values in the arrays. We can pull all the values out of the @json_known object as follows…

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName VARCHAR(200) '$.CharacterName',
  IsFemale BIT '$.IsFemale',
  AddressLine1 VARCHAR(500) '$.Address.AddressLine1',
  AddressLine2 VARCHAR(500) '$.Address.AddressLine2',
  AddressLine3 VARCHAR(500) '$.Address.AddressLine3',
  Friends0 VARCHAR(100) '$.Friends[0]',
  Friends1 VARCHAR(100) '$.Friends[1]',
  Friends2 VARCHAR(100) '$.Friends[2]',
  Friends3 VARCHAR(100) '$.Friends[3]',
  Children VARCHAR(500) '$.Children'
);

JSON3 10.PNG

Lax and Strict

The last thing I want to demonstrate are an additional couple of options we can use in the path parameter of the OPENJSON function. These options are lax and strict.

Strict means that the path specified in the second parameter of the OPENJSON function must exist and lax means that it doesn’t have to. Lax is the default if nothing is supplied so all the examples above use the lax option. Here are some examples using these options…

DECLARE @json VARCHAR(500) = '{"Hello": {"Name" : "Lax Strict Test"} }';

SELECT *
FROM OPENJSON(@json, '$.Goodbye');

SELECT *
FROM OPENJSON(@json, 'lax $.Goodbye');

SELECT *
FROM OPENJSON(@json, 'strict $.Hello');

SELECT *
FROM OPENJSON(@json, 'strict $.Goodbye');

JSON3 11

Property cannot be found on the specified JSON path.

As we can see the first two queries are exactly the same: this shows the lax option is the default. The third query (the first using the strict option) returns the data correctly because the Hello key exists. The fourth and final query returns an error because the Goodbye key does not exist and the strict option is used.