JSON – Part 3: OPENJSON

In previous posts I demonstrated how to use the FOR JSON clause and how some of the built-in JSON scalar functions work. In this final post in the series I’m going to demonstrate how the OPENJSON function works.

There are two ways we can use the OPENJSON function. The first is in discovery mode where the schema of the JSON we are dealing with is unknown. The second is to shred data into usable sets when we know the schema.

Firstly, I’m going to demonstrate how to use it in discovery mode to work out what data is in a supplied piece of JSON.

In this example the JSON is short and formatted, so it’s easy to see what’s in it just by looking at it. When that’s not the case, the OPENJSON function can be very useful for discovering what properties exist within a particular JSON object.

The following JSON has data of a few different types and we can pass it straight into OPENJSON to return some useful details. OPENJSON is a table-valued function and always returns a data set. For example…

DECLARE @json NVARCHAR(MAX) =
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
  "Children" : null
}';

SELECT *
FROM OPENJSON(@json);


… we can see details about each key-value pair in the JSON object along with a number indicating the type.

The type value corresponds to the following:

Value of the Type column    JSON data type
0                           null
1                           string
2                           number
3                           true/false (Boolean)
4                           array
5                           object
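The type column isn’t just informational; we can filter on it. For example, sticking with the @json variable above, this quick sketch returns only the rows whose values are arrays or objects…

SELECT [key], [value], [type]
FROM OPENJSON(@json)
WHERE [type] IN (4, 5); -- 4 = array, 5 = object: the Address and Friends rows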

That example used a single JSON object, but if our JSON is an array of JSON objects then OPENJSON shreds them into individual rows, and the key column contains the ordinal position of each object in the array, as follows…

DECLARE @json_array NVARCHAR(MAX) = 
'[
  {
    "CharacterID" : 12,
    "CharacterName" : "Egwene Al''Vere",
    "IsFemale" : true,
    "Address" : {
      "AddressLine1" : "Emonds Field",
      "AddressLine2" : "The Two Rivers"
    },
    "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
    "Children" : null
  },
  {
    "CharacterID" : 13,
    "CharacterName" : "Perrin Aybara",
    "IsFemale" : false,
    "Address" : {
      "AddressLine1" : "Emonds Field",
      "AddressLine2" : "The Two Rivers"
    },
    "Friends" : ["Rand Al''Thor", "Egwene Al''Vere", "Matrim Cauthon", "Nynaeve Al''Merea"],
    "Children" : null
  }
]';

SELECT *
FROM OPENJSON(@json_array);


If we want to access a single element of the array we can use the second optional parameter to drill straight down to it as follows…

SELECT *
FROM OPENJSON(@json_array, '$[1]');


This returns the shredded data from the second JSON object in the array (ordinal positions are zero-based).

If the second parameter (the path to start shredding) is not supplied then the entire object specified in the first parameter is used. This is the same as specifying “$” as the path. The following two queries are equivalent…

SELECT *
FROM OPENJSON(@json_array);

SELECT *
FROM OPENJSON(@json_array, '$');


That covers a few examples of using OPENJSON in discovery mode, so now I’m going to show a few examples of how to use it when you know something about the schema of the supplied JSON.

We’ve used OPENJSON to discover the schema of our @json_array object so now we can use this and some of the available scalar functions to shred our object into one or more tables.

In this example we’ve got an array and we want to shred that data into individual rows in a data set linked to a header data set containing the rest of the data. One way to do this is as follows…

DECLARE @json NVARCHAR(MAX) =
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
  "Children" : null
}';

SELECT
 JSON_VALUE(@json, '$.CharacterID') AS CharacterID,
 JSON_VALUE(@json, '$.CharacterName') AS CharacterName,
 JSON_VALUE(@json, '$.IsFemale') AS IsFemale,
 JSON_VALUE(@json, '$.Address.AddressLine1') AS AddressLine1,
 JSON_VALUE(@json, '$.Address.AddressLine2') AS AddressLine2,
 JSON_VALUE(@json, '$.Children') AS Children

SELECT
  JSON_VALUE(@json, '$.CharacterID') AS CharacterID,
  OJ.value AS Friend
FROM OPENJSON(@json, '$.Friends') OJ;


Here’s another example where the array is an array of JSON objects rather than individual values…

DECLARE @json_friend NVARCHAR(MAX) = 
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : [
    {
      "CharacterName" : "Rand Al''Thor",
      "CharacterID" : 7,
      "IsFemale" : false
    },
    {
      "CharacterName" : "Perrin Aybara",
      "CharacterID" : 13,
      "IsFemale" : false
    }, 
    {
      "CharacterName" : "Matrim Cauthon",
      "CharacterID" : 16,
      "IsFemale" : false
    }
  ],
  "Children" : null
}';

SELECT
  JSON_VALUE(@json_friend, '$.CharacterID') AS CharacterID,
  JSON_VALUE(OJ.Value, '$.CharacterName') AS FriendCharacterName,
  JSON_VALUE(OJ.Value, '$.CharacterID') AS FriendCharacterID,
  JSON_VALUE(OJ.Value, '$.IsFemale') AS FriendIsFemale
FROM OPENJSON(@json_friend, '$.Friends') OJ;


As you can see we can use JSON_VALUE to pull out the individual values from the JSON objects in the Friends key.

There is another way we can shred the JSON using OPENJSON and this is by specifying a schema for the data set returned. For example…

DECLARE @json_known NVARCHAR(MAX) =
'{
  "CharacterID" : 12,
  "CharacterName" : "Egwene Al''Vere",
  "IsFemale" : true,
  "Address" : {
    "AddressLine1" : "Emonds Field",
    "AddressLine2" : "The Two Rivers"
  },
  "Friends" : ["Rand Al''Thor", "Perrin Aybara", "Matrim Cauthon"],
  "Children" : null
}';

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName VARCHAR(200) '$.CharacterName',
  IsFemale BIT '$.IsFemale',
  Address NVARCHAR(MAX) '$.Address' AS JSON,
  Friends NVARCHAR(MAX) '$.Friends' AS JSON,
  Children VARCHAR(500) '$.Children'
);


We have to specify the data type for each of the values we want to return in the data set. If the value we are returning is a JSON object or an array then we need to include the AS JSON clause at the end of the column declaration. When we use AS JSON the data type must be NVARCHAR(MAX). If we try to use anything else we get the following error…

AS JSON option can be specified only for column of nvarchar(max) type in WITH clause.
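For example, the following (reusing the @json_known object from above, with an illustrative VARCHAR length) raises that error…

SELECT *
FROM OPENJSON(@json_known)
WITH
(
  -- AS JSON is only allowed on an NVARCHAR(MAX) column, so this fails
  Friends VARCHAR(4000) '$.Friends' AS JSON
);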

We don’t need to include every column in the data set so the following is fine…

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName VARCHAR(200) '$.CharacterName'
);


We have to make sure the data types are correct. For example if we try to return a string value into an INT data type we get the following error as expected…

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName INT '$.CharacterName'
);

Conversion failed when converting the nvarchar value 'Egwene Al'Vere' to data type int.

JSON is semi-structured, so not every property in the table declaration needs to exist. For example, there is no CharacterAge property in the JSON but the following still runs correctly, with NULL returned as the CharacterAge…

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterAge INT '$.CharacterAge'
);


If the JSON object contains embedded JSON or an array we are able to access values inside these by navigating down through the embedded objects or by using the ordinal position of the values in the arrays. We can pull all the values out of the @json_known object as follows…

SELECT *
FROM OPENJSON(@json_known)
WITH 
(
  CharacterID INT '$.CharacterID',
  CharacterName VARCHAR(200) '$.CharacterName',
  IsFemale BIT '$.IsFemale',
  AddressLine1 VARCHAR(500) '$.Address.AddressLine1',
  AddressLine2 VARCHAR(500) '$.Address.AddressLine2',
  AddressLine3 VARCHAR(500) '$.Address.AddressLine3',
  Friends0 VARCHAR(100) '$.Friends[0]',
  Friends1 VARCHAR(100) '$.Friends[1]',
  Friends2 VARCHAR(100) '$.Friends[2]',
  Friends3 VARCHAR(100) '$.Friends[3]',
  Children VARCHAR(500) '$.Children'
);


Lax and Strict

The last thing I want to demonstrate is an additional couple of options we can use in the path parameter of the OPENJSON function: lax and strict.

Strict means that the path specified in the second parameter of the OPENJSON function must exist and lax means that it doesn’t have to. Lax is the default if nothing is supplied so all the examples above use the lax option. Here are some examples using these options…

DECLARE @json VARCHAR(500) = '{"Hello": {"Name" : "Lax Strict Test"} }';

SELECT *
FROM OPENJSON(@json, '$.Goodbye');

SELECT *
FROM OPENJSON(@json, 'lax $.Goodbye');

SELECT *
FROM OPENJSON(@json, 'strict $.Hello');

SELECT *
FROM OPENJSON(@json, 'strict $.Goodbye');


Property cannot be found on the specified JSON path.

As we can see, the first two queries return exactly the same results, which shows that lax is the default. The third query (the first using the strict option) returns the data correctly because the Hello key exists. The fourth and final query returns an error because the Goodbye key does not exist and the strict option is used.


JSON – Part 2: Built in Functions

In a previous post I demonstrated how to use the FOR JSON clause and in this post I’m going to show some of the built-in JSON functions. In a future post I’ll show how the OPENJSON function works.

JSON in SQL Server 2016

Built-in support for JSON was introduced in SQL Server 2016. There is no native JSON data type in the way that there is an XML data type, but there are built-in functions that allow us to manipulate JSON and convert data sets into JSON. As mentioned above, in a previous post I covered converting data sets into JSON using the FOR JSON clause, so let’s now look at a few demos extracting values out of and storing JSON data.

Validating JSON

When we use JSON in SQL Server we would usually have it in a text string.

Here’s a simple example…

DECLARE @json AS NVARCHAR(MAX) = N'{
   "Customer" : {
       "Id" : 100,
       "Name" : "Lionel Teddington",
       "Address" : {
           "AddrType" : "Home",
           "AddrLine1" : "1 The Road",
           "AddrLine2" : "My Village",
           "Postcode" : "AA1 2BB"
       }
    }
}';

So now we’ve got a JSON object stored in the variable @json.

One of the built-in functions, ISJSON, returns 1 if the supplied string is correctly formatted JSON and 0 if it is not.

SELECT ISJSON(@json) AS IsValidJson;

The 1 value returned by this function tells us that the JSON is valid.

If we try the following…

DECLARE @json AS NVARCHAR(MAX) = N'{
    "Customer" : {
        "Id" : 100,
        "Name" : "Lionel Teddington",
        "Address" : {
            "AddrType" : "Home",
            "AddrLine1" : "1 The Road",
            "AddrLine2" : "My Village,
            "Postcode" : "AA1 2BB"
        }
    }
}';
 
SELECT ISJSON(@json) AS IsValidJson;

… we can see the My Village value is missing the closing double quote which makes the JSON invalid, hence the value of 0 being returned.

Extracting values from JSON

There are two built-in functions that we can use to extract values from a JSON object. The first is JSON_VALUE, which extracts a scalar value for a particular key in the JSON object. The second is JSON_QUERY, which returns a JSON object or array (as a text string) embedded within the JSON.

If we use the valid JSON object from above we can extract the customer name as follows…

SELECT JSON_VALUE(@json, '$.Customer.Name') AS CustomerName;

As we can see the JSON_VALUE function accepts two parameters. The first is the JSON object itself and the second is the path to the key corresponding to the value we want to extract. The $ (dollar sign) is used to represent the JSON object specified in the first parameter and from this point we can navigate down through the key-value pairs to get the required value.

As mentioned above, the JSON is just a text string and the JSON_VALUE function doesn’t perform any validation. The same query will work if we use the invalid JSON object from the example above…

DECLARE @json AS NVARCHAR(MAX) = N'{
    "Customer" : {
        "Id" : 100,
        "Name" : "Lionel Teddington",
        "Address" : {
            "AddrType" : "Home",
            "AddrLine1" : "1 The Road",
            "AddrLine2" : "My Village,
            "Postcode" : "AA1 2BB"
        }
    }
}';

SELECT ISJSON(@json) AS IsValidJson;

-- if the JSON is invalid you can still query it
 SELECT JSON_VALUE(@json, '$.Customer.Name') AS CustomerName;

… but we can only extract values up to the point where the JSON object becomes invalid. In this example the JSON becomes invalid after the My Village value as there is a missing double quote. So if we try to extract the value for the Postcode which appears after the My Village value we see the following error…

SELECT ISJSON(@json) AS IsValidJson;

-- if the JSON is invalid you can still query it
 SELECT JSON_VALUE(@json, '$.Customer.Name') AS CustomerName;
 -- but only up to the point where the JSON becomes invalid
 SELECT JSON_VALUE(@json, '$.Customer.Address.Postcode') AS Postcode;

(1 row affected)

(1 row affected)
 Msg 13609, Level 16, State 2, Line 41
 JSON text is not properly formatted. Unexpected character '"' is found at position 160.

The JSON is treated as case sensitive, so the following returns NULL on the valid JSON example because the path uses name with a lower case n while the key in the JSON is Name…

SELECT JSON_VALUE(@json, '$.Customer.name') AS CustomerName;

If we look at the valid JSON example we can see that the Address key has a JSON object as its value. JSON_VALUE can’t return this (it just returns NULL), but we can use JSON_QUERY as follows…

SELECT JSON_QUERY(@json, '$.Customer.Address') AS AddressJson;

Like JSON_VALUE this function accepts a JSON object as the first parameter and then a path to the embedded JSON as the second parameter.

Modifying JSON objects

The fourth and final function I’m going to demonstrate is the JSON_MODIFY function which allows us to modify values for certain keys inside the supplied JSON.

For example, we can run the following to change the CustomerID value from 100 to 101…

SELECT JSON_MODIFY(@json, '$.Customer.Id', '101');


As you can see, the JSON_MODIFY function returns the modified JSON object. It’s worth noting that this function doesn’t change the value in the existing JSON object; it just returns the amended JSON.

We can run the following to show this is the case…

SELECT JSON_MODIFY(@json, '$.Customer.Id', '101');
-- we don't actually see the data changed in the variable. modify just returns the amended JSON
SELECT JSON_VALUE(@json, '$.Customer.Id') AS NewCustomerId;


… and we can see the CustomerID in @json is still 100.

What we can do is declare another variable and store the results of the JSON_MODIFY function there and use that variable in any further processing…

DECLARE @modifiedJson NVARCHAR(MAX);
SELECT @modifiedJson = JSON_MODIFY(@json, '$.Customer.Id', '101');
SELECT JSON_VALUE(@modifiedJson, '$.Customer.Id') AS NewCustomerId;


Combining JSON_VALUE and JSON_QUERY

Now let’s have a look at a more complicated piece of JSON. In this example the Customer key has an array of JSON objects as its value. If we try to pull out the customer Name value using the JSON_VALUE function we just get NULL returned…

DECLARE @json AS NVARCHAR(MAX) = N'{
  "Customer" : [
    {
      "Id" : 100,
      "Name" : "Lionel Teddington",
      "Address" : {
        "AddrType" : "Home", 
        "AddrLine1" : "1 The Road",
        "AddrLine2" : "My Village",
        "Postcode" : "AA1 2BB"
      }
    },
    {
      "Id" : 101,
      "Name" : "Carolina Sweetcorn",
      "Address" : {
        "AddrType" : "Home", 
        "AddrLine1" : "1 The Road",
        "AddrLine2" : "My Village",
        "Postcode" : "AA1 2BB"
      }
    }
  ],
  "Shop" : {
    "ShopName" : "The best shop in the world",
    "ShopCode" : "0385920",
    "ShopAddress" : {
      "ShopPostCode" : "AA1 2BC"
    }
  }
}';

SELECT ISJSON(@json)

SELECT JSON_VALUE(@json, '$.Customer.Name') AS CustomerName;


What we need to do is get the array of JSON objects using JSON_QUERY and then use JSON_VALUE to extract the required values from there…

-- must use JSON_QUERY for arrays
SELECT JSON_QUERY(@json, '$.Customer') AS CustomerJSON;

SELECT JSON_VALUE(
  JSON_QUERY(@json, '$.Customer'),
  '$.Name'
) AS CustomerName;


The first SELECT statement shows the array being returned by the JSON_QUERY function but the second SELECT statement is still returning NULL.

This is because with arrays we need to specify which ordinal in the array we want to return the value for.

We can return the name for both customers in our array of JSON objects as follows…

SELECT
  JSON_VALUE(
    JSON_QUERY(@json, '$.Customer'),
    '$[0].Name'
  ) AS FirstCustomerName,
  JSON_VALUE(
    JSON_QUERY(@json, '$.Customer'),
    '$[1].Name'
  ) AS SecondCustomerName;


Please note that the first ordinal is 0, not 1.

We have two JSON objects in our example, but the ordinal is still required even when the array contains only one.
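As a quick sketch of that last point, here’s a hypothetical array containing a single customer; the [0] ordinal is still needed to reach the values…

DECLARE @one_customer NVARCHAR(MAX) = N'{ "Customer" : [ { "Id" : 100, "Name" : "Lionel Teddington" } ] }';

-- with the ordinal we get the value; without it JSON_VALUE just returns NULL
SELECT JSON_VALUE(JSON_QUERY(@one_customer, '$.Customer'), '$[0].Name') AS WithOrdinal;
SELECT JSON_VALUE(JSON_QUERY(@one_customer, '$.Customer'), '$.Name') AS WithoutOrdinal;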

Storing Valid JSON

As previously mentioned there is no JSON data type but we may want to have some validation on any JSON objects that we store in our database.

Let’s create a new table that has just one column that we want to store some JSON in…

CREATE TABLE dbo.JSONDemo
(
  JsonDoc NVARCHAR(MAX)
);

… and now let’s insert some invalid JSON – note the missing double quote after the World value…

INSERT INTO dbo.JSONDemo
(
  JsonDoc
)
VALUES
(N'{ "Hello" : "World}');

SELECT *
FROM dbo.JSONDemo JD


This insert works without any errors; note the missing double quote in the results.

Now let’s clear out the table and add a check constraint using the ISJSON function…

TRUNCATE TABLE dbo.JSONDemo;

ALTER TABLE dbo.JSONDemo ADD CONSTRAINT CK_ValidJsonOnly CHECK (ISJSON(JsonDoc) = 1);

This means that now only valid JSON can be added to our table. If we try to insert the invalid JSON we get the following…

INSERT INTO dbo.JSONDemo
(
  JsonDoc
)
VALUES
(N'{ "Hello" : "World}');

Msg 547, Level 16, State 0, Line 146
The INSERT statement conflicted with the CHECK constraint "CK_ValidJsonOnly". The conflict occurred in database "SimonBlog", table "dbo.JSONDemo", column 'JsonDoc'.
The statement has been terminated.

However, if we correct the JSON and add the missing closing double quote we can insert successfully…

INSERT INTO dbo.JSONDemo
(
  JsonDoc
)
VALUES
(N'{ "Hello" : "World"}');

SELECT *
FROM dbo.JSONDemo JD


The insert was successful; note that all the double quotes are now present.

JSON – Part 1: FOR JSON

JSON support was introduced in SQL Server 2016 and is part of the 70-761 – Querying Data with Transact-SQL exam.

Some of it works in a similar way to the XML functionality that has been around for a long time. For example, there is a FOR JSON clause that returns data in JSON format in a similar way to how FOR XML returns data in XML format. There are other functions available for reading and processing JSON data and I’ll cover some of those in future posts as part of my prep for exam 70-761.

Full details of JSON in SQL Server can be found here.

FOR JSON

Let’s create a table containing a few different data types and see how the FOR JSON clause works.

DROP TABLE IF EXISTS dbo.ForJsonTest1;
GO

CREATE TABLE dbo.ForJsonTest1
(
    IntColumn INT NOT NULL,
    BitColumn BIT NOT NULL,
    DecColumn DECIMAL(5,2) NOT NULL,
    FixedCharColumn CHAR(5) NOT NULL,
    VariableCharColumn VARCHAR(10) NOT NULL,
    DateColumn DATE NOT NULL,
    DateTimeColumn DATETIME2(7) NOT NULL,
    BinaryColumn VARBINARY(10) NOT NULL,
    MoneyColumn MONEY NOT NULL,
    UniqueIDColumn UNIQUEIDENTIFIER NOT NULL,
    NullableColumn VARCHAR(10) NULL
);
GO

INSERT INTO dbo.ForJsonTest1
(
    IntColumn,
    BitColumn,
    DecColumn,
    FixedCharColumn,
    VariableCharColumn,
    DateColumn,
    DateTimeColumn,
    BinaryColumn,
    MoneyColumn,
    UniqueIDColumn,
    NullableColumn
)
VALUES
(
    100,
    1,
    12.34,
    'AB',
    'ABCD',
    '20180301',
    '20180302 15:12',
    123,
    12.34,
    NEWID(),
    NULL
),
(
    250,
    0,
    54.21,
    'QWERT',
    'QWERTY',
    '20180302',
    '20180303 11:12',
    123,
    12.34,
    NEWID(),
    'NotNull'
);
GO

SELECT *
FROM dbo.ForJsonTest1;
GO

Here’s what the inserted data looks like in the table…


AUTO

We can now add the FOR JSON clause which has two different options. The simplest one is AUTO which works as follows…

SELECT *
FROM dbo.ForJsonTest1
FOR JSON AUTO
GO

This returns the JSON as a single value (displayed as a link in the results grid) as follows…


… and when formatted it looks like this…

[
 {
 "IntColumn": 100,
 "BitColumn": true,
 "DecColumn": 12.34,
 "FixedCharColumn": "AB ",
 "VariableCharColumn": "ABCD",
 "DateColumn": "2018-03-01",
 "DateTimeColumn": "2018-03-02T15:12:00",
 "BinaryColumn": "AAAAew==",
 "MoneyColumn": 12.34,
 "UniqueIDColumn": "7A53DDE9-C97F-4384-AB1B-D23BA5AB3ED5"
 },
 {
 "IntColumn": 250,
 "BitColumn": false,
 "DecColumn": 54.21,
 "FixedCharColumn": "QWERT",
 "VariableCharColumn": "QWERTY",
 "DateColumn": "2018-03-02",
 "DateTimeColumn": "2018-03-03T11:12:00",
 "BinaryColumn": "AAAAew==",
 "MoneyColumn": 12.34,
 "UniqueIDColumn": "F36DBD9F-9B01-4C15-B562-BB0B9DF6A689",
 "NullableColumn": "NotNull"
 }
]

Certain SQL Server data types get converted to certain JSON data types and details of each data type can be found here.

We can add a root to the JSON similar to how the XML root works…

SELECT *
FROM dbo.ForJsonTest1
FOR JSON AUTO, ROOT('JSONTesting');
GO
{
 "JSONTesting": [
 {
 "IntColumn": 100,
 "BitColumn": true,
 "DecColumn": 12.34,
 "FixedCharColumn": "AB ",
 "VariableCharColumn": "ABCD",
 "DateColumn": "2018-03-01",
 "DateTimeColumn": "2018-03-02T15:12:00",
 "BinaryColumn": "AAAAew==",
 "MoneyColumn": 12.34,
 "UniqueIDColumn": "7A53DDE9-C97F-4384-AB1B-D23BA5AB3ED5"
 },
 {
 "IntColumn": 250,
 "BitColumn": false,
 "DecColumn": 54.21,
 "FixedCharColumn": "QWERT",
 "VariableCharColumn": "QWERTY",
 "DateColumn": "2018-03-02",
 "DateTimeColumn": "2018-03-03T11:12:00",
 "BinaryColumn": "AAAAew==",
 "MoneyColumn": 12.34,
 "UniqueIDColumn": "F36DBD9F-9B01-4C15-B562-BB0B9DF6A689",
 "NullableColumn": "NotNull"
 }
 ]
}

We can include null values as follows so the NullableColumn is always returned…

SELECT *
FROM dbo.ForJsonTest1
FOR JSON AUTO, INCLUDE_NULL_VALUES;
GO
[
 {
 "IntColumn": 100,
 "BitColumn": true,
 "DecColumn": 12.34,
 "FixedCharColumn": "AB ",
 "VariableCharColumn": "ABCD",
 "DateColumn": "2018-03-01",
 "DateTimeColumn": "2018-03-02T15:12:00",
 "BinaryColumn": "AAAAew==",
 "MoneyColumn": 12.34,
 "UniqueIDColumn": "7A53DDE9-C97F-4384-AB1B-D23BA5AB3ED5",
 "NullableColumn": null
 },
 {
 "IntColumn": 250,
 "BitColumn": false,
 "DecColumn": 54.21,
 "FixedCharColumn": "QWERT",
 "VariableCharColumn": "QWERTY",
 "DateColumn": "2018-03-02",
 "DateTimeColumn": "2018-03-03T11:12:00",
 "BinaryColumn": "AAAAew==",
 "MoneyColumn": 12.34,
 "UniqueIDColumn": "F36DBD9F-9B01-4C15-B562-BB0B9DF6A689",
 "NullableColumn": "NotNull"
 }
]

We can use the FOR JSON clause when joining tables together and we get nested JSON as follows…

DROP TABLE IF EXISTS dbo.Joining;
GO

CREATE TABLE dbo.Joining
(
    IntColumn INT NOT NULL,
    SomeText VARCHAR(100) NOT NULL
);
GO

INSERT INTO dbo.Joining (IntColumn, SomeText)
VALUES
(100, 'AAAAA'),
(100, 'BBBBB'),
(100, 'CCCCC');
GO
SELECT
 F.IntColumn,
 F.BitColumn,
 F.DecColumn,
 F.FixedCharColumn,
 F.VariableCharColumn,
 F.DateColumn,
 F.DateTimeColumn,
 F.BinaryColumn,
 F.MoneyColumn,
 F.UniqueIDColumn,
 F.NullableColumn,
 J.SomeText
FROM dbo.ForJsonTest1 F
LEFT JOIN dbo.Joining J ON J.IntColumn = F.IntColumn
FOR JSON AUTO;
GO

[
 {
 "IntColumn": 100,
 "BitColumn": true,
 "DecColumn": 12.34,
 "FixedCharColumn": "AB ",
 "VariableCharColumn": "ABCD",
 "DateColumn": "2018-03-01",
 "DateTimeColumn": "2018-03-02T15:12:00",
 "BinaryColumn": "AAAAew==",
 "MoneyColumn": 12.34,
 "UniqueIDColumn": "7A53DDE9-C97F-4384-AB1B-D23BA5AB3ED5",
 "J": [
   {"SomeText": "AAAAA"},
   {"SomeText": "BBBBB"},
   {"SomeText": "CCCCC"}
 ]
 },
 {
 "IntColumn": 250,
 "BitColumn": false,
 "DecColumn": 54.21,
 "FixedCharColumn": "QWERT",
 "VariableCharColumn": "QWERTY",
 "DateColumn": "2018-03-02",
 "DateTimeColumn": "2018-03-03T11:12:00",
 "BinaryColumn": "AAAAew==",
 "MoneyColumn": 12.34,
 "UniqueIDColumn": "F36DBD9F-9B01-4C15-B562-BB0B9DF6A689",
 "NullableColumn": "NotNull",
 "J": [
   {}
 ]
 }
]

PATH

We get much more control using PATH instead of AUTO and can easily nest JSON using aliases separated with dots as follows.

SELECT
   IntColumn AS 'Numbers.IntColumn',
   BitColumn AS 'Numbers.BitColumn',
   DecColumn AS 'Numbers.DecColumn',
   MoneyColumn AS 'Numbers.MoneyColumn',
   FixedCharColumn AS 'Strings.FixedCharColumn',
   VariableCharColumn AS 'Strings.VariableCharColumn',
   NullableColumn AS 'Strings.NullableColumn',
   DateColumn AS 'Dates.DateColumn',
   DateTimeColumn AS 'Dates.DateTimeColumn',
   BinaryColumn AS 'Boolean.BinaryColumn',
   UniqueIDColumn AS 'Others.UniqueIDColumn'
FROM dbo.ForJsonTest1
FOR JSON PATH;
GO

[
 {
 "Numbers": {
   "IntColumn": 100,
   "BitColumn": true,
   "DecColumn": 12.34,
   "MoneyColumn": 12.34
 },
 "Strings": {
   "FixedCharColumn": "AB ",
   "VariableCharColumn": "ABCD"
 },
 "Dates": {
   "DateColumn": "2018-03-01",
   "DateTimeColumn": "2018-03-02T15:12:00"
 },
 "Boolean": {
   "BinaryColumn": "AAAAew=="
 },
 "Others": {
   "UniqueIDColumn": "7A53DDE9-C97F-4384-AB1B-D23BA5AB3ED5"
 }
 },
 {
 "Numbers": {
   "IntColumn": 250,
   "BitColumn": false,
   "DecColumn": 54.21,
   "MoneyColumn": 12.34
 },
 "Strings": {
   "FixedCharColumn": "QWERT",
   "VariableCharColumn": "QWERTY",
   "NullableColumn": "NotNull"
 },
 "Dates": {
   "DateColumn": "2018-03-02",
   "DateTimeColumn": "2018-03-03T11:12:00"
 },
 "Boolean": {
   "BinaryColumn": "AAAAew=="
 },
 "Others": {
   "UniqueIDColumn": "F36DBD9F-9B01-4C15-B562-BB0B9DF6A689"
 }
 }
]

Please note, you need to make sure the columns inside nested JSON are all next to each other in the SELECT statement. The following query generates an error because the Numbers columns and the Strings columns are not grouped together…

SELECT
 IntColumn AS 'Numbers.IntColumn',
 BitColumn AS 'Numbers.BitColumn',
 DecColumn AS 'Numbers.DecColumn',
 FixedCharColumn AS 'Strings.FixedCharColumn',
 VariableCharColumn AS 'Strings.VariableCharColumn',
 DateColumn AS 'Dates.DateColumn',
 DateTimeColumn AS 'Dates.DateTimeColumn',
 BinaryColumn AS 'Boolean.BinaryColumn',
 MoneyColumn AS 'Numbers.MoneyColumn',
 UniqueIDColumn AS 'Others.UniqueIDColumn',
 NullableColumn AS 'Strings.NullableColumn'
FROM dbo.ForJsonTest1
FOR JSON PATH;
GO

Msg 13601, Level 16, State 1, Line 77
Property 'Numbers.MoneyColumn' cannot be generated in JSON output due to a conflict with another column name or alias. Use different names and aliases for each column in SELECT list.

Dynamic Data Masking

As part of my prep for exam 70-761 (Querying Data with Transact-SQL) I’m writing about a few of the new features available in SQL Server 2016 that weren’t included in the 70-461 exam I took a few years ago based on SQL Server 2012.

Dynamic Data Masking is one of these new features and can be used to restrict certain users from viewing sensitive data. With the General Data Protection Regulation (GDPR) coming into force on 25th May 2018 this may be something that companies can use to ensure that people who don’t need it cannot see certain personally identifiable information (PII). The data is not actually stored any differently; it is just masked upon read.

There are four functions available to mask the data: default, random, partial and email. I’ll go into more details on each of these below but first I’m going to quickly demonstrate the default function so I can then talk about permissions.
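As a preview, here’s roughly what declaring each of the four functions looks like (a sketch with illustrative column names and parameters)…

CREATE TABLE DDM.MaskSyntaxDemo
(
  SomeText VARCHAR(100) MASKED WITH (FUNCTION = 'default()'),          -- full mask, e.g. xxxx for strings
  SomeEmail VARCHAR(500) MASKED WITH (FUNCTION = 'email()'),           -- first letter kept, e.g. aXXX@XXXX.com
  SomeNumber INT MASKED WITH (FUNCTION = 'random(1, 100)'),            -- random value in the given range
  SomeCode VARCHAR(10) MASKED WITH (FUNCTION = 'partial(2, "XXX", 1)') -- keep first 2 and last 1 characters
);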

I’m going to connect to my SimonBlog database which is SQL Server 2016 compatibility level and create a new table that contains PII.

USE SimonBlog
GO

CREATE SCHEMA DDM;
GO

CREATE TABLE DDM.MaskingDemo1
(
 Id INT NOT NULL IDENTITY(1,1),
 FirstName VARCHAR(50) NOT NULL,
 MiddleName VARCHAR(50) NULL,
 LastName VARCHAR(50) NOT NULL,
 Email VARCHAR(500) NULL,
 Address1 VARCHAR(100) NULL,
 Address2 VARCHAR(100) NULL,
 PostCode VARCHAR(10) NULL,
 DateOfBirth DATE NOT NULL,
 Salary INT NULL
);
GO

INSERT INTO DDM.MaskingDemo1
VALUES
('Anna', NULL, 'Testing', 'anna.learning@data.co.uk', '1 My Road', 'My Town', 'QQ1 4ZZ', '1985-07-04', 20000),
('Matilda', 'Mary', 'Test', 'matilda.mary@test.com', '100 The Street', 'Big City', 'AQ123ER', '1965-12-01', 50000);
GO

I can now view this data as follows

SELECT *
FROM DDM.MaskingDemo1


I’m now going to add the default masking function on the FirstName column as follows

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN FirstName 
  ADD MASKED WITH (FUNCTION='default()');

It’s as simple as that and now I can talk about how the permissions work with DDM.
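As an aside, we can check which columns in the database are masked, and with which function, by querying the sys.masked_columns catalog view…

SELECT OBJECT_NAME(object_id) AS TableName,
       name AS ColumnName,
       masking_function
FROM sys.masked_columns;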

Permissions

The masking is based solely on permissions. I’m a sysadmin, and members of the sysadmin role effectively hold the UNMASK permission on every database, so the masking is never going to be displayed for me: if I select all the data from the table above I don’t see any masking. So, to demonstrate how the masking works, I’m going to create a new SQL user as follows:

USE master
GO

CREATE LOGIN DDMTestUser WITH PASSWORD = 'DDMTest1'
GO

USE SimonBlog
GO

CREATE USER DDMTestUser FROM LOGIN DDMTestUser;
GO

GRANT CONNECT ON DATABASE::SimonBlog TO DDMTestUser;
GO

GRANT SELECT ON OBJECT::DDM.MaskingDemo1 TO DDMTestUser;
GO

If I now run the following…

EXECUTE AS USER = 'DDMTestUser';
SELECT *
FROM DDM.MaskingDemo1;
REVERT;

… I get the following results


You can see that the default mask on the FirstName column has replaced each text value with four ‘x’ characters regardless of the length of the original value: the seven-character ‘Matilda’ has been replaced by xxxx.

If it’s decided that the user DDMTestUser needs to be able to see the PII we can remove the mask as follows…

GRANT UNMASK TO DDMTestUser;
GO

If I now run the following I can see all the data…

EXECUTE AS USER = 'DDMTestUser';
SELECT *
FROM DDM.MaskingDemo1;
REVERT;


As you can see, the UNMASK permission is database-wide and all or nothing: when reading data, masking is applied to every masked table and column in the database unless you have the UNMASK permission, in which case everything is visible.

To turn it back off again I can run the following…

REVOKE UNMASK TO DDMTestUser;
GO

Now let’s give DDMTestUser extra permissions to create a new table. We’ll create the table with one column using the default mask and then insert some data.

GRANT CREATE TABLE ON DATABASE::SimonBlog TO DDMTestUser;
GO

GRANT ALTER, SELECT, INSERT, UPDATE, DELETE ON SCHEMA::DDM TO DDMTestUser;
GO

EXECUTE AS USER = 'DDMTestUser';
CREATE TABLE DDM.MaskingDemo2
(
  SomeText VARCHAR(100) MASKED WITH (FUNCTION='default()'),
  SomeMoreText VARCHAR(100)
);
GO

INSERT INTO DDM.MaskingDemo2 (SomeText, SomeMoreText)
VALUES ('This is a fairly long piece of text', 'And here is a short one');

SELECT *
FROM DDM.MaskingDemo2;
GO

REVERT;


So even though DDMTestUser inserted the unmasked data, when trying to read it they only see the mask because they do not have the unmask permission.

There is a second column in this table that is unmasked, so let's now try to mask that one too…

EXECUTE AS USER = 'DDMTestUser';
ALTER TABLE DDM.MaskingDemo2
  ALTER COLUMN SomeMoreText 
  ADD MASKED WITH (FUNCTION='default()');
REVERT;

Msg 15247, Level 16, State 5, Line 51
User does not have permission to perform this action.

As we saw above, a user can create a new table with masked columns without any masking-specific permission, but to add a mask to an existing column they need the ALTER ANY MASK permission on the database, which can be granted as follows…

GRANT ALTER ANY MASK ON DATABASE::SimonBlog TO DDMTestUser;
GO

DDMTestUser can now add the mask to the SomeMoreText column

EXECUTE AS USER = 'DDMTestUser';
ALTER TABLE DDM.MaskingDemo2
  ALTER COLUMN SomeMoreText 
  ADD MASKED WITH (FUNCTION='default()');
GO
SELECT *
FROM DDM.MaskingDemo2;
GO
REVERT;

DDM4

As we saw above a user can add data to a masked column but will not be able to see the data when reading without the unmask permission. The same applies for updating existing data. If I run the following and then try to read the data I still see the mask.

EXECUTE AS USER = 'DDMTestUser';
UPDATE DDM.MaskingDemo2
SET SomeText = 'this is some even longer text than was here before';
GO
SELECT *
FROM DDM.MaskingDemo2;
REVERT;
GO

DDM4

That's a lot of the permissions covered, so now let's look at the four masking functions and how they differ.

Masking Functions

Default

We've covered a few examples of the default mask above, where the text data in the FirstName column is masked with xxxx, but what about values shorter than four characters? If I add some nonsense data with lengths of one, two and three characters we can see how it looks masked.

INSERT INTO DDM.MaskingDemo1 (FirstName, LastName, DateOfBirth)
VALUES ('a', 'x', '19900502'), ('ab', 'xy', '19800603'), ('abc', 'xyz', '19701208')
GO

The documentation says that the mask can be fewer than four x characters if the text is less than four characters long, but this doesn't seem to be the case.

Unmasked…

DDM5

Masked…

DDM6

It looks like text data types are always masked with xxxx, so let's now add the default mask to the DateOfBirth and Salary columns…

ALTER TABLE DDM.MaskingDemo1
 ALTER COLUMN DateOfBirth 
 ADD MASKED WITH (FUNCTION='default()');

ALTER TABLE DDM.MaskingDemo1
 ALTER COLUMN Salary 
 ADD MASKED WITH (FUNCTION='default()');

If we then view the data we see the following…

DDM7

The date type columns are displayed as 1900-01-01 and numeric columns are displayed as 0. I've not included them here but binary and bit columns are also displayed as 0. The Salary column is the first one we've seen with NULL values, and we can see that NULLs are returned as NULL and are unaffected by the mask.
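To see the per-type defaults side by side we can mask one column of each type with the default function. This is a minimal sketch: the table name DDM.DefaultMaskTypes is hypothetical, and the expected values in the comment follow the behaviour described above.

CREATE TABLE DDM.DefaultMaskTypes
(
  SomeText   VARCHAR(20)   MASKED WITH (FUNCTION='default()'),
  SomeDate   DATE          MASKED WITH (FUNCTION='default()'),
  SomeNumber DECIMAL(10,2) MASKED WITH (FUNCTION='default()'),
  SomeBit    BIT           MASKED WITH (FUNCTION='default()')
);
GO

INSERT INTO DDM.DefaultMaskTypes (SomeText, SomeDate, SomeNumber, SomeBit)
VALUES ('Hello', '20150101', 1234.56, 1);
GO

-- Read as a user without the UNMASK permission we'd expect to see:
-- xxxx, 1900-01-01, 0.00, 0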

Random

The random function is only available on numeric data types and you specify an inclusive range for the random number to fall within. Let's add it to the Salary column with a range of 1 to 100.

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN Salary 
  ADD MASKED WITH (FUNCTION='random(1,100)');

The Salary column already had the default mask applied to it and the statement above has replaced that with the random mask. I didn’t have to remove the default mask before adding the random mask.
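We can confirm that the random mask has replaced the default one by checking the sys.masked_columns catalog view. A quick sketch, assuming the DDM.MaskingDemo1 table used above:

SELECT
  mc.name,
  mc.masking_function
FROM sys.masked_columns mc
WHERE mc.object_id = OBJECT_ID('DDM.MaskingDemo1')
  AND mc.name = 'Salary';

The masking_function column should now show random(1,100) rather than default().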

If I query the table twice we can see that different random values are shown in the Salary column

DDM8

DDM9

Partial

The partial function is for text data types and gives you the most control over the mask. With this function you specify how many of the leading and trailing characters you want to display and what padding string you want to show in between. For example, you could show the first two characters, then four asterisks, then the last three characters. Let's try this on the Address1 column in the DDM.MaskingDemo1 table…

ALTER TABLE DDM.MaskingDemo1
 ALTER COLUMN Address1 
 ADD MASKED WITH (FUNCTION='partial(2,"****",3)');

If we then query the table we see the following

DDM10

If we look at the second row with the Address1 value of 100 The Street we can see it’s taken the first 2 characters 10 and the last 3 characters eet and stuffed **** in between to give 10****eet.

Let’s see how a similar mask would work on shorter data by adding the following mask to the LastName column.

ALTER TABLE DDM.MaskingDemo1
 ALTER COLUMN LastName 
 ADD MASKED WITH (FUNCTION='partial(1,"%",1)');

At most this will return 3 characters: the first and last with % in between.

DDM11

The first two LastName values are seven and four characters long and these have been masked as expected. The last was xyz, which is three characters, and has simply had the y replaced by the %. The third and fourth LastName values are one and two characters respectively; if the value has fewer characters than the full masked output (in this case three) then the leading and trailing characters are not shown and the padding string is the only thing returned.
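The behaviour at different lengths can be sketched like this. DDM.PartialDemo is a hypothetical table, and the expected results in the comment follow the behaviour described above:

CREATE TABLE DDM.PartialDemo
(
  Val VARCHAR(20) MASKED WITH (FUNCTION='partial(1,"%",1)')
);
GO

INSERT INTO DDM.PartialDemo (Val)
VALUES ('Trakand'), ('Vere'), ('xyz'), ('xy'), ('x');
GO

-- Read as a user without the UNMASK permission we'd expect:
-- T%d, V%e, x%z, %, %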

As mentioned above, the partial function is only available on character data types, so let's see what happens when we try to add it to the DateOfBirth column…

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN DateOfBirth 
  ADD MASKED WITH (FUNCTION='partial(1,"%",1)');

Msg 16003, Level 16, State 0, Line 65
The data type of column 'DateOfBirth' does not support data masking function 'partial'.

Email

The email mask is also only available on character data types and displays the first character of the value followed by XXX@XXXX.com.

So if I add this to the Email column we see the following

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN Email 
  ADD MASKED WITH (FUNCTION='email()');

DDM12

We can actually add the email mask to any character data type so let’s add it to the Address2 column.

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN Address2 
  ADD MASKED WITH (FUNCTION='email()');

DDM13

As you can see, the data doesn't actually have to be an email address, and the email mask is basically just a specific version of the partial mask. We can replicate it as follows…

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN Email 
  ADD MASKED WITH (FUNCTION='partial(1,"XXX@XXXX.com",0)');

DDM14

We can also replicate the default mask on character data types using the partial mask as follows

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN FirstName 
  ADD MASKED WITH (FUNCTION='partial(0,"xxxx",0)');

Removing a mask

Removing the mask from a column can be done as follows

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN Address2 DROP MASKED;

Now the Address2 column is no longer masked

DDM16

SELECT INTO and Computed Columns

If we add a computed column based on a masked column then the computed value is also masked. I've created five different computed columns based on the FirstName, MiddleName and LastName columns. It seems that if any concatenation is applied in the computed column then the default mask is always used, so simply appending abc to the LastName column, which has the partial mask, returns the data as xxxx. This is shown in the FullName4 column.

ALTER TABLE DDM.MaskingDemo1
  ADD FullName1 AS (FirstName + ' ' + ISNULL(MiddleName, '') + ' ' + LastName),
  FullName2 AS (ISNULL(MiddleName, '') + ' ' + LastName + ' ' + FirstName),
  FullName3 AS (LastName + ' ' + FirstName + ' ' + ISNULL(MiddleName, '')),
  FullName4 AS (LastName + 'abc'),
  FullName5 AS (LastName);

DDM17

If we view the data with an account with the unmask permission we can see the computed columns as expected

DDM18

A similar thing happens when loading data from one table to another. If the source table is masked and the user loading the data into the destination table does not have the UNMASK permission, then the masked values themselves are loaded into the destination and the original data can never be recovered from it. We can demo this with the SELECT INTO statement. If I run this as a user without the UNMASK permission…

EXECUTE AS USER = 'DDMTestUser';
SELECT *
INTO #masking
FROM DDM.MaskingDemo1;
REVERT;

… and then select from #masking as a user with the UNMASK permission, we can see the following…

SELECT *
FROM DDM.MaskingDemo1;

SELECT *
FROM #masking;

DDM19

Finally

To find out which columns in a database are masked you can run the following query against the sys.masked_columns view…

SELECT
  SCHEMA_NAME(o.schema_id) AS SchemaName,
  OBJECT_NAME(o.object_id) AS TableName,
  mc.name AS ColumnName,
  mc.masking_function
FROM sys.masked_columns mc
JOIN sys.objects o ON o.object_id = mc.object_id
ORDER BY
  SCHEMA_NAME(o.schema_id),
  OBJECT_NAME(o.object_id),
  mc.column_id;

DDM15

JSON Support in SQL Server 2016

Just seen a blog post from Aaron Bertrand about JSON support in SQL Server 2016. In my last job we wanted to export some data stored in SQL Server into Elasticsearch, and converting large amounts of data into JSON documents was quite tricky to get working at first. There are lots of other database applications that store their data as JSON, for example MongoDB, and this looks like it will be helpful for loading data from SQL Server into these if needed. Microsoft have their own rival to the likes of MongoDB named DocumentDB, which stores its documents as JSON, so I guess this JSON support is something they needed to add.

I've added some additional posts covering the built-in JSON support in SQL Server 2016:

JSON – Part 1: FOR JSON