Dynamic Data Masking

As part of my prep for exam 70-761 (Querying Data with Transact-SQL) I’m writing about a few of the new features available in SQL Server 2016 that weren’t included in the 70-461 exam I took a few years ago based on SQL Server 2012.

Dynamic Data Masking is one of these new festures and can be used to restrict certain users from viewing sensitive data. With the General Data Protection Regulation (GDPR) coming into force on 25th May 2018 this may be something that companies can use to ensure that people who don’t need it cannot see certain personally identifiable information (PII). The data is not actually stored any differently, it is just masked upon read.

There are four functions available to mask the data: default, random, partial and email. I’ll go into more details on each of these below but first I’m going to quickly demonstrate the default function so I can then talk about permissions.

I’m going to connect to my SimonBlog database which is SQL Server 2016 compatibility level and create a new table that contains PII.

USE SimonBlog
GO

CREATE SCHEMA DDM;
GO

CREATE TABLE DDM.MaskingDemo1
(
 Id INT NOT NULL IDENTITY(1,1),
 FirstName VARCHAR(50) NOT NULL,
 MiddleName VARCHAR(50) NULL,
 LastName VARCHAR(50) NOT NULL,
 Email VARCHAR(500) NULL,
 Address1 VARCHAR(100) NULL,
 Address2 VARCHAR(100) NULL,
 PostCode VARCHAR(10) NULL,
 DateOfBirth DATE NOT NULL,
 Salary INT NULL
);
GO

INSERT INTO DDM.MaskingDemo1
VALUES
('Anna', NULL, 'Testing', 'anna.learning@data.co.uk', '1 My Road', 'My Town', 'QQ1 4ZZ', '1985-07-04', 20000),
('Matilda', 'Mary', 'Test', 'matilda.mary@test.com', '100 The Street', 'Big City', 'AQ123ER', '1965-12-01', 50000);
GO

I can now view this data as follows

SELECT *
FROM DDM.MaskingDemo1

DDM1

I’m now going to add the default masking function on the FirstName column as follows

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN FirstName 
  ADD MASKED WITH (FUNCTION='default()');

It’s as simple as that and now I can talk about how the permissions work with DDM.

Permissions

The masking is solely based on permissions and I’m sysadmin on my SimonBlog database so the masking is never going to be displayed for me as sysadmin contains the unmask permission on all databases. If I select all data from the table above I don’t see any masking. So to demonstrate how themasking works I’m going to create a new SQL user as follows:

USE master
GO

CREATE LOGIN DDMTestUser WITH PASSWORD = 'DDMTest1'
GO

USE SimonBlog
GO

CREATE USER DDMTestUser FROM LOGIN DDMTestUser;
GO

GRANT CONNECT ON DATABASE::SimonBlog TO DDMTestUser;
GO

GRANT SELECT ON OBJECT::DDM.MaskingDemo1 TO DDMTestUser;
GO

If I now run the following…

EXECUTE AS USER = 'DDMTestUser';
SELECT *
FROM DDM.MaskingDemo1;
REVERT;

… I get the following results

DDM2

You can see that the default mask on the FirstName column has replaced all text with 4 x characters regardless of the length of the original text value. The 7 character Matilda has been replaced by xxxx.

If it’s decided that the user DDMTestUser needs to be able to see the PII we can remove the mask as follows…

GRANT UNMASK TO DDMTestUser;
GO

If I now run the following I can see all the data…

EXECUTE AS USER = 'DDMTestUser';
SELECT *
FROM DDM.MaskingDemo1;
REVERT;

DDM1

As you can see the unmask permission is a database wide permission and it’s all or nothing, i.e. all masking is applied on all tables and columns within a database when reading data unless you have this unmask permission which means everything is available.

To turn it back off again I can run the following…

REVOKE UNMASK TO DDMTestUser;
GO

Now lets give DDMTestUser extra permissions to create a new table. We’ll create the table with one column using the default mask and then insert some data.

GRANT CREATE TABLE ON DATABASE::SimonBlog TO DDMTestUser;
GO

GRANT ALTER, SELECT, INSERT, UPDATE, DELETE ON SCHEMA::DDM TO DDMTestUser;
GO

EXECUTE AS USER = 'DDMTestUser';
CREATE TABLE DDM.MaskingDemo2
(
  SomeText VARCHAR(100) MASKED WITH (FUNCTION='default()'),
  SomeMoreText VARCHAR(100)
);
GO

INSERT INTO DDM.MaskingDemo2 (SomeText, SomeMoreText)
VALUES ('This is a fairly long piece of text', 'And here is a short one');

SELECT *
FROM DDM.MaskingDemo2;
GO

REVERT;

DDM3

So even though DDMTestUser inserted the unmasked data, when trying to read it they only see the mask because they do not have the unmask permission.

There is a second column in this table that is unmasked so lets now try and mask this one

EXECUTE AS USER = 'DDMTestUser';
ALTER TABLE DDM.MaskingDemo2
  ALTER COLUMN SomeMoreText 
  ADD MASKED WITH (FUNCTION='default()');
REVERT;

Msg 15247, Level 16, State 5, Line 51
User does not have permission to perform this action.

Any user can create a new table with masking enabled or add a new masked column to an existing table but they have to have the ALTER ANY MASK permissions on the database which can be set as follows…

GRANT ALTER ANY MASK ON DATABASE::SimonBlog TO DDMTestUser;
GO

DDMTestUser can now add the mask to the SomeMoreText column

EXECUTE AS USER = 'DDMTestUser';
ALTER TABLE DDM.MaskingDemo2
  ALTER COLUMN SomeMoreText 
  ADD MASKED WITH (FUNCTION='default()');
GO
SELECT *
FROM DDM.MaskingDemo2;
GO
REVERT;

DDM4

As we saw above a user can add data to a masked column but will not be able to see the data when reading without the unmask permission. The same applies for updating existing data. If I run the following and then try to read the data I still see the mask.

EXECUTE AS USER = 'DDMTestUser';
UPDATE DDM.MaskingDemo2
SET SomeText = 'this is some even longer text than was here before';
GO
SELECT *
FROM DDM.MaskingDemo2;
REVERT;
GO

DDM4

That’s a lot of the permissions covered so now let’s look at the differences between the 4 different masking functions.

Masking Functions

Default

We’ve covered a few examples of the default mask above where the text data in the FirstName column is masked with xxxx but what about where the value is less than 4 characters. If I add some nonsense data with character length of 1, 2 and 3 we can see how this looks masked.

INSERT INTO DDM.MaskingDemo1 (FirstName, LastName, DateOfBirth)
VALUES ('a', 'x', '19900502'), ('ab', 'xy', '19800603'), ('abc', 'xyz', '19701208')
GO

The documentation says that the mask can be fewer than four Xs if the text is less than 4 characters but this doesn’t seem to be the case

Unmasked…

DDM5

Masked…

DDM6

It looks like text data types are all masked with xxxx so let’s now add the default mask to the DateOfBirth and Salary columns

ALTER TABLE DDM.MaskingDemo1
 ALTER COLUMN DateOfBirth 
 ADD MASKED WITH (FUNCTION='default()');

ALTER TABLE DDM.MaskingDemo1
 ALTER COLUMN Salary 
 ADD MASKED WITH (FUNCTION='default()');

If we then view the data we see the following…

DDM7

The date type columns are displayed as 1990-01-01 and numeric columns are displayed as 0. I’ve not included them here but binary and bit columns are also displayed as 0. The salary column is the first one we’ve seen with NULL values and we can see that these values are unaffected by the mask.

Random

The random function is only available on numeric data types and you specify the start and end range for the random number to be between. Let’s add it to the Salary column with a range of 1 to 100.

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN Salary 
  ADD MASKED WITH (FUNCTION='random(1,100)');

The Salary column already had the default mask applied to it and the statement above has replaced that with the random mask. I didn’t have to remove the default mask before adding the random mask.

If I query the table twice we can see that different random values are shown in the Salary column

DDM8

DDM9

Partial

The partial function is for text data types and gives you the most control over the mask. With this function you specify how many of the leading and trailing characters you want to display and what mask you want to show in between. For example, you could say I want to see the first 2 characters, then 4 asterisk characters, then the last 3 characters. Let’s try this on the Address1 column in the DDM.MaskingDemo1 table

ALTER TABLE DDM.MaskingDemo1
 ALTER COLUMN Address1 
 ADD MASKED WITH (FUNCTION='partial(2,"****",3)');

If we then query the table we see the following

DDM10

If we look at the second row with the Address1 value of 100 The Street we can see it’s taken the first 2 characters 10 and the last 3 characters eet and stuffed **** in between to give 10****eet.

Let’s see how a similar mask would work on shorter data by adding the following mask to the LastName column.

ALTER TABLE DDM.MaskingDemo1
 ALTER COLUMN LastName 
 ADD MASKED WITH (FUNCTION='partial(1,"%",1)');

At most this will return 3 characters: the first and last with % in between.

DDM11

The first two LastName values are 7 and 4 characters long and these have been masked as expected. The last was xyz which is 3 characters and has just resulted in the y being replaced by the %. The third and fourth LastName values are 1 and 2 characters respectively and if the character length is less than the number of characters in the mask (in this case 3) then the leading and trailing values are not included and the mask if the only thing returned.

As mentioned above the partial function is only available on character data types so let’s see what happens when we try to add it to the DateOfBirth column

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN DateOfBirth 
  ADD MASKED WITH (FUNCTION='partial(1,"%",1)');

Msg 16003, Level 16, State 0, Line 65
The data type of column 'DateOfBirth' does not support data masking function 'partial'.

Email

The email mask is also only available on character data types and will display the first character followed by XXX@XXXX.com

So if I add this to the Email column we see the following

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN Email 
  ADD MASKED WITH (FUNCTION='email()');

DDM12

We can actually add the email mask to any character data type so let’s add it to the Address2 column.

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN Address2 
  ADD MASKED WITH (FUNCTION='email()');

DDM13

As you can see the data doesn’t actually have to be an email and the email mask is basically just a specific version of the partial mask. We can replicate it as follows

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN Email 
  ADD MASKED WITH (FUNCTION='partial(1,"XXX@XXXX.com",0)');

DDM14

We can also replicate the default mask on character data types using the partial mask as follows

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN FirstName 
  ADD MASKED WITH (FUNCTION='partial(0,"xxxx",0)');

Removing a mask

Removing the mask from a column can be done as follows

ALTER TABLE DDM.MaskingDemo1
  ALTER COLUMN Address2 DROP MASKED;

Now the Address2 column is no longer masked

DDM16

SELECT INTO and Computed Columns

If we add a computed column based on a masked column then that computed value is also masked. I’ve created 5 different computed columns based on the FirstName, MiddleName and LastName columns. It seems that if there is any concatenation applied to the computed column then the default mask is always used. So by just adding abc to the LastName column which has the partial mask we see the data returned as xxxx. This is shown in the FullName4 column.

ALTER TABLE DDM.MaskingDemo1
  ADD FullName1 AS (FirstName + ' ' + ISNULL(MiddleName, '') + ' ' + LastName),
  FullName2 AS (ISNULL(MiddleName, '') + ' ' + LastName + ' ' + FirstName),
  FullName3 AS (LastName + ' ' + FirstName + ' ' + ISNULL(MiddleName, '')),
  FullName4 AS (LastName + 'abc'),
  FullName5 AS (LastName)

DDM17

If we view the data with an account with the unmask permission we can see the computed columns as expected

DDM18

A similar thing happens when loading data from one table to another. If the source table is masked and the user loading the data into the destination table does not have the unmask permission then the masked data is loaded into the destination and can never be unmasked. We can demo this with the SELECT INTO statement. If I run this as a user without unmask

EXECUTE AS USER = 'DDMTestUser';
SELECT *
INTO #masking
FROM DDM.MaskingDemo1;
REVERT;

… then select from #masking with a user with unmask permission we can see the following

SELECT *
FROM DDM.MaskingDemo1;

SELECT *
FROM #masking;

DDM19

Finally

To find out what columns in a database are masked you can run the following query using the sys.masked_columns view

SELECT
  SCHEMA_NAME(o.schema_id),
  OBJECT_NAME(o.object_id),
  mc.name,
  mc.masking_function
FROM sys.masked_columns mc
JOIN sys.objects o ON o.object_id = mc.object_id
ORDER BY
  SCHEMA_NAME(o.schema_id),
  OBJECT_NAME(o.object_id),
  mc.column_id;

DDM15

Advertisements

Paging in T-SQL

As part of my prep for the 70-461 exam I learnt how to use the new paging functionality that was made available in SQL Server 2012.  This is using the OFFSET FETCH statement and I’ve never used them since and have completely forgotten how they work.  So I’m going write out a demonstration using them to refresh my memory for my prep for exam 70-761.  Please note, this should not be confused with the FETCH statement available when using cursors although the concept is the same.

It turns out there is not much too this so here’s how it works:

You have to use the ORDER BY clause and then just follow this by OFFSET and specify the number of rows to skip.  Finally you can optionally use FETCH to return only a certain number of rows.  By not using FETCH you will return the remaining rows after the OFFSET.

For a demo I’ll populate a table with numbers 1 to 50 as follows:

CREATE TABLE dbo.OffestFetchDemo (Id INT);

WITH cte
AS ( 
 SELECT 1 AS n
 UNION ALL
 SELECT n + 1
 FROM cte
 WHERE n < 50
)
INSERT INTO dbo.OffestFetchDemo (Id)
SELECT n
FROM cte;

Next I can run the following to return 10 rows only starting from 6

SELECT *
FROM dbo.OffestFetchDemo
ORDER BY Id
OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY;

OF1

You can not skip rows by using an offset of zero.  This will return the first 5 rows.

SELECT *
FROM dbo.OffestFetchDemo
ORDER BY Id
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY;

OF2

As mentioned you can omit the FETCH and just return rows 46 onward as follows.

SELECT *
FROM dbo.OffestFetchDemo
ORDER BY Id
OFFSET 45 ROWS;

OF3

You can combine the two and just return everything with the following (not that you would ever want to do this)

SELECT *
FROM dbo.OffestFetchDemo
ORDER BY Id
OFFSET 0 ROWS;

The OFFSET FETCH statement is a bit weird as you can swap ROWS with ROW and NEXT with FIRST so

SELECT *
FROM dbo.OffestFetchDemo
ORDER BY Id
OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY;

is the same as

SELECT *
FROM dbo.OffestFetchDemo
ORDER BY Id
OFFSET 5 ROW FETCH FIRST 10 ROW ONLY;