Forcing default values on nullable columns

When we add a new column to an existing table that contains data, we need to think about whether we need to populate that column for the existing rows.

If the column should not allow NULL then we need to supply a default for the new column (or set it to allow NULL, backfill it, then change it to not allow NULL).

We can add a default constraint to a column that allows NULL, but the default is not applied to the existing rows unless we specifically ask for it (using the WITH VALUES clause).

Here’s a demonstration of working with defaults on nullable columns.

First let’s create a table as follows…

DROP TABLE IF EXISTS dbo.WithValuesTest;
GO

CREATE TABLE dbo.WithValuesTest
(
    Id INT NOT NULL IDENTITY PRIMARY KEY,
    FullName VARCHAR(100) NOT NULL
);
GO

INSERT INTO dbo.WithValuesTest
(
    FullName
)
VALUES
    ('Leslie Tambourine'),
    ('Linda Pencilcase'),
    ('Sally Carpets'),
    ('Andy Anchovie'),
    ('Ronnie Clotheshorse');
GO

SELECT *
FROM dbo.WithValuesTest

This gives us the following data…
DefaultNULL01

If we try to add a new non-nullable column to this table it errors, because there is no data to put into the new column for the existing rows.

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NOT NULL;
Msg 4901, Level 16, State 1, Line 86

ALTER TABLE only allows columns to be added that…

  1. can contain nulls, or
  2. have a DEFAULT definition specified, or
  3. the column being added is an identity or timestamp column, or
  4. the table must be empty

The column DateOfBirth cannot be added to the non-empty table WithValuesTest because it does not satisfy these conditions.

However, if we specify a default it will work…

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NOT NULL CONSTRAINT DF_DOB DEFAULT '19000101';

SELECT *
FROM dbo.WithValuesTest

DefaultNULL02

To continue, let’s drop this new column…

ALTER TABLE dbo.WithValuesTest DROP CONSTRAINT IF EXISTS DF_DOB;
ALTER TABLE dbo.WithValuesTest DROP COLUMN IF EXISTS DateOfBirth;

Now let’s try to create the same column with the same default, but this time allow the column to be nullable…

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NULL CONSTRAINT DF_DOB DEFAULT '19000101';

SELECT *
FROM dbo.WithValuesTest;

DefaultNULL03

As you can see, unlike when the column didn’t allow NULL, the default value has not been used.

Let’s drop the column again and this time create it with the WITH VALUES clause…
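First the same drop statements as before…

ALTER TABLE dbo.WithValuesTest DROP CONSTRAINT IF EXISTS DF_DOB;
ALTER TABLE dbo.WithValuesTest DROP COLUMN IF EXISTS DateOfBirth;

…and then the new column, this time with WITH VALUES: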

ALTER TABLE dbo.WithValuesTest ADD DateOfBirth DATETIME2(0) NULL DEFAULT '19000101' WITH VALUES;

SELECT *
FROM dbo.WithValuesTest;

DefaultNULL04
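Note that this time we haven’t given the default constraint a name, so SQL Server generates one for us. If you need to find it later (to drop it, for example), something like this works against the catalog views:

-- Find the auto-generated name of the default constraint on DateOfBirth
SELECT dc.name, dc.definition
FROM sys.default_constraints AS dc
INNER JOIN sys.columns AS c ON c.object_id = dc.parent_object_id AND c.column_id = dc.parent_column_id
WHERE dc.parent_object_id = OBJECT_ID('dbo.WithValuesTest')
AND c.name = 'DateOfBirth';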

To complete the demo let’s now try and add some new rows.

If we don’t include the DateOfBirth column in the insert list then the default is used…

INSERT INTO dbo.WithValuesTest (FullName)
VALUES ('Bernard Seesaw');

DefaultNULL05

Suppose our code needs to insert the date of birth if it’s supplied but should use the default value if it’s not supplied, i.e. if it’s NULL.

If we just try a straight insert using the NULL value then NULL is what is put into the table.

INSERT INTO dbo.WithValuesTest (FullName, DateOfBirth)
VALUES ('Lizzie Onion', '19830402'),
('Terence Artichoke', NULL);

DefaultNULL06

We can specify DEFAULT instead of NULL and it will use the default value on the insert

INSERT INTO dbo.WithValuesTest (FullName, DateOfBirth)
VALUES ('Mavis Breadbin', DEFAULT);

DefaultNULL07

The DEFAULT value can be used in an UPDATE statement as well…

UPDATE dbo.WithValuesTest
SET DateOfBirth = DEFAULT
WHERE FullName = 'Terence Artichoke'

DefaultNULL08

 


SSDT: Target Platform and Compatibility Level

I have just seen an issue in one of my database release scripts where I’m getting a load of statements like the following…

PRINT N'Altering [dbo].[TestQueue]...';

GO

PRINT N'No script could be generated for operation 2 on object ''[dbo].[TestQueue]''.';

GO

This is because of the “Target Platform” setting on the properties of the project.  For this database it was set to 2008 as follows:

This needs to be set to the version of the SQL Server instance that the database sits on in live. In this case it was SQL Server 2014.

When I changed this it automatically updated the compatibility level of the database to 2014. However, although it sits on a SQL Server 2014 instance, this database is still on the 2008 R2 compatibility level, so I had to click on “Database Settings”, then the “Miscellaneous” tab, and change it back to 2008.

Then when I generated the scripts again all the service broker queue alter statements were gone.

In summary, the “Target Platform” needs to be the version of the SQL Server Instance and the “Compatibility level” is the compatibility level of the database itself.
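If you want to check what the live database is actually set to, the compatibility level can also be inspected and changed with T-SQL (a small sketch; the database name is just a placeholder, and level 100 corresponds to SQL Server 2008/2008 R2 while 120 is SQL Server 2014):

-- Check the current compatibility level of the database
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'MyDatabase';

-- Set it back to the 2008/2008 R2 level if required
ALTER DATABASE MyDatabase SET COMPATIBILITY_LEVEL = 100;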

Return Max (or Min) Value Across Different Columns in Result Set

Here’s a nice way to return the minimum or maximum value across multiple columns.  Obviously, it’s easy to work out the min or max value for all rows in a single column but it’s a bit trickier to find the min or max value across multiple columns for all rows.

You can use the VALUES clause in a sub-query to do this.  For example, if you have an account table with different dates for actions like first game play, first purchase etc, you can pull out the maximum date of all these account actions as follows:

CREATE TABLE dbo.AccountActions
(
    AccountId INT NOT NULL IDENTITY(1,1),
    FullName VARCHAR(100) NOT NULL,
    AccountOpenedDate DATETIME2(0) NOT NULL,
    FirstPurchasedDate DATETIME2(0) NULL,
    FirstGamePlayDate DATETIME2(0) NULL,
    FirstMultiPlayerDate DATETIME2(0) NULL
);
GO

INSERT INTO dbo.AccountActions (FullName, AccountOpenedDate, FirstGamePlayDate, FirstMultiPlayerDate, FirstPurchasedDate)
VALUES 
('Adam Bennett', '20180301 17:06', '20180301 17:08', NULL, NULL),
('Charlie Dawson', '20180301 20:54', '20180301 21:32', NULL, '20180301 21:17'),
('Eric Franks', '20180301 09:41', '20180302 06:38', '20180303 20:12', NULL),
('Gina Harris', '20180303 23:19', '20180303 23:21', '20180304 06:52', '20180303 23:20');
GO

SELECT
    AccountId,
    FullName,
    (
        SELECT MAX(AllDates.TheDate)
        FROM (
            VALUES (AccountOpenedDate), (FirstGamePlayDate), (FirstMultiPlayerDate), (FirstPurchasedDate)
        ) AS AllDates (TheDate)
    ) AS MaxDate
FROM dbo.AccountActions;
GO
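The MIN version only differs in the aggregate used, e.g. to return each account’s earliest recorded action date (a sketch over the same table as above):

SELECT
    AccountId,
    FullName,
    (
        SELECT MIN(AllDates.TheDate)
        FROM (
            VALUES (AccountOpenedDate), (FirstGamePlayDate), (FirstMultiPlayerDate), (FirstPurchasedDate)
        ) AS AllDates (TheDate)
    ) AS MinDate
FROM dbo.AccountActions;
GO

Like MAX, MIN simply ignores any of the columns that are NULL for a given row.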

Using $PARTITION to find the last time an event occurred

In a previous post I showed how you can use the $PARTITION system function to force your query to only look at one or more partitions and today I have a semi-practical example of how to use this.

We have a very big table that holds all our financial transactions along with an integer indicating what type of transaction each one is.  I wanted to find out when the last time we recorded a certain transaction type was, as I think it might have been removed quite some time ago.

This could obviously be done as follows:

SELECT MAX(T.MyDate)
FROM dbo.MyTable T (NOLOCK)
WHERE T.TransTypeId = X

and depending on how your table is structured and indexed this may be the best way to do it.

Our table is partitioned by month on our date column, so I can use $PARTITION to search each partition, starting with the most recent, for the last date TransTypeId = X was used.  As soon as we find a value we can stop.  This can be done as follows:

DECLARE @Counter INT = 200 -- this should be the maximum partition ID on our table
DECLARE @MaxDate DATETIME2(0);

WHILE @Counter > 0
BEGIN

  RAISERROR('%d', 0,1,@Counter) WITH NOWAIT;

  SELECT @MaxDate = MAX(T.MyDate)
  FROM dbo.MyTable T
  WHERE T.TransTypeId = X
  AND $partition.PF_MyTable(MyDate) = @Counter

  IF @MaxDate IS NOT NULL
  BEGIN
    SELECT @MaxDate, @Counter
    BREAK;
  END

  SET @Counter -= 1;

END;
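Rather than hard-coding the starting value of 200, the highest partition number in use could be looked up from the catalog views first, something like this (a sketch assuming the same dbo.MyTable as above):

-- Start the loop from the highest partition number that exists on the table
SELECT @Counter = MAX(p.partition_number)
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.MyTable')
AND p.index_id IN (0, 1); -- heap or clustered index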

If TransTypeID X has occurred recently then this code won’t need to loop through many partitions but if it last occurred in partition 1 then this is probably going to be much slower.  It all depends on how your table is set up but I think it’s always nice to have multiple ways to potentially speed up a query.

Parameters in Queries

We have some functionality within our application that generates some search strings for an entity before they are compiled into a JSON document and sent up to Elasticsearch.  We have one row in a table per entity and there is a StatusId column used to track the progress of the generation of the search strings.  There are a few different stored procedures that change the values in the StatusId column and recently we’ve been using a standard piece of code in each of these stored procedures that declares some variables with descriptive names to hold the different StatusIds.  These variables then get used within the stored procedure instead of hard coded values.

So instead of something like

UPDATE dbo.Test
SET StatusId = 2
WHERE StatusId = 1;

we use

DECLARE @Queue INT = 1;
DECLARE @Pickup INT = 2;
DECLARE @Process INT = 3;

UPDATE dbo.Test
SET StatusId = @Pickup
WHERE StatusId = @Queue;

I’ve just been looking into whether SQL Server behaves differently when executing these two queries and have found that it does.  There are potential performance issues where the optimiser gets the estimated number of rows very wrong when using the query with the parameters.

To demonstrate this let’s create a table as follows:

CREATE TABLE dbo.Test
(
    Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    StatusId INT NOT NULL,
    SomeText VARCHAR(100) NOT NULL
);
GO

Now we can populate this with 100,000 rows as follows:

INSERT INTO dbo.Test (StatusId, SomeText)
VALUES (1, 'Some test data');
GO 100000

So we now have 100,000 rows where the StatusId = 1.  Let’s update 100 rows to have a StatusId = 2.

UPDATE TOP (100) dbo.Test
SET StatusId = 2
WHERE StatusId = 1;
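We can sanity-check the distribution with a quick count; based on the insert and update above we should see 99,900 rows at StatusId 1 and 100 at StatusId 2.

SELECT StatusId, COUNT(*) AS NumberOfRows
FROM dbo.Test
GROUP BY StatusId
ORDER BY StatusId;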

Let’s update the statistics on the table just to make sure we’re all up to date.

UPDATE STATISTICS dbo.Test;

Now we can run the two different types of query that yield the same results.

SELECT *
FROM dbo.Test
WHERE StatusId = 1;
GO
DECLARE @StatusId INT = 1;
SELECT *
FROM dbo.Test
WHERE StatusId = @StatusId;
GO

If we look at the execution plan we can see the first query without the parameter has managed to estimate the number of rows correctly.

ParametersInQueries01

The query with the parameters has estimated the number of rows as 50,000.

ParametersInQueries02

SQL Server has worked out that there are two possible values in the StatusId column and has based the estimate on an average across those values.  If we update another 100 rows to have a third StatusId, we will see the estimated number of rows change.

UPDATE TOP (100) dbo.Test
SET StatusId = 3
WHERE StatusId = 1;
GO
UPDATE STATISTICS dbo.Test;
GO
SELECT *
FROM dbo.Test
WHERE StatusId = 1;
GO
DECLARE @StatusId INT = 1;
SELECT *
FROM dbo.Test
WHERE StatusId = @StatusId;
GO

ParametersInQueries03
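These estimates follow the density of the StatusId column: with a local variable the optimiser can’t sniff the actual value, so it estimates total rows multiplied by 1 / (number of distinct values), i.e. 100,000 × 1/2 = 50,000 with two statuses and 100,000 × 1/3 ≈ 33,333 with three. If you want to see the density vector yourself, you can find the auto-created statistics object on StatusId with something like the following (a sketch; the _WA_Sys_… name differs per database and can then be passed to DBCC SHOW_STATISTICS):

-- Find the name of the auto-created statistics object on dbo.Test.StatusId
SELECT s.name AS StatsName
FROM sys.stats AS s
INNER JOIN sys.stats_columns AS sc ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
INNER JOIN sys.columns AS c ON c.object_id = sc.object_id AND c.column_id = sc.column_id
WHERE s.object_id = OBJECT_ID('dbo.Test')
AND c.name = 'StatusId';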

As this is a simple SELECT from one table SQL Server has parametrised the query which can be seen in the execution plan.  Notice the SELECT statement near the top has StatusId = @1.

ParametersInQueries04

This means that SQL Server will use the same execution plan if the query remains the same but with a different value for the StatusId.  So if we change the query to return the 100 rows that are at StatusId = 2 the estimated number of rows is bad for each query.

SELECT *
FROM dbo.Test
WHERE StatusId = 2;
GO
DECLARE @StatusId INT = 2;
SELECT *
FROM dbo.Test
WHERE StatusId = @StatusId;
GO

For the first query we have the following:

ParametersInQueries05

This is only because SQL Server parametrised the query and used the same plan for StatusId = 1 and StatusId = 2, but there are lots of restrictions that stop SQL Server from doing this by default.  For example, we can add another table that contains the statuses and create a foreign key to our first table as follows.

CREATE TABLE dbo.Status
(
    Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    StatusDescription VARCHAR(100) NOT NULL
);
GO
INSERT INTO dbo.Status (StatusDescription)
VALUES ('Queue'), ('Pickup'), ('Process');
GO
ALTER TABLE dbo.Test ADD CONSTRAINT FK_Test_StatusId FOREIGN KEY (StatusId) REFERENCES dbo.Status (Id);
GO

Using a JOIN prevents SQL Server from parametrising the query, so we can run the following queries, one without a parameter and one using a parameter.

SELECT *
FROM dbo.Test T
INNER JOIN dbo.Status S ON S.Id = T.StatusId
WHERE S.Id = 2;
GO
DECLARE @StatusId INT = 2;
SELECT *
FROM dbo.Test T
INNER JOIN dbo.Status S ON S.Id = T.StatusId
WHERE S.Id = @StatusId;
GO

We can see that both queries return the same 100 rows.  If we look at the execution plans we can see they are very different.  We can also see that the first query has not been parametrised and the plan contains the actual value of 2.

ParametersInQueries06

ParametersInQueries07

If we check the estimated and actual rows in the first execution plan we can see that the index seek and key lookup both have 100 for both values.  The estimated number of rows for the clustered index scan is still 33333.33 which is the number of rows divided by the number of distinct values in those rows.

Conclusion:

Each situation will be different, but in most cases it looks like it is not a good idea to use a descriptively named variable in your code in place of a hard-coded value, i.e. one where the value is always going to be the same.  SQL Server doesn’t know that the value in the variable will be the same every time, so it generates an execution plan based on the average across all possible values.
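One mitigation, not covered above, is to add OPTION (RECOMPILE) to the statement; the plan is then compiled with the variable’s runtime value, so the estimate matches the hard-coded version at the cost of a compile on every execution. A quick sketch against the same table:

DECLARE @StatusId INT = 2;
SELECT *
FROM dbo.Test
WHERE StatusId = @StatusId
OPTION (RECOMPILE);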

Create CLR objects in SSDT

I don’t know whether the same thing was possible in Data Dude / DB Pro but in SSDT for VS 2012 and VS 2013 it is very simple to create your CLR objects. I did originally think (maybe incorrectly) that you had to create a separate solution for your CLR .NET code that when built would produce a DLL that you included as a reference in your DB project. You then needed to create a CREATE ASSEMBLY script and then a script to create the SQL object, e.g. CREATE FUNCTION…

I’ve recently found out that in SSDT you just create your CLR .NET object within your DB project and when you publish the DB it generates everything for you.

In a previous post I talked about using a CLR aggregate function to concatenate row data into a single string. https://simonlearningsqlserver.wordpress.com/2013/09/10/how-to-concatenate-values-in-multiple-rows-to-a-single-string/

I created this aggregate in SSDT by creating a new SQL CLR C# object in my DB project as shown below.

SSDT CLR  C# Aggregates 1

As you can see in the list on the left there are SQL CLR and SQL CLR C# object types available. The SQL CLR objects are the objects that you see in SQL Server, i.e. the objects that you can create with T-SQL CREATE statements. The SQL CLR C# objects are the things that contain the .Net code.

You can tell SSDT to automatically create the SQL objects required to utilise your new CLR code by opening the properties of the database project that the code exists in.  On the SQLCLR tab there is a checkbox Generate DDL.

SSDT CLR  C# Aggregates 5

If this is checked then SSDT will automatically generate the SQL object in the dbo schema.  

You can see this in the Projects folder in SQL Server Object Explorer in VS as shown below…

SSDT CLR  C# Aggregates 2

… and also once you’ve published the database project you will see the objects in SSMS as shown below.

SSDT CLR  C# Aggregates 3

Above you can see both the assembly that has been created (with the same name as the database project, SimonBlog) and the SQL object (in this case an aggregate function, ConcatMagic).  Any other CLR objects that you add to the same database project will also get published in the same assembly.  You can confirm that the correct .Net method is being used by scripting the aggregate function.

SSDT CLR  C# Aggregates 4

If you want to add your objects to a schema other than dbo then it seems you have to uncheck the Generate DDL checkbox in the database project settings and add the required SQL objects yourself, e.g. the CREATE AGGREGATE script.  Then when you publish, SSDT won’t create the object automatically in dbo but will still publish the assembly and run the CREATE AGGREGATE script.
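For reference, the hand-written script would look something like this (a sketch only: the Reporting schema is just an example, and the EXTERNAL NAME assumes the assembly takes the database project name, SimonBlog, as above):

-- Manually mapping the CLR aggregate into a non-dbo schema
CREATE AGGREGATE Reporting.ConcatMagic (@value NVARCHAR(4000), @delimiter NVARCHAR(4000))
RETURNS NVARCHAR(MAX)
EXTERNAL NAME SimonBlog.ConcatMagic;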

How to concatenate values in multiple rows to a single string

We quite often have a requirement (mainly in our reporting) where we need to list, in a single string, a bunch of values for a particular entity that are stored in our DB as a set of rows.

For example if we use the following script…

CREATE TABLE dbo.ConcatTest
(
 Id INT PRIMARY KEY IDENTITY NOT NULL,
 FKId INT NOT NULL,
 Country VARCHAR(100)
);
INSERT INTO dbo.ConcatTest (FKId, Country)
VALUES
(1, 'UK'),
(1, 'France'),
(2, 'US'),
(3, 'Germany'),
(3, 'France'),
(3, 'UK');
SELECT *
FROM dbo.ConcatTest;

This gives us the following data.

Concatenation1

We need a way of returning one row per FKId value along with a concatenated list of the related countries.

The following blog post contains several ways of doing this but the one we use almost all of the time is the FOR XML one.

https://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/

i.e. we run the following.

DECLARE @Delimiter VARCHAR(10) = ' '; -- this is the delimiter we will use when we concatenate the values

SELECT DISTINCT
    FKId,
    (SELECT STUFF(
        (SELECT @Delimiter + Country
         FROM dbo.ConcatTest CT2
         WHERE CT2.FKId = CT1.FKId
         FOR XML PATH(''), ROOT('root'), TYPE
        ).value('/root[1]', 'VARCHAR(MAX)') -- using the .value method allows us to extract special characters such as &
        , 1, 1, '')) AS Countries
FROM dbo.ConcatTest CT1;

We were originally not including the .value part but we found the following blog post from Rob Farley that helped us to handle the characters that would get escaped as part of the conversion to XML.

http://sqlblog.com/blogs/rob_farley/archive/2010/04/15/handling-special-characters-with-for-xml-path.aspx

This gives us the following result.

Concatenation2

However, after reading the first blog post I thought I’d give the CLR aggregate a try and it seems to perform a bit better and is a lot easier to use.  I found the following post and decided to use this version.

http://www.mssqltips.com/sqlservertip/2022/concat-aggregates-sql-server-clr-function/

Here is the final CLR aggregate I went with (pretty much pinched from Andy).

using System;
using System.Collections;
using System.Collections.Generic;
using System.Data.SqlTypes;
using System.IO;
using System.Text;
using Microsoft.SqlServer.Server;

[Serializable]
[SqlUserDefinedAggregate(Format.UserDefined, MaxByteSize = -1)]
public struct ConcatMagic : IBinarySerialize
{
    private StringBuilder _accumulator;
    private string _delimiter;

    public Boolean IsNull { get; private set; }

    public void Init()
    {
        _accumulator = new StringBuilder();
        _delimiter = string.Empty;
        this.IsNull = true;
    }

    public void Accumulate(SqlString value, SqlString delimiter)
    {
        // Remember the delimiter and only emit it between values, not before the first one
        if (!delimiter.IsNull && delimiter.Value.Length > 0)
        {
            _delimiter = delimiter.Value;

            if (_accumulator.Length > 0)
            {
                _accumulator.Append(delimiter.Value);
            }
        }

        _accumulator.Append(value.Value);

        if (value.IsNull == false)
        {
            this.IsNull = false;
        }
    }

    public void Merge(ConcatMagic group)
    {
        // Called when SQL Server combines partial aggregates (e.g. in parallel plans)
        if (_accumulator.Length > 0 && group._accumulator.Length > 0)
        {
            _accumulator.Append(_delimiter);
        }

        _accumulator.Append(group._accumulator.ToString());
    }

    public SqlString Terminate()
    {
        return new SqlString(_accumulator.ToString());
    }

    void IBinarySerialize.Read(System.IO.BinaryReader r)
    {
        _delimiter = r.ReadString();
        _accumulator = new StringBuilder(r.ReadString());

        if (_accumulator.Length != 0) this.IsNull = false;
    }

    void IBinarySerialize.Write(System.IO.BinaryWriter w)
    {
        w.Write(_delimiter);
        w.Write(_accumulator.ToString());
    }
}

We can now run the following T-SQL to produce the same result as the FOR XML T-SQL above

DECLARE @Delimiter VARCHAR(10) = ' '; -- this is the delimiter we will use when we concatenate the values
SELECT FKId, dbo.ConcatMagic(Country, @Delimiter)
FROM dbo.ConcatTest
GROUP BY FKId

This gives us the following result.

Concatenation3

Now to test the performance I’ve turned on STATISTICS IO and STATISTICS TIME and included the execution plans.
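For reference, these are the session settings that produce the reads and timings shown below:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

Running both versions with these settings on, I get the following statistics.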

Concatenation4

The FOR XML T-SQL needs far more logical reads, but its elapsed time here is under half that of the CLR aggregate.  For larger data sets, though, I was getting a much lower elapsed time for the CLR aggregate.

I get the following execution plans

Concatenation5

I’ve zoomed into the FOR XML plan so you can see how much more complicated it is for SQL Server to process.  All the work for the aggregate is done by the CLR so obviously it’s not really appropriate here to try to compare the execution plans.  The useful metrics are the reads and time.