Using $PARTITION to find the last time an event occurred

In a previous post I showed how you can use the $PARTITION system function to force your query to only look at one or more partitions and today I have a semi-practical example of how to use this.

We have a very big table that holds all our financial transactions along with an integer indicating what type of transaction it is.  I wanted to find out the last time we recorded a certain transaction type, as I think it might have been removed quite some time ago.

This could obviously be done as follows:

SELECT MAX(T.MyDate)
FROM dbo.MyTable T WITH (NOLOCK)
WHERE T.TransTypeId = X

and depending on how your table is structured and indexed this may be the best way to do it.
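
For example, a nonclustered index along these lines (just a sketch, with a hypothetical index name) would let the MAX query above be answered with a small seek, although adding an index to a very big table has its own costs:

CREATE NONCLUSTERED INDEX IX_MyTable_TransTypeId_MyDate -- hypothetical name
ON dbo.MyTable (TransTypeId, MyDate);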

Our table is partitioned by month on our date column so I can use $PARTITION to search each partition, starting with the most recent, for the last date TransTypeID = X was used.  As soon as I find a value I can stop.  This can be done as follows:

DECLARE @Counter INT = 200 -- this should be the maximum partition ID on our table
DECLARE @MaxDate DATETIME2(0);

WHILE @Counter > 0
BEGIN

  RAISERROR('%d', 0,1,@Counter) WITH NOWAIT;

  SELECT @MaxDate = MAX(T.MyDate)
  FROM dbo.MyTable T
  WHERE T.TransTypeId = X
  AND $partition.PF_MyTable(MyDate) = @Counter

  IF @MaxDate IS NOT NULL
  BEGIN
    SELECT @MaxDate, @Counter
    BREAK;
  END

  SET @Counter -= 1;

END;

If TransTypeID X has occurred recently then this code won’t need to loop through many partitions but if it last occurred in partition 1 then this is probably going to be much slower.  It all depends on how your table is set up but I think it’s always nice to have multiple ways to potentially speed up a query.
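
As an aside, rather than hard-coding the maximum partition ID (the 200 above) we could look it up in sys.partitions, e.g. something like this (a sketch that assumes we want the partitions of the heap or clustered index):

DECLARE @Counter INT;

SELECT @Counter = MAX(p.partition_number)
FROM sys.partitions p
WHERE p.object_id = OBJECT_ID('dbo.MyTable')
AND p.index_id IN (0, 1); -- heap or clustered index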


More Partitioning

In a previous post I wrote an introduction to partitioning with a simple demo.  In this post I’m going to use a larger data set to compare the performance of partitioned and non partitioned tables.  I’ll also show that you need to make sure any queries where you want to use partition elimination have the correct search terms.

Firstly, I’ll create a new partition function and scheme that will partition across every month in 2016.

CREATE PARTITION FUNCTION BlogPartFunc1 (DATETIME2(0))
AS RANGE RIGHT FOR VALUES ('20160101', '20160201', '20160301', '20160401', 
 '20160501', '20160601', '20160701', '20160801', 
 '20160901', '20161001', '20161101', '20161201' );
GO
CREATE PARTITION SCHEME BlogPartScheme1
AS PARTITION BlogPartFunc1 ALL TO ([PRIMARY]);
GO

Next I’ll create two tables…

CREATE TABLE dbo.LargeVolumeNonPartitioned
(
 Id INT NOT NULL IDENTITY(1,1),
 SomeDate DATETIME2(0) NOT NULL,
 SomePadding1 NVARCHAR(4000) NOT NULL,
 CONSTRAINT PK_LargeVolumeNonPartitioned PRIMARY KEY CLUSTERED (Id, SomeDate)
) ON [PRIMARY];
GO
CREATE TABLE dbo.LargeVolumePartitioned
(
 Id INT NOT NULL IDENTITY(1,1),
 SomeDate DATETIME2(0) NOT NULL,
 SomePadding1 NVARCHAR(4000) NOT NULL,
 CONSTRAINT PK_LargeVolumePartitioned PRIMARY KEY CLUSTERED (Id, SomeDate)
) ON BlogPartScheme1 (SomeDate);
GO

This has created two tables with the same columns, one partitioned and one not, and both with a clustered key on the Id and SomeDate columns.  Note that I’ve purposely not put the SomeDate column as the first column in the clustered index, as doing that alone would bring the performance of querying the non partitioned table close to that of the partitioned table when filtering on a date range.  I’ll explain more about this further down.  There is also a column SomePadding1 where we can insert a large amount of text data (this ensures we don’t get too many rows stored on the same data page, which highlights the performance difference even further).

I will run the following to stick some data in the two tables

SET NOCOUNT ON
DECLARE @DateTime DATETIME2(0) = '20151201'

WHILE @DateTime < '20170101'
BEGIN

 INSERT INTO dbo.LargeVolumePartitioned (SomeDate, SomePadding1)
 SELECT @DateTime, REPLICATE('A', 1000);

 INSERT INTO dbo.LargeVolumeNonPartitioned (SomeDate, SomePadding1)
 SELECT @DateTime, REPLICATE('A', 1000);

 SET @DateTime = DATEADD(MINUTE, 1, @DateTime)

END
GO

This will add a row for every minute between 01/12/2015 and 01/01/2017.  This takes a few minutes to run so feel free to cut it down if you’re using it.
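
If you don’t want to wait for the loop, a set-based version along these lines should be quicker (a sketch: it generates one row per minute by numbering a cross join of sys.all_objects, which I’m assuming returns enough rows):

INSERT INTO dbo.LargeVolumePartitioned (SomeDate, SomePadding1)
SELECT DATEADD(MINUTE, n.N, CAST('20151201' AS DATETIME2(0))), REPLICATE('A', 1000)
FROM (
 SELECT TOP (DATEDIFF(MINUTE, '20151201', '20170101'))
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS N
 FROM sys.all_objects a CROSS JOIN sys.all_objects b
) n;
-- repeat the same INSERT for dbo.LargeVolumeNonPartitioned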

This gives us 571,680 rows in each table, 1,440 for each day, spread over just under 145,000 data pages.
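
If you want to check the row and page counts yourself, something like this against sys.dm_db_partition_stats should do it (a sketch):

SELECT OBJECT_NAME(ps.object_id) AS TableName,
       SUM(ps.row_count) AS TotalRows,
       SUM(ps.used_page_count) AS UsedPages
FROM sys.dm_db_partition_stats ps
WHERE ps.object_id IN (OBJECT_ID('dbo.LargeVolumePartitioned'),
                       OBJECT_ID('dbo.LargeVolumeNonPartitioned'))
AND ps.index_id IN (0, 1) -- just the heap/clustered index
GROUP BY ps.object_id;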

If we run a query to return all rows for 01/01/2016 from both tables in the same batch we obviously get the same data returned, but we get slightly different execution plans and different info returned by setting statistics IO and time on.

SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT *
FROM dbo.LargeVolumeNonPartitioned LVP
WHERE LVP.SomeDate >= CAST('20160101' AS DATETIME2(0))
AND LVP.SomeDate < CAST('20160102' AS DATETIME2(0))

SELECT *
FROM dbo.LargeVolumePartitioned LVP
WHERE LVP.SomeDate >= CAST('20160101' AS DATETIME2(0))
AND LVP.SomeDate < CAST('20160102' AS DATETIME2(0))

We get the following in the Messages tab from statistics IO and time

(1440 row(s) affected)
Table 'LargeVolumeNonPartitioned'. Scan count 5, logical reads 162373, physical reads 0, read-ahead reads 79043, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
 CPU time = 374 ms, elapsed time = 9401 ms.
SQL Server parse and compile time: 
 CPU time = 0 ms, elapsed time = 0 ms.

(1440 row(s) affected)
Table 'LargeVolumePartitioned'. Scan count 1, logical reads 11219, physical reads 0, read-ahead reads 264, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
 CPU time = 16 ms, elapsed time = 243 ms.
SQL Server parse and compile time: 
 CPU time = 0 ms, elapsed time = 0 ms.

There is obviously a big difference in the page reads, and the elapsed time drops from over 9 seconds to 0.243 seconds.

Here are the execution plans for both

01

If we look at the properties of the Clustered Index Scan on the partitioned table we see that we’ve only accessed 1 partition (Actual Partition Count) and partition 2 is the one that’s been accessed (Actual Partitions Accessed), i.e. the optimiser has used partition elimination to just look at the subset of data in that single partition and scan that.  Whereas the Clustered Index Scan on the non partitioned table has had to scan the entire table.

02

As I mentioned above, I purposely didn’t put the SomeDate column as the first column in the clustered index.  If I had, a clustered index seek operation would probably have been performed instead of the scans shown in the execution plans above.  As it stands, on the non partitioned table the query engine has had to read every page in the table.  On the partitioned table, the optimiser has already told the query engine to just use partition 2 so it only has to do a full scan of that subset of data.

So, like everything in SQL Server, whether or not you partition a table depends on how you want to store your data and how you read and write it.  If you have a large table and several of your queries are causing long clustered index scans you might benefit from trying to partition the table (or your clustered index needs to be changed, or new non clustered indexes added).
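
For example, on the non partitioned table a nonclustered index on the date column (a sketch with a hypothetical name, and not something included in the timings above) would at least give the optimiser a seek option for this kind of date-range query, at the cost of key lookups and extra index maintenance:

CREATE NONCLUSTERED INDEX IX_LargeVolumeNonPartitioned_SomeDate -- hypothetical name
ON dbo.LargeVolumeNonPartitioned (SomeDate);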

We have a table with a composite clustered key spread across 6 different columns.  We query this table in our code in lots of different ways and it was best to have a particular column as the first column in the clustered index as most queries used this one.  However, we have a date column that stored the date that the event for that row of data occurred.  This is part of the clustered key and we have partitioned the table on this column by month.  This allows us to quickly reduce the data set to the current month or last 6 months in lots of other queries and then perform additional filtering on the smaller data sets.

One thing that you do have to be wary of is ensuring your query allows partition elimination to occur.  In the example above the SomeDate column is a DATETIME2(0) data type.  You might have noticed that when filtering my query on the SomeDate column I explicitly cast the textual date values to a DATETIME2(0) data type.  If I hadn’t done this we would have seen the following results from statistics IO and in the Clustered Index Scan operation

SELECT *
FROM dbo.LargeVolumePartitioned LVP
WHERE LVP.SomeDate >= '20160101'
AND LVP.SomeDate < '20160102'

(1440 row(s) affected)
Table 'LargeVolumePartitioned'. Scan count 13, logical reads 143680, physical reads 0, read-ahead reads 132006, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
 CPU time = 483 ms, elapsed time = 2157 ms.
SQL Server parse and compile time: 
 CPU time = 0 ms, elapsed time = 0 ms.

03

The page reads have gone up from 11,219 to 143,680 and the Clustered Index Scan has now accessed all 13 partitions.

Even if we cast the date value to DATETIME2 with a different precision we still don’t get partition elimination.

SELECT *
FROM dbo.LargeVolumePartitioned LVP
WHERE LVP.SomeDate >= CAST('20160101' AS DATETIME2(1))
AND LVP.SomeDate < CAST('20160102' AS DATETIME2(1))

04

The same is true for other data types such as INT, BIGINT etc.: if the data type of the search value doesn’t match the data type the partition function is defined on, partition elimination may not happen.  It’s something you need to be wary of when querying partitioned tables.

One last thing I want to demonstrate is a way of forcing the query engine to just look at one or more partitions.  We can do this with the $PARTITION system function.  We can use this to work out what partition a certain value sits on, and once we know that we can use it to filter our data.

Firstly, let’s use it to show how many rows exist on each partition as follows…

SELECT $PARTITION.BlogPartFunc1(SomeDate) AS PartitionNumber, COUNT(*) AS [RowCount]
FROM dbo.LargeVolumePartitioned LVP
GROUP BY $PARTITION.BlogPartFunc1(SomeDate)
ORDER BY 1

10

Here PartitionNumber = 1 is all the data before 01/01/2016, PartitionNumber = 2 is all the data in January 2016 and so on.  As you may remember we’ve got a row for every minute between 01/12/2015 and 01/01/2017, i.e. 1,440 rows per day, so all the months with 31 days have 44,640 rows, all the months with 30 days have 43,200 and February (PartitionNumber = 3, with 29 days in 2016) has 41,760 rows.

The query above comparing the partitioned and non-partitioned tables was looking at data just on 01/01/2016.  We know that this data sits on PartitionNumber = 2 but we can confirm this as follows…

SELECT $PARTITION.BlogPartFunc1('20160101');

11

Let’s say we want to return all the data for February.  We know this is on partition 3 so we can run the following…

SELECT *
FROM dbo.LargeVolumePartitioned LVP
WHERE $PARTITION.BlogPartFunc1(SomeDate) = 3

12

If we check the properties of the Clustered Index Scan operator in the execution plan we can see that only partition 3 was accessed.

13

Partitioning is a powerful optimisation tool and is something that should definitely be considered at design time, particularly on wide tables where you might want to filter in several different ways but are usually only interested in the latest data.

There is one final part of partitioning that I want to demonstrate and this is Partition Switching which I will show in the next post.

 

Introduction to Partitioning

Introduction

Table and index partitioning is something that has been around for a while and it’s something we use a lot on some of our large tables, particularly where we only really want to access the newest data and can do without having to think about older data.  There are two parts to partitioning that I think are very useful.  One is the ability to switch partitions between tables.  This is handy, for example, if you have a “live” table and an “archive” table.  You may only want to keep the last 12 months’ worth of data in your “live” table but still keep the older data in the “archive” table for various reasons including auditing.  You might want a process that archives the oldest month of data from the “live” table to the “archive” table.  Without partitioning you’ll likely have to copy the data from the “live” table into the “archive” table and delete the “live” data.  This means that all that data gets moved from one bunch of data pages to another.  If your “live” and “archive” tables are both partitioned by month then you can just switch the oldest partition from “live” to “archive”.  This leaves the data in the same pages and just updates the metadata that says those pages belong to the “archive” table now.  I’m not going to demo partition switching here but here is some more detail if you are interested: https://technet.microsoft.com/en-us/library/ms191160(v=sql.105).aspx
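
For reference, the switch itself looks roughly like this (just a sketch with hypothetical table names; both tables need compatible structures and the target partition must be empty):

ALTER TABLE dbo.LiveTransactions
SWITCH PARTITION 1 TO dbo.ArchiveTransactions PARTITION 1; -- move the oldest month across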

The second useful part is the bit I use all the time and this is a performance improvement where we only touch a small number of the partitions in a table rather than the whole table.  This is called “partition elimination” and is where the optimiser works out it doesn’t have to bother with certain partitions so eliminates them from the execution plan.  An example of this is a bunch of reporting tables we have with a website that calls a stored proc to read the report data.  The reports are all displayed by month and you choose the month you want to view from a drop down list at the top.  These reporting tables are partitioned by month and only one partition is ever accessed at a time.

You can store each partition on a different filegroup so you can have data that is accessed more frequently on disks with better I/O and put stuff rarely touched on less efficient disks.  In the following demo I’ll show you how to put the partitions on different filegroups, but the main point is to show the performance improvement you can get from partition elimination.

Demo

I’ve already got a database “SimonBlog” with a single PRIMARY filegroup.  I’m now going to add two new filegroups and files.

USE master
GO

ALTER DATABASE SimonBlog
ADD FILEGROUP BlogFG1;
GO

ALTER DATABASE SimonBlog
ADD FILEGROUP BlogFG2;
GO

ALTER DATABASE SimonBlog 
ADD FILE 
(
 NAME = BlogFG1F1,
 FILENAME = 'D:\Data\MSSQL11.MSSQLSERVER\MSSQL\DATA\BlogFG1F1.ndf',
 SIZE = 50MB,
 MAXSIZE = 1000MB,
 FILEGROWTH = 50MB
)
TO FILEGROUP BlogFG1;
GO

ALTER DATABASE SimonBlog 
ADD FILE 
(
 NAME = BlogFG2F1,
 FILENAME = 'D:\Data\MSSQL11.MSSQLSERVER\MSSQL\DATA\BlogFG2F1.ndf',
 SIZE = 50MB,
 MAXSIZE = 1000MB,
 FILEGROWTH = 50MB
)
TO FILEGROUP BlogFG2;
GO

Before we can create a partitioned table we need to create a partition function and a scheme that maps the partitions to one or more filegroups.  The partition function is the thing that specifies how you want your table partitioned and this can be used by more than one table.

So let’s create a partition function that partitions the first quarter of this year into different months.  The partitioning data type is DATETIME2(0) which means we can only use this partition function on a table with a DATETIME2(0) column.

USE SimonBlog
GO

CREATE PARTITION FUNCTION BlogPartFunc (DATETIME2(0))
AS RANGE RIGHT FOR VALUES ('20160101', '20160201', '20160301');

We can see this partition function and its values in the following DMVs

SELECT *
FROM sys.partition_functions;

partitioning01

SELECT *
FROM sys.partition_range_values;

partitioning02

Even though we’ve only specified three boundaries, ‘20160101’, ‘20160201’ and ‘20160301’, we will actually have four possible partitions: first, everything up to ‘20160101’; second, everything from ‘20160101’ up to ‘20160201’; third, everything from ‘20160201’ up to ‘20160301’; and fourth, everything from ‘20160301’ onwards.

The “RIGHT” keyword in the “…AS RANGE RIGHT FOR VALUES…” part can either be “LEFT” or “RIGHT” and specifies whether the boundary value is in the left partition or the right.  In this case we’ve used “RIGHT” and this means the first partition is everything up to but not including ‘20160101’ and the second partition is everything after and including ‘20160101’, i.e. the boundary value of ‘20160101’ is included in the second partition.  If we used “LEFT” then ‘20160101’ would be in the first partition, i.e. it would be everything up to and including ‘20160101’.
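
For comparison, the LEFT version of the same boundaries would be declared like this (just a sketch, I’m not using it in the rest of the demo):

CREATE PARTITION FUNCTION BlogPartFuncLeft (DATETIME2(0)) -- hypothetical name
AS RANGE LEFT FOR VALUES ('20160101', '20160201', '20160301');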

So anyway, we have four possible partitions and now we need to map these to one or more filegroups.  We can map all partitions to one filegroup in the following way

CREATE PARTITION SCHEME BlogPartScheme
AS PARTITION BlogPartFunc ALL TO ([PRIMARY]);

or we can specify where each of the four partitions goes

CREATE PARTITION SCHEME BlogPartScheme
AS PARTITION BlogPartFunc TO (BlogFG1, [PRIMARY], BlogFG2, [PRIMARY]);

This puts the first partition on the filegroup BlogFG1, the third partition on BlogFG2 and the second and fourth both on PRIMARY – you probably wouldn’t do this but this just highlights that it’s completely flexible.

We can now create our partitioned table.  We just have to specify that the table is being created on our partition scheme and what column is going to be used for partitioning.

CREATE TABLE dbo.BlogPartitionTest
(
 Id INT NOT NULL IDENTITY(1,1),
 SomeDate DATETIME2(0) NOT NULL
) ON BlogPartScheme (SomeDate);

You might notice that this table is currently a heap.  There is a requirement that the partitioning column must be part of any unique index on the table, so I couldn’t use the Id column on its own as a PRIMARY KEY.  If I try to create a new table on the same partition scheme with a PRIMARY KEY on just the Id column then I get an error

CREATE TABLE dbo.BlogPartitionTest2
(
 Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
 SomeDate DATETIME2(0) NOT NULL
) ON BlogPartScheme (SomeDate);

Msg 1908, Level 16, State 1, Line 1
Column 'SomeDate' is partitioning column of the index 'PK__BlogPartitionTes__5EA4867D'. Partition columns for a unique index must be a subset of the index key.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint. See previous errors.

The table doesn’t have to be a heap: it can have a clustered index, the index just can’t be unique unless the partitioning column is included.  For example we can add a clustered index to the first table

CREATE CLUSTERED INDEX IX_BlogPartitionTest ON dbo.BlogPartitionTest (Id)
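
Alternatively, a unique index or primary key is fine as long as it includes the partitioning column, for example a composite key like this (a sketch with a hypothetical table name):

CREATE TABLE dbo.BlogPartitionTest3 -- hypothetical table
(
 Id INT NOT NULL IDENTITY(1,1),
 SomeDate DATETIME2(0) NOT NULL,
 CONSTRAINT PK_BlogPartitionTest3 PRIMARY KEY CLUSTERED (Id, SomeDate)
) ON BlogPartScheme (SomeDate);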

We can now query another DMV to view our four partitions

SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'BlogPartitionTest'

partitioning03

Please note, if you query sys.partitions with no WHERE clause you might be surprised to see several rows returned.  This DMV considers every table to contain at least one partition, so all non-partitioned tables are actually tables with a single partition.

Now we can insert some data across the various partitions and see where it all sits

INSERT INTO dbo.BlogPartitionTest(SomeDate)
VALUES ('20151203'),
('20160101'),
('20160131'),
('20160215'),
('20160302');

So we’ve inserted five rows.  Looking at the dates it should be obvious that the first row will be on the first partition, the second and third rows will be on the second partition, the fourth row will be on the third partition and finally the fifth row on the last partition.  If we re-query sys.partitions we can see the row counts confirm this

SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'BlogPartitionTest'

partitioning04
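
We can also ask the partition function directly which partition each row maps to (a quick sketch using the $PARTITION function):

SELECT Id, SomeDate, $PARTITION.BlogPartFunc(SomeDate) AS PartitionNumber
FROM dbo.BlogPartitionTest
ORDER BY Id;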

Now we can query the table and try to pull back the row for ‘20160101’ as follows

SELECT *
FROM dbo.BlogPartitionTest
WHERE SomeDate = CAST('20160101' AS DATETIME2(0))

partitioning05

If we include the execution plan we can view the properties of the data retrieval operation and confirm that partition elimination has been used and only one partition has been accessed

partitioning06

partitioning07

The “Actual Partition Count” tells us that only 1 partition was accessed and “Actual Partitions Accessed” tells us that partition 2 was the only one accessed.

If we run a select over the whole table and view the properties of the data retrieval operation we see the following

partitioning08

As expected we can see that all 4 partitions were accessed.

This demo only contains a very small amount of data but in the next post I’ll show another demo with a larger volume of data and compare the performance of partitioned versus non-partitioned tables.  I’ll also show that you need to make sure any queries where you want to use partition elimination use the correct search terms.

Create CLR objects in SSDT

I don’t know whether the same thing was possible in Data Dude / DB Pro but in SSDT for VS 2012 and VS 2013 it is very simple to create your CLR objects. I did originally think (maybe incorrectly) that you had to create a separate solution for your CLR .NET code that when built would produce a DLL that you included as a reference in your DB project. You then needed to create a CREATE ASSEMBLY script and then a script to create the SQL object, e.g. CREATE FUNCTION…

I’ve recently found out that in SSDT you just create your CLR .NET object within your DB project and when you publish the DB it generates everything for you.

In a previous post I talked about using a CLR aggregate function to concatenate row data into a single string. https://simonlearningsqlserver.wordpress.com/2013/09/10/how-to-concatenate-values-in-multiple-rows-to-a-single-string/

I created this aggregate in SSDT by creating a new SQL CLR C# object in my DB project as shown below.

SSDT CLR  C# Aggregates 1

As you can see in the list on the left there are SQL CLR and SQL CLR C# object types available. The SQL CLR objects are the objects that you see in SQL Server, i.e. the objects that you can create with T-SQL CREATE statements. The SQL CLR C# objects are the things that contain the .Net code.

You can tell SSDT to automatically create the SQL objects required to utilise your new CLR code by opening the properties of the database project that the code exists in.  On the SQLCLR tab there is a checkbox Generate DDL.

SSDT CLR  C# Aggregates 5

If this is checked then SSDT will automatically generate the SQL object in the dbo schema.  

You can see this in the Projects folder in SQL Server Object Explorer in VS as shown below…

SSDT CLR  C# Aggregates 2

… and also once you’ve published the database project you will see the objects in SSMS as shown below.

SSDT CLR  C# Aggregates 3

Above you can see both the assembly that has been created (with the same name as the database project, SimonBlog) and the SQL object (in this case an aggregate function, ConcatMagic).  Any other CLR objects that you add to the same database project will also get published in the same assembly.  You can confirm that the correct .Net method is being used by scripting the aggregate function.

SSDT CLR  C# Aggregates 4

If you want to add your objects to a schema other than dbo then it seems you have to uncheck the Generate DDL checkbox in the database project settings and add the required SQL objects yourself, e.g. the CREATE AGGREGATE script.  Then when you publish, SSDT won’t create the object automatically in dbo but will still publish the assembly and run the CREATE AGGREGATE script.
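
The manual script would look something like the following (a sketch: the parameter list is assumed from my earlier post and the schema name is hypothetical; the assembly name matches the database project, SimonBlog):

-- assumes a schema called Utils already exists and the SimonBlog assembly is published
CREATE AGGREGATE Utils.ConcatMagic (@Value NVARCHAR(4000), @Delimiter NVARCHAR(4000))
RETURNS NVARCHAR(4000)
EXTERNAL NAME SimonBlog.ConcatMagic;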

Locking, Blocking and Isolation Levels

Last updated: 12/01/2014

As I mentioned in a previous post all actions performed against data or SQL objects, i.e. any DML or DDL statements, are run inside transactions.  This post delves into more detail about how transactions work…

SQL Server maintains data integrity and system stability by adhering to the ACID properties which is an acronym for Atomicity, Consistency, Isolation and Durability.

The isolation part of this acronym states that all transactions are run in isolation from each other and no two resources can update the same data or definition of a SQL object at the same time. In the pessimistic isolation levels (Read Uncommitted, Read Committed, Repeatable Read and Serializable) this is controlled by locking. In the optimistic isolation levels (Read Committed Snapshot and Snapshot) this is controlled by row versioning.  In this post I’m only going to deal with the four pessimistic isolation levels.

There are several types of locks, including shared, exclusive, update, intent and schema. To start with we’ll just look at shared and exclusive as these are the critical ones for getting an understanding of how SQL Server uses locks when reading or writing data.

The locks can be placed on various resources and by default this is handled by SQL Server. For example, if you are just updating one row in a table then SQL Server will most likely just place a lock against that one row. If you are updating every row in a table then most likely the whole table will have a lock placed against it.

The way locks behave is controlled by the isolation level and the default level is READ COMMITTED. At this level shared locks are requested when reading data and are released once the data has been read. They are not held until the transaction commits. Exclusive locks are requested whenever data or SQL objects are updated. In the default level these are not released until the transaction commits.

If a resource has a lock held against it then another transaction that tries to access the same resource (by requesting a lock) may be blocked. Blocking will occur if the lock currently held is incompatible with the newly requested lock (see below for more on lock compatibility).  Exclusive locks are not compatible with other exclusive locks or with shared locks. Shared locks are compatible with each other. This means that two transactions can issue shared locks on a resource and can both read the data. It also means that one transaction cannot update data until another transaction has finished reading it and one transaction cannot read data that is being updated by another transaction (except in the READ UNCOMMITTED isolation level which is discussed in the next section).

Isolation Levels

There are four isolation levels that use locks and these allow you to control whether you want more consistency or more concurrency in your application.  The greater the consistency, the lower the concurrency and the greater the concurrency, the lower the consistency.  If you want more resources to be able to access the data quicker then you need to reduce locking which may reduce the consistency of the data.  If you want more consistency in your data then you may have to hold locks for longer which can cause more blocking and can reduce concurrency.

I’m going to demo how some of these different levels work so let’s set up a bit of test data quickly…

CREATE TABLE dbo.IsolationLevelTesting
(
 Id INT NOT NULL PRIMARY KEY
 ,SomeData VARCHAR(100) NOT NULL
);
INSERT INTO dbo.IsolationLevelTesting (Id, SomeData)
VALUES (1, 'Simon');

READ UNCOMMITTED

This level allows the most concurrency but the least consistency.  When this level is set, shared locks are not requested when data is read.  This removes the blocking caused by the incompatibility between exclusive and shared locks and means that data can be read even if it has an exclusive lock against it, for example if it is being updated.  The read request will read the new uncommitted version of the data that is currently held in the other transaction.  If the transaction is rolled back for some reason and the data is re-read then this time the original version of the data will be read and the uncommitted version is lost.  If your application has already read the uncommitted version and started doing something with it then you may end up with data integrity issues if it suddenly disappears.  Reading uncommitted data is called a dirty read.  Here’s a quick demo…

From one connection begin a transaction and update the row in dbo.IsolationLevelTesting.

BEGIN TRAN
UPDATE dbo.IsolationLevelTesting
SET SomeData = 'Peacock'
WHERE Id = 1;

This will leave the transaction open.  Now from a second connection set the isolation level to READ UNCOMMITTED and select the data from dbo.IsolationLevelTesting.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT *
FROM dbo.IsolationLevelTesting;

You should see the value Peacock returned which is the uncommitted version.  Now rollback the transaction from the first connection.

ROLLBACK TRAN

Re-run the select statement from the second connection and you should see the data is back to Simon.  

You can also use hints to override the existing session transaction isolation level and can use the following to run the select statement in the READ UNCOMMITTED level.

SELECT *
FROM dbo.IsolationLevelTesting WITH (NOLOCK);

In my opinion you should only use the WITH (NOLOCK) table hint in your production code in exceptional circumstances, and personally I can think of no good reason to use it.  If you are having performance problems and think using the WITH (NOLOCK) table hint will fix them then you probably need to first look at the code that is requesting the exclusive locks (or other incompatible locks) on the resource you are trying to access.  Remember that in the READ UNCOMMITTED level any select statements will read the dirty uncommitted data and not the original data before it was locked.

READ COMMITTED

As mentioned above this is the default level.  This provides more consistency than READ UNCOMMITTED as read commands request shared locks which are incompatible with any exclusive locks held in open transactions that are updating data.  This prevents dirty reads.  Let’s try and re-run the code in the READ UNCOMMITTED level demo above…

Start a new transaction in one connection and update the data in dbo.IsolationLevelTesting.

BEGIN TRAN
UPDATE dbo.IsolationLevelTesting
SET SomeData = 'Peacock'
WHERE Id = 1;

Now from a second connection set the isolation level to READ COMMITTED and then try and read from dbo.IsolationLevelTesting.

SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SELECT *
FROM dbo.IsolationLevelTesting;

You’ll find that the read request doesn’t return and just hangs.  This request is blocked by the open transaction in the first connection and will just hang until the lock is released.  Now is an appropriate time to look at how we can view what locks are currently in place and whether any of them are blocking other requests.  The SPID of my first connection with the open transaction is 51 and the SPID of the second connection with the hanging read request is 52.

You can run the built-in sp_who2 stored proc to list all connections and their status.  When I run this I see the following two rows…

Locking1

You can see that SPID 52 is SUSPENDED and is blocked by SPID 51 (shown in column 5). This stored proc is quite handy for seeing if there is currently any blocking if a query you are running is hanging.  However if you want to see more information then download and run Adam Machanic’s sp_whoisactive. If I run this I can see the following two rows returned…

Locking2

This shows that SPID 51 is blocking 52 and also gives you a sql_text column that when you click on it shows you the statement being run. Here’s what I see when I click on the one of SPID 51…

Locking3

You can also delve deeper into what locks are being held by looking at the sys.dm_tran_locks view.  If I filter it by my two SPIDs as follows…

SELECT *
FROM sys.dm_tran_locks
WHERE request_session_id IN (51, 52)
ORDER BY request_session_id;

I get the following rows returned…

Locking5

Here you can see all the different locks that are placed on the various resources within each session.

The first four rows show the locks taken out for SPID 51 which is the one with the open transaction in.  You can see that there is an exclusive lock (request_mode = X) placed on the KEY resource_type.  The KEY value means a row in a clustered index as the table has a primary key.  You can also see that there is a shared lock (request_mode = S) on the DATABASE and intent exclusive locks (request_mode = IX) on the OBJECT and PAGE resources.  Intent locks are used to allow SQL Server to determine if there are any locks further down the resource chain that would prevent a lock being placed on a higher-level resource.  For example, if there was an exclusive lock on a single row in a table, there would be an intent exclusive lock on the page and the table.  If a query was run to select all data in the table, then a shared lock on the table would be requested. Because there is already an intent exclusive lock on the table, SQL Server knows that the requested shared lock on the table must wait until the exclusive lock further down the resource chain is released.  This removes the need for SQL Server to scan every page and row to check for locks.

The second four rows in sys.dm_tran_locks show the locks taken out for SPID 52 which is the hanging SELECT statement.  You can see there is a shared lock (request_mode = S) at the KEY level that has a request_status of WAIT.  This means this lock is waiting for the resource to become available.  You can also see the shared and intent shared locks on the DATABASE, PAGE and OBJECT resources.

There are also several other ways to identify and monitor locks and blocks but these are usually enough for me to work out what is going on.

Now back to the isolation levels…

If we now go and commit the open transaction in the first connection the new value is committed to the database, the exclusive lock is released, the shared lock from the second connection is granted and the newly committed data is read.

As mentioned above, in the default READ COMMITTED level any shared locks are released as soon as the read has completed, regardless of whether there is still an open transaction for that batch.  Say, for example, you have a stored procedure that is run inside a transaction.  At the beginning you run a SELECT statement to check the state of some piece of data.  A shared lock is requested, the data is read and then the lock is released.  The stored proc may then do some processing and then re-run the original SELECT statement to confirm the state of the data.  Because the shared lock is not held after the original SELECT statement, another transaction could have updated the data while the other statements in the stored proc were running.  When the second SELECT happens the updated data will be returned and this could cause problems with data integrity if further processing, this time with the different data, is required. Here’s a simple demo…

From one connection set the level to READ COMMITTED, begin a transaction and select the data from dbo.IsolationLevelTesting.

SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRAN
SELECT *
FROM dbo.IsolationLevelTesting;

The transaction is left open but the shared locks are released and this returns the following data…

Locking6

If we check in sys.dm_tran_locks there are no open locks (except the ones at the database level – which is because we are connected to the database).  Now from a second connection run the following to update the data in the table…

UPDATE dbo.IsolationLevelTesting
SET SomeData = 'Updated'
WHERE Id = 1;

Now back in the first connection with the open transaction re-run the SELECT statement and commit the transaction.

SELECT *
FROM dbo.IsolationLevelTesting;
COMMIT TRAN

The following is returned…

Locking7

This means that the same query executed within a single transaction can return different results.  This is called a non-repeatable read and can be prevented by using the next level up which is aptly named…

REPEATABLE READ

This level prevents non-repeatable reads and offers more consistency than READ COMMITTED by not releasing the shared locks until any open transactions are committed.  We’ll repeat the last demo…

Set the isolation level to REPEATABLE READ, begin a transaction and then select from the table.

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT *
FROM dbo.IsolationLevelTesting;

This returns the following…

Locking8

Now if we query sys.dm_tran_locks as follows…

SELECT *
FROM sys.dm_tran_locks
WHERE request_session_id IN (51, 52)
ORDER BY request_session_id

we get the following returned…

Locking9

We can now see the shared lock on the KEY that is still being held as the transaction is still open.

Now from a second connection if we try to update the data in that row as follows the query will just hang…

UPDATE dbo.IsolationLevelTesting
SET SomeData = 'Repeatable'
WHERE Id = 1;

If we again query sys.dm_tran_locks we get the following…

Locking10

Now we can see the exclusive lock at the KEY level waiting for the resource to become available.  If we go back to the first connection and re-run the SELECT statement we will still see the original value of Updated returned.  If we now commit the transaction the exclusive lock in the second connection is granted, the data will be updated and if we re-query from the first connection the updated value of Repeatable will be returned.

Obviously as locks are held for a longer time there is going to be more blocking on other requests and this could reduce concurrency.  However, please note, it is only incompatible locks that are blocked.

If we consider the previous example about a stored proc that opens a transaction, reads some data, does some processing and then re-reads the same data, we’ve seen how to prevent dirty reads and non-repeatable reads.  But this only takes into consideration existing data.  What if another row is added after the first read that satisfies the criteria of the SELECT statement in the open transaction?  Even though the shared lock still exists in the open transaction a new row can be added and a second read of the same data will result in a different data set being returned.  This can be demonstrated as follows.  First we need to add some more test data…

INSERT INTO dbo.IsolationLevelTesting (Id, SomeData)
VALUES (2, 'Transaction'),
(4, 'Level');

Now if we use the REPEATABLE READ level, begin a transaction and SELECT from the table as follows…

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT *
FROM dbo.IsolationLevelTesting;

This returns the following results…

Locking11

The transaction is still open and the shared locks still exist on these three rows.  This means these rows cannot be modified as an exclusive lock would be blocked.

However, from a second connection we can open another transaction which inserts a row into the table as follows…

BEGIN TRAN
INSERT INTO dbo.IsolationLevelTesting (Id, SomeData)
VALUES (3, 'Isolation');

Now if we rerun the SELECT statement from the first transaction we get the following results returned…

Locking12

If the transaction that did the insert of the row with Id = 3 is rolled back and the SELECT  statement in the first transaction is re-run then only the original three rows are returned.

This is a phantom read: over the course of a transaction, two or more identical queries are run and the data sets returned are different.

To combat this we can use the serializable level.

SERIALIZABLE

This is the last of the four levels and is the strongest as far as consistency is concerned but will also cause the most blocking and is more likely to reduce concurrency. In this isolation level key-range locks are used which stop data being added if it satisfies the conditions of a SELECT statement in an open transaction.

We can demonstrate this as follows (roll back any open transaction from previous examples)…

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT *
FROM dbo.IsolationLevelTesting;

If we again query sys.dm_tran_locks we can see the range locks that have been granted…

Locking13

Now if we try to add a new row to the table (which would satisfy the conditions of the SELECT statement in the open transaction as we are just returning all rows) the query will just hang…

BEGIN TRAN
INSERT INTO dbo.IsolationLevelTesting (Id, SomeData)
VALUES (5, 'Locking');

The data in sys.dm_tran_locks for the INSERT statement is as follows…

Locking14

If we commit the first transaction the waiting lock will be granted and the insert will happen.

Lock Modes and Compatibility

I have already mentioned some of the lock modes that SQL Server uses and here is the complete list http://technet.microsoft.com/en-us/library/ms175519(v=sql.105).aspx.

Here is the list of the resources that SQL Server can place locks on http://technet.microsoft.com/en-us/library/ms189849(v=sql.105).aspx.

Here is the matrix that shows lock compatibility. http://technet.microsoft.com/en-us/library/ms186396(v=sql.105).aspx

Row-Version Transaction Isolation Levels

In addition to the four transaction isolation levels that use locking there are also the SNAPSHOT and READ COMMITTED SNAPSHOT levels.  Rather than placing locks on resources, previous versions of the data are stored in tempdb.  I hope to talk about these in a separate post at some point in the future but for further reading please see here http://technet.microsoft.com/en-us/library/ms177404(v=sql.105).aspx
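
For completeness, these are enabled at the database level, roughly as follows (a sketch; note that READ_COMMITTED_SNAPSHOT changes the behaviour of the default level for the whole database and the ALTER needs no other active connections to complete):

ALTER DATABASE SimonBlog SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE SimonBlog SET READ_COMMITTED_SNAPSHOT ON;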

Summary

In most cases the default READ COMMITTED transaction isolation level is the one to use.  It’s always good practice to keep your transactions as short as possible as this will prevent locks being held for a long time.

Controlling Database Compatibility Level with SSDT

We’re currently looking into upgrading from SQL Server 2008 R2 to SQL Server 2012.  I think the plan is to detach the existing databases, install 2012 and attach them to the new instance.  Once we’ve attached we will set the compatibility level in the database properties to SQL Server 2012.

SSDT Compatibility Level 1

Our databases are under source control in TFS using SSDT for VS2012.  We have several environments that we will need to upgrade and to make sure we don’t forget to update the compatibility level we will update it in our SSDT projects and then it will get updated when we publish.  To set the compatibility level in SSDT click on the “Database Settings…” button in the “Project Settings” tab of the database properties.  Then under “Miscellaneous” select the required value from the “Compatibility Level” drop down.

SSDT Compatibility Level 2

However, by default SSDT doesn’t check for differences in the compatibility level when it publishes a database project.  You can enable it in the “Advanced” settings of the “Publish Wizard” and then save the publish profile for later use.

SSDT Compatibility Level 3

Then if there is a difference between the compatibility level in your database and project a statement similar to the following will be included in the publish script.

SSDT Compatibility Level 4
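
The generated statement looks something like this (a sketch; 110 is the compatibility level for SQL Server 2012 and $(DatabaseName) is the SQLCMD variable SSDT uses in its publish scripts):

ALTER DATABASE [$(DatabaseName)] SET COMPATIBILITY_LEVEL = 110;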

Query Execution Phases

Here’s how SQL Server executes a query…

Performed by the Relational Engine:
- T-SQL: the statement that needs to be executed
- Parse: check the T-SQL syntax and create a tree of logical operators called the parse tree
- Bind: check whether the objects exist and bind them to the logical operators from the parse phase, creating a tree of logical operators bound to the actual objects called the algebrised tree
- Optimise: generate candidate plans, select the optimal one and create the execution plan with the logical operators mapped to the physical operators

Performed by the Storage Engine:
- Execute: execute the query and cache the plan