Indexed Views and Blocking

I recently came across a situation in SQL Server where one process was blocking another and there seemed to be absolutely no link between the two tables in question. It took a little while to work out what was causing the blocking, but I’ve managed to recreate what I found with a simpler example.

We’d identified a bug where a MERGE statement into a table was taking a very long time to complete, but the part I couldn’t immediately understand was why this long-running MERGE statement was blocking an update on another, seemingly unrelated table.

Here’s something similar to what I found using sp_whoisactive

[Screenshot: sp_whoisactive output showing one session blocking another]

The blocking query (this is equivalent to the MERGE statement I was seeing on my live environment) is as follows

UPDATE dbo.Address
SET PostCode = 'AA11AA'
WHERE AddressID = 1;

and the query being blocked is

UPDATE dbo.Person
SET Name = 'Simon'
WHERE PersonID = 1;

In my database there is no direct link between the Person and Address tables, e.g. no foreign keys from one table to the other, and there are no explicit transactions being used, so there isn’t an update to the Person table in the same transaction as the update to the Address table; each will just commit once completed.

The block was caused by an indexed view that uses both of the tables involved in the two updates.

The bug has since been fixed so the MERGE statement now runs very quickly and the block doesn’t last long, but it’s certainly something I will be more wary of when using indexed views. For example, if you need to run a large update on a table within an indexed view, such as a backfill of data, you might inadvertently block writes to one or more other tables within that same indexed view.
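One way to check your exposure before running an update like this is to list the schema-bound objects that reference the table (indexed views must be schema-bound, so they will show up). For example, for the Address table used in the demo below:

SELECT referencing_schema_name, referencing_entity_name
FROM sys.dm_sql_referencing_entities(N'dbo.Address', 'OBJECT');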

If it’s not obvious why SQL Server has to do this, below are details of what is actually happening during the block.


To replicate the issue I ran the following code to create a few tables and put an indexed view over the top of them.  This is just something simple that links people to addresses in a many-to-many relationship.

CREATE TABLE dbo.Person
( 
  PersonID INT CONSTRAINT Person_PersonID PRIMARY KEY NOT NULL IDENTITY(1,1), 
  Name VARCHAR(100), 
  DOB DATETIME2(0)
);
GO

CREATE TABLE dbo.AddressType
( 
  AddressTypeID INT CONSTRAINT AddressType_AddressTypeID PRIMARY KEY NOT NULL IDENTITY(1,1), 
  AddressType VARCHAR(100)
);
GO

CREATE TABLE dbo.Address
(
  AddressID INT CONSTRAINT Address_AddressID PRIMARY KEY NOT NULL IDENTITY(1,1), 
  AddressLine1 VARCHAR(100), 
  AddressLine2 VARCHAR(100), 
  PostCode VARCHAR(10), 
  AddressTypeID INT CONSTRAINT FK_Address_AddressTypeID FOREIGN KEY REFERENCES dbo.AddressType (AddressTypeID)
);
GO
CREATE TABLE dbo.PersonAddress
(
  PersonID INT CONSTRAINT FK_PersonAddress_PersonID FOREIGN KEY REFERENCES dbo.Person (PersonID), 
  AddressID INT CONSTRAINT FK_PersonAddress_AddressID FOREIGN KEY REFERENCES dbo.Address (AddressID)
);
GO
CREATE VIEW dbo.PeopleAddresses
WITH SCHEMABINDING
AS
SELECT P.PersonID, P.Name, P.DOB, A.AddressLine1, A.PostCode, T.AddressType
FROM dbo.PersonAddress PA
JOIN dbo.Person P ON P.PersonID = PA.PersonID
JOIN dbo.Address A ON A.AddressID = PA.AddressID
JOIN dbo.AddressType T ON T.AddressTypeID = A.AddressTypeID;
GO
CREATE UNIQUE CLUSTERED INDEX IX_PeopleAddresses ON dbo.PeopleAddresses (PersonID, Name, DOB, AddressLine1, PostCode, AddressType);
GO

I’ve then seeded the tables as follows: two address types plus one row in each of the other tables.

INSERT INTO dbo.AddressType (AddressType)
VALUES ('Home'), ('Work');
GO

INSERT INTO dbo.Address (AddressLine1, AddressLine2, PostCode, AddressTypeID)
VALUES ('AddressLine1', 'AddressLine2', 'PostCode', 1);
GO

INSERT INTO dbo.Person (Name, DOB)
VALUES ('My Name', '20000101');
GO

INSERT INTO dbo.PersonAddress (AddressID, PersonID)
SELECT PersonID, PersonID
FROM dbo.Person;
GO

Next I’m going to update the PostCode in the Address table for our single row. To mimic a long-running update I’m explicitly beginning a transaction and leaving it open.

BEGIN TRAN

UPDATE dbo.Address
SET PostCode = 'AA11AA'
WHERE AddressID = 1;

As this transaction is still active we can check which locks have been taken using the following query, which gives the results shown below (the SPID of my session is 54).

SELECT
  ISNULL(OBJECT_NAME(p.object_id),o.name) AS ObjectName, 
  resource_type, 
  resource_subtype, 
  resource_description, 
  resource_associated_entity_id, 
  request_mode, 
  request_type, 
  request_status
FROM sys.dm_tran_locks l
LEFT JOIN sys.partitions p ON p.hobt_id = l.resource_associated_entity_id
LEFT JOIN sys.objects o ON o.object_id = l.resource_associated_entity_id
WHERE request_session_id = 54
ORDER BY resource_type, ObjectName

[Screenshot: sys.dm_tran_locks output for the Address update, SPID 54]

As expected we’ve got a shared lock (S) on the database, and the second row is the exclusive lock (X) on the row we’re updating in the Address table. This lock also means we get intent exclusive (IX) locks on the page and the table. SQL Server takes these intent locks for two reasons. Firstly, if the number of locks taken on individual rows breaches the threshold, the lock gets escalated to the table; the intent lock stops the escalation being blocked. Secondly, for performance: without it, another process trying to lock the table would have to check every row for existing locks, whereas the intent lock tells that process there is a lock somewhere further down so it has to wait. For more details on locking, blocking and isolation levels see the links below.

As well as the expected locks on the Address table we’ve also got the X and IX locks on the PeopleAddresses indexed view which means that we would be blocked reading from PeopleAddresses (unless using the read uncommitted isolation level).
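For example, while the update transaction is still open, a read of the view from a second session would block unless it opts into a dirty read. A quick sketch of what I mean (my own illustration, not part of the original scenario):

-- From a second session, while the Address update transaction is open:
SELECT * FROM dbo.PeopleAddresses WHERE PersonID = 1;               -- blocks

SELECT * FROM dbo.PeopleAddresses WITH (NOLOCK) WHERE PersonID = 1; -- returns
-- immediately, but it's a dirty read of the uncommitted PostCode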

So if we now try and update the Person table in a different process as follows

UPDATE dbo.Person
SET Name = 'Simon'
WHERE PersonID = 1;

this query just hangs and we see the same blocking in sp_whoisactive shown at the top of this post.

These are the locks held and requested by the hanging update on the Person table

[Screenshot: sys.dm_tran_locks output for the blocked Person update]

An exclusive lock (X) has been granted on the row in the Person table, but the process is also requesting a shared lock (S) on the row in the Address table, because SQL Server has to read the Address row in order to recalculate the corresponding row in the indexed view. That row already has the X lock from SPID 54, so the Person update is blocked.

In this example we had one row in Address and one in Person and they were connected using the PersonAddress table. But what would happen if we tried to update a Person record that isn’t linked to the Address?

Let’s add a new Person, a new Address and link them together.

INSERT INTO dbo.Address (AddressLine1, AddressLine2, PostCode, AddressTypeID)
VALUES ('AddressLine1', 'AddressLine2', 'PostCode', 2);
GO

INSERT INTO dbo.Person (Name, DOB)
VALUES ('My Name', '20000101');
GO

INSERT INTO dbo.PersonAddress (AddressID, PersonID)
VALUES (2,2);
GO

So now let’s mimic the long-running lock on the Address table for the first address again (as before, beginning a transaction and leaving it open)

BEGIN TRAN

UPDATE dbo.Address
SET PostCode = 'AA11AA'
WHERE AddressID = 1;

and then try to update the Person record for the second person row

UPDATE dbo.Person
SET Name = 'Simon'
WHERE PersonID = 2;

This works without any blocking because there is no lock directly on the Person table and the corresponding row in the indexed view is not locked.

If we run both UPDATE statements within explicit transactions we can see which locks are taken. SPID 54 is the Address update and 56 is the Person update.

[Screenshot: locks held by SPIDs 54 and 56]

As you can see, no resource is locked by both SPIDs, with the exception of the IX locks on the PeopleAddresses page (1:309096). However, this does not block as IX locks are compatible with each other.

This is just showing what happens when row-level locks are taken, but how does lock escalation affect things? Let’s insert lots more data and see

INSERT INTO dbo.Address (AddressLine1, AddressLine2, PostCode, AddressTypeID)
SELECT AddressLine1, AddressLine2, PostCode, AddressTypeID
FROM dbo.Address;
GO 20

INSERT INTO dbo.Person (Name, DOB)
SELECT Name, DOB
FROM dbo.Person;
GO 20

INSERT INTO dbo.PersonAddress (AddressID, PersonID)
SELECT PersonID, PersonID
FROM dbo.Person
WHERE PersonID > 2

This means we’ve now got 2,097,152 rows in each table (each GO 20 repeats its batch twenty times, doubling the original two rows each time: 2 x 2^20 = 2,097,152)

Now let’s try similar queries to before but updating 100,000 rows in the Address table in the first query

UPDATE dbo.Address
SET PostCode = 'AA11AA'
WHERE AddressID <= 100000;

[Screenshot: locks after updating 100,000 Address rows, escalated to table-level X locks]

So many row-level locks were taken for this update that SQL Server decided it was better to escalate the lock to the whole table. So rather than exclusive (X) locks on the rows (KEY) we now have exclusive locks on both the Address table and the PeopleAddresses indexed view. Now an update on any single row in the Person table, even one not linked to any of the 100,000 Address rows being updated, is blocked. For example, the following would be blocked.

UPDATE dbo.Person
SET Name = 'Simon'
WHERE PersonID = 2000000;
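If that escalation is a problem, one mitigation is to keep each statement below the escalation threshold (roughly 5,000 locks) by updating in batches. Here is a minimal sketch of the idea; the batch size and loop are my own illustration, not part of the original scenario:

DECLARE @BatchSize INT = 4000;
DECLARE @RowsAffected INT = 1;

WHILE @RowsAffected > 0
BEGIN
    -- each UPDATE commits on its own, so locks are released between batches
    UPDATE TOP (@BatchSize) dbo.Address
    SET PostCode = 'AA11AA'
    WHERE AddressID <= 100000
      AND PostCode <> 'AA11AA'; -- skip rows already done so we always make progress

    SET @RowsAffected = @@ROWCOUNT;
END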

So finally, what happens if the first query updates a column that is part of one of the tables in the view but is not included in the view’s select list?

I purposely left the AddressLine2 column of the Address table out of the PeopleAddresses indexed view.

If the first update only touches this AddressLine2 column

UPDATE dbo.Address
SET AddressLine2 = 'New Address 1'
WHERE AddressID = 1;

no lock is placed on the PeopleAddresses indexed view at all

[Screenshot: locks when updating AddressLine2 only, with no locks on the indexed view]

This means there are only locks on the Address table itself and any updates to the Person table would not be blocked.

For more about locking, blocking and isolation levels see my posts from a few years ago…

Locking, Blocking and Isolation Levels

Key-Range Locks in the Serializable Isolation Level and using sys.dm_tran_locks and DBCC PAGE to see what’s locked

 


Key-Range Locks in the Serializable Isolation Level and using sys.dm_tran_locks and DBCC PAGE to see what’s locked

In a previous post I talked about Locking, Blocking and Isolation Levels. In this post I’m going to go into more detail about the key-range locks used in the SERIALIZABLE isolation level and also demonstrate how to use the sys.dm_tran_locks DMV to view existing locks and how to work out which resources the locks are placed on.

Key-range locks are only used in the SERIALIZABLE isolation level and their purpose is to stop phantom reads which are possible in the other three pessimistic concurrency isolation levels, namely read uncommitted, read committed and repeatable read.  A phantom read is where one transaction performs a read on a range of data, e.g. using a query with a WHERE clause, and another transaction adds data that will satisfy the conditions of the WHERE clause.  If the first transaction performs the same read it will now return the new data resulting in phantom data appearing in the second read within the same transaction.  The key-range locks are used to prevent phantom data from being added and they are placed on the keys of the index that is used to retrieve the data.  The following example using the Person.Person table in the AdventureWorks2012 database will demonstrate how these key-range locks are used.

The Person.Person table has a non-clustered index IX_Person_LastName_FirstName_MiddleName with the LastName column as the first column in the index key.  This means that any query where we are filtering on LastName should use this index to retrieve the data.

If we run a query to return a few rows from Person.Person filtering on a range of last names then we should see this index being used.  For example

SELECT *
FROM Person.Person
WHERE LastName BETWEEN 'Ware' AND 'Warthen'
ORDER BY LastName

Returns the following result set.

[Screenshot: result set, three rows returned]

And looking at the execution plan we can see that this index is used.

[Screenshot: execution plan using IX_Person_LastName_FirstName_MiddleName]

Now if we set the isolation level to SERIALIZABLE, open a transaction and run the same select statement we should be able to see what locks are being held.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT *
FROM Person.Person
WHERE LastName BETWEEN 'Ware' AND 'Warthen'

Now from another connection we can query the DMV sys.dm_tran_locks to see the current locks being held.  We are only interested in the locks on the KEY resources so we filter accordingly.

SELECT *
FROM sys.dm_tran_locks
WHERE resource_type = 'KEY'
ORDER BY request_mode

[Screenshot: sys.dm_tran_locks output, seven KEY locks]

We get seven rows returned with two different request_mode values. The “S” request_mode corresponds to shared locks placed on the three rows that are returned. Because we’re using the SERIALIZABLE isolation level these shared locks are held until the transaction commits or rolls back, to prevent non-repeatable reads. The “RangeS-S” request_mode corresponds to the key-range locks that have been granted. But why are there four of them when only three rows were returned? To answer this we can use the data returned from sys.dm_tran_locks to work out what resources have actually been locked.

The columns we’re interested in are resource_associated_entity_id and resource_description. The data in these columns can vary based on the resource_type; for more details see TechNet. For KEY resources the resource_associated_entity_id corresponds to the partition_id from sys.partitions.

SELECT *
FROM sys.partitions
WHERE partition_id in (72057594057523200, 72057594045595648)

[Screenshot: sys.partitions results for the two partition_id values]

The object_id returned corresponds to the Person.Person table

SELECT object_name(1765581328)

[Screenshot: object_name result]

We can see that the two different partition_id values correspond to different indexes on Person.Person

SELECT *
FROM sys.indexes
WHERE object_id = 1765581328

[Screenshot: sys.indexes results for Person.Person]

The sys.partitions view tells us that the partitions we’ve got locks on are for the indexes PK_Person_BusinessEntityID and IX_Person_LastName_FirstName_MiddleName.

We can combine this all in one query for clarity.

SELECT
     L.resource_type
    ,L.resource_description
    ,L.resource_associated_entity_id
    ,L.request_mode
    ,L.request_type
    ,L.request_status
    ,OBJECT_NAME(I.object_id) AS TableName
    ,I.object_id
    ,I.index_id
    ,I.name AS IndexName
    ,I.type_desc
FROM sys.dm_tran_locks L
INNER JOIN sys.partitions P ON P.partition_id = L.resource_associated_entity_id
INNER JOIN sys.indexes I ON I.object_id = P.object_id AND I.index_id = P.index_id
WHERE L.resource_type = 'KEY'
ORDER BY L.request_mode

[Screenshot: combined lock and index information]

This shows us that three shared locks are placed on the rows themselves (the leaf level of the clustered index) and that four key-range locks are placed on keys within the non-clustered index. But this still hasn’t answered the question of why we have four key-range locks.

Before answering that, I’m going to demonstrate two ways to prove that the three shared locks are placed on the rows that were returned by the original query, and then use that to work out which resources the four “RangeS-S” locks are placed on.

Firstly we are going to use the %%physloc%% and %%lockres%% values along with the sys.fn_PhysLocFormatter() function to tell us the file, page and slot of the rows returned by the original query.  Secondly we use this information in the DBCC PAGE command.

We just need to update our original query as follows

SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS FilePageSlot, %%lockres%% AS LockResource, *
FROM Person.Person
WHERE LastName BETWEEN 'Ware' AND 'Warthen'

[Screenshot: file, page and slot plus lock resource for each row]

We can now see that the row for Dennis Ware is physically located on data file 1, on page 2806 and at slot 3. We can also see that its lock resource is (05c585e839d4).

If we look back at the data returned from sys.dm_tran_locks we can see that one of the shared locks has a resource_description of (05c585e839d4), which tells us that lock is placed on that row in the clustered index.

Now we can run the following DBCC PAGE command to actually see information about what is on the page but in order for this to return any results we need to turn on trace flag 3604.

DBCC TRACEON (3604);
DBCC PAGE (10, 1, 2806, 3); -- DB_ID, FileId, PageId, Format

There is a lot of textual information displayed here and you can specify a different number in the format parameter to return different amounts of data. In this case the first section is the page header and after that we have the Allocation Status. You’ll see the first slot used is Slot 0. There is an Offset value that tells us where on the page this slot starts and a Length value that tells us how long the record is.
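DBCC PAGE is undocumented, but the commonly reported meanings of the format parameter are as follows (treat these as informal notes rather than official documentation):

DBCC PAGE (10, 1, 2806, 0); -- page header only
DBCC PAGE (10, 1, 2806, 1); -- header plus a hex dump of each row
DBCC PAGE (10, 1, 2806, 2); -- header plus a hex dump of the whole page
DBCC PAGE (10, 1, 2806, 3); -- header plus detailed per-row information (used above)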

[Screenshot: DBCC PAGE output, page header and allocation status]

If we scroll down until we find Slot 3, and then scroll past the block of data we can see the physical data for each column and can confirm this is the row for Dennis Ware.

[Screenshot: DBCC PAGE output, Slot 3 column data for Dennis Ware]

Now, if we go back up to the end of Slot 3 we can see

KeyHashValue = (05c585e839d4)

This matches the resource_description in sys.dm_tran_locks and the value returned by %%lockres%%.

But how do we do something similar for the key-range locks? How do we find out which resources they are locking? One way is to use sys.dm_db_database_page_allocations to work out which pages belong to our non-clustered index. We pass in the database_id, the object_id and the index_id of the IX_Person_LastName_FirstName_MiddleName index and it returns a row for every page used within that index.

SELECT allocated_page_page_id, *
FROM sys.dm_db_database_page_allocations (DB_ID(N'AdventureWorks2012'), object_Id('AdventureWorks2012.Person.Person'), 4, null, 'DETAILED')
WHERE is_allocated = 1

We can then run DBCC PAGE with the supplied value and find the page that contains Dennis Ware.

DBCC PAGE (10, 1, 24745, 3); -- DB_ID, FileId, PageId, Format

The handy thing is that in this case DBCC PAGE returns the data in the page as a result set, so we can quite quickly find the correct page, which here is page 24745.

[Screenshot: DBCC PAGE output for page 24745]

Now if we use the resource_description values for our four key-range locks in sys.dm_tran_locks we can see that the “RangeS-S” locks have actually been granted on the rows that contain data for Dennis Ware, Victoria Ware, Clay Warthen and Aaron Washington. But Aaron Washington is not included in the result set, so why is this fourth key lock present? It’s because a key-range lock on a key prevents any data being added immediately before that key. In other words no data can be added between, for example, Victoria Ware and Clay Warthen, e.g. Jennifer Warner.

INSERT INTO Person.Person (BusinessEntityID, PersonType, FirstName, LastName)
VALUES (888888, 'VC', 'Jennifer', 'Warner')
SELECT *
FROM sys.dm_tran_locks
WHERE resource_type = 'KEY'
ORDER BY request_mode

[Screenshot: sys.dm_tran_locks showing the waiting key-range lock]

Here we can see the waiting key-range lock.

It also means that no data can be added between Clay Warthen and Aaron Washington, so the key-range locks actually lock more data than would satisfy the query in the open transaction. For example, we could not insert a row for Simon Wartz, which would sit between Clay Warthen and Aaron Washington, even though it would not be returned by the original query in the open transaction.

INSERT INTO Person.Person (BusinessEntityID, PersonType, FirstName, LastName)
VALUES (999999, 'VC', 'Simon', 'Wartz')
SELECT *
FROM sys.dm_tran_locks
WHERE resource_type = 'KEY'
ORDER BY request_mode

[Screenshot: sys.dm_tran_locks showing the waiting key-range lock for the Simon Wartz insert]

Now that we know how the key-range locks are used, we can work out which resources have the key-range locks on them without needing to search through the pages of the non-clustered index. We can simply include a table hint to return part of the data we require.

SELECT sys.fn_PhysLocFormatter(%%physloc%%), %%lockres%%, *
FROM Person.Person WITH (INDEX = IX_Person_LastName_FirstName_MiddleName)
WHERE LastName BETWEEN 'Ware' AND 'Warthen'

[Screenshot: file, page and slot data via the non-clustered index]

As you can see this gives us the same file, page and slot data and the same resource_description for the three rows returned by the query.  We now know that the fourth “RangeS-S” lock is placed on the key in the index immediately after the key for Clay Warthen, i.e. slot 140.  Again we can use DBCC PAGE to view the data on page 25065.
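Following the same pattern as the earlier calls, that would be:

DBCC PAGE (10, 1, 25065, 3); -- DB_ID, FileId, PageId, Format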

Now, in the demonstration I’ve just run through we happen to have a non-clustered index whose first key column is the one we’re filtering on in our WHERE clause. But what happens if we remove the non-clustered index? Or in other words, what happens if we run a query against a table filtering on a column not included in an index key? To demonstrate this easily we can just remove the non-clustered index.

DROP INDEX [IX_Person_LastName_FirstName_MiddleName] ON [Person].[Person]
BEGIN TRAN
SELECT *
FROM Person.Person
WHERE LastName BETWEEN 'Ware' AND 'Warthen'

We get the same rows back as expected but this time SQL Server has had to use the clustered index to get the data.

[Screenshot: execution plan using the clustered index]

So what locks have we got? This time there is no key that a range lock can be applied to, so a shared lock has to be placed on the entire table.

SELECT *
FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT'
ORDER BY request_mode
SELECT object_name(1765581328)

[Screenshot: OBJECT-level shared lock on Person.Person]

Because this shared lock is held until the transaction is closed, this means that no data in the table can be updated until the transaction completes, even though we’ve only read three rows.

INSERT INTO Person.Person (BusinessEntityID, PersonType, FirstName, LastName)
VALUES (777777, 'VC', 'Simon', 'ZZZZZ')
SELECT *
FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT'
ORDER BY request_mode

[Screenshot: the INSERT waiting on the OBJECT-level lock]

This is something you need to be aware of if using the SERIALIZABLE isolation level.

Summary

If a key column is used to filter data in a query then key-range locks can be taken against the keys in that index. They will be taken against all the keys that satisfy the query plus the next key beyond the range. This prevents data being added that would insert a new key immediately before a key with a key-range lock.

We also saw how to use sys.dm_tran_locks, sys.dm_db_database_page_allocations and DBCC PAGE to work out what resources have locks on them.

We only saw “RangeS-S” key-range locks but there are several more and they can be seen at TechNet.

Locking, Blocking and Isolation Levels

Last updated: 12/01/2014

As I mentioned in a previous post all actions performed against data or the SQL objects, i.e. any DML or DDL statements, are run inside transactions.  This post delves into more details into how transactions work…

SQL Server maintains data integrity and system stability by adhering to the ACID properties which is an acronym for Atomicity, Consistency, Isolation and Durability.

The isolation part of this acronym states that all transactions are run in isolation from each other and no two resources can update the same data or definition of a SQL object at the same time. In the pessimistic isolation levels (Read Uncommitted, Read Committed, Repeatable Read and Serializable) this is controlled by locking. In the optimistic isolation levels (Read Committed Snapshot and Snapshot) this is controlled by row versioning. In this post I’m only going to deal with the four pessimistic isolation levels.

There are several types of locks, including shared, exclusive, update, intent and schema. To start with we’ll just look at shared and exclusive as these are the critical ones for understanding how SQL Server uses locks when reading or writing data.

The locks can be placed on various resources and by default this is handled by SQL Server. For example, if you are just updating one row in a table then SQL Server will most likely just place a lock against that one row. If you are updating every row in a table then most likely the whole table will have a lock placed against it.
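If you ever need to override the granularity SQL Server chooses, you can do so with table hints. A minimal sketch against a hypothetical table (the hints are real, the table is just an example):

-- force row-level locks for this statement
UPDATE dbo.SomeTable WITH (ROWLOCK)
SET SomeColumn = 'x'
WHERE Id = 1;

-- take an exclusive table-level lock for this statement
UPDATE dbo.SomeTable WITH (TABLOCKX)
SET SomeColumn = 'x';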

The way locks behave is controlled by the isolation level, and the default level is READ COMMITTED. At this level shared locks are requested when reading data and are released once the data has been read; they are not held until the transaction commits. Exclusive locks are requested whenever data or SQL objects are updated and, in the default level, these are not released until the transaction commits.

If a resource has a lock held against it then another transaction that tries to access the same resource (by requesting a lock) may be blocked. Blocking will occur if the lock currently held is incompatible with the newly requested lock (see below for more on lock compatibility). Exclusive locks are not compatible with other exclusive locks or with shared locks. Shared locks are compatible with each other. This means that two transactions can issue shared locks on a resource and can both read the data. It also means that one transaction cannot update data until another transaction has finished reading it, and one transaction cannot read data that is being updated by another transaction (except in the READ UNCOMMITTED isolation level which is discussed in the next section).

Isolation Levels

There are four isolation levels that use locks and these allow you to control whether you want more consistency or more concurrency in your application. The greater the consistency, the lower the concurrency, and vice versa. If you want more resources to be able to access the data quicker then you need to reduce locking, which may reduce the consistency of the data. If you want more consistency in your data then you may have to hold locks for longer, which can cause more blocking and reduce concurrency.

I’m going to demo how some of these different levels work so let’s set up a bit of test data quickly…

CREATE TABLE dbo.IsolationLevelTesting
(
 Id INT NOT NULL PRIMARY KEY
 ,SomeData VARCHAR(100) NOT NULL
);
INSERT INTO dbo.IsolationLevelTesting (Id, SomeData)
VALUES (1, 'Simon');

READ UNCOMMITTED

This level allows the most concurrency but the least consistency. When this level is set, shared locks are not requested when data is read. This removes the blocking caused by the incompatibility between exclusive and shared locks and means that data can be read even if it has an exclusive lock against it, for example because it is being updated. The read request will read the new uncommitted version of the data that is currently held in the other transaction. If that transaction is rolled back for some reason and the data is re-read, the original version of the data will be returned and the uncommitted version is lost. If your application has already read the uncommitted version and started doing something with it then you may end up with data integrity issues when it suddenly disappears. Reading uncommitted data is called a dirty read. Here’s a quick demo…

From one connection begin a transaction and update the row in dbo.IsolationLevelTesting.

BEGIN TRAN
UPDATE dbo.IsolationLevelTesting
SET SomeData = 'Peacock'
WHERE Id = 1;

This will leave the transaction open.  Now from a second connection set the isolation level to READ UNCOMMITTED and select the data from dbo.IsolationLevelTesting.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT *
FROM dbo.IsolationLevelTesting;

You should see the value Peacock returned which is the uncommitted version.  Now rollback the transaction from the first connection.

ROLLBACK TRAN

Re-run the select statement from the second connection and you should see the data is back to Simon.  

You can also use hints to override the existing session transaction isolation level and can use the following to run the select statement in the READ UNCOMMITTED level.

SELECT *
FROM dbo.IsolationLevelTesting WITH (NOLOCK);

In my opinion you should only use the WITH (NOLOCK) table hint in your production code in exceptional circumstances, and I can think of almost no reason that you would use it. If you are having performance problems and think using the WITH (NOLOCK) table hint will fix them then you probably need to first look at the code that is requesting the exclusive locks (or other incompatible locks) on the resource you are trying to access. Remember that in the READ UNCOMMITTED level any select statements will read the dirty uncommitted data and not the original data before it was locked.

READ COMMITTED

As mentioned above this is the default level.  This provides more consistency than READ UNCOMMITTED as read commands request shared locks which are incompatible with any exclusive locks held in open transactions that are updating data.  This prevents dirty reads.  Let’s try and re-run the code in the READ UNCOMMITTED level demo above…

Start a new transaction in one connection and update the data in dbo.IsolationLevelTesting.

BEGIN TRAN
UPDATE dbo.IsolationLevelTesting
SET SomeData = 'Peacock'
WHERE Id = 1;

Now from a second connection set the isolation level to READ COMMITTED and then try and read from dbo.IsolationLevelTesting.

SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SELECT *
FROM dbo.IsolationLevelTesting;

You’ll find that the read request doesn’t return and just hangs.  This request is blocked by the open transaction in the first connection and will just hang until the lock is released.  Now is an appropriate time to look at how we can view what locks are currently in place and whether any of them are blocking other requests.  The SPID of my first connection with the open transaction is 51 and the SPID of the second connection with the hanging read request is 52.

You can run the built-in sp_who2 stored proc to list all connections and their status.  When I run this I see the following two rows…

[Screenshot: sp_who2 output]

You can see that SPID 52 is SUSPENDED and is blocked by SPID 51 (shown in column 5). This stored proc is quite handy for seeing if there is currently any blocking when a query you are running is hanging. However, if you want to see more information then download and run Adam Machanic’s sp_whoisactive. If I run this I can see the following two rows returned…

[Screenshot: sp_whoisactive output]

This shows that SPID 51 is blocking 52 and also gives you a sql_text column that, when you click on it, shows you the statement being run. Here’s what I see when I click on the one for SPID 51…

[Screenshot: sql_text for SPID 51]

You can also delve deeper into what locks are being held by looking at the sys.dm_tran_locks view.  If I filter it by my two SPIDs as follows…

SELECT *
FROM sys.dm_tran_locks
WHERE request_session_id IN (51, 52)
ORDER BY request_session_id;

I get the following rows returned…

[Screenshot: sys.dm_tran_locks for SPIDs 51 and 52]

Here you can see all the different locks that are placed on the various resources within each session.

The first four rows show the locks taken out for SPID 51, which is the one with the open transaction. You can see that there is an exclusive lock (request_mode = X) placed on the KEY resource_type. The KEY value means a row in a clustered index, as the table has a primary key. You can also see that there is a shared lock (request_mode = S) on the DATABASE and intent exclusive locks (request_mode = IX) on the OBJECT and PAGE resources. Intent locks are used to allow SQL Server to determine if there are any locks further down the resource chain that would prevent a lock being placed on a higher-level resource. For example, if there was an exclusive lock on a single row in a table, there would be an intent exclusive lock on the page and the table. If a query was run to select all data in the table, then a shared lock on the table would be requested. Because there is already an intent exclusive lock on the table, SQL Server knows that the requested shared lock on the table must wait until the exclusive lock further down the resource chain is released. This removes the need for SQL Server to scan every page and row to check for locks.

The second four rows in sys.dm_tran_locks show the locks taken out for SPID 52 which is the hanging SELECT statement.  You can see there is a shared lock (request_mode = S) at the KEY level that has a request_status of WAIT.  This means this lock is waiting for the resource to become available.  You can also see the shared and intent shared locks on the DATABASE, PAGE and OBJECT resources.

There are also several other ways to identify and monitor locks and blocks, but these are usually enough for me to work out what is going on.

Now back to the isolation levels…

If we now go and commit the open transaction in the first connection, the new value is committed to the database, the exclusive lock is released, the shared lock from the second connection is granted and the newly committed data is read.

As mentioned above, in the default READ COMMITTED level any shared locks are released as soon as the read has completed, regardless of whether there is still an open transaction for that batch. Say, for example, you have a stored procedure that runs inside a transaction. At the beginning you run a SELECT statement to check the state of some piece of data. A shared lock is requested, the data is read and then the lock is released. The stored proc may then do some processing and then re-run the original SELECT statement to confirm the state of the data. Because the shared lock was not held after the original SELECT statement, another transaction could have updated the data while the other statements in the stored proc were running. When the second SELECT happens the updated data will be returned, and this could cause problems with data integrity if further processing, this time with different data, is required. Here’s a simple demo…

From one connection set the level to READ COMMITTED, begin a transaction and select the data from dbo.IsolationLevelTesting.

SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRAN
SELECT *
FROM dbo.IsolationLevelTesting;

The transaction is left open but the shared locks are released and this returns the following data…

[Screenshot: query results]

If we check in sys.dm_tran_locks there are no open locks (except the ones at the database level – which is because we are connected to the database).  Now from a second connection run the following to update the data in the table…

UPDATE dbo.IsolationLevelTesting
SET SomeData = 'Updated'
WHERE Id = 1;

Now back in the first connection with the open transaction re-run the SELECT statement and commit the transaction.

SELECT *
FROM dbo.IsolationLevelTesting;
COMMIT TRAN

The following is returned…

[Screenshot: query results showing the updated value]

This means that the same query executed within a single transaction can return different results. This is called a non-repeatable read and can be prevented by using the next level up, which is aptly named…

REPEATABLE READ

This level prevents non-repeatable reads and offers more consistency than READ COMMITTED by not releasing the shared locks until any open transactions are committed.  We’ll repeat the last demo…

Set the isolation level to REPEATABLE READ, begin a transaction and then select from the table.

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT *
FROM dbo.IsolationLevelTesting;

This returns the following…

[Screenshot: query results]

Now if we query sys.dm_tran_locks as follows…

SELECT *
FROM sys.dm_tran_locks
WHERE request_session_id IN (51, 52)
ORDER BY request_session_id

we get the following returned…

[Screenshot: sys.dm_tran_locks showing the held shared lock]

We can now see the shared lock on the KEY that is still being held as the transaction is still open.

Now from a second connection if we try to update the data in that row as follows the query will just hang…

UPDATE dbo.IsolationLevelTesting
SET SomeData = 'Repeatable'
WHERE Id = 1;

If we again query sys.dm_tran_locks we get the following…

[Screenshot: sys.dm_tran_locks showing the waiting exclusive lock]

Now we can see the exclusive lock at the KEY level waiting for the resource to become available.  If we go back to the first connection and re-run the SELECT statement we will still see the original value of Updated returned.  If we now commit the transaction the exclusive lock in the second connection is granted, the data will be updated and if we re-query from the first connection the updated value of Repeatable will be returned.

Obviously as locks are held for a longer time there is going to be more blocking on other requests and this could reduce concurrency.  However, please note, it is only incompatible locks that are blocked.

If we consider the previous example about a stored proc that opens a transaction, reads some data, does some processing and then re-reads the same data, we’ve seen how to prevent dirty reads and non-repeatable reads. But this only takes into consideration existing data. What if another row is added after the first read that satisfies the criteria of the SELECT statement in the open transaction? Even though the shared locks still exist in the open transaction a new row can be added, and a second read of the same data will result in a different data set being returned. This can be demonstrated as follows. First we need to add some more test data…

INSERT INTO dbo.IsolationLevelTesting (Id, SomeData)
VALUES (2, 'Transaction'),
(4, 'Level');

Now if we use the REPEATABLE READ level, begin a transaction and SELECT from the table as follows…

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT *
FROM dbo.IsolationLevelTesting;

This returns the following results…

[Screenshot: three rows returned]

The transaction is still open and the shared locks still exist on these three rows.  This means these rows cannot be modified as an exclusive lock would be blocked.

However, from a second connection we can open another transaction which inserts a row into the table as follows…

BEGIN TRAN
INSERT INTO dbo.IsolationLevelTesting (Id, SomeData)
VALUES (3, 'Isolation');

Now if we rerun the SELECT statement from the first transaction we get the following results returned…

[Screenshot: four rows returned, including the new phantom row]

If the transaction that did the insert of the row with Id = 3 is rolled back and the SELECT statement in the first transaction is re-run then only the original three rows are returned.

This is a phantom read and is where over the course of a transaction, two or more identical queries are run and the data sets returned are different.

To combat this we can use the serializable level.

SERIALIZABLE

This is the last of the four levels and is the strongest as far as consistency is concerned, but it will also cause the most blocking and is the most likely to reduce concurrency. In this isolation level key-range locks are used, which stop data being added if it satisfies the conditions of a SELECT statement in an open transaction.

We can demonstrate this as follows (roll back any open transaction from previous examples)…

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT *
FROM dbo.IsolationLevelTesting;

If we again query sys.dm_tran_locks we can see the range locks that have been granted…

[Screenshot: granted range locks]

Now if we try to add a new row to the table (which would satisfy the conditions of the SELECT statement in the open transaction as we are just returning all rows) the query will just hang…

BEGIN TRAN
INSERT INTO dbo.IsolationLevelTesting (Id, SomeData)
VALUES (5, 'Locking');

The data in sys.dm_tran_locks for the INSERT statement is as follows…

[Screenshot: the INSERT waiting on a range lock]

If we commit the first transaction the waiting lock will be granted and the insert will happen.

Lock Modes and Compatibility

I have already mentioned some of the lock modes that SQL Server uses and here is the complete list http://technet.microsoft.com/en-us/library/ms175519(v=sql.105).aspx.

Here is the list of the resources that SQL Server can place locks on http://technet.microsoft.com/en-us/library/ms189849(v=sql.105).aspx.

Here is the matrix that shows lock compatibility. http://technet.microsoft.com/en-us/library/ms186396(v=sql.105).aspx

Row-Version Transaction Isolation Levels

In addition to the four transaction isolation levels that use locking there are also the SNAPSHOT and READ COMMITTED SNAPSHOT levels. Rather than placing locks on resources, previous versions of the data are stored in tempdb. I hope to talk about these in a separate post at some point in the future, but for further reading please see here http://technet.microsoft.com/en-us/library/ms177404(v=sql.105).aspx
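For reference, both of these levels are enabled per database; the database name below is just a placeholder:

-- allow sessions to request the SNAPSHOT isolation level
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

-- make the default READ COMMITTED level use row versioning
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;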

Summary

In most cases the default READ COMMITTED transaction isolation level is the one to use. It’s always good practice to keep your transactions as short as possible as this will prevent locks being held for a long time.

Transactions – the basics

I’m writing this blog because I’m currently working toward the 70-464 exam and transactions are bound to be covered. I wanted to consolidate my understanding and this is what I think I know…

A transaction is a logical unit of work that either completes as a whole or doesn’t complete at all. There is no way for a transaction to partially complete with only some data being updated. Anything that updates anything in SQL Server is handled in a transaction. Reading data out of SQL Server is also handled in a transaction.

SQL Server adheres to the ACID properties of transactions which ensure data integrity.

The ACID properties stand for Atomicity, Consistency, Isolation and Durability.

The atomicity property ensures that each transaction is handled as a single piece of work that either succeeds in its entirety or not at all. If you’re trying to update 100 rows and one of them causes an error, none of the 100 rows will be updated.

The consistency property ensures that the DB is left in a consistent state after the transaction completes. If not the transaction is rolled back. SQL Server uses constraints such as foreign keys to enforce this.

The isolation property ensures that transactions are isolated from each other during processing. There are various levels of isolation that make the locking of data behave in different ways, including a level that causes no locking and uses row versioning, but this will be the subject of another blog.

The durability property ensures that even if there is a service interruption the data is in a usable state. SQL Server uses the transaction log to enforce durability.

By default all individual T-SQL statements are automatically committed. If you run two update statements in the same batch SQL Server will treat them as two single transactions. The first will either completely commit or not and then the second one will either completely commit or not.

You can configure SQL Server so that it does not automatically commit each statement. This forces you to issue a COMMIT or ROLLBACK command to finish the transaction. This is called implicit transaction mode. I have never used this mode but I think it is the default behaviour in Oracle databases.
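A quick sketch of implicit transaction mode (my own example, not from the original post):

SET IMPLICIT_TRANSACTIONS ON;

SELECT COUNT(*) FROM sys.objects; -- this statement implicitly opens a transaction
SELECT @@TRANCOUNT;               -- returns 1: the transaction is still open

COMMIT;                           -- you have to end it yourself

SET IMPLICIT_TRANSACTIONS OFF;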

There is also explicit transaction mode, which is where you issue a BEGIN TRAN command to start a transaction. You then execute your statement or batch of statements within this explicit transaction and either COMMIT or ROLLBACK all statements within that transaction. It’s a good idea to have error handling in your explicit transactions so that you can roll back the entire transaction if any statement fails.

Whenever a transaction starts, @@TRANCOUNT gets incremented by 1. If there are no transactions open @@TRANCOUNT is 0; when a transaction starts it goes up to 1. You can use this in your error handling to determine if there are any open transactions that need to be closed.

There is also a function XACT_STATE() that indicates the current state of any transaction. This is another useful function in error handling. A state of 0 means there are no open transactions, 1 means there is an uncommitted open transaction that can be committed, and -1 means there is an open transaction that has caused a fatal error and cannot be committed; this must be rolled back.
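Putting @@TRANCOUNT and XACT_STATE() together, a common error-handling pattern (a sketch, not something from the original post) looks like this:

BEGIN TRY
    BEGIN TRAN;
    -- ... your statements here ...
    COMMIT TRAN;
END TRY
BEGIN CATCH
    IF XACT_STATE() <> 0 -- 1 = committable, -1 = doomed; either way close it
        ROLLBACK TRAN;
    THROW; -- re-raise the original error (SQL Server 2012 and later)
END CATCH;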

You can turn on the XACT_ABORT property by running SET XACT_ABORT ON before starting a transaction. This will automatically roll back the whole transaction if any statement within it fails. This property is enabled by default in triggers and causes not only the statement(s) in the trigger to roll back but also the statement that caused the trigger to fire.
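For example, here’s a minimal sketch that uses a divide-by-zero to force a run-time error:

SET XACT_ABORT ON;
BEGIN TRAN;
SELECT 1 / 0;       -- run-time error: the whole transaction is rolled back
COMMIT TRAN;        -- never reached; the batch is aborted
GO
SELECT @@TRANCOUNT; -- 0: nothing was left open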

You can nest transactions within each other, i.e. issue a BEGIN TRAN statement within another BEGIN TRAN statement. However, be aware that nested transactions do not actually work in a completely nested way.

When you issue a BEGIN TRAN command @@TRANCOUNT is upped to 1 (assuming it was originally 0). You can execute some statements within this transaction. If you then create a nested transaction by issuing another BEGIN TRAN command, @@TRANCOUNT is upped to 2. You can execute some more statements and then issue a COMMIT command. All that actually happens here is that @@TRANCOUNT is lowered to 1. Nothing is actually committed to the DB as the initial transaction is still open. If you now issue a ROLLBACK command the entire set of statements since the first BEGIN TRAN is rolled back. If you issue a COMMIT command the entire set of statements is committed. Nested transactions just force you to issue multiple COMMIT commands, and it’s only the COMMIT that is executed when @@TRANCOUNT is 1 that commits all the statements to the DB.

Rolling back statements in a nested transaction is handled slightly differently. If you’ve got five transactions nested within each other you do not have to issue five ROLLBACK commands to roll back the statements. As soon as you issue a ROLLBACK command the entire transaction is rolled back regardless of the nesting level.
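The counter behaviour is easy to see with @@TRANCOUNT alone:

SELECT @@TRANCOUNT; -- 0
BEGIN TRAN;
SELECT @@TRANCOUNT; -- 1
BEGIN TRAN;
SELECT @@TRANCOUNT; -- 2
COMMIT TRAN;        -- just decrements the counter; nothing is committed yet
SELECT @@TRANCOUNT; -- 1
ROLLBACK TRAN;      -- rolls back everything since the first BEGIN TRAN
SELECT @@TRANCOUNT; -- 0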

You can rollback part of a transaction by using save points.  Here’s a little demo to show how save points work.

Create a table.

CREATE TABLE dbo.SavePointTest
(
 Id INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
 SomeData VARCHAR(10) NOT NULL
);

Check the @@TRANCOUNT and XACT_STATE.  Both should be 0 as we’ve not started a transaction yet.

SELECT @@TRANCOUNT;
SELECT XACT_STATE();

Start a transaction.

BEGIN TRAN;

Recheck the @@TRANCOUNT and XACT_STATE.  Both should now be 1.

Insert a row into the table.

INSERT INTO dbo.SavePointTest (SomeData)
VALUES ('Simon');

If you run a SELECT statement within the transaction you will be able to see the new row in the table.

SELECT *
FROM dbo.SavePointTest;

[Screenshot: one row returned]

Now we can create a named save point by running the following.

SAVE TRANSACTION t1;

@@TRANCOUNT and XACT_STATE are still set to 1.  We can now insert a second row and it gets returned.

INSERT INTO dbo.SavePointTest (SomeData)
VALUES ('Peacock');
SELECT *
FROM dbo.SavePointTest;

[Screenshot: two rows returned]

Now if we decide we don’t want the second row we can roll back to the point in the code where we issued our save point. We just need to include the name of the save point in the ROLLBACK command as follows.

ROLLBACK TRAN t1;
SELECT *
FROM dbo.SavePointTest;

[Screenshot: one row returned after rolling back to the save point]

The transaction is still open and @@TRANCOUNT and XACT_STATE are still set to 1.  At this point we can either commit or rollback the open transaction.  If we commit the row is added to the table, if we rollback the table remains empty.

COMMIT TRAN;

@@TRANCOUNT and XACT_STATE will now be set to zero.

This is a good time to point out that not everything gets completely rolled back when we issue a ROLLBACK command. All the DDL and DML statements will be completely rolled back, but one thing that isn’t reset is the identity value. During the transaction above we added a row “Simon” to the table. This upped the identity value on the Id column to 1. We then added a second row “Peacock”, which upped the identity value to 2. Even though we then issued a ROLLBACK, only the data change was rolled back and the identity value remained at 2. So if we now try and add the row “Peacock” again we see the new row is added, but this time it’s been given an Id of 3.

INSERT INTO dbo.SavePointTest (SomeData)
VALUES ('Peacock');
SELECT *
FROM dbo.SavePointTest;

[Screenshot: rows with Ids 1 and 3]

Transactions within Triggers

I’ve just spotted quite an interesting question of the day on SQL Server Central from Friday.  http://www.sqlservercentral.com/questions/T-SQL/98251/

It’s to do with how transactions behave inside a trigger. I selected 4 as my answer as I thought the transaction only existed within the trigger itself and did not affect the calling transaction. As the trigger is an “after” trigger, I assumed it would write the data in the second insert statement, throw the error and then continue to write the data in the third and fourth insert statements. It turns out this is not the case: a ROLLBACK statement in a trigger will roll back the open transaction (in this case the second insert statement) and then kill the whole batch so no further statements are executed.
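To see this behaviour for yourself, here’s a minimal sketch with a hypothetical table and trigger (not the tables from the question):

CREATE TABLE dbo.TriggerTest (Id INT);
GO
CREATE TRIGGER trg_TriggerTest ON dbo.TriggerTest
AFTER INSERT
AS
    -- rolls back the INSERT that fired the trigger and aborts the batch
    ROLLBACK TRAN;
GO
INSERT INTO dbo.TriggerTest VALUES (1); -- rolled back by the trigger
SELECT * FROM dbo.TriggerTest;          -- never runs: the batch was aborted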