We have some functionality within our application that generates search strings for an entity before they are compiled into a JSON document and sent up to Elasticsearch. We have one row in a table per entity, and a StatusId column is used to track the progress of the search string generation. A few different stored procedures change the values in the StatusId column, and recently we’ve been using a standard piece of code in each of them that declares some variables with descriptive names to hold the different StatusIds. These variables then get used within the stored procedure instead of hard coded values.
So instead of something like this:
UPDATE dbo.Test SET StatusId = 2 WHERE StatusId = 1;
we now use descriptive variables:

DECLARE @Queue INT = 1;
DECLARE @Pickup INT = 2;
DECLARE @Process INT = 3;

UPDATE dbo.Test SET StatusId = @Pickup WHERE StatusId = @Queue;
I’ve just been looking into whether SQL Server behaves differently when executing these two queries and have found that it does. There are potential performance issues where the optimiser gets the estimated number of rows very wrong when using the query with the variables.
To demonstrate this, let’s create a table as follows:
CREATE TABLE dbo.Test
(
    Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    StatusId INT NOT NULL,
    SomeText VARCHAR(100) NOT NULL
);
GO
Now we can populate this with 100,000 rows as follows:
INSERT INTO dbo.Test (StatusId, SomeText)
VALUES (1, 'Some test data');
GO 100000
So we now have 100,000 rows where the StatusId = 1. Let’s update 100 rows to have a StatusId = 2.
UPDATE TOP (100) dbo.Test SET StatusId = 2 WHERE StatusId = 1;
Let’s update the statistics on the table just to make sure we’re all up to date.
UPDATE STATISTICS dbo.Test;
Now we can run the two different types of query that yield the same results.
SELECT * FROM dbo.Test WHERE StatusId = 1;
GO

DECLARE @StatusId INT = 1;
SELECT * FROM dbo.Test WHERE StatusId = @StatusId;
GO
If we look at the execution plans we can see the first query, without the variable, has estimated the number of rows correctly.
The query with the variable has estimated the number of rows as 50,000.
SQL Server cannot sniff the value of a local variable at compile time, so it falls back to the density vector of the statistics: it knows there are two distinct values in the StatusId column, so it estimates the average number of rows per value, 100,000 × (1/2) = 50,000. If we update another 100 rows to have a third StatusId, we will see the estimated number of rows change.
UPDATE TOP (100) dbo.Test SET StatusId = 3 WHERE StatusId = 1;
GO
UPDATE STATISTICS dbo.Test;
GO

SELECT * FROM dbo.Test WHERE StatusId = 1;
GO

DECLARE @StatusId INT = 1;
SELECT * FROM dbo.Test WHERE StatusId = @StatusId;
GO
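The density figure the optimiser falls back to can be inspected directly. As a sketch, assuming statistics exist on the StatusId column (they will have been auto-created by the earlier queries if the database has auto-create statistics on):

```sql
-- Show the density vector for the StatusId statistics. With three distinct
-- values the "All density" figure is 1/3 = 0.3333333, so the variable-based
-- estimate is 100,000 * 0.3333333, roughly 33,333.3 rows.
DBCC SHOW_STATISTICS ('dbo.Test', StatusId) WITH DENSITY_VECTOR;
```

Passing the column name as the target works when a statistics object exists on that column; otherwise the system-generated statistics name can be looked up in sys.stats and passed instead.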
As the first query is a simple SELECT from one table, SQL Server has applied simple parametrisation to it, which can be seen in the execution plan: the SELECT statement near the top shows StatusId = @1.
This means that SQL Server will reuse the same execution plan for the same query text with a different StatusId value. So if we change the query to return the 100 rows at StatusId = 2, the estimated number of rows is wrong for both queries.
SELECT * FROM dbo.Test WHERE StatusId = 2;
GO

DECLARE @StatusId INT = 2;
SELECT * FROM dbo.Test WHERE StatusId = @StatusId;
GO
For the first query we have the following:
This is only because SQL Server parametrised the query and reused the plan built for StatusId = 1 when running with StatusId = 2, but there are lots of restrictions that stop SQL Server applying simple parametrisation by default. For example, we can add another table that contains the statuses and create a foreign key to our first table as follows.
CREATE TABLE dbo.Status
(
    Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    StatusDescription VARCHAR(100) NOT NULL
);
GO

INSERT INTO dbo.Status (StatusDescription)
VALUES ('Queue'), ('Pickup'), ('Process');
GO

ALTER TABLE dbo.Test ADD CONSTRAINT FK_Test_StatusId
    FOREIGN KEY (StatusId) REFERENCES dbo.Status (Id);
GO
Using a JOIN prevents SQL Server from parametrising the query, so we can run the following queries, one without a variable and one with.
SELECT *
FROM dbo.Test T
INNER JOIN dbo.Status S ON S.Id = T.StatusId
WHERE S.Id = 2;
GO

DECLARE @StatusId INT = 2;
SELECT *
FROM dbo.Test T
INNER JOIN dbo.Status S ON S.Id = T.StatusId
WHERE S.Id = @StatusId;
GO
We can see that both queries return the same 100 rows. If we look at the execution plans we can see they are very different. We can also see that the first query has not been parametrised and the plan contains the actual value of 2.
If we check the estimated and actual rows in the first execution plan, we can see that the index seek and key lookup both show 100 for both values. In the plan for the variable query, the estimated number of rows for the clustered index scan is still 33,333.33, which is the total number of rows divided by the number of distinct values (100,000 / 3).
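The 33,333.33 figure can be reproduced directly from the data as a quick sanity check (not part of the original demo):

```sql
-- Average rows per distinct StatusId value: total rows / distinct values.
-- With 100,000 rows and 3 distinct values this is the same average the
-- optimiser derives from the density vector when the value is unknown.
SELECT CAST(COUNT(*) AS DECIMAL(18,2)) / COUNT(DISTINCT StatusId) AS AvgRowsPerValue
FROM dbo.Test;
```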
Each situation will be different, but in most cases it looks like it is not a good idea to use a descriptive static variable in your code in place of a hard coded value, i.e. one where the value of the variable is always going to be the same. SQL Server doesn’t know that the value in the variable is going to be the same every time, so it generates an execution plan based on the average across all possible values.
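If the readability of the descriptive variables is worth keeping, one common mitigation, sketched here as a suggestion rather than something tested above, is OPTION (RECOMPILE), which compiles the statement at execution time so the optimiser can see the variable’s actual value:

```sql
-- With OPTION (RECOMPILE) the statement is recompiled on every execution,
-- so the optimiser uses the current value of @StatusId and the statistics
-- histogram rather than the density vector. The trade-off is paying the
-- compilation cost each time the statement runs.
DECLARE @StatusId INT = 2;
SELECT * FROM dbo.Test WHERE StatusId = @StatusId OPTION (RECOMPILE);
```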