Internal Storage – Part 1 – DMVs

SQL Server has a bunch of system tables that it uses to keep track of the storage of data for any tables a user may have created.  SQL Server also provides DMVs (dynamic management views) that sit over the top of these system tables and we can use these to view how the storage engine is storing our data.  Being able to see this storage metadata is important in deciding how we set up our table, including what data types to use.

Whenever we create a table a row can be seen in sys.indexes.  The name of this is slightly confusing as it shows heaps (tables without a clustered index) as well as indexes.  For example, we can create two tables, one a heap and one a clustered index and query sys.indexes.

CREATE TABLE dbo.HeapTest
(
    Id INT,
    SomeCharData CHAR(100)
);

CREATE TABLE dbo.ClusteredIndexTest
(
    Id INT CONSTRAINT PK_ClusteredIndexTest PRIMARY KEY,
    SomeCharData CHAR(100)
);

SELECT OBJECT_NAME(object_id) AS TableName, object_id, name, index_id, type, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.HeapTest');

SELECT OBJECT_NAME(object_id) AS TableName, object_id, name, index_id, type, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.ClusteredIndexTest');

This returns the following results…

InternalStorage03

A table can be either a heap or a clustered index but can contain many non-clustered indexes.  For a heap the index_id is always 0 and for a clustered index it is always 1.  For non-clustered indexes index_id will be > 1.

A table can be split into up to 15,000 partitions and by default has just one partition.  There is another DMV named sys.partitions and we can run the following query against this for our two tables.

SELECT *
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.HeapTest');

SELECT *
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.ClusteredIndexTest');

This returns the following results…

InternalStorage04

A subset of a table on a single partition is called a hobt (Heap or B-Tree) and you can see the hobt_id is returned in the results above.  We can also see that there are no rows in these partitions.

A hobt is made up of allocation units and each allocation unit can be one of three types.  IN_ROW_DATA, LOB_DATA and ROW_OVERFLOW_DATA.  There is a third DMV we can look at that tell us what allocation units exist in our table.  If we just look at dbo.ClusteredIndexTest from now on (similar results would be returned for dbo.HeapTest) we can join sys.partitions to sys.allocation_units to tell us what allocation units we have.

SELECT AU.*
FROM sys.partitions P
INNER JOIN sys.allocation_units AU ON AU.container_id = P.partition_id
WHERE P.object_id = OBJECT_ID('dbo.ClusteredIndexTest');

This returns the following results…

InternalStorage05

There is an 8060 byte limit on each page for the in row data and the two columns in dbo.ClusteredIndexTest are 8 bytes for the INT and 100 bytes for the CHAR column.  This means that any rows inserted into the table will be 108 bytes which means it can be stored on a single page.  This is why we only see the IN_ROW_DATA allocation unit returned by sys.allocation_units.

If we now add a column with data type VARCHAR(8000) the total number of bytes in one row could now be up to 8108 which is greater than the 8060 limit.  We can also add another column with the VARCHAR(MAX) data type which replaces the deprecated TEXT data type. There are also NVARCHAR(MAX) and VARBINARY(MAX) data types and all three of these can store their data on LOB_DATA pages.

ALTER TABLE dbo.ClusteredIndexTest ADD TestRowOverflow VARCHAR(8000);
ALTER TABLE dbo.ClusteredIndexTest ADD TestLOB VARCHAR(MAX);

As the TestRowOverflow and TestLOB columns are of variable length the data will be stored on the IN_ROW_DATA page until the total row size exceeds the 8060 byte limit for in row data.  I will go into more detail on this in a future post but if we rerun our query looking at sys.allocation_units we can now see all three types of allocation unit

InternalStorage06

If we add a row to the table that is within the 8060 limit then we can see it fits on a page in the IN_ROW_DATA allocation unit.

INSERT INTO dbo.ClusteredIndexTest (Id, SomeCharData, TestRowOverflow)
VALUES (1, REPLICATE('B', 100), REPLICATE('B', 100));

This inserts an INT (8 bytes) into Id, a string of 100 “A” characters into SomeCharData and a string of 100 “B” characters into TestRowOverflow.  This gives a total row size of 8 + 100 + 100 = 208 which easily fits on a single data page.

If we rerun our query we can see that all the data is in one page within the IN_ROW_DATA allocation unit.  You will see the total_pages is 2 and I will go into more detail on this in a future post.

InternalStorage07

Now if we increase the size of the data in TestRowOverflow to 8000 bytes we can see that we now have some data pages in the ROW_OVERFLOW_DATA allocation unit

UPDATE dbo.ClusteredIndexTest
SET TestRowOverflow = REPLICATE('C', 8000)
WHERE Id = 1;

InternalStorage08

If we add a string of 100 characters to the column LOBTest and rerun the query we will see that the page count is unchanged meaning that the 100 characters are stored on the IN_ROW_DATA page.

UPDATE dbo.ClusteredIndexTest
SET TestLOB = REPLICATE('D', 100)
WHERE Id = 1;

If we update TestLOB to be a string of 8000 characters we can see the data is now stored on the LOB_DATA pages

UPDATE dbo.ClusteredIndexTest
SET TestLOB = REPLICATE('D', 8000)
WHERE Id = 1;

InternalStorage09

We can store more than 8000 characters in the MAX data types and the limit is set by SQL Server which in 2012 is 2GB.

UPDATE dbo.ClusteredIndexTest
SET TestLOB = REPLICATE('D', 20000)
WHERE Id = 1;

There is also undocumented version of sys.allocation_units that is more like the older DBCC IND command which also returns details of the first pages in the chain.

SELECT AU.*
FROM sys.partitions P
INNER JOIN sys.system_internals_allocation_units AU ON AU.container_id = P.partition_id
WHERE P.object_id = OBJECT_ID('dbo.ClusteredIndexTest');

InternalStorage10

Now that we have some data in our table we can look at one last DMF sys.dm_db_database_page_allocations which actually tells us what pages have been allocated to which allocation unit.

SELECT
    database_id,
    object_id,
    index_id,
    partition_id,
    rowset_id,
    allocation_unit_id,
    allocation_unit_type,
    allocation_unit_type_desc,
    extent_page_id,
    allocated_page_iam_file_id,
    allocated_page_iam_page_id,
    allocated_page_page_id,
    is_allocated,
    is_iam_page,
    page_type,
    page_type_desc
FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID('dbo.ClusteredIndexTest'), NULL, NULL, 'DETAILED')

InternalStorage11

Summary

The DMVs and DMF that we’ve seen here are useful tools in seeing how the storage engine is storing your data.  In a future post I will talk more about the data on the actual page and give examples of how to work out what page data is stored on.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s