Using $PARTITION to find the last time an event occured

In a previous post I showed how you can use the $PARTITION system function to force your query to only look at one or more partitions and today I have a semi-practical example of how to use this.

We have a very big table that holds all our financial transactions along with an integer indicating what type of transaction it is.  I wanted to find out when the last time we recorded a certain transaction type as I think it might have been removed quite some time ago.

This could obviously been done as follows:

SELECT MAX(T.MyDate)
FROM dbo.MyTable T (NOLOCK)
WHERE T.TransTypeId = X

and depending on how your table is structured and indexes this may be the best way to do it.

Our table is partitioned by month on our date column so I can use $PARTITION to search each partition, starting with the most recent, and search for the last date TransTypeID = X was used.  When I find a value we can stop.  This can be done as follows:

DECLARE @Counter INT = 200 -- this should be the maximum partition ID on our table
DECLARE @MaxDate DATETIME2(0);

WHILE @Counter > 0
BEGIN

  RAISERROR('%d', 0,1,@Counter) WITH NOWAIT;

  SELECT @MaxDate = MAX(T.MyDate)
  FROM dbo.MyTable T
  WHERE T.TransTypeId = X
  AND $partition.PF_MyTable(MyDate) = @Counter

  IF @MaxDate IS NOT NULL
  BEGIN
    SELECT @MaxDate, @Counter
    BREAK;
  END

  SET @Counter -= 1;

END;

If TransTypeID X has occurred recently then this code won’t need to loop through many partitions but if it last occurred in partition 1 then this is probably going to be much slower.  It all depends on how your table is set up but I think it’s always nice to have multiple ways to potentially speed up a query.

Advertisements

More Partitioning

In a previous post I wrote an introduction to partitioning with a simple demo.  In this post I’m going to use a larger data set to compare the performance of partitioned and non partitioned tables.  I’ll also show that you need make sure any queries where you want to use partition elimination have the correct search terms.

Firstly, I’ll create a new partition function and scheme that will partition across every month in 2016.

CREATE PARTITION FUNCTION BlogPartFunc1 (DATETIME2(0))
AS RANGE RIGHT FOR VALUES ('20160101', '20160201', '20160301', '20160401', 
 '20160501', '20160601', '20160701', '20160801', 
 '20160901', '20161001', '20161101', '20161201' );
GO
CREATE PARTITION SCHEME BlogPartScheme1
AS PARTITION BlogPartFunc1 ALL TO ([PRIMARY]);
GO

Next I’ll create two tables…

CREATE TABLE dbo.LargeVolumeNonPartitioned
(
 Id INT NOT NULL IDENTITY(1,1),
 SomeDate DATETIME2(0) NOT NULL,
 SomePadding1 NVARCHAR(4000) NOT NULL,
 CONSTRAINT PK_LargeVolumeNonPartitioned PRIMARY KEY CLUSTERED (Id, SomeDate)
) ON [PRIMARY];
GO
CREATE TABLE dbo.LargeVolumePartitioned
(
 Id INT NOT NULL IDENTITY(1,1),
 SomeDate DATETIME2(0) NOT NULL,
 SomePadding1 NVARCHAR(4000) NOT NULL,
 CONSTRAINT PK_LargeVolumePartitioned PRIMARY KEY CLUSTERED (Id, SomeDate)
) ON BlogPartScheme1 (SomeDate);
GO

This has created two tables with the same columns, one partitioned and one not, and both with a clustered key on the Id and the SomeDate  columns.  Note that, I’ve purposely not put the SomeDate column as the first column in the clustered index as just doing this would bring the performance of querying the non partitioned table close to the partitioned table when filtering on a date range.  I’ll explain more about this further down.  There is also a column SomePadding1 where we can insert a large amount of text data (this will ensure we don’t get too many rows stored on the same data page and highlight the performance improvement even further)

I will run the following to stick some data in the two tables

SET NOCOUNT ON
DECLARE @DateTime DATETIME2(0) = '20151201'

WHILE @DateTime < '20170101'
BEGIN

 INSERT INTO dbo.LargeVolumePartitioned (SomeDate, SomePadding1)
 SELECT @DateTime, REPLICATE('A', 1000);

 INSERT INTO dbo.LargeVolumeNonPartitioned (SomeDate, SomePadding1)
 SELECT @DateTime, REPLICATE('A', 1000);

 SET @DateTime = DATEADD(MINUTE, 1, @DateTime)

END
GO

This will add a row for every minute between 01/12/2015 and 01/01/2017.  This takes a few minutes to run so feel free to cut it down if you’re using it.

This gives us 571,680 rows in each table, 1440 for each day and spread over just under 145,000 data pages

If we run a query to return all rows for 01/01/2016 from both tables in the same batch we obviously get the same data returned but we get a slightly different execution plan and different info returned by setting statistics IO and time on.

SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT *
FROM dbo.LargeVolumeNonPartitioned LVP
WHERE LVP.SomeDate >= CAST('20160101' AS DATETIME2(0))
AND LVP.SomeDate < CAST('20160102' AS DATETIME2(0))

SELECT *
FROM dbo.LargeVolumePartitioned LVP
WHERE LVP.SomeDate >= CAST('20160101' AS DATETIME2(0))
AND LVP.SomeDate < CAST('20160102' AS DATETIME2(0))

We get the following in the Messages tab from statistics IO and time

(1440 row(s) affected)
Table 'LargeVolumeNonPartitioned'. Scan count 5, logical reads 162373, physical reads 0, read-ahead reads 79043, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
 CPU time = 374 ms, elapsed time = 9401 ms.
SQL Server parse and compile time: 
 CPU time = 0 ms, elapsed time = 0 ms.

(1440 row(s) affected)
Table 'LargeVolumePartitioned'. Scan count 1, logical reads 11219, physical reads 0, read-ahead reads 264, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
 CPU time = 16 ms, elapsed time = 243 ms.
SQL Server parse and compile time: 
 CPU time = 0 ms, elapsed time = 0 ms.

There is obviously a large factor in the difference between the page reads and the query time differs from over 9 seconds to 0.243.

Here is the execution plans for both

01

If we look at the properties of the Clustered Index Scan on the partitioned table we see that we’ve only accessed 1 partition (Actual Partition Count) and partition 2 is the one that’s been accessed (Actual Partitions Accessed), i.e. the optimiser has used partition elimination to just look at the subset of data in that single partition and just scanned that. Whereas, the Clustered Index Scan on the non partitioned table has had to scan then entire table.

02

As I mentioned above I purposely didn’t put the SomeDate column as the first column in the clustered index.  If I had this would have meant that a clustered index seek operation would probably have been performed instead of a scan as shown in the execution plans above.  This means that on the non partitioned table, the query engine has had to read every page in the table.  On the partitioned table, the optimser has already told the query engine to just use partition 2 so it just has to do a full scan of this subset of data.

So like everything in SQL Server it depends on how you want to store your data and how you read and write that determines whether you partition the table or not.  If you have a large table and several of the your queries are causing long clustered index scans you might benefit from trying to partition the table (or you clustered index needs to be changed or new non clustered indexes added).

We have a table with a composite clustered key spread across 6 different columns.  We query this table in our code in lots of different ways and it was best to have a particular column as the first column in the clustered index as most queries used this one.  However, we have a date column that stored the date that the event for that row of data occurred.  This is part of the clustered key and we have partitioned the table on this column by month.  This allows us to quickly reduce the data set to the current month or last 6 months in lots of other queries and then perform additional filtering on the smaller data sets.

One thing that you do have to be wary of is ensuring your query allows partition elimination to occur.  In the example above the SomeDate column is a DATETIME2(0) data type.  You might have noticed that when filtering my query on the SomeDate column I explicitly cast the textual date values to a DATETIME2(0) data type.  If I hadn’t have done this we would have seen the following results in the statistics IO and Clustered Index Scan operation

SELECT *
FROM dbo.LargeVolumePartitioned LVP
WHERE LVP.SomeDate >= '20160101'
AND LVP.SomeDate < '20160102'

(1440 row(s) affected)
Table 'LargeVolumePartitioned'. Scan count 13, logical reads 143680, physical reads 0, read-ahead reads 132006, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
 CPU time = 483 ms, elapsed time = 2157 ms.
SQL Server parse and compile time: 
 CPU time = 0 ms, elapsed time = 0 ms.

03

The page reads has gone up from 11,219 to 143,680 and the Cluster Index Scan has now accessed all 13 partitions.

Even if we cast the date value to a DATETIME2 but with a different precision we still don’t get partition elimination.

SELECT *
FROM dbo.LargeVolumePartitioned LVP
WHERE LVP.SomeDate >= CAST('20160101' AS DATETIME2(1))
AND LVP.SomeDate < CAST('20160102' AS DATETIME2(1))

04

The same is true for other data types such as INT, BIGINT etc.  It’s something you need to be wary of when querying partitioned tables.

One last thing I want to demonstrate is a way of forcing the query engine to just look at one or more partitions.  We can do this with the $PARTITION system function.  We can use this to work out what partition a certain value sits on and when we know that we can ue that to filter our data.

Firstly, let’s use to should how many rows exist on each partition as follows…

SELECT $PARTITION.BlogPartFunc1(SomeDate) AS PartitionNumber, COUNT(*) as RowCount
FROM dbo.LargeVolumePartitioned LVP
GROUP BY $PARTITION.BlogPartFunc1(SomeDate)
ORDER BY 1

10

Here PartitionNumber = 1 is all the data before 01/01/2016, PartitionNumber = 2 is all the data in January 2016 and so on.  As you can remember we’ve got a row for every minute between 01/12/2015 and 01/01/2017 and you can see that all the months with 31 days have 44,640 rows, all the months with 30 days have 43,200 and February (PartitionNumber = 3) has 41,760 rows.

The query above comparing the partitioned and non-partitioned tables was looking at data just on 01/01/2016.  We know that this data sits on PartitionNumber = 2 but we can confirm this as follows…

SELECT $PARTITION.BlogPartFunc1('20160101');

11

Let’s say we want to return all the data for February.  We know this is on partition 3 so we can run the following…

SELECT *
FROM dbo.LargeVolumePartitioned LVP
WHERE $PARTITION.BlogPartFunc1(SomeDate) = 3

12

If we check the properties of the Clustered Index Scan operator in the execution plan we can see that only partition 3 was accessed.

13

Partitioning is a powerfully optimisation tool and is something that definitely should be considered during design time, particularly on wide tables where you might want to filter in several different ways but are only usually just interested in the latest data.

There is one final part of partitioning that I want to demonstrate and this is Partition Switching which I will show in the next post.

 

Introduction to Partitioning

Introduction

Table and index partitioning is something that has been around for a while and it’s something we use a lot on some of our large tables, particularly where we only really want to access the newest data and can do without having to think about older data.  There are two parts to partitioning that I think are very useful.  One is the ability to switch partitions between tables.  This is handy, for example, if you have a “live” table and an “archive” table.  You may only want to keep the last 12 months worth of data in your “live” table but still keep the older data in the “archive” table for various reasons including auditing.  You might want a process that archives the oldest month of data from the “live” table to the “archive” table.  Without partitioning you’ll likely have to copy the data from the “live” table into the “archive” table and delete the “live” data.  This means that all that data gets moved from one bunch of data pages to another.  If your “live” and “archive” tables are both partitioned by month then you can just switch the oldest partition from “live” to “archive”.  This leaves the data in the same pages and just updates that metadata that says those pages belong to the “archive” table now.  I’m not going to demo partition switching here but here is some more detail if you are interested: https://technet.microsoft.com/en-us/library/ms191160(v=sql.105).aspx

The second useful part is the bit I use all the time and this is a performance improvement where we only touch a small number of the partitions in a table rather than the whole table.  This is called “partition elimination” and is where the optimiser works out it doesn’t have to bother with certain partitions so eliminates them from the execution plan.  An example of this is a bunch of reporting tables we have with a website that calls a stored proc to read the report data.  The reports are all displayed by month and you chose the month you want to view from a drop down list at the top.  These reporting tables are partitioned by month and only one partition is ever accessed at a time.

You can store each partition on a different filegroup so you can have data that is accessed more frequently on disks with better I/O and put stuff rarely touched on less efficient disks.  In the following demo I’m showing you how to put the partitions on different filegroups but the main point for this is to show the performance improvement with partition elimination.

Demo

I’ve already got a database “SimonBlog” with a single PRIMARY filegroup.  I’m now going to add two new filegroups and files.

USE master
GO

ALTER DATABASE SimonBlog
ADD FILEGROUP BlogFG1;
GO

ALTER DATABASE SimonBlog
ADD FILEGROUP BlogFG2;
GO

ALTER DATABASE SimonBlog 
ADD FILE 
(
 NAME = BlogFG1F1,
 FILENAME = 'D:\Data\MSSQL11.MSSQLSERVER\MSSQL\DATA\BlogFG1F1.ndf',
 SIZE = 50MB,
 MAXSIZE = 1000MB,
 FILEGROWTH = 50MB
)
TO FILEGROUP BlogFG1;
GO

ALTER DATABASE SimonBlog 
ADD FILE 
(
 NAME = BlogFG2F1,
 FILENAME = 'D:\Data\MSSQL11.MSSQLSERVER\MSSQL\DATA\BlogFG2F1.ndf',
 SIZE = 50MB,
 MAXSIZE = 1000MB,
 FILEGROWTH = 50MB
)
TO FILEGROUP BlogFG2;
GO

Before we can create a partitioned table we need to create a partition function and a scheme that maps the partitions to one of more filegroups.  The partition function is the thing that specifies how you want your table partitioned and this can be used by more than one table.

So let’s create a partition function that partitions the first quarter of this year into different months.  The partitioning data type is DATETIME2(0) which means we can only use this partition on a table with a DATETIME2(0) data type column.

USE SimonBlog
GO

CREATE PARTITION FUNCTION BlogPartFunc (DATETIME2(0))
AS RANGE RIGHT FOR VALUES ('20160101', '20160201', '20160301');

We can see this partition function and it’s values in the following DMVs

SELECT *
FROM sys.partition_functions;

partitioning01

SELECT *
FROM sys.partition_range_values;

partitioning02

Even though we’ve only specified three  boundaries, ‘20160101’, ‘20160201’ and ‘20160301’, we will actually have four possible partitions: first, everything up to ‘20160101’; second, everything from ‘20160101’ up to ‘20160201; third, everything from ‘20160201’ up to ‘20160301’: and fourth, everything from ‘20160301 onward.

The “RIGHT” keyword in the “…AS RANGE RIGHT FOR VALUES…” part can either be “LEFT” or “RIGHT” and specifies whether the boundary value is in the left partition or the right.  In this case we’ve used “RIGHT” and this means the first partition is everything up to but not including ‘20160101’ and the second partition is everything after and including ‘20160101’, i.e, the boundary value of ‘20160101’ is included in the second partition.  If we used “LEFT” then ‘20160101’ would be in the first partition, i.e. it would be everything up to and including ‘20160101’.

So anyway, we have four possible partitions and now we need to map these to one or more filegroups.  We can map all partitions to one filegroup in the following way

CREATE PARTITION SCHEME BlogPartScheme
AS PARTITION BlogPartFunc ALL TO ([PRIMARY]);

or we can specify where each of the four partitions goes

CREATE PARTITION SCHEME BlogPartScheme
AS PARTITION BlogPartFunc TO (BlogFG1, [PRIMARY], BlogFG2, [PRIMARY]);

This puts the first partition on the filegroup BlogFG1, the third partition on BlogFG2 and the second and fourth both on PRIMARY – you probably wouldn’t do this but this just highlights that it’s completely flexible.

We can now create our partitioned table.  We just have to specify that the table is being created on our partition scheme and what column is going to be used for partitioning.

CREATE TABLE dbo.BlogPartitionTest
(
 Id INT NOT NULL IDENTITY(1,1),
 SomeDate DATETIME2(0) NOT NULL
) ON BlogPartScheme (SomeDate);

You might notice that this table is currently a heap.  There is a requirement where the partitioning column must be part of any unique index on the table so I couldn’t use the Id column as a PRIMARY KEY.  If I tried to create a new table on the same partition scheme with a PRIMARY KEY on the Id column then I get an error

CREATE TABLE dbo.BlogPartitionTest2
(
 Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
 SomeDate DATETIME2(0) NOT NULL
) ON BlogPartScheme (SomeDate);

Msg 1908, Level 16, State 1, Line 1
Column 'SomeDate' is partitioning column of the index 'PK__BlogPartitionTes__5EA4867D'. Partition columns for a unique index must be a subset of the index key.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint. See previous errors.

The table doesn’t have to be a heap, it can be a clustered index, it just can’t be unique unless the partitioning column is included.  For example we can add a clustered index to the first table

CREATE CLUSTERED INDEX IX_BlogPartitionTest ON dbo.BlogPartitionTest (Id)

We can now query another DMV to view our four partitions

SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'BlogPartitionTest'

partitioning03

Please note, if you query sys.partitions with no WHERE clause you might be surprised to see several rows returned.  This DMV considers every table to contain at least one partition, so all non-partitioned tables are actually tables with a single partition.

Now we can insert some data across the various partitions and see where it all sits

INSERT INTO dbo.BlogPartitionTest(SomeDate)
VALUES ('20151203'),
('20160101'),
('20160131'),
('20160215'),
('20160302');

So we’ve inserted five rows.  Looking at the dates it should be obvious that the first rows will be on the first partition, the second and third rows will be on the second partition, the fourth row will be on the third partition and finally the fifth row on the last partition. If we re-query sys.partitions we can see the rows counts confirm this

SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'BlogPartitionTest'

partitioning04

Now we can query the table and try to pull back the row for ‘20160101’ as follows

SELECT *
FROM dbo.BlogPartitionTest
WHERE SomeDate = CAST('20160101' AS DATETIME2(0))

partitioning05

If we include the execution plan we can view the properties of the data retrieval operation and confirm that partition elimination has been used and only one partition has been accessed

partitioning06

partitioning07

The “Actual Partition Count” tells us that on 1 partition was accessed and “Actual Partitions Accessed” tells us that partition 2 was the only one accessed.

If we run a select over the whole table and view the properties of the data retrieval operation we see the following

partitioning08

As expected we can see that all 4 partitions were accessed.

This demo only contains a very small amount of data but in the next post I’ll show another demo with a larger volume of data and compare performance of partitioned versus non-partitioned.  I’ll also show that you need make sure any queries that you want to use partition elimination use the correct search terms.

Internal Storage – Part 1 – DMVs

SQL Server has a bunch of system tables that it uses to keep track of the storage of data for any tables a user may have created.  SQL Server also provides DMVs (dynamic management views) that sit over the top of these system tables and we can use these to view how the storage engine is storing our data.  Being able to see this storage metadata is important in deciding how we set up our table, including what data types to use.

Whenever we create a table a row can be seen in sys.indexes.  The name of this is slightly confusing as it shows heaps (tables without a clustered index) as well as indexes.  For example, we can create two tables, one a heap and one a clustered index and query sys.indexes.

CREATE TABLE dbo.HeapTest
(
    Id INT,
    SomeCharData CHAR(100)
);

CREATE TABLE dbo.ClusteredIndexTest
(
    Id INT CONSTRAINT PK_ClusteredIndexTest PRIMARY KEY,
    SomeCharData CHAR(100)
);

SELECT OBJECT_NAME(object_id) AS TableName, object_id, name, index_id, type, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.HeapTest');

SELECT OBJECT_NAME(object_id) AS TableName, object_id, name, index_id, type, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.ClusteredIndexTest');

This returns the following results…

InternalStorage03

A table can be either a heap or a clustered index but can contain many non-clustered indexes.  For a heap the index_id is always 0 and for a clustered index it is always 1.  For non-clustered indexes index_id will be > 1.

A table can be split into up to 15,000 partitions and by default has just one partition.  There is another DMV named sys.partitions and we can run the following query against this for our two tables.

SELECT *
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.HeapTest');

SELECT *
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.ClusteredIndexTest');

This returns the following results…

InternalStorage04

A subset of a table on a single partition is called a hobt (Heap or B-Tree) and you can see the hobt_id is returned in the results above.  We can also see that there are no rows in these partitions.

A hobt is made up of allocation units and each allocation unit can be one of three types.  IN_ROW_DATA, LOB_DATA and ROW_OVERFLOW_DATA.  There is a third DMV we can look at that tell us what allocation units exist in our table.  If we just look at dbo.ClusteredIndexTest from now on (similar results would be returned for dbo.HeapTest) we can join sys.partitions to sys.allocation_units to tell us what allocation units we have.

SELECT AU.*
FROM sys.partitions P
INNER JOIN sys.allocation_units AU ON AU.container_id = P.partition_id
WHERE P.object_id = OBJECT_ID('dbo.ClusteredIndexTest');

This returns the following results…

InternalStorage05

There is an 8060 byte limit on each page for the in row data and the two columns in dbo.ClusteredIndexTest are 8 bytes for the INT and 100 bytes for the CHAR column.  This means that any rows inserted into the table will be 108 bytes which means it can be stored on a single page.  This is why we only see the IN_ROW_DATA allocation unit returned by sys.allocation_units.

If we now add a column with data type VARCHAR(8000) the total number of bytes in one row could now be up to 8108 which is greater than the 8060 limit.  We can also add another column with the VARCHAR(MAX) data type which replaces the deprecated TEXT data type. There are also NVARCHAR(MAX) and VARBINARY(MAX) data types and all three of these can store their data on LOB_DATA pages.

ALTER TABLE dbo.ClusteredIndexTest ADD TestRowOverflow VARCHAR(8000);
ALTER TABLE dbo.ClusteredIndexTest ADD TestLOB VARCHAR(MAX);

As the TestRowOverflow and TestLOB columns are of variable length the data will be stored on the IN_ROW_DATA page until the total row size exceeds the 8060 byte limit for in row data.  I will go into more detail on this in a future post but if we rerun our query looking at sys.allocation_units we can now see all three types of allocation unit

InternalStorage06

If we add a row to the table that is within the 8060 limit then we can see it fits on a page in the IN_ROW_DATA allocation unit.

INSERT INTO dbo.ClusteredIndexTest (Id, SomeCharData, TestRowOverflow)
VALUES (1, REPLICATE('B', 100), REPLICATE('B', 100));

This inserts an INT (8 bytes) into Id, a string of 100 “A” characters into SomeCharData and a string of 100 “B” characters into TestRowOverflow.  This gives a total row size of 8 + 100 + 100 = 208 which easily fits on a single data page.

If we rerun our query we can see that all the data is in one page within the IN_ROW_DATA allocation unit.  You will see the total_pages is 2 and I will go into more detail on this in a future post.

InternalStorage07

Now if we increase the size of the data in TestRowOverflow to 8000 bytes we can see that we now have some data pages in the ROW_OVERFLOW_DATA allocation unit

UPDATE dbo.ClusteredIndexTest
SET TestRowOverflow = REPLICATE('C', 8000)
WHERE Id = 1;

InternalStorage08

If we add a string of 100 characters to the column LOBTest and rerun the query we will see that the page count is unchanged meaning that the 100 characters are stored on the IN_ROW_DATA page.

UPDATE dbo.ClusteredIndexTest
SET TestLOB = REPLICATE('D', 100)
WHERE Id = 1;

If we update TestLOB to be a string of 8000 characters we can see the data is now stored on the LOB_DATA pages

UPDATE dbo.ClusteredIndexTest
SET TestLOB = REPLICATE('D', 8000)
WHERE Id = 1;

InternalStorage09

We can store more than 8000 characters in the MAX data types and the limit is set by SQL Server which in 2012 is 2GB.

UPDATE dbo.ClusteredIndexTest
SET TestLOB = REPLICATE('D', 20000)
WHERE Id = 1;

There is also undocumented version of sys.allocation_units that is more like the older DBCC IND command which also returns details of the first pages in the chain.

SELECT AU.*
FROM sys.partitions P
INNER JOIN sys.system_internals_allocation_units AU ON AU.container_id = P.partition_id
WHERE P.object_id = OBJECT_ID('dbo.ClusteredIndexTest');

InternalStorage10

Now that we have some data in our table we can look at one last DMF sys.dm_db_database_page_allocations which actually tells us what pages have been allocated to which allocation unit.

SELECT
    database_id,
    object_id,
    index_id,
    partition_id,
    rowset_id,
    allocation_unit_id,
    allocation_unit_type,
    allocation_unit_type_desc,
    extent_page_id,
    allocated_page_iam_file_id,
    allocated_page_iam_page_id,
    allocated_page_page_id,
    is_allocated,
    is_iam_page,
    page_type,
    page_type_desc
FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID('dbo.ClusteredIndexTest'), NULL, NULL, 'DETAILED')

InternalStorage11

Summary

The DMVs and DMF that we’ve seen here are useful tools in seeing how the storage engine is storing your data.  In a future post I will talk more about the data on the actual page and give examples of how to work out what page data is stored on.