Columnstore Indexes – The Very Basics

Columnstore indexes are briefly mentioned in chapter 10, lesson 1 on the 70-462 exam guide so I don’t think there will be many questions on them in the exam.

There are a new feature in 2012 and can provide performance improvement for data warehouse type queries that have to perform aggregations over large data sets, e.g. a query that groups on a few columns along with an AVG() and SUM() and an ORDER BY.

The columnstore index organises data by columns rather than rows which allows SQL Server to compress repeating data values which, according to the exam guide, is approximately double the compression rate of PAGE compression.

Obviously, like everything else in SQL Server you need to test what works best for your particular scenario.

You can check the query plan to ensure the column store index is being used. There are two modes it can run under: batch and row. Batch mode should be quick but row mode means it is not using the full benefits of the column store index.

The column store index is split into segments of around 1 million rows and the optimiser can use segment elimination in a similar way to how it uses partition elimination in a partitioned table. The segments store the min and max values in their metadata so the optimiser can quickly deduce which segments it needs to use.

If your query uses only a small number columns from a table then SQL Server only gets this column data from disk rather than the entire row. This can significantly reduce disk I/O and memory cache usage.

However if your query uses several columns it might be better to just use row based indexes as SQL Server has to go and get the row data anyway. This is why column store indexes are suitable for data warehouse queries.

The syntax to create the index is as follows:

ON dbo.table (col1, col2, ...);

However, there are limitations on columnstore indexes: there are several data types that cannot be used; and the table must be read-only.

This second limitation is probably going to be a problem on most data warehouse systems as I expect you’d want to load new data in on a regular basis. There are a few ways around this: disable the index (making the table read/write), load in the new data and re-enable the index (this is probably a good option for data warehouses where data is loaded from OLTP systems at regular intervals but you wouldn’t do this if the table is updated frequently from, say, a client app); use partition switching to add the new data into the read-only table; use. UNION ALL in your query to include an active data set in with the columnstore data and then add the active data into the columnstore later on.

For more information…


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s