Index Types

The following types of indexes are available in SQL Server 2012

  • Clustered
  • Non-clustered (including covering and filtered indexes)
  • XML (primary and secondary)
  • Spatial
  • Full-text
  • Columnstore

It’s obvious when you know about the different types of indexes but the 70-462 guide is the first place I’ve seen a list of which indexes are best used in which situations so I thought I’d list them here.

Clustered indexes are best when querying…

  • a large percentage of columns in the table
  • a single row based on the clustered index key
  • range-based data

Non-clustered indexes are best when querying…

  • only a few rows in a large table
  • data that is covered by the index

Filtered indexes are best…

  • when combined with sparse columns (columns with lots of NULL values) to find specific non-NULL rows
  • with queries where a small subset of rows are queried often

More information on indexes can found in Chapter 10, Lesson 1 of the exam guide.

Columnstore Indexes – The Very Basics

Columnstore indexes are briefly mentioned in chapter 10, lesson 1 on the 70-462 exam guide so I don’t think there will be many questions on them in the exam.

There are a new feature in 2012 and can provide performance improvement for data warehouse type queries that have to perform aggregations over large data sets, e.g. a query that groups on a few columns along with an AVG() and SUM() and an ORDER BY.

The columnstore index organises data by columns rather than rows which allows SQL Server to compress repeating data values which, according to the exam guide, is approximately double the compression rate of PAGE compression.

Obviously, like everything else in SQL Server you need to test what works best for your particular scenario.

You can check the query plan to ensure the column store index is being used. There are two modes it can run under: batch and row. Batch mode should be quick but row mode means it is not using the full benefits of the column store index.

The column store index is split into segments of around 1 million rows and the optimiser can use segment elimination in a similar way to how it uses partition elimination in a partitioned table. The segments store the min and max values in their metadata so the optimiser can quickly deduce which segments it needs to use.

If your query uses only a small number columns from a table then SQL Server only gets this column data from disk rather than the entire row. This can significantly reduce disk I/O and memory cache usage.

However if your query uses several columns it might be better to just use row based indexes as SQL Server has to go and get the row data anyway. This is why column store indexes are suitable for data warehouse queries.

The syntax to create the index is as follows:

ON dbo.table (col1, col2, ...);

However, there are limitations on columnstore indexes: there are several data types that cannot be used; and the table must be read-only.

This second limitation is probably going to be a problem on most data warehouse systems as I expect you’d want to load new data in on a regular basis. There are a few ways around this: disable the index (making the table read/write), load in the new data and re-enable the index (this is probably a good option for data warehouses where data is loaded from OLTP systems at regular intervals but you wouldn’t do this if the table is updated frequently from, say, a client app); use partition switching to add the new data into the read-only table; use. UNION ALL in your query to include an active data set in with the columnstore data and then add the active data into the columnstore later on.

For more information…