Tuesday, December 31, 2013

Covering Index

Covering Index
The query we’ve chosen to optimize gives us the opportunity to explore a special kind of index, the covering index. A covering index is one that contains all the columns referenced in the query for a single table. We will demonstrate just how useful this can be.

Here is our example query:

SET STATISTICS IO ON;
GO
USE AdventureWorks;
SELECT ss.Name, COUNT(*) as OrderCount, SUM(sd.LineTotal) as TotalPurchases
FROM Sales.SalesOrderDetail sd
JOIN Sales.SalesOrderHeader sh on sd.SalesOrderID = sh.SalesOrderID
JOIN Sales.Customer sc on sh.CustomerID = sc.CustomerID
JOIN Production.Product p on sd.ProductID = p.ProductID
LEFT JOIN Sales.Store ss on sc.CustomerID = ss.CustomerID
WHERE ss.Name is not null
-- and sd.SpecialOfferID = 13
GROUP BY ss.Name
ORDER BY SUM(sd.LineTotal) desc
This query joins on several tables but almost all of the cost of the query is in the full scan on the largest table, SalesOrderDetail, at a cost of 1175 page reads. the query returns 633 rows. (We find this information in the Messages tab of the query window in Management Studio after setting STATISTICS IO ON.) When we examine the query, there seems to be no way to reduce the cost of the table scan through indexing. In this query, SalesOrderDetail is joined on the SalesHeaderId and ProductID columns, but those columns are already indexed. The only other place that an index might improve things is on a column in the WHERE clause, but there are no search arguments in the where clause that apply to this table. The only search argument is on the Sales.Store table. There is no obvious column that we can index to improve the performance on SalesOrderDetail.
But notice that there are only three columns from SalesOrderDetail referenced in this query. They are SalesOrderID, ProductID and LineTotal. If we create an index that includes all those columns it is said to “cover the query”. That means that the query engine can get all the data it needs from the index without touching the table at all. This can lead to performance improvements that range from moderate to absolutely amazing.
CREATE NONCLUSTERED INDEX nc_cover on Sales.SalesOrderDetail(productid) INCLUDE (linetotal) Notice that we use the INCLUDE clause to add the LineTotal column to this index. INCLUDE was a new feature of the CREATE INDEX statement in SQL 2005. It allows us to include columns that are not keys without incurring the substantial cost of adding more key values to the index. Columns that have no value for filtering, joining or ordering results can be added to the index at lower cost than simply naming them in the traditional CREATE INDEX syntax.
You might also notice that we have not mentioned SalesOrderID in this CREATE INDEX statement. How can this be a covering index if we leave out a column that is used in the query? The answer is that we have not left it out.
In a table having a clustered index, all nonclustered indexes include the clustering key. SalesOrderID is in the clustering key, therefore it is automatically added to any nonclustered index we create. There is no need to explicitly name it. When we create this index, the cost of accessing SalesOrderDetail drops from 1175 data pages to 486 pages. This is not an earthshaking improvement, but it does illustrate the principle we are discussing.

No comments: