Introduction

If you own a car, you got to take it for service every 90 days; if you don’t, you are in for trouble someday. To be able to enjoy the luxuries, you need to shell a little time and money in a timely manner or someday the car shall be nothing more than a piece of junk. The same goes with indexes. In my last article, we have seen how creating clustered and non-clustered indexes on the tables optimizes the data retrieval and modification activities. In this article, we will see how to maintain table indexes.

One of the leading causes of poor query performance is poorly maintained indexes. As indexes are updated, they become fragmented. This occurs because indexes are a collection of contiguous, sorted data. To maintain the sorted order of indexes, SQL Server splits full data pages to make room for more data.

For example, Extent 24 contains a clustered index defined on the FirstName column of the fictitious dbo.Student table. Each data page in the extent is completely full.

EXTENT 24

The following DML is executed to insert a new row into the dbo.Student table:

INSERT INTO dbo.Student
(FirstName, LastName,Class, EmailAddress, ResidentialAddress, DateModified)
VALUES
('Ben','Thomas','X','benT@gmail.com','407/1,120 S-Avenue 2nd Street LA',GETDATE())

An immediate page split occurs. This is because there is no room on the data page for a new record. To maintain the order of the rows, SQL Server splits page 15 and moves approximately 50 percent of the rows to a new unallocated data page. The new record is inserted into page 15 while the records marked are moved to page 243 of Extent 72. The page 15 in Extent 24 looks like below:

EXTENT 24

EXTENT 72

As a result of this page split, when SQL Server reads the data pages to retrieve the contents of the dbo.Student table, it will have to switch from Extent 24 to Extent 72 and then back to Extent 24 again to continue the scanning of rows. After many more students are added, additional page splits will occur. These page splits cause index fragmentation. The fragmentation of the indexes will eventually cause SQL Server to perform an excessive amount of reads to retrieve data, resulting in poor query performance.

To check for fragmentation level on all the indexes of a table or specific indexes, the dynamic management function sys.dm_db_index_physical_stats is used. This function returns a great deal of information about the indexes on a table, including the amount of data on each data page, the amount of fragmentation at the leaf and non-leaf level of the indexes, and the average size of records in an index.

When querying this table-value function, the information one is most often interested in is the levels of fragmentation and the average percentage that each page is full. The fragmentation level will let us know if the indexes need to be rebuilt, and the amount of fullness of the data pages will tell you how soon we can expect more page splits to occur. To query the sys.dm_db_index_physical_stats dynamic management view, the following syntax can be used:

DECLARE @DB_ID SAMLLINT, @OBJ_ID INT
SET @DB_ID = DB_ID('TestDB')
SET @OBJ_ID = OBJECT_ID('dbo.Student')
SELECT *
FROM sys.dm_db_index_physical_stats(@DB_ID, @OBJ_ID, NULL, NULL , 'DETAILED')

This DMV returns the complete info about the indexes on the table, but mostly we will be interested in the avg_fragmentation_in_percent, avg_page_space_used_in_percent columns.

SELECT avg_fragmentation_in_percent, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(@@DB_ID, @OBJ_ID, NULL, NULL , 'DETAILED')
WHERE INDEX_LEVEL=0

Results:

avg_fragmentation_in_percent avg_page_space_used_in_percent
---------------------------- ------------------------------
1.40845070422535             99.7014949345194

(1 row(s) affected)

Alternatively, fragmentation of any table can also be determined by the DBCC command.

USE TestDB
GO
DBCC SHOWCONTIG ('dbo.Student');
GO
--------------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'Student' table...
Table: 'Student' (821577965); index ID: 0, database ID: 8
TABLE level scan performed.
- Pages Scanned................................: 1126
- Extents Scanned..............................: 142
- Extent Switches..............................: 141
- Avg. Pages per Extent........................: 7.9
- Scan Density [Best Count:Actual Count].......: 99.30% [141:142]
- Extent Scan Fragmentation ...................: 1.41%
- Avg. Bytes Free per Page.....................: 24.2
- Avg. Page Density (full).....................: 99.70%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

This query only returns the fragmentation level and the page space used for the leaf level of the index, which is where the worst fragmentation (as far as performance is concerned) will occur. Down the line when there are more updates, inserts happening there will be more page splits and hence the fragmentation level will increase. Upon a time when the avg_fragmentation_in_percent approaches 70% and avg_page_space_used_in_percent depletes to 70%, it is high time that the DBA will need to rebuild the index. It’s so because the the query performance will deteriorate because of the Extent switches. We find that for the Student table, there were 141 Extent switches.

The DBCC SHOWCONTIG command shows that SQL Server scanned 142 extents to retrieve all the data in the Student table, but to scan those 142; it had to switch between them 141 times!

It has already been explained that SQL Server uses indexes to find rows in data tables for reading, updating, or deleting. However, if all one ever did was insert data in tables, there would be no need of an index. The general rule is that indexes help data read performance and hurts data insert performance. Here is an analogy. I take my tool box for repairing my car. I am just incapable of putting all the tools back where they belong after my repair work is over. As a result, when I am finished, I invariably grab all the tools I have used and throw them into my drawer. Putting stuff away never takes me very long. However, when I need to repair my car for the next time, I invariably spend a huge amount of time just trying to find my hammer. Out of desperation, I sometimes just go buy another tool box. Had I just spent the extra time required to put things back where they belong, I could have saved time and money.

The same goes for databases. Planning and building indexes takes time and effort, so does maintaining the indexes once they are built. However, even the most insert- and update-intensive databases can usually be found to perform five reads for every write. That means that maintaining indexes at peak performance is going to pay off five-fold. With that firmly in mind, let us take a look at how to mitigate index fragmentation and correct it once it has occurred.

Fill Factor

To cut down fragmentation caused by page splits, the database administrator has the option to design or rebuild the indexes so that they are not completely full. This option is called fill factor. When building or rebuilding the index, a fill factor percentage can be specified. Specifying the fill factor will enable the pages to be filled by the specified percentages. If a fill factor of 90% is specified, then the pages will be filled by 90%, and if an indexed page is only filled 90 percent of the way, it will take more inserts to the index to cause page splits and fragmentation to occur.

Now that the data pages are not completely full, adding additional contacts will not cause the pages to split as quickly. The fill factor is only effective when the indexes are built or rebuilt.

After a few inserts, the indexes will again fill and page splits will occur. However, the page splits will not occur immediately, and the amount of time between index rebuilds can be lengthened.

The index fill factor option has a down side. Only filling up the index pages partially increases the amount of page reads required to retrieve the data. As a result, there is a definite point of decreasing returns when setting a fill factor. Hence fill factor should not be specified more than 80 to 90%, with 10 to 20% space free for inserts. Keep in mind that at an 80 percent fill factor, SQL Server will have to perform 20 percent more reads than is strictly required to retrieve the records at a 100 percent fill factor.

Fixing Fragmentation

When an index is heavily fragmented, the following choices are available for reducing fragmentation:

  • Drop and re-create a clustered index.
  • Reorder or re-organize the leaf-level pages of the index in a logical order.
  • Rebuild the index.

Each method has its advantages and disadvantages. The drop and re-create option is used with the CREATE INDEX command. The rebuild and reorganize options are used with the ALTER INDEX command.

Drop and Re-create Index

The main advantage of dropping and re-creating an index is that almost everything about the index can be changed. For example, the columns that the index is defined on can be changed, the FILLFACTOR of the index can be modified, or the index can be changed from a non-clustered index to a clustered index, if a clustered index does not already exist. However, when using the DROP_EXISTING option with the CREATE INDEX command, a specific index must be specified. When using the rebuild or reorganize options of the ALTER INDEX command, all the indexes on a table can be specified at once.

Rebuilding an index with the DROP_EXISTING option removes index fragmentation by rebuilding all the index pages in the indexed order. It also compresses the index pages so that empty space created by page splits is filled. Both the leaf level and the non-leaf level of the indexes are rebuilt.

Here is an example for dropping the index CL_ID on the Sales table:

DROP INDEX Sales.CL_ID

Alternatively, the Create query would automatically drop and recreate the index.

CREATE CLUSTERED INDEX CL_ID 
ON SALES(ID)
WITH (Fillfactor=80,Drop_Existing=ON);

CREATE NONCLUSTERED INDEX NONCI_PC 
ON SALES(ProductCode)
WITH (Fillfactor=80,Drop_Existing=ON);

Re-organizing Indexes

Reorganizing indexes consumes the least amount of system resources, but doesn’t do as thorough a job as an index rebuild. When SQL Server reorganizes an index, it rearranges and compresses the data pages so that their logical order matches their physical order. Index reorganization only affects the leaf level of the index.

The guideline on when to perform index reorganization versus when to perform a rebuild is the 30 percent fragmentation level. If the level of fragmentation is less than or equal to 30 percent, a reorganization will take less time than an index rebuild, and consume much less system resources. If the fragmentation is greater than 30 percent, index reorganization will most likely take longer than a rebuild, but it will still consume less resources.

In general, if the indexes are rebuilt periodically with an appropriate FILLFACTOR, the need for index reorganization between those periods is reduced. However, intervals of high transaction activity may necessitate an intervening reorganization to prevent fragmentation from exceeding 30 percent, and potentially causing performance issues.

ALTER INDEX CL_ID 
ON dbo.Sales
REORGANIZE;

Remember that with reorganizing indexes, we do not have the options of specifying FILLFACTOR or ONLINE. The FILLFACTOR shall remain whatever it was at the time of the creation of the index.

Rebuilding an Index

When an index is rebuilt using the ALTER INDEX command, SQL Server actually drops and re-creates the index much like the CREATE INDEX command. The difference is that the columns of the existing index cannot be changed, nor can the type of index. However, the FILLFACTOR can be modified as well as the very useful ability to execute the command only once on an entire table to rebuild all the indexes on that table.

Another very useful feature is the ONLINE option. If ONLINE is on, SQL Server will not place any long term locks on the table being indexed, resulting in a much lower impact on user performance. The ONLINE option, however, is only available with the Enterprise edition of SQL Server.

Like the DROP_EXISTING option, the REBUILD option of ALTER INDEX rebuilds both the leaf and non-leaf levels of the index. The following is an example of rebuilding an individual index and then all the indexes on a table with a FILLFACTOR of 90 percent and the ONLINE option on:

USE TestDB
GO
ALTER INDEX CL_ID ON dbo.Sales
REBUILD WITH (FILLFACTOR=90,ONLINE=ON)
USE TestDB
GO
ALTER INDEX ALL ON dbo.Sales
REBUILD WITH (FILLFACTOR=90,ONLINE=ON)

The ONLINE indexing option leverages the TestDB database for index creation and maintenance. Indexes are created or rebuilt in the TestDB database and then moved to the appropriate database. This decreases the impact on users in the database, but it can cause unanticipated growth of the TestDB database. The ONLINE index option is only available with the Enterprise and Developer editions of SQL Server.

Alternatively, DBCC REINDEX rebuilds one or more indexes for a table in the specified database.

DBCC DBREINDEX 
    (    'database.owner.table_name'    
            [ , index_name 
                [ , fillfactor ] 
            ] 
    )    [ WITH NO_INFOMSGS ] 

DBCC DBREINDEX ('TestDB.dbo.Sales', 'NONCI_PC', 70)
WITH NO_INFOMSGS

Here, the CL_ID index has been rebuilt with 70% fill factor on the Sales table on the TestDB database.

Before I complete this article, one tip.

How about a scenario where there is a table in which there has to be a BCP or BULK INSERT of say 100,000,000 records, the table has around 50 non-clustered indexes and a clustered index? How would we approach it?

First, drop all the indexes and BCP or BULK INSERT the data. But why would we drop them in first place when we need them after the BCP? Hence disabling the index is a sound approach.

USE TestDB
GO
----Diable Index
ALTER INDEX CL_ID ON dbo.Sales DISABLE
GO
----Enable Index
ALTER INDEX CL_ID ON dbo.Sales REBUILD
GO

But if there are 51 indexes, how about disabling them all in one go?

So..

ALTER INDEX ALL ON dbo.Sales DISABLE
GO

**Out Of experience, I have an interesting observation to share: when you disable clustered index, all the indexes on the table are disabled; the reverse in not true.

Next, we BCP or BULK INSERT the data.

BCP TestDB.dbo.Sales
IN Sales.TXT -c -U<login_id> -S<server_name\instance_name>

Or:

BULK INSERT TestDB.dbo.Sales
   FROM 'c:\sales\sales20110321.txt'
   WITH 
      (
         FIELDTERMINATOR ='|',
         ROWTERMINATOR ='\n'
      )

Now once the inserts are complete, rebuild all the indexes on the table.

ALTER INDEX ALL ON dbo.Sales REBUILD
GO

Another scenario is something like this: we have been able to build all the indexes on the Sales table, how about if we are asked to perform a rebuild on all the indexes of a particular database? Creating a Stored Procedure and looping for each of the tables seems like the correct option, right?

Let’s explore.

There is an undocumented Stored Procedure supported by SQL Server which helps us: sp_MSforeachtable.

USE TestDB
GO
EXEC sp_MSforeachtable @command1="DBCC DBREINDEX ('?', '', 70)"

This query runs on all the tables of the current database and rebuilds the index on all of them; simply great, isn’t it? Similarly, you could use sp_MSforeachdb when you need a SQL to be run on all the databases of the current server.

In my next article, we will see how partitioning a table will be helpful in performance tuning for extremely large tables.

推荐.NET配套的通用数据层ORM框架:CYQ.Data 通用数据层框架
新浪微博粉丝精灵,刷粉丝、刷评论、刷转发、企业商家微博营销必备工具"