Monday, June 30, 2014

TSQL Tutorial : Index in SQL Server

Table Scans and Index Lookups

When you perform a query, SQL Server can identify the rows that should be returned using one of two methods. The most basic method is called a table scan. As the name suggests, when a table scan is performed, the querying engine scans through every row in the tables that are included in the query. When a row matches the queries WHERE clause, the relevant information is added to the results that will be returned. This process is generally only acceptable for small tables or those where all of the rows are usually returned unfiltered. In almost all other scenarios, reading every table row is very inefficient.

The second method that a query can use to find information is an index lookup. This type of query uses the contents of an index that has previously been defined for a table. An index on a database table performs a similar function to an index in a book. Rather than reading through the entire book to find the word or phrase that you are looking for, you can examine the book's index to find all of the important references to that item. A database index contains the contents of one or more columns in the table. If the query filters its results based only upon columns within the index, the correct database rows can be found much more quickly.

Indexes are stored as sorted, balanced tree structures, also known as b-trees. This type of tree is very fast to access. Several of these index structures can be added to a single table to improve the performance of queries that use that table. When a query is executed, SQL Server's query optimizer determines which of the indexes are suitable for the query and which of these will provide the best performance. Once selected, the b-tree is searched to find the first entry where the data that matches the query. The initial search is fast because the data in the index is sorted. Furthermore, the sorting minimizes the range of index entries that must be scanned.

Index lookups are almost always quicker and much more efficient than table scans when retrieving information. However, this improvement in reading speed has an associated impact upon the performance when creating and updating information. In addition to changing the data in the table, each index must also be modified. These two factors should be considered together when designing your tables. If a table is often updated and seldom read, there may not be an overall benefit in adding multiple indexes. However, when certain columns are queried often, such as with columns included in foreign keys, it is often beneficial to index them.

NB: Foreign keys are not indexed by default in SQL Server so these are often ideal candidates for indexing.

Index Types

SQL Server 2005 permits the creation of several types of index. In this article, we will examine four of the more common types. These are:
  • Clustered Indexes
  • Non-Clustered Indexes
  • Unique Indexes
  • Indexes with Included Columns

Clustered Indexes
One of the most common types of index is the clustered index. When a clustered index is added to a table, the physical ordering of the rows is changed; they are sorted to match the index definition. This prevents you from creating more than one clustered index per clustered table.

The leaf nodes of a clustered index's balanced tree contain the actual data rows from the table. This gives very good performance for queries that are assisted by a clustered index for two key reasons. Firstly the data is immediately available once located in the index with no further lookups required. Secondly, the sorting of the data means that items that are adjacent in the index are likely to be in the same page of data on disk and thus can be read efficiently.

There are several design considerations when using clustered indexes:

Clustered indexes are particularly useful for queries that use relational operators such as =, <, >, <=, => and BETWEEN or for queries that return large result sets.

Clustered indexes are often used where the columns in the index will be used in queries that join multiple tables together. Typically, such an index is added to a primary key or foreign key. When you create a primary key using SQL Server Management Studio, a clustered index is added by default.

Clustered indexes give good performance to queries that include ORDER BY clauses or GROUP BY clauses. Grouping will be described in a future article in this tutorial.

Ideally, the values held in clustered index columns should be unique or include low levels of duplication. They should be accessed and created sequentially, making identity columns ideal.

Clustered indexes are not advised to be used for columns that are updated frequently, as each update could potentially require a re-ordering operation of the table data.

The composite values of the columns in a clustered index are used by non-clustered indexes on the same table. For this reason, the combined size of the columns should be minimised to prevent other indexes from becoming too large.

Non-Clustered Indexes

Non-clustered indexes are held separately from the data in a table and do not cause any changes to the ordering of the physical data. You can therefore create multiple non-clustered indexes on a single table.

In a non-clustered index, the leaf nodes of the b-tree structure contain pointers to the rows of data they represent, rather than the data itself. The pointer is either a reference to a location in the table's clustered index or, when the table is not clustered, to the table row. This adds an extra lookup to the process when reading from an index, lowering the performance of queries. A further reduction in speed can occur because adjacent rows in the index are less likely to be in the same data pages increasing the required disk access.

Non-clustered indexes are ideal for tables that will be updated rarely and queried often. If there are large volumes of data in the table, the querying speed can be drastically improved.

Non-clustered indexes are useful where a clustered index would be preferred but one already exists for the table.

If a table is updated often and queried less so, you should carefully consider the benefits and drawbacks of adding such an index.

Non-clustered indexes provide further performance gains when they cover the query's returned columns. Coverage means that all of the columns in the query are also included in the index. In this case, the query engine can obtain its results entirely from the index without visiting the table data.

When creating a non-clustered index, you should not add a large number of columns just to achieve coverage of the returned data. The columns in the index should be those that are likely to be included in a WHERE clause only. To achieve additional coverage, you should consider using indexes with included columns, described later in this article.

If a non-clustered index is included in a clustered table, the columns from the clustered index are automatically included in the non-clustered index as they are the pointer that will ultimately find the data. You can obtain coverage of some queries from this combined set of columns.

Unique Indexes

A unique index may be clustered or non-clustered. This type of index prevents duplication of data in the combined set of defined columns. If you attempt to create duplicate data in the defined index columns the command will fail. A unique index is created automatically when adding a unique constraint.

When creating an index where the data will be unique, always specify a unique index. This allows additional efficiency in query execution plans.

There is no significant difference between unique indexes and unique constraints. If you wish to force uniqueness, you should use a constraint, as the purpose will be clearer to other developers and database administrators.


Indexes with Included Columns

Indexes are limited to a maximum of sixteen key columns with a combined size of nine hundred bytes. To help obtain coverage of queries without using a large number of columns, you can use included columns. Indexes with included columns are a new feature of SQL Server 2005.

Included columns are columns from the underlying table that are added to the index but that are not used to optimize queries. They are not included in the maximum number or size of key columns and can include data types that are not normally permitted in an index. During a query, once a row in the index has been identified as a result to be returned, the extra columns may provide coverage so that reading the physical table data is unnecessary.

This type of index can improve query performance. However, you should avoid adding too many columns or very large items as this can lower performance and can result in the creation of very large indexes.

No comments:

Post a Comment