Table Scans and Index
Lookups
When you perform a query, SQL Server can
identify the rows that should be returned using one of two methods. The most
basic method is called a table scan. As the name suggests, when a table scan is
performed, the querying engine scans through every row in the tables that are
included in the query. When a row matches the queries WHERE clause, the
relevant information is added to the results that will be returned. This
process is generally only acceptable for small tables or those where all of the
rows are usually returned unfiltered. In almost all other scenarios, reading every
table row is very inefficient.
The second method that a query can use to
find information is an index lookup. This type of query uses the contents of an
index that has previously been defined for a table. An index on a database
table performs a similar function to an index in a book. Rather than reading
through the entire book to find the word or phrase that you are looking for,
you can examine the book's index to find all of the important references to
that item. A database index contains the contents of one or more columns in the
table. If the query filters its results based only upon columns within the
index, the correct database rows can be found much more quickly.
Indexes are stored as sorted, balanced tree
structures, also known as b-trees. This type of tree is very fast to access.
Several of these index structures can be added to a single table to improve the
performance of queries that use that table. When a query is executed, SQL
Server's query optimizer determines which of the indexes are suitable for the
query and which of these will provide the best performance. Once selected, the
b-tree is searched to find the first entry where the data that matches the
query. The initial search is fast because the data in the index is sorted.
Furthermore, the sorting minimizes the range of index entries that must be
scanned.
Index lookups are almost always quicker and
much more efficient than table scans when retrieving information. However, this
improvement in reading speed has an associated impact upon the performance when
creating and updating information. In addition to changing the data in the
table, each index must also be modified. These two factors should be considered
together when designing your tables. If a table is often updated and seldom read,
there may not be an overall benefit in adding multiple indexes. However, when
certain columns are queried often, such as with columns included in foreign
keys, it is often beneficial to index them.
NB: Foreign keys are not indexed by default
in SQL Server so these are often ideal candidates for indexing.
Index Types
SQL Server 2005 permits the creation of
several types of index. In this article, we will examine four of the more
common types. These are:
- Clustered Indexes
- Non-Clustered Indexes
- Unique Indexes
- Indexes with Included Columns
Clustered Indexes
One of the most common types of index is
the clustered index. When a clustered index is added to a table, the physical
ordering of the rows is changed; they are sorted to match the index definition.
This prevents you from creating more than one clustered index per clustered
table.
The leaf nodes of a clustered index's
balanced tree contain the actual data rows from the table. This gives very good
performance for queries that are assisted by a clustered index for two key
reasons. Firstly the data is immediately available once located in the index
with no further lookups required. Secondly, the sorting of the data means that
items that are adjacent in the index are likely to be in the same page of data
on disk and thus can be read efficiently.
There are several design considerations when
using clustered indexes:
◾Clustered indexes are particularly useful
for queries that use relational operators such as =, <, >, <=, =>
and BETWEEN or for queries that return large result sets.
◾Clustered indexes are often used where the
columns in the index will be used in queries that join multiple tables
together. Typically, such an index is added to a primary key or foreign key.
When you create a primary key using SQL Server Management Studio, a clustered
index is added by default.
◾Clustered indexes give good performance to
queries that include ORDER BY clauses or GROUP BY clauses. Grouping will be
described in a future article in this tutorial.
◾Ideally, the values held in clustered index
columns should be unique or include low levels of duplication. They should be
accessed and created sequentially, making identity columns ideal.
◾Clustered indexes are not advised to be
used for columns that are updated frequently, as each update could potentially
require a re-ordering operation of the table data.
◾The composite values of the columns in a
clustered index are used by non-clustered indexes on the same table. For this
reason, the combined size of the columns should be minimised to prevent other
indexes from becoming too large.
Non-Clustered Indexes
Non-clustered indexes are held separately
from the data in a table and do not cause any changes to the ordering of the
physical data. You can therefore create multiple non-clustered indexes on a
single table.
In a non-clustered index, the leaf nodes of
the b-tree structure contain pointers to the rows of data they represent,
rather than the data itself. The pointer is either a reference to a location in
the table's clustered index or, when the table is not clustered, to the table
row. This adds an extra lookup to the process when reading from an index,
lowering the performance of queries. A further reduction in speed can occur
because adjacent rows in the index are less likely to be in the same data pages
increasing the required disk access.
◾Non-clustered
indexes are ideal for tables that will be updated rarely and queried often. If
there are large volumes of data in the table, the querying speed can be
drastically improved.
◾Non-clustered
indexes are useful where a clustered index would be preferred but one already
exists for the table.
◾If a table is
updated often and queried less so, you should carefully consider the benefits
and drawbacks of adding such an index.
◾Non-clustered
indexes provide further performance gains when they cover the query's returned
columns. Coverage means that all of the columns in the query are also included
in the index. In this case, the query engine can obtain its results entirely
from the index without visiting the table data.
◾When creating a
non-clustered index, you should not add a large number of columns just to
achieve coverage of the returned data. The columns in the index should be those
that are likely to be included in a WHERE clause only. To achieve additional
coverage, you should consider using indexes with included columns, described
later in this article.
◾If a non-clustered index
is included in a clustered table, the columns from the clustered index are
automatically included in the non-clustered index as they are the pointer that
will ultimately find the data. You can obtain coverage of some queries from
this combined set of columns.
Unique Indexes
A unique index may be clustered or
non-clustered. This type of index prevents duplication of data in the combined
set of defined columns. If you attempt to create duplicate data in the defined
index columns the command will fail. A unique index is created automatically
when adding a unique constraint.
◾When creating an
index where the data will be unique, always specify a unique index. This allows
additional efficiency in query execution plans.
◾There is no
significant difference between unique indexes and unique constraints. If you
wish to force uniqueness, you should use a constraint, as the purpose will be
clearer to other developers and database administrators.
Indexes with Included
Columns
Indexes are limited to a maximum of sixteen
key columns with a combined size of nine hundred bytes. To help obtain coverage
of queries without using a large number of columns, you can use included
columns. Indexes with included columns are a new feature of SQL Server 2005.
Included columns are columns from the
underlying table that are added to the index but that are not used to optimize
queries. They are not included in the maximum number or size of key columns and
can include data types that are not normally permitted in an index. During a
query, once a row in the index has been identified as a result to be returned,
the extra columns may provide coverage so that reading the physical table data
is unnecessary.
This type of index can improve query
performance. However, you should avoid adding too many columns or very large
items as this can lower performance and can result in the creation of very
large indexes.
Refrence:http://www.blackwasp.co.uk/SQLIndexes_2.aspx
No comments:
Post a Comment