Sunday, November 23, 2014

SELECT Performance Tips

 Match query search columns with those leftmost in the index when possible. An index on stor_id,
ord_num will not be of any help to a query that filters results on the ord_num column.

• Construct WHERE clauses that the query optimizer can recognize and use as search arguments. See
the "SARGs" section later for more information.

• Don't use DISTINCT or ORDER BY "just in case." Use them if you need to remove duplicates or if you
need to guarantee a particular result set order, respectively. Unless the optimizer can locate an index
to service them, they can force the creation of an intermediate work table, which can be expensive in
terms of performance.

• Use UNION ALL rather than UNION when you don't care about removing duplicates from a UNIONed
result set. Because it removes duplicates, UNION must sort or hash the result set before returning it.
Obviously, if you can avoid this, you can improve performance—sometimes dramatically.

• As mentioned earlier, you can use SET LOCK_TIMEOUT to control the amount of time a connection
waits on a blocked resource. At session startup, @@LOCK_TIMEOUT returns –1, which means that
no timeout value has been set yet. You can set LOCK_TIMEOUT to a positive integer to control the
number of milliseconds a query will wait on a blocked resource before timing out. In highly contentious
environments, this is sometimes necessary to prevent applications from appearing to hang.

• If a query includes an IN predicate that contains a list of constant values (rather than a subquery),
order the values based on frequency of occurrence in the outer query, if you know the bias of your
data well enough. A common approach is to order the values alphabetically or numerically, but that
may not be optimal. Since the predicate returns true as soon as any of its values match, moving those
that appear more often to the first of the list should speed up the query, especially if the column being
searched is not indexed.

• Give preference to joins over nested subqueries. A subquery can require a nested iteration—a loop
within a loop. During a nested iteration, the rows in the inner table are scanned for each row in the
outer table. This works fine for smaller tables and was the only join strategy supported by SQL Server
until version 7.0. However, as tables grow larger, this approach becomes less and less efficient. It's
far better to perform normal joins between tables and let the optimizer decide how best to process
them. The optimizer will usually take care of flattening unnecessary subqueries into joins, but it's
always better to write efficient code in the first place.

• Avoid CROSS JOINs if you can. Unless you actually need the cross product of two tables, use a more
succinct join form to relate one table to another. Returning an unintentional Cartesian product and
then removing the duplicates it generates using DISTINCT or GROUP BY are a common problem
among beginners and a frequent cause of serious query performance problems.

• You can use the TOP n extension to restrict the number of rows returned by a query. This is
particularly handy when assigning variables using a SELECT statement because you may wish to see
values from the first row of a table only.

• You can use the OPTION clause of a SELECT statement to influence the query optimizer directly
through query hints. You can also specify hints for specific tables and joins. As a rule, you should
allow the optimizer to optimize your queries, but you may run into situations where the execution plan
it selects is less than ideal. Using query, table, and join hints, you can force a particular type of join,
group, or union, the use of a particular index and so on. The section on the Transact-SQL SELECT
statement in the Books Online documents the available hints and their effects on queries.

• If you are benchmarking one query against another to determine the most efficient way to access data,
be sure to keep SQL Server's caching mechanisms from skewing your test results. One way to do this
is to cycle the server between query runs. Another is to use undocumented DBCC command verbs to
clear out the relevant caches. DBCC FREEPROCCACHE frees the procedure cache; DBCC
DROPCLEANBUFFERS clears all caches.

No comments:

Post a Comment