Tips on Constructing an Efficient WHERE Clause

When you create a SQL statement, it is important to construct an efficient WHERE clause.

Indexes and WHERE Clauses

The performance of a SQL statement that includes a WHERE clause depends partly on the way you construct your WHERE clause and partly on the definition of table indexes by the data grid administrator. In general, you want to construct a WHERE clause that results in the primary key or a secondary index to be selected for evaluating the WHERE clause predicates against a subset of the rows of a table.

Omitting the WHERE clause or using a WHERE clause that results in no key or index being selected might end up in a full table scan. Full table scans iterate over every row in a table to see if the row must be used by the SQL statement or whether they must be avoided as they are very inefficient.

For example, if the primary key for a table is defined on the empid column, the following query can directly access one row of the table:

SELECT * FROM mytable WHERE empid=’ID-1234’
If there are no secondary indexes defined for the table, the following query performs a full table scan:
SELECT * FROM mytable WHERE lastname LIKE ‘B%’

However, if there is a secondary index defined on the lastname column, then the query above would reduce the number of rows scanned to only those rows that begin with a 'B' or 'b'.

Programmers: consult your data grid administrator for information about the definition of indexes.

Programmers and administrators can use the rules of thumb in the following sections to help promote efficiency and high performance of queries.

Primary Key and Secondary Indexes

Rule of Thumb: Construct WHERE clauses in which every predicate refers to columns of the primary key or a secondary index.

A WHERE Clause is composed of one or more predicates. A predicate is a condition expression that evaluates to either true or false. Each predicate must be composed of columns from the primary key or a secondary index. This allows the primary key or a secondary index to be selected to find the rows of the table to use for evaluating the predicate. If a primary key or secondary index cannot be selected for even one predicate in the WHERE clause, a full table scan is performed even if the primary key or a secondary index is selected for all of the other predicates in the WHERE clause.

Left-Most Columns

Rule of Thumb: Construct WHERE clauses that reference to the left-most columns of a primary key or secondary index using the operators =, ==, <=, >=, <, >, IN, IS, or BETWEEN.

Not all columns of the primary key or secondary indexes need to be referenced by a WHERE clause predicate. When a primary key or secondary index includes more than one column, the administrator has defined them from left to right. For the primary key or a secondary index to be considered for use in evaluating a predicate, at least the left-most column of the key or index must be referenced by the predicate. The primary key or a secondary index is not selected, if the WHERE clause skips the left-most column but refers only to columns defined further to the right.

Similarly WHERE clauses that refer to the left-most two columns can be even more efficient. Queries can achieve maximum efficiency when they use WHERE clauses that refer to all of the columns of the primary key or a secondary index.

When a WHERE clause does not contain the left-most column of the primary key or any of the secondary indexes, a full table scan results. A query that does a full table scan is least efficient and must be avoided.

The order in which columns appear in the WHERE clause does not affect query efficiency. Only the order of columns when defining the index matters.

Avoid Left-Most NOT

Rule of Thumb: Do not construct WHERE clauses that reference the left-most columns of the primary key or secondary index using the operators NOT, IS NOT, !=, <>, ISNULL, IS NULL, NOTNULL, NOT NULL, and IS NOT NULL.

In contrast to the rule of left-most columns, a WHERE clause that references the left-most columns with these operators have the opposite effect: to guarantee a full table scan, which is the least efficient.

This rule does not imply that operators in the NOT family are always inefficient. For example, a query can still be efficient if it obeys the left-most columns rule and also tests columns further to the right using NOT. For example, if the administrator defined an index on the columns lastname and firstname, then this WHERE clause can be efficient:

WHERE lastname='Smith' AND firstname IS NOT
'Dan'

Limit Range Queries From Both Ends

Rule of Thumb: When using the operators > or >=, which specify a lower limit on a column's value, also include the opposite operators, < or <=, to specify an upper limit on the same column.

A query searches an index from its lower limit to its upper limit. If you omit the upper limit, the query continues searching to the end of the index. If you omit the lower limit, the query begins with the first row of the index.