Can you rewrite
WHERE Col1 = Col2 OR (Col1 IS NULL AND Col2 IS NULL) so that each column name is only used once without using the substitution method (that is, without doing something like
WHERE ISNULL(Col1, '-2,147,483,648') = ISNULL(Col2, '-2,147,483,648'))?
WHERE EXISTS(SELECT Col1 INTERSECT SELECT Col2)
What…what?! That can’t be possible! I thought
INTERSECT compares two result sets, not two columns from the same row of data.
True, but a correlated subquery has access to the outer query’s columns. In the subquery, we can turn a column from the outer query into a single column, single row result set by putting
SELECT in front of its name. Do this to two columns and
INTERSECT is happy to compare between them. If they’re identical,
INTERSECT returns a single row containing the (distinct) identical value; if not,
INTERSECT returns no rows.
Okay, you win that point. I see how the
INTERSECT subquery could compare between the two values but don’t you need to add
IS NULL checks since a comparison between two
NULLs does not evaluate to true?
That would be correct if we were using a comparison operator (for example,
<>). When a comparison operator encounters a
NULL, the operator evaluates to unknown. If we were using one of these operators and wanted
NULLs to be treated as equal, explicit
IS NULL checks would be required.
INTERSECT is a set operator, not a comparison operator. Out of the box, set operators treat
NULLs as equivalent, so
IS NULL checks are unnecessary.
Wow! That’s interesting.
Hum…let me make sure I understand the
EXISTS part of the
INTERSECT passes its output to
EXISTS, which is kind of playing the role of a true or false check, right?
INTERSECT returns a row,
EXISTS evaluates to true which makes the
WHERE evaluate to true. If
INTERSECT returns no rows,
EXISTS returns false which makes the
Okay, you win. Your query works. But what about performance? Your query looks a lot more complex than the simple ‘equality comparison plus
IS NULL checks’ we started with. Won’t that complexity slow the query down?
That’s a fair question. As far as execution performance goes, remember that the execution plan and the efficiency with which it is executed is what matters, not the apparent simplicity or complexity of the SQL statement.
Let’s look at the execution plan generated for our original
WHERE statement used in the context of a simple
SELECT * FROM TestTable WHERE Col1 = Col2 OR (Col1 IS NULL AND Col2 IS NULL);
Notice that the
WHERE clause’s three tests are processed using a single compare predicate on the table scan. The database server determined it could satisfy all three logical conditions in the
WHERE clause using a single predicate.
Wait a minute! It looks like the
IS NULL checks disappeared. The predicate appears to only be performing an equals comparison.
Indeed, that’s what it looks like. Interestingly, if you run the same query with the
IS NULL checks removed, the graphical execution plan is identical.
SELECT * FROM TestTable WHERE Col1 = Col2;
Both queries have the same graphical execution plan, even though they are not logically equivalent and produce different result sets.
Whoa! What’s going on? How can both queries use the same predicate but produce different result sets?
Ready for something fascinating, a kind of Microsoft SQL Server secret? The compare predicate you see on the table scan can run under multiple operation modes. However, the graphical execution plan doesn’t show the mode being used. Thankfully, this detail is revealed in the XML execution plan.
Notice that the only non-whitespace difference between the two predicates is the value of
CompareOp. The first query was executed using
CompareOp="IS" while the second query used
IS considers values equal if they are the same or if both are
EQ considers values equal if they are the same but not when both are
Wow! So even though the one
WHERE clause has three checks (
Col1 = Col2,
Col1 IS NULL and
Col2 IS NULL), SQL Server used a single predicate to evaluate all three. Then, if we run the same query without the
checks, the same predicate is used—but it executes under a different mode. Okay, that’s cool!
But what about the performance of
Let’s take a look:
SELECT * FROM TestTable WHERE EXISTS (SELECT Col1 INTERSECT SELECT Col2);
Wait…let me guess…if we look at the execution plan, we’ll find that the comparison predicate is operating in
SELECT * FROM TestTable WHERE Col1 = Col2 OR (Col1 IS NULL AND Col2 IS NULL); and
SELECT * FROM TestTable WHERE EXISTS (SELECT Col1 INTERSECT SELECT Col2); were executed using the same execution plan.
Keep in mind that SQL Server looks at the entire query when building execution plans. In our case, the two
WHERE clauses produced identical execution plans. However, in other situations, switching between comparison operator +
IS NULL checks and
EXISTS+INTERSECT could result in vastly different execution plans which might differ significantly in performance (for better or for worse). With query writing—at least when performance is important—always test and compare!
This has been fascinating! However,
EXISTS+INTERSECT’s syntax feels a bit clunky and verbose. Are you suggesting it’s better to use than comparison operator +
IS NULL checks?
I agree with your assessment. In fact, I think both approaches are awkward. Both require a lot of syntax to do something that seems like it should be simple.
EXISTS+INTERSECT avoids repeating column names multiple times (a violation of the DRY principle). However, the intention of its syntax may not be immediately clear. Even an experienced query author may need to take a moment to figure out what it’s doing.
Comparison operator +
IS NULL checks may be repetitious but has the advantage of being straightforward and so easier to understand.
Since neither approach is ideal, it’s nice to have options. You can choose which seems best in the context of a particular query.
Hey…wait…before you go: Can this technique be used for not equals comparisons, as well?
Absolutely! Just add
NOT in front of
EXISTS. For example, the
Col1 <> Col2 OR (Col1 IS NULL AND Col2 IS NOT NULL) OR (Col1 IS NOT NULL AND Col2 IS NULL) can be rewritten as
NOT EXISTS (SELECT Col1 INTERSECT SELECT Col2).
- ANSI/ISO SQL bypasses the need to use either of these approaches with its
IS [NOT] DISTINCT FROMclause. There’s a Connect item asking Microsoft to add support for this syntax to T-SQL. If you like this idea, please consider voting for the Connect item.
- For additional reading on this technique, check out Paul White’s post Undocumented Query Plans: Equality Comparisons. Paul’s article introduced me to the
CompareOpsproperty. Thank you, Paul!