SQL Refactoring – Comparing Result Sets with EXCEPT

,

I often refactor SQL, cleaning up convoluted statements using language features such as views, common table expressions and nested joins. It’s one thing to rework a gnarly query into something legible; it’s another thing to verify that the rewritten query returns the same data as the original.

The starting point for verification is comparing the result sets returned by the old and new queries. I’ve recently found Microsoft T-SQL’s EXCEPT operator quite handy for this purpose.

EXCEPT compares two queries’ distinct results, returning only those distinct rows contained in the first (left) query which are not contained in the second (right) query. EXCEPT is used similar to UNION, by placing it between two SELECT statements whose result sets have the same number of columns and whose columns are data-type compatible.

/* Only the distinct rows that are returned by the first SELECT 
   and not by the second SELECT will be output. */
SELECT FirstName, LastName FROM ....
EXCEPT
SELECT FirstName, LastName FROM ....

I’ll typically use EXCEPT to perform two tests: compare the original query to the rewritten and compare the rewritten query to the original. The first test checks if all distinct rows returned by the original query are also returned by the new query; the second test checks if all distinct rows returned by the new query are also returned by the original query. If my refactoring is correct, both queries will return identical result sets so both tests will return empty result sets.

SELECT ... /* Original */
EXCEPT
SELECT ... /* Refactored */

SELECT ... /* Refactored */
EXCEPT
SELECT ... /* Original */

Beware! EXCEPT compares distinct rows from each result set. The number of times an identical row appears in either result sets is ignored. The above tests will pass so long as the original query outputs at least one row matching each row in the refactored query’s result set and vice versa.

In order for a SQL refactoring to be judged accurate, these tests must pass. However, their success does not guarantee that the refactoring is accurate. These tests do nothing to verify that the two queries are logically equivalent. They only verify that the distinct rows returned by the two queries are identical when those queries are run against the source data existing at the moment the tests are run. If the underlying data changes, the two queries’ results might diverge, so these tests (i.e. EXCEPT) should not be used as the only means of verifying SQL refactoring.

Leave a Reply

Your email address will not be published. Required fields are marked *