T-SQL Ordering Myth #1: Clustered Indexes Determine Default Sort Order

, ,

If I query a table that has a clustered index without specifying an ORDER BY clause, the resultset will be sorted according to the clustered index” may sound reasonable. After all, since “a clustered index determines the physical order of data in a table” (per TechNet), isn’t it logical to assume that a query with no ORDER BY clause returns data sorted this way?

This misbelief relies on two false premises: that Microsoft SQL Server defaults to using the clustered index whenever a query lacks an ORDER BY clause (it doesn’t) and that data is always retrieved from an index in sorted order (also incorrect). Let’s examine each of these.

SQL Server’s query optimizer considers both clustered and non-clustered indexes when building query execution plans, regardless of whether the query contains an ORDER BY clause. A data request may be answered using the clustered index, a non-clustered index or a one or more non-clustered indexes possibly combined with the clustered index. Unless we can somehow guarantee that SQL Server will always use the clustered index, we can’t assume any correlation between that index—including how it is sorted—and the resultset.

Even when we know that a certain index will be the only index used (as would be the case for a query involving a single table having a clustered index and no non-clustered indexes), we still can’t presume how that index will be processed. Parallelism could occur, with execution split between multiple threads and results from each combined in unpredictable order. Within the context of a single index scan operation run on a single thread, SQL Server could choose to process leaf pages in logical order, jumping around in the database file as necessary (for example, reading records A, B & C from leaf page one then jumping to page three to read D, E & F before backing up to page two for records G, H & I), or in allocation order, accessing the pages in the order they are laid out on disk (reading leaf page one [records A, B & C], then page two [G, H & I], followed by page three [D, E & F]). Since SQL Server can choose from multiple methods when processing an index and those methods can output the same data with different orderings, we can’t assume a correlation between an index’s sort order and the resultset even when we know that that particular index will be used.

SQL Server makes no promises that it will default to using the ordering of the clustered index (or any index, for that matter). If a sorted resultset is desired, an explicit ORDER BY is required.

From the performance perspective, this is a good thing. Outputting resultsets with a default sort order would either mandate that the database engine include a sort operation in every query execution (incurring a performance cost) or would restrict it from using any operation that might read or process rows in any order other than the relevant index’s logical sort order (potentially resulting in less-than-optimal performance as this would disallow various execution optimizations). By assuming that sorting is not desired unless requested, the database engine avoids unnecessarily expending extra effort.

See also part 2 of T-SQL Ordering Myths: ORDER BY in Views.

Leave a Reply

Your email address will not be published. Required fields are marked *