Euan Garden recently said goodbye to the SQL Server team at MS. One of the themes of Euan's blog has been to correct a few myths about SQL Server, but one topic he didn't address was clustered indexes.
There seem to be at least two common misconceptions about SQL Server clustered indexes.
MYTH 1: In a clustered index, data rows are physically sorted in the cluster key order.
The idea of the data being "physically" sorted at all is a very nebulous concept. Data in a single table or index could exist in several data files, each of which could be striped over multiple disk platters and drive spindles. But even if we have a single data file that is physically contiguous the data still won't necessarily be sorted in cluster key order at file system level. Here's the evidence:
CREATE TABLE tbl
(foo INT NOT NULL PRIMARY KEY CLUSTERED,
bar VARCHAR(50) NOT NULL);
GO
/* Rows not inserted in cluster key order: */
INSERT INTO tbl (foo, bar) VALUES (1, 'aaaaaaaaaa');
INSERT INTO tbl (foo, bar) VALUES (3, 'bbbbbbbbbb');
INSERT INTO tbl (foo, bar) VALUES (2, 'cccccccccc');
GO
The following uses the undocumented DBBC PAGE command to show what has actually been stored. If you want to reproduce this, Kalen Delaney's book "Inside SQL Server 2000", p255 explains how to derive the correct file and page numbers from the sysindexes table.
DBCC TRACEON(3604)
GO
DBCC PAGE('junk', 1, 127,2) -- These are the file and page values on my system
Result (abbreviated):
Memory Dump @0x5D07C000
5D07C060: 30000800 01000000 0200fc01 00190061 †0..............a
5D07C070: 61616161 61616161 61300008 00030000 †aaaaaaaaa0......
5D07C080: 000200fc 01001900 62626262 62626262 †........bbbbbbbb
5D07C090: 62623000 08000200 00000200 fc010019 †bb0.............
5D07C0A0: 00636363 63636363 63636300 00000000 †.cccccccccc.....
5D07C0B0: 00000000 00000000 00000000 00000000 †................
[...]
OFFSET TABLE:
Row - Offset
2 (0x2) - 121 (0x79)
1 (0x1) - 146 (0x92)
0 (0x0) - 96 (0x60)
In this case the data is stored in insertion order on the page. The page offset table defines the correct order according to the cluster key. (Thanks due to Hugo Kornelis for inspiring this example).
MYTH 2: A query without ORDER BY will retrieve rows sorted by clustered index order.
Sometimes that's true but there are absolutely no guarantees. A query against a clustered table won't always scan the clustered index. Even when it does, the scan isn't necessarily ordered. One reason is that the query may be executed in parallel over multiple processors which then merge their results. Another is that the optimizer may choose to perform an unordered scan using the Index Allocation Map rather than scanning the leaf pages in order. Then there is the feature of Enterprise Edition called Advanced Scan (also called merry-go-round scan) which means a scan may not start at the first row in a clustered index.
In short, the only reliable way to order a query is to use ORDER BY.
The good news is that the first two volumes of Inside SQL Server 2005 are out now. The Querying volume explains in considerable detail how clustered indexes are used in queries. So if any of the above surprised you don't just take my word for it!