First of all I'd like to assure
non-developers accidentally reading this post that it has nothing to do with lysergic
acid diethylamide, an A-class substance commonly known as LSD or "acid". It's
all about ACID properties of database transactions in general (boring stuff),
and the "I" property and scalability in particular. Now that we have the
legalities behind us let's get back to business.
According to Wikipedia the "I" in the
acronym has the following meaning:
Isolation
refers to the ability of the application to make operations in a transaction
appear isolated from all other operations. This means that no operation outside
the transaction can ever see the data in an intermediate state; a bank manager
can see the transferred funds on one account or the other, but never on both -
even if he ran his query while the transfer was still being processed. More
formally, isolation means the transaction history (or schedule) is
serializable. This ability is the constraint which is most frequently relaxed
for performance reasons.
The last sentence is of particular interest
as it implies that isolation comes at a cost (concurrency vs. consistency) and
this basic fact prompted me to do some experiments and in result write this post.
The
Problem
The main problem with maintaining isolation
is that resources which are supposed to be isolated have to be locked. As I am
sure you are aware of it locking in general, and in databases in particular, means
lower concurrency and throughput.
To see exactly how database performance is
affected by concurrent reads and updates I devised a simple "bank" scenario:
while one thread moves money between hypothetical accounts the other tries to
get the total amount of money held in the bank (which has to remain constant).Various
approaches to this problem are the subject of the rest of this post and as
you'll hopefully see they produce very different results. Although the topic
may seem trivial (and rightly so) it represents a wider class of problems where
the same table is read and updated concurrently.
All
that's needed to test the "bank" scenario is a table defined as follows:

With two stored procedures, one for moving
the money and the other which gets the total accumulated:

Once the table has been populated with
10000 rows (and equal amount of money in each account) I ran the test app to
see how many times I could concurrently execute both stored procedures within
20 seconds and what results would I get (see attachment for the test app and
SQL setup script).
First
run
The first run of the program with default
SQL Server settings (isolation level set to READ_COMMITTED) produced following
results:
|
Operation
|
Total
# of executions (average of 3 runs)
|
|
Reads (totals retrieved)
|
1656
|
|
Writes (transfers executed)
|
19300
|
|
Inconsistent results
|
1120
|
The interesting thing in this case is the
number of inconsistent results, i.e. the database reported incorrect total of
all balances held in our "bank". In spite of the fact that the movement of
money happens within a transaction, the "reader" is clearly able to see "half-committed"
data. In case you were wondering why this happens have a look at the following
table which illustrates the cause of the problem.
|
Time
|
Reader
|
Writer
|
|
1
|
Reads row ID=1, gets the balance = 100
and releases the lock.
|
|
|
2
|
|
Updates row ID=1 sets the
balance=balance-10 (90) and exclusively locks the row.
|
|
3
|
|
Updates row ID=2 sets the
balance=balance+10 (110) and exclusively locks the row.
|
|
4
|
|
Commits and releases both locks.
|
|
5
|
Reads the row ID=2, gets the balance of
110 producing total of 210!
|
|
Due to different locking strategies the
reader hits rows updated by the other statement (and quite possibly the other
way around). The reader never reads uncommitted data so in principle the
database obeys the READ_COMMITED isolation level, the end result however may be
far from desired and many people will find it surprising.
Approach
no 1
The first possible approach to produce
consistent results which came to my mind was to set the isolation level for the
GetTotal stored procedure to REPEATABLE READ. In this case the locks are held
on the data for the duration of the transaction in order to prevent updates.
The outcome however was an immediate and somewhat surprising deadlock which the
following table explains:
|
Time
|
Reader
|
Writer
|
|
1
|
Reads row ID=1, gets the balance = 100
and holds shared lock on the row.
|
|
|
2
|
|
Updates row id=2 with balance = balance -
10 and holds the lock
|
|
3
|
Tries to read the row ID = 2 but it's
already exclusively locked by the writer so the statement is waiting for the
lock to be released.
|
|
|
4
|
|
Tries to update row id=1 but the row is
locked by the other statement. Transaction deadlocks.
|
|
5
|
Reader is chosen as the deadlock victim
as there is less work to "rollback".
|
|
Approach
no 2
The reason for the deadlock above is the
"progressive" locking of the rows as the SELECT query continues along the
table. The simple solution is to use the
TABLOCK hint in the query to make sure that the entire table is locked in one
"go".

This approach works exactly as expected (no
inconsistent results) the downside however is an immediate drop in "writer" performance
as the following table illustrates:
|
Operation
|
Total
# of executions (average of 3 runs)
|
|
Reads (totals retrieved)
|
3823
|
|
Writes (transfers executed)
|
3072
|
|
Inconsistent results
|
0
|
Approach
no 3
The third approach was something I was very
keen to test: in SQL Server 2005 there is a magic option (actually a couple of
them) which enables row versioning.
Row versioning is not a new thing (in fact
Oracle RDBMS used it "by definition" for as far as I care to remember) and the
whole concept works (very roughly) as follows:
|
Time
|
Reader
|
Writer
|
|
1
|
|
Writer transaction starts
|
|
2
|
Reader transaction starts
|
|
|
3
|
|
Row ID=1 Updated with balance+10=110, new
version number applied to the row and the
old version of the row gets stored in the "version store"
|
|
4
|
Tries to retrieve row ID=1, the engine
realizes that the row version is different than it was at T=2 so retrieves
previous version of the row from the "version store" with balance = 100.
|
|
|
5
|
|
Row ID=2 Updated with balance-10=90, new
version "number" applied and the old
version of the row gets stored in the "version store"
|
|
6
|
Tires to retrieve row ID = 2, the engine
realizes again that the row has been updated after the statement started so
retrieves previous version from the "version store" with balance = 100.
|
|
|
7
|
Produces correct total of 200 exactly as
it was at the time the statement started.
|
|
The simplest approach to enable statement
level row versioning is to use READ_COMMITTED_SNAPSHOT database option.

Executing the above statement enables row
versioning for statements (not transactions) executing at READ_COMMITED
isolation level which incidentally is the default isolation level. This means
that no changes in the application are usually necessary to benefit from this
type of row versioning. After setting the option the test app produced
following results:
|
Operation
|
Total
# of executions (average of 3 runs)
|
|
Reads (totals retrieved)
|
3103
|
|
Writes (transfers executed)
|
24855
|
|
Inconsistent results
|
0
|
As you can see this approach not only
solves the problem of inconsistent result but also substantially increases
throughput of the app. This is because one of the interesting side effects of row
versioning is the fact that no locks are applied during execution of SELECT
statements. When READ_COMMITTED_SNAPSHOT is active only "writers" block "writers"
which is in stark contrast to default SQL Server behaviour where everybody
locks just about everybody else (only readers do not block readers).
Approach
no 4
READ_COMMITTED_SNAPSHOT works at the
statement level i.e. consistent view of the data is maintained relative to the
start of the statement. In case of our test application this is perfectly
sufficient because GetTotal () stored procedure executes a single select
statement.
If we
wanted to maintain consistency at transaction level, the
ALLOW_SNAPSHOT_ISOLATION option has to be set to ON. The downside is that in
such a case row versioning has to be explicitly requested by setting the transaction
isolation level to SNAPSHOT.

Once the reader procedure has been modified
the test app produced following results:
|
Operation
|
Results
(average)
|
|
Reads (totals retrieved)
|
3156
|
|
Writer (transfers executed)
|
22697
|
|
Inconsistent results
|
0
|
The differences in performance of both
approaches using row versioning should be considered negligible as the
performance varied quite widely between runs.
Wrap
Up
As these results hopefully illustrate row
versioning not only solves some annoying result "consistency" problems but also
substantially increases performance of our sample app (see the following
graph).

I would not recommend blindly applying row
versioning strategies in every case as they put additional stress o the TEMPDB
(all updates have to save old row versions in there) and have some other
surprising properties but it's clearly an option worth considering for
scenarios where concurrency (locking and/or deadlocks) becomes an issue. Following
two graphs illustrate number of lock requests with row versioning disabled and
enabled. As you can clearly see the differences are substantial and so will be
the scalability of a system with frequent and concurrent reads and updates.
Lock requests and waits with READ_COMMITTED isolation level.

Lock requests and waits with READ_COMMITTED_SNAPSHOT isolation level (read line
illustrates processor usage during the test as the number of lock requests and is
minimal)
