sw1nn

Adventures. Stories. Code.

Clojure STM - Ref Consistency

| Comments

In my previous post, Clojure STM - What? Why? How? I gave a high level run through of Software Transactional Memory in clojure via Multiversion concurrency control. The first example demonstrated how a transfer between two bank accounts could be achieved transactionally. A second example talked about a report over all the bank accounts in the system, whilst transactions like those in the first example were proceeding.

This second example prompted some discussion on reddit.com. The main question was ‘are the values of the refs in all the accounts consistent’

This post is an investigation of the points raised

Lets look again at the code of that example…

1
2
3
4
5
6
7
8
9
10
11
12
13
; 50k entries, each a ref to a value 0-100
(def *all-accounts* (take 50000 (repeatedly #(ref (rand-int 100)))))

(defn total
   "Sum all accounts"
   [accounts]
    (dosync 
        (reduce + (map deref accounts))))

;=> (total *all-accounts*) ; confident that this had a 
                           ; consistent view  over 
                           ; *all-accounts*
2478427

The key point to notice here that the current value of each account is retrieved via the deref in the call (map deref accounts). This makes the current value of the ref the in-transaction value for the remainder of the transaction. Since there is a call to deref for each account and traversing the list takes finite time (especially since map returns a lazy sequence) is there a potential for a change to a value in the accounts collection while we’re traversing the list?

All reads of Refs will see a consistent snapshot of the ‘Ref world’ as
of the starting point of the transaction (its ‘read point’). The
transaction will see any changes it has made. This is called the
in-transaction-value.

How can we validate that this is the case? Here are two functions

  • A function that takes a ref and increments it after a given delay.
  • A function that derefs a given ref, delays a given amount then derefs a different ref.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
(defn delay-then-inc-ref [id ref delay]
  (.start (Thread. #((println id "start")
                     (Thread/sleep delay)
                     (dosync
                      (alter ref inc))
                     (println id "end")))))

(defn deref-delay-deref [ref1 ref2 delay]
        (.start
           (Thread.
              #((println "READ start")
                (dosync
                  (println "transaction starting")
                  (let [a @ref2]
                     (Thread/sleep delay)
                     (println "S r1=" @ref1))) ; should be consistent with @ref2
                (println "READ end")))))

Given these two methods we can engineer a situation where we (deref ref2) in one transaction, update ref1 a few times in other transactions, then (deref ref1) in the first transaction. If the assertion about a consistent snapshot is correct then the updates to ref1 shouldn’t be visible in our first transaction.

So what happens when we run this?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
(def r1 (ref 0))
(def r2 (ref 0))

(deref-delay-deref r1 r2 2000)
(delay-then-inc-ref "1" r1 500)
(delay-then-inc-ref "2" r1 1000)
(delay-then-inc-ref "3" r1 1500)

READ start
transaction starting
1 start
2 start
3 start
1 end
2 end
3 end
transaction starting
READ r1= 3
READ end
nil

The r1=3 result is not what we’re expecting. If the read of ref2 creates a ‘read point’ that all other refs are aligned with, the updates to ref1 should not be visible here. Also we see that the READ transaction is unexpectedly re-starting.

Remember that the ‘READ’ transaction here is read-only.

It seems that clojure has something of an inconsistency here. When the ‘READ’ transaction is complete, the runtime looks for an entry in the history of all the refs from before the transaction started. If it finds an entry then the transaction completes, otherwise the transaction is aborted and retried.

UPDATE

In this case this means that it (incorrectly in my opinion)picks up the latest version of ref1 and completes, returning r1=3.

So it seems that the concerns raised on reddit were valid, because clojure doesn’t strictly follow the behaviour described in the documentation.

Luckily there is a way to get the desired behaviour - add some history to the refs…

In this case this means that the transaction picks up the latest version of ref1 and completes, returning r1=3. This is a ‘correct’/consistent result, but the system has done unnecessary work.

So it seems that the concerns raised on reddit were valid, because clojure doesn’t strictly follow the behaviour described in the documentation.

There is a way to get the desired behaviour, but it feels like a hack. By adding history to the ref, the read-only transaction completes first time. To me, this change in behaviour, which isn’t incorrect exactly, is wasteful of resources…

END UPDATE

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
(def r1 (ref 0 :min-history 5)) ; force history to be preserved
(def r2 (ref 0 :min-history 5)) ; force history to be preserved

; artificially create a history entry
(dosync
 (ref-set r1 0)
 (ref-set r2 0))

(deref-delay-deref r1 r2 2000)
(delay-then-inc-ref "1" r1 500)
(delay-then-inc-ref "2" r1 1000)
(delay-then-inc-ref "3" r1 1500)

READ start
transaction starting
1 start
2 start
3 start
1 end
2 end
3 end
READ r1= 0
READ end
nil

Although this works, and would apply to our original model for accounts, I don’t really like it.

A better use of ref

There’s a better way to model this; Rather than make each account a ref in it’s own right, store the collection of accounts under a single ref and update the whole collection for each transfer.

Here’s how a simple implementation of that might look:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
; single ref to hold all accounts
(def *all-accounts* (ref []))

; 50k entries, each a value 0-100
(dosync 
  (ref-set *all-accounts*
           (take 50000 (repeatedly #(rand-int 100)))))

(defn total
   "Sum all accounts"
   [accounts]
    (dosync 
        (reduce + @accounts))))

(total *all-accounts*) ; confident that this had a 
                           ; consistent view  over 
                           ; *all-accounts*
;=> 2478427

The transfer function would be updated accordingly…

1
2
3
4
5
6
7
8
9
10
(defn transfer [ammount from to]
  "ammount => amount to transfer
   from    => index into accounts collection for from account
   to      => index into accounts collection for to account"
  (dosync
   (alter acc
          (fn [accounts]
            (assoc accounts
               to (- (accounts to) ammount)
               from (+ (accounts from) ammount))))))

In this post we’ve investigated consistency between refs, found a potential problem with the clojure runtime and eventually improved out model to avoid the problem.

UPDATED: Text above updated at 15:45GMT 17 Apr 2012 following comments from Antti-Ville Tuunainen

In the next post we’ll discuss a subtle requirement when dealing with refs. Namely - How do you ensure that a ref hasn’t changed during a transaction?

Comments