Monday, June 04, 2012

The Flavours of MVar

Update: These functions have been cleaned up, improved with respect to error conditions and async exceptions, and put in the extra package.

The MVar is a flexible and powerful locking primitive, used extensively in Haskell. An MVar is like a box which is empty (has zero elements inside) or full (has one element inside). You block when trying to take from an empty MVar or put to a full MVar. On top of MVars, lots of interesting concurrent programs can be written. However, with such a flexible mechanism, there is scope for confusion. Every MVar can block on either a take or a put, but for any individual MVar it is likely you expect it to block on only one of those operations. In my programs I usually restrict my MVars to one of three flavours, each of which is described below.

Lock


The Lock guarantees single-threaded access, typically to some system resource.

type Lock = MVar ()

newLock :: IO Lock
newLock = newMVar ()

withLock :: Lock -> IO a -> IO a
withLock x = withMVar x . const

And as an example:

lock <- newLock
let output = withLock . putStrLn
forkIO $ do ...; output "hello"
forkIO $ do ...; output "world"

Here we are creating a lock to ensure that when writing output our messages do not get interleaved. This use of MVar never blocks on a put. It is permissible, but rare, that a withLock contains a withLock inside it - but if so, watch out for deadlocks.

Var


The Var operates on a mutable variable in a thread-safe way.

type Var a = MVar a

newVar :: a -> IO (Var a)
newVar = newMVar

modifyVar :: Var a -> (a -> IO (a, b)) -> IO b
modifyVar = modifyMVar

modifyVar_ :: Var a -> (a -> IO a) -> IO ()
modifyVar_ = modifyMVar_

readVar :: Var a -> IO a
readVar = readMVar

And as an example:

hits <- newVar 0
forkIO $ do ...; modifyVar_ hits (+1); ...
i <- readVar hits
print ("HITS",i)

Here we have a variable which we modify atomically, so modifications are not interleaved. This use of MVar never blocks on a put. No modifyVar operation should ever block, and they should always complete in a reasonable timeframe. A Var should not be used to protect some external resource, only the variable contained within. Information from a readVar should not be subsequently inserted back into the Var.

Barrier


A barrier starts with no value, is written to once, and read one or more times.

type Barrier a = MVar a

newBarrier :: IO (Barrier a)
newBarrier = newEmptyMVar

signalBarrier :: Barrier a -> a -> IO ()
signalBarrier = putMVar

waitBarrier :: Barrier a -> IO a
waitBarrier = readMVar

And as an example:

bar <- newBarrier
forkIO $ do ...; val <- ...; signalBarrier bar val
print =<< waitBarrier bar

Here we create a barrier which will contain some computed value. A thread is forked to fill the barrier, while the main thread waits for it to complete. A barrier has similarities to a future or promise from other languages, has been known as an IVar in other Haskell work, and in some ways is like a manually managed thunk. It is an error to signal a barrier more than once and a deadlock to never signal it. Since the barrier goes from empty to full, it never blocks on a put, unless you incorrectly call signal more than once.

Combining MVar Flavours - Once


The previous three MVar wrappers are the flavours of MVar which I use regularly. These can be combined into higher-level abstractions specific to certain situations. I now give two examples, intended to show how to combine these primitives.

The once function takes an action, and returns a new action. If the action is never called the argument action will never be executed, but if it is called more than once, it will only be executed once. We can write this function as:

once :: IO a -> IO (IO a)
once act = do
    var :: Var (Maybe (Barrier a)) <- newVar Nothing
    return $ join $ modifyVar var $ \v -> case v of
        Nothing -> do b <- newBarrier; return (Just b, do x <- act; signalBarrier b x; return x)
        Just b -> return (Just b, waitBarrier b)

Here we create a variable to store the result, whose state is either Nothing (we have not yet started computing) or Just a barrier (we have started computing, use this barrier to get the result out). I have found 'join $ modifyVar' is a common idiom, used to defer a blocking action (often waitBarrier) until after a modifyVar has completed, ensuring we preserve our invariant of not blocking inside a modifyVar. When running the resulting action, if the variable is a Nothing we create a new barrier, store it, and then start an action (after leaving the modifyVar) to compute the result, signal the barrier and return. If we already have a barrier, we just wait for this barrier.

[Note that you can implement once in terms of MVar directly, using only one MVar, but that violates the simple rules of the restricted MVars - rightly so, you have to use the MVar empty state to mean both atomic access to shared state, and to mean computation in progress.]

Combing MVar Flavours - Queue


As another practical example of using these restricted MVars, let us consider a special kind of queue. Message arrive individually, but are collected in bulk. When someone tries to retrieve message, if there are any messages waiting they are sent immediately. If there are no messages, the read blocks until either a message arrives or until a new reader arrives, in which case the old reader is sent away with nothing. This can be implemented as:

type Queue a = Var (Either [a] (Barrier [a]))

arrive :: Queue a -> a -> IO ()
arrive q x = modifyVar_ q $ \q -> case q of
    Left xs -> return $ Left $ xs ++ [x]
    Right b -> do signalBarrier b [x]; return $ Left []

collect :: Queue a -> IO [a]
collect q = join $ modifyVar q $ \q -> case q of
    Left xs@(_:_) -> return (Left [], return xs)
    _ -> do
        case q of Right b -> signalBarrier b []; _ -> return ()
        b <- newBarrier
        return (Right b, waitBarrier b)

The type of Queue tells us most of what we need to know about the invariants - Queue has a mutable state, which is either Left (zero or more messages waiting) or a Right (someone waiting to collect messages). If we had used MVar instead of both Var and Barrier, the invariant and design would be far less clear. With these invariants clearly stated, the code just follows directly.

Creating New Flavours


I find the three MVar wrappers (Lock, Var, Barrier) much easier to understand since the rules are simpler, making maintenance easier. I have also found that most projects benefit from higher-level abstractions in some places. As an example, I defined Queue in one recent project, and Shake defines a Resource type, on top of which the resources feature is implemented. Concurrency is hard, but robust abstractions split the complexity, and thus simplify the programs.

7 comments:

Unknown said...

Salve Neil! I included a link to this in my resource post on learning concurrency and parallelism on http://monoid.se/haskell/a-resource-post-on-concurrent-and-parallel-haskell/

If you find anything erroneous or the like, drop me a note. My email is available there, under the heading "Contact".

I *would* have written this as a mail to you instead, but I was unable to find an address of yours at this particular blog.

Cheers /Fredrik

Neil Mitchell said...

Hi Fredrik,

That resource looks useful, and you are quite right about Real World Haskell being wrong on this point. My contact email is at http://community.haskell.org/~ndm/contact/, but it should be on this blog - I need to update some of this blog anyway and have raised a bug to include my email (http://code.google.com/p/ndmitchell/issues/detail?id=554)

Thanks, Neil

Dave Turner said...

Hi Neil,

Not sure I understand your statement "It is an error to signal a barrier more than once and a deadlock to never signal it." They both look like deadlocks to me in the given implementation. (Of course, a deadlock is a kind of error so maybe that's what you meant?)

I also use the Barrier pattern, but typically use tryPutMVar. Sometimes multiple signals are acceptable, in which case I just discard the return value; other times they're not acceptable in which case throwing an exception if it returns False seems better than blocking forever.

Neil Mitchell said...

Dave: I mean "the user is in error if they signal it twice" - and hence the function can respond with undefined behaviour (deadlock, error message, raiding your fridge).

I totally agree that a more robust implementation would be useful for the real world. That's why I wrote the extra library, which provides signalBarrier: https://hackage.haskell.org/package/extra-1.2/docs/src/Control-Concurrent-Extra.html#signalBarrier . In particular, it raises an exception if you try and signal it twice, requires two MVar's to guarantee consistency and has a use of mask. I was a bit surprised how far the distance was between the code above and a totally robust implementation.

Dave Turner said...

Hi Neil,

You're right, that is surprisingly intricate. What's the problem with just using tryPutMVar and doing the appropriate thing with the return value (with a mask_ around it)?

When it starts to get that fiddly I tend to give up and use STM.

Cheers,

Neil Mitchell said...

Dave: Before base 4.7 readMVar is not atomic, it's a takeMVar then a putMVar, so waitBarrier and signalBarrier can race. With base 4.7 you could implement it as a single MVar, I think. I've added a note. There's a perverse fun in trying to write all these operations on top of MVar, but STM is very nice.

Dave Turner said...

Oh yes, you're right. Yuck!

Cheers,