Wednesday, October 01, 2014

Why Traversable/Foldable should not be in the Prelude

Summary: For GHC 7.10, Traversable and Foldable are going to be in the Prelude. I missed the original discussion, but I suspect it's a bad idea.

Types are how Haskell programmers communicate their intentions to each other. Currently, the Haskell Prelude contains:

mapM :: Monad m => (a -> m b) -> [a] -> m [b]

As of GHC 7.10, as part of something known as the Burning Bridges Proposal (ticket, discussion, I can't actually find a full proposal...), that will become:

mapM :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b)

Surely that's a good thing? Aren't more general types always better? Isn't the Prelude an archaic beast from the time before? I'd argue functions which are highly polymorphic are hard to use, and hard to think about, especially for beginners. I'd also argue the Prelude is remarkably well designed, not perfect, but quite an impressive feat.

What makes a type signature complex?

I've been thinking recently about what makes type signatures complex, both to practitioners, and to relative beginners. My rough metric is:

  • Fully concrete types are usually simple, as long as they aren't too long. The longer a type gets, the more complex it gets.
  • Types with functions in them aren't too bad (order-1 types), but as you go up to order-2 types things start to get more complex.
  • Fully polymorphic functions can be simpler than concrete functions, since they declare what you don't need to worry about.
  • Functions with type classes are more complex, since you need to read the type signature while looking at the context, and need to know each class being used.
  • Simple type classes (Eq, Show) aren't too bad, but custom type classes impose more of a burden.
  • As you add more type classes, the complexity grows faster than linearly. Three type classes are not three times as complex as one, but quite a bit harder.
  • Higher kinded type classes are significantly more complex than kind * type classes, e.g. Monad, Functor. The reason is that instead of having a hole you fill in, you now have a hole which itself has a hole.
  • The higher-kinded type classes Monad and Functor aren't as bad as the others, since Functor is really the "simplest" higher-kinded type class, and Monad is required knowledge for IO.
  • As you have more higher kinded type classes, the complexity burden grows even worse than for kind * type classes. Two is significantly more complex than one.

By that metric, the old mapM isn't too bad, but the new mapM is quite complex. It has two higher-kinded type classes, and one of them is not one of the common ones. I appreciate that making Foldable and Traversable key to Haskell will probably lead to them being more used, but now all beginners are going to have to wade through the Monad tutorial, their Foldable tutorial and their Traversable tutorial before they start programming (or just give up).

Why generality hurts

There are two main reasons why generality hurts:

Reading type signatures becomes difficult/impossible. We already have that problem with the Control.Arrow module, which (as far as most people use it), is just a pile of tuple combinators. But unlike other tuple combinators, these are ones whose type signature can't be understood. When I want to use &&& or *** I just pick randomly, see if it type checks, then try again. When other people I know what to use these functions they just use an explicit lambda. No one thinks of referring to the documentation, since the documentation presents a unification problem (which most of the people I know could solve), not an intuition.

Reading code becomes difficult. Haskell is brilliant for letting you write a composable pipeline of code that takes some input, does some processing, and produces some output. But that only works if you have enough concrete pieces in each function to read each piece in isolation. As an example:

test = foo . mapM baz . bar

Using the current mapM definition I can, in a fraction of a second, know the approximate shape of what foo consumes, and what bar produces. With the new mapM I don't, and have to keep more context in my head to reason about the code.

Who it hurts

Generality of this nature tends to hurt two types of people:

Beginners are hurt because they need to know more concepts just to get going. As a beginner I read through Data.List regularly to build up weapons in my arsenal to attack larger problems. The new Data.List will be generalised, and reading it won't give the insights I enjoyed. Maybe the beginner can instantiate all Foldable things to [], but that adds a mental burden to exactly those people who can bear it least.

Practitioners, those who are paid to code for a living, will have greater problems with maintenance. This isn't an unsubstantiated guess... I have taken over a project which made extensive use of the generalised traverse and sequence functions. Yes, the code was concise, but it was read-only, and even then, required me to "trust" that the compiler and libraries snapped together properly.

Who it benefits

The benefit probably comes from those who are already using the Applicative/Traversable classes regularly. For these people, they can probably avoid an import Prelude(). I am not against ever changing the Prelude, but I do think that for changes of this magnitude the ideas should probably be prototyped as a separate package, widely accepted, and only then should significant surgery be attempted on the Prelude. The classy-prelude work has gone in that direction, and I wish them luck, but the significant changes they've already iterated through suggest the design space is quite large.

Concluding remarks

I realise that I got to this discussion late, perhaps too late to expect my viewpoint to count. But I'd like to leave by reproducing Henning Thielemann's email on the subject:

David Luposchainsky wrote:

+1. I think the Prelude should be a general module of the most commonly
needed functions, which (generalized) folds and traversals are certainly
part of; right now it feels more like a beginner module at times.

It is certainly a kind of beginner module, but that's good. Experts know
how to import. Putting the most general functions into Prelude does not
work because:

1. There are often multiple sensible generalizations of a Prelude
function.

2. You have to add more type annotations since types cannot be infered
from the functions.

There is simply no need to change Prelude and all packages that rely on
specific types. Just don't be lazy and import the stuff you need!

I should change my vote to:

-10

Saturday, September 27, 2014

GHCID - a new GHCi based IDE (ish)

Summary: I've just released ghcid, which interactively shows the errors in your project on each save.

I'm please to announce ghcid, which is either "GHCi as a daemon" or "GHC + a bit of an IDE". I've been using it for my development for the last few days, and already find it quite valuable. Unlike other Haskell development tools, ghcid is intended to be incredibly simple. In particular, it doesn't integrate with any editors, doesn't depend on GHC the library and doesn't start web servers. It's under 200 lines of fairly dull Haskell, which talks over pipes to your existing ghci.

Using it

Run cabal update && cabal install ghcid to install it as normal. Then run ghcid --height=10 "--command=ghci Main.hs". The height is the number of lines you are going to resize your console window to (defaults to 8), and the command is how you start this project in ghci. Personally, I always create a .ghci file at the root of all my projects, which usually reads something like:

:set -fwarn-unused-binds -fwarn-unused-imports
:set -isrc
:load Main

With that you can pass --command=ghci (or nothing, since that is the default).

After that, resize your console and make it so you can see it while working in your editor. On Windows the ghcid console will automatically sit on top of all other windows. On Linux, you probably want to use your window manager to make it topmost or use a tiling window manager.

What you get

On every save you'll see a list of the errors and warnings in your project. It uses a single ghci under the hood, so even relatively large projects should update their status pretty quickly. As an example:

Main.hs:23:10:
    Not in scope: `verbosit'
    Perhaps you meant `verbosity' (imported from System.Console.CmdArgs)
Util.hs:18:1: Warning: Defined but not used: `foo'

Or, if everything is good, you see:

All good

This project is only a few days old, so please report any bugs you find.

What you want

I regularly use an IDE to develop in a Haskell-like language. I find that with the IDE I'm about 25% more productive than without it. While an IDE can provide lots of neat features (go to definition, search, type tooltips) I think most of the productivity gains come from:

  1. Syntax coloring.
  2. A list of errors and warnings...
  3. ...which is updated as you type...
  4. ...and are highlighted in the text (red squiggles).

Every text editor already provides syntax coloring. With ghcid you get the list of errors and warnings. To get the final two features you need to integrate with an editor. I'm hoping that ghcid can do some of the heavy lifting by taking a directory of files to treat as overrides, and producing a list of warnings/errors to a file. The rest is editor specific, and I hope to attempt integration with Sublime Text at some point (although would love some help).

Wednesday, September 24, 2014

Generating Open 3D Viewer Models

Summary: It's not obvious how to generate suitable Open 3D Viewer models, but with the right tools it isn't hard.

The Open 3D Viewer project is very cool, as an example here is a demo of a spinning cow rendered in the browser. For my wife's work I wanted to generate a 3D model of a fossil bedding plane. Effectively, given some x/y/z coordinates, produce a pretty rendering. It took a fair bit of guess work, so I wrote down the steps.

Step 1: Generate an OBJ file

Generate an OBJ file. You probably want an MTL file too, but it seems the 3D viewer only uses the Kd field. There are a few ways to get an OBJ file:

  • There are many samples on the web, including a snail in the Open 3D Viewer repo.
  • You can create OBJ files in a tool such as Blender, but the Blender interface confused me a lot (I am definitely not their intended audience).
  • You can generate an OBJ file using a Haskell script. I picked this method, and I'll write a blog about the script later, once I have some pretty pictures to show.

Step 2: Get the tools

There are some tools in the WebGL Loader project. Alas, that project says "for now I recommend r50 as the last stable revision". So now there are two tools to try, the latest and r50. I tried both. I had some limited success with r50 (it didn't seem to render properly, but it did run) while the latest revision segfaulted. Fortunately I found the tools in a Google Groups post, and have mirrored them in my repo (with trivial tweaks to support Python 2.7).

Step 3: Run objcompress

You need to run:

objcompress mymodel.obj mymodel.utf8 > mymodel.js

This will generate lots of mymodel*.utf8 files and mymodel.js.

Step 3: Run part_grouping.py

You need to run:

py part_grouping.py

(The file in the email is part&grouping.py, but I renamed my copy.) This script will interactively ask a really long list of questions. I generate the correct inputs into a file and pipe it in:

py part_grouping.py < response.txt

This generates the files groupings.txt and part_info.txt.

Step 4: Run make_viewer_metadata.py

Run:

py make_viewer_metadata.py

This generates the file entity_metadata.json.

Step 5: Get the viewer source

You can get the viewer source from the Open 3D Viewer repo. I have mirrored it in my repo, but I may tweak the viewer over time to match my wife's needs - you should get the original.

Step 6: Copy your files

Copy all the files from steps 1 to 4 to a directory inside the viewer named models/mymodel.

Step 7: Update the model list

Open up scripts/models.js and edit it to point at your model. For example:

o3v.MODELS = [{
  name:'mymodel.obj',
  scriptName:'mymodel.js',
  modelPath:'models/mymodel/',
  metadataFile:'entity_metadata.json',
  numLayers:12
}];

Step 8: View the result

You can view the result by opening index.html. In Chrome you may need to pass the flag --allow-file-access-from-files.

Tuesday, September 16, 2014

Towards Shake 1.0

Summary: I've just released a new version of Shake, with a --demo feature and an underlying continuation monad. I want to release v1.0 in the near future.

I've just released a new version of the Shake build system, version 0.13.3. While the version number is only 0.0.1 higher, the changelog lists a large number of improvements. In particular, two changes are:

  • The Action monad is now based on continuations, which allows Shake to suspend threads without requiring a GHC RTS thread. The result is significantly less memory used on thread stacks. I still find it quite amazing that Haskell has such powerful and robust abstraction mechanisms that a huge change doesn't even break the API.
  • The shake binary now features a --demo mode, invoked by running shake --demo. This mode generates a Shake project, compiles it, and shows off some of the features of Shake. You can the output of --demo here.

Version 1.0

With the two features above, I'm now looking towards Shake version 1.0. I'm not looking to make any significant backwards-incompatible change, or necessarily any code/API changes at all. However, if you have anything you think should be addressed before reaching such a milestone, please comment on the issue tracker or email the mailing list.

Shake website

The one thing I still want to finish before releasing version 1.0 is to have a proper website for Shake. I've registered shakebuild.com which will host the content, and have set up GitHub pages to serve it up. I have some rough content in the docs directory and a prototype generator in the website directory - as an example it currently generates something a bit like this for the user manual, but with a table of contents when run through the latest generator. I'd appreciate any help with the content, the generator, or the styling - just email the mailing list.

Sunday, September 07, 2014

Shake in the wild

Summary: I spotted a few things using Shake, which I had nothing to do with.

In the past few days I have come across several things using the Shake build system. I wasn't involved in any of them, and haven't (yet) tried any of them out, but they certainly look cool.

ToolCabal

Tibor Bremer from Utrecht University gave a talk at the Haskell Implementors Workshop 2014 about his ToolCabal project. This project replaces the "build a package" part of Cabal with something more flexible, supporting multiple simultaneous targets and more flexible preprocessors - all built on top of Shake. It doesn't attempt to tackle dependency resolution yet. There is a video of the talk:


Samplecount

The folks at Samplecount have written several Shake based things. None are yet on Hackage, so I suspect they are somewhat prototypes, but they look like they're already used quite seriously.

  • shake-cabal-build to make it easier to build your Shake build systems with Cabal. Shake build systems need to be compiled with GHC, for which I usually use ghc --make, but this project explains how to get things building with Cabal - important if your build system pulls in other libraries.
  • shake-language-c is a project to simplify building C/C++ projects with Shake. From the docs:

shake-language-c is a cross-platform build system based on the Shake Haskell library. The focus is on cross-compilation of C, C++ and Objective C source code to various target platforms. Currently supported target platforms are iOS, Android NDK, Google Portable Native Client, MacOS X, Linux and Windows (MinGW). Supported host platforms are MacOS X, Linux and Windows.

  • methcla is their mobile sound engine, which is built using this Shake script, which (unsurprisingly) uses shake-language-c and shake-cabal-build.

Wednesday, August 20, 2014

Continuations and Exceptions

Summary: In moving Shake to continuations, exceptions were the biggest headache. I figured out how to somewhat integrate continuations and exception handling.

The git repo for Shake now suspends inactive computations by capturing their continuation instead of blocking their thread, based on the continuations I described in a previous blog post. The most difficult part was managing exceptions. I needed to define a monad where I could capture continuations and work with exceptions, requiring the definitions:

data M a = ... deriving (Functor, Applicative, Monad, MonadIO)
throwM :: SomeException -> M a
catchM :: M a -> (SomeException -> M a) -> M a
captureM :: ((a -> IO ()) -> IO ()) -> M a

I'm using M as the name of the monad. I want equivalents of throwIO and catch for M, along with a function to capture continuations.

The first observation is that since catchM must catch any exceptions, including those thrown by users calling error, then throwM can be defined as:

throwM = liftIO . throwIO

Using throwIO gives better guarantees about when the exception is raised, compared to just throw.

The second observation is that sometimes I want to raise an exception on the continuation, rather than passing back a value. I can build that on top of captureM with:

captureM' :: ((Either SomeException a -> IO ()) -> IO ()) -> M a
captureM' k = either throwM return =<< captureM k

The third observation (which I observed after a few weeks trying not to follow it) is that the continuation may never be called, and that means you cannot implement a robust finallyM function. In particular, if the person who was intending to run the continuation themselves raises an exception, the continuation is likely to be lost. I originally tried to come up with schemes for defining the function passed the continuation to guarantee the continuation was called, but it became messy very quickly.

The properties we expect of the implementation, to a rough approximation, include:

  • catchM (x >> throwM e) (\_ -> y) >> z === x >> y >> z -- if you throw an exception inside a catchM, you must run the handler.
  • captureM (\k -> x) >>= y === x -- if you execute something not using the continuation inside captureM it must behave like it does outside captureM. In particular, if the captureM is inside a catchM, that catchM must not catch the exception.
  • captureM (\k -> k x) >>= y === x >>= y -- if you capture the continuation then continue that must be equivalent to not capturing the continuation.
  • captureM (\k -> k x >> k x) >>= y === (x >>= y) >> (x >>= y) -- if you run the continuation twice it must do the same IO actions each time. In particular, if the first gets its exceptions caught, the second must do also.

These properties are incomplete (there are other things you expect), and fuzzy (for example, the second property isn't type correct) - but hopefully they give an intuition.

The implementation was non-trivial and (sadly) non-elegant. I suspect a better implementation is known in the literature, and I'd welcome a pointer. My implementation defines M as:

type M a = ContT () (ReaderT (IORef (SomeException -> IO ())) IO) a

Here we have a continuation monad wrapping a reader monad. The reader contains an IORef which stores the exception handler. The basic idea is that whenever we start running anything in M we call the Haskell catch function, and the exception handler forwards to the IORef. We can define catchM as:

catchM :: M a -> (SomeException -> M a) -> M a
catchM m hdl = ContT $ \k -> ReaderT $ \s -> do
    old <- liftIO $ readIORef s
    writeIORef s $ \e -> do
        s <- newIORef old
        hdl e `runContT` k `runReaderT` s `catch`
            \e -> ($ e) =<< readIORef s
    flip runReaderT s $ m `runContT` \v -> do
        s <- ask
        liftIO $ writeIORef s old
        k v
  • We store the previous exception handler as old, and insert a new one. After the code has finished (we have left the catchM block) we restore the old exception handler. In effect, we have a stack of exception handlers.
  • When running the handler we pop off the current exception handler by restoring old, then since we have already used up our catch, we add a new catch to catch exceptions in the handler.

We then define captureM as:

captureM :: ((a -> IO ()) -> IO ()) -> M a
captureM f = ContT $ \k -> ReaderT $ \s -> do
    old <- readIORef s
    writeIORef s throwIO
    f $ \x -> do
        s <- newIORef old
        flip runReaderT s (k x) `E.catch`
            \e -> ($ e) =<< readIORef s
        writeIORef s throwIO
  • We make sure to switch the IORef back to throwIO before we start running the users code, and after we have finished running our code and switch back to user code. As a result, if the function that captures the continuation throws an exception, it will be raised as normal.
  • When running the continuation we create a new IORef for the handler, since the continuation might be called twice in parallel, and the separate IORef ensures they don't conflict with each other.

Finally, we need a way to run the computation. I've called that runM:

runM :: M a -> (Either SomeException a -> IO ()) -> IO ()
runM m k = do
    let mm = do
            captureM $ \k -> k ()
            catchM (Right <$> m) (return . Left)
    s <- newIORef throwIO
    mm `runContT` (liftIO . k) `runReaderT` s

The signature of runM ends up being the only signature the makes sense given the underlying mechanisms. We define mm by using the facilities of captureM to insert a catch and catchM to ensure we never end up in an exception state from runM. The rest is just matching up the types.

Stack depth could potentially become a problem with this solution. If you regularly do:

captureM (\k -> k ())

Then each time a catch will be wrapped around the function. You can avoid that by changing captureM to throw an exception:

captureM :: ((a -> IO ()) -> IO ()) -> M a
captureM f = ContT $ \k -> ReaderT $ \s -> do
    old <- readIORef s
    writeIORef s $ \_ ->
        f $ \x -> do
            s <- newIORef old
            flip runReaderT s (k x) `E.catch`
                \e -> ($ e) =<< readIORef s
    throwIO anyException

Here we unwind the catch by doing a throwIO, after installing our exception handler which actually passes the continuation. It is a bit ugly, and I haven't checked if either the catch is a problem, or that this solution solves it.

The implementation in Shake is a bit different to that described above. In Shake I know that captured continuations are never called more than once, so I
can avoid creating a new IORef in captureM, and I can reuse the existing one. Since I never change the handler, I can use a slightly less powerful definition of M:

type M a = ReaderT (IORef (SomeException -> IO ())) (ContT () IO) a

The resulting code is Development.Shake.Monad, which implements the RAW monad, and also does a few extra things which are irrelevant to this post.

The cool thing about Haskell is that I've been able to completely replace the underlying Shake Action monad from StateT/IO, to ReaderT/IO, to ReaderT/ContT/IO, without ever breaking any users of Shake. Haskell allows me to produce effective and flexible abstractions.

Tuesday, August 12, 2014

Safe library - generalising functions

Summary: The Safe library now has exact versions of take/drop, with twelve functions implemented on top of a generalised splitAt.

The Safe library is a simple Haskell library that provides versions of standard Prelude and Data.List functions that usually throw errors (e.g. tail), but wrapped to provide better error messages (e.g. tailNote), default values (e.g. tailDef) and Maybe results (e.g. tailMay).

I recently released version 0.3.5, which provides a new module Safe.Exact containing crashing versions of functions such as zip/zipWith (which error if the lists are not equal length) and take/drop/splitAt (which error if there are not enough elements), then wraps them to provide safe variants. As an example, the library provides:

takeExact    :: Int -> [a] -> [a]
takeExactMay :: Int -> [a] -> Maybe [a]

These are like take, but if the Int is larger than the length of the list it will throw an error or return Nothing. Some sample evaluations:

takeExactMay 2 [1,2,3] == Just [1,2]
takeExact    2 [1,2,3] == [1,2]
takeExactMay 2 [1] == Nothing
takeExact    2 [1] ==
    1:error "Safe.Exact.takeExact, index too large, index=2, length=1"
take 1 (takeExact 2 [1]) == [1]

So takeExactMay computes up-front whether the whole computation will succeed, and returns a Nothing if it will fail. In contrast, takeExact produces elements while they are present, but if you demand an additional element that is missing it will result in an error. All the exceptions in the Safe library are designed to provide the maximum level of detail about what went wrong, here telling us the index we were after and the length of the list.

The library provides takeExact, dropExact and splitAtExact, plus Def/May/Note versions, resulting in twelve similar functions. While the implementation of any one function is reasonably short (although not that short, once proper error messages are provided), I didn't want to write the same code twelve times. However, generalising over functions that check up-front and those that check on-demand requires a bit of thought. In the end I settled for:

splitAtExact_ :: (String -> r) -> ([a] -> r) -> (a -> r -> r) -> Int -> [a] -> r
splitAtExact_ err nil cons o xs
    | o < 0 = err $ "index must not be negative, index=" ++ show o
    | otherwise = f o xs
    where
        f 0 xs = nil xs
        f i (x:xs) = x `cons` f (i-1) xs
        f i [] = err $
            "index too large, index=" ++ show o ++ ", length=" ++ show (o-i)

Here the splitAtExact_ function has a parameterised return type r, along with three functional arguments that construct and consume the r values. The functional arguments are:

  • err :: String -> r, says how to convert an error into a result value. For up-front checks this produces a Nothing, for on-demand checks this calls error.
  • nil :: [a] -> r, says what to do once we have consumed the full number of elements. For take we discard all the remaining elements, for drop we are only interested in the remaining elements.
  • cons :: a -> r -> r, says how to deal with one element before we reach the index. For take this will be (:), but for functions producing a Maybe we have to check the r parameter first.

With this generalisation, I was able to write all twelve variants. As a few examples:

addNote fun msg = error $ "Safe.Exact." ++ fun ++ ", " ++ msg

takeExact = splitAtExact_ (addNote "takeExact") (const []) (:)

dropExact = splitAtExact_ (addNote "dropExact") id (flip const)

takeExactMay = splitAtExact_ (const Nothing) (const $ Just []) (\a -> fmap (a:))

dropExactMay = splitAtExact_ (const Nothing) Just (flip const)

splitAtExact = splitAtExact_ (addNote "splitAtExact")
    (\x -> ([], x)) (\a b -> first (a:) b)

splitAtExactMay = splitAtExact_ (const Nothing)
    (\x -> Just ([], x)) (\a b -> fmap (first (a:)) b)

Normally I would have defined takeExact and dropExact in terms of fst/snd on top of splitAtExact. However, in the Safe library error messages are of paramount importance, so I go to additional effort to ensure the error says takeExact and not splitAtExact.