Saturday, June 11, 2016

js-jquery 3.0.0 now out

The js-jquery Haskell library bundles the minified jQuery Javascript code into a Haskell package, so it can be depended upon by Cabal packages and incorporated into generated HTML pages. It does so in a way that doesn't require each Haskell package to bundle its own extra data files, and in a way that meets the licensing requirements of distributions such as Debian.

I've just released version 3.0.0, following on from jQuery 3.0.0 a few days ago. This release breaks compatibility with IE6-8, so if that's important to you, insert an upper bound on the package.

Friday, May 20, 2016

Another space leak: QuickCheck edition

Summary: QuickCheck had a space leak in property, now fixed (in HEAD).

Using the techniques described in my previous blog post I found another space leak, this time in QuickCheck, which has now been fixed. Using QuickCheck we can chose to "label" certain inputs, for example:

$ quickCheck $ \p -> label (if p > 0 then "+ve" else "-ve") True
+++ OK, passed 100 tests:
54% -ve
46% +ve

Here we label numbers based on their value, and at the end QuickCheck tells us how many were in each set. As you might expect, the underlying QuickCheck implementation contains a Map String Int to record how many tests get each label.

Unfortunately, the implementation in QuickCheck-2.8.1 has a space leak, meaning that the memory usage is proportional to the number of tests run. We can provoke such a space leak with:

quickCheckWithResult stdArgs{maxSuccess=10000} $
    \(p :: Double) -> label "foo" True

When running with ghc --make Main.hs -rtsopts && Main +RTS -K1K we get the error:

Main: Stack space overflow: current size 33624 bytes.

Using -K1K we have detected when we evaluate the space leak, at the end of the program, when trying to print out the summary statistics. The approach taken by QuickCheck for label is to generate a separate Map String Int per run, then at each step merge these Map values together using unionWith (+). As such, there are two likely culprits for the space leak:

  • Perhaps the Map is not evaluated, so in memory we have unionWith (+) x1 $ unionWith (+) x2 $ unionWith (+) x3 $ ....
  • Perhaps the values inside the Map are not evaluated, so in memory we have Map {"foo" = 1 + 1 + 1 + ...}.

QuickCheck avoids the first space leak by keeping its intermediate state in a record type with a strict field for the Map. QuickCheck suffers from the second problem. As usual, actually fixing the space leak is easy - just switch from importing Data.Map to Data.Map.Strict. The Strict module ensures that the computations passed to unionWith are forced before it returns, and the memory usage remains constant, not linear in the number of tests.

I detected this space leak because the Shake test suite runs with -K1K and when running one particular test on a Mac with GHC 8.0 in profiling mode it caused a stack overflow. I did not diagnose which of those factors was the ultimate cause (it may have even been the random seed at a particular point in time - only certain inputs call label).

Many space leaks are now easy to detect (using -K1K), moderate difficulty to debug (using the -xc technique or just by eye) and usually easy to fix.

Tuesday, April 19, 2016

New Shake with better wildcard patterns

Summary: The new version of Shake supports ** patterns for directory wildcards.

I've just released Shake 0.15.6. Don't be mislead by the 0.0.1 increment of the release, it's got over 50 entries in the changelog since the last release. There are quite a few bug fixes, documentation improvements and optimisations.

One of the most user visible features is the new wildcard patterns. In the previous version Shake supported // for matching any number of directories and * for matching within a path component, so to match all C source files in src you could write:

src//*.c

In the new version of Shake you can also write:

src/**/*.c

The // patterns remain supported, but I intend to encourage use of ** in new code if these patterns don't end up having any unforeseen problems. The advantages of the patterns in the new version are:

  • The ** patterns seem to be the defacto standard nowadays, being used by rsync, Ant, Gradle, Jenkins etc.
  • People would often write "src" </> "//*.c", which gives the unexpected result of //*.c. With ** you aren't overloading directories at the same time so everything works out as expected.
  • ** patterns only match relative files, not absolute ones, which is what you usually want in a build system. If you want to match absolute files use */**.
  • The semantics of patterns were a bit confusing for things like /// - I've now given them precise semantics, but ** avoids this confusion.
  • I've optimised the pattern matching for both flavours, so there is more precomputation and less backtracking (in practice I don't think that makes any difference).
  • I've optimised directory traversal using a file pattern, so it doesn't list directories that can't possibly match, which gives a significant speedup.

For this release I've also improved the website at shakebuild.com with more documentation - hopefully it is useful.

Monday, April 11, 2016

GHCid 0.6 Released

Summary: I've released a new version of GHCid, which can interrupt running tests.

I've just released version 0.6.1 of GHCid. To a first approximation, ghcid opens ghci and runs :reload whenever your source code changes, formatting the output to fit a fixed height console. Unlike other Haskell development tools, ghcid is intended to be incredibly simple - it works when nothing else does. This new version features:

Much faster: Since version 0.5 GHCid passes -fno-code to ghci when it makes sense, which is about twice as fast.

Interruptible test commands: Since version 0.4 ghcid has supported a --test flag to pass a test command (e.g. :main) which is run whenever the code is warning free. As of version 0.6 that command will be interrupted if it needs to :reload, allowing long running tests and persistent "tests" - e.g. spawning web servers or GUIs. Thanks to Reid Draper for showing it was possible as part of his ordeal project and Luigy Leon for merging that with GHCid.

Stack integration: If you have a stack.yaml function and a .stack-work directory it will use stack commands to run your project. Thanks to the Stack Team, in particular Michael Sloan, for helping get through all the hoops and providing the necessary functionality in Stack.

More restart/reload flags: It's been possible for a while to pass --restart to restart ghci if certain files change (e.g. the .cabal file). Now there is a separate --reload flag to cause :reload instead of a full restart, and both flags can take directories instead of individual files.

Major relayering: For 0.6 I significantly refactored much of the GHCid code. There has always been an underlying Language.Haskell.Ghcid API, and GHCid was built on top. With the new version the underlying library has been given a significant refactoring, mostly so that interruptions are handled properly without race conditions and with a sane multithreading story. On top of that is a new Session layer, which provides a session abstraction - a ghci instance which tracks more state (e.g. which warnings have been emitted for already loaded files). Then the Ghcid module builds on top, with much less state management. By simplifying and clarifying the responsibility of each layer certain issues such as leaking old ghci processes and obscure race conditions disappeared.


I've been making use of many of these features in the Shake website generator, which I invoke with:

ghcid --test=":main debug" --reload=parts --reload=../docs

This project uses Stack, so relies on the new stack integration. It runs :main debug as the test suite, which generates the website whenever the code reloads. Furthermore, if any of the parts (template files) or docs (Markdown pages) change the website regenerates. I can now edit the website, and saving it automatically regenerates the web pages within a second.

Wednesday, April 06, 2016

Github Offline Issues with IssueSync

For a while I've been looking for something to download the GitHub issues for a project. I do a lot of development work on a train with no internet, so referring to the tickets offline is very useful. I've tried lot of tools, in a very wide variety of languages (Ruby, Python, Perl, Javascript, PHP) - but most of them don't seem to work - and the only one I did manage to get working only gave a curses UI.

Finally, I've found one that works - IssueSync. Installing it worked as described. Running it worked as described. I raised tickets for the author and they fixed them. I even sent a pull request and the author discussed and merged it. It downloads all your issues to Markdown files in an issues directory. I then "list" my issues using:

head -n1 -q issues/*.md | grep -v CLOSED

It's simple and works nicely.

Thursday, March 03, 2016

Compiling GHC on Windows

Summary: How to compile GHC on Windows using Stack and the new Shake-based GHC build system.

Regularly updated instructions are at https://github.com/snowleopard/hadrian/blob/master/doc/windows.md - use them.

Here are a list of instructions to compile GHC, from source, on Windows. I tested these instructions on a clean machine using the free Windows 10 VirtualBox image (I bumped the VM CPUs to 4, and RAM to 4096Mb).

The first step is to install Stack (I just accepted all the defaults), then open a command prompt and run:

stack setup
stack install happy alex
stack exec -- pacman -S gcc binutils git automake-wrapper tar make patch autoconf --noconfirm
stack exec -- git clone --recursive git://git.haskell.org/ghc.git
cd ghc
stack exec -- git clone git://github.com/snowleopard/hadrian
stack build --stack-yaml=hadrian/stack.yaml --only-dependencies
stack exec --stack-yaml=hadrian/stack.yaml -- hadrian/build.bat -j

The entire process (after the VM has downloaded) takes a bit less than an hour. These steps use the Stack supplied tools (MinGW, Git), and the new Shake-based build system. The hope is that by using the isolation Stack provides, combined with the portability improvements from writing the build system in Haskell, these instructions will work robustly on many Windows machines.

I have not tried these instructions on other platforms, but suspect that removing the pacman line might be sufficient to get it to work.

Update: Instructions simplified following improvements to the build system.

Wednesday, February 17, 2016

Selling Haskell in the pub

Summary: A Haskell sales pitch without any code.

I often find myself in a pub, without pen or paper, trying to persuade someone to try Haskell. In these situations I tend to use three arguments:

Low cost abstractions: In Haskell the cost of creating a helper function is low, because the syntax for functions is very concise (a single line) and the optimiser often removes all overhead of the function (by inlining it). The power of such helper functions is greatly enhanced by higher-order functions. In many languages each function must be top-level (within a file or class), but Haskell permits functions local to a small block, providing encapsulation and less necessity for good names. In most languages there is a much higher cost per function, and thus trivial functions are insufficiently valuable. By reducing the cost, Haskell encourages both less repetition and describing problems in a more abstract way, which is very helpful for taming program complexity.

Refactoring works: In Haskell, refactoring is easy, safe and common. Most projects involve writing a chunk of code, then continually changing it as the project evolves. In other languages most refactorings have complex side conditions that must be met. In Haskell, most refactorings are simple, and even refactoring tools can have complex refactorings mechanically proven correct. Any code which violates the expected side-conditions is considered "dangerous" and libraries are expected to provide robust abstractions. The static type checker ensures that most refactorings have been carried correctly. It is common to change a fundamental type in the middle of millions of lines of code, quickly make the changes required and have a high degree of confidence that it still works. The ability to easily refactor means that code can evolve with the project, without accumulating technical debt.

Language polygots: There are few programmers who know only Haskell. I know Haskell programmers who are also experts in Perl, PHP, C, Javascript, C++, Fortran, R, Algol - pretty much any language you care to name. In contrast, when my wife has attended other programming language meetups, many of the participants knew only that language. There are many reasons Haskell programmers know lots of languages, not least because Haskell has rarely been taught as a first language. I find it interesting that many people who are experts in both Haskell and other languages typically prefer Haskell.

In response to these arguments, if people are starting to get convinced, they usually ask:

  • Are there lots of libraries? Yes, 9000+. There are more R libraries for statistics, more Perl libraries for Bioinformatics etc - but most of the standard stuff is covered. Wrapping a C library with the foreign function interface (FFI) isn't too hard.
  • What's the performance like? It's usually somewhere between C and Python. If you carefully optimise you can get close to the performance of C. The profiling tools are reasonable. Performance is not usually a problem, but can be solved with C and FFI if it is.

Many of the above arguments are also supportive of other statically typed functional languages. I tend to find my pub interventions are usually aimed at Python/Java/C++/PHP programmers, where I'd consider it a win if they tried O'Caml or Scala instead.