Tuesday, January 27, 2015

Hoogle 5 is coming

Summary: I'm working on Hoogle 5. If you like unfinished software, you can try it.

For the last month I've been working on the next version of Hoogle, version 5. It isn't finished, and it isn't the "official" version of Hoogle yet, but it's online, you can try it out, and anyone who wants to hack on Hoogle (or things like hoogle-index) should probably take a look now.

How do I try it?

The purpose of this blog post isn't to solicit beta testers, it's not ready for that - I'm mostly reaching out to Hoogle developers. That said, I know some people will want to try it, so it's online at hoogle.haskell.org. Beware, it isn't finished, and haskell.org/hoogle remains the "official" version to use (but hey, use whichever you want, or both, or neither).

What's new for users of the website?

  • It isn't finished. In particular, type search hasn't been implemented. But there are also lots of other pieces in all the corners that don't work properly.
  • It searches all the packages on Stackage by default.
  • There is a drop-down to allow you to restrict focus to only a single package, or an author, or a tag.

What's new for everyone else?

  • There is no library, it's not on Hackage and the command line tool isn't designed for users. These things will be coming over time.
  • It's hosted on a haskell.org virtual machine, served through Warp rather than as a CGI program. Thanks to the Haskell Infrastructure team for all their help. As a result, I'm free to experiment with Hoogle without accidentally making the Haskell homepage say "moo".
  • Generating database for all of Stackage (minus download time) takes about 40s and uses < 1Gb of memory. The databases are built directly out of the .tar.gz files, without unpacking them.

What's next?

Hoogle 5 is a complete rewrite of Hoogle, stealing bits as they were useful. At the moment Hoogle 4 is hosted at https://github.com/ndmitchell/hoogle while version 5 is at https://github.com/ndmitchell/hogle. The next step is to rename and move github repos so they are sensible once again. My best guess is that I should rename hoogle to hoogle4, then rename hogle to hoogle and move the issue tickets. I'm open to other suggestions.

Once that's resolved, I need to continue fixing and improving Hoogle so all the things that aren't finished become finished. If you are interested in helping, I recommend you create a github ticket describing what you want to do and we can take it from there.

Tuesday, January 13, 2015

User Interfaces for Users

Summary: When designing a user interface, think about the user, and what they want to know. Don't just present the information you know.

As part of my job I've ended up writing a fair amount of user interface code, and feedback from users has given me an appreciation of some common mistakes. Many user interfaces are designed to present information, and one of the key rules is to think about what the user wants to see, not just showing the information you have easily to hand. I was reminded of this rule when I was expecting a parcel. On the morning of Saturday 10/1/2015 I used a track my parcel interface to find:

The interface presents a lot of information, but most of it is interesting to the delivery company, not to me. I have basically one question: when will my parcel arrive. From the interface I can see:

  • The parcel is being shipped with "Express AM" delivery. No link to what that means. Searching leads me to a page saying that guarantees delivery before 1pm on a business day (some places said noon, some said 1pm). That is useful information if I want to enter into a contract with the delivery service, but not when I'm waiting for a parcel. What happens on Saturday? Do they still deliver, just not guarantee the time? Do they wait until the next business day? Do they do either, as they see fit?
  • My parcel has been loaded onto a vehicle. Has the vehicle has left the depot, or is that a separate step? How many further steps are there between loading and delivery? This question is easy to answer after the parcel has been delivered, since the additional steps show up in the list, but difficult to answer before.

On Saturday morning my best guess about when the parcel would arrive was between then and Monday 1pm. Having been through the complete process, I now know the best answer was between some still unknown time on Monday morning and Monday 1pm. With that information, I'd have taken my son to the park rather than keeping him cooped up indoors.

I suggest the company augment their user interface with the line "Your parcel will be delivered on Monday, between 9am and 1pm". It's information they could have computed easily, and answers my question.

The eagle-eyed may notice that there is a plus to expand to show more information. Alas, all that shows is:

I think they're trying to say my puny iPhone can't cope with the bold font that is essential to tell me the status and I should get an iPhone plus... Checking the desktop website also showed no further information.

Wednesday, December 10, 2014

Beta testing the Windows Minimal GHC Installer

Summary: There is now a minimal Windows GHC installer that can install the network library.

Michael Snoyman and I are seeking beta testers for our new Windows GHC installer. The installer is very new, and we're keen to get feedback.

Update: If you want to install to Program Files (the default installation path), download the file, right click and hit "Run as administrator". This will be automatic in the next version.

One of the standard problems when upgrading Haskell libraries on Windows is that sometimes you have to upgrade a package which requires a configure script - typically the network library. When that happens, you have to take special measures, and even then, sometimes the binary will fail at runtime (this happens on about half my machines).

Our installer solves that problem by bundling GHC, Cabal and MSYS in one installer. You can install the network library out of the box without any special configuration and it just works.

Our installer does not provide all the packages included with the Haskell Platform, but it does provide an environment where you can easily install those packages. This minimal installer might be more suitable for developers.

Why might I want this installer

The installer is much easier to install than the GHC distribution (has a real installer, sets up the %PATH%, includes Cabal and can build the network library).

The installer has less libraries than the Haskell Platform installer, but includes MSYS so you can build everything in the Platform. The lack of additional packages and close correspondence with GHC (there is one installer for each GHC version) might make it more suitable for library developers. Once GHC 7.10.1 is released, we hope to make a GHC Minimal Installer available within days.

The installer also has more precise control over what is added to the %PATH%. You can add all the binaries it provides, or just add a single binary minghc-7.8.3 which when run at the command prompt temporarily adds all the installed binaries to the %PATH%. Using the switcher, you can easily switch GHC versions.

Wednesday, November 19, 2014

Announcing Shake 0.14

Summary: Shake 0.14 is now out. The *> operator became %>. If you are on Windows the FilePath operations work differently.

I'm pleased to announce Shake 0.14, which has one set of incompatible changes, one set of deprecations and some new features.

The *> operator and friends are now deprecated

The *> operator has been the Shake operator for many years, but in GHC 7.10 the *> operator is exported by the Prelude, so causes name clashes. To fix that, I've added %> as an alias for *>. I expect to export *> for the foreseeable future, but you should use %> going forward, and will likely have to switch with GHC 7.10. All the Shake documentation has been updated. The &*> and |*> operators have been renamed &%> and |%> to remain consistent.

Development.Shake.FilePath now exports System.FilePath

Previously the module Development.Shake.FilePath mostly exported System.FilePath.Posix (even on Windows), along with some additional functions, but modified normalise to remove /../ and made </> always call normalise. The reason is that if you pass non-normalised FilePath values to need Shake can end up with two names for one on-disk file and everything goes wrong, so it tried to avoid creating non-normalised paths.

As of 0.14 I now fully normalise the values inside need, so there is no requirement for the arguments to need to be normalised already. This change allows the simplification of directly exporting System.FilePath and only adding to it. If you are not using Windows, the changes are:

  • normalise doesn't eliminate /../, but normaliseEx does.
  • </> no longer calls normalise. It turned out calling normalise is pretty harmful for FilePattern values, which leads to fairly easy-to-make but hard-to-debug issues.

If you are using Windows, you'll notice all the operations now use \ instead of /, and properly cope with Windows-specific aspects like drives. The function toStandard (convert all separators to /) might be required in a few places. The reasons for this change are discussed in bug #193.

New features

This release has lots of new features since 0.13. You can see a complete list in the change log, but the most user-visible ones are:

  • Add getConfigKeys to Development.Shake.Config.
  • Add withTempFile and withTempDir for the Action monad.
  • Add -j to run with one thread per processor.
  • Make |%> matching with simple files much faster.
  • Use far less threads, with corresponding less stack usage.
  • Add copyFileChanged.

Plans

I'm hoping the next release will be 1.0!

Thursday, November 13, 2014

Operators on Hackage

Summary: I wrote a script to list all operators on Hackage, and which packages they are used by.

In GHC 7.10 the *> operator will be moving into the Prelude, which means the Shake library will have to find an alternative operator (discussion on the mailing list). In order to pick a sensible operator, I wanted to list all operators in all Hackage packages so I could be aware of clashes.

Note that exported operators is more than just those defined by the package, e.g. Shake exports the Eq class, so == is counted as being exported by Shake. However, in most cases, operators exported by a package are defined by that package.

Producing the file

First I downloaded the Hoogle databases from Hackage, and extracted them to a directory named hoogle. I then ran:

ghc --make Operators.hs && operators hoogle operators.txt

And uploaded operators.txt above. The code for Operators.hs is:

import Control.Exception.Extra
import Control.Monad
import Data.List.Extra
import System.Directory.Extra
import System.Environment
import System.FilePath
import System.IO.Extra

main = do
    [dir,out] <- getArgs
    files <- listFilesRecursive dir
    xs <- forM files $ \file -> do
        src <- readFileUTF8' file `catch_` \_ -> readFile' file `catch_` \_ -> return ""
        return [("(" ++ takeWhile (/= ')') x ++ ")", takeBaseName file) | '(':x <- lines src]
    writeFileUTF8 out $ unlines [unwords $ a : nub b | (a,b) <- groupSort $ concat xs]

This code relies on the normal packages distributed with GHC, plus the extra package.

Code explanation

The script is pretty simple. I first get two arguments, which is where to find the extracted files, and where to write the result. I then use listFilesRecursive to recursively find all extracted files, and forM to loop over them. For each file I read it in (trying first UTF8, then normal encoding, then giving up). For each line I look for ( as the first character, and form a list of [(operator-name, package)].

After producing the list, I use groupSort to produce [(operator-name, [package])] then writeFileUTF8 to produce the output. Running the script takes just over a minute on my ancient computer.

Writing the code

Writing the code to produce the operator list took about 15 minutes, and I made some notes as I was going.

  • I started by loading up ghcid for the file with the command line ghcid -t -c "ghci Operators.hs". Now every save immediately resulted in a list of warnings/errors, and I never bothered opening the file in ghci, I just compiled it to test.
  • I started by inserting take 20 files so I could debug the script faster and could manually check the output was plausible.
  • At first I wrote takeBaseName src rather than takeBaseName file. That produced a lot of totally incorrect output, woops.
  • At first I used readFile to suck in the data and putStr to print it to the console. That corrupted Unicode operators, so I switched to readFileUTF8' and writeFileUTF8.
  • After switching to writeFileUTF8 I found a rather serious bug in the extra library, which I fixed and added tests for, then made a new release.
  • After trying to search through the results, I added ( and ) around each operator to make it easier to search for the operators I cared about.

User Exercise

To calculate the stats of most exported operator and package with most operators I wrote two lines of code - how would you write such code? Hint: both my lines involved maximumBy.

Tuesday, November 11, 2014

Upper bounds or not?

Summary: I thought through the issue of upper bounds on Haskell package dependencies, and it turns out I don't agree with anyone :-)

There is currently a debate about whether Haskell packages should have upper bounds for their dependencies or not. Concretely, given mypackage and dependency-1.0.2, should I write dependency >= 1 (no upper bounds) or dependency >= 1 && < 1.1 (PVP/Package versioning policy upper bounds). I came to the conclusion that the bounds should be dependency >= 1, but that Hackage should automatically add an upper bound of dependency <= 1.0.2.

Rock vs Hard Place

The reason the debate has continued so long is because both choices are unpleasant:

  • Don't add upper bounds, and have packages break for your users because they are no longer compatible.
  • Add PVP upper bounds, and have reasonable install plans rejected and users needlessly downgraded to old versions of packages. If one package requires a minimum version of above n, and another requires a maximum below n, they can't be combined. The PVP allows adding new functions, so even if all your dependencies follow the PVP, the code might still fail to compile.

I believe there are two relevant relevant factors in choosing which scheme to follow.

Factor 1: How long will it take to update the .cabal file

Let us assume that the .cabal file can be updated in minutes. If there are excessively restrictive bounds for a few minutes it doesn't matter - the code will be out of date, but only by a few minutes, and other packages requiring the latest version are unlikely.

As the .cabal file takes longer to update, the problems with restrictive bounds become worse. For abandoned projects, the restrictive upper bounds make them unusable. For actively maintained projects with many dependencies, bounds bumps can be required weekly, and a two week vacation can break actively maintained code.

Factor 2: How likely is the dependency upgrade to break

If upgrading a dependency breaks the package, then upper bounds are a good idea. In general it is impossible to predict whether a dependency upgrade will break a package or not, but my experience is that most packages usually work fine. For some projects, there are stated compatibility ranges, e.g. Snap declares that any API will be supported for two 0.1 releases. For other projects, some dependencies are so tightly-coupled that every 0.1 increment will almost certainly fail to compile, e.g. the HLint dependency on Haskell-src-exts.

The fact that these two variable factors are used to arrive at a binary decision is likely the reason the Haskell community has yet to reach a conclusion.

My Answer

My current preference is to normally omit upper bounds. I do that because:

  • For projects I use heavily, e.g. haskell-src-exts, I have fairly regular communication with the maintainers, so am not surprised by releases.
  • For most projects I depend on only a fraction of the API, e.g. wai, and most changes are irrelevant to me.
  • Michael Snoyman and the excellent Stackage alert me to broken upgrades quickly, so I can respond when things go wrong.
  • I maintain quite a few projects, and the administrative overhead of uploading new versions, testing, waiting for continuous-integration results etc would cut down on real coding time. (While the Hackage facility to edit the metadata would be quicker, I think that tweaking fundamentals of my package, but skipping the revision control and continuous integration, seems misguided.)
  • The PVP is a heuristic, but usually the upper bound is too tight, and occasionally the upper bound is too loose. Relying on the PVP to provide bounds is no silver bullet.

On the negative side, occasionally my packages no longer compile for my users (very rarely, and for short periods of time, but it has happened). Of course, I don't like that at all, so do include upper bounds for things like haskell-src-exts.

The Right Answer

I want my packages to use versions of dependencies such that:

  • All the features I require are present.
  • There are no future changes that stop my code from compiling or passing its test suite.

I can achieve the first objective by specifying a lower bound, which I do. There is no way to predict the future, so no way I can restrict the upper bound perfectly in advance. The right answer must involve:

  • On every dependency upgrade, Hackage (or some agent of Hackage) must try to compile and test my package. Many Haskell packages are already tested using Travis CI, so reusing those tests seems a good way to gauge success.
  • If the compile and tests pass, then the bounds can be increased to the version just tested.
  • If the compile or tests fail, then the bounds must be tightened to exclude the new version, and the author needs to be informed.

With this infrastructure, the time a dependency is too tight is small, and the chance of breakage is unknown, meaning that Hackage packages should have exact upper bounds - much tighter than PVP upper bounds.

Caveats: I am unsure whether such regularly changing metadata should be incorporated into the .cabal file or not. I realise the above setup requires quite a lot of Hackage infrastructure, but will buy anyone who sorts it out some beer.

Sunday, November 02, 2014

November talks

Summary: I'll be talking at CodeMesh 2014 on 5th November and FP Days on 20th November.

I am giving two talks in London this month:

CodeMesh 2014 - Gluing Things Together with Haskell, 5th Nov

I'll be talking about how you can use Haskell as the glue language in a project, instead of something like Bash. In particular, I'll cover Shake, NSIS and Bake. The abstract reads:

A large software project is more than just the code that goes into a release, in particular you need lots of glue code to put everything together - including build systems, test harnesses, installer generators etc. While the choice of language for the project is often a carefully considered decision, more often than not the glue code consists of shell scripts and Makefiles. But just as functional programming provides a better way to write the project, it also provides a better way to write the glue code. This talk covers some of the technologies and approaches we use at Standard Chartered to glue together the quant library. In particular, we'll focus on the build system where we replaced 10,000 lines of Makefiles with 1,000 lines of Haskell which builds the project twice as fast. We'll also look at how to test programs using Haskell, how to replace ancillary shell scripts with Haskell, and how to use Haskell to generate installers.

FP Days 2014 - Building stuff with Shake, 20th Nov

I'll be giving a tutorial on building stuff with Shake. It's going to be less sales pitch, more how you structure a build system, and how you use Shake effectively. The abstract reads:

Build systems are a key part of any large software project, relied upon by both developers and release processes. It's important that the build system is understandable, reliable and fast. This talk introduces the Shake build system which is intended to help meet those goals. Users of Shake write a Haskell program which makes heavy use of the Shake library, while still allowing the full power of Haskell to be used. The Shake library provides powerful dependency features along with useful extras (profiling, debugging, command line handling). This tutorial aims to help you learn how to think about writing build systems, and how to make those thoughts concrete in Shake.