Wednesday, October 01, 2014

Why Traversable/Foldable should not be in the Prelude

Summary: For GHC 7.10, Traversable and Foldable are going to be in the Prelude. I missed the original discussion, but I suspect it's a bad idea.

Types are how Haskell programmers communicate their intentions to each other. Currently, the Haskell Prelude contains:

mapM :: Monad m => (a -> m b) -> [a] -> m [b]

As of GHC 7.10, as part of something known as the Burning Bridges Proposal (ticket, discussion, I can't actually find a full proposal...), that will become:

mapM :: (Traversable t, Monad m) => (a -> m b) -> t a -> m (t b)

Surely that's a good thing? Aren't more general types always better? Isn't the Prelude an archaic beast from the time before? I'd argue functions which are highly polymorphic are hard to use, and hard to think about, especially for beginners. I'd also argue the Prelude is remarkably well designed, not perfect, but quite an impressive feat.

What makes a type signature complex?

I've been thinking recently about what makes type signatures complex, both to practitioners, and to relative beginners. My rough metric is:

  • Fully concrete types are usually simple, as long as they aren't too long. The longer a type gets, the more complex it gets.
  • Types with functions in them aren't too bad (order-1 types), but as you go up to order-2 types things start to get more complex.
  • Fully polymorphic functions can be simpler than concrete functions, since they declare what you don't need to worry about.
  • Functions with type classes are more complex, since you need to read the type signature while looking at the context, and need to know each class being used.
  • Simple type classes (Eq, Show) aren't too bad, but custom type classes impose more of a burden.
  • As you add more type classes, the complexity grows faster than linearly. Three type classes are not three times as complex as one, but quite a bit harder.
  • Higher kinded type classes are significantly more complex than kind * type classes, e.g. Monad, Functor. The reason is that instead of having a hole you fill in, you now have a hole which itself has a hole.
  • The higher-kinded type classes Monad and Functor aren't as bad as the others, since Functor is really the "simplest" higher-kinded type class, and Monad is required knowledge for IO.
  • As you have more higher kinded type classes, the complexity burden grows even worse than for kind * type classes. Two is significantly more complex than one.

By that metric, the old mapM isn't too bad, but the new mapM is quite complex. It has two higher-kinded type classes, and one of them is not one of the common ones. I appreciate that making Foldable and Traversable key to Haskell will probably lead to them being more used, but now all beginners are going to have to wade through the Monad tutorial, their Foldable tutorial and their Traversable tutorial before they start programming (or just give up).

Why generality hurts

There are two main reasons why generality hurts:

Reading type signatures becomes difficult/impossible. We already have that problem with the Control.Arrow module, which (as far as most people use it), is just a pile of tuple combinators. But unlike other tuple combinators, these are ones whose type signature can't be understood. When I want to use &&& or *** I just pick randomly, see if it type checks, then try again. When other people I know what to use these functions they just use an explicit lambda. No one thinks of referring to the documentation, since the documentation presents a unification problem (which most of the people I know could solve), not an intuition.

Reading code becomes difficult. Haskell is brilliant for letting you write a composable pipeline of code that takes some input, does some processing, and produces some output. But that only works if you have enough concrete pieces in each function to read each piece in isolation. As an example:

test = foo . mapM baz . bar

Using the current mapM definition I can, in a fraction of a second, know the approximate shape of what foo consumes, and what bar produces. With the new mapM I don't, and have to keep more context in my head to reason about the code.

Who it hurts

Generality of this nature tends to hurt two types of people:

Beginners are hurt because they need to know more concepts just to get going. As a beginner I read through Data.List regularly to build up weapons in my arsenal to attack larger problems. The new Data.List will be generalised, and reading it won't give the insights I enjoyed. Maybe the beginner can instantiate all Foldable things to [], but that adds a mental burden to exactly those people who can bear it least.

Practitioners, those who are paid to code for a living, will have greater problems with maintenance. This isn't an unsubstantiated guess... I have taken over a project which made extensive use of the generalised traverse and sequence functions. Yes, the code was concise, but it was read-only, and even then, required me to "trust" that the compiler and libraries snapped together properly.

Who it benefits

The benefit probably comes from those who are already using the Applicative/Traversable classes regularly. For these people, they can probably avoid an import Prelude(). I am not against ever changing the Prelude, but I do think that for changes of this magnitude the ideas should probably be prototyped as a separate package, widely accepted, and only then should significant surgery be attempted on the Prelude. The classy-prelude work has gone in that direction, and I wish them luck, but the significant changes they've already iterated through suggest the design space is quite large.

Concluding remarks

I realise that I got to this discussion late, perhaps too late to expect my viewpoint to count. But I'd like to leave by reproducing Henning Thielemann's email on the subject:

David Luposchainsky wrote:

+1. I think the Prelude should be a general module of the most commonly
needed functions, which (generalized) folds and traversals are certainly
part of; right now it feels more like a beginner module at times.

It is certainly a kind of beginner module, but that's good. Experts know
how to import. Putting the most general functions into Prelude does not
work because:

1. There are often multiple sensible generalizations of a Prelude
function.

2. You have to add more type annotations since types cannot be infered
from the functions.

There is simply no need to change Prelude and all packages that rely on
specific types. Just don't be lazy and import the stuff you need!

I should change my vote to:

-10

14 comments:

Francesco said...

Amen, brother!

Kevin Hammond said...

This is one of the reasons why we retreated from generalising >>= in Haskell 1.4 - too much generality is not always a good thing!

Max Bolingbroke said...

Yup. Scala's collections lib has attracted a lot of flak for it's beginner-unfriendly hyper-generalised types. It's sad to see Haskell succumb to the same trap.

Ramin said...

You have only convinced me that there should be a separate Prelude aimed at teaching beginners, like "Prelude.NooB". A good "Prelude.NooB" module would be as similar to classic Prelude as possible, but perhaps reduce the number of classes, especially all the different classes there are for Num, Rational, Integral, Floating, and so on.

Frankly, I wish Arrow, Applicative, Category, Monoid, and some of the newer packages not yet included in the Haskell Platform, such as Semigroups, were all included in Prelude. As it is now, my template file for creating a new Haskell module always has those modules included, and so does my ".ghci" file, because I uses those functions so often.

In fact, I would be in favor of comlpetely renaming fmap to "map" and getting rid of the list-specific "map" function.

Neil Mitchell said...

Ramin: Great, I've convinced you there should be two Preludes :). Instead of renaming Prelude and introducing a new Prelude, why not leave Prelude as it is and introduce Prelude.L33t? In fact, we can create a package base-l33t that contains this module, and people can try it out before we think about making it the official Prelude - it shouldn't be much hassle for the advanced users who want it (one line of import). Given the large support for this proposal, I don't get why this package doesn't already exist...

I'm not sure if this proposal does the map/fmap thing or not. It seems remarkably inconsistent if it doesn't, but I didn't see it mentioned either. I'm far more against generalising mapM than I am fmap, since if you look down the list of complexity of types, fmap is far less complex than mapM on Foldable. As a counter-weight to that, map is the first thing all beginners use, so perhaps it deserves to be super-simple.

Anonymous said...

As an Haskell autodidact, I completely agree with you. I used to hoogle functions via their type signature, which quite often cotained lists, since lists are one of the primary control structures in Haskell. As I see it, this sets them apart from other traversable (data) structures.

So for a newcomer to grasp basic list operations and be able to find the functions defining these operations is much more accesible when you don't have to peek into type classes first.

If you should decide one day on switching to Prelude.L33t, your programs stay valid with the more general definitions of mapM, sequence etc.

So I'm all in favour of a Prelude.L33t module and against raising the bar for newcomers to the language... BTW I have experience teaching Haskell, the bar is high enough as is.

Anonymous said...

Interesting post, and I fully agree. I may be still a beginner after several years of toying with Haskell, after all.
Traversable and Foldable are still two black holes for me…

What bother me most is the potential impact of such a change on tools like Hoogle or Hayoo! Will a search on the (a -> m b) -> [a] -> m [b] type still be able to find mapM ?

Neil Mitchell said...

Anon: I expect if the change goes through Hoogle will need to tweak its algorithms to find what it should, but it doesn't really make the problem any harder. My worry is that when users get to the type they were really after, will they really know it was what they were after.

I've been programming Haskell paid full-time for the last 6 years and still Traversable and Foldable are black holes to me!

Anonymous said...

I agree with your sentiments entirely.

I'm not sure mapM is a good example though: I think it's firmly in the class of functions a beginner won't use, or if she does it will be only after Googling for an example and pasting it into her code.

We already have map and fmap which are very similar functions with different degrees of abstraction. Perhaps there is merit in keeping this dichotomy. Perhaps there would be merit in expressing extra abstractions in functions like fmap which keeping map concrete.

On a different note, last night I found myself pondering replacing this:

x i = y i - z i

with

x = (-) <$> y <*> z

Happily I stopped myself in time, but I found it worrying how strong the siren call of abstraction sounded in my ears.

Pseudonym said...

I have no opinion on this proposal in particular, but I have always found the argument "it's confusing for beginners" puzzling. Unless the point of Haskell is for it to be a beginners' language, it should be optimised for users who already know at least the basics.

As a general rule, the more generic a function is, the more useful it is, up to a point. There is a ceiling above which more genericity hurts more than it helps. However, the opposite is not true: less generic almost always implies less useful. At the moment, all the good names (e.g. map) are reserved for what are clearly the least-generic (and hence least-useful) forms, and all the generic forms have inconsistent names (e.g. fmap vs mapM).

Yes, we all know how to import. If you make beginners do it, then it's cargo cult boilerplate. If you make non-beginners do it, then it's annoying friction. And the "almost no friction" aspect is one of Haskell's greatest selling points.

Luke Palmer said...

Good argument. I also care about the beginners. We don't want to introduce calculus with vector spaces and differential forms, and a great deal can be done with calculus without knowing any such thing, and introducing it as such would make a subject with an already steep reputation yet more impenetrable...

"Experts know how to import" is a great catch phrase.

Ben Kolera said...

A different prelude already *does* exist and I use it with good success:

http://hackage.haskell.org/package/base-prelude

Which is as simple as this in every file:

import Prelude ()
import BasePrelude

and it gives you all the general stuff from base.

This will only get better once semigroups and bifunctors are in base. :)

I really don't agree with this making code harder for the trained eye (as I and many others prefer things this way) but I'll certainly concede that it is not for everyone and it is very important to examine such things with beginners in mind.

And with that, while I think Haskell should not frighten people away unnecessarily I also think that anyone interested in Haskell is not going to be abstraction-phobic.

While it may cause a slight addition to learning, it should not be too hard to hand wave these things away as "list-like" things and then have the power there for everyone else. We're already forcing people through the gauntlet of monad and applicative tutorials; surely Traversable is easy after that, right?

Super simple things like using <> instead of ++ and foldMap instead of concatMap go a long way to meaning that you care less about the exact type and more about the shape, which can make refactors like changing from a Data.List to a Data.List.NonEmpty much less painful.

I'd caution comparing these things to Scala to be almost a strawman, though. The Scala libs were a ferocious beast of seemingly unprincipled inheritance, and that's before you even get started on CanBuildFrom ... ;P

I personally was very excited to hear that the prelude was getting cleaned up but it is interesting to know that there are some that are greatly opposed to it, too. Maybe this falls into the same kind of bucket as the dreaded monomorphism restriction; some people love it because it makes things easier for them and some people hate it because it makes their lives harder.

Dunno.

Neil Mitchell said...

Ben: I'm not sure anyone is that opposed to the monomorphism restriction - maybe John Hughes, but that's about it.

I think we probably stretch a lot of beginners to breaking point with Monad tutorials - it took me years to properly understand them. I worry that we're already losing people because of Monads (can't be helped, they're essential), and don't want to lose people due to something that is optional.

I think having alternative Preludes is a good thing, and I think your approach is great - I suggest you write down your experiences somewhere and see if it convinces others.

Anonymous said...

This is essentially an argument against parametric polymorphism. If someone finds that typeclasses and parametric polymorphism make code more difficult to reason about then I don't really think haskell is the best choice language for them -- maybe they would enjoy using go instead (or stick with haskell and learn why those things are so valuable.

I think a number of the arguments here are incorrect and in fact I think the opposite is true. For instance I disagree that polymorphic types + type classes make code and the corresponding type signatures more difficult to read/reason about. To the contrary I'd argue that they make code easier to read and reason about. This is in some ways objectively true -- changing concrete type parameters to polymorphic type parameters narrows down the possible implementations a function can have. If you make a type signature for a function more concrete, you are adding extraneous information that serves no purpose other than to mislead about what the code might do. Another argument where I think the opposite is true is the claim that using more generic code leads to having to learn and remember more things. The opposite is true. I can learn the api for monads once and then use that one api to code with many different instances of monad. If all those functions had to be re-implemented for each concrete instance of monad, then it would increase the surface area of the api I need to learn to write the same code. If there are 20 different monad instances I'm using then I would have to learn and remember 20x the amount of information.