How to make Data.Set a monad

…and how to fake Lisp macros with Template Haskell

(I wrote this article in response to a comment by sigfpe. You may find it pretty dry reading, unless you want to build domain-specific languages in Haskell. Proceed at your own risk.)

Haskell’s built-in Monad type has some serious limitations. We can fix those limitations using a number of advanced Haskell techniques, including Template Haskell, Haskell’s closest equivalent to Lisp macros.

We can illustrate the limitations of Monad with an example from math. In set theory, we can define a set by specifying how to compute each element:

{ xy : x ∈ {1,2,4}, y ∈ {1,2,4} }

We can read this as, “the set of all xy, where x is one of {1,2,4}, and y is one of {1,2,4}.” To calculate the answer, we first multiply together all the possible combinations:

1×1=1, 1×2=2, 1×4=4, 2×1=2, 2×2=4, 2×4=8, 4×1=4, 4×2=8, 4×4=16

We then collect up the answers, and—because we’re working with sets–we throw away the duplicates:

{1,2,4,8,16}

Can we do the same thing in Haskell? Well, using Haskell’s list monad, we can write:

listExample = do
  x <- [1,2,4]
  y <- [1,2,4]
  return (x*y)

But when we run this, Haskell gives us lots of duplicate values:

> listExample
[1,2,4,2,4,8,4,8,16]

Our problem: We’re using lists (which can contain duplicate values) to represent sets (which can’t). Can we fix this by switching to Haskell’s Data.Set?

import qualified Data.Set as S

-- This doesn't work.
setExample = do
  x <- S.fromList [1,2,4]
  y <- S.fromList [1,2,4]
  return (x*y)

Unfortunately, this code fails spectacularly. A Haskell monad is required to work for any types a and b:

class Monad m where
  return :: a -> m a
  fail :: String -> m a
  (>>=) :: m a -> (a -> m b) -> m b

But Data.Set only works for some types. Specifically, it requires that values of type a can be ordered:

data (Ord a) => Set a = ...

As it turns out, we can make Data.Set into a monad. But be warned: The solution involves some pretty ugly Haskell abuse.

Splitting `Monad` in half

If we want to put a restrictions on a and b, we’ll need to move them into the signature of Monad. My first attempt looked like this:

{-# LANGUAGE MultiParamTypeClasses,
             UndecidableInstances #-}

import Prelude hiding (return, fail, (>>=))
import qualified Prelude

-- This won't work:
class NewMonad m a b where
  return :: a -> m a
  fail :: String -> m a
  (>>=) :: m a -> (a -> m b) -> m b

Unfortunately, the type b doesn’t appear anywhere in return or fail. This makes the type checker sad. But there’s a workaround, discovered by oleg. We can split Monad into two pieces:

class Monad1 m a where
  return :: a -> m a
  fail :: String -> m a

class (Monad1 m a, Monad1 m b) =>
      Monad2 m a b where
  (>>=) :: m a -> (a -> m b) -> m b

Using these type classes, we can finally make Set a monad:

instance (Ord a) =>
         Monad1 S.Set a where
  return = S.singleton
  fail _ = S.empty

instance (Ord a, Ord b) =>
         Monad2 Set.Set a b where
  m >>= f = (setJoin . S.map f) m
    where setJoin = S.unions . S.toList

This gets us very close to our goal:

setExample :: S.Set Int
setExample =
  S.fromList [1,2,4] >>= \x ->
  S.fromList [1,2,4] >>= \y ->
  return (x*y)

When we run this code, the duplicates are gone:

> setExample
fromList [1,2,4,8,16]

Rebuilding `do`: Macros

Unfortunately, GHC won’t let us use the built-in do syntax. Even though we’ve carefully hidden the regular >>= operator, GHC goes digging around in the libraries and finds it anyway. So we need to somehow replace the built-in do with a new version that uses our redefined >>=.

Update: As Brandon points out below, we can get the same result with -fno-implicit-prelude, which will force GHC to use whatever >>= is in scope. Thanks, Brandon!

We can build our own version of do using Template Haskell. Template Haskell allows us to generate code at compile-time, in fashion similar to Lisp macros. In the code below, $(...) means “insert some code here,” and [|...|] parses an expression and returns it as a data structure.

setExample' :: S.Set Int
setExample' = $(restricted [|do
  x <- S.fromList [1,2,4]
  y <- S.fromList [1,2,4]
  return (x*y) |])

The function restricted maps a parsed expression to another parsed expression.

As it turns out, we’re not allowed to define restricted (or any of our earlier type classes) in the same file as setExample'. So we need to create a file “RestrictedMonad.hs” and move all our definitions into it:

{-# LANGUAGE MultiParamTypeClasses,
             UndecidableInstances,
             TemplateHaskell #-}

module RestrictedMonad (
  Monad1, Monad2, return, fail, (>>=),
  restricted
  ) where

import Prelude hiding (return, fail, (>>=))
import qualified Prelude

import qualified Data.Set as S
import Language.Haskell.TH
import Language.Haskell.TH.Syntax

-- Our earlier definitions go here...

Now, we can define our “macro”:

-- Maps a quoted expression to a quoted
-- expression.
restricted :: Q Exp -> Q Exp
restricted code = do
  (DoE stmts) <- code
  expand stmts

Before we can define expand, we need one more piece. Template Haskell is based on monads, and we’ve redefined the built-in monad operators. To generate our replacement code, we’ll need to use the built-in return operator:

ret :: a -> Q a
ret = Prelude.return

Now we’re ready to define expand. This function expands a parsed do-body into a series of function calls. The implementation gets a little messy at times, due to limitations in Template Haskell.

-- pat <- expr; stmts...  We use lamE here
-- because we can't insert 'pat' directly
-- into the [|...|] form.
--
-- Note that we don't call 'fail' on
-- pattern-match failure, but we should.
expand (BindS pat expr:stmts) =
  [| $(ret expr) >>=
       $(lamE [ret pat]
              (expand stmts)) |]

-- let decls...; stmts...
expand (LetS decls:stmts) =
  letE (fmap ret decls)
       (expand stmts)

-- The final expression in the 'do'.
expand (NoBindS expr:[]) = ret expr

-- expr; stmts...
expand (NoBindS expr:stmts) =
  [| $(ret expr) >>=
       (\_ -> $(expand stmts)) |]

expand stmts =
  error ("Malformed 'do': "++show stmts)

For more information on Template Haskell, see Template Meta-programming for Haskell and DSL Implementation in MetaOCaml, Template Haskell, and C++ (PDF).

Some lesser-known monad laws

There are three well-known laws that all monads must obey. But there are some lesser-known monad laws that are automatically enforced in standard Haskell, thanks to the type checker. But now that we allow our monad to use the operations defined by Ord, we need to prove these extra laws manually.

This should be a complete list:

-- The standard monad laws.
join . return == id
join . fmap return == id
join . join == join . fmap join

-- The other monad laws: Return and join
-- are natural transformations.
fmap f . return == return . f
fmap f . join == join . fmap (fmap f)

-- The functor laws.
fmap id == id
fmap (f . g) == fmap f . fmap g

Ultimately, I’d like to see Data.Set become a monad in standard Haskell.

Brandon S Allbery KF8NH wrote on Mar 15, 2007:

If you use -fno-implicit-prelude, GHC will use whatever (>>=) is in scope instead of pulling in (Prelude.>>=) behind your back; see section 7.3.5 of the User's Guide.

sigfpe wrote on Mar 16, 2007:

Nice! That looks like it probably needed a lot of persistence to get working. There's also this which I haven't tried yet.

Eric wrote on Mar 16, 2007:

Brandon: Thanks! That's really good to know. sigfpe: The Template Haskell bits were actually pretty interesting. I've been meaning to look at Haskell code-generation for a while, and I'm pleased to discover that things mostly work. There are definitely some missing pieces in the API, though--not all of Haskell's grammar is actually available, and splicing is very limited. So Template Haskell is less powerful than Lisp macros (and it lacks the runtime code-generation support of MetaOCaml). But it's still powerful enough to allow for some interesting applications.

alpheccar wrote on Mar 16, 2007:

I recently used the GADT method to implement a restricted monad on my blog. But, I prefer the solution with template Haskell. Good idea !

pepe Iborra wrote on Mar 16, 2007:

Yeah, very cool stuff! It might be that Associated Types could provide a cleaner solution (see here), but this one is _good enough_ for me.

alpheccar: Note that you don't need Template Haskell if you use -fno-implicit-prelude, as Brandon points out. Still, the Template Haskell solution is definitely interesting in its own right: There are some nice features there, which could be useful for other things. pepe: Oh, that's cool. That might be the right way to get something like this into standard Haskell.

Pepe Iborra wrote on Mar 17, 2007:

The TH solution is very interesting indeed, since one would probably want to have code using your restricted monad as well as vanilla monads in the same module. In that case, rebinding the do notation via -fnoimplicit-prelude is not an option. Otherwise, you need to keep code that uses a restricted monad in a separate module, which would get annoying very quickly.

Miles Gould wrote on Mar 30, 2007:

Nice one! I ran into the problem of trying to define a monad on a subcategory of the category of types rather than on the whole thing a while back, but couldn't get anywhere with it - nice to see some solutions!

Adriaan Moors wrote on Apr 30, 2007:

Interesting! Scala's support for subtyping, higher-kinded types (as of 2.5.0, or in a nightly build), and implicit arguments makes it easy to define type classes that abstract over type class contexts (such as Ord a => in the case of Set as an instance of Monad). A simplified version (without the implicits to fully model type classes) looks like this: (full source)

trait Monad[T <: Bound[T], MyType[x <: Bound[x]], Bound[_]] {
  def map[S <: Bound[S]](f: T => S): MyType[S] 

  def flatMap[S <: RBound[S], RContainer[x <: RBound[x]], RBound[_],  
              Result[x <: RBound[x]] <: Monad[x, RContainer, RBound]]
              (f: T => Result[S]): Result[S] 

  def filter(p: T => Boolean): MyType[T]
}

class Set[T <: Ordered[T]] extends Monad[T, Set, Ordered]

Essentially, this says Monad is a type constructor that takes a type T, a type constructor MyType, and a type constructor Bound. More specifically:

Bound must be of kind * -> *
T must be of kind * and be a subtype of (Bound T)
MyType is of kind * -> * and its first argument (call it x) must be a subtype of (Bound x)

PS: If you'd like more details, let me know (adriaan |at| cs/kuleuven/be) -- I've been working on adding support for type constructor polymorphism (as we tend to call higher-kinded types etc.) to the Scala compiler over the past couple of months.

Lemming wrote on Jun 08, 2009:

The disadvantage of this approach is, that in a function you need a (Monad2 m a b) for each transition from a type 'a' to another type 'b'. Wolfgang Jeltsch gave a solution on Haskell-Cafe which only needs a constraints like (Monad1 m a, Monad1 m b). It uses type families and existential quantification for packing and unpacking the Ord dictionary. http://www.haskell.org/pipermail/haskell-cafe/2008-March/041084.html

Here an interactive link to Monad instance for Data.Set, again on Haskell-Cafe.

Splitting Monad in half

Rebuilding do: Macros

Some lesser-known monad laws

More posts

Splitting `Monad` in half

Rebuilding `do`: Macros