In Part 1, we cloned PFP, a library for computing with probability distributions. PFP represents a distribution as a list of possible values, each with an associated probability.

But in the real world, things aren’t always so easy. What if we wanted to pick a random number between 0 and 1? Our previous implementation would break, because there’s an infinite number of values between 0 and 1—they don’t exactly fit in a list.

As it turns out, Sungwoo Park and colleagues found an elegant solution to this problem. They represented probability distributions as sampling functions, resulting in something called the λ⊙ calculus. (I have no idea how to pronounce this!)

With a little bit of hacking, we can use their sampling functions as a drop-in replacement for PFP.
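
(If you want to follow along at home, the code in this post assumes a module header roughly like the one below. This is my reconstruction, since the original header isn't shown here; the language pragmas are needed because FDist, defined later, is a type synonym used as an instance head.)

{-# LANGUAGE TypeSynonymInstances, FlexibleInstances #-}

import Control.Monad (liftM, ap)
import Data.List (foldl', group, sort)
import System.Random (getStdRandom, random)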

A common interface

Since we will soon have two ways to represent probability distributions, we need to define a common interface.

type Weight = Float

class (Functor d, Monad d) => Dist d where
  weighted :: [(a, Weight)] -> d a

uniform :: Dist d => [a] -> d a
uniform = weighted . map (\x -> (x, 1))

The function uniform will create an equally-weighted distribution from a list of values. Using this API, we can represent a two-child family as follows:

data Child = Girl | Boy
  deriving (Show, Eq, Ord)

child :: Dist d => d Child
child = uniform [Girl, Boy]

family :: Dist d => d [Child]
family = do
  child1 <- child
  child2 <- child
  return [child1, child2]

Now we need to implement this API in two different ways: once with lists, and a second time with sampling functions.

Finite distributions, revisited

The monad from part 1 now becomes:

type FDist = PerhapsT ([])

instance Dist FDist where
  weighted [] = error "Empty distribution"
  weighted xws = PerhapsT (map weight xws)
    where weight (x,w) = Perhaps x (P (w / total))  -- normalize so the probabilities sum to 1
          total = foldl' (+) 0 (map snd xws)

exact :: FDist a -> [Perhaps a]
exact = runPerhapsT
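
These definitions lean on the Prob, Perhaps and PerhapsT types from Part 1 (along with their Functor, Monad and Show instances, which I won't repeat here). As a rough reminder, they look something like this:

newtype Prob = P Float
  deriving (Eq, Ord, Num)  -- Num via GeneralizedNewtypeDeriving

data Perhaps a = Perhaps a Prob

newtype PerhapsT m a = PerhapsT { runPerhapsT :: m (Perhaps a) }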

We can run this as follows:

> exact family
[Perhaps [Girl,Girl] 25.0%,
 Perhaps [Girl,Boy] 25.0%,
 Perhaps [Boy,Girl] 25.0%,
 Perhaps [Boy,Boy] 25.0%]

Sampling functions

This second part is also easy. First, we define a type Rand a, which represents a value of type a computed using random numbers.

We implement Rand by reduction to Haskell’s IO monad. (There’s any number of other ways to do this, of course.)

newtype Rand a = Rand { runRand :: IO a }

randomFloat :: Rand Float
randomFloat = Rand (getStdRandom (random))  -- uniformly distributed in [0, 1)

Next, we make Rand a trivial, sequential monad:

instance Functor Rand where
  fmap = liftM

instance Monad Rand where
  return x = Rand (return x)
  r >>= f = Rand (do x <- runRand r
                     runRand (f x))
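
One portability note: on GHC 7.10 and later, Applicative is a superclass of Monad, so this code also needs an Applicative instance for Rand. A minimal sketch, using ap from Control.Monad:

instance Applicative Rand where
  pure  = Rand . return
  (<*>) = ap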

We can turn any FDist into a Rand by picking an element at random:

instance Dist Rand where
  weighted = liftF . weighted

liftF :: FDist a -> Rand a
liftF fdist = do
  n <- randomFloat
  pick (P n) (runPerhapsT fdist)

pick :: Monad m => Prob -> [Perhaps a] -> m a
pick _ [] = fail "No values to pick from"
pick n ((Perhaps x p):ps)
  | n <= p    = return x
  | otherwise = pick (n-p) ps
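
As a quick sanity check (my own example, not from the original post), we can run pick by hand in GHCi with a made-up draw of 0.7, using Maybe as the monad m:

> pick (P 0.7) (runPerhapsT (child :: FDist Child)) :: Maybe Child
Just Boy

The draw of 0.7 skips past Girl's 50% share and lands 0.2 into Boy's. (On recent GHCs, where fail lives in MonadFail rather than Monad, you'd also loosen pick's constraint to MonadFail m.)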

Of course, a single random sample won’t do us much good. We need functions for repeated sampling and for displaying the results:

sample :: Rand a -> Int -> Rand [a]
sample r n = sequence (replicate n r)

sampleIO :: Rand a -> Int -> IO [a]
sampleIO r n = runRand (sample r n)

histogram :: Ord a => [a] -> [Int]
histogram = map length . group . sort

And sure enough, we get a nice, even distribution (the four counts correspond, in sorted order, to [Girl,Girl], [Girl,Boy], [Boy,Girl] and [Boy,Boy]):

> liftM histogram (sampleIO family 1000)
[243,242,254,261]

That’s it! Two monads for the price of one.

Part 3: Coming soon.