Refactoring probability distributions, part 2: Random sampling
In Part 1, we cloned PFP, a library for computing with probability distributions. PFP represents a distribution as a list of possible values, each with an associated probability.
But in the real world, things aren’t always so easy. What if we wanted to pick a random number between 0 and 1? Our previous implementation would break, because there’s an infinite number of values between 0 and 1—they don’t exactly fit in a list.
As it turns out, Sungwoo Park and colleagues found an elegant solution to this problem. They represented probability distributions as sampling functions, resulting in something called the λ◯ calculus. (I have no idea how to pronounce this!)
With a little bit of hacking, we can use their sampling functions as a drop-in replacement for PFP.
A common interface
Since we will soon have two ways to represent probability distributions, we need to define a common interface.
type Weight = Float
class (Functor d, Monad d) => Dist d where
weighted :: [(a, Weight)] -> d a
uniform :: Dist d => [a] -> d a
uniform = weighted . map (\x -> (x, 1))
The function uniform
will create an equally-weighted distribution from a list of values. Using this API, we can represent a two-child family as follows:
data Child = Girl | Boy
deriving (Show, Eq, Ord)
child :: Dist d => d Child
child = uniform [Girl, Boy]
family :: Dist d => d [Child]
family = do
child1 <- child
child2 <- child
return [child1, child2]
Now, we need to implement this API two different ways: Once with lists, and a second time with sampling functions.
Finite distibutions, revisted
The monad from part 1 now becomes:
type FDist = PerhapsT ([])
instance Dist FDist where
weighted [] = error "Empty distribution"
weighted xws = PerhapsT (map weight xws)
where weight (x,w) = Perhaps x (P (w / sum))
sum = foldl' (+) 0 (map snd xws)
exact :: FDist a -> [Perhaps a]
exact = runPerhapsT
We can run this as follows:
> exact family
[Perhaps [Girl,Girl] 25.0%,
Perhaps [Girl,Boy] 25.0%,
Perhaps [Boy,Girl] 25.0%,
Perhaps [Boy,Boy] 25.0%]
Sampling functions
This second part is also easy. First, we define a type Rand a
, which represents a value of type a
computed using random numbers.
We implement Rand
by reduction to Haskell’s IO monad. (There’s any number of other ways to do this, of course.)
newtype Rand a = Rand { runRand :: IO a }
randomFloat :: Rand Float
randomFloat = Rand (getStdRandom (random))
Next, we make Rand
a trivial, sequential monad:
instance Functor Rand where
fmap = liftM
instance Monad Rand where
return x = Rand (return x)
r >>= f = Rand (do x <- runRand r
runRand (f x))
We can turn any FDist
into a Rand
by picking an element at random:
instance Dist Rand where
weighted = liftF . weighted
liftF :: FDist a -> Rand a
liftF fdist = do
n <- randomFloat
pick (P n) (runPerhapsT fdist)
pick :: Monad m => Prob -> [Perhaps a] -> m a
pick _ [] = fail "No values to pick from"
pick n ((Perhaps x p):ps)
| n <= p = return x
| otherwise = pick (n-p) ps
Of course, a single random sample won’t do us much good. We need functions for repeated sampling and for displaying the results:
sample :: Rand a -> Int -> Rand [a]
sample r n = sequence (replicate n r)
sampleIO r n = runRand (sample r n)
histogram :: Ord a => [a] -> [Int]
histogram = map length . group . sort
And sure enough, we get a nice, even distribution:
> liftM histogram (sampleIO family 1000)
[243,242,254,261]
That’s it! Two monads for the price of one.
Part 3: Coming soon.
Want to contact me about this article? Or if you're looking for something else to read, here's a list of popular posts.
sequence (replicate f xs)
is justreplicateM f xs
. Great articles, by the way!