Haskell has been tying my brain in knots. Sure, it keeps teaching me all sorts of amazing things, but it’s also forcing me to relearn the basics.
Right now, I’m trying to implement simple data structures in Haskell. It’s challenging, because most typical data structures are based on updating pointers, and Haskell doesn’t allow that. (Well, I could cheat, and use the IO monad, but where’s the fun in that?)
Fortunately, Chris Okasaki has done some amazing research into Haskell (and ML) data structures. He wrote an excellent thesis, which was later published as Purely Functional Data Structures. It’s a comprehensive book, well-written, and highly recommended for anyone programming in a functional language.
Let’s take a look at a data structure from Okasaki’s book: A purely functional queue. It’s competitive with a traditional queue (on average, at least), but it doesn’t use any pointers.
The queue API
A queue is a list of values to process. We can add values to the back of the queue, and retrieve values from the front.
In Haskell, we often begin by writing down the types of our functions. If we’re lucky, the code will fill itself in almost automatically. The first part of our API is easy:
-- A queue holding values of type 'a'. data Queue a = ??? -- Create a new queue. newQueue :: Queue a -- Test whether a queue is empty. empty :: Queue a -> Bool
The next part is a bit trickier. Since we can’t update our actual
Queue values, our data structure is read-only! We can work
around this by making
deq return an
updated version of the
-- Add an item to the back of the queue, -- returning the updated queue. enq :: Queue a -> a -> Queue a -- Remove an item from the front of the -- queue, returning the item and the -- updated queue. deq :: Queue a -> (a, Queue a)
We could use this API as follows:
q = newQueue `enq` 1 `enq` 2 `enq` 3 (x,q') = deq q -- x is 1 (y,q'') = deq q' -- y is 2
Now, the tricky bit: Making it run quickly.
Haskell queues, in O(1) amortized time
We could represent our queues as lists, but they’d be slow. Appending a value to the end of a Haskell list requires walking over all the existing values.
The trick is simple: We represent a queue as two lists. The first list contains the front of the queue, and the second list contains the back. We store the back part in reverse order, so we can add a new value in constant time:
data Queue a = Queue [a] [a]
empty functions are trivial:
newQueue = Queue   empty (Queue  ) = True empty _ = False
Adding a new item to the queue is easy. We just append it to the second list:
enq (Queue xs ys) y = Queue xs (y:ys)
The first two parts of
enq are similarly straightforward:
-- If the queue is empty, raise an error. deq (Queue  ) = error "Can't deq from an empty queue" -- If there's at least one item in the front -- part of the queue, return it. deq (Queue (x:xs) ys) = (x, Queue xs ys)
But what if the front part of the queue is empty? We can just reverse the back part and move it to the front:
-- If the front part is empty, reverse the -- back part, move it to the front, and try -- again. deq (Queue  ys) = deq (Queue (reverse ys) )
Our implementation of
enq runs in constant time. But
deq occasionally needs to reverse a list, which requires
walking the entire length of the list. Fortunately, each item only gets
moved once, so
deq averages out to constant time.
And note that our queues are “persistent”–we can hold onto earlier versions of the queue, in case we need to implement “Undo” or some sort of backtracking. (Update: As mauke points out, using this queue persistently can break the amortized O(1) guarantees. See Okasaki for a discussion and simple fix.)
There’s a lot more good stuff in Okasaki’s book: True constant-time queues, an explanation of how laziness and strictness affect running time, and a wide variety of standard data structures. Highly recommended.