As my final year undergraduate project I’ve implemented Philip Wadler’s and Simon Peyton Jones’ Comprehensive Comprehensions in the Glasgow Haskell Compiler. I’m happy to report that my patch was accepted for inclusion in the compiler, so this is a feature you’ll really be able to use come the release of 6.10!

So, what are comprehensive comprehensions? Well, first let’s just review what list comprehensions are:

triples = [(a,b,c) | a <- [1..20]
                   , b <- [a..20]
                   , let sum = a^2 + b^2
                   , c <- [b..20]
                   , sum == c^2]

This Haskell code draws numbers from 1 to 20 into the variable a, and for each one of those elements draws the numbers from a to 20 into b. We then compute the sum of the squares of each of those numbers, and finally give up in all but the case where we can find some c in the range from b to 20 such that that sum is the same as the square of c. Assuming that we have passed this test, we output a tuple containing a, b and c. Clearly, this implements some code for finding small Pythagorean triples.

Pythagorean Triples

So, the result will be:

[(3,4,5), (5,12,13), (6,8,10), (8,15,17), (9,12,15), (12,16,20)]

We can build an analogy between what we can do with list comprehensions and what we can do with parts of SQL. Drawing an item from a list is like SELECTion, filtering out certain items is similar to a WHERE clause and so on. However, there are some prominent features of SQL that aren’t so easy to express with just these vanilla list comprehensions. How could we write the following query in Haskell?

SELECT Name, SUM(Salary)
FROM Employees
GROUP BY Dept
ORDER BY SUM(Salary) ASC
LIMIT 5

Well, the answer is that we can use our lovely new generalized list comprehensions and write this:

[(the dept, sum salary)
  | (name, dept, salary) <- employees
  , then group by dept
  , then sortWith by sum salary
  , then take 5]

Notice that the existing keyword “then” is being used to introduce the sorting and grouping functionality that we require: this happens to be a bit different from the syntax in the paper, so be careful you don’t get caught out by this.

What isn’t obvious from the presentation above is just how general these comprehensive comprehensions are! Unlike SQL you can sort or group at any point in the query without making use of subqueries. Unlike SQL you can use any aggregation function, not just one it has defined like COUNT or SUM. Most significantly, unlike SQL you can actually use alternative grouping and sorting functions! So, you could choose a grouping function that actually looked for runs in the thing you are grouping by, or a “sorting” function that just returns the first 5 elements of the list: if you look at the example above we used just this feature to implement the LIMIT clause of SQL.

These new pieces of syntax are pretty nice, and speaking from personal experience it’s quite hard to do heavy duty list manipulation without them once you’ve grown used to having them available. In particular, I found them to have a killer application in the code I used for benchmarking the syntax extension, for slicing and dicing lists of the runtime and heap usage of various program runs.

If you want the full story on how to use the syntax, you can check out the documentation. Go forth and conquer with your new, readable, list-munging powers! I’ll leave you with a picture of the guys you have it all to thank for, those giants of functional programming Simon Peyton Jones and Philip Wadler:

Simon Peyton Jones and Philip Wadler