24 Days of Hackage: haskelldb

I conjecture that the majority of bugs that programmers encounter occur at the boundaries between 2 different systems. In Haskell, we do everything we can to encode our assumptions of behaviour via the type system, and perform strict validation when bringing outside information into our cosy type-safe bubble. However, this doesn’t help if you’ve requested the wrong data in the first place - all you can do is throw a runtime exception.

For me, this is a frequent problem with my choice of database access - postgresql-simple. Too often I have queried for foo, bar in some_table, but elsewhere assumed I was querying for bar, foo - the order doesn’t agree. In today’s post, we will look at haskelldb, which is a high level abstract interface to databases.

Fundamentally, haskelldb takes an encoding of your database, and then uses this information to construct SQL for you, and deals with mapping data too and from the format the database. Kicking off today’s example, again using the Helping Santa project from earlier posts, we begin by defining our database interface. First of all, we need to define the columns:

data PresentName = PresentName

instance FieldTag PresentName where
  fieldName = const "name"

presentName :: Attr PresentName String
presentName = mkAttr PresentName

data ChildName = ChildName

instance FieldTag ChildName where
  fieldName = const "name"

childName :: Attr ChildName String
childName = mkAttr ChildName

data ChildLocationLat = ChildLocationLat

instance FieldTag ChildLocationLat where
  fieldName = const "loc_lat"

childLocLat :: Attr ChildLocationLat Double
childLocLat = mkAttr ChildLocationLat

data ChildLocationLong = ChildLocationLong

instance FieldTag ChildLocationLong where
  fieldName = const "loc_long"

childLocLong :: Attr ChildLocationLong Double
childLocLong = mkAttr ChildLocationLong

For each column we introduce a unique type, we introduce an Attr for it (with its underlying Expr type), and also add FieldTag instances to map to and from the database column name. Now that we are armed with these column definitions, we can define our tables:

child :: Table (RecCons ChildName (Expr String)
               (RecCons ChildLocationLat (Expr Double)
               (RecCons ChildLocationLong (Expr Double)
child = baseTable "child"
          $ hdbMakeEntry ChildName
          # hdbMakeEntry ChildLocationLat
          # hdbMakeEntry ChildLocationLong

present :: Table (RecCons PresentName (Expr String)
                 (RecCons ChildName (Expr String)
present = baseTable "present" $ hdbMakeEntry PresentName
                              # hdbMakeEntry ChildName

We use baseTable to define the “base” tables in the database - the tables that actually have data in, and use hdbMakeEntry to add columns to each table.

It’s quite a bit of up front typing I’ll admit, but you could automate this typing with a custom step in your build process (easy with Shake!), or use Template Haskell. I’ve opted to write it all by hand as I find this helps further develop my understanding, as I only have to understand one thing at a time.

Now that we’ve represented the schema in code, let’s have a look at leveraging this to write some queries.

allPresents :: Query (Rel (RecCons PresentName (Expr String)
allPresents = do
  allPresents <- table present
  project $ presentName << allPresents ! presentName

presentsFor :: String
            -> Query (Rel (RecCons PresentName (Expr String)
presentsFor name = do
  children <- table child
  presents <- table present
  restrict $ children ! childName .==. presents ! childName
  restrict $ children ! childName .==. constant name
  project $ presentName << presents ! presentName

If you’ve only written SQL before this might look a little alien - haskelldb provides an API that is very close to relational algebra. In the first example we’ve taken the output of the entire present table and returned all rows. In the second example we’ve selected from two tables - a Cartesian join - and filtered this join on a predicate to find just the rows about a specific child (the child’s name is given by input).

The beauty of this interface is that haskelldb is free to rewrite your query in a more optimal form. You can have a look at the generated SQL by using ppSql:

> ppSql (presentsFor "Little Bobby Tables")
SELECT name2 as name
FROM (SELECT name as name2,
             name as name2
      FROM present as T1) as T1,
     (SELECT name as name1
      FROM child as T1) as T2
WHERE name1 = 'Little Bobby Tables' AND name1 = name2

Coupled with PostgreSQL’s stunning query planer, the end result is query with the same speed as one I’d write by hand!

haskelldb gives us a extremely expressive API for querying databases, and doesn’t sacrifice performance to do so. This allows you to consume and compose the API in a way that best suits the needs of your application. If you frequently select from the result of a join you can simply store part of this query separately, and compose it later.

One downside of haskelldb is that documentation can be a little bit terse, and the API is quite large, so it can be hard to get going sometimes. I highly recommend Chris Done’s series of blog posts on haskelldb if you’re interested in learning more.

You can contact me via email at ollie@ocharles.org.uk or tweet to me @acid2. I share almost all of my work at GitHub. This post is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.