morphir

Modeling for database developers

Introduction

As a database developer you will find it very easy and enjoyable to learn Elm because it’s basically an extended version of SQL. Being a functional programming language it’s based on the same strong mathematical foundations as relational algebra and you get very similar declarative building blocks.

The syntax is different from SQL but it’s very lightweight and easy to pick up. Probably the best way to start learning is to see a few examples. Let’s start with a simple SQL query:

SELECT t.productID as product, t.quantity as qty
FROM Trades as t
WHERE t.buySell = 'S'

The SQL above translates to the following Elm code:

query1 =
    trades
        |> filter (\t -> t.buySell == "S")
        |> map
            (\t ->
                { product = t.productID
                , qty = t.quantity
                }
            )

There are a few things here that will look unfamiliar so let me explain:

Note: At this point you might be thinking: hold on, this is actually doing the whole operation in-memory, that’s completely different. This is where SlateX comes into the picture. Elm compiles to JavaScript so your assumption would be correct if we were just using Elm. With SlateX though we transpile to various execution platforms so this logic could turn into SQL, Spark SQL or even straight Java. There’s a lot of options.

So far we have been trying to map a SQL query directly to Elm. Where it gets interesting is when you start to use the power of the language. For example you can move out filters to named functions:

isSell t =
    t.buySell == "S"


query1 =
    trades
        |> filter isSell
        |> map
            (\t ->
                { product = t.productID
                , qty = t.quantity
                }
            )

This does the same thing but reads better and also allows reuse across queries. This in itself gives you a lot of power as a developer but it doesn’t stop there. With Elm you get a whole set of new features compared to SQL:

Schema

A database schema specifies the structure of the data stored in the database. In a programming language a type-system does the same thing. Elm’s type-system is a superset of what a database schema provides. It has record types which makes it very easy to model relational objects and offers a whole set of additional features that help you model your domain more accurately.

Let’s start with a simple DDL:

CREATE TABLE Trade (
    product_id CHAR(9) NOT NULL,
    quantity NUMBER(20, 0) NOT NULL,
    buy_sell CHAR(1) NOT NULL,
    comment VARCHAR(100) NULL
)

As mentioned above Elm has record types which makes it easy to define the same structure in the type-system:

type alias Trade =
    { productID : String
    , quantity : Int
    , buySell : String
    , comment : Maybe String
    }

Let’s go through the type definition to understand what it means:

As mentioned we did loose some granularity in this definition. One of the main goals of modeling is to very accurately model your domain so we need to address this. Fortunately Elm makes this easy. As a first step let’s define buySell as an enumeration.

type alias Trade =
    { productID : String
    , quantity : Int
    , buySell : BuySell -- changed from String
    , comment : Maybe String
    }

-- this type captures the fact that there are only 2 valid values here
type BuySell 
    = Buy 
    | Sell

Elm allows you to define types as a choice between different values. The simplest way to use these is to define enums. The definition above is pretty self explanatory: the type BuySell can either be a Buy or a Sell.

The type in the DDL is less accurate because it allows any single character strings as values. This makes it easy to overload the field which tends to happen in systems a lot. This definition makes it impossible to overload the value while also makes it easy to add new values if needed. The big difference is that you are forced to at least document the valid values.

An enum like this can be mapped to the DDL in many ways depending on the database product or team best-practices. The bottom line is that when you are modeling your data you don’t need to deal with that. You can leave that to the physical mapping layer.

The other thing we lost here is the lengths of some of the remaining columns. We can easily address that too:

type alias Trade =
    { productID : Cusip -- changed from String
    , quantity : Int
    , buySell : BuySell
    , comment : Maybe Comment -- changed from Maybe String
    }


type BuySell 
    = Buy 
    | Sell


type Cusip = 
    Cusip String

cusip =
    String.ofLength 9 Cusip


type Comment = 
    Comment String

comment =
    String.ofMaxLength 100 Comment

Branching out

Coming soon

More relational operators

Coming soon