December 10, 2018

5333 words 26 mins read

A touch of Haskell for the curious

Hello!, today we will write about things a newcomer to Haskell would need to ease the journey into the pure world, as for my self I could have certainly avoided the struggle in the paradigm shift if I were given something like this. I honestly hope this helps the typical oop/imperative oriented developer coming from C#, Python, Ruby, PHP, Java, JavaScript to Haskell.

WARNING!

If you are just about to begin the journey into Haskell, or you are not too deep in the woods to have noticed yet, let me warn you one little thing about Haskell and the FP world behind it… it’s a trap!.

Honestly, no one told me this, I just wanted to become good at Scala and what better introduction to functional programming that start with Haskell and then transition into Scala, then it would be a piece of cake since you will grasp the functional thinking with a fp-only language first. Up to this day I don’t know if this was a curse or a gift, I like to think is the latter, it must be, right?.

When I realized, it was too late, it had absorbed me, soon I began to read about similar stories, people wanting to try Haskell just because they wanted to prove other colleges their assumptions, something along the lines of “Haskell is too difficult and doesn’t bring anything new to the table”, or “why Haskell? JavaScript is by far the most widely adopted language today and it’s because simply put, it is the best.", they too are now trapped.

As Aditya Siram (in his talk “A (Not So Gentle) Introduction To Systems Programming In ATS”) put it:

People keep saying that learning programming languages makes you a better programmer, it really doesn’t, it makes you a better programmer up to a point… then it makes you bitter and dissatisfied…

That’s so true! πŸ˜‚ …

because you’ll never be able port those ideas to your day job, even if you could get past of the technical hurdle of your programming language not supporting those constructs you’ll never get there.

😧 -> 😟 -> πŸ˜• -> πŸ˜” -> 😫… Sad but kind of true, if you recognize the value that lies within functional programming you’ll see that in most organizations you simply can not use it (practicality, team, management, etc), however if you are reading this, it must mean you are not a “9 to 5”-er, you care deeply about all aspects of software development and as such it is a moral duty to spread the word, so that some day interstellar infrastructure is well constructed and powered by using mathematical principles not only at the physical/hardware level, but at the software level too (I’m being sarcastic btw).

With that, let’s begin.

The basics

Usually when learning other programming languages we tend to “map” what we already know about our language to the new language we’re learning, some of us even try at any point to defend our beliefs internalizing expressions like “X can also do this”, “This is easier to do in X”, I suggest avoid doing this, not because it might be true or false, but because once you grok Haskell, it will simply not make sense, we are at different leagues here, forget all you know about programming, we’re not in Kansas anymore (and I’m in Mexico).

Think everything is an immutable expression, once you state something is X then that identifier is X for ever, and that is an expression written in stone… in code, I mean. There are different types of expressions and expressions have types. Let’s begin with something easy:

awesome = True

Ahh, a pure expression 😌, awesome will always be True it doesn’t matter what you think, it simply is. From this moment awesome can’t be other thing:

awesome = "great"  -- Errr, will not compile, 'awesome' is already defined
awesome = False    -- Errr, will not compile, 'awesome' is already defined

Not only we can’t assign values of other type, but also, we can’t give it a value of the same type, is just immutable. awesome is a Bool, the compiler knows this because both True and False are values of this type. If we want to be clearer about our intent or state our assumptions about the type of an expression, we can use type annotations when defining an expression like this:

great :: Bool
great = True

We can read :: as “is a” or “has the type”, so here “great is a Bool”. By adding type annotations to definitions we can’t accidentally assign values of other types to an expression:

canNot :: Int
canNot = "hello"   -- Errr, will not compile, types don't match

Yes, this it may seem to basic, but when we’re dealing with more complex expressions this is awesome.

Functions

Now let’s write a function scream that takes no arguments and gives a good old terror movie scream back:

scream = "Aaaah!"

You see? values and functions without arguments are the same thing! we might just tell everything is a function! More over we don’t need to add an explicit return statement, the last expression is returned (“as in Ruby” I hear you scream with desperation 😏). Let’s define a function that takes a name to salute:

-- Define the function:
hello name = "Hello " ++ name

-- Use the function
hello "Joe"        -- Outputs: "Hello Joe"

The hello function takes one argument name, then it just appends that name to our "Hello " message, ++ is the concatenation operator, but is not an operator but a function!. To use the hello function we just specify the function name and the arguments will follow, no function-call-parenthesis required.

Functions defined with symbols like the ++ function are known as infix functions, because they are used in infix position, that is in between values, like mathematical operators. hello is not infix because its name is not comprised of symbols.

What happens if why change the type of the parameter:

hello True   -- Errr, will not compile, types don't match

Our hello function expects a String and only a String, why is that? Here the compiler infers the type of our function by inspecting what it does, or more precisely, by inspecting the types of the expressions that form our hello expression (because everything, even functions are expressions). Here the compiler sees we use the ++ function, and it so happens that the type of this function looks like this:

(++) :: String -> String -> String

Note: For our pedantic friend out there, I know this is not the type of the ++ function, but for simplicity we’ll assume it is.

If the type annotation contains any arrows (->) we can easily tell this is a function taking some arguments. The -> in the type definition is how type arguments of the function are separated in the type annotation, the last -> indicates the return type of the whole function. So we can say /"++ is an infix function that takes a String and another String to result in a final String.

We can see in our hello function definition, that we are giving "Hello " and name as the parameters to this function, and the final return value is then a value of type String, and thus our hello function has the following inferred type:

hello :: String -> String

That’s why when we pass anything to our function but a String we get an error back, because types do not match.

Expressions are not statements

This means we can’t write more than one expression in a function, and this may have the effect that we can’t create a sequence of instructions in a function, for example, this will not work at all:

myAgeNextYear :: Int -> Int -> Int
myAgeNextYear birthYear currentYear =
  nextYear = currentYear + 1   -- Attempt to declare a 'variable'
  nextYear - birthYear         -- Implicitly return the difference

That is not valid Haskell code, it will not compile, because the concept of “sequence” as in statements do not exists, but it can be “emulated” if one can says so with the let ... in ... expression:

myAgeNextYear :: Int -> Int -> Int
myAgeNextYear birthYear currentYear =
  let
    nextYear = currentYear + 1   -- Add a local definition
  in
    nextYear - birthYear         -- Implicitly return the difference

Note this pattern can be abused to “sequence” expressions, please don’t do that, that’s nasty πŸ‘Ž

Now, this will compile with no issues, but take a closer look, there are no statements here and no mutation, everything is an expression, including the let ... in ..., this means we can assign “its result” as a value:

message :: String
message = let word = "World" in "Hello " ++ word

And this can also be expressed using where as follows:

message' :: String
message' = "Hello " ++ word
  where word = "World"

The tick mark ' is just part of the function name, it doesn’t mean anything special here (in other contexts it could have meaning but that’s more advanced and not relevant for this introduction), other than that both message and message' work in the same way.

Currying and Function composition

Let’s say we have defined the following functions:

add a b = a + b

mul a b = a * b

square a = a * a

As this functions use arithmetic functions in their definition, it means these functions will work on numbers, let’s assume add and mul both have the type signature Int -> Int -> Int (although this is exactly not true it will make the examples easier), meanwhile the square function will have the type Int -> Int because it takes a single Int value and yields another one back. From now on, I’ll include type annotations just to be clear, so let’s add the type signatures to the functions:

add :: Int -> Int -> Int
add a b = a + b

mul :: Int -> Int -> Int
mul a b = a * b

square :: Int -> Int
square a = a * a

Now, on to currying.

Currying

Even though it looks that both add and mul take two parameters in fact they just take a single argument and both return a function. We can group the type annotation with parenthesis to actually see what’s happening:

add :: Int -> (Int -> Int)
add a b = a + b

mul :: Int -> (Int -> Int)
mul a b = a * b

This means that, add takes a value of type Int and yields a function that takes an Int value to yield another Int value back. What this means is that we can do partial function application and save the resulting function to another value:

add5 :: Int -> Int
add5 = add 5

-- Using 'add5':
add5 10  -- Outputs: 15

It can be done in JavaScript…” I hear you mumble. The point I’m making is that we can pass functions as values at any time, they are just expressions, and expressions can compose with other expressions.

Function composition

Now if we want to create a function that adds two numbers, then multiplies the result by 5, and finally squares the result we can do this:

compute :: Int -> Int -> Int
compute x y = square (mul 5 (add x y))

compute 3 4 -- Outputs: 1225

Note that, the parenthesis are not due to function calls as in many other programming languages, here they’re used just to group expressions, this is needed because in Haskell function application has the higher precedence and thus without the parenthesis we will be saying that the square function takes multiple parameters (the mul function being one of them) and this is not true. In Haskell we can use the $ function make it look cleaner:

compute :: Int -> Int -> Int
compute x y = square $ mul 5 $ add x y

It means the same thing, first apply add function, then mul 5 and finally square. Or read otherwise, apply square to everything to the right of $, for this we will first need to apply mul 5 to everything to the right of the second $.

You’ll see this starts to look like a pipeline with water flowing from right to left, but Haskellers prefer function composition, using the . function composes functions:

compute :: Int -> Int -> Int
compute x y = square . mul 5 $ add x y

But see where partially applying mul so we get a composable function, that means we can do the same with add and compose that one too:

compute :: Int -> Int -> Int
compute x y = square . mul 5 . add x $ y

The . is just a function that effectively combines functions, this means that on the fly we define a single function that is the same as applying square to the result of applying mul 5 to the result of the add x function. As this is just a function without a name (is being composed on the fly) we use the $ operator, to apply the function on the left to everything on the right.

This means the same thing:

compute :: Int -> Int -> Int
compute x y = (square . mul 5 . add x) y

Then, see how on both sides of the = sign we have y values? you can think of it as y is multiplying at both sides of the equation and just as in math equations, we can drop them off because the equality will hold, so this is the same function:

compute :: Int -> Int -> Int
compute x = square . mul 5 . add x

Reducing expressions in this way when there are implicit parameters being passed around is called point free style.

One thing it may trick you on functions defined as the composition of other functions (such as compute) is that computing steps are written backwards, other languages such as Elm or Elixir have a pipe operator that allows basically the same but with a sequential order, doing that is not well seen in Haskell as “it’s not the mathematical way” but if you’re stuborn you can achieve the same thing with the & function, is more like $ but reversed:

compute :: Int -> Int -> Int
compute x y = add x y & mul 5 & square

Note: This & function needs to be imported explicitly from Data.Function module.

This does not compose the functions but it acts as a pipeline, the result of the add function is then passed as the input to the mul function and its result then goes to the square function as its input. We can write it in multiple lines as well:

compute :: Int -> Int -> Int
compute x y
  = add x y
  & mul 5
  & square

But as I mentioned, this is not “the Haskell way” of doing things and one should try to look for composition when possible. But how can we know when it’s possible to compose functions? Haskell makes this very easy thanks to the type system, that is, by adding type annotations to our expressions we can clearly see the edges of our Lego pieces and see how they can fit one with another, effectively composing them. Take a look to our functions above, they all take and produce Ints, here it’s easy to see that what one function produces and what other needs are compatible types, so these Lego pieces will fit perfectly. When writing more complex expressions and abstract expressions in Haskell the same idea will remain, but unless you don’t learn how to use the type system to your favor, you won’t be able to see this, and you’ll always feel restricted when using Haskell and finally quit without knowing why others share so much respect and excitement about this language.

Lambdas

This is a good point to introduce lambdas which are function literals, the syntax to define one is \ followed by named arguments and -> to signal the end of parameter list and beginning of function body. The following compute example is 100% equivalent to this one using a lambda:

compute :: Int -> Int -> Int
compute = \x -> square . mul 5 . add x

Notice that since we’re binding compute to a lambda we no longer need to capture the x parameter to the left side of the =, but instead the lambda captures the argument that is needed for the add function, if we think about it, the lambda exists just to fulfill that purpose, so it means we can do this instead:

compute :: Int -> Int -> Int
compute = square . mul 5 . (\x -> add x)

Lambdas are just functions, and we can compose them as such.

Data types

In Haskell we have “primitive” data types, that is, data types that are very common in software, for example Int that represents a bounded integer, whereas Integer is unbounded and can represent any number no matter how big it is (up to your memory capacity limit). Float for floats and Double for double precision floating point number. There are more types used to describe numbers, I suggest checking out this gist about how numbers work in Haskell.

Other types you’re already familiar with is the Bool type which its only values can be either True or False. The [] (list) type to hold several values of any other type, this is known as a parameterized type or more formally a type constructor, that is, a type expression that needs other type to be provided to serve as a type.

Lastly a good type to know is the Maybe type, this is very similar to a list, whereas the list can hold no value (empty list), a single value (single element list) or multiple values of a type, the Maybe type can either have no value (in which case the value will be Nothing) or it has a single value (in which the value is Just value). For example we can define a function safeDiv that will not fail even if we divide by zero, in which case the value to be returned will be Nothing:

safeDiv :: Double -> Double -> Maybe Double
safeDiv a b =
  if b == 0 then
    Nothing
  else
    Just (a / b)

And by the way, that’s how if, then, else works in Haskell, but it isn’t used as much, we can achieve the same using guards:

safeDiv :: Double -> Double -> Maybe Double
safeDiv a b
  | b == 0    = Nothing
  | otherwise = Just (a / b)

By using guards notation we list out conditions after each | the first one to evaluate to True will be executed, that last one otherwise is just an alias to True, so we could put True directly instead and would achieve the same thing.

Custom data types

We can’t define classes or objects in functional programming as in object oriented programming, but we can define our own data types, this is a different way to model structures in our computer programs and in my opinion is a simpler yet more expressive approach. For example, if your application handle user’s email accounts its common to have the Email type defined, this can be done like this:

newtype Email = Email String

With this we can construct an email like this:

myEmail = Email "[email protected]"

The compiler will see the type of myEmail to be Email and we can define functions that expect values of this type to provided to them, then our code will only compile if these functions receive Emails and not any other type.

Of course, anyone could create an Email out of gibberish and it will be valid!:

gibberishEmail = Email "what's up?"

In order to avoid use we can define a function String -> Maybe Email, if the String is a valid email then we can give a Just email value back, otherwise Nothing will be returned. Then our module can hide (or make private) the Email value constructor so no one can use it directly, this will turn impossible to construct invalid emails.

But how to access the inner String in case I need it? Let’s suppose we just want to check weather an Email is on Gmail like this:

isOnGmail $ Email "[email protected]"  -- Outputs: False
isOnGmail $ Email "[email protected]"  -- Outputs: True

So isOnGmail is a function that should be able to receive an Email value but also check the suffix of the email String, and we already know we can’t treat an Email as a String because those are different types now. For that we can use pattern matching:

isOnGmail :: Email -> Bool
isOnGmail (Email address) = "gmail.com" `isSuffixOf` address

Note to the left of the = we can use the data constructor Email as if we were constructing one but here it means to deconstruct because it is a pattern match, the value of inner String will be bound to the address identifier. Also here we use the isSuffixOf function (to use it we will need to import it from Data.List module) in a prefix style, by surrounding the function name with ` we turn a non-infix function into a infix function. So previously when calling our add 4 3 function we could as well call it like 4 `add` 3 and it means the same thing. So this is equivalent:

isOnGmail :: Email -> Bool
isOnGmail (Email address) = isSuffixOf "gmail.com" address

If we can turn a non-infix function into a infix function, could we turn a infix function into non-infix? Yes we can, recall how add is defined:

add :: Int -> Int -> Int
add a b = a + b

The + there is an infix function, we can surround it with parenthesis ( ) to make it into a prefix (non-infix) function:

add :: Int -> Int -> Int
add a b = (+) a b

And we can apply that old trick, b is at both sides of the equation:

add :: Int -> Int -> Int
add a = (+) a

And a is too at both sides of the equation:

add :: Int -> Int -> Int
add = (+)

And now we see clearly add and + are the same function, they’re equal (ignoring sameness and equality distinctions).

Type aliases

We can also alias some types, that is give it several names to the same type, for example in Haskell a String is just an alias to a list of characters [Char] and is defined like this:

type String = [Char]

Or suppose that in your sales application want to track order numbers as if it was just a number:

type OrderId = Int

Unlike newtype, this would make Haskell accept an OrderId whenever an Int is expected and vice versa, so it wouldn’t warn you if you try to add up order numbers.

Algebraic data types

There’s a class of more interesting types that can be defined, these are types with more than one value or constructor, we’ve already met Bool, an algebraic data type roughly defined like this:

data Bool = True | False

This are called sum types and it means a value of type Bool can either be True or False but not both at the same time. In contrast to product types whose values can include multiple values at once, for example a Person has a Name and an Age:

type Name = String
type Age = Int

data Person = Person Name Age

The Person to the left side of the = is the type we’re creating, and the Person to the right-hand side is the name of the data constructor to create a value of such type, they don’t need to be called the same but is common practice to use the same name for both the type and data constructor. So this is how we create a value of type Person:

jane :: Person
jane = Person "Jane Doe" 21

Now you could create a function to check if Jane is allowed to drink.

Recall Maybe? we said is like a list that could hold one value or nothing at all inside it, when a type can hold a value of other type and we want to make this inner type “configurable” then we have an algebraic parameterized data type:

data Maybe a = Nothing | Just a

“Generics!" you may say. Yes and we can also define recursive algebraic parameterized recursive data types, as List:

data List a = Nil | Cons a (List a)

That is, a linked List can be either empty (Nil) or be a Cons that holds an element a of the list and references another List. We could create a list of lucky numbers like so:

luckyNumbers :: List Int
luckyNumbers = Cons 7 (Cons 21 (Cons 99 Nil))

Or we could also use $ instead:

luckyNumbers :: List Int
luckyNumbers = Cons 7 $ Cons 21 $ Cons 99 Nil

In Haskell lists are denoted with [] as already noted, so a more truthful definition for the list type in Haskell will be:

data [] a = [] | a : [a]

This is harder to read because it uses symbols as both the type and constructor names but overall is the same structure, Cons is : (as is a symbol this is infix) and Nil is [], so a Haskell list can be created like this:

luckyNumbers :: [] Int
luckyNumbers = 7 : 21 : 99 : []

Thanks to the associativity defined for these data constructor we don’t need to use parenthesis here, but will still to be ugly to write like that, so Haskell gives some syntax sugar for lists:

luckyNumbers :: [Int]
luckyNumbers = [7, 21, 99]

Now, product types can also be polymorphic and take more than one type variable (sum types can also take multiple type variables):

data Tuple a b = Tuble a b

So our Tuple holds values of different types:

tupleOne :: Tuple Int String
tupleOne = Tuple 3 "hello"

tupleTwo :: Tuple Bool Float
tupleTwo = Tuple False 3.14

In reality Haskell already provides tuples, these are denoted with parenthesis:

tupleOne :: (Int, String)
tupleOne = (3, "hello")

tupleTwo :: (Bool, Float)
tupleTwo = (False, 3.14)

Type classes

Type classes are a mechanism to decouple behavior from data, as you may have noticed when defining our own data types such as Person we didn’t define any behavior for values of such types, they just “are”, pure data. However under some circumstances it may be desirable to associate some behavior to values of some type, as an example let’s define both a circle and a rectangle:

data Circle = Circle { radius :: Double }

data Rectangle = Rectangle
  { base   :: Double
  , height :: Double
  }

Here we see a different syntax for data definitions, we can give names to the components of the data type, we only need to surround the declarations within braces { .. }, other than that it should look pretty straight forward, Circle has a radius of type Double, whereas Rectangle as a base and a height, both of type Double, these are called attribute fields or just fields.

Moving on, this is just data without any behavior associated to it, but we know they have something in common, both are shapes and we can compute the area on them, we can use type classes to express this behavior, this is how a type class is defined:

class Shape a where
  area :: a -> Double

What this means is that we want to introduce some behavior Shape for any value of type a, values for whom this behavior is defined we expect to be able to call the area function on them to get their area computed. At this point we have behavior in the type class and data in our data types, but they’re not linked, we do that by instantiating the type class on the data types we want:

instance Shape Circle where
  area (Circle r) = pi * r ^ 2

instance Shape Rectangle where
  area rectangle = height rectangle * base rectangle

We have to implement the behaviors dictated by the Shape type class, that is area in this case. Now, this may look like interfaces from OOP but with a subtle difference, the type knows nothing about the behavior associated to it, nowhere in the definition of Circle or Rectangle we mentioned they had to had this behavior, these are totally decoupled from each other and thus a behavior can be introduced at a later point or in a different place altogether.

Also notice, we accessed the inner data for Rectangle in a different way, for Circle we used pattern matching, whereas for Rectangle we used the same attribute fields we used when defined the Rectangle data type, this means that attribute fields are just functions that work on a value of the same type they’re defined on, that is height :: Rectangle -> Double and base :: Rectangle -> Double.

Let’s close this section by creating a function that just tells weather a shape is big or small:

isBigShape :: a -> Bool
isBigShape shape = area shape > 100

This will return true if the area of the given shape is greater than 100, however we’re missing something, the type a we receive could be anything, we know nothing about it, maybe is a number, maybe is a String!, we want isBigShape to work on member of the Shape type class, and for that we specify a constraint on the function type:

isBigShape :: Shape a => a -> Bool
isBigShape shape = area shape > 100

Now it’s all good, isBigShape is a function defined for any a, given that a is a member of the Shape type class:

isBigShape (Circle 20)  -- Outputs: True
isBigShape (Rectangle { height = 5, base = 10 })  -- Outputs: False
isBigShape "Hola"  -- Can not do it! 'String' type is not member of 'Shape'!

This last function, also let’s see something else about Haskell, we can have type variables in function signatures, so a function may work with values of any type, and when they do, it means they’re just working in some higher level structure, for example, recall our friends $ and . we used before? this is how they’re defined:

($) :: (a -> b) -> a -> b
($) f x = f x

(.) :: (b -> c) -> (a -> b) -> a -> c
(.) f g x = f (g x)

Because the ability to call a function on a value does not depend on the type of the value it means that $ will have to work with values of any type, like wise, composition function should be able to compose functions no matter their types, so these are polymorphic functions because they can work on values of any type, not just Int or just String, but any type.

Conclusion

Maybe this post was overwhelming, or maybe not, but you can’t deny it is long.

Here I wanted to provide a general overview about the basics of Haskell syntax and semantics for already software developers using other mainstream languages. However, you can see I did not write about how to get started with Haskell, how to compile a program, what should we install, where to search documentation and packages, or in generally how the ecosystem is set up. We will see that later because this is already too loooong (many o’s there), and this was just a touch of Haskell.