[Sources: 1]
Below are Haskell's primitive data types. These cannot be defined in Haskell itself and must, therefore, be provided by the implementation.
Name | Description |
---|---|
Char |
An enumeration whose values represent Unicode characters |
Integer |
Arbitrary-precision integer |
Int |
Fixed-precision integer, typically 64-bit signed integer on 64-bit platforms |
Float |
Single-precision floating-point real number |
Double |
Double-precision floating-point real number |
Additional user-defined types can be defined as composites of other types using Haskell's support for algebraic data types.
Consider the following data
declaration:
data Colour = Red | Green | Blue
Colour
is the type constructor and appears on the left-hand side of the =
sign. This type has three data constructors, named Red
, Green
and Blue
which appear on the right-hand side of the =
sign. You use type constructors where you'd expect a type (e.g. in type annotations) and data constructors where you'd expect a value. Colour
is a type while Red
, Green
and Blue
contain values of type Colour
. This is known as a sum type.
Let's try a more interesting version of this type:
data Colour = RGB Int Int Int
Colour
is still a type. However, RGB
is not a value. It is a function taking three Int
s and returning a value of type Colour
:
Input | Output | Comment |
---|---|---|
λ> data Colour = RGB Int Int Int |
Defines type Colour |
|
λ> :info Colour or λ> :i Colour |
data Colour = RGB Int Int Int -- Defined at <interactive>:2:1 |
Shows information about type Colour |
λ> :t RGB |
RGB :: Int -> Int -> Int -> Colour |
Shows type of RGB |
λ> let c = RGB 10 20 30 |
Assigns name c to a Colour value |
|
λ> :t c |
c :: Colour |
Shows type of c |
λ> c |
<interactive>:13:1: No instance for (Show Colour) arising from a use of `print' In a stmt of an interactive GHCi command: print it |
What? |
λ> data Colour = RGB Int Int Int deriving Show λ> let c = RGB 10 20 30 λ> c |
RGB 10 20 30 |
That's better! |
This version of Colour
is a product type. It is isometric to a 3-tuple, or triple, of Int
s.
[Sources: 1]
There is another way of defining a product type with the added convenience of automatically-generated accessor functions. This is Haskell's record syntax:
data Colour = RGB { red :: Int, green :: Int, blue :: Int }
Just like the previous product type definition of Colour
, this definition consists of a triple of Int
s. Similarly, RGB
is a data constructor of type Int -> Int -> Int -> Colour
. However, this definition also names the three components and generates accessor functions for them, as you can convince yourself by defining Colour
in GHCi and using :t
on red
, green
, blue
. Each has type Colour -> Int
: i.e. each is a function taking a Colour
and returning an Int
.
Similarly, we might define a "pair" type as follows:
Input | Output |
---|---|
λ> data Pair a b = P { first :: a, second :: b } |
|
λ> :t P |
P :: a -> b -> Pair a b |
λ> :t first |
first :: Pair a b -> a |
λ> :t second |
second :: Pair a b -> b |
λ> indianaPi = P "pi" (4 / 1.25) |
|
λ> first indianaPi |
"pi" |
λ> :t first indianaPi |
first indianaPi :: [Char] |
λ> second indianaPi |
3.2 |
λ> :t second indianaPi |
second indianaPi :: Fractional b => b |
This is our first experience of parametric polymorphism. In this case, a
and b
are type arguments/variables and define the polymorphic type Pair a b
. Pair
is the type constructor in two type arguments, so-called because each distinct pair of concrete types substituted for a
and b
construct a new type. As with previous data
definitions, P
is a data constructor and construct specific values belonging to a concrete instantiation of Pair a b
.
The type of first indianaPi
is is also our first experience of a list in Haskell which we'll discuss in more detail soon and are ubiquitous in Haskell.
We now have enough knowledge to look into some other predefined Haskell types and polymorphic types that you'll run into a lot. These types are algebraic data types and can, therefore, be expressed using data
definitions and derived from Haskell's primitive types. The "definition" is some cases is really an instructive approximation of the real definition and elide detail that is superfluous for our immediate purposes:
Name | Definition | Description |
---|---|---|
Bool |
data Bool = True | False |
A nullary type constructor with two nullary data constructors True and False representing Boolean truth values |
String |
type String = [Char] |
A list of Unicode characters |
Ratio a |
n/a | A rational number represented as a pair of primitive numeric values |
Complex a |
n/a | A complex floating-point number represented as a pair of primitive numeric values |
[a] |
data [a] = [] | a : [a] |
A homogeneous singly-linked list represented as a pair of head element and tail list |
(a, b) , (a, b, c) etc. |
data (a, b) = (a, b) etc. |
A heterogeneous \(n\)-tuple |
() ("unit") |
data () = () |
A type with the single nullary member () |
We'll describe type
definitions in more detail later, but suffice it to say that type
defines simple type aliases for other types.
Given our previous definition of Colour
as a product type without record syntax, how do we extract the component values? This is where "pattern matching" comes in. Pattern matching is a mechanism for deconstructing Haskell values, so-called because the patterns mimic the data constructor invocation used to construct the value initially. Consequently, the runtime representation of values of product types retain sufficient information to allow code to determine how a value was constructed at runtime.
There are two-and-a-half distinct places where you'll see pattern matching:
case
expressions, used to deconstruct arbitrary valuesThis is equivalent to the code that the Haskell compiler generates for record accessor functions described previously. Here's an example using our trusty Colour
data type:
data Colour = RGB Int Int Int
red :: Colour -> Int
red (RGB r _ _) = r
green :: Colour -> Int
green (RGB _ g _) = g
blue :: Colour -> Int
blue (RGB _ _ b) = b
_
("unknown") is a "throwaway" name: it matches a value but does not assign a name to it which is useful when we don't care about that specific valueRGB r _ _
matches the value of type Colour
on its RGB
data constructor of type Int -> Int -> Int -> Colour
, matching r
to the first value of the triple and ignoring the second and third valuescase
An alternative implementation can make use of a case
expression:
data Colour =
RGB Int Int Int |
CMYK Float Float Float Float
colourModelV1 :: Colour -> String
colourModelV1 (RGB _ _ _) = "RGB"
colourModelV1 (CMYK _ _ _ _) = "CMYK"
colourModelV2 :: Colour -> String
colourModelV2 c =
case c of RGB _ _ _ -> "RGB"
CMYK _ _ _ _ -> "CMYK"
main :: IO ()
main =
let c1 = CMYK 0.5 0.5 0.5 0.0
cm1 = colourModelV1 c1
c2 = RGB 50 100 150
cm2 = colourModelV2 c2
in putStrLn ("cm1=" ++ cm1 ++ ", cm2=" ++ cm2)
This will yield cm1=CMYK, cm2=RGB
.
Patterns can be matched both in function argument position and in case
expressions. Here's a contrived example using our RGB
/CMYK
definition of Colour
:
data Point = Point { x :: Int, y :: Int }
data Line = Line
{ start :: Point
, end :: Point
, thickness :: Int
, colour :: Colour
}
lineRedness :: Line -> Int
lineRedness (Line _ _ _ (RGB r _ _)) = r
main :: IO ()
main =
let l = Line
(Point 10 10)
(Point 50 50)
1
(RGB 255 0 0)
in print (lineRedness l)
This will output 255
.
This example illustrates another important aspect of pattern matching, namely exhaustiveness. Let's change our main
function to the following to illustrate this:
main :: IO ()
main =
let l = Line
(Point 10 10)
(Point 50 50)
1
(CMYK 0.5 0.5 0.5 0.0)
in print (lineRedness l)
Let's compile and run this. Here's the output:
scratch: src/Main.hs:15:1-40: Non-exhaustive patterns in function lineRedness
Well, that's interesting but makes sense. The pattern in the definition of lineRedness
cannot match the value CMYK 0.5 0.5 0.5 0.0
since it was not constructed using the RGB
data constructor.
This class of issue can be addressed in one of several ways:
CMYK
data constructor: this would require a valid conversion from the CMYK
representation of a colour to RGB
in order to provide a red component_
to match any Colour
value after the RGB
match: this would require the existence of some "default" notion of line rednessThe option you will choose will depend primarily on the semantics of the function. To make this determination, you'd need to ask questions such as:
I'll illustrate these three alternatives to the existing behaviour. Note that there are also other ways to provide different runtime error messages through Haskell's various exception mechanisms etc. We will discuss these more when we get to I/O.
Here we pattern-match on CMYK
and apply a formula:
lineRedness :: Line -> Int
lineRedness (Line _ _ _ (RGB r _ _)) = r
lineRedness (Line _ _ _ (CMYK c _ _ _)) = round ((1.0 - c) * 255.0)
Look sideways and ignore the fact that we're ignoring the K-component!
Here we provide a catch-all or fall-through case:
defaultRed :: Int
defaultRed = 0
lineRedness :: Line -> Int
lineRedness (Line _ _ _ (RGB r _ _)) = r
lineRedness _ = defaultRed
Here we augment the return type:
lineRedness :: Line -> Maybe Int
lineRedness (Line _ _ _ (RGB r _ _)) = Just r
lineRedness _ = Nothing
We'll definitely talk more about Maybe
later. Suffice it to say for now that it's a type constructor taking one type variable and is analogous to option types and nullable types in other languages.
Something about this may irk you. Given that the compiler has full information about a given type's data constructor (at least in code defined in the same module as the type), surely it must be possible to detect when a pattern match misses one or more cases at compile time instead of at runtime. In fact, you should be demanding an explanation of this, given Haskell's much-vaunted strong static type system and supposed type safety.
Well, it turns out that this can be detected at compile time by enabling the incomplete-patterns
warning. From GHCi, this can be done using the :set
command:
λ> :set -Wincomplete-patterns
λ> data Colour = Red | Green | Blue
λ> render :: Colour -> String; render Red = "red"
<interactive>:5:29: warning: [-Wincomplete-patterns]
Pattern match(es) are non-exhaustive
In an equation for ‘render’:
Patterns not matched:
Green
Blue
When compiling source files, the same can be achieved by passing -fwarn-incomplete-patterns
on the GHC command line, setting ghc-options
in your project's .cabal file or by inserting a "pragma" into the top of a source file as follows:
{-# OPTIONS_GHC -fwarn-incomplete-patterns #-}
data Colour = Red | Green | Blue
render :: Colour -> String
render Red = "red"
main :: IO ()
main = putStrLn (render Green)
However, this is still just a warning: therefore, the build still succeeds and the program still runs and fails at runtime. Therefore, we need to promote warnings to errors.
{-# OPTIONS_GHC -fwarn-incomplete-patterns -Werror #-}
data Colour = Red | Green | Blue
render :: Colour -> String
render Red = "red"
main :: IO ()
main = putStrLn (render Green)
This will generate a similar warning as above and will abort the build.
So, why? The general consensus, among GHC developers at least, seems to be that the exhaustiveness checker "is not very good": it tends to report false positives and counterintuitive error messages given in terms of desugared forms instead of the original source code—we'll look at desugaring a little more later.
Personally, I think that this is a bit of a wart on Haskell in principle, but—in practice—I've never run into a false positive and so would recommend enabling the warning and promoting warnings to errors.
Haskell provides type
definitions used to create aliases, or alternative names, for existing types:
type Ordinate = Int
data Point = Point Ordinate Ordinate
doubleInt :: Int -> Int
doubleInt x = x * 2
doubleX :: Point -> Int
doubleX (Point x _) = doubleInt x
main :: IO ()
main = print (doubleX (Point 100 200))
You'll see type
used a lot in Haskell code even though it doesn't give you any type safety—as shown in the sample above, Ordinate
is completely indistinguishable from a regular Int
.
newtype
newtype
, on the other hand, is an altogether different beast:
newtype Ordinate = Ordinate { unOrdinate :: Int }
data Point = Point Ordinate Ordinate
doubleInt :: Int -> Int
doubleInt x = x * 2
doubleX :: Point -> Int
doubleX (Point x _) = doubleInt x
main :: IO ()
main = print (doubleX (Point 100 200))
Attempting to compile this example will result in the following:
Main.hs:9:33: error:
• Couldn't match expected type ‘Int’ with actual type ‘Ordinate’
• In the first argument of ‘doubleInt’, namely ‘x’
In the expression: doubleInt x
In an equation for ‘doubleX’: doubleX (Point x _) = doubleInt x
newtype
defines a distinct new type whose internal representation is equivalent to a base type. In this case the new type is Ordinate
and the underlying (base) type is Int
. Syntactically, a newtype
definition is closer to a data
definition—with exactly one data constructor and exactly one field inside it, which can be specified in normal or record syntax. Furthermore, like data
and unlike type
, the resulting type is distinct from, and not directly compatible with, the original type. In order to pass an Ordinate
into a function expecting an Int
, as in this example, one must first unwrap the field either using pattern matching or, in the case of record-style syntax, using the accessor function:
newtype Ordinate = Ordinate { unOrdinate :: Int }
data Point = Point { x :: Ordinate, y :: Ordinate }
translateBy :: Int -> Int -> Int
translateBy o ord = ord + o
main :: IO ()
main =
let p = Point (Ordinate 10) (Ordinate 20)
translate = translateBy 10
translatedP = Point (Ordinate (translate (unOrdinate (x p)))) (Ordinate (translate (unOrdinate (y p))))
translatedX = unOrdinate (x translatedP)
translatedY = unOrdinate (y translatedP)
in putStrLn ("x=" ++ show translatedX ++ ", y=" ++ show translatedY)
We will return to this example shortly to explain some new things you may have spotted as well as some ways to make this less ugly while retaining type safety.
[Sources: 1]
This last example is ugly. Can we make it more pleasing to the eye?
Recall Ordinate (translate (unOrdinate someExpression))
. What we're really doing here is applying three functions in turn: we apply unOrdinate
to someExpression
, translate
to the result of that and Ordinate
to the result of that. This should leap out at you: it's simply the composition of three functions. This can, therefore, be rewritten as (Ordinate . translate . unOrdinate) someExpression
. This doesn't save many characters of typing, but it does reveal a nice structure to the code by emphasizing the "valueness" of functions, since Ordinate . translate . unOrdinate
is a value just like any other. This is core to a functional programming language.
Let's use $
and also observe that translateBy
is really just the "add this value to my argument function", which is the same as a partially applied +
operator. Here's our ugly Point
example rewritten to use .
and +
:
newtype Ordinate = Ordinate { unOrdinate :: Int }
data Point = Point { x :: Ordinate, y :: Ordinate }
main :: IO ()
main =
let p = Point (Ordinate 10) (Ordinate 20)
translateOrdinate = Ordinate . ((+) 10) . unOrdinate
translatedP = Point (translateOrdinate (x p)) (translateOrdinate (y p))
translatedX = unOrdinate $ x translatedP
translatedY = unOrdinate $ y translatedP
in putStrLn $ "x=" ++ show translatedX ++ ", y=" ++ show translatedY
Partial application of a binary operator is known as a left or right section depending on the order in which the operands are handled. Conceptually:
(2^)
(left section) is equivalent to \x -> 2 ^ x
(^2)
(right section) is equivalent to \x -> x ^ 2
Of course, there is more than one way to skin a cat, and this code can be elegantly reformulated using pattern matching:
newtype Ordinate = Ordinate { unOrdinate :: Int } deriving Show
data Point = Point { x :: Ordinate, y :: Ordinate } deriving Show
translatedPoint :: Int -> Int -> Point -> Point
translatedPoint xOffset yOffset (Point (Ordinate xValue) (Ordinate yValue))
= Point (Ordinate $ xValue + xOffset) (Ordinate $ yValue + yOffset)
main :: IO ()
main =
let p = Point (Ordinate 10) (Ordinate 20)
translatedP = translatedPoint 10 10 p
in print translatedP
This also takes advantage of automatic deriving to eliminate the calls to show
.