Normalization.tex

   1 \chapter[chap:normalization]{Normalization}
   2   % A helper to print a single example in half the page width. The example
   3   % text should be in a buffer whose name is given in an argument.
   4   %
   5   % The align=right option really does left-alignment, but without it the program
   6   % will end up on a single line. The strut=no option prevents a bunch of empty
   7   % space at the start of the frame.
   8   \define[1]\example{
   9     \framed[offset=1mm,align=right,strut=no,background=box,frame=off]{
  10       \setuptyping[option=LAM,style=sans,before=,after=,strip=auto]
  11       \typebuffer[#1]
  12       \setuptyping[option=none,style=\tttf,strip=auto]
  13     }
  14   }
  15
  16   \define[4]\transexample{
  17     \placeexample[here][ex:trans:#1]{#2}
  18     \startcombination[2*1]
  19       {\example{#3}}{Original program}
  20       {\example{#4}}{Transformed program}
  21     \stopcombination
  22   }
  23
  24   The first step in the core to \small{VHDL} translation process is normalization. We
  25   aim to bring the core description into a simpler form, which we can
  26   subsequently translate into \small{VHDL} easily. This normal form is needed because
  27   the full core language is more expressive than \small{VHDL} in some areas and because
  28   core can describe expressions that do not have a direct hardware
  29   interpretation.
  30
  31   \section{Normal form}
  32     The transformations described here have a well-defined goal: to bring the
  33     program into a well-defined form that is directly translatable to hardware,
  34     while fully preserving the semantics of the program. We refer to this form as
  35     the \emph{normal form} of the program. The formal definition of this normal
  36     form is quite simple:
  37
  38     \placedefinition{}{A program is in \emph{normal form} if none of the
  39     transformations from this chapter apply.}
  40
  41     Of course, this is an \quote{easy} definition of the normal form, since our
  42     program will end up in normal form automatically. The more interesting part is
  43     to see if this normal form actually has the properties we would like it to
  44     have.
  45
  46     But, before getting into more definitions and details about this normal form,
  47     let's try to get a feeling for it first. The easiest way to do this is by
  48     describing the things we do not want to have in a normal form.
  49
  50     \startitemize
  51       \item Any \emph{polymorphism} must be removed. When laying down hardware, we
  52       can't generate any signals that can have multiple types. All types must be
  53       completely known to generate hardware.
  54
  55       \item Any \emph{higher order} constructions must be removed. We can't
  56       generate a hardware signal that contains a function, so all values,
  57       arguments and return values used must be first order.
  58
  59       \item Any complex \emph{nested scopes} must be removed. In the \small{VHDL}
  60       description, every signal is in a single scope. Also, full expressions are
  61       not supported everywhere (in particular port maps can only map signal
  62       names and constants, not complete expressions). To make the \small{VHDL}
  63       generation easy, a separate binder must be bound to every application or
  64       other expression.
  65     \stopitemize
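
    To make the last two restrictions concrete, here is a small sketch of
    descriptions that are \emph{not} in normal form. The names \lam{twice} and
    \lam{mac} are made up for this illustration and are not used elsewhere.

    \startlambda
    -- Higher order: the argument f is a function, not a plain value
    twice = λf.λx. f (f x)

    -- Nested application: (*) a b has no binder of its own
    mac = λa.λb.λc. (+) ((*) a b) c
    \stoplambda

    In both cases an application is used directly as an argument. In
    \lam{twice}, the argument \lam{f} additionally has a function type, for
    which no hardware signal can be generated.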
  66
  67     \todo{Intermezzo: functions vs plain values}
  68
  69     A very simple example of a program in normal form is given in
  70     \in{example}[ex:MulSum]. As you can see, all arguments to the function (which
  71     will become input ports in the final hardware) are at the outer level.
  72     This means that the body of the inner lambda abstraction is never a
  73     function, but always a plain value.
  74
  75     As the body of the inner lambda abstraction, we see a single (recursive)
  76     let expression, that binds two variables (\lam{mul} and \lam{sum}). These
  77     variables will be signals in the final hardware, bound to the output ports
  78     of the \lam{*} and \lam{+} components.
  79
  80     The final line (the \quote{return value} of the function) selects the
  81     \lam{sum} signal to be the output port of the function. This \quote{return
  82     value} can only ever be a variable reference, never a more complex
  83     expression.
  84
  85     \todo{Add generated VHDL}
  86
  87     \startbuffer[MulSum]
  88     alu :: Word -> Word -> Word -> Word
  89     alu = λa.λb.λc.
  90         let
  91           mul = (*) a b
  92           sum = (+) mul c
  93         in
  94           sum
  95     \stopbuffer
  96
  97     \startuseMPgraphic{MulSum}
  98       save a, b, c, mul, add, sum;
  99
 100       % I/O ports
 101       newCircle.a(btex $a$ etex) "framed(false)";
 102       newCircle.b(btex $b$ etex) "framed(false)";
 103       newCircle.c(btex $c$ etex) "framed(false)";
 104       newCircle.sum(btex $res$ etex) "framed(false)";
 105
 106       % Components
 107       newCircle.mul(btex * etex);
 108       newCircle.add(btex + etex);
 109
 110       a.c      - b.c   = (0cm, 2cm);
 111       b.c      - c.c   = (0cm, 2cm);
 112       add.c            = c.c + (2cm, 0cm);
 113       mul.c            = midpoint(a.c, b.c) + (2cm, 0cm);
 114       sum.c            = add.c + (2cm, 0cm);
 115       c.c              = origin;
 116
 117       % Draw objects and lines
 118       drawObj(a, b, c, mul, add, sum);
 119
 120       ncarc(a)(mul) "arcangle(15)";
 121       ncarc(b)(mul) "arcangle(-15)";
 122       ncline(c)(add);
 123       ncline(mul)(add);
 124       ncline(add)(sum);
 125     \stopuseMPgraphic
 126
 127     \placeexample[here][ex:MulSum]{Simple architecture consisting of a
 128     multiplier and an adder.}
 129       \startcombination[2*1]
 130         {\typebufferlam{MulSum}}{Core description in normal form.}
 131         {\boxedgraphic{MulSum}}{The architecture described by the normal form.}
 132       \stopcombination
 133
 134     The previous example described composing an architecture by calling other
 135     functions (operators), resulting in a simple architecture with components and
 136     connections. There is of course also some mechanism for choice in the normal
 137     form. In a normal Core program, the \emph{case} expression can be used in a
 138     few different ways to describe choice. In normal form, this is limited to a
 139     very specific form.
 140
 141     \in{Example}[ex:AddSubAlu] shows an example describing a
 142     simple \small{ALU}, which chooses between two operations based on an opcode
 143     bit. The main structure is similar to \in{example}[ex:MulSum], but this
 144     time the \lam{res} variable is bound to a case expression. This case
 145     expression scrutinizes the variable \lam{opcode} (and scrutinizing more
 146     complex expressions is not supported). The case expression can select a
 147     different variable based on the constructor of \lam{opcode}.
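
    As a sketch of what this restriction means, assume a hypothetical
    user-defined function \lam{dec} that computes a \lam{Bit} from \lam{a}, and
    assume that \lam{res1} and \lam{res2} are bound elsewhere. A case
    expression that scrutinizes the application directly is not in normal
    form:

    \startlambda
    res = case dec a of
      Low -> res1
      High -> res2
    \stoplambda

    In normal form, the application would first be bound to a variable of its
    own (here called \lam{sel}, again a made-up name), which is then
    scrutinized:

    \startlambda
    sel = dec a
    res = case sel of
      Low -> res1
      High -> res2
    \stoplambda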
 148
 149     \startbuffer[AddSubAlu]
 150     alu :: Bit -> Word -> Word -> Word
 151     alu = λopcode.λa.λb.
 152         let
 153           res1 = (+) a b
 154           res2 = (-) a b
 155           res = case opcode of
 156             Low -> res1
 157             High -> res2
 158         in
 159           res
 160     \stopbuffer
 161
 162     \startuseMPgraphic{AddSubAlu}
 163       save opcode, a, b, add, sub, mux, res;
 164
 165       % I/O ports
 166       newCircle.opcode(btex $opcode$ etex) "framed(false)";
 167       newCircle.a(btex $a$ etex) "framed(false)";
 168       newCircle.b(btex $b$ etex) "framed(false)";
 169       newCircle.res(btex $res$ etex) "framed(false)";
 170       % Components
 171       newCircle.add(btex + etex);
 172       newCircle.sub(btex - etex);
 173       newMux.mux;
 174
 175       opcode.c - a.c   = (0cm, 2cm);
 176       add.c    - a.c   = (4cm, 0cm);
 177       sub.c    - b.c   = (4cm, 0cm);
 178       a.c      - b.c   = (0cm, 3cm);
 179       mux.c            = midpoint(add.c, sub.c) + (1.5cm, 0cm);
 180       res.c    - mux.c = (1.5cm, 0cm);
 181       b.c              = origin;
 182
 183       % Draw objects and lines
 184       drawObj(opcode, a, b, res, add, sub, mux);
 185
 186       ncline(a)(add) "posA(e)";
 187       ncline(b)(sub) "posA(e)";
 188       nccurve(a)(sub) "posA(e)", "angleA(0)";
 189       nccurve(b)(add) "posA(e)", "angleA(0)";
 190       nccurve(add)(mux) "posB(inpa)", "angleB(0)";
 191       nccurve(sub)(mux) "posB(inpb)", "angleB(0)";
 192       nccurve(opcode)(mux) "posB(n)", "angleA(0)", "angleB(-90)";
 193       ncline(mux)(res) "posA(out)";
 194     \stopuseMPgraphic
 195
 196     \placeexample[here][ex:AddSubAlu]{Simple \small{ALU} supporting two operations.}
 197       \startcombination[2*1]
 198         {\typebufferlam{AddSubAlu}}{Core description in normal form.}
 199         {\boxedgraphic{AddSubAlu}}{The architecture described by the normal form.}
 200       \stopcombination
 201
 202     As a more complete example, consider \in{example}[ex:NormalComplete]. This
 203     example contains everything that is supported in normal form, with the
 204     exception of builtin higher order functions. The graphical version of the
 205     architecture contains a slightly simplified version, since the state tuple
 206     packing and unpacking have been left out. Instead, two separate registers are
 207     drawn. Also note that most synthesis tools will further optimize this
 208     architecture by removing the multiplexers at the register input and
 209     instead putting some gates in front of the register's clock input, but we want
 210     to show the architecture as close to the description as possible.
 211
 212     As you can see from the previous examples, the generation of the final
 213     architecture from the normal form is straightforward. In each of the
 214     examples, there is a direct match between the normal form structure,
 215     the generated VHDL and the architecture shown in the images.
 216
 217     \startbuffer[NormalComplete]
 218       regbank :: Bit
 219                  -> Word
 220                  -> State (Word, Word)
 221                  -> (State (Word, Word), Word)
 222
 223       -- All arguments are an initial lambda (address, data, packed state)
 224       regbank = λa.λd.λsp.
 225       -- There are nested let expressions at top level
 226       let
 227         -- Unpack the state by coercion (\eg, cast from
 228         -- State (Word, Word) to (Word, Word))
 229         s = sp ▶ (Word, Word)
 230         -- Extract both registers from the state
 231         r1 = case s of (a, b) -> a
 232         r2 = case s of (a, b) -> b
 233         -- Calling some other user-defined function.
 234         d' = foo d
 235         -- Conditional connections
 236         out = case a of
 237           High -> r1
 238           Low -> r2
 239         r1' = case a of
 240           High -> d'
 241           Low -> r1
 242         r2' = case a of
 243           High -> r2
 244           Low -> d'
 245         -- Packing a tuple
 246         s' = (,) r1' r2'
 247         -- pack the state by coercion (\eg, cast from
 248         -- (Word, Word) to State (Word, Word))
 249         sp' = s' ▶ State (Word, Word)
 250         -- Pack our return value
 251         res = (,) sp' out
 252       in
 253         -- The actual result
 254         res
 255     \stopbuffer
 256
 257     \startuseMPgraphic{NormalComplete}
 258       save a, d, r, foo, muxr, muxout, out;
 259
 260       % I/O ports
 261       newCircle.a(btex \lam{a} etex) "framed(false)";
 262       newCircle.d(btex \lam{d} etex) "framed(false)";
 263       newCircle.out(btex \lam{out} etex) "framed(false)";
 264       % Components
 265       %newCircle.add(btex + etex);
 266       newBox.foo(btex \lam{foo} etex);
 267       newReg.r1(btex $\lam{r1}$ etex) "dx(4mm)", "dy(6mm)";
 268       newReg.r2(btex $\lam{r2}$ etex) "dx(4mm)", "dy(6mm)", "reflect(true)";
 269       newMux.muxr1;
 270       % Reflect over the vertical axis
 271       reflectObj(muxr1)((0,0), (0,1));
 272       newMux.muxr2;
 273       newMux.muxout;
 274       rotateObj(muxout)(-90);
 275
 276       d.c               = foo.c + (0cm, 1.5cm);
 277       a.c               = (xpart r2.c + 2cm, ypart d.c - 0.5cm);
 278       foo.c             = midpoint(muxr1.c, muxr2.c) + (0cm, 2cm);
 279       muxr1.c           = r1.c + (0cm, 2cm);
 280       muxr2.c           = r2.c + (0cm, 2cm);
 281       r2.c              = r1.c + (4cm, 0cm);
 282       r1.c              = origin;
 283       muxout.c          = midpoint(r1.c, r2.c) - (0cm, 2cm);
 284       out.c             = muxout.c - (0cm, 1.5cm);
 285
 286     %  % Draw objects and lines
 287       drawObj(a, d, foo, r1, r2, muxr1, muxr2, muxout, out);
 288
 289       ncline(d)(foo);
 290       nccurve(foo)(muxr1) "angleA(-90)", "posB(inpa)", "angleB(180)";
 291       nccurve(foo)(muxr2) "angleA(-90)", "posB(inpb)", "angleB(0)";
 292       nccurve(muxr1)(r1) "posA(out)", "angleA(180)", "posB(d)", "angleB(0)";
 293       nccurve(r1)(muxr1) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(180)";
 294       nccurve(muxr2)(r2) "posA(out)", "angleA(0)", "posB(d)", "angleB(180)";
 295       nccurve(r2)(muxr2) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(0)";
 296       nccurve(r1)(muxout) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(-90)";
 297       nccurve(r2)(muxout) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(-90)";
 298       % Connect port a
 299       nccurve(a)(muxout) "angleA(-90)", "angleB(180)", "posB(sel)";
 300       nccurve(a)(muxr1) "angleA(180)", "angleB(-90)", "posB(sel)";
 301       nccurve(a)(muxr2) "angleA(180)", "angleB(-90)", "posB(sel)";
 302       ncline(muxout)(out) "posA(out)";
 303     \stopuseMPgraphic
 304
 305     \todo{Don't split registers in this image?}
 306     \placeexample[here][ex:NormalComplete]{Simple register bank consisting of two
 307     registers.}
 308       \startcombination[2*1]
 309         {\typebufferlam{NormalComplete}}{Core description in normal form.}
 310         {\boxedgraphic{NormalComplete}}{The architecture described by the normal form.}
 311       \stopcombination
 312
 313
 314
 315     \subsection[sec:normalization:intendednormalform]{Intended normal form definition}
 316       Now that we have some intuition for the normal form, we can describe what we want
 317       the normal form to look like in a slightly more formal manner. The following
 318       EBNF-like description completely captures the intended structure (and
 319       generates a subset of GHC's core format).
 320
 321       Some clauses have an expression listed in parentheses. These are conditions
 322       that need to apply to the clause.
 323
 324       \defref{intended normal form definition}
 325       \todo{Fix indentation}
 326       \startlambda
 327       \italic{normal} := \italic{lambda}
 328       \italic{lambda} := λvar.\italic{lambda} (representable(var))
 329                       | \italic{toplet}
 330       \italic{toplet} := letrec [\italic{binding}...] in var (representable(var))
 331       \italic{binding} := var = \italic{rhs} (representable(rhs))
 332                        -- State packing and unpacking by coercion
 333                        | var0 = var1 ▶ State ty (lvar(var1))
 334                        | var0 = var1 ▶ ty (var1 :: State ty ∧ lvar(var1))
 335       \italic{rhs} := userapp
 336                    | builtinapp
 337                    -- Extractor case
 338                    | case var of C a0 ... an -> ai (lvar(var))
 339                    -- Selector case
 340                    | case var of (lvar(var))
 341                       [ DEFAULT -> var ]  (lvar(var))
 342                       C0 w0,0 ... w0,n -> var0
 343                       \vdots
 344                       Cm wm,0 ... wm,n -> varm       (\forall{}i \forall{}j, wi,j \neq vari, lvar(vari))
 345       \italic{userapp} := \italic{userfunc}
 346                        | \italic{userapp} \italic{userarg}
 347       \italic{userfunc} := var (gvar(var))
 348       \italic{userarg} := var (lvar(var))
 349       \italic{builtinapp} := \italic{builtinfunc}
 350                           | \italic{builtinapp} \italic{builtinarg}
 351       \italic{builtinfunc} := var (bvar(var))
 352       \italic{builtinarg} := var (representable(var) ∧ lvar(var))
 353                           | \italic{partapp} (partapp :: a -> b)
 354                           | \italic{coreexpr} (¬representable(coreexpr) ∧ ¬(coreexpr :: a -> b))
 355       \italic{partapp} := \italic{userapp} | \italic{builtinapp}
 356       \stoplambda
 357
 358       \todo{There can still be other casts around (which the code can handle,
 359       e.g., ignore), which still need to be documented here}
 360
 361       When looking at such a program from a hardware perspective, the top level
 362       lambdas define the input ports. The variable reference in the body of
 363       the recursive let expression is the output port. Most function
 364       applications bound by the let expression define a component
 365       instantiation, where the input and output ports are mapped to local
 366       signals or arguments. Some of the others use a builtin construction (\eg
 367       the \lam{case} expression) or call a builtin function (\eg \lam{+} or
 368       \lam{map}). For these, a hardcoded \small{VHDL} translation is
 369       available.
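
      As an illustration, the body of \in{example}[ex:MulSum] can be read
      against this description as follows. The comments are only an informal
      reading of the grammar above (which writes \lam{letrec} where the
      examples write \lam{let}), and they assume that \lam{*} is a builtin
      function just like \lam{+}.

      \startlambda
      λa.λb.λc.               -- lambda, three times: the input ports
          let                 -- toplet
            mul = (*) a b     -- binding, with a builtinapp as rhs
            sum = (+) mul c   -- binding, with a builtinapp as rhs
          in
            sum               -- var: the output port
      \stoplambda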
 370
 371   \section[sec:normalization:transformation]{Transformation notation}
 372     To be able to concisely present transformations, we use a specific format
 373     for them. It is a simple format, similar to one used in logic reasoning.
 374
 375     Such a transformation description looks like the following.
 376
 377     \starttrans
 378     <context conditions>
 379     ~
 380     <original expression>
 381     --------------------------          <expression conditions>
 382     <transformed expression>
 383     ~
 384     <context additions>
 385     \stoptrans
 386
 387     This format describes a transformation that applies to \lam{<original
 388     expression>} and transforms it into \lam{<transformed expression>}, assuming
 389     that all conditions apply. In this format, there are a number of placeholders
 390     in pointy brackets, most of which should be rather obvious in their meaning.
 391     Nevertheless, we will more precisely specify their meaning below:
 392
 393       \startdesc{<original expression>} The expression pattern that will be matched
 394       against (subexpressions of) the expression to be transformed. We call this a
 395       pattern, because it can contain \emph{placeholders} (variables), which match
 396       any expression or binder. Any such placeholder is said to be \emph{bound} to
 397       the expression it matches. By convention, we use an uppercase letter (\eg
 398       \lam{M} or \lam{E}) to refer to any expression (including a simple variable
 399       reference) and lowercase letters (\eg \lam{v} or \lam{b}) to refer to
 400       (references to) binders.
 401
 402       For example, the pattern \lam{a + B} will match the expression
 403       \lam{v + (2 * w)} (binding \lam{a} to \lam{v} and \lam{B} to
 404       \lam{(2 * w)}), but not \lam{(2 * w) + v}.
 405       \stopdesc
 406
 407       \startdesc{<expression conditions>}
 408       These are extra conditions on the expression that is matched. These
 409       conditions can be used to further limit the cases in which the
 410       transformation applies, commonly to prevent a transformation from
 411       causing a loop with itself or another transformation.
 412
 413       The transformation applies only if \emph{all} of these conditions are
 414       true.
 415       \stopdesc
 416
 417       \startdesc{<context conditions>}
 418       These are a number of extra conditions on the context of the function. In
 419       particular, these conditions can require some (other) top level function to be
 420       present, whose value matches the pattern given here. The format of each of
 421       these conditions is: \lam{binder = <pattern>}.
 422
 423       Typically, the binder is some placeholder bound in the \lam{<original
 424       expression>}, while the pattern contains some placeholders that are used in
 425       the \lam{<transformed expression>}.
 426
 427       The transformation applies only if, for each condition, a top level
 428       binder exists that matches its binder and pattern.
 429       \stopdesc
 430
 431       \startdesc{<transformed expression>}
 432       This is the expression template that is the result of the transformation. If, looking
 433       at the above three items, the transformation applies, the \lam{<original
 434       expression>} is completely replaced with the \lam{<transformed expression>}.
 435       We call this a template, because it can contain placeholders, referring to
 436       any placeholder bound by the \lam{<original expression>} or the
 437       \lam{<context conditions>}. The resulting expression will have those
 438       placeholders replaced by the values bound to them.
 439
 440       Any binder (lowercase) placeholder that has no value bound to it yet will be
 441       bound to (and replaced with) a fresh binder.
 442       \stopdesc
 443
 444       \startdesc{<context additions>}
 445       These are templates for new functions to add to the context. This is a way
 446       to have a transformation create new top level functions.
 447
 448       Each addition has the form \lam{binder = template}. As above, any
 449       placeholder in the addition is replaced with the value bound to it, and any
 450       binder placeholder that has no value bound to it yet will be bound to (and
 451       replaced with) a fresh binder.
 452       \stopdesc
 453
 454     As an example, we'll look at η-abstraction:
 455
 456     \starttrans
 457     E                 \lam{E :: a -> b}
 458     --------------    \lam{E} does not occur in function position in an application
 459     λx.E x            \lam{E} is not a lambda abstraction.
 460     \stoptrans
 461
 462     η-abstraction is a well known transformation from lambda calculus. What
 463     this transformation does, is take any expression that has a function type
 464     and turn it into a lambda expression (giving an explicit name to the
 465     argument). There are some extra conditions that ensure that this
 466     transformation does not apply infinitely (which are not necessarily part
 467     of the conventional definition of η-abstraction).
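
    To see why such conditions are needed, consider a sketch in which the
    last condition is dropped and \lam{f} is a hypothetical binder of function
    type. Each line below is obtained by applying the unconditional
    transformation to the entire expression on the previous line:

    \startlambda
    f
    λx.f x
    λy.(λx.f x) y
    λz.(λy.(λx.f x) y) z
    \stoplambda

    Every result again has a function type, so this could continue forever.
    With the condition in place, the transformation no longer applies to the
    entire expression after the first step, since \lam{λx.f x} is a lambda
    abstraction.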
 468
 469     Consider the following function, which is a fairly obvious way to specify a
 470     simple ALU (Note that \in{example}[ex:AddSubAlu] shows the normal form of this
 471     function). The parentheses around the \lam{+} and \lam{-} operators are
 472     commonly used in Haskell to show that the operators are used as normal
 473     functions, instead of \emph{infix} operators (that is, the operators appear
 474     before their arguments, instead of in between).
 475
 476     \startlambda
 477     alu :: Bit -> Word -> Word -> Word
 478     alu = λopcode. case opcode of
 479       Low -> (+)
 480       High -> (-)
 481     \stoplambda
 482
 483     There are a few subexpressions in this function to which we could possibly
 484     apply the transformation. Since the pattern of the transformation is only
 485     the placeholder \lam{E}, any expression will match that. Whether the
 486     transformation applies to an expression is thus solely decided by the
 487     conditions to the right of the transformation.
 488
 489     We will look at each expression in the function in a top down manner. The
 490     first expression is the entire expression the function is bound to.
 491
 492     \startlambda
 493     λopcode. case opcode of
 494       Low -> (+)
 495       High -> (-)
 496     \stoplambda
 497
 498     As said, the expression pattern matches this. The type of this expression is
 499     \lam{Bit -> Word -> Word -> Word}, which matches \lam{a -> b} (Note that in
 500     this case \lam{a = Bit} and \lam{b = Word -> Word -> Word}).
 501
 502     Since this expression is at top level, it does not occur at a function
 503     position of an application. However, the expression is a lambda abstraction,
 504     so this transformation does not apply.
 505
 506     The next expression we could apply this transformation to, is the body of
 507     the lambda abstraction:
 508
 509     \startlambda
 510     case opcode of
 511       Low -> (+)
 512       High -> (-)
 513     \stoplambda
 514
 515     The type of this expression is \lam{Word -> Word -> Word}, which again
 516     matches \lam{a -> b}. The expression is the body of a lambda expression, so
 517     it does not occur at a function position of an application. Finally, the
 518     expression is not a lambda abstraction but a case expression, so all the
 519     conditions match. There are no context conditions to match, so the
 520     transformation applies.
 521
 522     By now, the placeholder \lam{E} is bound to the entire expression. The
 523     placeholder \lam{x}, which occurs in the replacement template, is not bound
 524     yet, so we need to generate a fresh binder for that. Let's use the binder
 525     \lam{a}. This results in the following replacement expression:
 526
 527     \startlambda
 528     λa.(case opcode of
 529       Low -> (+)
 530       High -> (-)) a
 531     \stoplambda
 532
 533     Continuing with this expression, we see that the transformation does not
 534     apply again (it is a lambda expression). Next we look at the body of this
 535     lambda abstraction:
 536
 537     \startlambda
 538     (case opcode of
 539       Low -> (+)
 540       High -> (-)) a
 541     \stoplambda
 542
 543     Here, the transformation does apply, binding \lam{E} to the entire
 544     expression and \lam{x} to the fresh binder \lam{b}, resulting in the
 545     replacement:
 546
 547     \startlambda
 548     λb.(case opcode of
 549       Low -> (+)
 550       High -> (-)) a b
 551     \stoplambda
 552
 553     Again, the transformation does not apply to this lambda abstraction, so we
 554     look at its body. For brevity, we'll put the case expression on one line from
 555     now on.
 556
 557     \startlambda
 558     (case opcode of Low -> (+); High -> (-)) a b
 559     \stoplambda
 560
 561     The type of this expression is \lam{Word}, so it does not match \lam{a -> b}
 562     and the transformation does not apply. Next, we have two options for the
 563     next expression to look at: The function position and argument position of
 564     the application. The expression in the argument position is \lam{b}, which
 565     has type \lam{Word}, so the transformation does not apply. The expression in
 566     the function position is:
 567
 568     \startlambda
 569     (case opcode of Low -> (+); High -> (-)) a
 570     \stoplambda
 571
 572     Obviously, the transformation does not apply here, since the expression occurs in
 573     function position (which makes the second condition false). In the same
 574     way the transformation does not apply to either component of this
 575     expression (\lam{case opcode of Low -> (+); High -> (-)} and \lam{a}), so
 576     we'll skip to the components of the case expression: the scrutinee and
 577     both alternatives. Since \lam{opcode} is not a function, the
 578     transformation does not apply to the scrutinee.
 579
 580     The first alternative is \lam{(+)}. This expression has a function type
 581     (the operator still needs two arguments). It does not occur in function
 582     position of an application and it is not a lambda expression, so the
 583     transformation applies.
 584
 585     We look at the \lam{<original expression>} pattern, which is \lam{E}.
 586     This means we bind \lam{E} to \lam{(+)}. We then replace the expression
 587     with the \lam{<transformed expression>}, replacing all occurrences of
 588     \lam{E} with \lam{(+)}. This gives us the replacement expression:
 589     \lam{λx.(+) x} (A lambda expression binding \lam{x}, with a body that
 590     applies the addition operator to \lam{x}).
 591
 592     The expression in function position then becomes:
 593     \startlambda
 594     (case opcode of Low -> λa1.(+) a1; High -> (-)) a
 595     \stoplambda
 596
 597     Now the transformation no longer applies to the complete first alternative
 598     (since it is a lambda expression). It does not apply to the addition
 599     operator again, since it is now in function position in an application. It
 600     does, however, apply to the application of the addition operator, since
 601     that is neither a lambda expression nor does it occur in function
 602     position. This means after one more application of the transformation, the
 603     function becomes:
 604
 605     \startlambda
 606     (case opcode of Low -> λa1.λb1.(+) a1 b1; High -> (-)) a
 607     \stoplambda
 608
 609     The other alternative is left as an exercise to the reader. The final
 610     function, after applying η-abstraction until it no longer applies, is:
 611
 612     \startlambda
 613     alu :: Bit -> Word -> Word -> Word
 614     alu = λopcode.λa.λb. (case opcode of
 615       Low -> λa1.λb1. (+) a1 b1
 616       High -> λa2.λb2. (-) a2 b2) a b
 617     \stoplambda
 618
 619     \subsection{Transformation application}
 620       In this chapter we define a number of transformations, but how will we apply
 621       these? As stated before, our normal form is reached as soon as no
 622       transformation applies anymore. This means our application strategy is to
 623       simply apply any transformation that applies, and to continue doing that with
 624       the result of each transformation.
 625
 626       In particular, we define no particular order of transformations. Since
 627       transformation order should not influence the resulting normal form,
 628       this leaves the implementation free to choose any application order that
 629       results in an efficient implementation. Unfortunately this is not
 630       entirely true for the current set of transformations. See
 631       \in{section}[sec:normalization:non-determinism] for a discussion of this
 632       problem.
 633
 634       When applying a single transformation, we try to apply it to every (sub)expression
 635       in a function, not just the top level function body. This allows us to
 636       keep the transformation descriptions concise and powerful.
 637
 638     \subsection{Definitions}
 639       In the following sections, we will be using a number of functions and
 640       notations, which we will define here.
 641
 642       \subsubsection{Concepts}
 643         A \emph{global variable} is any variable (binder) that is bound at the
 644         top level of a program, or an external module. A \emph{local variable} is any
 645         other variable (\eg, variables local to a function, which can be bound by
 646         lambda abstractions, let expressions and pattern matches of case
 647         alternatives).  Note that this is a slightly different notion of global versus
 648         local than what \small{GHC} uses internally.
 649         \defref{global variable} \defref{local variable}
 650
 651         A \emph{hardware representable} (or just \emph{representable}) type or value
 652         is (a value of) a type that we can generate a signal for in hardware. For
 653         example, a bit, a vector of bits, a 32 bit unsigned word, etc. Values that are
 654         not runtime representable notably include (but are not limited to) types,
 655         dictionaries and functions.
 656         \defref{representable}
 657
 658         A \emph{builtin function} is a function supplied by the Cλash framework, whose
 659         implementation is not valid Cλash. The implementation is of course valid
 660         Haskell, for simulation, but it is not expressible in Cλash.
 661         \defref{builtin function} \defref{user-defined function}
 662
 663       For these functions, Cλash has a \emph{builtin hardware translation}, so calls
 664       to these functions can still be translated. These are functions like
 665       \lam{map}, \lam{hwor} and \lam{length}.
 666
 667       A \emph{user-defined} function is a function for which we do have a Cλash
 668       implementation available.
 669
 670       \subsubsection{Predicates}
 671         Here, we define a number of predicates that can be used below to concisely
 672         specify conditions.\refdef{global variable}
 673
 674         \emph{gvar(expr)} is true when \emph{expr} is a variable that references a
 675         global variable. It is false when it references a local variable.
 676
 677         \refdef{local variable}\emph{lvar(expr)} is the complement of \emph{gvar}; it is true when \emph{expr}
 678         references a local variable, false when it references a global variable.
 679
 680         \refdef{representable}\emph{representable(expr)} or \emph{representable(var)} is true when
 681         \emph{expr} or \emph{var} is \emph{representable}.
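
        As a small illustration of these predicates (the expression is made up
        for this purpose, and \lam{add} is assumed to be bound at the top
        level), consider:

        \startlambda
        λa.add a a
        \stoplambda

        Here \emph{gvar(add)} is true and \emph{lvar(add)} is false, while
        \emph{lvar(a)} is true and \emph{gvar(a)} is false. If \lam{a} has,
        say, a bit type, \emph{representable(a)} is true as well, whereas
        \emph{representable(add)} is false, since \lam{add} has a function
        type.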
 682
 683     \subsection[sec:normalization:uniq]{Binder uniqueness}
 684       A common problem in transformation systems is binder uniqueness. When not
 685       considering this problem, it is easy to create transformations that mix up
 686       bindings and cause name collisions. Take for example, the following core
 687       expression:
 688
 689       \startlambda
 690       (λa.λb.λc. a * b * c) x c
 691       \stoplambda
 692
 693       By applying β-reduction (see \in{section}[sec:normalization:beta]) once,
 694       we can simplify this expression to:
 695
 696       \startlambda
 697       (λb.λc. x * b * c) c
 698       \stoplambda
 699
 700       Now, we have replaced the \lam{a} binder with a reference to the \lam{x}
 701       binder. No harm done here. But note that we see multiple occurrences of the
 702       \lam{c} binder. The first is a binding occurrence, to which the second refers.
 703       The last, however, refers to \emph{another} instance of \lam{c}, which is
 704       bound somewhere outside of this expression. Now, if we applied
 705       β-reduction without taking heed of binder uniqueness, we would get:
 706
 707       \startlambda
 708       λc. x * c * c
 709       \stoplambda
 710
 711       This is obviously not what was supposed to happen! The root of this problem is
 712       the reuse of binders: Identical binders can be bound in different scopes, such
 713       that only the inner one is \quote{visible} in the inner expression. In the example
 714       above, the \lam{c} binder was bound outside of the expression and in the inner
 715       lambda expression. Inside that lambda expression, only the inner \lam{c} is
 716       visible.
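
      For comparison, a capture-avoiding reduction would first rename the inner
      binder. The following is only a sketch, in which \lam{d} is a
      hypothetical fresh name chosen for this illustration:

      \startlambda
      (λb.λd. x * b * d) c
      \stoplambda

      Applying β-reduction now gives the intended result, in which the outer
      \lam{c} and the lambda-bound \lam{d} remain distinct:

      \startlambda
      λd. x * c * d
      \stoplambda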
 717
 718       There are a number of ways to solve this. \small{GHC} has isolated this
 719       problem to their binder substitution code, which performs \emph{deshadowing}
 720       during its expression traversal. This means that any binding that shadows
 721       another binding on a higher level is replaced by a new binder that does not
 722       shadow any other binding. This non-shadowing invariant is enough to prevent
 723       binder uniqueness problems in \small{GHC}.
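
      As a sketch of what deshadowing amounts to (the expression and the fresh
      name \lam{y} are made up for illustration), consider an expression in
      which the inner \lam{x} shadows the outer one:

      \startlambda
      λx.λx.x
      \stoplambda

      After deshadowing, the inner binder and its references carry a new
      name, so no binding shadows another:

      \startlambda
      λx.λy.y
      \stoplambda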
 724
 725       In our transformation system, maintaining this non-shadowing invariant is
 726       a bit harder to do (mostly due to implementation issues: the prototype does not
 727       use \small{GHC}'s substitution code). Also, the following points can be
 728       observed.
 729
 730       \startitemize
 731       \item Deshadowing does not guarantee overall uniqueness. For example, the
 732       following (slightly contrived) expression shows the identifier \lam{x} bound in
 733       two separate places (and to different values), even though no shadowing
 734       occurs.
 735
 736       \startlambda
 737       (let x = 1 in x) + (let x = 2 in x)
 738       \stoplambda
 739
 740       \item In our normal form (and the resulting \small{VHDL}), all binders
 741       (signals) within the same function (entity) will end up in the same
 742       scope. To allow this, all binders within the same function should be
 743       unique.
 744
 745       \item When we know that all binders in an expression are unique, moving around
 746       or removing a subexpression will never cause any binder conflicts. If we have
 747       some way to generate fresh binders, introducing new subexpressions will not
 748       cause any problems either. The only way to cause conflicts is thus to
 749       duplicate an existing subexpression.
 750       \stopitemize
 751
 752       Given the above, our prototype maintains a unique binder invariant. This
 753       means that at any given moment during normalization, all binders \emph{within
 754       a single function} must be unique. To achieve this, we apply the following
 755       technique.
 756
 757       \todo{Define fresh binders and unique supplies}
 758
 759       \startitemize
 760       \item Before starting normalization, all binders in the function are made
 761       unique. This is done by generating a fresh binder for every binder used. This
 762       also replaces binders that did not cause any conflict, but it does ensure that
 763       all binders within the function are generated by the same unique supply.
 764       \refdef{fresh binder}
 765       \item Whenever a new binder must be generated, we generate a fresh binder that
 766       is guaranteed to be different from \emph{all binders generated so far}. This
 767       can thus never introduce duplication and will maintain the invariant.
 768       \item Whenever (a part of) an expression is duplicated (for example when
 769       inlining), all binders in the expression are replaced with fresh binders
 770       (using the same method as at the start of normalization). These fresh binders
 771       can never introduce duplication, so this will maintain the invariant.
 772       \item Whenever we move part of an expression around within the function, there
 773       is no need to do anything special. There is obviously no way to introduce
 774       duplication by moving expressions around. Since we know that each of the
 775       binders is already unique, there is no way to introduce (incorrect) shadowing
 776       either.
 777       \stopitemize
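
      To illustrate the first step, making the binders unique in the contrived
      expression shown earlier could give the following. The names \lam{y}
      and \lam{z} stand for whatever fresh binders the unique supply happens
      to produce; they are assumptions of this sketch.

      \startlambda
      (let y = 1 in y) + (let z = 2 in z)
      \stoplambda

      Both let expressions now bind distinct names, so their bindings can
      later be placed in the same scope without conflict.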
 778
 779   \section{Transform passes}
 780     In this section we describe the actual transforms.
 781
 782     Each transformation will be described informally first, explaining
 783     the need for and goal of the transformation. Then, we will formally define
 784     the transformation using the syntax introduced in
 785     \in{section}[sec:normalization:transformation].
 786
 787     \subsection{General cleanup}
 788       These transformations are general cleanup transformations that aim to
 789       make expressions simpler. These transformations usually clean up the
 790        mess left behind by other transformations or clean up expressions to
 791        expose new transformation opportunities for other transformations.
 792
 793        Most of these transformations are standard optimizations in other
 794        compilers as well. However, in our compiler, most of these are not just
 795        optimizations, but they are required to get our program into intended
 796        normal form.
 797
 798         \placeintermezzo{}{
 799           \startframedtext[width=8cm,background=box,frame=no]
 800           \startalignment[center]
 801             {\tfa Substitution notation}
 802           \stopalignment
 803           \blank[medium]
 804
 805           In some of the transformations in this chapter, we need to perform
 806           substitution on an expression. Substitution means replacing every
 807           occurrence of some expression (usually a variable reference) with
 808           another expression.
 809
 810           There have been a lot of different notations used in literature for
 811           specifying substitution. The notation that will be used in this report
 812           is the following:
 813
 814           \startlambda
 815             E[A=>B]
 816           \stoplambda
 817
 818           This means expression \lam{E} with all occurrences of \lam{A} replaced
 819           with \lam{B}.
 820           \stopframedtext
 821         }
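
        % Illustration only: the expression and the names in it are made up.
        As a small illustration of the substitution notation (with made-up
        names), the substitution

        \startlambda
        (add a (mul a b))[a=>x]
        \stoplambda

        denotes the expression

        \startlambda
        add x (mul x b)
        \stoplambda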
 822
 823       \defref{beta-reduction}
 824       \subsubsection[sec:normalization:beta]{β-reduction}
 825         β-reduction is a well known transformation from lambda calculus, where it is
 826         the main reduction step. It reduces applications of lambda abstractions,
 827         removing both the lambda abstraction and the application.
 828
 829         In our transformation system, this step helps to remove unwanted lambda
 830         abstractions (basically all but the ones at the top level). Other
 831         transformations (application propagation, non-representable inlining) make
 832         sure that most lambda abstractions will eventually be reducible by
 833         β-reduction.
 834
 835         Note that β-reduction also works on type lambda abstractions and type
 836         applications. This means the substitution below also works on
 837         type variables, in the case that the binder is a type variable and the
 838         expression applied to is a type.
 839
 840         \starttrans
 841         (λx.E) M
 842         -----------------
 843         E[x=>M]
 844         \stoptrans
 845
 846         % And an example
 847         \startbuffer[from]
 848         (λa. 2 * a) (2 * b)
 849         \stopbuffer
 850
 851         \startbuffer[to]
 852         2 * (2 * b)
 853         \stopbuffer
 854
 855         \transexample{beta}{β-reduction}{from}{to}
 856
 857         \startbuffer[from]
 858         (λt.λa::t. a) @Int
 859         \stopbuffer
 860
 861         \startbuffer[to]
 862         (λa::Int. a)
 863         \stopbuffer
 864
 865         \transexample{beta-type}{β-reduction for type abstractions}{from}{to}
 866
 867       \subsubsection{Empty let removal}
 868         This transformation is simple: It removes recursive lets that have no bindings
 869         (which usually occurs when unused let binding removal removes the last
 870         binding from it).
 871
 872         Note that there is no need to define this transformation for
 873         non-recursive lets, since they always contain exactly one binding.
 874
 875         \starttrans
 876         letrec in M
 877         --------------
 878         M
 879         \stoptrans
 880
 881         \todo{Example}
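
        A minimal illustration of this transformation, using an arbitrary body
        expression, could look like \in{example}[ex:trans:emptylet].

        % Sketch of an example: the label and the body expression are chosen
        % for illustration only.
        \startbuffer[from]
        letrec in add a b
        \stopbuffer

        \startbuffer[to]
        add a b
        \stopbuffer

        \transexample{emptylet}{Empty let removal}{from}{to}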
 882
 883       \subsubsection[sec:normalization:simplelet]{Simple let binding removal}
 884         This transformation inlines simple let bindings that bind some
 885         binder to some other binder instead of a more complex expression (\ie
 886         \lam{a = b}).
 887
 888         This transformation is not needed to get an expression into intended
 889         normal form (since these bindings are part of the intended normal
 890         form), but makes the resulting \small{VHDL} a lot shorter.
 891
 892         \starttrans
 893         letrec
 894           a0 = E0
 895           \vdots
 896           ai = b
 897           \vdots
 898           an = En
 899         in
 900           M
 901         -----------------------------  \lam{b} is a variable reference
 902         letrec                         \lam{ai} ≠ \lam{b}
 903           a0 = E0 [ai=>b]
 904           \vdots
 905           ai-1 = Ei-1 [ai=>b]
 906           ai+1 = Ei+1 [ai=>b]
 907           \vdots
 908           an = En [ai=>b]
 909         in
 910           M[ai=>b]
 911         \stoptrans
 912
 913         \todo{example}
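
        As a sketch of how this rule applies,
        \in{example}[ex:trans:simplelet] removes the binding \lam{a = b} and
        replaces every reference to \lam{a} with \lam{b}. The binder names and
        the use of \lam{add} are assumptions made for this illustration.

        % Sketch of an example: label and names are chosen for illustration.
        \startbuffer[from]
        letrec
          a = b
          c = add a a
        in
          c
        \stopbuffer

        \startbuffer[to]
        letrec
          c = add b b
        in
          c
        \stopbuffer

        \transexample{simplelet}{Simple let binding removal}{from}{to}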
 914
 915       \subsubsection{Unused let binding removal}
 916         This transformation removes let bindings that are never used.
 917         Occasionally, \GHC's desugarer introduces some unused let bindings.
 918
 919         This normalization pass should really be unneeded to get into intended normal form
 920         (since unused bindings are not forbidden by the normal form), but in practice
 921         the desugarer or simplifier emits some unused bindings that cannot be
 922         normalized (e.g., calls to a \type{PatError}\todo{Check this name}). Also,
 923         this transformation makes the resulting \small{VHDL} a lot shorter.
 924
 925         \todo{Don't use old-style numerals in transformations}
 926         \starttrans
 927         letrec
 928           a0 = E0
 929           \vdots
 930           ai = Ei
 931           \vdots
 932           an = En
 933         in
 934           M                             \lam{ai} does not occur free in \lam{M}
 935         ----------------------------    \forall j, 0 ≤ j ≤ n, j ≠ i (\lam{ai} does not occur free in \lam{Ej})
 936         letrec
 937           a0 = E0
 938           \vdots
 939           ai-1 = Ei-1
 940           ai+1 = Ei+1
 941           \vdots
 942           an = En
 943         in
 944           M
 945         \stoptrans
 946
 947         \todo{Example}
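
        A small, made-up illustration is given in
        \in{example}[ex:trans:unusedlet]. The binding for \lam{b} occurs
        neither in the body nor in the other binding, so it can be dropped.

        % Sketch of an example: label and names are chosen for illustration.
        \startbuffer[from]
        letrec
          a = add x 1
          b = mul x 2
        in
          a
        \stopbuffer

        \startbuffer[to]
        letrec
          a = add x 1
        in
          a
        \stopbuffer

        \transexample{unusedlet}{Unused let binding removal}{from}{to}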
 948
 949       \subsubsection{Cast propagation / simplification}
 950         This transform pushes casts down into the expression as far as possible.
 951         Since its exact role and need is not clear yet, this transformation is
 952         not yet specified.
 953
 954         \todo{Cast propagation}
 955
 956       \subsubsection{Top level binding inlining}
 957         This transform takes simple top level bindings generated by the
 958         \small{GHC} compiler. \small{GHC} sometimes generates very simple
 959         \quote{wrapper} bindings, which are bound to just a variable
 960         reference, or a partial application to constants or other variable
 961         references.
 962
 963         Note that this transformation is completely optional. It is not
 964         required to get any function into intended normal form, but it does help to make
 965         the resulting \small{VHDL} output easier to read (since it removes a bunch of
 966         components that are really boring).
 967
 968         This transform takes any top level binding generated by the compiler,
 969         whose normalized form contains only a single let binding.
 970
 971         \starttrans
 972         x = λa0 ... λan.let y = E in y
 973         ~
 974         x
 975         --------------------------------------         \lam{x} is generated by the compiler
 976         λa0 ... λan.let y = E in y
 977         \stoptrans
 978
 979         \startbuffer[from]
 980         (+) :: Word -> Word -> Word
 981         (+) = GHC.Num.(+) @Word \$dNum
 982         ~
 983         (+) a b
 984         \stopbuffer
 985         \startbuffer[to]
 986         GHC.Num.(+) @ Alu.Word \$dNum a b
 987         \stopbuffer
 988
 989         \transexample{toplevelinline}{Top level binding inlining}{from}{to}
 990
 991         \in{Example}[ex:trans:toplevelinline] shows a typical application of
 992         the addition operator generated by \GHC. The type and dictionary
 993         arguments used here are described in
 994         \in{Section}[section:prototype:polymorphism].
 995
 996         Without this transformation, there would be a \lam{(+)} entity
 997         in the \VHDL which would just add its inputs. This generates a
 998         lot of overhead in the \VHDL, which is particularly annoying
 999         when browsing the generated RTL schematic (especially since most
1000         non-alphanumerics, like all characters in \lam{(+)}, are not
1001         allowed in \VHDL architecture names\footnote{Technically, it is
1002         allowed to use non-alphanumerics when using extended
1003         identifiers, but it seems that none of the tooling likes
1004         extended identifiers in filenames, so it effectively doesn't
1005         work.}, so the entity would be called \quote{w7aA7f} or
1006         something similarly unreadable and autogenerated).
1007
1008     \subsection{Program structure}
1009       These transformations are aimed at normalizing the overall structure
1010       into the intended form. This means ensuring there is a lambda abstraction
1011       at the top for every argument (input port or current state), putting all
1012       of the other value definitions in let bindings and making the final
1013       return value a simple variable reference.
1014
1015       \subsubsection[sec:normalization:eta]{η-abstraction}
1016         This transformation makes sure that all arguments of a function-typed
1017         expression are named, by introducing lambda expressions. When combined with
1018         β-reduction and non-representable binding inlining, all function-typed
1019         expressions should be lambda abstractions or global identifiers.
1020
1021         \starttrans
1022         E                 \lam{E :: a -> b}
1023         --------------    \lam{E} is not the first argument of an application.
1024         λx.E x            \lam{E} is not a lambda abstraction.
1025                           \lam{x} is a variable that does not occur free in \lam{E}.
1026         \stoptrans
1027
1028         \startbuffer[from]
1029         foo = λa.case a of
1030           True -> λb.mul b b
1031           False -> id
1032         \stopbuffer
1033
1034         \startbuffer[to]
1035         foo = λa.λx.(case a of
1036             True -> λb.mul b b
1037             False -> λy.id y) x
1038         \stopbuffer
1039
1040         \transexample{eta}{η-abstraction}{from}{to}
1041
1042       \subsubsection[sec:normalization:appprop]{Application propagation}
1043         This transformation is meant to propagate application expressions downwards
1044         into expressions as far as possible. This allows partial applications inside
1045         expressions to become fully applied and exposes new transformation
1046         opportunities for other transformations (like β-reduction and
1047         specialization).
1048
1049         Since all binders in our expression are unique (see
1050         \in{section}[sec:normalization:uniq]), there is no risk that we will
1051         introduce unintended shadowing by moving an expression into a lower
1052         scope. Also, since we only move expressions into smaller scopes (down into
1053         our expression), there is no risk of moving a variable reference out
1054         of the scope in which it is defined.
1055
1056         \starttrans
1057         (letrec binds in E) M
1058         ------------------------
1059         letrec binds in E M
1060         \stoptrans
1061
1062         % And an example
1063         \startbuffer[from]
1064         ( letrec
1065             val = 1
1066           in
1067             add val
1068         ) 3
1069         \stopbuffer
1070
1071         \startbuffer[to]
1072         letrec
1073           val = 1
1074         in
1075           add val 3
1076         \stopbuffer
1077
1078         \transexample{appproplet}{Application propagation for a let expression}{from}{to}
1079
1080         \starttrans
1081         (case x of
1082           p1 -> E1
1083           \vdots
1084           pn -> En) M
1085         -----------------
1086         case x of
1087           p1 -> E1 M
1088           \vdots
1089           pn -> En M
1090         \stoptrans
1091
1092         % And an example
1093         \startbuffer[from]
1094         ( case x of
1095             True -> id
1096             False -> neg
1097         ) 1
1098         \stopbuffer
1099
1100         \startbuffer[to]
1101         case x of
1102           True -> id 1
1103           False -> neg 1
1104         \stopbuffer
1105
1106         \transexample{apppropcase}{Application propagation for a case expression}{from}{to}
1107
1108       \subsubsection[sec:normalization:letrecurse]{Let recursification}
1109         This transformation makes all non-recursive lets recursive. In the
1110         end, we want a single recursive let in our normalized program, so all
1111         non-recursive lets can be converted. This also makes other
1112         transformations simpler: They can simply assume all lets are
1113         recursive.
1114
1115         \starttrans
1116         let
1117           a = E
1118         in
1119           M
1120         ------------------------------------------
1121         letrec
1122           a = E
1123         in
1124           M
1125         \stoptrans
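
        As an illustration with made-up names,
        \in{example}[ex:trans:letrec] shows that only the keyword changes.
        The binding and the body are left untouched.

        % Sketch of an example: label and names are chosen for illustration.
        \startbuffer[from]
        let
          a = add x 1
        in
          mul a a
        \stopbuffer

        \startbuffer[to]
        letrec
          a = add x 1
        in
          mul a a
        \stopbuffer

        \transexample{letrec}{Let recursification}{from}{to}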
1126
1127       \subsubsection{Let flattening}
1128         This transformation puts nested lets in the same scope, by lifting the
1129         binding(s) of the inner let into the outer let. Eventually, this will
1130         cause all let bindings to appear in the same scope.
1131
1132         This transformation only applies to recursive lets, since all
1133         non-recursive lets will be made recursive (see
1134         \in{section}[sec:normalization:letrecurse]).
1135
1136         Since we are joining two scopes together, there is no risk of moving a
1137         variable reference out of the scope where it is defined.
1138
1139         \starttrans
1140         letrec
1141           a0 = E0
1142           \vdots
1143           ai = (letrec bindings in M)
1144           \vdots
1145           an = En
1146         in
1147           N
1148         ------------------------------------------
1149         letrec
1150           a0 = E0
1151           \vdots
1152           ai = M
1153           \vdots
1154           an = En
1155           bindings
1156         in
1157           N
1158         \stoptrans
1159
1160         \startbuffer[from]
1161         letrec
1162           a = 1
1163           b = letrec
1164             x = a
1165             y = c
1166           in
1167             x + y
1168           c = 2
1169         in
1170           b
1171         \stopbuffer
1172         \startbuffer[to]
1173         letrec
1174           a = 1
1175           b = x + y
1176           c = 2
1177           x = a
1178           y = c
1179         in
1180           b
1181         \stopbuffer
1182
1183         \transexample{letflat}{Let flattening}{from}{to}
1184
1185       \subsubsection{Return value simplification}
1186         This transformation ensures that the return value of a function is always a
1187         simple local variable reference.
1188
1189         This is currently implemented using lambda simplification, let simplification, and
1190         top simplification. It should change into something like the following, which
1191         works only on the result of a function instead of any subexpression. This is
1192         achieved by the contexts, like \lam{x = E}, though this is strictly speaking not
1193         correct (you could read this as \quote{if there is any function \lam{x} that binds
1194         \lam{E}, any \lam{E} can be transformed}, while we only mean the \lam{E} that
1195         is bound by \lam{x}. This might need some extra notes or something).
1196
1197         Note that the return value is not simplified if it is not representable.
1198         Otherwise, this would cause a direct loop with the inlining of
1199         unrepresentable bindings. If the return value is not
1200         representable because it has a function type, η-abstraction should
1201         make sure that this transformation will eventually apply. If the value
1202         is not representable for other reasons, the function result itself is
1203         not representable, meaning this function is not translatable anyway.
1204
1205         \starttrans
1206         x = E                            \lam{E} is representable
1207         ~                                \lam{E} is not a lambda abstraction
1208         E                                \lam{E} is not a let expression
1209         ---------------------------      \lam{E} is not a local variable reference
1210         letrec x = E in x
1211         \stoptrans
1212
1213         \starttrans
1214         x = λv0 ... λvn.E
1215         ~                                \lam{E} is representable
1216         E                                \lam{E} is not a let expression
1217         ---------------------------      \lam{E} is not a local variable reference
1218         letrec x = E in x
1219         \stoptrans
1220
1221         \starttrans
1222         x = λv0 ... λvn.let ... in E
1223         ~                                \lam{E} is representable
1224         E                                \lam{E} is not a local variable reference
1225         -----------------------------
1226         letrec x = E in x
1227         \stoptrans
1228
1229         \startbuffer[from]
1230         x = add 1 2
1231         \stopbuffer
1232
1233         \startbuffer[to]
1234         x = letrec x = add 1 2 in x
1235         \stopbuffer
1236
1237         \transexample{retvalsimpl}{Return value simplification}{from}{to}
1238
1239         \todo{More examples}
1240
1241     \subsection[sec:normalization:argsimpl]{Representable arguments simplification}
1242       This section contains just a single transformation that deals with
1243       representable arguments in applications. Non-representable arguments are
1244       handled by the transformations in
1245       \in{section}[sec:normalization:nonrep].
1246
1247       This transformation ensures that all representable arguments will become
1248       references to local variables. This ensures they will become references
1249       to local signals in the resulting \small{VHDL}, which is required due to
1250       limitations in the component instantiation code in \VHDL (one can only
1251       assign a signal or constant to an input port). By ensuring that all
1252       arguments are always simple variable references, we always have a signal
1253       available to map to the input ports.
1254
1255       To reduce a complex expression to a simple variable reference, we create
1256       a new let expression around the application, which binds the complex
1257       expression to a new variable. The original function is then applied to
1258       this variable.
1259
1260       \refdef{global variable}
1261       Note that references to \emph{global variables} (like a top level
1262       function without arguments, but also an argumentless dataconstructor
1263       like \lam{True}) are also simplified. Only local variables generate
1264       signals in the resulting architecture. Even though argumentless
1265       dataconstructors generate constants in generated \VHDL code and could be
1266       mapped to an input port directly, they are still simplified to make the
1267       normal form more regular.
1268
1269       \refdef{representable}
1270       \starttrans
1271       M N
1272       --------------------    \lam{N} is representable
1273       letrec x = N in M x     \lam{N} is not a local variable reference
1274       \stoptrans
1275       \refdef{local variable}
1276
1277       \startbuffer[from]
1278       add (add a 1) 1
1279       \stopbuffer
1280
1281       \startbuffer[to]
1282       letrec x = add a 1 in add x 1
1283       \stopbuffer
1284
1285       \transexample{argsimpl}{Argument simplification}{from}{to}
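
      The remark about global variables can be illustrated in the same way. In
      \in{example}[ex:trans:argsimplglobal], the argumentless dataconstructor
      \lam{High} is bound to a new local variable. Applying \lam{hwor} to
      these particular arguments is only assumed for the sake of the example.

      % Sketch of an example: label and expression are chosen for illustration.
      \startbuffer[from]
      hwor a High
      \stopbuffer

      \startbuffer[to]
      letrec x = High in hwor a x
      \stopbuffer

      \transexample{argsimplglobal}{Argument simplification for a global variable}{from}{to}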
1286
1287     \subsection[sec:normalization:builtins]{Builtin functions}
1288       This section deals with (arguments to) builtin functions.  In the
1289       intended normal form definition\refdef{intended normal form definition}
1290       we can see that there are three sorts of arguments a builtin function
1291       can receive.
1292
1293       \startitemize[KR]
1294         \item A representable local variable reference. This is the most
1295         common argument to any function. The argument simplification
1296         transformation described in \in{section}[sec:normalization:argsimpl]
1297         makes sure that \emph{any} representable argument to \emph{any}
1298         function (including builtin functions) is turned into a local variable
1299         reference.
1300         \item (A partial application of) a top level function (either builtin or
1301         user-defined). The function extraction transformation described in
1302         this section takes care of turning every function-typed argument into
1303         (a partial application of) a top level function.
1304         \item Any expression that is not representable and does not have a
1305         function type. Since these can be any expression, there is no
1306         transformation needed. Note that this category is exactly all
1307         expressions that are not transformed by the transformations for the
1308         previous two categories. This means that \emph{any} core expression
1309         that is used as an argument to a builtin function will be either
1310         transformed into one of the above categories, or end up in this
1311         category. In any case, the result is in normal form.
1312       \stopitemize
1313
1314       As noted, the argument simplification will handle any representable
1315       arguments to a builtin function. The following transformation is needed
1316       to handle non-representable arguments with a function type; all other
1317       non-representable arguments don't need any special handling.
1318
1319       \subsubsection[sec:normalization:funextract]{Function extraction}
1320         This transform deals with function-typed arguments to builtin
1321         functions.
1322         Since builtin functions cannot be specialized (see
1323         \in{section}[sec:normalization:specialize]) to remove the arguments,
1324         these arguments are extracted into a new global function instead. In
1325         other words, we create a new top level function that has exactly the
1326         extracted argument as its body. This greatly simplifies the
1327         translation rules needed for builtin functions, since they only need
1328         to handle (partial applications of) top level functions.
1329
1330         Any free variables occurring in the extracted arguments will become
1331         parameters to the new global function. The original argument is replaced
1332         with a reference to the new function, applied to any free variables from
1333         the original argument.
1334
1335         This transformation is useful when applying higher order builtin functions
1336         like \hs{map} to a lambda abstraction, for example. In this case, the code
1337         that generates \small{VHDL} for \hs{map} only needs to handle top level functions and
1338         partial applications, not any other expression (such as lambda abstractions or
1339         even more complicated expressions).
1340
1341         \starttrans
1342         M N                     \lam{M} is (a partial application of) a builtin function.
1343         ---------------------   \lam{f0 ... fn} are all free local variables of \lam{N}
1344         M (x f0 ... fn)         \lam{N :: a -> b}
1345         ~                       \lam{N} is not (a partial application of) a top level function
1346         x = λf0 ... λfn.N
1347         \stoptrans
1348
1349         \startbuffer[from]
1350         addList = λb.λxs.map (λa . add a b) xs
1351         \stopbuffer
1352
1353         \startbuffer[to]
1354         addList = λb.λxs.map (f b) xs
1355         ~
1356         f = λb.λa.add a b
1357         \stopbuffer
1358
1359         \transexample{funextract}{Function extraction}{from}{to}
1360
1361         Note that the function \lam{f} will still need normalization after
1362         this.
1363
1364     \subsection{Case normalisation}
1365       \subsubsection{Scrutinee simplification}
1366         This transform ensures that the scrutinee of a case expression is always
1367         a simple variable reference.
1368
1369         \starttrans
1370         case E of
1371           alts
1372         -----------------        \lam{E} is not a local variable reference
1373         letrec x = E in
          case x of
1375             alts
1376         \stoptrans
1377
1378         \startbuffer[from]
1379         case (foo a) of
1380           True -> a
1381           False -> b
1382         \stopbuffer
1383
1384         \startbuffer[to]
1385         letrec x = foo a in
1386           case x of
1387             True -> a
1388             False -> b
1389         \stopbuffer
1390
        \transexample{letflat}{Scrutinee simplification}{from}{to}
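
        As a rough illustration, the rule can be written as a one-step rewrite
        on a toy expression type. The \hs{Expr} type and the \hs{scrutSimpl}
        function are hypothetical names. The sketch treats every \hs{Var} as a
        local variable and expects the caller to supply a fresh binder name.

        \starthaskell
        -- Toy expression type (hypothetical; real Core has more constructors).
        data Expr
          = Var String
          | App Expr Expr
          | Case Expr [(String, Expr)]   -- scrutinee, (constructor, body) pairs
          | Letrec [(String, Expr)] Expr
          deriving (Eq, Show)

        -- Apply scrutinee simplification at the root of an expression. Returns
        -- Nothing when the rule does not apply, so repeated application stops
        -- as soon as the scrutinee is a variable reference.
        scrutSimpl :: String -> Expr -> Maybe Expr
        scrutSimpl _     (Case (Var _) _) = Nothing
        scrutSimpl fresh (Case e alts)    =
          Just (Letrec [(fresh, e)] (Case (Var fresh) alts))
        scrutSimpl _     _                = Nothing
        \stophaskell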
1392
1393
1394       \subsubsection{Case simplification}
        This transformation ensures that all case expressions are brought into
        normal form. This means they will become one of:
1397         \startitemize
1398         \item An extractor case with a single alternative that picks a single field
1399         from a datatype, \eg \lam{case x of (a, b) -> a}.
1400         \item A selector case with multiple alternatives and only wild binders, that
1401         makes a choice between expressions based on the constructor of another
1402         expression, \eg \lam{case x of Low -> a; High -> b}.
1403         \stopitemize
1404
1405         \defref{wild binder}
1406         \starttrans
1407         case E of
1408           C0 v0,0 ... v0,m -> E0
1409           \vdots
1410           Cn vn,0 ... vn,m -> En
        --------------------------------------------------- \forall i \forall j, 0 ≤ i ≤ n, 0 ≤ j ≤ m (\lam{wi,j} is a wild (unused) binder)
        letrec
          v0,0 = case E of C0 v0,0 ... v0,m -> v0,0
          \vdots
          v0,m = case E of C0 v0,0 ... v0,m -> v0,m
          \vdots
          vn,m = case E of Cn vn,0 ... vn,m -> vn,m
1418           x0 = E0
1419           \vdots
1420           xn = En
1421         in
1422           case E of
1423             C0 w0,0 ... w0,m -> x0
1424             \vdots
1425             Cn wn,0 ... wn,m -> xn
1426         \stoptrans
1427         \todo{Check the subscripts of this transformation}
1428
1429         Note that this transformation applies to case statements with any
1430         scrutinee. If the scrutinee is a complex expression, this might result
1431         in duplicate hardware. An extra condition to only apply this
1432         transformation when the scrutinee is already simple (effectively
1433         causing this transformation to be only applied after the scrutinee
1434         simplification transformation) might be in order.
1435
1436         \fxnote{This transformation specified like this is complicated and misses
1437         conditions to prevent looping with itself. Perhaps it should be split here for
1438         discussion?}
1439
1440         \startbuffer[from]
1441         case a of
1442           True -> add b 1
1443           False -> add b 2
1444         \stopbuffer
1445
1446         \startbuffer[to]
        letrec
1448           x0 = add b 1
1449           x1 = add b 2
1450         in
1451           case a of
1452             True -> x0
1453             False -> x1
1454         \stopbuffer
1455
1456         \transexample{selcasesimpl}{Selector case simplification}{from}{to}
1457
1458         \startbuffer[from]
1459         case a of
1460           (,) b c -> add b c
1461         \stopbuffer
1462         \startbuffer[to]
1463         letrec
1464           b = case a of (,) b c -> b
1465           c = case a of (,) b c -> c
1466           x0 = add b c
1467         in
1468           case a of
1469             (,) w0 w1 -> x0
1470         \stopbuffer
1471
1472         \transexample{excasesimpl}{Extractor case simplification}{from}{to}
1473
1474         \refdef{selector case}
        In \in{example}[ex:trans:excasesimpl] the case expression is expanded
        into multiple case expressions, including a pretty useless expression
        (that is neither a selector nor an extractor case). This case can be
        removed by the Case removal transformation in
        \in{section}[sec:transformation:caseremoval].
1480
1481       \subsubsection[sec:transformation:caseremoval]{Case removal}
1482         This transform removes any case statements with a single alternative and
1483         only wild binders.
1484
        These \quote{useless} case statements are usually leftovers from case
        simplification on extractor cases (see the previous example).
1487
1488         \starttrans
1489         case x of
1490           C v0 ... vm -> E
1491         ----------------------     \lam{\forall i, 0 ≤ i ≤ m} (\lam{vi} does not occur free in E)
1492         E
1493         \stoptrans
1494
1495         \startbuffer[from]
1496         case a of
1497           (,) w0 w1 -> x0
1498         \stopbuffer
1499
1500         \startbuffer[to]
1501         x0
1502         \stopbuffer
1503
1504         \transexample{caserem}{Case removal}{from}{to}
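
        The side condition of this rule (none of the binders occurs free in
        the alternative) can be made concrete with a small sketch. The
        \hs{Expr} and \hs{Alt} types and the function names are assumptions
        for illustration only.

        \starthaskell
        -- Toy expression type (hypothetical; real Core has more constructors).
        data Expr
          = Var String
          | App Expr Expr
          | Case Expr [Alt]
          deriving (Eq, Show)

        -- Constructor name, binders and body of a case alternative.
        data Alt = Alt String [String] Expr
          deriving (Eq, Show)

        -- Does the variable v occur free in the expression?
        occursFree :: String -> Expr -> Bool
        occursFree v (Var w)       = v == w
        occursFree v (App f a)     = occursFree v f || occursFree v a
        occursFree v (Case s alts) = occursFree v s || any inAlt alts
          where
            -- Binders of an alternative shadow v inside its body.
            inAlt (Alt _ bs e) = v `notElem` bs && occursFree v e

        -- Remove a case with a single alternative whose binders are all unused.
        caseRemoval :: Expr -> Maybe Expr
        caseRemoval (Case (Var _) [Alt _ bs e])
          | not (any (`occursFree` e) bs) = Just e
        caseRemoval _ = Nothing
        \stophaskell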
1505
1506     \subsection[sec:normalization:nonrep]{Removing unrepresentable values}
1507       The transformations in this section are aimed at making all the
1508       values used in our expression representable. There are two main
1509       transformations that are applied to \emph{all} unrepresentable let
1510       bindings and function arguments. These are meant to address three
1511       different kinds of unrepresentable values: Polymorphic values, higher
      order values and literals. The transformations are described generically:
1513       They apply to all non-representable values. However, non-representable
1514       values that don't fall into one of these three categories will be moved
1515       around by these transformations but are unlikely to completely
1516       disappear. They usually mean the program was not valid in the first
1517       place, because unsupported types were used (for example, a program using
1518       strings).
1519
1520       Each of these three categories will be detailed below, followed by the
1521       actual transformations.
1522
1523       \subsubsection{Removing Polymorphism}
1524         As noted in \in{section}[sec:prototype:polymporphism],
1525         polymorphism is made explicit in Core through type and
1526         dictionary arguments. To remove the polymorphism from a
1527         function, we can simply specialize the polymorphic function for
1528         the particular type applied to it. The same goes for dictionary
1529         arguments. To remove polymorphism from let bound values, we
1530         simply inline the let bindings that have a polymorphic type,
1531         which should (eventually) make sure that the polymorphic
1532         expression is applied to a type and/or dictionary, which can
1533         then be removed by β-reduction (\in{section}[sec:normalization:beta]).
1534
1535         Since both type and dictionary arguments are not representable,
1536         \refdef{representable}
1537         the non-representable argument specialization and
1538         non-representable let binding inlining transformations below
1539         take care of exactly this.
1540
1541         There is one case where polymorphism cannot be completely
1542         removed: Builtin functions are still allowed to be polymorphic
        (since we have no function body that we could properly
1544         specialize). However, the code that generates \VHDL for builtin
1545         functions knows how to handle this, so this is not a problem.
1546
1547       \subsubsection{Defunctionalization}
1548         These transformations remove higher order expressions from our
1549         program, making all values first-order.
1550
        Higher order values are always introduced by lambda abstractions; none
1552         of the other Core expression elements can introduce a function type.
1553         However, other expressions can \emph{have} a function type, when they
1554         have a lambda expression in their body.
1555
1556         For example, the following expression is a higher order expression
1557         that is not a lambda expression itself:
1558
1559         \refdef{id function}
1560         \startlambda
1561           case x of
1562             High -> id
1563             Low -> λx.x
1564         \stoplambda
1565
1566         The reference to the \lam{id} function shows that we can introduce a
1567         higher order expression in our program without using a lambda
1568         expression directly. However, inside the definition of the \lam{id}
1569         function, we can be sure that a lambda expression is present.
1570
1571         Looking closely at the definition of our normal form in
1572         \in{section}[sec:normalization:intendednormalform], we can see that
1573         there are three possibilities for higher order values to appear in our
1574         intended normal form:
1575
1576         \startitemize[KR]
1577           \item[item:toplambda] Lambda abstractions can appear at the highest level of a
1578           top level function. These lambda abstractions introduce the
1579           arguments (input ports / current state) of the function.
1580           \item[item:builtinarg] (Partial applications of) top level functions can appear as an
1581           argument to a builtin function.
1582           \item[item:completeapp] (Partial applications of) top level functions can appear in
1583           function position of an application. Since a partial application
1584           cannot appear anywhere else (except as builtin function arguments),
1585           all partial applications are applied, meaning that all applications
1586           will become complete applications. However, since application of
1587           arguments happens one by one, in the expression:
1588           \startlambda
1589             f 1 2
1590           \stoplambda
1591           the subexpression \lam{f 1} has a function type. But this is
1592           allowed, since it is inside a complete application.
1593         \stopitemize
1594
1595         We will take a typical function with some higher order values as an
1596         example. The following function takes two arguments: a \lam{Bit} and a
1597         list of numbers. Depending on the first argument, each number in the
1598         list is doubled, or the list is returned unmodified. For the sake of
        the example, no polymorphism is shown. In reality, at least \lam{map} would
1600         be polymorphic.
1601
1602         \startlambda
1603         λy.let double = λx. x + x in
1604              case y of
1605                 Low -> map double
1606                 High -> λz. z
1607         \stoplambda
1608
1609         This example shows a number of higher order values that we cannot
1610         translate to \VHDL directly. The \lam{double} binder bound in the let
1611         expression has a function type, as well as both of the alternatives of
1612         the case expression. The first alternative is a partial application of
1613         the \lam{map} builtin function, whereas the second alternative is a
1614         lambda abstraction.
1615
1616         To reduce all higher order values to one of the above items, a number
1617         of transformations we've already seen are used. The η-abstraction
1618         transformation from \in{section}[sec:normalization:eta] ensures all
1619         function arguments are introduced by lambda abstraction on the highest
1620         level of a function. These lambda arguments are allowed because of
1621         \in{item}[item:toplambda] above. After η-abstraction, our example
1622         becomes a bit bigger:
1623
1624         \startlambda
1625         λy.λq.(let double = λx. x + x in
1626                  case y of
1627                    Low -> map double
1628                    High -> λz. z
1629               ) q
1630         \stoplambda
1631
1632         η-abstraction also introduces extra applications (the application of
1633         the let expression to \lam{q} in the above example). These
        applications can then be propagated down by the application propagation
        transformation (\in{section}[sec:normalization:appprop]). In our
        example, the \lam{q} variable will be propagated into the
        let expression and then into the case expression:
1638
1639         \startlambda
1640         λy.λq.let double = λx. x + x in
1641                 case y of
1642                   Low -> map double q
1643                   High -> (λz. z) q
1644         \stoplambda
1645
        This propagation makes higher order values become applied (in
        particular, both of the alternatives of the case now have a
        representable type). Completely applied top level functions (like the
        first alternative) are now no longer invalid (they fall under
        \in{item}[item:completeapp] above). (Completely) applied lambda
        abstractions can be removed by β-reduction. For our example,
        applying β-reduction results in the following:
1653
1654         \startlambda
1655         λy.λq.let double = λx. x + x in
1656                 case y of
1657                   Low -> map double q
1658                   High -> q
1659         \stoplambda
1660
1661         As you can see in our example, all of this moves applications towards
1662         the higher order values, but misses higher order functions bound by
1663         let expressions. The applications cannot be moved towards these values
1664         (since they can be used in multiple places), so the values will have
        to be moved towards the applications. This is achieved by inlining all
        higher order values bound by let expressions, using the
        non-representable binding inlining transformation below. When applying
1668         it to our example, we get the following:
1669
1670         \startlambda
1671         λy.λq.case y of
1672                 Low -> map (λx. x + x) q
1673                 High -> q
1674         \stoplambda
1675
        We've nearly eliminated all unsupported higher order values from this
        expression. The one that's remaining is the first argument to the
        \lam{map} function. Having higher order arguments to a builtin
        function like \lam{map} is allowed in the intended normal form, but
        only if the argument is (a partial application of) a top level
        function. This is easily achieved by introducing a new top level
        function and putting the lambda abstraction inside. This is done by the
        function extraction transformation from
        \in{section}[sec:normalization:funextract].
1685
1686         \startlambda
1687         λy.λq.case y of
1688                 Low -> map func q
1689                 High -> q
1690         \stoplambda
1691
1692         This also introduces a new function, that we have called \lam{func}:
1693
1694         \startlambda
1695         func = λx. x + x
1696         \stoplambda
1697
1698         Note that this does not actually remove the lambda, but now it is a
1699         lambda at the highest level of a function, which is allowed in the
1700         intended normal form.
1701
1702         There is one case that has not been discussed yet. What if the
1703         \lam{map} function in the example above was not a builtin function
1704         but a user-defined function? Then extracting the lambda expression
1705         into a new function would not be enough, since user-defined functions
        can never have higher order arguments. Consider the following
        example:
1708
1709         \startlambda
1710         twice :: (Word -> Word) -> Word -> Word
1711         twice = λf.λa.f (f a)
1712
        main = λa.twice (λx. x + x) a
1714         \stoplambda
1715
1716         This example shows a function \lam{twice} that takes a function as a
1717         first argument and applies that function twice to the second argument.
1718         Again, we've made the function monomorphic for clarity, even though
1719         this function would be a lot more useful if it was polymorphic. The
        function \lam{main} uses \lam{twice} to apply a lambda expression twice.
1721
1722         When faced with a user defined function, a body is available for that
1723         function. This means we could create a specialized version of the
1724         function that only works for this particular higher order argument
1725         (\ie, we can just remove the argument and call the specialized
1726         function without the argument). This transformation is detailed below.
1727         Applying this transformation to the example gives:
1728
1729         \startlambda
1730         twice' :: Word -> Word
1731         twice' = λb.(λf.λa.f (f a)) (λx. x + x) b
1732
        main = λa.twice' a
1734         \stoplambda
1735
1736         The \lam{main} function is now in normal form, since the only higher
1737         order value there is the top level lambda expression. The new
        \lam{twice'} function is a bit complex: the entire body of
        the original \lam{twice} function is wrapped in a lambda abstraction
1740         and applied to the argument we've specialized for (\lam{λx. x + x})
1741         and the other arguments. This complex expression can fortunately be
1742         effectively reduced by repeatedly applying β-reduction:
1743
1744         \startlambda
1745         twice' :: Word -> Word
1746         twice' = λb.(b + b) + (b + b)
1747         \stoplambda
1748
1749         This example also shows that the resulting normal form might not be as
1750         efficient as we might hope it to be (it is calculating \lam{(b + b)}
1751         twice). This is discussed in more detail in
1752         \in{section}[sec:normalization:duplicatework].
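
        That the specialized program behaves like the original can be checked
        in plain Haskell. In this sketch \hs{Word32} from \hs{Data.Word} stands
        in for the \lam{Word} type, and the names \hs{mainOrig} and
        \hs{mainSpec} are made up to keep both versions of \lam{main} side by
        side. Both are assumptions for illustration.

        \starthaskell
        import Data.Word (Word32)

        -- User-defined higher order function from the example.
        twice :: (Word32 -> Word32) -> Word32 -> Word32
        twice f a = f (f a)

        -- Original program: passes a lambda to a user-defined function.
        mainOrig :: Word32 -> Word32
        mainOrig a = twice (\x -> x + x) a

        -- twice specialized for the argument (\x -> x + x), after β-reduction.
        twice' :: Word32 -> Word32
        twice' b = (b + b) + (b + b)

        -- Transformed program: no higher order argument is left.
        mainSpec :: Word32 -> Word32
        mainSpec a = twice' a
        \stophaskell

        For every input both versions compute four times the argument (modulo
        the word size), but only \hs{mainSpec} is free of higher order
        arguments.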
1753
1754       \subsubsection{Literals}
1755         There are a limited number of literals available in Haskell and Core.
1756         \refdef{enumerated types} When using (enumerating) algebraic
1757         datatypes, a literal is just a reference to the corresponding data
1758         constructor, which has a representable type (the algebraic datatype)
1759         and can be translated directly. This also holds for literals of the
1760         \hs{Bool} Haskell type, which is just an enumerated type.
1761
1762         There is, however, a second type of literal that does not have a
1763         representable type: Integer literals. Cλash supports using integer
1764         literals for all three integer types supported (\hs{SizedWord},
1765         \hs{SizedInt} and \hs{RangedWord}). This is implemented using
1766         Haskell's \hs{Num} typeclass, which offers a \hs{fromInteger} method
1767         that converts any \hs{Integer} to the Cλash datatypes.
1768
1769         When \GHC sees integer literals, it will automatically insert calls to
1770         the \hs{fromInteger} method in the resulting Core expression. For
1771         example, the following expression in Haskell creates a 32 bit unsigned
1772         word with the value 1. The explicit type signature is needed, since
1773         there is no context for \GHC to determine the type from otherwise.
1774
1775         \starthaskell
1776         1 :: SizedWord D32
1777         \stophaskell
1778
1779         This Haskell code results in the following Core expression:
1780
1781         \startlambda
        fromInteger @(SizedWord D32) \$dNum (smallInteger 1)
        \stoplambda

        The literal 1 will have the type \lam{GHC.Prim.Int\#}, which is
        converted into an \lam{Integer} by \lam{smallInteger}. Finally, the
        \lam{fromInteger} function will convert this into a
1788         \lam{SizedWord D32}.
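
        The role of \hs{fromInteger} can also be observed at the Haskell
        source level. The following sketch uses \hs{Word32} from
        \hs{Data.Word} as a stand-in for \hs{SizedWord D32} (an assumption, so
        that the example does not depend on the Cλash libraries). The names
        \hs{implicit} and \hs{explicit} are illustrative.

        \starthaskell
        import Data.Word (Word32)

        -- What the programmer writes: a literal with an explicit type.
        implicit :: Word32
        implicit = 1

        -- What the literal means according to the Haskell report: an Integer
        -- value converted by the fromInteger method of the Num instance.
        explicit :: Word32
        explicit = fromInteger (1 :: Integer)
        \stophaskell

        Both definitions denote the same value. The Core shown above is the
        compiler's internal form of the second one.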
1789
1790         Both the \lam{GHC.Prim.Int\#} and \lam{Integer} types are not
1791         representable, and cannot be translated directly. Fortunately, there
1792         is no need to translate them, since \lam{fromInteger} is a builtin
1793         function that knows how to handle these values. However, this does
1794         require that the \lam{fromInteger} function is directly applied to
1795         these non-representable literal values, otherwise errors will occur.
1796         For example, the following expression is not in the intended normal
1797         form, since one of the let bindings has an unrepresentable type
1798         (\lam{Integer}):
1799
1800         \startlambda
1801         let l = smallInteger 10 in fromInteger @(SizedWord D32) \$dNum l
1802         \stoplambda
1803
1804         By inlining these let-bindings, we can ensure that unrepresentable
1805         literals bound by a let binding end up in an application of the
1806         appropriate builtin function, where they are allowed. Since it is
1807         possible that the application of that function is in a different
1808         function than the definition of the literal value, we will always need
1809         to specialize away any unrepresentable literals that are used as
1810         function arguments. The following two transformations do exactly this.
1811
1812       \subsubsection{Non-representable binding inlining}
1813         This transform inlines let bindings that are bound to a
1814         non-representable value. Since we can never generate a signal
        assignment for these bindings (we cannot declare a signal
        with a non-representable type, for obvious reasons), we have no choice
1817         but to inline the binding to remove it.
1818
        As we have seen in the previous sections, inlining these bindings
        removes (part of) the polymorphism, higher order values and
        unrepresentable literals from an expression.
1822
1823         \starttrans
1824         letrec
1825           a0 = E0
1826           \vdots
1827           ai = Ei
1828           \vdots
1829           an = En
1830         in
1831           M
1832         --------------------------    \lam{Ei} has a non-representable type.
1833         letrec
          a0 = E0 [ai=>Ei]
          \vdots
1835           ai-1 = Ei-1 [ai=>Ei]
1836           ai+1 = Ei+1 [ai=>Ei]
1837           \vdots
1838           an = En [ai=>Ei]
1839         in
1840           M[ai=>Ei]
1841         \stoptrans
1842
1843         \startbuffer[from]
1844         letrec
1845           a = smallInteger 10
1846           inc = λb -> add b 1
1847           inc' = add 1
1848           x = fromInteger a
1849         in
1850           inc (inc' x)
1851         \stopbuffer
1852
1853         \startbuffer[to]
1854         letrec
1855           x = fromInteger (smallInteger 10)
1856         in
1857           (λb -> add b 1) (add 1 x)
1858         \stopbuffer
1859
1860         \transexample{nonrepinline}{Nonrepresentable binding inlining}{from}{to}
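
        A minimal sketch of this rule on a toy expression type is given below.
        Since the toy language has no types, representability is modelled by a
        predicate on binder names that the caller supplies. This predicate, the
        \hs{Expr} type and the function names are all assumptions for
        illustration. The sketch further assumes that all binders are unique
        and that the inlined binding is not recursive, so plain substitution
        suffices.

        \starthaskell
        -- Toy expression type (hypothetical; real Core has more constructors).
        data Expr
          = Var String
          | App Expr Expr
          | Letrec [(String, Expr)] Expr
          deriving (Eq, Show)

        -- Replace every occurrence of the variable v by the expression e.
        subst :: String -> Expr -> Expr -> Expr
        subst v e (Var w)
          | v == w    = e
          | otherwise = Var w
        subst v e (App f a) = App (subst v e f) (subst v e a)
        subst v e (Letrec bs body) =
          Letrec [(b, subst v e r) | (b, r) <- bs] (subst v e body)

        -- Inline the first binding whose binder is marked non-representable,
        -- both into the remaining bindings and into the body.
        inlineNonRep :: (String -> Bool) -> Expr -> Maybe Expr
        inlineNonRep nonRep (Letrec bs body) =
          case break (nonRep . fst) bs of
            (before, (v, e) : after) ->
              Just (Letrec [(b, subst v e r) | (b, r) <- before ++ after]
                           (subst v e body))
            _ -> Nothing
        inlineNonRep _ _ = Nothing
        \stophaskell

        Repeating this step until it returns \hs{Nothing} removes all marked
        bindings, one at a time.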
1861
1862       \subsubsection[sec:normalization:specialize]{Function specialization}
1863         This transform removes arguments to user-defined functions that are
1864         not representable at runtime. This is done by creating a
1865         \emph{specialized} version of the function that only works for one
1866         particular value of that argument (in other words, the argument can be
1867         removed).
1868
1869         Specialization means to create a specialized version of the called
        function, with one argument already filled in. As a simple example,
        consider the following program (this is not actual Core, since it
        directly uses a literal with the unrepresentable type
        \lam{GHC.Prim.Int\#}):
1873
1874         \startlambda
1875         f = λa.λb.a + b
1876         inc = λa.f a 1
1877         \stoplambda
1878
1879         We could specialize the function \lam{f} against the literal argument
1880         1, with the following result:
1881
1882         \startlambda
1883         f' = λa.a + 1
1884         inc = λa.f' a
1885         \stoplambda
1886
1887         In some way, this transformation is similar to β-reduction, but it
1888         operates across function boundaries. It is also similar to
1889         non-representable let binding inlining above, since it sort of
1890         \quote{inlines} an expression into a called function.
1891
1892         Special care must be taken when the argument has any free variables.
1893         If this is the case, the original argument should not be removed
1894         completely, but replaced by all the free variables of the expression.
1895         In this way, the original expression can still be evaluated inside the
1896         new function.
1897
1898         To prevent us from propagating the same argument over and over, a
        simple local variable reference is not propagated (since it has
1900         exactly one free variable, itself, we would only replace that argument
1901         with itself).
1902
1903         This shows that any free local variables that are not runtime
1904         representable cannot be brought into normal form by this transform. We
1905         rely on an inlining or β-reduction transformation to replace such a
1906         variable with an expression we can propagate again.
1907
1908         \starttrans
1909         x = E
1910         ~
1911         x Y0 ... Yi ... Yn                               \lam{Yi} is not representable
1912         ---------------------------------------------    \lam{Yi} is not a local variable reference
        x' Y0 ... Yi-1 f0 ...  fm Yi+1 ... Yn            \lam{f0 ... fm} are all free local vars of \lam{Yi}
        ~                                                \lam{T0 ... Tn} are the types of \lam{Y0 ... Yn}
        x' = λ(y0 :: T0) ... λ(yi-1 :: Ti-1). λf0 ... λfm. λ(yi+1 :: Ti+1) ...  λ(yn :: Tn).
1916               E y0 ... yi-1 Yi yi+1 ... yn
1917         \stoptrans
1918
1919         This is a bit of a complex transformation. It transforms an
1920         application of the function \lam{x}, where one of the arguments
1921         (\lam{Y_i}) is not representable. A new
1922         function \lam{x'} is created that wraps the body of the old function.
1923         The body of the new function becomes a number of nested lambda
1924         abstractions, one for each of the original arguments that are left
1925         unchanged.
1926
1927         The ith argument is replaced with the free variables of
1928         \lam{Y_i}. Note that we reuse the same binders as those used in
1929         \lam{Y_i}, since we can then just use \lam{Y_i} inside the new
1930         function body and have all of the variables it uses be in scope.
1931
1932         The argument that we are specializing for, \lam{Y_i}, is put inside
1933         the new function body. The old function body is applied to it. Since
1934         we use this new function only in place of an application with that
1935         particular argument \lam{Y_i}, behaviour should not change.
1936
1937         Note that the types of the arguments of our new function are taken
1938         from the types of the \emph{actual} arguments (\lam{T0 ... Tn}). This
1939         means that any polymorphism in the arguments is removed, even when the
1940         corresponding explicit type lambda is not removed
1941         yet.\refdef{type lambda}
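
        The treatment of free variables can be illustrated at the Haskell
        level. In this sketch a higher order argument with one free local
        variable (\hs{c}) is specialized away. The functions \hs{apply},
        \hs{addOrig} and \hs{addSpec} are hypothetical, and \hs{Word32} from
        \hs{Data.Word} is used as a stand-in for a representable word type.

        \starthaskell
        import Data.Word (Word32)

        -- User-defined function with a non-representable (function) argument.
        apply :: (Word32 -> Word32) -> Word32 -> Word32
        apply g a = g a

        -- Original caller: the argument (\x -> x + c) has c as its only free
        -- local variable.
        addOrig :: Word32 -> Word32 -> Word32
        addOrig c a = apply (\x -> x + c) a

        -- Specialized function: the function parameter is replaced by a
        -- parameter for c, and the old body of apply (written out as a
        -- lambda) is applied to the argument we specialize for.
        apply' :: Word32 -> Word32 -> Word32
        apply' c a = (\g b -> g b) (\x -> x + c) a

        -- Transformed caller: passes the free variable instead of the lambda.
        addSpec :: Word32 -> Word32 -> Word32
        addSpec c a = apply' c a
        \stophaskell

        After β-reduction the body of \hs{apply'} reduces to \hs{a + c}, so
        that no higher order value remains.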
1942
1943         \todo{Examples. Perhaps reference the previous sections}
1944
1945
1946   \section{Unsolved problems}
1947     The above system of transformations has been implemented in the prototype
1948     and seems to work well to compile simple and more complex examples of
1949     hardware descriptions. \todo{Ref christiaan?} However, this normalization
1950     system has not seen enough review and work to be complete and work for
1951     every Core expression that is supplied to it. A number of problems
1952     have already been identified and are discussed in this section.
1953
1954     \subsection[sec:normalization:duplicatework]{Work duplication}
1955         A possible problem of β-reduction is that it could duplicate work.
        When the argument is not a simple variable reference but
        requires calculation, and the binder of the lambda abstraction
        is used more than once, more hardware might be generated than strictly
        needed.
1960
1961         As an example, consider the expression:
1962
1963         \startlambda
1964         (λx. x + x) (a * b)
1965         \stoplambda
1966
1967         When applying β-reduction to this expression, we get:
1968
1969         \startlambda
1970         (a * b) + (a * b)
1971         \stoplambda
1972
1973         which of course calculates \lam{(a * b)} twice.
1974
1975         A possible solution to this would be to use the following alternative
1976         transformation, which is of course no longer normal β-reduction. The
        following transformation has not been tested in the prototype, but is
1978         given here for future reference:
1979
1980         \starttrans
1981         (λx.E) M
1982         -----------------
1983         letrec x = M in E
1984         \stoptrans
1985
1986         This doesn't seem like much of an improvement, but it does get rid of
1987         the lambda expression (and the associated higher order value), while
1988         at the same time introducing a new let binding. Since the result of
1989         every application or case expression must be bound by a let expression
1990         in the intended normal form anyway, this is probably not a problem. If
1991         the argument happens to be a variable reference, then simple let
1992         binding removal (\in{section}[sec:normalization:simplelet]) will
1993         remove it, making the result identical to that of the original
1994         β-reduction transformation.
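
        On a toy expression type, this alternative rule is a one-line rewrite.
        Like the rule itself, the sketch below has not been tested in the
        prototype. The \hs{Expr} type and the name \hs{betaLet} are
        assumptions, and binders are assumed to be unique, so that \lam{x}
        does not occur free in \lam{M}.

        \starthaskell
        -- Toy expression type (hypothetical; real Core has more constructors).
        data Expr
          = Var String
          | App Expr Expr
          | Lam String Expr
          | Letrec [(String, Expr)] Expr
          deriving (Eq, Show)

        -- Turn an applied lambda into a let binding instead of substituting
        -- the argument, so the argument expression is never duplicated.
        betaLet :: Expr -> Maybe Expr
        betaLet (App (Lam x e) m) = Just (Letrec [(x, m)] e)
        betaLet _                 = Nothing
        \stophaskell

        Because the argument ends up in exactly one binding, the number of
        occurrences of \lam{x} in the body no longer influences the amount of
        hardware generated for the argument.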
1995
1996         When also applying argument simplification to the above example, we
1997         get the following expression:
1998
1999         \startlambda
2000         let y = (a * b)
2001             z = (a * b)
2002         in y + z
2003         \stoplambda
2004
2005         Looking at this, we could imagine an alternative approach: Create a
2006         transformation that removes let bindings that bind identical values.
2007         In the above expression, the \lam{y} and \lam{z} variables could be
2008         merged together, resulting in the more efficient expression:
2009
2010         \startlambda
2011         let y = (a * b) in y + y
2012         \stoplambda
2013
2014       \subsection[sec:normalization:non-determinism]{Non-determinism}
2015         As an example, again consider the following expression:
2016
2017         \startlambda
2018         (λx. x + x) (a * b)
2019         \stoplambda
2020
2021         We can apply both β-reduction (\in{section}[sec:normalization:beta])
2022         as well as argument simplification
2023         (\in{section}[sec:normalization:argsimpl]) to this expression.
2024
2025         When applying argument simplification first and then β-reduction, we
2026         get the following expression:
2027
2028         \startlambda
2029         let y = (a * b) in y + y
2030         \stoplambda
2031
2032         When applying β-reduction first and then argument simplification, we
2033         get the following expression:
2034
2035         \startlambda
2036         let y = (a * b)
2037             z = (a * b)
2038         in y + z
2039         \stoplambda
2040
        As you can see, this is a different expression. This means that the
        order of transformations does in fact change the resulting normal form,
2043         which is something that we would like to avoid. In this particular
2044         case one of the alternatives is even clearly more efficient, so we
2045         would of course like the more efficient form to be the normal form.
2046
        For this particular problem, the solutions for duplication of work
        from the previous section seem to fix the determinism of our
        transformation system as well. However, it is likely that there are
        other occurrences of this problem.
2051
2052       \subsection{Casts}
        We do not fully understand the use of cast expressions in Core, so
        there are probably expressions involving casts that cannot be brought
        into the intended normal form by this transformation system.

        The uses of casts in Core should be investigated further, and the
        transformations will probably need updating to handle them in all
        cases.
2060
2061   \section[sec:normalization:properties]{Provable properties}
2062     When looking at the system of transformations outlined above, there are a
2063     number of questions that we can ask ourselves. The main question is of course:
2064     \quote{Does our system work as intended?}. We can split this question into a
2065     number of subquestions:
2066
2067     \startitemize[KR]
2068     \item[q:termination] Does our system \emph{terminate}? Since our system will
2069     keep running as long as transformations apply, there is an obvious risk that
2070     it will keep running indefinitely. This typically happens when one
2071     transformation produces a result that is transformed back to the original
2072     by another transformation, or when one or more transformations keep
2073     expanding some expression.
2074     \item[q:soundness] Is our system \emph{sound}? Since our transformations
2075     continuously modify the expression, there is an obvious risk that the final
2076     normal form will not be equivalent to the original program: Its meaning could
2077     have changed.
2078     \item[q:completeness] Is our system \emph{complete}? Since we have a complex
2079     system of transformations, there is an obvious risk that some expressions will
2080     not end up in our intended normal form, because we forgot some transformation.
2081     In other words: Does our transformation system result in our intended normal
2082     form for all possible inputs?
    \item[q:determinism] Is our system \emph{deterministic}? Since we have defined
    no particular order in which the transformations should be applied, there is an
    obvious risk that different transformation orderings will result in
2086     \emph{different} normal forms. They might still both be intended normal forms
2087     (if our system is \emph{complete}) and describe correct hardware (if our
2088     system is \emph{sound}), so this property is less important than the previous
2089     three: The translator would still function properly without it.
2090     \stopitemize
2091
2092     Unfortunately, the final transformation system has only been
2093     developed in the final part of the research, leaving no more time
2094     for verifying these properties. In fact, it is likely that the
2095     current transformation system still violates some of these
2096     properties in some cases and should be improved (or extra conditions
2097     on the input hardware descriptions should be formulated).
2098
    This is most likely the case with the completeness and determinism
    properties, and perhaps also with the termination property. The soundness
    property probably holds, since it is easier to verify manually (each
    transformation can be reviewed separately).
2103
2104     Even though no complete proofs have been made, some ideas for
2105     possible proof strategies are shown below.
2106
2107     \subsection{Graph representation}
2108       Before looking into how to prove these properties, we'll look at our
2109       transformation system from a graph perspective. The nodes of the graph are
2110       all possible Core expressions. The (directed) edges of the graph are
2111       transformations. When a transformation α applies to an expression \lam{A} to
2112       produce an expression \lam{B}, we add an edge from the node for \lam{A} to the
2113       node for \lam{B}, labeled α.
2114
2115       \startuseMPgraphic{TransformGraph}
2116         save a, b, c, d;
2117
2118         % Nodes
2119         newCircle.a(btex \lam{(λx.λy. (+) x y) 1} etex);
2120         newCircle.b(btex \lam{λy. (+) 1 y} etex);
2121         newCircle.c(btex \lam{(λx.(+) x) 1} etex);
2122         newCircle.d(btex \lam{(+) 1} etex);
2123
2124         b.c = origin;
2125         c.c = b.c + (4cm, 0cm);
2126         a.c = midpoint(b.c, c.c) + (0cm, 4cm);
2127         d.c = midpoint(b.c, c.c) - (0cm, 3cm);
2128
2129         % β-conversion between a and b
2130         ncarc.a(a)(b) "name(bred)";
2131         ObjLabel.a(btex $\xrightarrow[normal]{}{β}$ etex) "labpathname(bred)", "labdir(rt)";
2132         ncarc.b(b)(a) "name(bexp)", "linestyle(dashed withdots)";
2133         ObjLabel.b(btex $\xleftarrow[normal]{}{β}$ etex) "labpathname(bexp)", "labdir(lft)";
2134
2135         % η-conversion between a and c
2136         ncarc.a(a)(c) "name(ered)";
2137         ObjLabel.a(btex $\xrightarrow[normal]{}{η}$ etex) "labpathname(ered)", "labdir(rt)";
2138         ncarc.c(c)(a) "name(eexp)", "linestyle(dashed withdots)";
2139         ObjLabel.c(btex $\xleftarrow[normal]{}{η}$ etex) "labpathname(eexp)", "labdir(lft)";
2140
2141         % η-conversion between b and d
2142         ncarc.b(b)(d) "name(ered)";
2143         ObjLabel.b(btex $\xrightarrow[normal]{}{η}$ etex) "labpathname(ered)", "labdir(rt)";
2144         ncarc.d(d)(b) "name(eexp)", "linestyle(dashed withdots)";
2145         ObjLabel.d(btex $\xleftarrow[normal]{}{η}$ etex) "labpathname(eexp)", "labdir(lft)";
2146
2147         % β-conversion between c and d
2148         ncarc.c(c)(d) "name(bred)";
2149         ObjLabel.c(btex $\xrightarrow[normal]{}{β}$ etex) "labpathname(bred)", "labdir(rt)";
2150         ncarc.d(d)(c) "name(bexp)", "linestyle(dashed withdots)";
2151         ObjLabel.d(btex $\xleftarrow[normal]{}{β}$ etex) "labpathname(bexp)", "labdir(lft)";
2152
2153         % Draw objects and lines
2154         drawObj(a, b, c, d);
2155       \stopuseMPgraphic
2156
2157       \placeexample[right][ex:TransformGraph]{Partial graph of a lambda calculus
2158       system with β and η reduction (solid lines) and expansion (dotted lines).}
2159           \boxedgraphic{TransformGraph}
2160
      Of course our graph is unbounded, since we can construct an infinite number of
      Core expressions. Also, there might potentially be multiple edges between two
      given nodes (with different labels), though this seems unlikely to actually
      happen in our system.
2165
2166       See \in{example}[ex:TransformGraph] for the graph representation of a very
2167       simple lambda calculus that contains just the expressions \lam{(λx.λy. (+) x
2168       y) 1}, \lam{λy. (+) 1 y}, \lam{(λx.(+) x) 1} and \lam{(+) 1}. The
2169       transformation system consists of β-reduction and η-reduction (solid edges) or
2170       β-expansion and η-expansion (dotted edges).
2171
2172       \todo{Define β-reduction and η-reduction?}
2173
      Note that the normal form of such a system consists of the set of nodes
      (expressions) without outgoing edges, since those are the expressions to which
      no transformation applies anymore. We call this set of nodes the \emph{normal
      set}. The set of nodes containing expressions in intended normal
      form \refdef{intended normal form} is called the \emph{intended
      normal set}.
2180
2181       From such a graph, we can derive some properties easily:
2182       \startitemize[KR]
        \item A system will \emph{terminate} if there is no path of infinite length
        in the graph (every cycle gives rise to such a path, but in an unbounded
        graph infinite paths can also exist without cycles).
2185         \item Soundness is not easily represented in the graph.
        \item A system is \emph{complete} if all of the nodes in the normal set have
        the intended normal form. The inverse (that all of the nodes outside of
        the normal set are \emph{not} in the intended normal form) is not
        strictly required. In other words, our normal set must be a
        subset of the intended normal set, but they do not need to be
        the same set.
2193         \item A system is deterministic if all paths starting at a particular
2194         node, which end in a node in the normal set, end at the same node.
2195       \stopitemize
2196
      When looking at \in{example}[ex:TransformGraph], we see that the system
      terminates for both the reduction and expansion systems (but note that, for
      expansion, this is only true because we've limited the possible
      expressions. In the complete lambda calculus, there would be a path from
      \lam{(λx.λy. (+) x y) 1} to \lam{(λx.λy.(λz.(+) z) x y) 1} to
      \lam{(λx.λy.(λz.(λq.(+) q) z) x y) 1} etc.)
2203
2204       If we would consider the system with both expansion and reduction, there
2205       would no longer be termination either, since there would be cycles all
2206       over the place.
2207
      The reduction and expansion systems have a normal set containing just
      \lam{(+) 1} or \lam{(λx.λy. (+) x y) 1} respectively. Since all paths in
      either system end up in these normal forms, both systems are \emph{complete}.
      Also, since there is only one node in each normal set, both systems must
      obviously be \emph{deterministic} as well.
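
      The properties derived above can be checked mechanically for a finite
      graph such as the one in \in{example}[ex:TransformGraph]. The following
      Python sketch is an illustration only: the dictionary encoding of the
      graph and the function names \type{normal_set}, \type{terminates} and
      \type{deterministic} are assumptions made here, and the checks only work
      because the example graph is finite (the graph of all Core expressions is
      not).

```python
def normal_set(graph):
    """Nodes without outgoing edges: no transformation applies to them."""
    return {node for node, edges in graph.items() if not edges}


def terminates(graph):
    """A finite graph has no infinite path exactly when it has no cycle."""
    state = {}  # node -> "active" (on the current path) or "done"

    def acyclic_from(node):
        if state.get(node) == "done":
            return True
        if state.get(node) == "active":
            return False  # back edge: we found a cycle
        state[node] = "active"
        ok = all(acyclic_from(target) for _, target in graph[node])
        state[node] = "done"
        return ok

    return all(acyclic_from(node) for node in graph)


def reachable_normal_forms(graph, start):
    """Nodes of the normal set that can be reached from start."""
    seen, stack, found = set(), [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if not graph[node]:
            found.add(node)
        stack.extend(target for _, target in graph[node])
    return found


def deterministic(graph):
    """Every node reaches at most one node of the normal set."""
    return all(len(reachable_normal_forms(graph, n)) <= 1 for n in graph)


A, B, C, D = "(λx.λy. (+) x y) 1", "λy. (+) 1 y", "(λx.(+) x) 1", "(+) 1"
reduction = {A: [("β", B), ("η", C)], B: [("η", D)], C: [("β", D)], D: []}
expansion = {D: [("β", C), ("η", B)], C: [("η", A)], B: [("β", A)], A: []}
combined = {node: reduction[node] + expansion[node] for node in reduction}
```

      With these definitions, \type{terminates} holds for \type{reduction} and
      \type{expansion} but not for \type{combined}, and the normal sets are the
      single nodes named in the text.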
2213
2214     \todo{Add content to these sections}
2215     \subsection{Termination}
2216       In general, proving termination of an arbitrary program is a very
2217       hard problem. \todo{Ref about arbitrary termination} Fortunately,
2218       we only have to prove termination for our specific transformation
2219       system.
2220
      A common approach for these kinds of proofs is to associate a
      measure with each possible expression in our system. If we can
      show that each transformation strictly decreases this measure
      (\ie, the expression transformed to has a lower measure than the
      expression transformed from) and that the measure cannot decrease
      indefinitely (for example because it is a natural number), then
      only a finite number of transformations can be applied and the
      system must terminate. \todo{ref about measure-based
      termination proofs / analysis}
2227
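      A simple measure for a system consisting of just β-reduction would
      be the number of lambda abstractions in the expression. Every
      application of β-reduction removes one lambda abstraction, and there
      is always a bounded number of lambda abstractions in every
      expression. As long as the substituted argument is not copied (or
      contains no lambda abstractions itself), the measure strictly
      decreases and such a system will always terminate. When an argument
      that contains lambda abstractions is substituted more than once, this
      simple measure no longer decreases, so a more refined measure would
      be needed for that case.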
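
      To make the idea of a measure concrete, the following Python sketch
      counts lambda abstractions before and after a single β-reduction step.
      It is an illustration under simplifying assumptions: expressions are
      nested tuples (\type{("lam", x, body)} and \type{("app", f, arg)}),
      variables are strings, all bound names are distinct so that naive
      substitution is safe, and the function names are hypothetical. The last
      example shows that the count does not drop when the argument itself
      contains a lambda abstraction and is copied.

```python
def count_lambdas(expr):
    """Measure: the number of lambda abstractions in an expression."""
    if isinstance(expr, str):
        return 0
    if expr[0] == "lam":
        return 1 + count_lambdas(expr[2])
    return count_lambdas(expr[1]) + count_lambdas(expr[2])  # application


def substitute(expr, name, value):
    """Naive substitution; assumes all bound names are distinct."""
    if isinstance(expr, str):
        return value if expr == name else expr
    if expr[0] == "lam":
        return ("lam", expr[1], substitute(expr[2], name, value))
    return ("app",
            substitute(expr[1], name, value),
            substitute(expr[2], name, value))


def beta_step(expr):
    """Apply β-reduction at the root of ("app", ("lam", x, body), arg)."""
    _, (_, param, body), arg = expr
    return substitute(body, param, arg)


# (λx.λy. (+) x y) 1
example = ("app",
           ("lam", "x", ("lam", "y", ("app", ("app", "+", "x"), "y"))),
           "1")
reduced = beta_step(example)   # λy. (+) 1 y

# (λx. x x) (λz. z): the argument is copied by the substitution
copying = ("app", ("lam", "x", ("app", "x", "x")), ("lam", "z", "z"))
copied = beta_step(copying)    # (λz. z) (λz. z)

measures = [count_lambdas(e) for e in (example, reduced, copying, copied)]
```

      Here \type{measures} evaluates to \type{[2, 1, 2, 2]}: the measure drops
      for the first expression, but stays the same for the second one.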
2234
2235       For our complete system, this measure would be fairly complex
2236       (probably the sum of a lot of things). Since the (conditions on)
2237       our transformations are pretty complex, we would need to include
2238       both simple things like the number of let expressions as well as
2239       more complex things like the number of case expressions that are
2240       not yet in normal form.
2241
2242       No real attempt has been made at finding a suitable measure for
2243       our system yet.
2244
2245     \subsection{Soundness}
2246       Soundness is a property that can be proven for each transformation
2247       separately. Since our system only runs separate transformations
2248       sequentially, if each of our transformations leaves the
2249       \emph{meaning} of the expression unchanged, then the entire system
2250       will of course leave the meaning unchanged and is thus
2251       \emph{sound}.
2252
2253       The current prototype has only been verified in an ad-hoc fashion
2254       by inspecting (the code for) each transformation. A more formal
2255       verification would be more appropriate.
2256
2257       To be able to formally show that each transformation properly
2258       preserves the meaning of every expression, we require an exact
2259       definition of the \emph{meaning} of every expression, so we can
      compare them. Currently there seems to be no formal definition of
      the meaning or semantics of \GHC's core language; only informal
      descriptions are available.
2263
2264       It should be possible to have a single formal definition of
2265       meaning for Core for both normal Core compilation by \GHC and for
2266       our compilation to \VHDL. The main difference seems to be that in
2267       hardware every expression is always evaluated, while in software
2268       it is only evaluated if needed, but it should be possible to
2269       assign a meaning to core expressions that assumes neither.
2270
2271       Since each of the transformations can be applied to any
2272       subexpression as well, there is a constraint on our meaning
2273       definition: The meaning of an expression should depend only on the
2274       meaning of subexpressions, not on the expressions themselves. For
2275       example, the meaning of the application in \lam{f (let x = 4 in
2276       x)} should be the same as the meaning of the application in \lam{f
2277       4}, since the argument subexpression has the same meaning (though
2278       the actual expression is different).
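
      The constraint above can be illustrated with a toy meaning function in
      Python. This is a sketch only and not a proposed semantics for Core: the
      tuple encoding, the function \type{meaning} and the environment
      \type{env} are assumptions made for this example, and the evaluator is
      eager, which makes no difference for the total expressions used here.

```python
def meaning(expr, env):
    """Compositional meaning: built only from meanings of subexpressions."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    tag = expr[0]
    if tag == "let":
        _, name, value, body = expr
        return meaning(body, {**env, name: meaning(value, env)})
    if tag == "app":
        _, func, arg = expr
        return meaning(func, env)(meaning(arg, env))
    raise ValueError(f"unknown expression: {expr!r}")


env = {"f": lambda n: n + 1}              # an arbitrary meaning for f
with_let = ("app", "f", ("let", "x", 4, "x"))   # f (let x = 4 in x)
without_let = ("app", "f", 4)                   # f 4
same = meaning(with_let, env) == meaning(without_let, env)
```

      Because the application case only looks at the meaning of its argument,
      both applications receive the same meaning and \type{same} is true.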
2279
2280     \subsection{Completeness}
2281       Proving completeness is probably not hard, but it could be a lot
2282       of work. We have seen above that to prove completeness, we must
2283       show that the normal set of our graph representation is a subset
      However, it is hard to systematically generate or reason about the
      normal set, since it is defined as the set of nodes to which no
      transformation applies. To determine this set, each transformation
      must be considered and when a transformation is added, the entire
      set should be re-evaluated. This means it is hard to show that
      each node in the normal set is also in the intended normal set.
      Reasoning about our intended normal set is easier, since we know
      how to generate it from its definition \refdef{intended normal
      form definition}.
2295
      Fortunately, we can also prove the contrapositive (which is
      equivalent, since $A \subseteq B \Leftrightarrow \overline{B}
      \subseteq \overline{A}$): Show that the set of nodes not in
      intended normal form is a subset of the set of nodes not in normal
      form. In other words, show that for every expression that is not
      in intended normal form, there is at least one transformation
      that applies to it (since that means it is not in normal form
      either and since $A \subseteq C \Leftrightarrow \forall x (x \in A
      \rightarrow x \in C)$).
2305
      By systematically reviewing the entire Core language definition
      along with the intended normal form definition (both of which have
      a similar structure), it should be possible to identify all
      possible (sets of) Core expressions that are not in intended
      normal form and, for each of them, identify a transformation that
      applies to it.
2311
2312       This approach is especially useful for proving completeness of our
2313       system, since if expressions exist to which none of the
2314       transformations apply (\ie if the system is not yet complete), it
2315       is immediately clear which expressions these are and adding
2316       (or modifying) transformations to fix this should be relatively
2317       easy.
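
      On a small, finite set of expressions this approach can be sketched in
      Python as follows. Everything in this sketch is an assumption made for
      illustration: the tuple encoding of expressions, the toy intended normal
      form (\quote{contains no lambda abstraction}), the root-only η-reduction
      and the function names. It merely shows how expressions that break
      completeness show up directly.

```python
def contains_lambda(expr):
    """True when a lambda abstraction occurs anywhere in the expression."""
    if isinstance(expr, str):
        return False
    return expr[0] == "lam" or any(contains_lambda(e) for e in expr[1:])


def occurs_free(name, expr):
    """True when the variable occurs free in the expression."""
    if isinstance(expr, str):
        return expr == name
    if expr[0] == "lam":
        return expr[1] != name and occurs_free(name, expr[2])
    return occurs_free(name, expr[1]) or occurs_free(name, expr[2])


def eta_reduce(expr):
    """λx. f x → f at the root, when x is not free in f; otherwise None."""
    if isinstance(expr, tuple) and expr[0] == "lam":
        _, param, body = expr
        if (isinstance(body, tuple) and body[0] == "app"
                and body[2] == param and not occurs_free(param, body[1])):
            return body[1]
    return None


def completeness_gaps(expressions, in_intended_normal_form, transformations):
    """Expressions outside the intended normal form that nothing rewrites."""
    return [
        expr for expr in expressions
        if not in_intended_normal_form(expr)
        and all(transform(expr) is None for transform in transformations)
    ]


b = ("lam", "y", ("app", ("app", "+", "1"), "y"))   # λy. (+) 1 y
d = ("app", "+", "1")                               # (+) 1
identity = ("lam", "y", "y")                        # λy. y
gaps = completeness_gaps(
    [b, d, identity], lambda e: not contains_lambda(e), [eta_reduce])
```

      In this toy system \type{gaps} contains only \lam{λy. y}: it is not in
      the (toy) intended normal form, yet no transformation applies to it, so
      a transformation would have to be added or modified to handle it.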
2318
2319       As observed above, applying this approach is a lot of work, since
2320       we need to check every (set of) transformation(s) separately.
2321
2322       \todo{Perhaps do a few steps of the proofs as proof-of-concept}
2323
2324 % vim: set sw=2 sts=2 expandtab: