   1 \chapter[chap:normalization]{Normalization}
   2   % A helper to print a single example in the half the page width. The example
   3   % text should be in a buffer whose name is given in an argument.
   4   %
   5   % The align=right option really does left-alignment, but without it the program
   6   % will end up on a single line. The strut=no option prevents a bunch of empty
   7   % space at the start of the frame.
   8   \define[1]\example{
   9     \framed[offset=1mm,align=right,strut=no,background=box,frame=off]{
  10       \setuptyping[option=LAM,style=sans,before=,after=,strip=auto]
  11       \typebuffer[#1]
  12       \setuptyping[option=none,style=\tttf,strip=auto]
  13     }
  14   }
  15
  16   \define[4]\transexample{
  17     \placeexample[here][ex:trans:#1]{#2}
  18     \startcombination[2*1]
  19       {\example{#3}}{Original program}
  20       {\example{#4}}{Transformed program}
  21     \stopcombination
  22   }
  23
  24   The first step in the core to \small{VHDL} translation process is normalization. We
  25   aim to bring the core description into a simpler form, which we can
  26   subsequently translate into \small{VHDL} easily. This normal form is needed because
  27   the full core language is more expressive than \small{VHDL} in some
  28   areas (higher order expressions, limited polymorphism using type
  29   classes, etc.) and because core can describe expressions that do not
  30   have a direct hardware interpretation.
  31
  32   \section{Normal form}
  33     The transformations described here have a well-defined goal: to bring the
  34     program into a well-defined form that is directly translatable to
  35     \VHDL, while fully preserving the semantics of the program. We refer
  36     to this form as the \emph{normal form} of the program. The formal
  37     definition of this normal form is quite simple:
  38
  39     \placedefinition{}{\startboxed A program is in \emph{normal form} if none of the
  40     transformations from this chapter apply.\stopboxed}
  41
  42     Of course, this is an \quote{easy} definition of the normal form, since our
  43     program will end up in normal form automatically. The more interesting part is
  44     to see if this normal form actually has the properties we would like it to
  45     have.
  46
  47     But before getting into more definitions and details about this normal form,
  48     let's try to get a feeling for it first. The easiest way to do this is by
  49     describing the things we do not want to have in a normal form.
  50
  51     \startitemize
  52       \item Any \emph{polymorphism} must be removed. When laying down hardware, we
  53       can't generate any signals that can have multiple types. All types must be
  54       completely known to generate hardware.
  55
  56       \item All \emph{higher order} constructions must be removed. We can't
  57       generate a hardware signal that contains a function, so all values,
  58       arguments and return values used must be first order.
  59
  60       \item All complex \emph{nested scopes} must be removed. In the \small{VHDL}
  61       description, every signal is in a single scope. Also, full expressions are
  62       not supported everywhere (in particular port maps can only map signal
  63       names and constants, not complete expressions). To make the \small{VHDL}
  64       generation easy, a separate binder must be bound to every application or
  65       other expression.
  66     \stopitemize
  67
  68     \todo{Intermezzo: functions vs plain values}
  69
  70     A very simple example of a program in normal form is given in
  71     \in{example}[ex:MulSum]. As you can see, all arguments to the function (which
  72     will become input ports in the generated \VHDL) are at the outer level.
  73     This means that the body of the inner lambda abstraction is never a
  74     function, but always a plain value.
  75
  76     As the body of the inner lambda abstraction, we see a single (recursive)
  77     let expression, that binds two variables (\lam{mul} and \lam{sum}). These
  78     variables will be signals in the generated \VHDL, bound to the output port
  79     of the \lam{*} and \lam{+} components.
  80
  81     The final line (the \quote{return value} of the function) selects the
  82     \lam{sum} signal to be the output port of the function. This \quote{return
  83     value} can always only be a variable reference, never a more complex
  84     expression.
  85
  86     \todo{Add generated VHDL}
  87
  88     \startbuffer[MulSum]
  89     alu :: Word -> Word -> Word -> Word
  90     alu = λa.λb.λc.
  91         let
  92           mul = (*) a b
  93           sum = (+) mul c
  94         in
  95           sum
  96     \stopbuffer
  97
  98     \startuseMPgraphic{MulSum}
  99       save a, b, c, mul, add, sum;
 100
 101       % I/O ports
 102       newCircle.a(btex $a$ etex) "framed(false)";
 103       newCircle.b(btex $b$ etex) "framed(false)";
 104       newCircle.c(btex $c$ etex) "framed(false)";
 105       newCircle.sum(btex $sum$ etex) "framed(false)";
 106
 107       % Components
 108       newCircle.mul(btex * etex);
 109       newCircle.add(btex + etex);
 110
 111       a.c      - b.c   = (0cm, 2cm);
 112       b.c      - c.c   = (0cm, 2cm);
 113       add.c            = c.c + (2cm, 0cm);
 114       mul.c            = midpoint(a.c, b.c) + (2cm, 0cm);
 115       sum.c            = add.c + (2cm, 0cm);
 116       c.c              = origin;
 117
 118       % Draw objects and lines
 119       drawObj(a, b, c, mul, add, sum);
 120
 121       ncarc(a)(mul) "arcangle(15)";
 122       ncarc(b)(mul) "arcangle(-15)";
 123       ncline(c)(add);
 124       ncline(mul)(add);
 125       ncline(add)(sum);
 126     \stopuseMPgraphic
 127
 128     \placeexample[here][ex:MulSum]{Simple architecture consisting of a
 129     multiplier and an adder.}
 130       \startcombination[2*1]
 131         {\typebufferlam{MulSum}}{Core description in normal form.}
 132         {\boxedgraphic{MulSum}}{The architecture described by the normal form.}
 133       \stopcombination
 134
 135     \in{Example}[ex:MulSum] showed a function that just applied two
 136     other functions (multiplication and addition), resulting in a simple
 137     architecture with two components and some connections.  There is of
 138     course also some mechanism for choice in the normal form. In a
 139     normal Core program, the \emph{case} expression can be used in a few
 140     different ways to describe choice. In normal form, this is limited
 141     to a very specific form.
 142
 143     \in{Example}[ex:AddSubAlu] shows an example describing a
 144     simple \small{ALU}, which chooses between two operations based on an opcode
 145     bit. The main structure is similar to \in{example}[ex:MulSum], but this
 146     time the \lam{res} variable is bound to a case expression. This case
 147     expression scrutinizes the variable \lam{opcode} (and scrutinizing more
 148     complex expressions is not supported). The case expression can select a
 149     different variable based on the constructor of \lam{opcode}.
 150     \refdef{case expression}
 151
 152     \startbuffer[AddSubAlu]
 153     alu :: Bit -> Word -> Word -> Word
 154     alu = λopcode.λa.λb.
 155         let
 156           res1 = (+) a b
 157           res2 = (-) a b
 158           res = case opcode of
 159             Low -> res1
 160             High -> res2
 161         in
 162           res
 163     \stopbuffer
 164
 165     \startuseMPgraphic{AddSubAlu}
 166       save opcode, a, b, add, sub, mux, res;
 167
 168       % I/O ports
 169       newCircle.opcode(btex $opcode$ etex) "framed(false)";
 170       newCircle.a(btex $a$ etex) "framed(false)";
 171       newCircle.b(btex $b$ etex) "framed(false)";
 172       newCircle.res(btex $res$ etex) "framed(false)";
 173       % Components
 174       newCircle.add(btex + etex);
 175       newCircle.sub(btex - etex);
 176       newMux.mux;
 177
 178       opcode.c - a.c   = (0cm, 2cm);
 179       add.c    - a.c   = (4cm, 0cm);
 180       sub.c    - b.c   = (4cm, 0cm);
 181       a.c      - b.c   = (0cm, 3cm);
 182       mux.c            = midpoint(add.c, sub.c) + (1.5cm, 0cm);
 183       res.c    - mux.c = (1.5cm, 0cm);
 184       b.c              = origin;
 185
 186       % Draw objects and lines
 187       drawObj(opcode, a, b, res, add, sub, mux);
 188
 189       ncline(a)(add) "posA(e)";
 190       ncline(b)(sub) "posA(e)";
 191       nccurve(a)(sub) "posA(e)", "angleA(0)";
 192       nccurve(b)(add) "posA(e)", "angleA(0)";
 193       nccurve(add)(mux) "posB(inpa)", "angleB(0)";
 194       nccurve(sub)(mux) "posB(inpb)", "angleB(0)";
 195       nccurve(opcode)(mux) "posB(n)", "angleA(0)", "angleB(-90)";
 196       ncline(mux)(res) "posA(out)";
 197     \stopuseMPgraphic
 198
 199     \placeexample[here][ex:AddSubAlu]{Simple \small{ALU} supporting two operations.}
 200       \startcombination[2*1]
 201         {\typebufferlam{AddSubAlu}}{Core description in normal form.}
 202         {\boxedgraphic{AddSubAlu}}{The architecture described by the normal form.}
 203       \stopcombination
 204
 205     As a more complete example, consider
 206     \in{example}[ex:NormalComplete]. This example shows everything that
 207     is allowed in normal form, except for builtin higher order functions
 208     (like \lam{map}). The graphical version of the architecture contains
 209     a slightly simplified version, since the state tuple packing and
 210     unpacking have been left out. Instead, two separate registers are
 211     drawn. Also note that most synthesis tools will further optimize
 212     this architecture by removing the multiplexers at the register input
 213     and instead put some gates in front of the register's clock input,
 214     but we want to show the architecture as close to the description as
 215     possible.
 216
 217     As you can see from the previous examples, the generation of the final
 218     architecture from the normal form is straightforward. In each of the
 219     examples, there is a direct match between the normal form structure,
 220     the generated VHDL and the architecture shown in the images.
 221
 222     \startbuffer[NormalComplete]
 223       regbank :: Bit
 224                  -> Word
 225                  -> State (Word, Word)
 226                  -> (State (Word, Word), Word)
 227
 228       -- All arguments are bound by initial lambdas (address, data, packed state)
 229       regbank = λa.λd.λsp.
 230       -- There is a single, non-nested let expression at top level
 231       let
 232         -- Unpack the state by coercion (\eg, cast from
 233         -- State (Word, Word) to (Word, Word))
 234         s = sp ▶ (Word, Word)
 235         -- Extract both registers from the state
 236         r1 = case s of (a, b) -> a
 237         r2 = case s of (a, b) -> b
 238         -- Calling some other user-defined function.
 239         d' = foo d
 240         -- Conditional connections
 241         out = case a of
 242           High -> r1
 243           Low -> r2
 244         r1' = case a of
 245           High -> d'
 246           Low -> r1
 247         r2' = case a of
 248           High -> r2
 249           Low -> d'
 250         -- Packing a tuple
 251         s' = (,) r1' r2'
 252         -- pack the state by coercion (\eg, cast from
 253         -- (Word, Word) to State (Word, Word))
 254         sp' = s' ▶ State (Word, Word)
 255         -- Pack our return value
 256         res = (,) sp' out
 257       in
 258         -- The actual result
 259         res
 260     \stopbuffer
 261
 262     \startuseMPgraphic{NormalComplete}
 263       save a, d, r, foo, muxr, muxout, out;
 264
 265       % I/O ports
 266       newCircle.a(btex \lam{a} etex) "framed(false)";
 267       newCircle.d(btex \lam{d} etex) "framed(false)";
 268       newCircle.out(btex \lam{out} etex) "framed(false)";
 269       % Components
 270       %newCircle.add(btex + etex);
 271       newBox.foo(btex \lam{foo} etex);
 272       newReg.r1(btex $\lam{r1}$ etex) "dx(4mm)", "dy(6mm)";
 273       newReg.r2(btex $\lam{r2}$ etex) "dx(4mm)", "dy(6mm)", "reflect(true)";
 274       newMux.muxr1;
 275       % Reflect over the vertical axis
 276       reflectObj(muxr1)((0,0), (0,1));
 277       newMux.muxr2;
 278       newMux.muxout;
 279       rotateObj(muxout)(-90);
 280
 281       d.c               = foo.c + (0cm, 1.5cm);
 282       a.c               = (xpart r2.c + 2cm, ypart d.c - 0.5cm);
 283       foo.c             = midpoint(muxr1.c, muxr2.c) + (0cm, 2cm);
 284       muxr1.c           = r1.c + (0cm, 2cm);
 285       muxr2.c           = r2.c + (0cm, 2cm);
 286       r2.c              = r1.c + (4cm, 0cm);
 287       r1.c              = origin;
 288       muxout.c          = midpoint(r1.c, r2.c) - (0cm, 2cm);
 289       out.c             = muxout.c - (0cm, 1.5cm);
 290
 291     %  % Draw objects and lines
 292       drawObj(a, d, foo, r1, r2, muxr1, muxr2, muxout, out);
 293
 294       ncline(d)(foo);
 295       nccurve(foo)(muxr1) "angleA(-90)", "posB(inpa)", "angleB(180)";
 296       nccurve(foo)(muxr2) "angleA(-90)", "posB(inpb)", "angleB(0)";
 297       nccurve(muxr1)(r1) "posA(out)", "angleA(180)", "posB(d)", "angleB(0)";
 298       nccurve(r1)(muxr1) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(180)";
 299       nccurve(muxr2)(r2) "posA(out)", "angleA(0)", "posB(d)", "angleB(180)";
 300       nccurve(r2)(muxr2) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(0)";
 301       nccurve(r1)(muxout) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(-90)";
 302       nccurve(r2)(muxout) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(-90)";
 303       % Connect port a
 304       nccurve(a)(muxout) "angleA(-90)", "angleB(180)", "posB(sel)";
 305       nccurve(a)(muxr1) "angleA(180)", "angleB(-90)", "posB(sel)";
 306       nccurve(a)(muxr2) "angleA(180)", "angleB(-90)", "posB(sel)";
 307       ncline(muxout)(out) "posA(out)";
 308     \stopuseMPgraphic
 309
 310     \todo{Don't split registers in this image?}
 311     \placeexample[here][ex:NormalComplete]{Simple register bank architecture with two
 312     registers, selected by an address bit.}
 313       \startcombination[2*1]
 314         {\typebufferlam{NormalComplete}}{Core description in normal form.}
 315         {\boxedgraphic{NormalComplete}}{The architecture described by the normal form.}
 316       \stopcombination
 317
 318
 319
 320     \subsection[sec:normalization:intendednormalform]{Intended normal form definition}
 321       Now that we have some intuition for the normal form, we can describe what we want
 322       the normal form to look like in a slightly more formal manner. The following
 323       EBNF-like description captures most of the intended structure (and
 324       generates a subset of GHC's core format).
 325
 326       There are two things missing: cast expressions are sometimes
 327       allowed by the prototype but not specified here, and the definition
 328       below allows uses of state that cannot be translated to \VHDL
 329       properly. These two problems are discussed in
 330       \in{section}[sec:normalization:castproblems] and
 331       \in{section}[sec:normalization:stateproblems] respectively.
 332
 333       Some clauses have an expression listed behind them in parentheses.
 334       These are conditions that need to apply to the clause. The
 335       predicates used there (\lam{lvar()}, \lam{representable()},
 336       \lam{gvar()}) will be defined in
 337       \in{section}[sec:normalization:predicates].
 338
 339       An expression is in normal form if it matches the first
 340       definition, \emph{normal}.
 341
 342       \todo{Fix indentation}
 343       \startbuffer[IntendedNormal]
 344       \italic{normal} := \italic{lambda}
 345       \italic{lambda} := λvar.\italic{lambda}                        (representable(var))
 346                       | \italic{toplet}
 347       \italic{toplet} := letrec [\italic{binding}...] in var         (representable(var))
 348       \italic{binding} := var = \italic{rhs}                         (representable(rhs))
 349                        -- State packing and unpacking by coercion
 350                        | var0 = var1 ▶ State ty                      (lvar(var1))
 351                        | var0 = var1 ▶ ty                            (var1 :: State ty ∧ lvar(var1))
 352       \italic{rhs} := \italic{userapp}
 353                    | \italic{builtinapp}
 354                    -- Extractor case
 355                    | case var of C a0 ... an -> ai                   (lvar(var))
 356                    -- Selector case
 357                    | case var of                                     (lvar(var))
 358                       [ DEFAULT -> var ]                             (lvar(var))
 359                       C0 w0,0 ... w0,n -> var0
 360                       \vdots
 361                       Cm wm,0 ... wm,n -> varm                       (\forall{}i \forall{}j, wi,j \neq vari, lvar(vari))
 362       \italic{userapp} := \italic{userfunc}
 363                        | \italic{userapp} \italic{userarg}
 364       \italic{userfunc} := var                                       (gvar(var))
 365       \italic{userarg} := var                                        (lvar(var))
 366       \italic{builtinapp} := \italic{builtinfunc}
 367                           | \italic{builtinapp} \italic{builtinarg}
 368       \italic{builtinfunc} := var                                    (bvar(var))
 369       \italic{builtinarg} := var                                     (representable(var) ∧ lvar(var))
 370                           | \italic{partapp}                         (partapp :: a -> b)
 371                           | \italic{coreexpr}                        (¬representable(coreexpr) ∧ ¬(coreexpr :: a -> b))
 372       \italic{partapp} := \italic{userapp}
 373                        | \italic{builtinapp}
 374       \stopbuffer
 375
 376       \placedefinition[][def:IntendedNormal]{Definition of the intended normal form using an \small{EBNF}-like syntax.}
 377           {\defref{intended normal form definition}
 378            \typebufferlam{IntendedNormal}}
 379
 380       When looking at such a program from a hardware perspective, the
 381       top level lambda abstractions define the input ports. Lambda
 382       abstractions cannot appear anywhere else. The variable reference
 383       in the body of the recursive let expression is the output port.
 384       Most function applications bound by the let expression define a
 385       component instantiation, where the input and output ports are
 386       mapped to local signals or arguments. Some of the others use a
 387       builtin construction (\eg the \lam{case} expression) or call a
 388       builtin function (\eg \lam{+} or \lam{map}). For these, a
 389       hardcoded \small{VHDL} translation is available.
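
      As a rough illustration of this structure, the grammar can be mirrored in
      a small Haskell data type. This is only a sketch under simplifying
      assumptions: the names \type{Normal}, \type{Binding}, \type{Rhs},
      \type{usedVars} and \type{wellScoped} are hypothetical and not taken from
      the prototype, types and the \lam{representable()} conditions are left
      out, and function-valued arguments to builtin functions are not modelled.

      \starttyping
-- Hypothetical, simplified model of the intended normal form.
type Var = String

-- Top level lambda binders, letrec bindings and the result variable.
data Normal = Normal [Var] [Binding] Var
  deriving (Eq, Show)

data Binding = Binding Var Rhs
  deriving (Eq, Show)

data Rhs
  = UserApp Var [Var]            -- user-defined (global) function applied to local variables
  | BuiltinApp Var [Var]         -- builtin function applied to local variables
  | Extract Var String Int       -- extractor case: scrutinee, constructor, selected field index
  | Select Var [(String, Var)]   -- selector case: scrutinee, (constructor, result variable)
  deriving (Eq, Show)

-- Local variables referenced by a right hand side.
usedVars :: Rhs -> [Var]
usedVars (UserApp _ args)    = args
usedVars (BuiltinApp _ args) = args
usedVars (Extract v _ _)     = [v]
usedVars (Select v alts)     = v : map snd alts

-- Approximates the lvar() conditions: every variable that is used must be
-- bound by a top level lambda or by the letrec itself.
wellScoped :: Normal -> Bool
wellScoped (Normal args binds res) = all (`elem` locals) used
  where
    locals = args ++ [v | Binding v _ <- binds]
    used   = res : concatMap (\(Binding _ rhs) -> usedVars rhs) binds

-- The ALU from the AddSubAlu example, written in this model.
addSubAlu :: Normal
addSubAlu = Normal ["opcode", "a", "b"]
  [ Binding "res1" (BuiltinApp "+" ["a", "b"])
  , Binding "res2" (BuiltinApp "-" ["a", "b"])
  , Binding "res"  (Select "opcode" [("Low", "res1"), ("High", "res2")])
  ]
  "res"
      \stoptyping

      In this model, \type{wellScoped addSubAlu} evaluates to \type{True}: the
      lambda binders play the role of input ports, every let binding is a
      signal, and the result variable selects the output port.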
 390
 391   \section[sec:normalization:transformation]{Transformation notation}
 392     To be able to concisely present transformations, we use a specific format
 393     for them. It is a simple format, similar to one used in logic reasoning.
 394
 395     Such a transformation description looks like the following.
 396
 397     \starttrans
 398     <context conditions>
 399     ~
 400     <original expression>
 401     --------------------------          <expression conditions>
 402     <transformed expression>
 403     ~
 404     <context additions>
 405     \stoptrans
 406
 407     This format describes a transformation that applies to \lam{<original
 408     expression>} and transforms it into \lam{<transformed expression>}, assuming
 409     that all conditions are satisfied. In this format, there are a number of placeholders
 410     in pointy brackets, most of which should be rather obvious in their meaning.
 411     Nevertheless, we will more precisely specify their meaning below:
 412
 413       \startdesc{<original expression>} The expression pattern that will be matched
 414       against (subexpressions of) the expression to be transformed. We call this a
 415       pattern, because it can contain \emph{placeholders} (variables), which match
 416       any expression or binder. Any such placeholder is said to be \emph{bound} to
 417       the expression it matches. It is convention to use an uppercase letter (\eg
 418       \lam{M} or \lam{E}) to refer to any expression (including a simple variable
 419       reference) and lowercase letters (\eg \lam{v} or \lam{b}) to refer to
 420       (references to) binders.
 421
 422       For example, the pattern \lam{a + B} will match the expression
 423       \lam{v + (2 * w)} (binding \lam{a} to \lam{v} and \lam{B} to
 424       \lam{(2 * w)}), but not \lam{(2 * w) + v}.
 425       \stopdesc
 426
 427       \startdesc{<expression conditions>}
 428       These are extra conditions on the expression that is matched. These
 429       conditions can be used to further limit the cases in which the
 430       transformation applies, commonly to prevent a transformation from
 431       causing a loop with itself or another transformation.
 432
 433       The transformation applies only if \emph{all} of these conditions are
 434       satisfied.
 435       \stopdesc
 436
 437       \startdesc{<context conditions>}
 438       These are a number of extra conditions on the context of the function. In
 439       particular, these conditions can require some (other) top level function to be
 440       present, whose value matches the pattern given here. The format of each of
 441       these conditions is: \lam{binder = <pattern>}.
 442
 443       Typically, the binder is some placeholder bound in the \lam{<original
 444       expression>}, while the pattern contains some placeholders that are used in
 445       the \lam{<transformed expression>}.
 446
 447       The transformation applies only if, for each of these conditions, a top
 448       level binder exists that matches the binder and the pattern.
 449       \stopdesc
 450
 451       \startdesc{<transformed expression>}
 452       This is the expression template that is the result of the transformation. If, looking
 453       at the above three items, the transformation applies, the \lam{<original
 454       expression>} is completely replaced by the \lam{<transformed expression>}.
 455       We call this a template, because it can contain placeholders, referring to
 456       any placeholder bound by the \lam{<original expression>} or the
 457       \lam{<context conditions>}. The resulting expression will have those
 458       placeholders replaced by the values bound to them.
 459
 460       Any binder (lowercase) placeholder that has no value bound to it yet will be
 461       bound to (and replaced with) a fresh binder.
 462       \stopdesc
 463
 464       \startdesc{<context additions>}
 465       These are templates for new functions to be added to the context.
 466       This is a way to let a transformation create new top level
 467       functions.
 468
 469       Each addition has the form \lam{binder = template}. As above, any
 470       placeholder in the addition is replaced with the value bound to it, and any
 471       binder placeholder that has no value bound to it yet will be bound to (and
 472       replaced with) a fresh binder.
 473       \stopdesc
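
    The matching of an \lam{<original expression>} pattern against an
    expression can be sketched in Haskell as follows. This is a minimal
    illustration, not the prototype's implementation: the \type{Expr} type and
    the \type{match} function are hypothetical, placeholders are simply
    variables inside the pattern, and the sketch does not check that a
    placeholder occurring twice is bound to the same expression both times.

    \starttyping
import Data.Char (isUpper)

-- A tiny, hypothetical expression language, used for patterns as well.
data Expr
  = Var String              -- variable reference, or a placeholder inside a pattern
  | Lit Int
  | BinOp String Expr Expr  -- binary operator application, such as a + b
  deriving (Eq, Show)

-- Placeholders together with the expressions they are bound to.
type Bindings = [(String, Expr)]

-- match pattern expression: Nothing if the pattern does not match.
match :: Expr -> Expr -> Maybe Bindings
match (Var p@(c : _)) e
  | isUpper c  = Just [(p, e)]   -- uppercase placeholder: matches any expression
  | Var _ <- e = Just [(p, e)]   -- lowercase placeholder: matches binder references only
  | otherwise  = Nothing
match (Lit n) (Lit m)
  | n == m = Just []
match (BinOp op l r) (BinOp op' l' r')
  | op == op' = (++) <$> match l l' <*> match r r'
match _ _ = Nothing

-- The pattern a + B and the two expressions from the text.
pat, matching, nonMatching :: Expr
pat         = BinOp "+" (Var "a") (Var "B")
matching    = BinOp "+" (Var "v") (BinOp "*" (Lit 2) (Var "w"))
nonMatching = BinOp "+" (BinOp "*" (Lit 2) (Var "w")) (Var "v")
    \stoptyping

    Here \type{match pat matching} binds \type{a} to the variable \type{v} and
    \type{B} to the multiplication, while \type{match pat nonMatching} yields
    \type{Nothing}, because the lowercase placeholder \type{a} cannot be bound
    to a complex expression.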
 474
 475     To understand this notation better, the step by step application of
 476     the η-abstraction transformation to a simple \small{ALU} will be
 477     shown. Consider η-abstraction, described using the above notation as
 478     follows:
 479
 480     \starttrans
 481     E                 \lam{E :: a -> b}
 482     --------------    \lam{E} does not occur in function position in an application
 483     λx.E x            \lam{E} is not a lambda abstraction.
 484     \stoptrans
 485
 486     η-abstraction is a well known transformation from lambda calculus. What
 487     this transformation does, is take any expression that has a function type
 488     and turn it into a lambda expression (giving an explicit name to the
 489     argument). There are some extra conditions that ensure that this
 490     transformation does not apply infinitely (which are not necessarily part
 491     of the conventional definition of η-abstraction).
 492
 493     Consider the following function, in Core notation, which is a fairly obvious way to specify a
 494     simple \small{ALU} (note that it is not yet in normal form, but
 495     \in{example}[ex:AddSubAlu] shows the normal form of this function).
 496     The parentheses around the \lam{+} and \lam{-} operators are
 497     commonly used in Haskell to show that the operators are used as
 498     normal functions, instead of \emph{infix} operators (that is, the
 499     operators appear before their arguments, instead of in between).
 500
 501     \startlambda
 502     alu :: Bit -> Word -> Word -> Word
 503     alu = λopcode. case opcode of
 504       Low -> (+)
 505       High -> (-)
 506     \stoplambda
 507
 508     There are a few subexpressions in this function to which we could possibly
 509     apply the transformation. Since the pattern of the transformation is only
 510     the placeholder \lam{E}, any expression will match that. Whether the
 511     transformation applies to an expression is thus solely decided by the
 512     conditions to the right of the transformation.
 513
 514     We will look at each expression in the function in a top down manner. The
 515     first expression is the entire expression the function is bound to.
 516
 517     \startlambda
 518     λopcode. case opcode of
 519       Low -> (+)
 520       High -> (-)
 521     \stoplambda
 522
 523     As said, the expression pattern matches this. The type of this expression is
 524     \lam{Bit -> Word -> Word -> Word}, which matches \lam{a -> b} (Note that in
 525     this case \lam{a = Bit} and \lam{b = Word -> Word -> Word}).
 526
 527     Since this expression is at top level, it does not occur at a function
 528     position of an application. However, the expression is a lambda abstraction,
 529     so this transformation does not apply.
 530
 531     The next expression we could apply this transformation to, is the body of
 532     the lambda abstraction:
 533
 534     \startlambda
 535     case opcode of
 536       Low -> (+)
 537       High -> (-)
 538     \stoplambda
 539
 540     The type of this expression is \lam{Word -> Word -> Word}, which again
 541     matches \lam{a -> b}. The expression is the body of a lambda expression, so
 542     it does not occur at a function position of an application. Finally, the
 543     expression is not a lambda abstraction but a case expression, so all the
 544     conditions match. There are no context conditions to match, so the
 545     transformation applies.
 546
 547     By now, the placeholder \lam{E} is bound to the entire expression. The
 548     placeholder \lam{x}, which occurs in the replacement template, is not bound
 549     yet, so we need to generate a fresh binder for that. Let's use the binder
 550     \lam{a}. This results in the following replacement expression:
 551
 552     \startlambda
 553     λa.(case opcode of
 554       Low -> (+)
 555       High -> (-)) a
 556     \stoplambda
 557
 558     Continuing with this expression, we see that the transformation does not
 559     apply again (it is a lambda expression). Next we look at the body of this
 560     lambda abstraction:
 561
 562     \startlambda
 563     (case opcode of
 564       Low -> (+)
 565       High -> (-)) a
 566     \stoplambda
 567
 568     Here, the transformation does apply, binding \lam{E} to the entire
 569     expression (which has type \lam{Word -> Word}) and binding \lam{x}
 570     to the fresh binder \lam{b}, resulting in the replacement:
 571
 572     \startlambda
 573     λb.(case opcode of
 574       Low -> (+)
 575       High -> (-)) a b
 576     \stoplambda
 577
 578     The transformation does not apply to this lambda abstraction, so we
 579     look at its body. For brevity, we'll put the case expression on one line from
 580     now on.
 581
 582     \startlambda
 583     (case opcode of Low -> (+); High -> (-)) a b
 584     \stoplambda
 585
 586     The type of this expression is \lam{Word}, so it does not match \lam{a -> b}
 587     and the transformation does not apply. Next, we have two options for the
 588     next expression to look at: The function position and argument position of
 589     the application. The expression in the argument position is \lam{b}, which
 590     has type \lam{Word}, so the transformation does not apply. The expression in
 591     the function position is:
 592
 593     \startlambda
 594     (case opcode of Low -> (+); High -> (-)) a
 595     \stoplambda
 596
 597     The transformation does not apply here, since the expression occurs in
 598     function position (which makes the second condition false). It does not
 599     apply to either component of this expression: the case expression
 600     \lam{case opcode of Low -> (+); High -> (-)} again occurs in function
 601     position, and \lam{a} does not have a function type. We therefore skip to
 602     the components of the case expression: the scrutinee and both alternatives.
 603     Since \lam{opcode} does not have a function type, the transformation does not apply to it.
 604
 605     The first alternative is \lam{(+)}. This expression has a function type
 606     (the operator still needs two arguments). It does not occur in function
 607     position of an application and it is not a lambda expression, so the
 608     transformation applies.
 609
 610     We look at the \lam{<original expression>} pattern, which is \lam{E}.
 611     This means we bind \lam{E} to \lam{(+)}. We then replace the expression
 612     with the \lam{<transformed expression>}, replacing all occurrences of
 613     \lam{E} with \lam{(+)} and binding the placeholder \lam{x} to a fresh binder, say \lam{a1}. This gives us the replacement expression:
 614     \lam{λa1.(+) a1} (a lambda expression binding \lam{a1}, with a body that
 615     applies the addition operator to \lam{a1}).
 616
 617     The expression in function position then becomes:
 618     \startlambda
 619     (case opcode of Low -> λa1.(+) a1; High -> (-)) a
 620     \stoplambda
 621
 622     Now the transformation no longer applies to the complete first alternative
 623     (since it is a lambda expression). It does not apply to the addition
 624     operator again, since it is now in function position in an application. It
 625     does, however, apply to the application of the addition operator, since
 626     that is neither a lambda expression nor does it occur in function
 627     position. This means that after one more application of the transformation, the
 628     expression becomes:
 629
 630     \startlambda
 631     (case opcode of Low -> λa1.λb1.(+) a1 b1; High -> (-)) a
 632     \stoplambda
 633
 634     The other alternative is left as an exercise to the reader. The final
 635     function, after applying η-abstraction until it no longer applies, is:
 636
 637     \startlambda
 638     alu :: Bit -> Word -> Word -> Word
 639     alu = λopcode.λa.λb. (case opcode of
 640       Low -> λa1.λb1. (+) a1 b1
 641       High -> λa2.λb2. (-) a2 b2) a b
 642     \stoplambda
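
    The walk-through above can be reproduced with a small Haskell sketch. It is
    an illustration under stated assumptions, not the prototype's code: the
    \type{Type} and \type{Expr} types, \type{typeOf} and \type{etaExpand} are
    hypothetical, every variable carries its type explicitly so that no type
    inference is needed, and fresh binders are simply named \type{x0},
    \type{x1}, and so on, assuming these names are not yet in use.

    \starttyping
import Data.List (mapAccumL)

data Type = Bit | Word | Fun Type Type
  deriving (Eq, Show)

data Expr
  = Var String Type               -- variable reference, annotated with its type
  | Lam String Type Expr          -- lambda abstraction with the binder's type
  | App Expr Expr
  | Case Expr [(String, Expr)]    -- scrutinee and (constructor, alternative) pairs
  deriving (Eq, Show)

typeOf :: Expr -> Type
typeOf (Var _ t)   = t
typeOf (Lam _ t b) = Fun t (typeOf b)
typeOf (App f _)   = case typeOf f of
  Fun _ r -> r
  t       -> error ("applied a non-function of type " ++ show t)
typeOf (Case _ ((_, alt) : _)) = typeOf alt
typeOf (Case _ [])             = error "case without alternatives"

isLam :: Expr -> Bool
isLam (Lam {}) = True
isLam _        = False

-- etaExpand inFunPos n e applies η-abstraction top-down wherever its three
-- conditions hold. inFunPos tells whether e occurs in function position of
-- an application; n is the number of the next fresh binder.
etaExpand :: Bool -> Int -> Expr -> (Int, Expr)
etaExpand inFunPos n e
  | Fun a _ <- typeOf e, not inFunPos, not (isLam e) =
      let x = "x" ++ show n
      in etaExpand False (n + 1) (Lam x a (App e (Var x a)))
  | otherwise = case e of
      Var _ _   -> (n, e)
      Lam x t b -> let (n', b') = etaExpand False n b in (n', Lam x t b')
      App f arg ->
        let (n1, f')   = etaExpand True n f
            (n2, arg') = etaExpand False n1 arg
        in (n2, App f' arg')
      Case s alts ->
        let (n1, s')    = etaExpand False n s
            (n2, alts') = mapAccumL expandAlt n1 alts
        in (n2, Case s' alts')
  where
    expandAlt k (con, alt) =
      let (k', alt') = etaExpand False k alt in (k', (con, alt'))

-- The ALU from the text, before η-abstraction.
alu :: Expr
alu = Lam "opcode" Bit (Case (Var "opcode" Bit)
  [ ("Low",  Var "+" binop)
  , ("High", Var "-" binop) ])
  where binop = Fun Word (Fun Word Word)
    \stoptyping

    Evaluating \type{etaExpand False 0 alu} introduces six fresh binders and
    yields an expression of the same shape as the final function above, with
    \type{x0} and \type{x1} in place of \lam{a} and \lam{b}, and \type{x2} to
    \type{x5} in place of the binders in the two alternatives. The type of the
    expression is unchanged by the transformation.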
 643
 644     \subsection{Transformation application}
 645       In this chapter we define a number of transformations, but how will we apply
 646       these? As stated before, our normal form is reached as soon as no
 647       transformation applies anymore. This means our application strategy is to
 648       simply apply any transformation that applies, and to continue doing that with
 649       the result of each transformation.
 650
 651       In particular, we define no particular order of transformations. Since
 652       transformation order should not influence the resulting normal form,
 653       this leaves the implementation free to choose any application order that
 654       results in an efficient implementation. Unfortunately this is not
 655       entirely true for the current set of transformations. See
 656       \in{section}[sec:normalization:non-determinism] for a discussion of this
 657       problem.
 658
 659       When applying a single transformation, we try to apply it to every (sub)expression
 660       in a function, not just the top level function body. This allows us to
 661       keep the transformation descriptions concise and powerful.
 662
 663     \subsection{Definitions}
 664       A \emph{global variable} is any variable (binder) that is bound at the
 665       top level of a program, or in an external module. A \emph{local variable} is any
 666       other variable (\eg, variables local to a function, which can be bound by
 667       lambda abstractions, let expressions and pattern matches of case
 668       alternatives). This is a slightly different notion of global versus
 669       local than what \small{GHC} uses internally, but for our purposes
 670       the distinction \GHC makes is not useful.
 671       \defref{global variable} \defref{local variable}
 672
 673       A \emph{hardware representable} (or just \emph{representable}) type or value
 674       is (a value of) a type that we can generate a signal for in hardware. For
 675       example, a bit, a vector of bits, a 32-bit unsigned word, etc. Values that are
 676       not runtime representable notably include (but are not limited to) types,
 677       dictionaries and functions.
 678       \defref{representable}
 679
 680       A \emph{builtin function} is a function supplied by the Cλash framework, whose
 681       implementation is not valid Cλash. The implementation is of course valid
 682       Haskell, for simulation, but it is not expressible in Cλash.
 683       \defref{builtin function} \defref{user-defined function}
 684
 685       For these functions, Cλash has a \emph{builtin hardware translation}, so calls
 686       to these functions can still be translated. These are functions like
 687       \lam{map}, \lam{hwor} and \lam{length}.
 688
 689       A \emph{user-defined} function is a function for which we do have a Cλash
 690       implementation available.
 691
 692       \subsubsection[sec:normalization:predicates]{Predicates}
 693         Here, we define a number of predicates that can be used below to concisely
 694         specify conditions.
 695
 696         \emph{gvar(expr)} is true when \emph{expr} is a variable that references a
 697         global variable. It is false when it references a local variable.
 698
 699         \emph{lvar(expr)} is the complement of \emph{gvar}; it is true when \emph{expr}
 700         references a local variable, false when it references a global variable.
 701
 702         \emph{representable(expr)} is true when \emph{expr} is \emph{representable}.
 703
 704     \subsection[sec:normalization:uniq]{Binder uniqueness}
 705       A common problem in transformation systems is binder uniqueness. When not
 706       considering this problem, it is easy to create transformations that mix up
 707       bindings and cause name collisions. Take, for example, the following core
 708       expression:
 709
 710       \startlambda
 711       (λa.λb.λc. a * b * c) x c
 712       \stoplambda
 713
 714       By applying β-reduction (see \in{section}[sec:normalization:beta]) once,
 715       we can simplify this expression to:
 716
 717       \startlambda
 718       (λb.λc. x * b * c) c
 719       \stoplambda
 720
 721       Now, we have replaced the \lam{a} binder with a reference to the \lam{x}
 722       binder. No harm done here. But note that we see multiple occurrences of the
 723       \lam{c} binder. The first is a binding occurrence, to which the second refers.
 724       The last, however, refers to \emph{another} instance of \lam{c}, which is
 725       bound somewhere outside of this expression. Now, if we applied β-reduction
 726       again without taking heed of binder uniqueness, we would get:
 727
 728       \startlambda
 729       λc. x * c * c
 730       \stoplambda
 731
 732       This is obviously not what was supposed to happen! The root of this problem is
 733       the reuse of binders: Identical binders can be bound in different,
 734       but overlapping scopes. Any variable reference in those
 735       overlapping scopes then refers to the variable bound in the inner
 736       (smallest) scope. There is no way to refer to the variable in the
 737       outer scope. This effect is usually referred to as
 738       \emph{shadowing}: When a binder is bound in a scope where the
 739       binder already had a value, the inner binding is said to
 740       \emph{shadow} the outer binding. In the example above, the \lam{c}
 741       binder was bound outside of the expression and in the inner lambda
 742       expression. Inside that lambda expression, only the inner \lam{c}
 743       can be accessed.
 744
 745       There are a number of ways to solve this. \small{GHC} has isolated this
 746       problem to its binder substitution code, which performs \emph{deshadowing}
 747       during its expression traversal. This means that any binding that shadows
 748       another binding on a higher level is replaced by a new binder that does not
 749       shadow any other binding. This non-shadowing invariant is enough to prevent
 750       binder uniqueness problems in \small{GHC}.
 751
 752       In our transformation system, maintaining this non-shadowing invariant is
 753       a bit harder to do (mostly due to implementation issues: the prototype does not
 754       use \small{GHC}'s substitution code). Also, the following points can be
 755       observed.
 756
 757       \startitemize
 758       \item Deshadowing does not guarantee overall uniqueness. For example, the
 759       following (slightly contrived) expression shows the identifier \lam{x} bound in
 760       two separate places (and to different values), even though no shadowing
 761       occurs.
 762
 763       \startlambda
 764       (let x = 1 in x) + (let x = 2 in x)
 765       \stoplambda
 766
 767       \item In our normal form (and the resulting \small{VHDL}), all binders
 768       (signals) within the same function (entity) will end up in the same
 769       scope. To allow this, all binders within the same function should be
 770       unique.
 771
 772       \item When we know that all binders in an expression are unique, moving around
 773       or removing a subexpression will never cause any binder conflicts. If we have
 774       some way to generate fresh binders, introducing new subexpressions will not
 775       cause any problems either. The only way to cause conflicts is thus to
 776       duplicate an existing subexpression.
 777       \stopitemize
 778
 779       Given the above, our prototype maintains a unique binder invariant. This
 780       means that at any given moment during normalization, all binders \emph{within
 781       a single function} must be unique. To achieve this, we apply the following
 782       technique.
 783
 784       \todo{Define fresh binders and unique supplies}
 785
 786       \startitemize
 787       \item Before starting normalization, all binders in the function are made
 788       unique. This is done by generating a fresh binder for every binder used. This
 789       also replaces binders that did not cause any conflict, but it does ensure that
 790       all binders within the function are generated by the same unique supply.
 791       \refdef{fresh binder}
 792       \item Whenever a new binder must be generated, we generate a fresh binder that
 793       is guaranteed to be different from \emph{all binders generated so far}. This
 794       can thus never introduce duplication and will maintain the invariant.
 795       \item Whenever (a part of) an expression is duplicated (for example when
 796       inlining), all binders in the expression are replaced with fresh binders
 797       (using the same method as at the start of normalization). These fresh binders
 798       can never introduce duplication, so this will maintain the invariant.
 799       \item Whenever we move part of an expression around within the function, there
 800       is no need to do anything special. There is obviously no way to introduce
 801       duplication by moving expressions around. Since we know that each of the
 802       binders is already unique, there is no way to introduce (incorrect) shadowing
 803       either.
 804       \stopitemize
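
      As an illustration of this technique, consider again the β-reduction
      example from the start of this section. The binder names \lam{a1},
      \lam{b1} and \lam{c1} below are only assumed names for the fresh
      binders; the names actually generated by the prototype will differ.
      After making all binders unique, the expression reads:

      \startlambda
      (λa1.λb1.λc1. a1 * b1 * c1) x c
      \stoplambda

      Applying β-reduction twice now gives the intended result, in which the
      free variable \lam{c} is no longer captured by the inner lambda:

      \startlambda
      λc1. x * c * c1
      \stoplambda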
 805
 806   \section{Transform passes}
 807     In this section we describe the actual transforms.
 808
 809     Each transformation will be described informally first, explaining
 810     the need for and goal of the transformation. Then, we will formally define
 811     the transformation using the syntax introduced in
 812     \in{section}[sec:normalization:transformation].
 813
 814     \subsection{General cleanup}
 815       These transformations are general cleanup transformations, that aim to
 816       make expressions simpler. These transformations usually clean up the
 817        mess left behind by other transformations or clean up expressions to
 818        expose new transformation opportunities for other transformations.
 819
 820        Most of these transformations are standard optimizations in other
 821        compilers as well. However, in our compiler, most of these are not just
 822        optimizations, but they are required to get our program into the intended
 823        normal form.
 824
 825         \placeintermezzo{}{
 826           \defref{substitution notation}
 827           \startframedtext[width=8cm,background=box,frame=no]
 828           \startalignment[center]
 829             {\tfa Substitution notation}
 830           \stopalignment
 831           \blank[medium]
 832
 833           In some of the transformations in this chapter, we need to perform
 834           substitution on an expression. Substitution means replacing every
 835           occurrence of some expression (usually a variable reference) with
 836           another expression.
 837
 838           Many different notations for substitution have been used in the
 839           literature. The notation that will be used in this report
 840           is the following:
 841
 842           \startlambda
 843             E[A=>B]
 844           \stoplambda
 845
 846           This means expression \lam{E} with all occurrences of \lam{A} replaced
 847           with \lam{B}.
 848           \stopframedtext
 849         }
 850
 851       \subsubsection[sec:normalization:beta]{β-reduction}
 852         β-reduction is a well-known transformation from lambda calculus, where it is
 853         the main reduction step. It reduces applications of lambda abstractions,
 854         removing both the lambda abstraction and the application.
 855
 856         In our transformation system, this step helps to remove unwanted lambda
 857         abstractions (basically all but the ones at the top level). Other
 858         transformations (application propagation, non-representable inlining) make
 859         sure that most lambda abstractions will eventually be reducible by
 860         β-reduction.
 861
 862         Note that β-reduction also works on type lambda abstractions and type
 863         applications. This means the substitution below also works on
 864         type variables, in the case that the binder is a type variable and the
 865         expression applied to is a type.
 866
 867         \starttrans
 868         (λx.E) M
 869         -----------------
 870         E[x=>M]
 871         \stoptrans
 872
 873         % And an example
 874         \startbuffer[from]
 875         (λa. 2 * a) (2 * b)
 876         \stopbuffer
 877
 878         \startbuffer[to]
 879         2 * (2 * b)
 880         \stopbuffer
 881
 882         \transexample{beta}{β-reduction}{from}{to}
 883
 884         \startbuffer[from]
 885         (λt.λa::t. a) @Int
 886         \stopbuffer
 887
 888         \startbuffer[to]
 889         (λa::Int. a)
 890         \stopbuffer
 891
 892         \transexample{beta-type}{β-reduction for type abstractions}{from}{to}
 893
 894       \subsubsection{Empty let removal}
 895         This transformation is simple: It removes recursive lets that have no bindings
 896         (which usually occurs when unused let binding removal removes the last
 897         binding from it).
 898
 899         Note that there is no need to define this transformation for
 900         non-recursive lets, since they always contain exactly one binding.
 901
 902         \starttrans
 903         letrec in M
 904         --------------
 905         M
 906         \stoptrans
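
        As a small illustration (the body \lam{add a b} is an arbitrary
        choice, not taken from the prototype), an empty recursive let is
        simply replaced by its body:

        % Illustrative example; the expression and the example label are chosen here.
        \startbuffer[from]
        letrec in
          add a b
        \stopbuffer

        \startbuffer[to]
        add a b
        \stopbuffer

        \transexample{emptylet}{Empty let removal}{from}{to}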
 907
 908         \todo{Example}
 909
 910       \subsubsection[sec:normalization:simplelet]{Simple let binding removal}
 911         This transformation inlines simple let bindings, that bind some
 912         binder to some other binder instead of a more complex expression (\ie
 913         a = b).
 914
 915         This transformation is not needed to get an expression into the intended
 916         normal form (since these bindings are part of the intended normal
 917         form), but makes the resulting \small{VHDL} a lot shorter.
 918
 919         \refdef{substitution notation}
 920         \starttrans
 921         letrec
 922           a0 = E0
 923           \vdots
 924           ai = b
 925           \vdots
 926           an = En
 927         in
 928           M
 929         -----------------------------  \lam{b} is a variable reference
 930         letrec                         \lam{ai} ≠ \lam{b}
 931           a0 = E0 [ai=>b]
 932           \vdots
 933           ai-1 = Ei-1 [ai=>b]
 934           ai+1 = Ei+1 [ai=>b]
 935           \vdots
 936           an = En [ai=>b]
 937         in
 938           M[ai=>b]
 939         \stoptrans
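
        The rule above can be illustrated with a small, made-up program (the
        binder names and the functions \lam{add} and \lam{mul} are assumptions
        for the sake of the example). The binding \lam{b = a} is removed and
        every reference to \lam{b} is replaced by \lam{a}:

        % Illustrative example; the expression and the example label are chosen here.
        \startbuffer[from]
        letrec
          a = add x y
          b = a
          c = mul b b
        in
          c
        \stopbuffer

        \startbuffer[to]
        letrec
          a = add x y
          c = mul a a
        in
          c
        \stopbuffer

        \transexample{simplelet}{Simple let binding removal}{from}{to}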
 940
 941         \todo{example}
 942
 943       \subsubsection{Unused let binding removal}
 944         This transformation removes let bindings that are never used.
 945         Occasionally, \GHC's desugarer introduces some unused let bindings.
 946
 947         This normalization pass should not really be necessary to get
 948         into the intended normal form (since the intended normal form
 949         definition \refdef{intended normal form definition} does not
 950         require that every binding is used), but in practice the
 951         desugarer or simplifier emits some bindings that cannot be
 952         normalized (e.g., calls to a
 953         \hs{Control.Exception.Base.patError}) but are not used anywhere
 954         either. To prevent the \VHDL generation from breaking on these
 955         artifacts, this transformation removes them.
 956
 957         \todo{Don't use old-style numerals in transformations}
 958         \starttrans
 959         letrec
 960           a0 = E0
 961           \vdots
 962           ai = Ei
 963           \vdots
 964           an = En
 965         in
 966           M                             \lam{ai} does not occur free in \lam{M}
 967         ----------------------------    \forall j, 0 ≤ j ≤ n, j ≠ i (\lam{ai} does not occur free in \lam{Ej})
 968         letrec
 969           a0 = E0
 970           \vdots
 971           ai-1 = Ei-1
 972           ai+1 = Ei+1
 973           \vdots
 974           an = En
 975         in
 976           M
 977         \stoptrans
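
        A minimal sketch of this rule, using a made-up program (the binder
        names and the functions \lam{add} and \lam{mul} are assumptions): the
        binding of \lam{b} occurs free neither in the body nor in any other
        binding, so it is removed.

        % Illustrative example; the expression and the example label are chosen here.
        \startbuffer[from]
        letrec
          a = add x y
          b = mul x y
        in
          a
        \stopbuffer

        \startbuffer[to]
        letrec
          a = add x y
        in
          a
        \stopbuffer

        \transexample{unusedlet}{Unused let binding removal}{from}{to}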
 978
 979         \todo{Example}
 980
 981       \subsubsection{Cast propagation / simplification}
 982         This transform pushes casts down into the expression as far as
 983         possible. This transformation has been added to make a few
 984         specific corner cases work, but it is not clear yet if this
 985         transformation handles cast expressions completely or in the
 986         right way. See \in{section}[sec:normalization:castproblems].
 987
 988         \starttrans
 989         (let binds in E) ▶ T
 990         -------------------------
 991         let binds in (E ▶ T)
 992         \stoptrans
 993
 994         \starttrans
 995         (case S of
 996           p0 -> E0
 997           \vdots
 998           pn -> En
 999         ) ▶ T
1000         -------------------------
1001         case S of
1002           p0 -> E0 ▶ T
1003           \vdots
1004           pn -> En ▶ T
1005         \stoptrans
1006
1007       \subsubsection{Top level binding inlining}
1008         \refdef{top level binding}
1009         This transform inlines simple top level bindings generated by
1010         \small{GHC}. \small{GHC} sometimes generates very simple
1011         \quote{wrapper} bindings, which are bound to just a variable
1012         reference, or contain just a (partial) function application with
1013         the type and dictionary arguments filled in (such as the
1014         \lam{(+)} in the example below).
1015
1016         Note that this transformation is completely optional. It is not
1017         required to get any function into the intended normal form, but it does help to make
1018         the resulting \small{VHDL} output easier to read (since it removes a bunch of
1019         components that are really boring).
1020
1021         This transform takes any top level binding generated by \GHC,
1022         whose normalized form contains only a single let binding.
1023
1024         \starttrans
1025         x = λa0 ... λan.let y = E in y
1026         ~
1027         x
1028         --------------------------------------         \lam{x} is generated by the compiler
1029         λa0 ... λan.let y = E in y
1030         \stoptrans
1031
1032         \startbuffer[from]
1033         (+) :: Word -> Word -> Word
1034         (+) = GHC.Num.(+) @Word \$dNum
1035         ~
1036         (+) a b
1037         \stopbuffer
1038         \startbuffer[to]
1039         GHC.Num.(+) @ Alu.Word \$dNum a b
1040         \stopbuffer
1041
1042         \transexample{toplevelinline}{Top level binding inlining}{from}{to}
1043
1044         \in{Example}[ex:trans:toplevelinline] shows a typical application of
1045         the addition operator generated by \GHC. The type and dictionary
1046         arguments used here are described in
1047         \in{Section}[section:prototype:polymorphism].
1048
1049         Without this transformation, there would be a \lam{(+)} entity
1050         in the \VHDL which would just add its inputs. This generates a
1051         lot of overhead in the \VHDL, which is particularly annoying
1052         when browsing the generated RTL schematic (especially since most
1053         non-alphanumerics, like all characters in \lam{(+)}, are not
1054         allowed in \VHDL architecture names\footnote{Technically, it is
1055         allowed to use non-alphanumerics when using extended
1056         identifiers, but it seems that none of the tooling likes
1057         extended identifiers in filenames, so it effectively doesn't
1058         work.}, so the entity would be called \quote{w7aA7f} or
1059         something similarly unreadable and autogenerated).
1060
1061     \subsection{Program structure}
1062       These transformations are aimed at normalizing the overall structure
1063       into the intended form. This means ensuring there is a lambda abstraction
1064       at the top for every argument (input port or current state), putting all
1065       of the other value definitions in let bindings and making the final
1066       return value a simple variable reference.
1067
1068       \subsubsection[sec:normalization:eta]{η-abstraction}
1069         This transformation makes sure that all arguments of a function-typed
1070         expression are named, by introducing lambda expressions. When combined with
1071         β-reduction and non-representable binding inlining, all function-typed
1072         expressions should be lambda abstractions or global identifiers.
1073
1074         \starttrans
1075         E                 \lam{E :: a -> b}
1076         --------------    \lam{E} does not occur in function position in an application
1077         λx.E x            \lam{E} is not a lambda abstraction.
1078         \stoptrans
1079
1080         \startbuffer[from]
1081         foo = λa.case a of
1082           True -> λb.mul b b
1083           False -> id
1084         \stopbuffer
1085
1086         \startbuffer[to]
1087         foo = λa.λx.(case a of
1088             True -> λb.mul b b
1089             False -> λy.id y) x
1090         \stopbuffer
1091
1092         \transexample{eta}{η-abstraction}{from}{to}
1093
1094       \subsubsection[sec:normalization:appprop]{Application propagation}
1095         This transformation is meant to propagate application expressions downwards
1096         into expressions as far as possible. This allows partial applications inside
1097         expressions to become fully applied and exposes new transformation
1098         opportunities for other transformations (like β-reduction and
1099         specialization).
1100
1101         Since all binders in our expression are unique (see
1102         \in{section}[sec:normalization:uniq]), there is no risk that we will
1103         introduce unintended shadowing by moving an expression into a lower
1104         scope. Also, since we only move expressions into smaller scopes (down into
1105         our expression), there is no risk of moving a variable reference out
1106         of the scope in which it is defined.
1107
1108         \starttrans
1109         (letrec binds in E) M
1110         ------------------------
1111         letrec binds in E M
1112         \stoptrans
1113
1114         % And an example
1115         \startbuffer[from]
1116         ( letrec
1117             val = 1
1118           in
1119             add val
1120         ) 3
1121         \stopbuffer
1122
1123         \startbuffer[to]
1124         letrec
1125           val = 1
1126         in
1127           add val 3
1128         \stopbuffer
1129
1130         \transexample{appproplet}{Application propagation for a let expression}{from}{to}
1131
1132         \starttrans
1133         (case x of
1134           p0 -> E0
1135           \vdots
1136           pn -> En) M
1137         -----------------
1138         case x of
1139           p0 -> E0 M
1140           \vdots
1141           pn -> En M
1142         \stoptrans
1143
1144         % And an example
1145         \startbuffer[from]
1146         ( case x of
1147             True -> id
1148             False -> neg
1149         ) 1
1150         \stopbuffer
1151
1152         \startbuffer[to]
1153         case x of
1154           True -> id 1
1155           False -> neg 1
1156         \stopbuffer
1157
1158         \transexample{apppropcase}{Application propagation for a case expression}{from}{to}
1159
1160       \subsubsection[sec:normalization:letrecurse]{Let recursification}
1161         This transformation makes all non-recursive lets recursive. In the
1162         end, we want a single recursive let in our normalized program, so all
1163         non-recursive lets can be converted. This also makes other
1164         transformations simpler: They can simply assume all lets are
1165         recursive.
1166
1167         \starttrans
1168         let
1169           a = E
1170         in
1171           M
1172         ------------------------------------------
1173         letrec
1174           a = E
1175         in
1176           M
1177         \stoptrans
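
        For completeness, a small sketch of this rule on a made-up expression
        (the binder \lam{a} and the function \lam{add} are assumptions). Only
        the let keyword changes; the binding and the body are left untouched:

        % Illustrative example; the expression and the example label are chosen here.
        \startbuffer[from]
        let
          a = 1
        in
          add a a
        \stopbuffer

        \startbuffer[to]
        letrec
          a = 1
        in
          add a a
        \stopbuffer

        \transexample{letrecursify}{Let recursification}{from}{to}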
1178
1179       \subsubsection{Let flattening}
1180         This transformation puts nested lets in the same scope, by lifting the
1181         binding(s) of the inner let into the outer let. Eventually, this will
1182         cause all let bindings to appear in the same scope.
1183
1184         This transformation only applies to recursive lets, since all
1185         non-recursive lets will be made recursive (see
1186         \in{section}[sec:normalization:letrecurse]).
1187
1188         Since we are joining two scopes together, there is no risk of moving a
1189         variable reference out of the scope where it is defined.
1190
1191         \starttrans
1192         letrec
1193           a0 = E0
1194           \vdots
1195           ai = (letrec bindings in M)
1196           \vdots
1197           an = En
1198         in
1199           N
1200         ------------------------------------------
1201         letrec
1202           a0 = E0
1203           \vdots
1204           ai = M
1205           \vdots
1206           an = En
1207           bindings
1208         in
1209           N
1210         \stoptrans
1211
1212         \startbuffer[from]
1213         letrec
1214           a = 1
1215           b = letrec
1216             x = a
1217             y = c
1218           in
1219             x + y
1220           c = 2
1221         in
1222           b
1223         \stopbuffer
1224         \startbuffer[to]
1225         letrec
1226           a = 1
1227           b = x + y
1228           c = 2
1229           x = a
1230           y = c
1231         in
1232           b
1233         \stopbuffer
1234
1235         \transexample{letflat}{Let flattening}{from}{to}
1236
1237       \subsubsection{Return value simplification}
1238         This transformation ensures that the return value of a function is always a
1239         simple local variable reference.
1240
1241         This transformation only applies to the entire body of a
1242         function instead of any subexpression in a function. This is
1243         achieved by the contexts, like \lam{x = E}, though this is
1244         not strictly correct (you could read this as \quote{if there is any
1245         function \lam{x} that binds \lam{E}, any \lam{E} can be
1246         transformed}, while we only mean the \lam{E} that is bound by
1247         \lam{x}).
1248
1249         Note that the return value is not simplified if it is not
1250         representable.  Otherwise, this would cause a direct loop with
1251         the inlining of unrepresentable bindings. If the return value is
1252         not representable because it has a function type, η-abstraction
1253         should make sure that this transformation will eventually apply.
1254         If the value is not representable for other reasons, the
1255         function result itself is not representable, meaning this
1256         function is not translatable anyway.
1257
1258         \starttrans
1259         x = E                            \lam{E} is representable
1260         ~                                \lam{E} is not a lambda abstraction
1261         E                                \lam{E} is not a let expression
1262         ---------------------------      \lam{E} is not a local variable reference
1263         letrec x = E in x
1264         \stoptrans
1265
1266         \starttrans
1267         x = λv0 ... λvn.E
1268         ~                                \lam{E} is representable
1269         E                                \lam{E} is not a let expression
1270         ---------------------------      \lam{E} is not a local variable reference
1271         letrec x = E in x
1272         \stoptrans
1273
1274         \starttrans
1275         x = λv0 ... λvn.let ... in E
1276         ~                                \lam{E} is representable
1277         E                                \lam{E} is not a local variable reference
1278         -----------------------------
1279         letrec x = E in x
1280         \stoptrans
1281
1282         \startbuffer[from]
1283         x = add 1 2
1284         \stopbuffer
1285
1286         \startbuffer[to]
1287         x = letrec x = add 1 2 in x
1288         \stopbuffer
1289
1290         \transexample{retvalsimpl}{Return value simplification}{from}{to}
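
        The second rule, for a function body below one or more lambda
        abstractions, can be sketched in the same way. The program below is
        made up for illustration, and the new binder is written as \lam{x}
        only because the rule itself is written that way; an implementation
        would presumably generate a fresh binder here (see
        \in{section}[sec:normalization:uniq]).

        % Illustrative example; the expression and the example label are chosen here.
        \startbuffer[from]
        x = λa.add 1 a
        \stopbuffer

        \startbuffer[to]
        x = λa.letrec x = add 1 a in x
        \stopbuffer

        \transexample{retvalsimpllambda}{Return value simplification below a lambda abstraction}{from}{to}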
1291
1292         \todo{More examples}
1293
1294     \subsection[sec:normalization:argsimpl]{Representable arguments simplification}
1295       This section contains just a single transformation that deals with
1296       representable arguments in applications. Non-representable arguments are
1297       handled by the transformations in
1298       \in{section}[sec:normalization:nonrep].
1299
1300       This transformation ensures that all representable arguments will become
1301       references to local variables. This ensures they will become references
1302       to local signals in the resulting \small{VHDL}, which is required due to
1303       limitations in the component instantiation code in \VHDL (one can only
1304       assign a signal or constant to an input port). By ensuring that all
1305       arguments are always simple variable references, we always have a signal
1306       available to map to the input ports.
1307
1308       To reduce a complex expression to a simple variable reference, we create
1309       a new let expression around the application, which binds the complex
1310       expression to a new variable. The original function is then applied to
1311       this variable.
1312
1313       \refdef{global variable}
1314       Note that references to \emph{global variables} (like a top level
1315       function without arguments, but also argumentless dataconstructors
1316       like \lam{True}) are also simplified. Only local variables generate
1317       signals in the resulting architecture. Even though argumentless
1318       dataconstructors generate constants in generated \VHDL code and could be
1319       mapped to an input port directly, they are still simplified to make the
1320       normal form more regular.
1321
1322       \refdef{representable}
1323       \starttrans
1324       M N
1325       --------------------    \lam{N} is representable
1326       letrec x = N in M x     \lam{N} is not a local variable reference
1327       \stoptrans
1328       \refdef{local variable}
1329
1330       \startbuffer[from]
1331       add (add a 1) 1
1332       \stopbuffer
1333
1334       \startbuffer[to]
1335       letrec x = add a 1 in add x 1
1336       \stopbuffer
1337
1338       \transexample{argsimpl}{Argument simplification}{from}{to}
1339
1340     \subsection[sec:normalization:builtins]{Builtin functions}
1341       This section deals with (arguments to) builtin functions.  In the
1342       intended normal form definition\refdef{intended normal form definition}
1343       we can see that there are three sorts of arguments a builtin function
1344       can receive.
1345
1346       \startitemize[KR]
1347         \item A representable local variable reference. This is the most
1348         common argument to any function. The argument simplification
1349         transformation described in \in{section}[sec:normalization:argsimpl]
1350         makes sure that \emph{any} representable argument to \emph{any}
1351         function (including builtin functions) is turned into a local variable
1352         reference.
1353         \item (A partial application of) a top level function (either builtin or
1354         user-defined). The function extraction transformation described in
1355         this section takes care of turning every function-typed argument into
1356         (a partial application of) a top level function.
1357         \item Any expression that is not representable and does not have a
1358         function type. Since these can be any expression, there is no
1359         transformation needed. Note that this category is exactly all
1360         expressions that are not transformed by the transformations for the
1361         previous two categories. This means that \emph{any} core expression
1362         that is used as an argument to a builtin function will be either
1363         transformed into one of the above categories, or end up in this
1364         category. In any case, the result is in normal form.
1365       \stopitemize
1366
      As noted, argument simplification handles any representable
      argument to a builtin function. The following transformation is needed
      to handle non-representable arguments with a function type; all other
      non-representable arguments do not need any special handling.
1371
1372       \subsubsection[sec:normalization:funextract]{Function extraction}
1373         This transform deals with function-typed arguments to builtin
1374         functions.
1375         Since builtin functions cannot be specialized (see
1376         \in{section}[sec:normalization:specialize]) to remove the arguments,
1377         these arguments are extracted into a new global function instead. In
1378         other words, we create a new top level function that has exactly the
1379         extracted argument as its body. This greatly simplifies the
1380         translation rules needed for builtin functions, since they only need
1381         to handle (partial applications of) top level functions.
1382
        Any free variables occurring in the extracted arguments will become
1384         parameters to the new global function. The original argument is replaced
1385         with a reference to the new function, applied to any free variables from
1386         the original argument.
1387
1388         This transformation is useful when applying higher order builtin functions
1389         like \hs{map} to a lambda abstraction, for example. In this case, the code
1390         that generates \small{VHDL} for \hs{map} only needs to handle top level functions and
1391         partial applications, not any other expression (such as lambda abstractions or
1392         even more complicated expressions).
1393
1394         \starttrans
        M N                     \lam{M} is (a partial application of) a builtin function.
        ---------------------   \lam{f0 ... fn} are all free local variables of \lam{N}
        M (x f0 ... fn)         \lam{N :: a -> b}
        ~                       \lam{N} is not (a partial application of) a top level function
1399         x = λf0 ... λfn.N
1400         \stoptrans
1401
1402         \startbuffer[from]
1403         addList = λb.λxs.map (λa . add a b) xs
1404         \stopbuffer
1405
1406         \startbuffer[to]
1407         addList = λb.λxs.map (f b) xs
1408         ~
1409         f = λb.λa.add a b
1410         \stopbuffer
1411
1412         \transexample{funextract}{Function extraction}{from}{to}
1413
1414         Note that the function \lam{f} will still need normalization after
1415         this.
1416
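        The effect of this transformation can be sketched in Haskell on a
        toy expression type. The \hs{Expr} type and the \hs{freeVars} and
        \hs{extract} functions are hypothetical and are not taken from the
        prototype. This sketch assumes that the top level names are simply
        passed in as a list, and it leaves out the check that the argument
        has a function type.

        \starthaskell
        import Data.List (nub)

        -- Toy expression type, a hypothetical stand-in for Core.
        data Expr
          = Var String
          | Lam String Expr
          | App Expr Expr
          deriving (Eq, Show)

        -- Free variables of an expression, in order of first occurrence.
        freeVars :: Expr -> [String]
        freeVars (Var v)   = [v]
        freeVars (Lam v e) = filter (/= v) (freeVars e)
        freeVars (App f a) = nub (freeVars f ++ freeVars a)

        -- Extract an argument into a new top level binding with the given
        -- name. The first argument lists the top level names, which do not
        -- become parameters. The result is the replacement argument
        -- together with the new binding.
        extract :: [String] -> String -> Expr -> (Expr, (String, Expr))
        extract globals name arg = (call, (name, body))
          where
            locals = filter (`notElem` globals) (freeVars arg)
            call   = foldl App (Var name) (map Var locals)
            body   = foldr Lam arg locals

        -- The lambda abstraction passed to map in the example above.
        lambdaArg :: Expr
        lambdaArg = Lam "a" (App (App (Var "add") (Var "a")) (Var "b"))

        -- The replacement argument (f b) and the new binding for f.
        extracted :: (Expr, (String, Expr))
        extracted = extract ["add"] "f" lambdaArg
        \stophaskell

        Only \hs{b} becomes a parameter of the new function here, since
        \hs{add} is listed as a top level name and \hs{a} is bound inside
        the argument itself.
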
    \subsection{Case normalization}
1418       \subsubsection{Scrutinee simplification}
1419         This transform ensures that the scrutinee of a case expression is always
1420         a simple variable reference.
1421
1422         \starttrans
1423         case E of
1424           alts
1425         -----------------        \lam{E} is not a local variable reference
1426         letrec x = E in
1427           case x of
1428             alts
1429         \stoptrans
1430
1431         \startbuffer[from]
1432         case (foo a) of
1433           True -> a
1434           False -> b
1435         \stopbuffer
1436
1437         \startbuffer[to]
1438         letrec x = foo a in
1439           case x of
1440             True -> a
1441             False -> b
1442         \stopbuffer
1443
        \transexample{letflat}{Scrutinee simplification}{from}{to}
1445
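        As a minimal sketch, this rule can be written as a Haskell function
        on a toy expression type. The \hs{Expr} type and the \hs{simplScrut}
        function are hypothetical. The fresh binder name is supplied by the
        caller and is assumed not to occur anywhere else in the program.

        \starthaskell
        -- Toy expression type, a hypothetical stand-in for Core. Case
        -- alternatives are reduced to a constructor name and a result.
        data Expr
          = Var String
          | App Expr Expr
          | Let String Expr Expr
          | Case Expr [(String, Expr)]
          deriving (Eq, Show)

        -- Bind a scrutinee that is not a variable reference to the given
        -- fresh name and scrutinize that name instead.
        simplScrut :: String -> Expr -> Expr
        simplScrut fresh expr = case expr of
          Case (Var v) alts -> Case (Var v) alts
          Case scrut alts   -> Let fresh scrut (Case (Var fresh) alts)
          other             -> other
        \stophaskell
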
1446
1447       \subsubsection{Case normalization}
1448         This transformation ensures that all case expressions get a form
1449         that is allowed by the intended normal form. This means they
1450         will become one of: \refdef{intended normal form definition}
1451         \startitemize
1452         \item An extractor case with a single alternative that picks a field
1453         from a datatype, \eg \lam{case x of (a, b) -> a}.
1454         \item A selector case with multiple alternatives and only wild binders, that
1455         makes a choice between expressions based on the constructor of another
1456         expression, \eg \lam{case x of Low -> a; High -> b}.
1457         \stopitemize
1458
        For an arbitrary case expression that has \lam{n} alternatives, with
        \lam{m} binders in each alternative, this will result in \lam{m
        * n} extractor case expressions to get at each variable, \lam{n}
        let bindings, one for the value of each alternative, and a single
        selector case to select the right value out of these.

        Technically, the definition of this transformation would require
        that the constructor for every alternative has exactly the same
        number (\lam{m}) of arguments, but of course this transformation
        also applies when this is not the case.
1469
1470         \starttrans
1471         case E of
1472           C0 v0,0 ... v0,m -> E0
1473           \vdots
1474           Cn vn,0 ... vn,m -> En
        --------------------------------------------------- \forall i \forall j, 0 ≤ i ≤ n, 0 ≤ j ≤ m (\lam{wi,j} is a wild (unused) binder)
1476         letrec                                              The case expression is not an extractor case
1477           v0,0 = case E of C0 x0,0 .. x0,m -> x0,0          The case expression is not a selector case
1478           \vdots
1479           v0,m = case E of C0 x0,0 .. x0,m -> x0,m
1480           \vdots
1481           vn,m = case E of Cn xn,0 .. xn,m -> xn,m
1482           y0 = E0
1483           \vdots
1484           yn = En
1485         in
1486           case E of
1487             C0 w0,0 ... w0,m -> y0
1488             \vdots
1489             Cn wn,0 ... wn,m -> yn
1490         \stoptrans
1491
1492         \refdef{wild binder}
1493         Note that this transformation applies to case expressions with any
1494         scrutinee. If the scrutinee is a complex expression, this might
1495         result in duplication of work (hardware). An extra condition to
1496         only apply this transformation when the scrutinee is already
1497         simple (effectively causing this transformation to be only
1498         applied after the scrutinee simplification transformation) might
1499         be in order.
1500
1501         \startbuffer[from]
1502         case a of
1503           True -> add b 1
1504           False -> add b 2
1505         \stopbuffer
1506
1507         \startbuffer[to]
1508         letrec
1509           x0 = add b 1
1510           x1 = add b 2
1511         in
1512           case a of
1513             True -> x0
1514             False -> x1
1515         \stopbuffer
1516
1517         \transexample{selcasesimpl}{Selector case simplification}{from}{to}
1518
1519         \startbuffer[from]
1520         case a of
1521           (,) b c -> add b c
1522         \stopbuffer
1523         \startbuffer[to]
1524         letrec
1525           b = case a of (,) b c -> b
1526           c = case a of (,) b c -> c
1527           x0 = add b c
1528         in
1529           case a of
1530             (,) w0 w1 -> x0
1531         \stopbuffer
1532
1533         \transexample{excasesimpl}{Extractor case simplification}{from}{to}
1534
1535         \refdef{selector case}
        In \in{example}[ex:trans:excasesimpl] the case expression is expanded
        into multiple case expressions, including a rather useless one
        (that is neither a selector nor an extractor case). This case can be
        removed by the case removal transformation in
        \in{section}[sec:transformation:caseremoval].
1541
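        The result of this transformation on a tuple can also be written in
        plain Haskell, which makes it easy to check that the behaviour is
        preserved. The names \hs{sumPair} and \hs{sumPairNorm} are made up
        for this sketch, and \hs{Int} merely stands in for a representable
        number type.

        \starthaskell
        -- Original: a single case that extracts both fields and computes
        -- the result.
        sumPair :: (Int, Int) -> Int
        sumPair a = case a of
          (b, c) -> b + c

        -- After case normalization: one extractor case per field, one
        -- binding for the value of the alternative, and a final case whose
        -- binders are all unused.
        sumPairNorm :: (Int, Int) -> Int
        sumPairNorm a =
          let b  = case a of (p, q) -> p
              c  = case a of (p, q) -> q
              x0 = b + c
          in case a of
               (w0, w1) -> x0
        \stophaskell
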
1542       \subsubsection[sec:transformation:caseremoval]{Case removal}
1543         This transform removes any case expression with a single alternative and
1544         only wild binders.\refdef{wild binder}
1545
        These \quote{useless} case expressions are usually leftovers from case
        normalization of extractor cases (see the previous example).
1548
1549         \starttrans
1550         case x of
1551           C v0 ... vm -> E
1552         ----------------------     \lam{\forall i, 0 ≤ i ≤ m} (\lam{vi} does not occur free in E)
1553         E
1554         \stoptrans
1555
1556         \startbuffer[from]
1557         case a of
1558           (,) w0 w1 -> x0
1559         \stopbuffer
1560
1561         \startbuffer[to]
1562         x0
1563         \stopbuffer
1564
1565         \transexample{caserem}{Case removal}{from}{to}
1566
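        The side condition of this rule (none of the binders occurs free in
        the result) can be made explicit in a small Haskell sketch. The
        \hs{Expr} and \hs{Alt} types and both functions are hypothetical and
        only model the parts of Core needed here.

        \starthaskell
        -- Toy expression type, a hypothetical stand-in for Core.
        data Expr
          = Var String
          | App Expr Expr
          | Case Expr [Alt]
          deriving (Eq, Show)

        -- An alternative: constructor name, binders and result.
        data Alt = Alt String [String] Expr
          deriving (Eq, Show)

        -- Does the variable occur free in the expression? Binders of an
        -- alternative shadow the variable inside that alternative.
        occursFree :: String -> Expr -> Bool
        occursFree v (Var w)       = v == w
        occursFree v (App f a)     = occursFree v f || occursFree v a
        occursFree v (Case s alts) = occursFree v s || any inAlt alts
          where
            inAlt (Alt con bs e) = notElem v bs && occursFree v e

        -- Remove a case on a variable that has a single alternative and
        -- does not use any of its binders. Anything else is left alone.
        removeCase :: Expr -> Expr
        removeCase (Case (Var x) [Alt con bs e])
          | not (any (`occursFree` e) bs) = e
        removeCase other = other
        \stophaskell
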
1567     \subsection[sec:normalization:nonrep]{Removing unrepresentable values}
1568       The transformations in this section are aimed at making all the
1569       values used in our expression representable. There are two main
1570       transformations that are applied to \emph{all} unrepresentable let
1571       bindings and function arguments. These are meant to address three
1572       different kinds of unrepresentable values: Polymorphic values, higher
      order values and literals. The transformations are described generically:
1574       They apply to all non-representable values. However, non-representable
1575       values that don't fall into one of these three categories will be moved
1576       around by these transformations but are unlikely to completely
1577       disappear. They usually mean the program was not valid in the first
1578       place, because unsupported types were used (for example, a program using
1579       strings).
1580
1581       Each of these three categories will be detailed below, followed by the
1582       actual transformations.
1583
1584       \subsubsection{Removing Polymorphism}
1585         As noted in \in{section}[sec:prototype:polymporphism],
1586         polymorphism is made explicit in Core through type and
1587         dictionary arguments. To remove the polymorphism from a
1588         function, we can simply specialize the polymorphic function for
1589         the particular type applied to it. The same goes for dictionary
1590         arguments. To remove polymorphism from let bound values, we
1591         simply inline the let bindings that have a polymorphic type,
1592         which should (eventually) make sure that the polymorphic
1593         expression is applied to a type and/or dictionary, which can
1594         then be removed by β-reduction (\in{section}[sec:normalization:beta]).
1595
1596         Since both type and dictionary arguments are not representable,
1597         \refdef{representable}
1598         the non-representable argument specialization and
1599         non-representable let binding inlining transformations below
1600         take care of exactly this.
1601
1602         There is one case where polymorphism cannot be completely
        removed: Builtin functions are still allowed to be polymorphic
        (since we have no function body that we could properly
1605         specialize). However, the code that generates \VHDL for builtin
1606         functions knows how to handle this, so this is not a problem.
1607
1608       \subsubsection{Defunctionalization}
1609         These transformations remove higher order expressions from our
1610         program, making all values first-order.
1611
        Higher order values are always introduced by lambda abstractions; none
        of the other Core expression elements can introduce a function type.
        However, other expressions can \emph{have} a function type, when they
        have a lambda expression in their body.
1616
1617         For example, the following expression is a higher order expression
1618         that is not a lambda expression itself:
1619
1620         \refdef{id function}
1621         \startlambda
1622           case x of
1623             High -> id
1624             Low -> λx.x
1625         \stoplambda
1626
1627         The reference to the \lam{id} function shows that we can introduce a
1628         higher order expression in our program without using a lambda
1629         expression directly. However, inside the definition of the \lam{id}
1630         function, we can be sure that a lambda expression is present.
1631
1632         Looking closely at the definition of our normal form in
1633         \in{section}[sec:normalization:intendednormalform], we can see that
1634         there are three possibilities for higher order values to appear in our
1635         intended normal form:
1636
1637         \startitemize[KR]
1638           \item[item:toplambda] Lambda abstractions can appear at the highest level of a
1639           top level function. These lambda abstractions introduce the
1640           arguments (input ports / current state) of the function.
1641           \item[item:builtinarg] (Partial applications of) top level functions can appear as an
1642           argument to a builtin function.
1643           \item[item:completeapp] (Partial applications of) top level functions can appear in
1644           function position of an application. Since a partial application
1645           cannot appear anywhere else (except as builtin function arguments),
1646           all partial applications are applied, meaning that all applications
1647           will become complete applications. However, since application of
1648           arguments happens one by one, in the expression:
1649           \startlambda
1650             f 1 2
1651           \stoplambda
1652           the subexpression \lam{f 1} has a function type. But this is
1653           allowed, since it is inside a complete application.
1654         \stopitemize
1655
1656         We will take a typical function with some higher order values as an
1657         example. The following function takes two arguments: a \lam{Bit} and a
1658         list of numbers. Depending on the first argument, each number in the
1659         list is doubled, or the list is returned unmodified. For the sake of
        the example, no polymorphism is shown. In reality, at least \lam{map}
        would be polymorphic.
1662
1663         \startlambda
1664         λy.let double = λx. x + x in
1665              case y of
1666                 Low -> map double
1667                 High -> λz. z
1668         \stoplambda
1669
1670         This example shows a number of higher order values that we cannot
1671         translate to \VHDL directly. The \lam{double} binder bound in the let
1672         expression has a function type, as well as both of the alternatives of
1673         the case expression. The first alternative is a partial application of
1674         the \lam{map} builtin function, whereas the second alternative is a
1675         lambda abstraction.
1676
1677         To reduce all higher order values to one of the above items, a number
1678         of transformations we've already seen are used. The η-abstraction
1679         transformation from \in{section}[sec:normalization:eta] ensures all
1680         function arguments are introduced by lambda abstraction on the highest
1681         level of a function. These lambda arguments are allowed because of
1682         \in{item}[item:toplambda] above. After η-abstraction, our example
1683         becomes a bit bigger:
1684
1685         \startlambda
1686         λy.λq.(let double = λx. x + x in
1687                  case y of
1688                    Low -> map double
1689                    High -> λz. z
1690               ) q
1691         \stoplambda
1692
        η-abstraction also introduces extra applications (the application of
        the let expression to \lam{q} in the above example). These
        applications can then be propagated down by the application propagation
        transformation (\in{section}[sec:normalization:appprop]). In our
        example, the \lam{q} variable will be propagated into the
        let expression and then into the case expression:
1699
1700         \startlambda
1701         λy.λq.let double = λx. x + x in
1702                 case y of
1703                   Low -> map double q
1704                   High -> (λz. z) q
1705         \stoplambda
1706
1707         This propagation makes higher order values become applied (in
1708         particular both of the alternatives of the case now have a
1709         representable type). Completely applied top level functions (like the
1710         first alternative) are now no longer invalid (they fall under
1711         \in{item}[item:completeapp] above). (Completely) applied lambda
        abstractions can be removed by β-reduction. For our example,
        applying β-reduction results in the following:
1714
1715         \startlambda
1716         λy.λq.let double = λx. x + x in
1717                 case y of
1718                   Low -> map double q
1719                   High -> q
1720         \stoplambda
1721
1722         As you can see in our example, all of this moves applications towards
1723         the higher order values, but misses higher order functions bound by
1724         let expressions. The applications cannot be moved towards these values
1725         (since they can be used in multiple places), so the values will have
1726         to be moved towards the applications. This is achieved by inlining all
        higher order values bound by let expressions, using the
1728         non-representable binding inlining transformation below. When applying
1729         it to our example, we get the following:
1730
1731         \startlambda
1732         λy.λq.case y of
1733                 Low -> map (λx. x + x) q
1734                 High -> q
1735         \stoplambda
1736
        We've nearly eliminated all unsupported higher order values from this
        expression. The one that remains is the first argument to the
        \lam{map} function. Having higher order arguments to a builtin
        function like \lam{map} is allowed in the intended normal form, but
        only if the argument is (a partial application of) a top level
        function. This is easily done by introducing a new top level function
        and putting the lambda abstraction inside. This is done by the function
        extraction transformation from
        \in{section}[sec:normalization:funextract].
1746
1747         \startlambda
1748         λy.λq.case y of
1749                 Low -> map func q
1750                 High -> q
1751         \stoplambda
1752
1753         This also introduces a new function, that we have called \lam{func}:
1754
1755         \startlambda
1756         func = λx. x + x
1757         \stoplambda
1758
1759         Note that this does not actually remove the lambda, but now it is a
1760         lambda at the highest level of a function, which is allowed in the
1761         intended normal form.
1762
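        The start and end points of this chain of transformations can be
        compared in plain Haskell. This is only a sketch: the \hs{Bit} type
        is redefined locally, \hs{Int} lists stand in for hardware vectors,
        and the names \hs{original} and \hs{normalized} are made up.

        \starthaskell
        -- Hypothetical two-valued type standing in for the Bit type.
        data Bit = Low | High

        -- The example before normalization: the let binding and both case
        -- alternatives have a function type.
        original :: Bit -> [Int] -> [Int]
        original y =
          let double x = x + x
              keep z = z
          in case y of
               Low  -> map double
               High -> keep

        -- The extracted top level function.
        func :: Int -> Int
        func x = x + x

        -- The example after normalization: every application is complete
        -- and the only function-typed argument is a top level function.
        normalized :: Bit -> [Int] -> [Int]
        normalized y q = case y of
          Low  -> map func q
          High -> q
        \stophaskell
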
        There is one case that has not been discussed yet. What if the
        \lam{map} function in the example above was not a builtin function
        but a user-defined function? Then extracting the lambda expression
        into a new function would not be enough, since user-defined functions
        can never have higher order arguments. The following program shows an
        example:
1769
1770         \startlambda
1771         twice :: (Word -> Word) -> Word -> Word
1772         twice = λf.λa.f (f a)
1773
        main = λa.twice (λx. x + x) a
1775         \stoplambda
1776
1777         This example shows a function \lam{twice} that takes a function as a
1778         first argument and applies that function twice to the second argument.
1779         Again, we've made the function monomorphic for clarity, even though
1780         this function would be a lot more useful if it was polymorphic. The
        function \lam{main} uses \lam{twice} to apply a lambda expression twice.
1782
        When faced with a user-defined function, a body is available for that
1784         function. This means we could create a specialized version of the
1785         function that only works for this particular higher order argument
1786         (\ie, we can just remove the argument and call the specialized
1787         function without the argument). This transformation is detailed below.
1788         Applying this transformation to the example gives:
1789
1790         \startlambda
1791         twice' :: Word -> Word
1792         twice' = λb.(λf.λa.f (f a)) (λx. x + x) b
1793
        main = λa.twice' a
1795         \stoplambda
1796
1797         The \lam{main} function is now in normal form, since the only higher
1798         order value there is the top level lambda expression. The new
1799         \lam{twice'} function is a bit complex, but the entire original body of
1800         the original \lam{twice} function is wrapped in a lambda abstraction
1801         and applied to the argument we've specialized for (\lam{λx. x + x})
1802         and the other arguments. This complex expression can fortunately be
1803         effectively reduced by repeatedly applying β-reduction:
1804
1805         \startlambda
1806         twice' :: Word -> Word
1807         twice' = λb.(b + b) + (b + b)
1808         \stoplambda
1809
1810         This example also shows that the resulting normal form might not be as
1811         efficient as we might hope it to be (it is calculating \lam{(b + b)}
1812         twice). This is discussed in more detail in
1813         \in{section}[sec:normalization:duplicatework].
1814
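        The specialization of \lam{twice} can be checked in plain Haskell
        as well. In this sketch the names \hs{double}, \hs{mainOrig} and
        \hs{mainSpec} are made up, and the lambda expression is given a name
        only to keep the code short.

        \starthaskell
        import Data.Word (Word)

        twice :: (Word -> Word) -> Word -> Word
        twice f a = f (f a)

        -- The function that is passed to twice.
        double :: Word -> Word
        double x = x + x

        -- The original main, which passes a function as an argument.
        mainOrig :: Word -> Word
        mainOrig a = twice double a

        -- twice specialized for the doubling function, after beta-reduction.
        twice' :: Word -> Word
        twice' b = (b + b) + (b + b)

        -- The new main, which no longer has a higher order argument.
        mainSpec :: Word -> Word
        mainSpec a = twice' a
        \stophaskell
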
1815       \subsubsection{Literals}
1816         There are a limited number of literals available in Haskell and Core.
1817         \refdef{enumerated types} When using (enumerating) algebraic
1818         datatypes, a literal is just a reference to the corresponding data
1819         constructor, which has a representable type (the algebraic datatype)
1820         and can be translated directly. This also holds for literals of the
1821         \hs{Bool} Haskell type, which is just an enumerated type.
1822
1823         There is, however, a second type of literal that does not have a
1824         representable type: Integer literals. Cλash supports using integer
1825         literals for all three integer types supported (\hs{SizedWord},
1826         \hs{SizedInt} and \hs{RangedWord}). This is implemented using
1827         Haskell's \hs{Num} typeclass, which offers a \hs{fromInteger} method
1828         that converts any \hs{Integer} to the Cλash datatypes.
1829
1830         When \GHC sees integer literals, it will automatically insert calls to
1831         the \hs{fromInteger} method in the resulting Core expression. For
1832         example, the following expression in Haskell creates a 32 bit unsigned
1833         word with the value 1. The explicit type signature is needed, since
1834         there is no context for \GHC to determine the type from otherwise.
1835
1836         \starthaskell
1837         1 :: SizedWord D32
1838         \stophaskell
1839
1840         This Haskell code results in the following Core expression:
1841
1842         \startlambda
        fromInteger @(SizedWord D32) \$dNum (smallInteger 1)
        \stoplambda

        The literal 1 will have the type \lam{GHC.Prim.Int\#}, which is
        converted into an \lam{Integer} by \lam{smallInteger}. Finally, the
        \lam{fromInteger} function will convert this into a
        \lam{SizedWord D32}.
1850
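        That an integer literal is really an application of
        \hs{fromInteger} can be observed in ordinary Haskell. This sketch
        assumes the standard \hs{Word32} type from \hs{Data.Word} as a
        stand-in for \hs{SizedWord D32}. The \hs{smallInteger} step is
        internal to \GHC and is not shown.

        \starthaskell
        import Data.Word (Word32)

        -- Written as a literal: the compiler inserts the conversion.
        viaLiteral :: Word32
        viaLiteral = 1

        -- The same value with the conversion written out by hand.
        viaFromInteger :: Word32
        viaFromInteger = fromInteger (1 :: Integer)
        \stophaskell
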
1851         Both the \lam{GHC.Prim.Int\#} and \lam{Integer} types are not
1852         representable, and cannot be translated directly. Fortunately, there
1853         is no need to translate them, since \lam{fromInteger} is a builtin
1854         function that knows how to handle these values. However, this does
1855         require that the \lam{fromInteger} function is directly applied to
1856         these non-representable literal values, otherwise errors will occur.
1857         For example, the following expression is not in the intended normal
1858         form, since one of the let bindings has an unrepresentable type
1859         (\lam{Integer}):
1860
1861         \startlambda
1862         let l = smallInteger 10 in fromInteger @(SizedWord D32) \$dNum l
1863         \stoplambda
1864
1865         By inlining these let-bindings, we can ensure that unrepresentable
1866         literals bound by a let binding end up in an application of the
1867         appropriate builtin function, where they are allowed. Since it is
1868         possible that the application of that function is in a different
1869         function than the definition of the literal value, we will always need
1870         to specialize away any unrepresentable literals that are used as
1871         function arguments. The following two transformations do exactly this.
1872
1873       \subsubsection{Non-representable binding inlining}
        This transform inlines let bindings that are bound to a
        non-representable value. Since we can never generate a signal
        assignment for these bindings (we cannot declare a signal
        with a non-representable type, for obvious reasons), we have no choice
        but to inline the binding to remove it.
1879
1880         As we have seen in the previous sections, inlining these bindings
1881         solves (part of) the polymorphism, higher order values and
1882         unrepresentable literals in an expression.
1883
1884         \refdef{substitution notation}
1885         \starttrans
1886         letrec
1887           a0 = E0
1888           \vdots
1889           ai = Ei
1890           \vdots
1891           an = En
1892         in
1893           M
          a0 = E0 [ai=>Ei]
          \vdots
1895         letrec
1896           a0 = E0 [ai=>Ei] \vdots
1897           ai-1 = Ei-1 [ai=>Ei]
1898           ai+1 = Ei+1 [ai=>Ei]
1899           \vdots
1900           an = En [ai=>Ei]
1901         in
1902           M[ai=>Ei]
1903         \stoptrans
1904
1905         \startbuffer[from]
1906         letrec
1907           a = smallInteger 10
1908           inc = λb -> add b 1
1909           inc' = add 1
1910           x = fromInteger a
1911         in
1912           inc (inc' x)
1913         \stopbuffer
1914
1915         \startbuffer[to]
1916         letrec
1917           x = fromInteger (smallInteger 10)
1918         in
1919           (λb -> add b 1) (add 1 x)
1920         \stopbuffer
1921
1922         \transexample{nonrepinline}{Nonrepresentable binding inlining}{from}{to}
1923
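        A single step of this rule, and its repeated application to the
        example above, can be sketched in Haskell on a toy expression type.
        Everything here is hypothetical: the \hs{Expr} type, the \hs{subst}
        and \hs{inlineBinding} functions, and the assumptions that all
        binders in the program are unique (so no variable capture can
        occur) and that the inlined bindings are not recursive. Deciding
        which bindings are non-representable requires type information and
        is left to the caller.

        \starthaskell
        -- Toy expression type, a hypothetical stand-in for Core.
        data Expr
          = Var String
          | Lit Integer
          | Lam String Expr
          | App Expr Expr
          | LetRec [(String, Expr)] Expr
          deriving (Eq, Show)

        -- Replace every occurrence of the variable v by the expression
        -- new. Capture is not handled, binders are assumed to be unique.
        subst :: String -> Expr -> Expr -> Expr
        subst v new expr = case expr of
          Var w
            | w == v    -> new
            | otherwise -> Var w
          Lit n       -> Lit n
          Lam w e     -> Lam w (subst v new e)
          App f a     -> App (subst v new f) (subst v new a)
          LetRec bs e -> LetRec [(n, subst v new b) | (n, b) <- bs]
                                (subst v new e)

        -- One step of the rule: drop the named binding and substitute its
        -- value in the remaining bindings and in the body.
        inlineBinding :: String -> Expr -> Expr
        inlineBinding name (LetRec bs body) =
          case lookup name bs of
            Just rhs ->
              LetRec [(n, subst name rhs b) | (n, b) <- bs, n /= name]
                     (subst name rhs body)
            Nothing -> LetRec bs body
        inlineBinding name other = other

        -- The original program from the example above.
        before :: Expr
        before = LetRec
          [ ("a", App (Var "smallInteger") (Lit 10))
          , ("inc", Lam "b" (App (App (Var "add") (Var "b")) (Lit 1)))
          , ("inc'", App (Var "add") (Lit 1))
          , ("x", App (Var "fromInteger") (Var "a"))
          ]
          (App (Var "inc") (App (Var "inc'") (Var "x")))

        -- Inline the three bindings that are not representable.
        after :: Expr
        after = foldr inlineBinding before ["a", "inc", "inc'"]
        \stophaskell

        Only the binding for \hs{x} remains in \hs{after}, since it is the
        only one with a representable type.
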
1924       \subsubsection[sec:normalization:specialize]{Function specialization}
1925         This transform removes arguments to user-defined functions that are
1926         not representable at runtime. This is done by creating a
1927         \emph{specialized} version of the function that only works for one
1928         particular value of that argument (in other words, the argument can be
1929         removed).
1930
        Specialization means to create a specialized version of the called
        function, with one argument already filled in. As a simple example,
        consider the following program (this is not actual Core, since it
        directly uses a literal with the unrepresentable type
        \lam{GHC.Prim.Int\#}):
1935
1936         \startlambda
1937         f = λa.λb.a + b
1938         inc = λa.f a 1
1939         \stoplambda
1940
1941         We could specialize the function \lam{f} against the literal argument
1942         1, with the following result:
1943
1944         \startlambda
1945         f' = λa.a + 1
1946         inc = λa.f' a
1947         \stoplambda
1948
1949         In some way, this transformation is similar to β-reduction, but it
1950         operates across function boundaries. It is also similar to
1951         non-representable let binding inlining above, since it sort of
1952         \quote{inlines} an expression into a called function.
1953
1954         Special care must be taken when the argument has any free variables.
1955         If this is the case, the original argument should not be removed
1956         completely, but replaced by all the free variables of the expression.
1957         In this way, the original expression can still be evaluated inside the
1958         new function.
        simple local variable reference is not propagated (since it has
1960         To prevent us from propagating the same argument over and over, a
1961         simple local variable reference is not propagated (since is has
1962         exactly one free variable, itself, we would only replace that argument
1963         with itself).
1964
1965         This shows that any free local variables that are not runtime
1966         representable cannot be brought into normal form by this transform. We
1967         rely on an inlining or β-reduction transformation to replace such a
1968         variable with an expression we can propagate again.
1969
1970         \starttrans
1971         x = E
1972         ~
1973         x Y0 ... Yi ... Yn                               \lam{Yi} is not representable
1974         ---------------------------------------------    \lam{Yi} is not a local variable reference
1975         x' Y0 ... Yi-1 f0 ...  fm Yi+1 ... Yn            \lam{f0 ... fm} are all free local vars of \lam{Yi}
1976         ~                                                \lam{T0 ... Tn} are the types of \lam{Y0 ... Yn}
        x' = λ(y0 :: T0) ... λ(yi-1 :: Ti-1).
             λf0 ... λfm.
             λ(yi+1 :: Ti+1) ...  λ(yn :: Tn).
1980                E y0 ... yi-1 Yi yi+1 ... yn
1981         \stoptrans
1982
1983         This is a bit of a complex transformation. It transforms an
1984         application of the function \lam{x}, where one of the arguments
1985         (\lam{Y_i}) is not representable. A new
1986         function \lam{x'} is created that wraps the body of the old function.
1987         The body of the new function becomes a number of nested lambda
1988         abstractions, one for each of the original arguments that are left
1989         unchanged.
1990
1991         The ith argument is replaced with the free variables of
1992         \lam{Y_i}. Note that we reuse the same binders as those used in
1993         \lam{Y_i}, since we can then just use \lam{Y_i} inside the new
1994         function body and have all of the variables it uses be in scope.
1995
1996         The argument that we are specializing for, \lam{Y_i}, is put inside
1997         the new function body. The old function body is applied to it. Since
1998         we use this new function only in place of an application with that
1999         particular argument \lam{Y_i}, behaviour should not change.
2000
2001         Note that the types of the arguments of our new function are taken
2002         from the types of the \emph{actual} arguments (\lam{T0 ... Tn}). This
2003         means that any polymorphism in the arguments is removed, even when the
2004         corresponding explicit type lambda is not removed
2005         yet.
2006
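        The role of the free variables can be illustrated with a small
        Haskell sketch. All names in it (\hs{plus}, \hs{caller},
        \hs{twiceS} and \hs{callerS}) are made up. Unlike the rule above,
        which copies the body \lam{E} of the old function, the sketch
        simply calls the old function by name, which has the same meaning.

        \starthaskell
        import Data.Word (Word)

        -- A user-defined function with a function-typed first argument.
        twice :: (Word -> Word) -> Word -> Word
        twice f a = f (f a)

        plus :: Word -> Word -> Word
        plus x y = x + y

        -- The argument (plus b) is not representable and has one free
        -- local variable, b.
        caller :: Word -> Word -> Word
        caller b a = twice (plus b) a

        -- The specialized function: the function-typed parameter is
        -- replaced by the free variable b, and the original argument
        -- (plus b) moves inside the new function.
        twiceS :: Word -> Word -> Word
        twiceS b a = twice (plus b) a

        -- The new call site only passes representable values.
        callerS :: Word -> Word -> Word
        callerS b a = twiceS b a
        \stophaskell
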
2007         \todo{Examples. Perhaps reference the previous sections}
2008
2009   \section{Unsolved problems}
2010     The above system of transformations has been implemented in the prototype
2011     and seems to work well to compile simple and more complex examples of
2012     hardware descriptions. \todo{Ref christiaan?} However, this normalization
2013     system has not seen enough review and work to be complete and work for
2014     every Core expression that is supplied to it. A number of problems
2015     have already been identified and are discussed in this section.
2016
2017     \subsection[sec:normalization:duplicatework]{Work duplication}
2018         A possible problem of β-reduction is that it could duplicate work.
2019         When the expression applied is not a simple variable reference, but
2020         requires calculation and the binder the lambda abstraction binds to
2021         is used more than once, more hardware might be generated than strictly
2022         needed.
2023
2024         As an example, consider the expression:
2025
2026         \startlambda
2027         (λx. x + x) (a * b)
2028         \stoplambda
2029
2030         When applying β-reduction to this expression, we get:
2031
2032         \startlambda
2033         (a * b) + (a * b)
2034         \stoplambda
2035
2036         which of course calculates \lam{(a * b)} twice.
2037
2038         A possible solution to this would be to use the following alternative
2039         transformation, which is of course no longer normal β-reduction. The
2040         following transformation has not been tested in the prototype, but is
2041         given here for future reference:
2042
2043         \starttrans
2044         (λx.E) M
2045         -----------------
2046         letrec x = M in E
2047         \stoptrans
2048
2049         This doesn't seem like much of an improvement, but it does get rid of
2050         the lambda expression (and the associated higher order value) without
2051         copying \lam{M}: the new let binding computes it once for all uses of
2052         \lam{x}. Since the result of every application or case expression must
2053         be let bound in intended normal form anyway, this is probably fine. If
2054         the argument happens to be a variable reference, then simple let
2055         binding removal (\in{section}[sec:normalization:simplelet]) will
2056         remove it, making the result identical to that of the original
2057         β-reduction transformation.
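
        As an illustration (a sketch only, since the transformation is
        untested): applied to the example expression
        \lam{(λx. x + x) (a * b)}, with \lam{E} being \lam{x + x} and \lam{M}
        being \lam{a * b}, this alternative transformation would give:

        \startlambda
        letrec x = (a * b) in x + x
        \stoplambda

        Here \lam{(a * b)} occurs, and is thus calculated, only once.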
2058
2059         When also applying argument simplification to the β-reduced
2060         expression above, we get the following expression:
2061
2062         \startlambda
2063         let y = (a * b)
2064             z = (a * b)
2065         in y + z
2066         \stoplambda
2067
2068         Looking at this, we could imagine an alternative approach: Create a
2069         transformation that removes let bindings that bind identical values.
2070         In the above expression, the \lam{y} and \lam{z} variables could be
2071         merged together, resulting in the more efficient expression:
2072
2073         \startlambda
2074         let y = (a * b) in y + y
2075         \stoplambda
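
        Such a merging transformation has not been worked out or tested. As a
        rough sketch only, it could look like the following, where
        \lam{E[y=>x]} is assumed to mean \lam{E} with every occurrence of
        \lam{y} replaced by \lam{x}. Only the two identical bindings are shown
        here; in a larger let expression, the bodies of the other bindings
        would need the same substitution.

        \starttrans
        letrec
          x = M
          y = M
        in E
        -----------------
        letrec x = M in E[y=>x]
        \stoptrans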
2076
2077       \subsection[sec:normalization:non-determinism]{Non-determinism}
2078         As an example, again consider the following expression:
2079
2080         \startlambda
2081         (λx. x + x) (a * b)
2082         \stoplambda
2083
2084         We can apply both β-reduction (\in{section}[sec:normalization:beta])
2085         as well as argument simplification
2086         (\in{section}[sec:normalization:argsimpl]) to this expression.
2087
2088         When applying argument simplification first and then β-reduction, we
2089         get the following expression:
2090
2091         \startlambda
2092         let y = (a * b) in y + y
2093         \stoplambda
2094
2095         When applying β-reduction first and then argument simplification, we
2096         get the following expression:
2097
2098         \startlambda
2099         let y = (a * b)
2100             z = (a * b)
2101         in y + z
2102         \stoplambda
2103
2104         As you can see, this is a different expression. This means that the
2105         order of transformations does in fact change the resulting normal form,
2106         which is something that we would like to avoid. In this particular
2107         case one of the alternatives is even clearly more efficient, so we
2108         would of course like the more efficient form to be the normal form.
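
        The intermediate steps of both orders can be written out as a sketch.
        It assumes that argument simplification binds a non-variable argument
        to a fresh let binder, and it ignores details such as the flattening
        of nested let expressions. Each line below is the result of applying
        one transformation to the line before it. Argument simplification
        first, followed by β-reduction:

        \startlambda
        (λx. x + x) (a * b)
        let y = (a * b) in (λx. x + x) y
        let y = (a * b) in y + y
        \stoplambda

        β-reduction first, followed by one application of argument
        simplification:

        \startlambda
        (λx. x + x) (a * b)
        (a * b) + (a * b)
        let y = (a * b) in y + (a * b)
        \stoplambda

        A second application of argument simplification, to the remaining
        \lam{(a * b)} argument, then results in the expression with the two
        bindings \lam{y} and \lam{z}.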
2109
2110         For this particular problem, the solutions for duplication of work
2111         from the previous section seem to fix the determinism of our
2112         transformation system as well. However, it is likely that there are
2113         other occurrences of this problem.
2114
2115       \subsection[sec:normalization:castproblems]{Casts}
2116         We do not fully understand the use of cast expressions in Core, so
2117         there are probably expressions involving cast expressions that cannot
2118         be brought into intended normal form by this transformation system.
2119
2120         The uses of casts in the core system should be investigated more and
2121         transformations will probably need updating to handle them in all
2122         cases.
2123
2124       \subsection[sec:normalization:stateproblems]{Normalization of stateful descriptions}
2125         Currently, the intended normal form definition \refdef{intended
2126         normal form definition} offers enough freedom to describe all
2127         valid stateful descriptions, but is not limiting enough. It is
2128         possible to write descriptions which are in intended normal
2129         form, but cannot be translated into \VHDL in a meaningful way
2130         (\eg, a function that swaps two substates in its result, or a
2131         function that changes a substate itself instead of passing it to
2132         a subfunction).
2133
2134         It is now up to the programmer to not do anything funny with
2135         these state values, whereas the normalization just tries not to
2136         mess up the flow of state values. In practice, there are
2137         situations where a Core program that \emph{could} be a valid
2138         stateful description is not translatable by the prototype. This
2139         most often happens when statefulness is mixed with pattern
2140         matching, causing a state input to be unpacked multiple times or
2141         be unpacked and repacked only in some of the code paths.
2142
2143         Without going into detail about the exact problems (of which
2144         there are probably more than have shown up so far), it seems
2145         unlikely that these problems can be solved entirely by just
2146         improving the \VHDL state generation in the final stage. The
2147         normalization stage seems the best place to apply the rewriting
2148         needed to support more complex stateful descriptions. This does
2149         of course mean that the intended normal form definition must be
2150         extended as well to be more specific about what state handling
2151         should look like in normal form.
2152         \in{Section}[sec:prototype:statelimits] already contains a
2153         tight description of the limitations on the use of state
2154         variables, which could be adapted into the intended normal form.
2155
2156   \section[sec:normalization:properties]{Provable properties}
2157     When looking at the system of transformations outlined above, there are a
2158     number of questions that we can ask ourselves. The main question is of course:
2159     \quote{Does our system work as intended?}. We can split this question into a
2160     number of subquestions:
2161
2162     \startitemize[KR]
2163     \item[q:termination] Does our system \emph{terminate}? Since our system will
2164     keep running as long as transformations apply, there is an obvious risk that
2165     it will keep running indefinitely. This typically happens when one
2166     transformation produces a result that is transformed back to the original
2167     by another transformation, or when one or more transformations keep
2168     expanding some expression.
2169     \item[q:soundness] Is our system \emph{sound}? Since our transformations
2170     continuously modify the expression, there is an obvious risk that the final
2171     normal form will not be equivalent to the original program: Its meaning could
2172     have changed.
2173     \item[q:completeness] Is our system \emph{complete}? Since we have a complex
2174     system of transformations, there is an obvious risk that some expressions will
2175     not end up in our intended normal form, because we forgot some transformation.
2176     In other words: Does our transformation system result in our intended normal
2177     form for all possible inputs?
2178     \item[q:determinism] Is our system \emph{deterministic}? Since we have defined
2179     no particular order in which the transformations should be applied, there is an
2180     obvious risk that different transformation orderings will result in
2181     \emph{different} normal forms. They might still both be intended normal forms
2182     (if our system is \emph{complete}) and describe correct hardware (if our
2183     system is \emph{sound}), so this property is less important than the previous
2184     three: The translator would still function properly without it.
2185     \stopitemize
2186
2187     Unfortunately, the final transformation system has only been
2188     developed in the final part of the research, leaving no more time
2189     for verifying these properties. In fact, it is likely that the
2190     current transformation system still violates some of these
2191     properties in some cases and should be improved (or extra conditions
2192     on the input hardware descriptions should be formulated).
2193
2194     This is most likely the case with the completeness and determinism
2195     properties, perhaps also the termination property. The soundness
2196     property probably holds, since it is easier to manually verify (each
2197     transformation can be reviewed separately).
2198
2199     Even though no complete proofs have been made, some ideas for
2200     possible proof strategies are shown below.
2201
2202     \subsection{Graph representation}
2203       Before looking into how to prove these properties, we'll look at
2204       transformation systems from a graph perspective. We will first define
2205       the graph view and then illustrate it using a simple example from lambda
2206       calculus (which is a different system than the Cλash normalization
2207       system). The nodes of the graph are all possible Core expressions. The
2208       (directed) edges of the graph are transformations. When a transformation
2209       α applies to an expression \lam{A} to produce an expression \lam{B}, we
2210       add an edge from the node for \lam{A} to the node for \lam{B}, labeled
2211       α.
2212
2213       \startuseMPgraphic{TransformGraph}
2214         save a, b, c, d;
2215
2216         % Nodes
2217         newCircle.a(btex \lam{(λx.λy. (+) x y) 1} etex);
2218         newCircle.b(btex \lam{λy. (+) 1 y} etex);
2219         newCircle.c(btex \lam{(λx.(+) x) 1} etex);
2220         newCircle.d(btex \lam{(+) 1} etex);
2221
2222         b.c = origin;
2223         c.c = b.c + (4cm, 0cm);
2224         a.c = midpoint(b.c, c.c) + (0cm, 4cm);
2225         d.c = midpoint(b.c, c.c) - (0cm, 3cm);
2226
2227         % β-conversion between a and b
2228         ncarc.a(a)(b) "name(bred)";
2229         ObjLabel.a(btex $\xrightarrow[normal]{}{β}$ etex) "labpathname(bred)", "labdir(rt)";
2230         ncarc.b(b)(a) "name(bexp)", "linestyle(dashed withdots)";
2231         ObjLabel.b(btex $\xleftarrow[normal]{}{β}$ etex) "labpathname(bexp)", "labdir(lft)";
2232
2233         % η-conversion between a and c
2234         ncarc.a(a)(c) "name(ered)";
2235         ObjLabel.a(btex $\xrightarrow[normal]{}{η}$ etex) "labpathname(ered)", "labdir(rt)";
2236         ncarc.c(c)(a) "name(eexp)", "linestyle(dashed withdots)";
2237         ObjLabel.c(btex $\xleftarrow[normal]{}{η}$ etex) "labpathname(eexp)", "labdir(lft)";
2238
2239         % η-conversion between b and d
2240         ncarc.b(b)(d) "name(ered)";
2241         ObjLabel.b(btex $\xrightarrow[normal]{}{η}$ etex) "labpathname(ered)", "labdir(rt)";
2242         ncarc.d(d)(b) "name(eexp)", "linestyle(dashed withdots)";
2243         ObjLabel.d(btex $\xleftarrow[normal]{}{η}$ etex) "labpathname(eexp)", "labdir(lft)";
2244
2245         % β-conversion between c and d
2246         ncarc.c(c)(d) "name(bred)";
2247         ObjLabel.c(btex $\xrightarrow[normal]{}{β}$ etex) "labpathname(bred)", "labdir(rt)";
2248         ncarc.d(d)(c) "name(bexp)", "linestyle(dashed withdots)";
2249         ObjLabel.d(btex $\xleftarrow[normal]{}{β}$ etex) "labpathname(bexp)", "labdir(lft)";
2250
2251         % Draw objects and lines
2252         drawObj(a, b, c, d);
2253       \stopuseMPgraphic
2254
2255       \placeexample[right][ex:TransformGraph]{Partial graph of a lambda calculus
2256       system with β and η reduction (solid lines) and expansion (dotted lines).}
2257           \boxedgraphic{TransformGraph}
2258
2259       Of course the graph for Cλash is unbounded, since we can construct an
2260       infinite number of Core expressions. Also, there might potentially be
2261       multiple edges between two given nodes (with different labels), though
2262       this seems unlikely to actually happen in our system.
2263
2264       See \in{example}[ex:TransformGraph] for the graph representation of a very
2265       simple lambda calculus that contains just the expressions \lam{(λx.λy. (+) x
2266       y) 1}, \lam{λy. (+) 1 y}, \lam{(λx.(+) x) 1} and \lam{(+) 1}. The
2267       transformation system consists of β-reduction and η-reduction (solid edges) or
2268       β-expansion and η-expansion (dotted edges).
2269
2270       \todo{Define β-reduction and η-reduction?}
2271
2272       Note that the normal form of such a system consists of the set of nodes
2273       (expressions) without outgoing edges, since those are the expressions to which
2274       no transformation applies anymore. We call this set of nodes the \emph{normal
2275       set}. The set of nodes containing expressions in intended normal
2276       form \refdef{intended normal form} is called the \emph{intended
2277       normal set}.
2278
2279       From such a graph, we can derive some properties easily:
2280       \startitemize[KR]
2281         \item A system will \emph{terminate} if there is no path of infinite length
2282         in the graph (any cycle gives one, but they can also exist without cycles).
2283         \item Soundness is not easily represented in the graph.
2284         \item A system is \emph{complete} if all of the nodes in the normal set have
2285         the intended normal form. The converse (that all of the nodes outside of
2286         the normal set are \emph{not} in the intended normal form) is not
2287         strictly required.
2288         In other words, our normal set must be a subset of the
2289         intended normal set, but the two sets do not need to be
2290         the same.
2291         \item A system is deterministic if all paths starting at a particular
2292         node, which end in a node in the normal set, end at the same node.
2293       \stopitemize
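
      These graph properties can also be stated as formulas. The notation is
      introduced here for illustration only: $E$ is the set of all nodes, $T$
      is the set of edges (pairs of nodes), $T^{*}$ is its reflexive
      transitive closure (so $(a, b) \in T^{*}$ when there is a path from $a$
      to $b$), $N$ is the normal set and $I$ is the intended normal set. The
      normal set is then:

      \startformula
      N = \{ a \in E \mid \neg \exists b \in E : (a, b) \in T \}
      \stopformula

      Termination means that there is no infinite sequence $a_0, a_1, a_2,
      \ldots$ with $(a_i, a_{i+1}) \in T$ for every $i$. Completeness is the
      requirement that:

      \startformula
      N \subseteq I
      \stopformula

      and determinism is the requirement that:

      \startformula
      \forall a \in E, \forall n_1, n_2 \in N :
        ((a, n_1) \in T^{*} \wedge (a, n_2) \in T^{*}) \rightarrow n_1 = n_2
      \stopformula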
2294
2295       When looking at \in{example}[ex:TransformGraph], we see that the system
2296       terminates for both the reduction and expansion systems (but note that, for
2297       expansion, this is only true because we've limited the possible
2298       expressions. In complete lambda calculus, there would be a path from
2299       \lam{(λx.λy. (+) x y) 1} to \lam{(λx.λy.(λz.(+) z) x y) 1} to
2300       \lam{(λx.λy.(λz.(λq.(+) q) z) x y) 1} etc.)
2301
2302       If we considered the system with both expansion and reduction, it
2303       would no longer terminate, since there would be cycles all
2304       over the place.
2305
2306       The reduction and expansion systems have a normal set containing just
2307       \lam{(+) 1} or \lam{(λx.λy. (+) x y) 1} respectively. Since all paths in
2308       either system end up in these normal forms, both systems are \emph{complete}.
2309       Also, since there is only one node in each normal set, both systems must
2310       obviously be \emph{deterministic} as well.
2311
2312     \subsection{Termination}
2313       Proving termination of an arbitrary program is a very hard problem,
2314       and undecidable in general. \todo{Ref about arbitrary termination}
2315       Fortunately, we only have to prove termination for our specific
2316       transformation system.
2317
2318       A common approach for these kinds of proofs is to associate a
2319       measure with each possible expression in our system. If we can show
2320       that this measure cannot decrease indefinitely and that each
2321       transformation strictly decreases it (\ie, the expression transformed to
2322       has a lower measure than the one transformed from), the system must
2323       terminate. \todo{ref about measure-based termination proofs / analysis}
2324
2325       A simple measure for a system consisting of just β-reduction would be
2326       the number of lambda abstractions in the expression. Every application
2327       of β-reduction removes a lambda abstraction, so as long as the argument
2328       is not duplicated (\ie, the binder is used at most once) this finite
2329       measure strictly decreases and the system terminates. When arguments
2330       containing lambda abstractions are duplicated, a finer measure is needed.
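
      As an illustration, the β-reduction step from
      \in{example}[ex:TransformGraph] lowers this count. The first expression
      below contains two lambda abstractions and its β-reduct on the second
      line contains one:

      \startlambda
      (λx.λy. (+) x y) 1
      λy. (+) 1 y
      \stoplambda

      Note that the count on its own is only a sufficient measure when the
      argument is not duplicated. In the following β-reduction step (a purely
      illustrative expression, ignoring types), the binder \lam{x} is used
      twice and both lines contain two lambda abstractions, so the count
      does not strictly decrease:

      \startlambda
      (λx. x x) (λy. y)
      (λy. y) (λy. y)
      \stoplambda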
2331
2332       For our complete system, this measure would be fairly complex
2333       (probably the sum of a lot of things). Since (the conditions on)
2334       our transformations are pretty complex, we would need to include
2335       both simple things like the number of let expressions as well as
2336       more complex things like the number of case expressions that are
2337       not yet in normal form.
2338
2339       No real attempt has been made at finding a suitable measure for
2340       our system yet.
2341
2342     \subsection{Soundness}
2343       Soundness is a property that can be proven for each transformation
2344       separately. Since our system only runs separate transformations
2345       sequentially, if each of our transformations leaves the
2346       \emph{meaning} of the expression unchanged, then the entire system
2347       will of course leave the meaning unchanged and is thus
2348       \emph{sound}.
2349
2350       The current prototype has only been verified in an ad-hoc fashion
2351       by inspecting (the code for) each transformation. A more formal
2352       verification would be more appropriate.
2353
2354       To be able to formally show that each transformation properly
2355       preserves the meaning of every expression, we require an exact
2356       definition of the \emph{meaning} of every expression, so we can
2357       compare them. A definition of the operational semantics of \GHC's Core
2358       language is available \cite[sulzmann07]. This does not seem
2359       sufficient for our goals, but it is a good start.
2360
2361       It should be possible to have a single formal definition of
2362       meaning for Core for both normal Core compilation by \GHC and for
2363       our compilation to \VHDL. The main difference seems to be that in
2364       hardware every expression is always evaluated, while in software
2365       it is only evaluated if needed, but it should be possible to
2366       assign a meaning to core expressions that assumes neither.
2367
2368       Since each of the transformations can be applied to any
2369       subexpression as well, there is a constraint on our meaning
2370       definition: The meaning of an expression should depend only on the
2371       meaning of subexpressions, not on the expressions themselves. For
2372       example, the meaning of the application in \lam{f (let x = 4 in
2373       x)} should be the same as the meaning of the application in \lam{f
2374       4}, since the argument subexpression has the same meaning (though
2375       the actual expression is different).
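
      This constraint can be sketched as a formula. The notation is an
      assumption made here for illustration and is not taken from an existing
      definition: $\mathcal{M}(E)$ stands for the meaning of an expression
      $E$, and $C[E]$ stands for an expression that contains $E$ as a
      subexpression at some fixed position.

      \startformula
      \mathcal{M}(E_1) = \mathcal{M}(E_2) \rightarrow
        \mathcal{M}(C[E_1]) = \mathcal{M}(C[E_2])
      \stopformula

      In the example above, $C$ is the application of \lam{f} to an argument,
      $E_1$ is \lam{let x = 4 in x} and $E_2$ is \lam{4}.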
2376
2377     \subsection{Completeness}
2378       Proving completeness is probably not hard, but it could be a lot
2379       of work. We have seen above that to prove completeness, we must
2380       show that the normal set of our graph representation is a subset
2381       of the intended normal set.
2382
2383       However, it is hard to systematically generate or reason about the
2384       normal set, since it is defined as the set of nodes to which no
2385       transformation applies. To determine this set, each transformation
2386       must be considered and when a transformation is added, the entire
2387       set should be re-evaluated. This means it is hard to show that
2388       each node in the normal set is also in the intended normal set.
2389       Reasoning about our intended normal set is easier, since we know
2390       how to generate it from its definition \refdef{intended normal
2391       form definition}.
2392
2393       Fortunately, we can also prove the complement (which is
2394       equivalent, since $A \subseteq B \Leftrightarrow \overline{B}
2395       \subseteq \overline{A}$): Show that the set of nodes not in
2396       intended normal form is a subset of the set of nodes not in normal
2397       form. In other words, show that for every expression that is not
2398       in intended normal form, there is at least one transformation
2399       that applies to it (since that means it is not in normal form
2400       either and since $A \subseteq C \Leftrightarrow \forall x (x \in A
2401       \rightarrow x \in C)$).
2402
2403       By systematically reviewing the entire Core language definition
2404       along with the intended normal form definition (both of which have
2405       a similar structure), it should be possible to identify all
2406       possible (sets of) core expressions that are not in intended
2407       normal form and identify a transformation that applies to each of them.
2408
2409       This approach is especially useful for proving completeness of our
2410       system, since if expressions exist that are not in intended normal form
2411       and to which none of the transformations apply (\ie if the system is not
2412       yet complete), it is immediately clear which expressions these are, and
2413       adding (or modifying) transformations to fix this should be relatively
2414       easy.
2415
2416       As observed above, applying this approach is a lot of work, since
2417       we need to check every (set of) transformation(s) separately.
2418
2419       \todo{Perhaps do a few steps of the proofs as proof-of-concept}
2420
2421 % vim: set sw=2 sts=2 expandtab: