   1 \chapter[chap:normalization]{Normalization}
   2   % A helper to print a single example in half the page width. The example
   3   % text should be in a buffer whose name is given in an argument.
   4   %
   5   % The align=right option really does left-alignment, but without it the program
   6   % will end up on a single line. The strut=no option prevents a bunch of empty
   7   % space at the start of the frame.
   8   \define[1]\example{
   9     \framed[offset=1mm,align=right,strut=no,background=box,frame=off]{
  10       \setuptyping[option=LAM,style=sans,before=,after=,strip=auto]
  11       \typebuffer[#1]
  12       \setuptyping[option=none,style=\tttf,strip=auto]
  13     }
  14   }
  15
  16   \define[3]\transexample{
  17     \placeexample[here]{#1}
  18     \startcombination[2*1]
  19       {\example{#2}}{Original program}
  20       {\example{#3}}{Transformed program}
  21     \stopcombination
  22   }
  23
  24   The first step in the core to \small{VHDL} translation process is normalization. We
  25   aim to bring the core description into a simpler form, which we can
  26   subsequently translate into \small{VHDL} easily. This normal form is needed because
  27   the full core language is more expressive than \small{VHDL} in some areas and because
  28   core can describe expressions that do not have a direct hardware
  29   interpretation.
  30
  31   TODO: Describe core properties not supported in \small{VHDL}, and describe what the
  32   \small{VHDL} we want to generate should look like.
  33
  34   \section{Normal form}
  35     The transformations described here have a well-defined goal: to bring the
  36     program into a well-defined form that is directly translatable to hardware,
  37     while fully preserving the semantics of the program. We refer to this form as
  38     the \emph{normal form} of the program. The formal definition of this normal
  39     form is quite simple:
  40
  41     \placedefinition{}{A program is in \emph{normal form} if none of the
  42     transformations from this chapter apply.}
  43
  44     Of course, this is an \quote{easy} definition of the normal form, since our
  45     program will end up in normal form automatically. The more interesting part is
  46     to see if this normal form actually has the properties we would like it to
  47     have.
  48
  49     But, before getting into more definitions and details about this normal form,
  50     let's try to get a feeling for it first. The easiest way to do this is by
  51     describing the things we do not want to have in a normal form.
  52
  53     \startitemize
  54       \item Any \emph{polymorphism} must be removed. When laying down hardware, we
  55       can't generate any signals that can have multiple types. All types must be
  56       completely known to generate hardware.
  57
  58       \item Any \emph{higher order} constructions must be removed. We can't
  59       generate a hardware signal that contains a function, so all values,
  60       arguments and return values used must be first order.
  61
  62       \item Any complex \emph{nested scopes} must be removed. In the \small{VHDL}
  63       description, every signal is in a single scope. Also, full expressions are
  64       not supported everywhere (in particular port maps can only map signal names,
  65       not expressions). To make the \small{VHDL} generation easy, all values must be bound
  66       on the \quote{top level}.
  67     \stopitemize
  68
  69     TODO: Intermezzo: functions vs plain values
  70
  71     A very simple example of a program in normal form is given in
  72     \in{example}[ex:MulSum]. As you can see, all arguments to the function (which
  73     will become input ports in the final hardware) are at the top. This means that
  74     the body of the final lambda abstraction is never a function, but always a
  75     plain value.
  76
  77     After the lambda abstractions, we see a single let expression, that binds two
  78     variables (\lam{mul} and \lam{sum}). These variables will be signals in the
  79     final hardware, bound to the output ports of the \lam{*} and \lam{+}
  80     components.
  81
  82     The final line (the \quote{return value} of the function) selects the
  83     \lam{sum} signal to be the output port of the function. This \quote{return
  84     value} can always only be a variable reference, never a more complex
  85     expression.
  86
  87     \startbuffer[MulSum]
  88     alu :: Word -> Word -> Word -> Word
  89     alu = λa.λb.λc.
  90         let
  91           mul = (*) a b
  92           sum = (+) mul c
  93         in
  94           sum
  95     \stopbuffer
  96
  97     \startuseMPgraphic{MulSum}
  98       save a, b, c, mul, add, sum;
  99
 100       % I/O ports
 101       newCircle.a(btex $a$ etex) "framed(false)";
 102       newCircle.b(btex $b$ etex) "framed(false)";
 103       newCircle.c(btex $c$ etex) "framed(false)";
 104       newCircle.sum(btex $res$ etex) "framed(false)";
 105
 106       % Components
 107       newCircle.mul(btex * etex);
 108       newCircle.add(btex + etex);
 109
 110       a.c      - b.c   = (0cm, 2cm);
 111       b.c      - c.c   = (0cm, 2cm);
 112       add.c            = c.c + (2cm, 0cm);
 113       mul.c            = midpoint(a.c, b.c) + (2cm, 0cm);
 114       sum.c            = add.c + (2cm, 0cm);
 115       c.c              = origin;
 116
 117       % Draw objects and lines
 118       drawObj(a, b, c, mul, add, sum);
 119
 120       ncarc(a)(mul) "arcangle(15)";
 121       ncarc(b)(mul) "arcangle(-15)";
 122       ncline(c)(add);
 123       ncline(mul)(add);
 124       ncline(add)(sum);
 125     \stopuseMPgraphic
 126
 127     \placeexample[here][ex:MulSum]{Simple architecture consisting of a multiplier and an
 128     adder.}
 129       \startcombination[2*1]
 130         {\typebufferlam{MulSum}}{Core description in normal form.}
 131         {\boxedgraphic{MulSum}}{The architecture described by the normal form.}
 132       \stopcombination
 133
 134     The previous example described composing an architecture by calling other
 135     functions (operators), resulting in a simple architecture with components and
 136     connections. There is of course also some mechanism for choice in the normal
 137     form. In a normal Core program, the \emph{case} expression can be used in a
 138     few different ways to describe choice. In normal form, this is limited to a
 139     very specific form.
 140
 141     \in{Example}[ex:AddSubAlu] shows an example describing a
 142     simple \small{ALU}, which chooses between two operations based on an opcode
 143     bit. The main structure is the same as in \in{example}[ex:MulSum], but this
 144     time the \lam{res} variable is bound to a case expression. This case
 145     expression scrutinizes the variable \lam{opcode} (and scrutinizing more
 146     complex expressions is not supported). The case expression can select a
 147     different variable based on the constructor of \lam{opcode}.
 148
 149     \startbuffer[AddSubAlu]
 150     alu :: Bit -> Word -> Word -> Word
 151     alu = λopcode.λa.λb.
 152         let
 153           res1 = (+) a b
 154           res2 = (-) a b
 155           res = case opcode of
 156             Low -> res1
 157             High -> res2
 158         in
 159           res
 160     \stopbuffer
 161
 162     \startuseMPgraphic{AddSubAlu}
 163       save opcode, a, b, add, sub, mux, res;
 164
 165       % I/O ports
 166       newCircle.opcode(btex $opcode$ etex) "framed(false)";
 167       newCircle.a(btex $a$ etex) "framed(false)";
 168       newCircle.b(btex $b$ etex) "framed(false)";
 169       newCircle.res(btex $res$ etex) "framed(false)";
 170       % Components
 171       newCircle.add(btex + etex);
 172       newCircle.sub(btex - etex);
 173       newMux.mux;
 174
 175       opcode.c - a.c   = (0cm, 2cm);
 176       add.c    - a.c   = (4cm, 0cm);
 177       sub.c    - b.c   = (4cm, 0cm);
 178       a.c      - b.c   = (0cm, 3cm);
 179       mux.c            = midpoint(add.c, sub.c) + (1.5cm, 0cm);
 180       res.c    - mux.c = (1.5cm, 0cm);
 181       b.c              = origin;
 182
 183       % Draw objects and lines
 184       drawObj(opcode, a, b, res, add, sub, mux);
 185
 186       ncline(a)(add) "posA(e)";
 187       ncline(b)(sub) "posA(e)";
 188       nccurve(a)(sub) "posA(e)", "angleA(0)";
 189       nccurve(b)(add) "posA(e)", "angleA(0)";
 190       nccurve(add)(mux) "posB(inpa)", "angleB(0)";
 191       nccurve(sub)(mux) "posB(inpb)", "angleB(0)";
 192       nccurve(opcode)(mux) "posB(n)", "angleA(0)", "angleB(-90)";
 193       ncline(mux)(res) "posA(out)";
 194     \stopuseMPgraphic
 195
 196     \placeexample[here][ex:AddSubAlu]{Simple \small{ALU} supporting two operations.}
 197       \startcombination[2*1]
 198         {\typebufferlam{AddSubAlu}}{Core description in normal form.}
 199         {\boxedgraphic{AddSubAlu}}{The architecture described by the normal form.}
 200       \stopcombination
 201
 202     As a more complete example, consider \in{example}[ex:NormalComplete]. This
 203     example contains everything that is supported in normal form, with the
 204     exception of builtin higher order functions. The graphical version of the
 205     architecture contains a slightly simplified version, since the state tuple
 206     packing and unpacking have been left out. Instead, two separate registers are
 207     drawn. Also note that most synthesis tools will further optimize this
 208     architecture by removing the multiplexers at the register input and replacing
 209     them with some logic in the clock inputs, but we want to show the architecture
 210     as close to the description as possible.
 211
 212     \startbuffer[NormalComplete]
 213       regbank :: Bit
 214                  -> Word
 215                  -> State (Word, Word)
 216                  -> (State (Word, Word), Word)
 217
 218       -- All arguments are an initial lambda
 219       regbank = λa.λd.λsp.
 220       -- There are nested let expressions at top level
 221       let
 222         -- Unpack the state by coercion (\eg, cast from
 223         -- State (Word, Word) to (Word, Word))
 224         s = sp :: (Word, Word)
 225         -- Extract both registers from the state
 226         r1 = case s of (fst, snd) -> fst
 227         r2 = case s of (fst, snd) -> snd
 228         -- Calling some other user-defined function.
 229         d' = foo d
 230         -- Conditional connections
 231         out = case a of
 232           High -> r1
 233           Low -> r2
 234         r1' = case a of
 235           High -> d'
 236           Low -> r1
 237         r2' = case a of
 238           High -> r2
 239           Low -> d'
 240         -- Packing a tuple
 241         s' = (,) r1' r2'
 242         -- pack the state by coercion (\eg, cast from
 243         -- (Word, Word) to State (Word, Word))
 244         sp' = s' :: State (Word, Word)
 245         -- Pack our return value
 246         res = (,) sp' out
 247       in
 248         -- The actual result
 249         res
 250     \stopbuffer
 251
 252     \startuseMPgraphic{NormalComplete}
 253       save a, d, r, foo, muxr, muxout, out;
 254
 255       % I/O ports
 256       newCircle.a(btex \lam{a} etex) "framed(false)";
 257       newCircle.d(btex \lam{d} etex) "framed(false)";
 258       newCircle.out(btex \lam{out} etex) "framed(false)";
 259       % Components
 260       %newCircle.add(btex + etex);
 261       newBox.foo(btex \lam{foo} etex);
 262       newReg.r1(btex $\lam{r1}$ etex) "dx(4mm)", "dy(6mm)";
 263       newReg.r2(btex $\lam{r2}$ etex) "dx(4mm)", "dy(6mm)", "reflect(true)";
 264       newMux.muxr1;
 265       % Reflect over the vertical axis
 266       reflectObj(muxr1)((0,0), (0,1));
 267       newMux.muxr2;
 268       newMux.muxout;
 269       rotateObj(muxout)(-90);
 270
 271       d.c               = foo.c + (0cm, 1.5cm);
 272       a.c               = (xpart r2.c + 2cm, ypart d.c - 0.5cm);
 273       foo.c             = midpoint(muxr1.c, muxr2.c) + (0cm, 2cm);
 274       muxr1.c           = r1.c + (0cm, 2cm);
 275       muxr2.c           = r2.c + (0cm, 2cm);
 276       r2.c              = r1.c + (4cm, 0cm);
 277       r1.c              = origin;
 278       muxout.c          = midpoint(r1.c, r2.c) - (0cm, 2cm);
 279       out.c             = muxout.c - (0cm, 1.5cm);
 280
 281     %  % Draw objects and lines
 282       drawObj(a, d, foo, r1, r2, muxr1, muxr2, muxout, out);
 283
 284       ncline(d)(foo);
 285       nccurve(foo)(muxr1) "angleA(-90)", "posB(inpa)", "angleB(180)";
 286       nccurve(foo)(muxr2) "angleA(-90)", "posB(inpb)", "angleB(0)";
 287       nccurve(muxr1)(r1) "posA(out)", "angleA(180)", "posB(d)", "angleB(0)";
 288       nccurve(r1)(muxr1) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(180)";
 289       nccurve(muxr2)(r2) "posA(out)", "angleA(0)", "posB(d)", "angleB(180)";
 290       nccurve(r2)(muxr2) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(0)";
 291       nccurve(r1)(muxout) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(-90)";
 292       nccurve(r2)(muxout) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(-90)";
 293       % Connect port a
 294       nccurve(a)(muxout) "angleA(-90)", "angleB(180)", "posB(sel)";
 295       nccurve(a)(muxr1) "angleA(180)", "angleB(-90)", "posB(sel)";
 296       nccurve(a)(muxr2) "angleA(180)", "angleB(-90)", "posB(sel)";
 297       ncline(muxout)(out) "posA(out)";
 298     \stopuseMPgraphic
 299
 300     \placeexample[here][ex:NormalComplete]{A more complete example: a register bank with
 301     two registers.}
 302       \startcombination[2*1]
 303         {\typebufferlam{NormalComplete}}{Core description in normal form.}
 304         {\boxedgraphic{NormalComplete}}{The architecture described by the normal form.}
 305       \stopcombination
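
To get a feeling for the behaviour that this normal form describes, the
following Python sketch simulates \lam{regbank} as a pure function from the
inputs and the current state to the next state and the output. It is only an
illustration under a few assumptions: \lam{Bit} is modelled as a boolean where
\type{True} plays the role of \lam{High}, words are plain integers, and the
body of \lam{foo} is not given in this chapter, so the identity function used
here is a hypothetical stand-in. The helper \type{run} is also not part of the
original description.

```python
def foo(d):
    # Hypothetical stand-in: the chapter does not define foo.
    return d


def regbank(a, d, state):
    """Simulate one clock cycle; returns (next_state, out)."""
    r1, r2 = state              # r1, r2: extract both registers from the state
    d2 = foo(d)                 # d'  = foo d
    out = r1 if a else r2       # out = case a of High -> r1; Low -> r2
    r1_next = d2 if a else r1   # r1' = case a of High -> d'; Low -> r1
    r2_next = r2 if a else d2   # r2' = case a of High -> r2; Low -> d'
    return (r1_next, r2_next), out


def run(inputs, state=(0, 0)):
    """Feed a list of (a, d) pairs through regbank, one pair per cycle."""
    outs = []
    for a, d in inputs:
        state, out = regbank(a, d, state)
        outs.append(out)
    return state, outs
```

Every let binding of the normal form corresponds to exactly one assignment in
\type{regbank}, which mirrors the fact that every binding becomes one signal
in the generated hardware.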
 306
 307     \subsection{Intended normal form definition}
 308       Now that we have some intuition for the normal form, we can describe what we want
 309       the normal form to look like in a slightly more formal manner. The following
 310       EBNF-like description completely captures the intended structure (and
 311       generates a subset of GHC's core format).
 312
 313       Some clauses have an expression listed in parentheses. These are conditions
 314       that need to apply to the clause.
 315
 316       \startlambda
 317       \italic{normal} = \italic{lambda}
 318       \italic{lambda} = λvar.\italic{lambda} (representable(var))
 319                       | \italic{toplet}
 320       \italic{toplet} = letrec [\italic{binding}...] in var (representable(var))
 321       \italic{binding} = var = \italic{rhs} (representable(rhs))
 322                        -- State packing and unpacking by coercion
 323                        | var0 = var1 :: State ty (lvar(var1))
 324                        | var0 = var1 :: ty (var0 :: State ty) (lvar(var1))
 325       \italic{rhs} = userapp
 326                    | builtinapp
 327                    -- Extractor case
 328                    | case var of C a0 ... an -> ai (lvar(var))
 329                    -- Selector case
 330                    | case var of (lvar(var))
 331                       DEFAULT -> var0 (lvar(var0))
 332                       C w0 ... wn -> resvar (\forall{}i, wi \neq resvar, lvar(resvar))
 333       \italic{userapp} = \italic{userfunc}
 334                        | \italic{userapp} \italic{userarg}
 335       \italic{userfunc} = var (gvar(var))
 336       \italic{userarg} = var (lvar(var))
 337       \italic{builtinapp} = \italic{builtinfunc}
 338                           | \italic{builtinapp} \italic{builtinarg}
 339       \italic{builtinfunc} = var (bvar(var))
 340       \italic{builtinarg} = \italic{coreexpr}
 341       \stoplambda
 342
 343       -- TODO: Limit builtinarg further
 344
 345       -- TODO: There can still be other casts around (which the code can handle,
 346       e.g., ignore), which still need to be documented here.
 347
 348       -- TODO: Note about the selector case. It just supports Bit and Bool
 349       currently, perhaps it should be generalized in the normal form?
 350
 351       When looking at such a program from a hardware perspective, the top level
 352       lambdas define the input ports. The value produced by the let expression is
 353       the output port. Most function applications bound by the let expression
 354       define a component instantiation, where the input and output ports are mapped
 355       to local signals or arguments. Some of the others use a builtin
 356       construction (\eg the \lam{case} statement) or call a builtin function
 357       (\eg \lam{add} or \lam{sub}). For these, a hardcoded \small{VHDL} translation is
 358       available.
 359
 360   \section{Transformation notation}
 361     To be able to concisely present transformations, we use a specific format for
 362     them. It is a simple format, similar to one used in logic reasoning.
 363
 364     Such a transformation description looks like the following.
 365
 366     \starttrans
 367     <context conditions>
 368     ~
 369     <original expression>
 370     --------------------------          <expression conditions>
 371     <transformed expression>
 372     ~
 373     <context additions>
 374     \stoptrans
 375
 376     This format describes a transformation that applies to \lam{original
 377     expression} and transforms it into \lam{transformed expression}, assuming
 378     that all conditions apply. In this format, there are a number of placeholders
 379     in pointy brackets, most of which should be rather obvious in their meaning.
 380     Nevertheless, we will more precisely specify their meaning below:
 381
 382       \startdesc{<original expression>} The expression pattern that will be matched
 383       against (subexpressions of) the expression to be transformed. We call this a
 384       pattern, because it can contain \emph{placeholders} (variables), which match
 385       any expression or binder. Any such placeholder is said to be \emph{bound} to
 386       the expression it matches. It is convention to use an uppercase letter (\eg
 387       \lam{M} or \lam{E}) to refer to any expression (including a simple variable
 388       reference) and lowercase letters (\eg \lam{v} or \lam{b}) to refer to
 389       (references to) binders.
 390
 391       For example, the pattern \lam{a + B} will match the expression
 392       \lam{v + (2 * w)} (and bind \lam{a} to \lam{v} and \lam{B} to
 393       \lam{(2 * w)}), but not \lam{(2 * w) + v}, since \lam{a} can only match a binder.
 394       \stopdesc
 395
 396       \startdesc{<expression conditions>}
 397       These are extra conditions on the expression that is matched. These
 398       conditions can be used to further limit the cases in which the
 399       transformation applies, in particular to prevent a transformation from
 400       causing a loop with itself or another transformation.
 401
 402       Only if these conditions are \emph{all} true does this transformation
 403       apply.
 404       \stopdesc
 405
 406       \startdesc{<context conditions>}
 407       These are a number of extra conditions on the context of the function. In
 408       particular, these conditions can require some other top level function to be
 409       present, whose value matches the pattern given here. The format of each of
 410       these conditions is: \lam{binder = <pattern>}.
 411
 412       Typically, the binder is some placeholder bound in the \lam{<original
 413       expression>}, while the pattern contains some placeholders that are used in
 414       the \lam{transformed expression}.
 415
 416       Only if a top level binder exists that matches each binder and pattern does this
 417       transformation apply.
 418       \stopdesc
 419
 420       \startdesc{<transformed expression>}
 421       This is the expression template that is the result of the transformation. If, looking
 422       at the above three items, the transformation applies, the \lam{original
 423       expression} is completely replaced with the \lam{<transformed expression>}.
 424       We call this a template, because it can contain placeholders, referring to
 425       any placeholder bound by the \lam{<original expression>} or the
 426       \lam{<context conditions>}. The resulting expression will have those
 427       placeholders replaced by the values bound to them.
 428
 429       Any binder (lowercase) placeholder that has no value bound to it yet will be
 430       bound to (and replaced with) a fresh binder.
 431       \stopdesc
 432
 433       \startdesc{<context additions>}
 434       These are templates for new functions to add to the context. This is a way
 435       to have a transformation create new top level functions.
 436
 437       Each addition has the form \lam{binder = template}. As above, any
 438       placeholder in the addition is replaced with the value bound to it, and any
 439       binder placeholder that has no value bound to it yet will be bound to (and
 440       replaced with) a fresh binder.
 441       \stopdesc
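
The matching of an \lam{<original expression>} pattern against an expression
can be sketched in a few lines of Python. This is not the prototype's
implementation, only an illustration of the placeholder convention described
above. The representation is an assumption made for this sketch: a variable is
a string, a literal is an integer, and a compound expression is a tuple whose
first element names the operator. The function name \type{match_pattern} is
hypothetical.

```python
def match_pattern(pattern, expr, bindings=None):
    """Return a dict mapping placeholders to what they matched, or None."""
    bindings = dict(bindings or {})
    if isinstance(pattern, str):
        # A string in a pattern is a placeholder. Lowercase placeholders
        # only match binders (variables); uppercase ones match anything.
        if pattern.islower() and not isinstance(expr, str):
            return None
        # A placeholder used twice must be bound to the same thing.
        if pattern in bindings and bindings[pattern] != expr:
            return None
        bindings[pattern] = expr
        return bindings
    # Compound pattern: same operator and arity, then match all children.
    if (not isinstance(expr, tuple) or len(expr) != len(pattern)
            or expr[0] != pattern[0]):
        return None
    for sub_pattern, sub_expr in zip(pattern[1:], expr[1:]):
        bindings = match_pattern(sub_pattern, sub_expr, bindings)
        if bindings is None:
            return None
    return bindings


# The pattern a + B against v + (2 * w) and against (2 * w) + v.
pattern = ("+", "a", "B")
print(match_pattern(pattern, ("+", "v", ("*", 2, "w"))))
print(match_pattern(pattern, ("+", ("*", 2, "w"), "v")))
```

The first call binds \type{a} to the variable \type{v} and \type{B} to the
product; the second call fails because the lowercase placeholder \type{a}
cannot be bound to a compound expression.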
 442
 443     As an example, we'll look at η-abstraction:
 444
 445     \starttrans
 446     E                 \lam{E :: a -> b}
 447     --------------    \lam{E} does not occur on a function position in an application
 448     λx.E x            \lam{E} is not a lambda abstraction.
 449     \stoptrans
 450
 451     Consider the following function, which is a fairly obvious way to specify a
 452     simple ALU (note that \in{example}[ex:AddSubAlu] is the normal form of this
 453     function):
 454
 455     \startlambda
 456     alu :: Bit -> Word -> Word -> Word
 457     alu = λopcode. case opcode of
 458       Low -> (+)
 459       High -> (-)
 460     \stoplambda
 461
 462     There are a few subexpressions in this function to which we could possibly
 463     apply the transformation. Since the pattern of the transformation is only
 464     the placeholder \lam{E}, any expression will match that. Whether the
 465     transformation applies to an expression is thus solely decided by the
 466     conditions to the right of the transformation.
 467
 468     We will look at each expression in the function in a top down manner. The
 469     first expression is the entire expression the function is bound to.
 470
 471     \startlambda
 472     λopcode. case opcode of
 473       Low -> (+)
 474       High -> (-)
 475     \stoplambda
 476
 477     As said, the expression pattern matches this. The type of this expression is
 478     \lam{Bit -> Word -> Word -> Word}, which matches \lam{a -> b} (Note that in
 479     this case \lam{a = Bit} and \lam{b = Word -> Word -> Word}).
 480
 481     Since this expression is at top level, it does not occur at a function
 482     position of an application. However, the expression is a lambda abstraction,
 483     so this transformation does not apply.
 484
 485     The next expression we could apply this transformation to, is the body of
 486     the lambda abstraction:
 487
 488     \startlambda
 489     case opcode of
 490       Low -> (+)
 491       High -> (-)
 492     \stoplambda
 493
 494     The type of this expression is \lam{Word -> Word -> Word}, which again
 495     matches \lam{a -> b}. The expression is the body of a lambda expression, so
 496     it does not occur at a function position of an application. Finally, the
 497     expression is not a lambda abstraction but a case expression, so all the
 498     conditions match. There are no context conditions to match, so the
 499     transformation applies.
 500
 501     By now, the placeholder \lam{E} is bound to the entire expression. The
 502     placeholder \lam{x}, which occurs in the replacement template, is not bound
 503     yet, so we need to generate a fresh binder for that. Let's use the binder
 504     \lam{a}. This results in the following replacement expression:
 505
 506     \startlambda
 507     λa.(case opcode of
 508       Low -> (+)
 509       High -> (-)) a
 510     \stoplambda
 511
 512     Continuing with this expression, we see that the transformation does not
 513     apply again (it is a lambda expression). Next we look at the body of this
 514     lambda abstraction:
 515
 516     \startlambda
 517     (case opcode of
 518       Low -> (+)
 519       High -> (-)) a
 520     \stoplambda
 521
 522     Here, the transformation does apply, binding \lam{E} to the entire
 523     expression and \lam{x} to the fresh binder \lam{b}, resulting in the
 524     replacement:
 525
 526     \startlambda
 527     λb.(case opcode of
 528       Low -> (+)
 529       High -> (-)) a b
 530     \stoplambda
 531
 532     Again, the transformation does not apply to this lambda abstraction, so we
 533     look at its body. For brevity, we'll put the case statement on one line from
 534     now on.
 535
 536     \startlambda
 537     (case opcode of Low -> (+); High -> (-)) a b
 538     \stoplambda
 539
 540     The type of this expression is \lam{Word}, so it does not match \lam{a -> b}
 541     and the transformation does not apply. Next, we have two options for the
 542     next expression to look at: The function position and argument position of
 543     the application. The expression in the argument position is \lam{b}, which
 544     has type \lam{Word}, so the transformation does not apply. The expression in
 545     the function position is:
 546
 547     \startlambda
 548     (case opcode of Low -> (+); High -> (-)) a
 549     \stoplambda
 550
 551     Obviously, the transformation does not apply here, since it occurs in
 552     function position. In the same way the transformation does not apply to either
 553     component of this expression (\lam{case opcode of Low -> (+); High -> (-)}
 554     and \lam{a}), so we'll skip to the components of the case expression: The
 555     scrutinee and both alternatives. Since the opcode is not a function, the
 556     transformation does not apply to it, and we'll leave both alternatives as an exercise to the
 557     reader. The final function, after all these transformations becomes:
 558
 559     \startlambda
 560     alu :: Bit -> Word -> Word -> Word
 561     alu = λopcode.λa.λb. (case opcode of
 562       Low -> λa1.λb1. (+) a1 b1
 563       High -> λa2.λb2. (-) a2 b2) a b
 564     \stoplambda
 565
 566     In this case, the transformation does not apply anymore, though this might
 567     not always be the case (e.g., the application of a transformation on a
 568     subexpression might open up possibilities to apply the transformation
 569     further up in the expression).
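
The walk-through above can be reproduced mechanically. The following Python
sketch applies η-abstraction top-down to a small typed expression tree. It is
a simplified illustration, not the prototype's code, and everything in it is
an assumption made for the sketch: the classes \type{Var}, \type{Lam},
\type{App} and \type{Case}, the encoding of a function type as a pair
\type{(argument, result)} with base types as strings, and the fresh names
\type{x1}, \type{x2}, and so on (the text above uses \lam{a}, \lam{b},
\lam{a1}, ... instead).

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str
    ty: object          # "Word", "Bit", or a pair (argument, result)


@dataclass(frozen=True)
class Lam:
    param: Var
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


@dataclass(frozen=True)
class Case:
    scrut: object
    alts: tuple         # pairs of (constructor name, expression)


def type_of(e):
    if isinstance(e, Var):
        return e.ty
    if isinstance(e, Lam):
        return (e.param.ty, type_of(e.body))
    if isinstance(e, App):
        return type_of(e.fun)[1]
    return type_of(e.alts[0][1])   # Case: all alternatives share one type


def eta(e, fresh, fun_pos=False):
    """Apply eta-abstraction wherever its three conditions hold."""
    ty = type_of(e)
    # Conditions: E has a function type, E is not in function position,
    # and E is not already a lambda abstraction.
    if isinstance(ty, tuple) and not fun_pos and not isinstance(e, Lam):
        x = Var(next(fresh), ty[0])
        return eta(Lam(x, App(e, x)), fresh)
    if isinstance(e, Lam):
        return Lam(e.param, eta(e.body, fresh))
    if isinstance(e, App):
        return App(eta(e.fun, fresh, fun_pos=True), eta(e.arg, fresh))
    if isinstance(e, Case):
        alts = tuple((con, eta(alt, fresh)) for con, alt in e.alts)
        return Case(eta(e.scrut, fresh), alts)
    return e


def show(e):
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Lam):
        return f"λ{e.param.name}.{show(e.body)}"
    if isinstance(e, App):
        fun = show(e.fun) if isinstance(e.fun, (Var, App)) else f"({show(e.fun)})"
        arg = show(e.arg) if isinstance(e.arg, Var) else f"({show(e.arg)})"
        return f"{fun} {arg}"
    alts = "; ".join(f"{con} -> {show(alt)}" for con, alt in e.alts)
    return f"case {show(e.scrut)} of {alts}"


binop = ("Word", ("Word", "Word"))          # Word -> Word -> Word
opcode = Var("opcode", "Bit")
alu = Lam(opcode, Case(opcode, (("Low", Var("(+)", binop)),
                                ("High", Var("(-)", binop)))))
normalized = eta(alu, (f"x{i}" for i in count(1)))
print(show(normalized))
```

Up to the choice of fresh names, the printed expression has the same shape as
the final function shown above: two new lambdas around the case expression
and two around each alternative.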
 570
 571     \subsection{Transformation application}
 572       In this chapter we define a number of transformations, but how will we apply
 573       these? As stated before, our normal form is reached as soon as no
 574       transformation applies anymore. This means our application strategy is to
 575       simply apply any transformation that applies, and to continue doing that with
 576       the result of each transformation.
 577
 578       In particular, we define no particular order of transformations. Since
 579       transformation order should not influence the resulting normal form (see TODO
 580       ref), this leaves the implementation free to choose any application order that
 581       results in an efficient implementation.
 582
 583       When applying a single transformation, we try to apply it to every (sub)expression
 584       in a function, not just the top level function. This allows us to keep the
 585       transformation descriptions concise and powerful.
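
This application strategy amounts to a simple fixed-point loop. The Python
sketch below shows one possible shape of such a driver; it is an illustration
only and says nothing about how the prototype is organised. A transformation
is assumed to be a function that returns the rewritten expression, or
\type{None} when it does not apply. Expressions are assumed to be nested
tuples, and the two rules \type{add_zero} and \type{mul_one} are toy rules
invented for this sketch; they are not transformations from this chapter.

```python
def rewrite_once(expr, rules):
    """Apply one rule to expr or to one of its subexpressions.

    Returns the rewritten expression, or None if no rule applies anywhere.
    """
    for rule in rules:
        result = rule(expr)
        if result is not None:
            return result
    if isinstance(expr, tuple):
        for i, sub in enumerate(expr):
            new_sub = rewrite_once(sub, rules)
            if new_sub is not None:
                return expr[:i] + (new_sub,) + expr[i + 1:]
    return None


def normalize(expr, rules):
    """Keep rewriting until no rule applies; the result is the normal form."""
    while True:
        result = rewrite_once(expr, rules)
        if result is None:
            return expr
        expr = result


# Two toy rules (hypothetical, for illustration only).
def add_zero(expr):
    if isinstance(expr, tuple) and len(expr) == 3 and expr[:2] == ("add", 0):
        return expr[2]
    return None


def mul_one(expr):
    if isinstance(expr, tuple) and len(expr) == 3 and expr[:2] == ("mul", 1):
        return expr[2]
    return None


term = ("add", 0, ("mul", 1, ("add", 0, "x")))
print(normalize(term, [add_zero, mul_one]))
print(normalize(term, [mul_one, add_zero]))
```

For these toy rules the order in which the rules are listed does not change
the result. The loop itself neither checks nor guarantees this property, and
it only terminates if the rule set cannot rewrite forever.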
 586
 587     \subsection{Definitions}
 588       In the following sections, we will be using a number of functions and
 589       notations, which we will define here.
 590
 591       TODO: Define substitution
 592
 593       \subsubsection{Other concepts}
 594         A \emph{global variable} is any variable that is bound at the
 595         top level of a program, or an external module. A \emph{local variable} is any
 596         other variable (\eg, variables local to a function, which can be bound by
 597         lambda abstractions, let expressions and pattern matches of case
 598         alternatives).  Note that this is a slightly different notion of global versus
 599         local than what \small{GHC} uses internally.
 600         \defref{global variable} \defref{local variable}
 601
 602         A \emph{hardware representable} (or just \emph{representable}) type or value
 603         is (a value of) a type that we can generate a signal for in hardware. For
 604         example, a bit, a vector of bits, a 32 bit unsigned word, etc. Types that are
 605         not runtime representable notably include (but are not limited to): Types,
 606         dictionaries, functions.
 607         \defref{representable}
 608
 609         A \emph{builtin function} is a function supplied by the Cλash framework, whose
 610         implementation is not valid Cλash. The implementation is of course valid
 611         Haskell, for simulation, but it is not expressible in Cλash.
 612         \defref{builtin function} \defref{user-defined function}
 613
 614       For these functions, Cλash has a \emph{builtin hardware translation}, so calls
 615       to these functions can still be translated. These are functions like
 616       \lam{map}, \lam{hwor} and \lam{length}.
 617
 618       A \emph{user-defined} function is a function for which we do have a Cλash
 619       implementation available.
 620
 621       \subsubsection{Functions}
 622         Here, we define a number of functions that can be used below to concisely
 623         specify conditions.
 624
 625         \refdef{global variable}\emph{gvar(expr)} is true when \emph{expr} is a variable that references a
 626         global variable. It is false when it references a local variable.
 627
 628         \refdef{local variable}\emph{lvar(expr)} is the complement of \emph{gvar}; it is true when \emph{expr}
 629         references a local variable, false when it references a global variable.
 630
 631         \refdef{representable}\emph{representable(expr)} or \emph{representable(var)} is true when
 632         \emph{expr} or \emph{var} is \emph{representable}.
 633
 634     \subsection{Binder uniqueness}
 635       A common problem in transformation systems is binder uniqueness. When not
 636       considering this problem, it is easy to create transformations that mix up
 637       bindings and cause name collisions. Take for example, the following core
 638       expression:
 639
 640       \startlambda
 641       (λa.λb.λc. a * b * c) x c
 642       \stoplambda
 643
 644       By applying β-reduction (see below) once, we can simplify this expression to:
 645
 646       \startlambda
 647       (λb.λc. x * b * c) c
 648       \stoplambda
 649
 650       Now, we have replaced the \lam{a} binder with a reference to the \lam{x}
 651       binder. No harm done here. But note that we see multiple occurrences of the
 652       \lam{c} binder. The first is a binding occurrence, to which the second refers.
 653       The last, however refers to \emph{another} instance of \lam{c}, which is
 654       bound somewhere outside of this expression. Now, if we would apply beta
 655       reduction without taking heed of binder uniqueness, we would get:
 656
 657       \startlambda
 658       λc. x * c * c
 659       \stoplambda
 660
 661       This is obviously not what was supposed to happen! The root of this problem is
 662       the reuse of binders: Identical binders can be bound in different scopes, such
 663       that only the inner one is \quote{visible} in the inner expression. In the example
 664       above, the \lam{c} binder was bound outside of the expression and in the inner
 665       lambda expression. Inside that lambda expression, only the inner \lam{c} is
 666       visible.
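
      For comparison, a capture-avoiding reduction would first rename the inner
      binder, so that the outer \lam{c} stays distinguishable. The sketch below
      shows the renamed expression followed by its reduct; the name \lam{c'} is
      chosen purely for illustration:

      \startlambda
      (λb.λc'. x * b * c') c
      λc'. x * c * c'
      \stoplambda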
 667
 668       There are a number of ways to solve this. \small{GHC} has isolated this
 669       problem to their binder substitution code, which performs \emph{deshadowing}
 670       during its expression traversal. This means that any binding that shadows
 671       another binding on a higher level is replaced by a new binder that does not
 672       shadow any other binding. This non-shadowing invariant is enough to prevent
 673       binder uniqueness problems in \small{GHC}.
 674
      In our transformation system, maintaining this non-shadowing invariant is
      a bit harder to do (mostly due to implementation issues: the prototype does
      not use \small{GHC}'s substitution code). Also, we can observe the following
      points.
 679
 680       \startitemize
      \item Deshadowing does not guarantee overall uniqueness. For example, the
      following (slightly contrived) expression shows the identifier \lam{x} bound in
      two separate places (and to different values), even though no shadowing
      occurs.
 685
 686       \startlambda
 687       (let x = 1 in x) + (let x = 2 in x)
 688       \stoplambda
 689
 690       \item In our normal form (and the resulting \small{VHDL}), all binders
 691       (signals) will end up in the same scope. To allow this, all binders within the
 692       same function should be unique.
 693
 694       \item When we know that all binders in an expression are unique, moving around
 695       or removing a subexpression will never cause any binder conflicts. If we have
 696       some way to generate fresh binders, introducing new subexpressions will not
 697       cause any problems either. The only way to cause conflicts is thus to
 698       duplicate an existing subexpression.
 699       \stopitemize
 700
      Given the above, our prototype maintains a unique binder invariant. This
      means that at any given moment during normalization, all binders \emph{within
      a single function} must be unique. To achieve this, we apply the following
      technique.
 705
 706       TODO: Define fresh binders and unique supplies
 707
 708       \startitemize
 709       \item Before starting normalization, all binders in the function are made
 710       unique. This is done by generating a fresh binder for every binder used. This
 711       also replaces binders that did not pose any conflict, but it does ensure that
 712       all binders within the function are generated by the same unique supply. See
 713       (TODO: ref fresh binder).
 714       \item Whenever a new binder must be generated, we generate a fresh binder that
 715       is guaranteed to be different from \emph{all binders generated so far}. This
 716       can thus never introduce duplication and will maintain the invariant.
 717       \item Whenever (part of) an expression is duplicated (for example when
 718       inlining), all binders in the expression are replaced with fresh binders
 719       (using the same method as at the start of normalization). These fresh binders
 720       can never introduce duplication, so this will maintain the invariant.
 721       \item Whenever we move part of an expression around within the function, there
 722       is no need to do anything special. There is obviously no way to introduce
 723       duplication by moving expressions around. Since we know that each of the
 724       binders is already unique, there is no way to introduce (incorrect) shadowing
 725       either.
 726       \stopitemize
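
      As a small illustration of the first step, making all binders unique in
      the contrived expression shown earlier could give the following. The
      names \lam{x0} and \lam{x1} are only illustrative; the prototype's unique
      supply may well pick different ones:

      \startlambda
      (let x0 = 1 in x0) + (let x1 = 2 in x1)
      \stoplambda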
 727
 728   \section{Transform passes}
 729     In this section we describe the actual transforms. Here we're using
 730     the core language in a notation that resembles lambda calculus.
 731
 732     Each of these transforms is meant to be applied to every (sub)expression
 733     in a program, for as long as it applies. Only when none of the
    transformations can be applied anymore is the program in normal form (by
 735     definition). We hope to be able to prove that this form will obey all of the
 736     constraints defined above, but this has yet to happen (though it seems likely
 737     that it will).
 738
 739     Each of the transforms will be described informally first, explaining
 740     the need for and goal of the transform. Then, a formal definition is
 741     given, using a familiar syntax from the world of logic. Each transform
 742     is specified as a number of conditions (above the horizontal line) and a
 743     number of conclusions (below the horizontal line). The details of using
    this notation are still a bit fuzzy, so comments are welcome.
 745
 746     \subsection{General cleanup}
 747       These transformations are general cleanup transformations, that aim to
 748       make expressions simpler. These transformations usually clean up the
 749        mess left behind by other transformations or clean up expressions to
 750        expose new transformation opportunities for other transformations.
 751
 752        Most of these transformations are standard optimizations in other
 753        compilers as well. However, in our compiler, most of these are not just
 754        optimizations, but they are required to get our program into normal
 755        form.
 756
 757       \subsubsection{β-reduction}
        β-reduction is a well-known transformation from lambda calculus, where it is
        the main reduction step. It reduces applications of lambda abstractions,
        removing both the lambda abstraction and the application.

        In our transformation system, this step helps to remove unwanted lambda
        abstractions (basically all but the ones at the top level). Other
        transformations (application propagation, non-representable inlining) make
        sure that most lambda abstractions will eventually be reducible by
        β-reduction.
 767
 768         \starttrans
 769         (λx.E) M
 770         -----------------
 771         E[M/x]
 772         \stoptrans
 773
 774         % And an example
 775         \startbuffer[from]
 776         (λa. 2 * a) (2 * b)
 777         \stopbuffer
 778
 779         \startbuffer[to]
 780         2 * (2 * b)
 781         \stopbuffer
 782
 783         \transexample{β-reduction}{from}{to}
 784
 785       \subsubsection{Empty let removal}
 786         This transformation is simple: It removes recursive lets that have no bindings
 787         (which usually occurs when unused let binding removal removes the last
 788         binding from it).
 789
 790         \starttrans
 791         letrec in M
 792         --------------
 793         M
 794         \stoptrans
 795
 796         TODO: Example
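
        A possible example is sketched below. It is hand-written rather than
        taken from the prototype's output, and it assumes that \lam{add} is
        some global function:

        \startbuffer[from]
        letrec
        in
          add a b
        \stopbuffer

        \startbuffer[to]
        add a b
        \stopbuffer

        \transexample{Empty let removal}{from}{to}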
 797
 798       \subsubsection{Simple let binding removal}
 799         This transformation inlines simple let bindings (\eg a = b).
 800
 801         This transformation is not needed to get into normal form, but makes the
 802         resulting \small{VHDL} a lot shorter.
 803
 804         \starttrans
 805         letrec
 806           a0 = E0
 807           \vdots
 808           ai = b
 809           \vdots
 810           an = En
 811         in
 812           M
 813         -----------------------------  \lam{b} is a variable reference
 814         letrec
 815           a0 = E0 [b/ai]
 816           \vdots
 817           ai-1 = Ei-1 [b/ai]
 818           ai+1 = Ei+1 [b/ai]
 819           \vdots
 820           an = En [b/ai]
 821         in
 822           M[b/ai]
 823         \stoptrans
 824
 825         TODO: Example
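
        A possible example is sketched below. It is hand-written, and assumes
        that \lam{add} and \lam{mul} are global functions and that \lam{x} and
        \lam{y} are bound elsewhere. The binding \lam{t = s} is a plain variable
        reference, so every use of \lam{t} is replaced by \lam{s} and the binding
        disappears:

        \startbuffer[from]
        letrec
          s = add x y
          t = s
        in
          mul t t
        \stopbuffer

        \startbuffer[to]
        letrec
          s = add x y
        in
          mul s s
        \stopbuffer

        \transexample{Simple let binding removal}{from}{to}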
 826
 827       \subsubsection{Unused let binding removal}
 828         This transformation removes let bindings that are never used. Usually,
 829         the desugarer introduces some unused let bindings.
 830
 831         This normalization pass should really be unneeded to get into normal form
 832         (since unused bindings are not forbidden by the normal form), but in practice
 833         the desugarer or simplifier emits some unused bindings that cannot be
 834         normalized (e.g., calls to a \type{PatError} (TODO: Check this name)). Also,
 835         this transformation makes the resulting \small{VHDL} a lot shorter.
 836
 837         \starttrans
 838         letrec
 839           a0 = E0
 840           \vdots
 841           ai = Ei
 842           \vdots
 843           an = En
 844         in
          M                             \lam{ai} does not occur free in \lam{M}
        ----------------------------    \forall j, 0 <= j <= n, j ≠ i (\lam{ai} does not occur free in \lam{Ej})
 847         letrec
 848           a0 = E0
 849           \vdots
 850           ai-1 = Ei-1
 851           ai+1 = Ei+1
 852           \vdots
 853           an = En
 854         in
 855           M
 856         \stoptrans
 857
 858         TODO: Example
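
        A possible example is sketched below. It is hand-written, and assumes
        that \lam{add} and \lam{mul} are global functions. The binder \lam{u}
        occurs neither in the body nor in the other binding, so its binding can
        be dropped:

        \startbuffer[from]
        letrec
          s = add x y
          u = mul x y
        in
          s
        \stopbuffer

        \startbuffer[to]
        letrec
          s = add x y
        in
          s
        \stopbuffer

        \transexample{Unused let binding removal}{from}{to}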
 859
 860       \subsubsection{Cast propagation / simplification}
 861         This transform pushes casts down into the expression as far as possible.
 862         Since its exact role and need is not clear yet, this transformation is
 863         not yet specified.
 864
 865         TODO: Cast propagation
 866
 867       \subsubsection{Top level binding inlining}
        This transform inlines simple top level bindings generated by the
 869         \small{GHC} compiler. \small{GHC} sometimes generates very simple
 870         \quote{wrapper} bindings, which are bound to just a variable
 871         reference, or a partial application to constants or other variable
 872         references.
 873
        Note that this transformation is completely optional. It is not
        required to get any function into normal form, but it does help make
        the resulting \small{VHDL} output easier to read (since it removes a bunch of
        components that are really boring).
 878
        This transform applies to any top level binding generated by the compiler,
 880         whose normalized form contains only a single let binding.
 881
 882         \starttrans
 883         x = λa0 ... λan.let y = E in y
 884         ~
 885         x
 886         --------------------------------------         \lam{x} is generated by the compiler
 887         λa0 ... λan.let y = E in y
 888         \stoptrans
 889
 890         \startbuffer[from]
 891         (+) :: Word -> Word -> Word
 892         (+) = GHC.Num.(+) @Word $dNum
 893         ~
 894         (+) a b
 895         \stopbuffer
 896         \startbuffer[to]
 897         GHC.Num.(+) @ Alu.Word $dNum a b
 898         \stopbuffer
 899
 900         \transexample{Top level binding inlining}{from}{to}
 901
 902         Without this transformation, the (+) function would generate an
 903         architecture which would just add its inputs. This generates a lot of
 904         overhead in the VHDL, which is particularly annoying when browsing the
 905         generated RTL schematic (especially since + is not allowed in VHDL
 906         architecture names\footnote{Technically, it is allowed to use
 907         non-alphanumerics when using extended identifiers, but it seems that
 908         none of the tooling likes extended identifiers in filenames, so it
 909         effectively doesn't work}, so the entity would be called
 910         \quote{w7aA7f} or something similarly unreadable and autogenerated).
 911
 912     \subsection{Program structure}
 913       These transformations are aimed at normalizing the overall structure
 914       into the intended form. This means ensuring there is a lambda abstraction
 915       at the top for every argument (input port), putting all of the other
 916       value definitions in let bindings and making the final return value a
 917       simple variable reference.
 918
 919       \subsubsection{η-abstraction}
 920         This transformation makes sure that all arguments of a function-typed
 921         expression are named, by introducing lambda expressions. When combined with
 922         β-reduction and non-representable binding inlining, all function-typed
 923         expressions should be lambda abstractions or global identifiers.
 924
 925         \starttrans
 926         E                 \lam{E :: a -> b}
 927         --------------    \lam{E} is not the first argument of an application.
 928         λx.E x            \lam{E} is not a lambda abstraction.
 929                           \lam{x} is a variable that does not occur free in \lam{E}.
 930         \stoptrans
 931
 932         \startbuffer[from]
 933         foo = λa.case a of
 934           True -> λb.mul b b
 935           False -> id
 936         \stopbuffer
 937
 938         \startbuffer[to]
 939         foo = λa.λx.(case a of
 940             True -> λb.mul b b
 941             False -> λy.id y) x
 942         \stopbuffer
 943
 944         \transexample{η-abstraction}{from}{to}
 945
 946       \subsubsection{Application propagation}
 947         This transformation is meant to propagate application expressions downwards
 948         into expressions as far as possible. This allows partial applications inside
 949         expressions to become fully applied and exposes new transformation
 950         opportunities for other transformations (like β-reduction and
 951         specialization).
 952
 953         \starttrans
 954         (letrec binds in E) M
 955         ------------------------
 956         letrec binds in E M
 957         \stoptrans
 958
 959         % And an example
 960         \startbuffer[from]
 961         ( letrec
 962             val = 1
 963           in
 964             add val
 965         ) 3
 966         \stopbuffer
 967
 968         \startbuffer[to]
 969         letrec
 970           val = 1
 971         in
 972           add val 3
 973         \stopbuffer
 974
 975         \transexample{Application propagation for a let expression}{from}{to}
 976
 977         \starttrans
 978         (case x of
 979           p1 -> E1
 980           \vdots
 981           pn -> En) M
 982         -----------------
 983         case x of
 984           p1 -> E1 M
 985           \vdots
 986           pn -> En M
 987         \stoptrans
 988
 989         % And an example
 990         \startbuffer[from]
 991         ( case x of
 992             True -> id
 993             False -> neg
 994         ) 1
 995         \stopbuffer
 996
 997         \startbuffer[to]
 998         case x of
 999           True -> id 1
1000           False -> neg 1
1001         \stopbuffer
1002
1003         \transexample{Application propagation for a case expression}{from}{to}
1004
1005       \subsubsection{Let recursification}
1006         This transformation makes all non-recursive lets recursive. In the
1007         end, we want a single recursive let in our normalized program, so all
1008         non-recursive lets can be converted. This also makes other
1009         transformations simpler: They can simply assume all lets are
1010         recursive.
1011
1012         \starttrans
1013         let
1014           a = E
1015         in
1016           M
1017         ------------------------------------------
1018         letrec
1019           a = E
1020         in
1021           M
1022         \stoptrans
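
        As a small hand-written illustration (assuming \lam{add} and \lam{mul}
        are global functions), the transformation only changes the kind of
        let, not its bindings or its body:

        \startbuffer[from]
        let
          a = add x y
        in
          mul a a
        \stopbuffer

        \startbuffer[to]
        letrec
          a = add x y
        in
          mul a a
        \stopbuffer

        \transexample{Let recursification}{from}{to}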
1023
1024       \subsubsection{Let flattening}
1025         This transformation puts nested lets in the same scope, by lifting the
1026         binding(s) of the inner let into a new let around the outer let. Eventually,
1027         this will cause all let bindings to appear in the same scope (they will all be
1028         in scope for the function return value).
1029
1030         \starttrans
1031         letrec
1032           \vdots
1033           x = (letrec bindings in M)
1034           \vdots
1035         in
1036           N
1037         ------------------------------------------
1038         letrec
1039           \vdots
1040           bindings
1041           x = M
1042           \vdots
1043         in
1044           N
1045         \stoptrans
1046
1047         \startbuffer[from]
1048         letrec
1049           a = letrec
1050             x = 1
1051             y = 2
1052           in
1053             x + y
1054         in
1055           a
1056         \stopbuffer
1057         \startbuffer[to]
1058         letrec
1059           x = 1
1060           y = 2
1061           a = x + y
1062         in
1063           a
1064         \stopbuffer
1065
1066         \transexample{Let flattening}{from}{to}
1067
1068       \subsubsection{Return value simplification}
1069         This transformation ensures that the return value of a function is always a
1070         simple local variable reference.
1071
        This is currently implemented using lambda simplification, let
        simplification, and top simplification. It should change into something
        like the following, which works only on the result of a function instead
        of any subexpression. This is achieved by the contexts, like \lam{x = E},
        though this is strictly not correct: you could read this as \quote{if there
        is any function \lam{x} that binds \lam{E}, any \lam{E} can be transformed},
        while we only mean the \lam{E} that is bound by \lam{x}. This might need
        some extra notes.
1079
        Note that the return value is not simplified if it is not representable.
1081         Otherwise, this would cause a direct loop with the inlining of
1082         unrepresentable bindings, of course. If the return value is not
1083         representable because it has a function type, η-abstraction should
1084         make sure that this transformation will eventually apply. If the value
1085         is not representable for other reasons, the function result itself is
1086         not representable, meaning this function is not representable anyway!
1087
1088         \starttrans
1089         x = E                            \lam{E} is representable
1090         ~                                \lam{E} is not a lambda abstraction
1091         E                                \lam{E} is not a let expression
1092         ---------------------------      \lam{E} is not a local variable reference
1093         letrec x = E in x
1094         \stoptrans
1095
1096         \starttrans
1097         x = λv0 ... λvn.E
1098         ~                                \lam{E} is representable
1099         E                                \lam{E} is not a let expression
1100         ---------------------------      \lam{E} is not a local variable reference
1101         letrec x = E in x
1102         \stoptrans
1103
1104         \starttrans
1105         x = λv0 ... λvn.let ... in E
1106         ~                                \lam{E} is representable
1107         E                                \lam{E} is not a local variable reference
1108         ---------------------------
1109         letrec x = E in x
1110         \stoptrans
1111
1112         \startbuffer[from]
1113         x = add 1 2
1114         \stopbuffer
1115
1116         \startbuffer[to]
1117         x = letrec x = add 1 2 in x
1118         \stopbuffer
1119
1120         \transexample{Return value simplification}{from}{to}
1121
1122     \subsection{Argument simplification}
1123       The transforms in this section deal with simplifying application
1124       arguments into normal form. The goal here is to:
1125
1126       \startitemize
       \item Make all arguments of user-defined functions (\ie, functions of which
       we have a function body) simple variable references of a runtime
       representable type. This is needed, since these applications will be turned
       into component instantiations.
1131        \item Make all arguments of builtin functions one of:
1132          \startitemize
1133           \item A type argument.
1134           \item A dictionary argument.
1135           \item A type level expression.
1136           \item A variable reference of a runtime representable type.
1137           \item A variable reference or partial application of a function type.
1138          \stopitemize
1139       \stopitemize
1140
1141       When looking at the arguments of a user-defined function, we can
1142       divide them into two categories:
1143       \startitemize
1144         \item Arguments of a runtime representable type (\eg bits or vectors).
1145
1146               These arguments can be preserved in the program, since they can
1147               be translated to input ports later on.  However, since we can
1148               only connect signals to input ports, these arguments must be
1149               reduced to simple variables (for which signals will be
1150               produced). This is taken care of by the argument extraction
1151               transform.
1152         \item Non-runtime representable typed arguments.
1153
1154               These arguments cannot be preserved in the program, since we
1155               cannot represent them as input or output ports in the resulting
1156               \small{VHDL}. To remove them, we create a specialized version of the
1157               called function with these arguments filled in. This is done by
1158               the argument propagation transform.
1159
1160               Typically, these arguments are type and dictionary arguments that are
1161               used to make functions polymorphic. By propagating these arguments, we
              are essentially doing the same thing \small{GHC} does when it specializes
1163               functions: Creating multiple variants of the same function, one for
1164               each type for which it is used. Other common non-representable
1165               arguments are functions, e.g. when calling a higher order function
1166               with another function or a lambda abstraction as an argument.
1167
1168               The reason for doing this is similar to the reasoning provided for
1169               the inlining of non-representable let bindings above. In fact, this
1170               argument propagation could be viewed as a form of cross-function
1171               inlining.
1172       \stopitemize
1173
1174       TODO: Check the following itemization.
1175
1176       When looking at the arguments of a builtin function, we can divide them
1177       into categories:
1178
1179       \startitemize
1180         \item Arguments of a runtime representable type.
1181
1182               As we have seen with user-defined functions, these arguments can
1183               always be reduced to a simple variable reference, by the
1184               argument extraction transform. Performing this transform for
1185               builtin functions as well, means that the translation of builtin
1186               functions can be limited to signal references, instead of
1187               needing to support all possible expressions.
1188
1189         \item Arguments of a function type.
1190
1191               These arguments are functions passed to higher order builtins,
1192               like \lam{map} and \lam{foldl}. Since implementing these
1193               functions for arbitrary function-typed expressions (\eg, lambda
              expressions) is rather complex, we reduce these arguments to
1195               (partial applications of) global functions.
1196
1197               We can still support arbitrary expressions from the user code,
1198               by creating a new global function containing that expression.
1199               This way, we can simply replace the argument with a reference to
1200               that new function. However, since the expression can contain any
1201               number of free variables we also have to include partial
1202               applications in our normal form.
1203
1204               This category of arguments is handled by the function extraction
1205               transform.
1206         \item Other unrepresentable arguments.
1207
1208               These arguments can take a few different forms:
1209               \startdesc{Type arguments}
1210                 In the core language, type arguments can only take a single
1211                 form: A type wrapped in the Type constructor. Also, there is
1212                 nothing that can be done with type expressions, except for
1213                 applying functions to them, so we can simply leave type
1214                 arguments as they are.
1215               \stopdesc
1216               \startdesc{Dictionary arguments}
1217                 In the core language, dictionary arguments are used to find
1218                 operations operating on one of the type arguments (mostly for
                finding class methods). Since we will not actually evaluate
1220                 the function body for builtin functions and can generate
1221                 code for builtin functions by just looking at the type
1222                 arguments, these arguments can be ignored and left as they
1223                 are.
1224               \stopdesc
1225               \startdesc{Type level arguments}
1226                 Sometimes, we want to pass a value to a builtin function, but
1227                 we need to know the value at compile time. Additionally, the
1228                 value has an impact on the type of the function. This is
1229                 encoded using type-level values, where the actual value of the
1230                 argument is not important, but the type encodes some integer,
1231                 for example. Since the value is not important, the actual form
1232                 of the expression does not matter either and we can leave
1233                 these arguments as they are.
1234               \stopdesc
1235               \startdesc{Other arguments}
                Technically, there is still a wide array of arguments that can
                be passed, but do not fall into any of the above categories.
                However, none of the supported builtin functions requires such
                an argument. This leaves us with passing unsupported types to
                a function, such as calling \lam{head} on a list of functions.
1241
1242                 In these cases, it would be impossible to generate hardware
1243                 for such a function call anyway, so we can ignore these
1244                 arguments.
1245
1246                 The only way to generate hardware for builtin functions with
1247                 arguments like these, is to expand the function call into an
1248                 equivalent core expression (\eg, expand map into a series of
1249                 function applications). But for now, we choose to simply not
1250                 support expressions like these.
1251               \stopdesc
1252
1253               From the above, we can conclude that we can simply ignore these
1254               other unrepresentable arguments and focus on the first two
1255               categories instead.
1256       \stopitemize
1257
      \subsubsection{Argument extraction}
1259         This transform deals with arguments to functions that
1260         are of a runtime representable type. It ensures that they will all become
1261         references to global variables, or local signals in the resulting \small{VHDL}.
1262
1263         TODO: It seems we can map an expression to a port, not only a signal.
1264         Perhaps this makes this transformation not needed?
1265         TODO: Say something about dataconstructors (without arguments, like True
1266         or False), which are variable references of a runtime representable
1267         type, but do not result in a signal.
1268
1269         To reduce a complex expression to a simple variable reference, we create
1270         a new let expression around the application, which binds the complex
1271         expression to a new variable. The original function is then applied to
1272         this variable.
1273
1274         \starttrans
1275         M N
1276         --------------------    \lam{N} is of a representable type
1277         letrec x = N in M x     \lam{N} is not a local variable reference
1278         \stoptrans
1279
1280         \startbuffer[from]
1281         add (add a 1) 1
1282         \stopbuffer
1283
1284         \startbuffer[to]
1285         letrec x = add a 1 in add x 1
1286         \stopbuffer
1287
1288         \transexample{Argument extraction}{from}{to}
1289
1290       \subsubsection{Function extraction}
1291         This transform deals with function-typed arguments to builtin functions.
1292         Since these arguments cannot be propagated, we choose to extract them
1293         into a new global function instead.
1294
        Any free variables occurring in the extracted arguments will become
1296         parameters to the new global function. The original argument is replaced
1297         with a reference to the new function, applied to any free variables from
1298         the original argument.
1299
1300         This transformation is useful when applying higher order builtin functions
1301         like \hs{map} to a lambda abstraction, for example. In this case, the code
1302         that generates \small{VHDL} for \hs{map} only needs to handle top level functions and
1303         partial applications, not any other expression (such as lambda abstractions or
1304         even more complicated expressions).
1305
1306         \starttrans
        M N                     \lam{M} is a (partial application of a) builtin function.
1308         ---------------------   \lam{f0 ... fn} = free local variables of \lam{N}
1309         M (x f0 ... fn)         \lam{N :: a -> b}
1310         ~                       \lam{N} is not a (partial application of) a top level function
1311         x = λf0 ... λfn.N
1312         \stoptrans
1313
1314         \startbuffer[from]
1315         map (λa . add a b) xs
1316
1317         map (add b) ys
1318         \stopbuffer
1319
1320         \startbuffer[to]
1321         map (x0 b) xs
1322
        map (x1 b) ys
1324         ~
1325         x0 = λb.λa.add a b
1326         x1 = λb.add b
1327         \stopbuffer
1328
1329         \transexample{Function extraction}{from}{to}
1330
        Note that \lam{x0} and \lam{x1} will still need normalization after this.
1332
1333       \subsubsection{Argument propagation}
1334         This transform deals with arguments to user-defined functions that are
1335         not representable at runtime. This means these arguments cannot be
        preserved in the final form and must be {\em propagated}.
1337
1338         Propagation means to create a specialized version of the called
1339         function, with the propagated argument already filled in. As a simple
1340         example, in the following program:
1341
1342         \startlambda
1343         f = λa.λb.a + b
1344         inc = λa.f a 1
1345         \stoplambda
1346
1347         We could {\em propagate} the constant argument 1, with the following
1348         result:
1349
1350         \startlambda
1351         f' = λa.a + 1
1352         inc = λa.f' a
1353         \stoplambda
1354
1355         Special care must be taken when the to-be-propagated expression has any
1356         free variables. If this is the case, the original argument should not be
        removed altogether, but replaced by all the free variables of the
1358         expression. In this way, the original expression can still be evaluated
1359         inside the new function. Also, this brings us closer to our goal: All
1360         these free variables will be simple variable references.
1361
1362         To prevent us from propagating the same argument over and over, a simple
        local variable reference is not propagated (since it has exactly one
1364         free variable, itself, we would only replace that argument with itself).
1365
1366         This shows that any free local variables that are not runtime representable
1367         cannot be brought into normal form by this transform. We rely on an
1368         inlining transformation to replace such a variable with an expression we
1369         can propagate again.
1370
1371         \starttrans
1372         x = E
1373         ~
1374         x Y0 ... Yi ... Yn                               \lam{Yi} is not of a runtime representable type
1375         ---------------------------------------------    \lam{Yi} is not a local variable reference
        x' Y0 ... Yi-1 f0 ...  fm Yi+1 ... Yn            \lam{f0 ... fm} = free local vars of \lam{Yi}
1377         ~
1378         x' = λy0 ... yi-1 f0 ... fm yi+1 ... yn .
1379               E y0 ... yi-1 Yi yi+1 ... yn
1380
1381         \stoptrans
1382
1383         TODO: Example
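
        As a minimal sketch of propagation with free variables, consider the
        following program. The names \lam{twice}, \lam{g} and \lam{twice'} are
        hypothetical and only serve this illustration:

        \startlambda
        twice = λf.λa.f (f a)
        g = λb.λx.twice (add b) x
        \stoplambda

        The argument \lam{add b} has a function type, so it is not runtime
        representable, and it is not a simple local variable reference. Its only
        free local variable is \lam{b}, so the argument is replaced by \lam{b}
        and the original expression is moved into the specialized function:

        \startlambda
        twice' = λb.λy1.(λf.λa.f (f a)) (add b) y1
        g = λb.λx.twice' b x
        \stoplambda

        Assuming β-reduction is applied afterwards, the body of \lam{twice'}
        would simplify to \lam{add b (add b y1)}, which contains no
        function-typed arguments anymore.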
1384
1385     \subsection{Case simplification}
1386       \subsubsection{Scrutinee simplification}
1387         This transform ensures that the scrutinee of a case expression is always
1388         a simple variable reference.
1389
1390         \starttrans
1391         case E of
1392           alts
1393         -----------------        \lam{E} is not a local variable reference
1394         letrec x = E in
          case x of
1396             alts
1397         \stoptrans
1398
1399         \startbuffer[from]
1400         case (foo a) of
1401           True -> a
1402           False -> b
1403         \stopbuffer
1404
1405         \startbuffer[to]
1406         letrec x = foo a in
1407           case x of
1408             True -> a
1409             False -> b
1410         \stopbuffer
1411
        \transexample{Scrutinee simplification}{from}{to}
1413
1414
1415       \subsubsection{Case simplification}
        This transformation ensures that all case expressions are brought into
        normal form. This means each of them will become one of:
1418         \startitemize
1419         \item An extractor case with a single alternative that picks a single field
1420         from a datatype, \eg \lam{case x of (a, b) -> a}.
1421         \item A selector case with multiple alternatives and only wild binders, that
1422         makes a choice between expressions based on the constructor of another
1423         expression, \eg \lam{case x of Low -> a; High -> b}.
1424         \stopitemize
1425
1426         \starttrans
1427         case E of
1428           C0 v0,0 ... v0,m -> E0
1429           \vdots
1430           Cn vn,0 ... vn,m -> En
        --------------------------------------------------- \forall i \forall j, 0 <= i <= n, 0 <= j <= m (\lam{wi,j} is a wild (unused) binder)
1432         letrec
          v0,0 = case E of C0 v0,0 .. v0,m -> v0,0
          \vdots
          v0,m = case E of C0 v0,0 .. v0,m -> v0,m
          x0 = E0
          \dots
          vn,m = case E of Cn vn,0 .. vn,m -> vn,m
1439           xn = En
1440         in
1441           case E of
1442             C0 w0,0 ... w0,m -> x0
1443             \vdots
1444             Cn wn,0 ... wn,m -> xn
1445         \stoptrans
1446
1447         TODO: This transformation specified like this is complicated and misses
1448         conditions to prevent looping with itself. Perhaps we should split it here for
1449         discussion?
1450
1451         \startbuffer[from]
1452         case a of
1453           True -> add b 1
1454           False -> add b 2
1455         \stopbuffer
1456
1457         \startbuffer[to]
1458         letnonrec
1459           x0 = add b 1
1460           x1 = add b 2
1461         in
1462           case a of
1463             True -> x0
1464             False -> x1
1465         \stopbuffer
1466
1467         \transexample{Selector case simplification}{from}{to}
1468
1469         \startbuffer[from]
1470         case a of
1471           (,) b c -> add b c
1472         \stopbuffer
1473         \startbuffer[to]
1474         letrec
1475           b = case a of (,) b c -> b
1476           c = case a of (,) b c -> c
1477           x0 = add b c
1478         in
1479           case a of
1480             (,) w0 w1 -> x0
1481         \stopbuffer
1482
1483         \transexample{Extractor case simplification}{from}{to}
1484
1485       \subsubsection{Case removal}
        This transform removes any case expressions with a single alternative and
        only wild binders.

        These \quote{useless} case expressions are usually leftovers from case
        simplification of extractor cases (see the previous example).
1491
1492         \starttrans
1493         case x of
1494           C v0 ... vm -> E
1495         ----------------------     \lam{\forall i, 0 <= i <= m} (\lam{vi} does not occur free in E)
1496         E
1497         \stoptrans
1498
1499         \startbuffer[from]
1500         case a of
1501           (,) w0 w1 -> x0
1502         \stopbuffer
1503
1504         \startbuffer[to]
1505         x0
1506         \stopbuffer
1507
1508         \transexample{Case removal}{from}{to}
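
        Chaining this transformation after the extractor case simplification
        example above gives the following end result (a sketch that simply
        combines the two examples, assuming no other transformation applies in
        between):

        \startlambda
        letrec
          b = case a of (,) b c -> b
          c = case a of (,) b c -> c
          x0 = add b c
        in
          x0
        \stoplambda

        The remaining case expressions are extractor cases whose binders are
        used in their bodies, so case removal does not apply to them. Only the
        useless outer case expression is gone.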
1509
1510   \subsection{Removing polymorphism}
1511     Reference type-specialization (== argument propagation)
1512
    Reference polymorphic binding inlining (== non-representable binding
1514     inlining).
1515
1516   \subsection{Defunctionalization}
    These transformations remove higher-order expressions from our program,
    making it completely first-order, with one exception: arguments to builtin
    functions, since we cannot specialize builtin functions (TODO: Talk more
    about this somewhere).
1521
1522     Reference higher-order-specialization (== argument propagation)
1523
1524       \subsubsection{Non-representable binding inlining}
        This transform inlines let bindings that have a non-representable type. Since
        we can never generate a signal assignment for these bindings (we cannot
        declare a signal with a non-representable type), we have no choice but
        to inline the binding to remove it.
1529
        If the binding is non-representable because it is a lambda abstraction, it is
        likely that it will be inlined into an application, where β-reduction will
        remove the lambda abstraction and turn it into a representable expression at the
1533         inline site. The same holds for partial applications, which can be turned into
1534         full applications by inlining.
1535
1536         Other cases of non-representable bindings we see in practice are primitive
1537         Haskell types. In most cases, these will not result in a valid normalized
1538         output, but then the input would have been invalid to start with. There is one
1539         exception to this: When a builtin function is applied to a non-representable
1540         expression, things might work out in some cases. For example, when you write a
        literal \hs{SizedInt} in Haskell, like \hs{10 :: SizedInt D8}, this results in
1542         the following core: \lam{fromInteger (smallInteger 10)}, where for example
1543         \lam{10 :: GHC.Prim.Int\#} and \lam{smallInteger 10 :: Integer} have
1544         non-representable types. TODO: This/these paragraph(s) should probably become a
1545         separate discussion somewhere else.
1546
1547
1548         \starttrans
1549         letrec
1550           a0 = E0
1551           \vdots
1552           ai = Ei
1553           \vdots
1554           an = En
1555         in
1556           M
1557         --------------------------    \lam{Ei} has a non-representable type.
1558         letrec
1559           a0 = E0 [Ei/ai]
1560           \vdots
1561           ai-1 = Ei-1 [Ei/ai]
1562           ai+1 = Ei+1 [Ei/ai]
1563           \vdots
1564           an = En [Ei/ai]
1565         in
1566           M[Ei/ai]
1567         \stoptrans
1568
1569         \startbuffer[from]
1570         letrec
1571           a = smallInteger 10
1572           inc = λb -> add b 1
1573           inc' = add 1
1574           x = fromInteger a
1575         in
1576           inc (inc' x)
1577         \stopbuffer
1578
1579         \startbuffer[to]
1580         letrec
1581           x = fromInteger (smallInteger 10)
1582         in
1583           (λb -> add b 1) (add 1 x)
1584         \stopbuffer
1585
        \transexample{Non-representable binding inlining}{from}{to}
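
        To illustrate the β-reduction mentioned above, the body of the
        transformed program can be reduced one step further. This is a sketch
        assuming the usual β-reduction rule, not necessarily the exact order in
        which the normalizer applies its transformations:

        \startlambda
        (λb -> add b 1) (add 1 x)
        add (add 1 x) 1
        \stoplambda

        The first line is the body after inlining; the second line is the result
        of substituting \lam{add 1 x} for \lam{b}. No lambda abstraction or
        partial application remains in the body.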
1587
1588
1589   \section{Provable properties}
1590     When looking at the system of transformations outlined above, there are a
1591     number of questions that we can ask ourselves. The main question is of course:
1592     \quote{Does our system work as intended?}. We can split this question into a
1593     number of subquestions:
1594
1595     \startitemize[KR]
    \item[q:termination] Does our system \emph{terminate}? Since our system will
    keep running as long as transformations apply, there is an obvious risk that
    it will keep running indefinitely. For example, one transformation might
    produce a result that another transformation turns back into the original.
1600     \item[q:soundness] Is our system \emph{sound}? Since our transformations
1601     continuously modify the expression, there is an obvious risk that the final
1602     normal form will not be equivalent to the original program: Its meaning could
1603     have changed.
1604     \item[q:completeness] Is our system \emph{complete}? Since we have a complex
1605     system of transformations, there is an obvious risk that some expressions will
1606     not end up in our intended normal form, because we forgot some transformation.
1607     In other words: Does our transformation system result in our intended normal
1608     form for all possible inputs?
1609     \item[q:determinism] Is our system \emph{deterministic}? Since we have defined
    no particular order in which the transformations should be applied, there is an
1611     obvious risk that different transformation orderings will result in
1612     \emph{different} normal forms. They might still both be intended normal forms
1613     (if our system is \emph{complete}) and describe correct hardware (if our
1614     system is \emph{sound}), so this property is less important than the previous
1615     three: The translator would still function properly without it.
1616     \stopitemize
1617
1618     \subsection{Graph representation}
1619       Before looking into how to prove these properties, we'll look at our
1620       transformation system from a graph perspective. The nodes of the graph are
1621       all possible Core expressions. The (directed) edges of the graph are
1622       transformations. When a transformation α applies to an expression \lam{A} to
1623       produce an expression \lam{B}, we add an edge from the node for \lam{A} to the
1624       node for \lam{B}, labeled α.
1625
1626       \startuseMPgraphic{TransformGraph}
1627         save a, b, c, d;
1628
1629         % Nodes
1630         newCircle.a(btex \lam{(λx.λy. (+) x y) 1} etex);
1631         newCircle.b(btex \lam{λy. (+) 1 y} etex);
1632         newCircle.c(btex \lam{(λx.(+) x) 1} etex);
1633         newCircle.d(btex \lam{(+) 1} etex);
1634
1635         b.c = origin;
1636         c.c = b.c + (4cm, 0cm);
1637         a.c = midpoint(b.c, c.c) + (0cm, 4cm);
1638         d.c = midpoint(b.c, c.c) - (0cm, 3cm);
1639
1640         % β-conversion between a and b
1641         ncarc.a(a)(b) "name(bred)";
1642         ObjLabel.a(btex $\xrightarrow[normal]{}{β}$ etex) "labpathname(bred)", "labdir(rt)";
1643         ncarc.b(b)(a) "name(bexp)", "linestyle(dashed withdots)";
1644         ObjLabel.b(btex $\xleftarrow[normal]{}{β}$ etex) "labpathname(bexp)", "labdir(lft)";
1645
1646         % η-conversion between a and c
1647         ncarc.a(a)(c) "name(ered)";
1648         ObjLabel.a(btex $\xrightarrow[normal]{}{η}$ etex) "labpathname(ered)", "labdir(rt)";
1649         ncarc.c(c)(a) "name(eexp)", "linestyle(dashed withdots)";
1650         ObjLabel.c(btex $\xleftarrow[normal]{}{η}$ etex) "labpathname(eexp)", "labdir(lft)";
1651
1652         % η-conversion between b and d
1653         ncarc.b(b)(d) "name(ered)";
1654         ObjLabel.b(btex $\xrightarrow[normal]{}{η}$ etex) "labpathname(ered)", "labdir(rt)";
1655         ncarc.d(d)(b) "name(eexp)", "linestyle(dashed withdots)";
1656         ObjLabel.d(btex $\xleftarrow[normal]{}{η}$ etex) "labpathname(eexp)", "labdir(lft)";
1657
1658         % β-conversion between c and d
1659         ncarc.c(c)(d) "name(bred)";
1660         ObjLabel.c(btex $\xrightarrow[normal]{}{β}$ etex) "labpathname(bred)", "labdir(rt)";
1661         ncarc.d(d)(c) "name(bexp)", "linestyle(dashed withdots)";
1662         ObjLabel.d(btex $\xleftarrow[normal]{}{β}$ etex) "labpathname(bexp)", "labdir(lft)";
1663
1664         % Draw objects and lines
1665         drawObj(a, b, c, d);
1666       \stopuseMPgraphic
1667
      \placeexample[right][ex:TransformGraph]{Partial graph of a lambda calculus
      system with β and η reduction (solid lines) and expansion (dotted lines).}
1670           \boxedgraphic{TransformGraph}
1671
      Of course our graph is unbounded, since we can construct an infinite number of
      Core expressions. Also, there might potentially be multiple edges between two
      given nodes (with different labels), though this seems unlikely to actually
      happen in our system.
1676
      See \in{example}[ex:TransformGraph] for the graph representation of a very
      simple lambda calculus that contains just the expressions \lam{(λx.λy. (+) x
      y) 1}, \lam{λy. (+) 1 y}, \lam{(λx.(+) x) 1} and \lam{(+) 1}. The
      transformation system consists of either β-reduction and η-reduction (solid
      edges) or β-expansion and η-expansion (dotted edges).
1682
1683       TODO: Define β-reduction and η-reduction?
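
      As a reminder, the usual textbook formulations of these two rules are
      sketched below, in the notation of the transformations above. These are
      the standard definitions from lambda calculus, not necessarily the exact
      rules used by our normalizer:

      \starttrans
      (λx.E) M
      -----------------
      E[M/x]
      \stoptrans

      \starttrans
      λx.E x
      -----------------     \lam{x} does not occur free in \lam{E}
      E
      \stoptrans

      The first rule is β-reduction, the second is η-reduction; the
      corresponding expansions apply the same rules from bottom to top. In
      \in{example}[ex:TransformGraph], β-reduction turns \lam{(λx.λy. (+) x y) 1}
      into \lam{λy. (+) 1 y}, while η-reduction turns its inner abstraction
      \lam{λy. (+) x y} into \lam{(+) x}, giving \lam{(λx.(+) x) 1}.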
1684
1685       Note that the normal form of such a system consists of the set of nodes
      (expressions) without outgoing edges, since those are the expressions to which
1687       no transformation applies anymore. We call this set of nodes the \emph{normal
1688       set}.
1689
1690       From such a graph, we can derive some properties easily:
1691       \startitemize[KR]
1692         \item A system will \emph{terminate} if there is no path of infinite length
1693         in the graph (this includes cycles).
1694         \item Soundness is not easily represented in the graph.
1695         \item A system is \emph{complete} if all of the nodes in the normal set have
1696         the intended normal form. The inverse (that all of the nodes outside of
1697         the normal set are \emph{not} in the intended normal form) is not
1698         strictly required.
1699         \item A system is deterministic if all paths from a node, which end in a node
1700         in the normal set, end at the same node.
1701       \stopitemize
1702
1703       When looking at the \in{example}[ex:TransformGraph], we see that the system
1704       terminates for both the reduction and expansion systems (but note that, for
1705       expansion, this is only true because we've limited the possible expressions!
      In complete lambda calculus, there would be a path from \lam{(λx.λy. (+) x y)
1707       1} to \lam{(λx.λy.(λz.(+) z) x y) 1} to \lam{(λx.λy.(λz.(λq.(+) q) z) x y) 1}
1708       etc.)
1709
      If we considered the system with both expansion and reduction, it would
      no longer terminate, since every edge would be paired with an edge in the
      opposite direction, creating cycles.
1712
      The reduction and expansion systems have a normal set containing just
      \lam{(+) 1} or \lam{(λx.λy. (+) x y) 1} respectively. Since all paths in
      either system end up in these normal forms, both systems are \emph{complete}.
      Also, since each system has only one normal form, both must be
      \emph{deterministic} as well.
1718
1719     \subsection{Termination}
1720       Approach: Counting.
1721
1722       Church-Rosser?
1723
1724     \subsection{Soundness}
1725       Needs formal definition of semantics.
      Prove for each transformation separately; this implies soundness of the system.
1727
1728     \subsection{Completeness}
      Show that some transformation applies to every Core expression that is not
      in normal form. To prove: no transformation applies => in intended form.
      Show the contrapositive: not in intended form => some transformation applies.
1732
1733     \subsection{Determinism}
1734       How to prove this?