Normalization.tex

   1 \chapter[chap:normalization]{Normalization}
   2   % A helper to print a single example in half the page width. The example
   3   % text should be in a buffer whose name is given in an argument.
   4   %
   5   % The align=right option really does left-alignment, but without it the program
   6   % will end up on a single line. The strut=no option prevents a bunch of empty
   7   % space at the start of the frame.
   8   \define[1]\example{
   9     \framed[offset=1mm,align=right,strut=no,background=box,frame=off]{
  10       \setuptyping[option=LAM,style=sans,before=,after=,strip=auto]
  11       \typebuffer[#1]
  12       \setuptyping[option=none,style=\tttf,strip=auto]
  13     }
  14   }
  15
  16   \define[3]\transexample{
  17     \placeexample[here]{#1}
  18     \startcombination[2*1]
  19       {\example{#2}}{Original program}
  20       {\example{#3}}{Transformed program}
  21     \stopcombination
  22   }
  23
  24   The first step in the core to \small{VHDL} translation process is normalization. We
  25   aim to bring the core description into a simpler form, which we can
  26   subsequently translate into \small{VHDL} easily. This normal form is needed because
  27   the full core language is more expressive than \small{VHDL} in some areas and because
  28   core can describe expressions that do not have a direct hardware
  29   interpretation.
  30
  31   \todo{Describe core properties not supported in \VHDL, and describe how the
  32   \VHDL we want to generate should look like.}
  33
  34   \section{Normal form}
  35     The transformations described here have a well-defined goal: to bring the
  36     program into a well-defined form that is directly translatable to hardware,
  37     while fully preserving the semantics of the program. We refer to this form as
  38     the \emph{normal form} of the program. The formal definition of this normal
  39     form is quite simple:
  40
  41     \placedefinition{}{A program is in \emph{normal form} if none of the
  42     transformations from this chapter apply.}
  43
  44     Of course, this is an \quote{easy} definition of the normal form, since our
  45     program will end up in normal form automatically. The more interesting part is
  46     to see if this normal form actually has the properties we would like it to
  47     have.
  48
  49     But, before getting into more definitions and details about this normal form,
  50     let's try to get a feeling for it first. The easiest way to do this is by
  51     describing the things we do not want to have in a normal form.
  52
  53     \startitemize
  54       \item Any \emph{polymorphism} must be removed. When laying down hardware, we
  55       can't generate any signals that can have multiple types. All types must be
  56       completely known to generate hardware.
  57
  58       \item Any \emph{higher order} constructions must be removed. We can't
  59       generate a hardware signal that contains a function, so all values,
  60       arguments and return values used must be first order.
  61
  62       \item Any complex \emph{nested scopes} must be removed. In the \small{VHDL}
  63       description, every signal is in a single scope. Also, full expressions are
  64       not supported everywhere (in particular port maps can only map signal names,
  65       not expressions). To make the \small{VHDL} generation easy, all values must be bound
  66       on the \quote{top level}.
  67     \stopitemize
  68
  69     \todo{Intermezzo: functions vs plain values}
  70
  71     A very simple example of a program in normal form is given in
  72     \in{example}[ex:MulSum]. As you can see, all arguments to the function (which
  73     will become input ports in the final hardware) are at the top. This means that
  74     the body of the final lambda abstraction is never a function, but always a
  75     plain value.
  76
  77     After the lambda abstractions, we see a single let expression, that binds two
  78     variables (\lam{mul} and \lam{sum}). These variables will be signals in the
  79     final hardware, bound to the output port of the \lam{*} and \lam{+}
  80     components.
  81
  82     The final line (the \quote{return value} of the function) selects the
  83     \lam{sum} signal to be the output port of the function. This \quote{return
  84     value} can only ever be a variable reference, never a more complex
  85     expression.
  86
  87     \startbuffer[MulSum]
  88     alu :: Word -> Word -> Word -> Word
  89     alu = λa.λb.λc.
  90         let
  91           mul = (*) a b
  92           sum = (+) mul c
  93         in
  94           sum
  95     \stopbuffer
  96
  97     \startuseMPgraphic{MulSum}
  98       save a, b, c, mul, add, sum;
  99
 100       % I/O ports
 101       newCircle.a(btex $a$ etex) "framed(false)";
 102       newCircle.b(btex $b$ etex) "framed(false)";
 103       newCircle.c(btex $c$ etex) "framed(false)";
 104       newCircle.sum(btex $res$ etex) "framed(false)";
 105
 106       % Components
 107       newCircle.mul(btex * etex);
 108       newCircle.add(btex + etex);
 109
 110       a.c      - b.c   = (0cm, 2cm);
 111       b.c      - c.c   = (0cm, 2cm);
 112       add.c            = c.c + (2cm, 0cm);
 113       mul.c            = midpoint(a.c, b.c) + (2cm, 0cm);
 114       sum.c            = add.c + (2cm, 0cm);
 115       c.c              = origin;
 116
 117       % Draw objects and lines
 118       drawObj(a, b, c, mul, add, sum);
 119
 120       ncarc(a)(mul) "arcangle(15)";
 121       ncarc(b)(mul) "arcangle(-15)";
 122       ncline(c)(add);
 123       ncline(mul)(add);
 124       ncline(add)(sum);
 125     \stopuseMPgraphic
 126
 127     \placeexample[here][ex:MulSum]{Simple architecture consisting of a multiplier and an
 128     adder.}
 129       \startcombination[2*1]
 130         {\typebufferlam{MulSum}}{Core description in normal form.}
 131         {\boxedgraphic{MulSum}}{The architecture described by the normal form.}
 132       \stopcombination
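
    For readers more familiar with Haskell than with Core, the same shape can be
    sketched in plain Haskell. This is only an illustration: the name
    \type{mulSum} and the use of \type{Word32} as a stand-in for \lam{Word} are
    assumptions made here, not part of the prototype.

```haskell
import Data.Word (Word32)

-- Normal-form shape, written in Haskell: all arguments are bound up front,
-- every intermediate value gets its own let binding, and the result is a
-- plain variable reference. Word32 is an assumed stand-in for Word.
mulSum :: Word32 -> Word32 -> Word32 -> Word32
mulSum a b c =
  let mul = (*) a b
      res = (+) mul c
  in res
```

    Each let binding corresponds to one signal in the generated hardware, and
    the final variable selects which signal drives the output port.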
 133
 134     The previous example described composing an architecture by calling other
 135     functions (operators), resulting in a simple architecture with components and
 136     connections. There is of course also some mechanism for choice in the normal
 137     form. In a normal Core program, the \emph{case} expression can be used in a
 138     few different ways to describe choice. In normal form, this is limited to a
 139     very specific form.
 140
 141     \in{Example}[ex:AddSubAlu] shows an example describing a
 142     simple \small{ALU}, which chooses between two operations based on an opcode
 143     bit. The main structure is the same as in \in{example}[ex:MulSum], but this
 144     time the \lam{res} variable is bound to a case expression. This case
 145     expression scrutinizes the variable \lam{opcode} (and scrutinizing more
 146     complex expressions is not supported). The case expression can select a
 147     different variable based on the constructor of \lam{opcode}.
 148
 149     \startbuffer[AddSubAlu]
 150     alu :: Bit -> Word -> Word -> Word
 151     alu = λopcode.λa.λb.
 152         let
 153           res1 = (+) a b
 154           res2 = (-) a b
 155           res = case opcode of
 156             Low -> res1
 157             High -> res2
 158         in
 159           res
 160     \stopbuffer
 161
 162     \startuseMPgraphic{AddSubAlu}
 163       save opcode, a, b, add, sub, mux, res;
 164
 165       % I/O ports
 166       newCircle.opcode(btex $opcode$ etex) "framed(false)";
 167       newCircle.a(btex $a$ etex) "framed(false)";
 168       newCircle.b(btex $b$ etex) "framed(false)";
 169       newCircle.res(btex $res$ etex) "framed(false)";
 170       % Components
 171       newCircle.add(btex + etex);
 172       newCircle.sub(btex - etex);
 173       newMux.mux;
 174
 175       opcode.c - a.c   = (0cm, 2cm);
 176       add.c    - a.c   = (4cm, 0cm);
 177       sub.c    - b.c   = (4cm, 0cm);
 178       a.c      - b.c   = (0cm, 3cm);
 179       mux.c            = midpoint(add.c, sub.c) + (1.5cm, 0cm);
 180       res.c    - mux.c = (1.5cm, 0cm);
 181       b.c              = origin;
 182
 183       % Draw objects and lines
 184       drawObj(opcode, a, b, res, add, sub, mux);
 185
 186       ncline(a)(add) "posA(e)";
 187       ncline(b)(sub) "posA(e)";
 188       nccurve(a)(sub) "posA(e)", "angleA(0)";
 189       nccurve(b)(add) "posA(e)", "angleA(0)";
 190       nccurve(add)(mux) "posB(inpa)", "angleB(0)";
 191       nccurve(sub)(mux) "posB(inpb)", "angleB(0)";
 192       nccurve(opcode)(mux) "posB(n)", "angleA(0)", "angleB(-90)";
 193       ncline(mux)(res) "posA(out)";
 194     \stopuseMPgraphic
 195
 196     \placeexample[here][ex:AddSubAlu]{Simple \small{ALU} supporting two operations.}
 197       \startcombination[2*1]
 198         {\typebufferlam{AddSubAlu}}{Core description in normal form.}
 199         {\boxedgraphic{AddSubAlu}}{The architecture described by the normal form.}
 200       \stopcombination
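
    The same \small{ALU} can be written as ordinary Haskell that already has the
    normal-form shape. This is a sketch under assumptions: the \type{Bit} type
    below and the use of \type{Word32} for \lam{Word} are hypothetical stand-ins
    chosen for this illustration.

```haskell
import Data.Word (Word32)

-- Hypothetical stand-in for the Bit type used in the example.
data Bit = Low | High
  deriving (Eq, Show)

-- Both results are computed unconditionally (two components in hardware);
-- the case expression only selects between variables (a multiplexer).
alu :: Bit -> Word32 -> Word32 -> Word32
alu opcode a b =
  let res1 = (+) a b
      res2 = (-) a b
      res  = case opcode of
               Low  -> res1
               High -> res2
  in res
```

    Note that the case expression scrutinizes a variable and each alternative
    yields a variable, which is exactly the restricted form of choice described
    above.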
 201
 202     As a more complete example, consider \in{example}[ex:NormalComplete]. This
 203     example contains everything that is supported in normal form, with the
 204     exception of builtin higher order functions. The graphical version of the
 205     architecture contains a slightly simplified version, since the state tuple
 206     packing and unpacking have been left out. Instead, two separate registers are
 207     drawn. Also note that most synthesis tools will further optimize this
 208     architecture by removing the multiplexers at the register input and replacing
 209     them with some logic in the clock inputs, but we want to show the architecture
 210     as close to the description as possible.
 211
 212     \startbuffer[NormalComplete]
 213       regbank :: Bit
 214                  -> Word
 215                  -> State (Word, Word)
 216                  -> (State (Word, Word), Word)
 217
 218       -- All arguments are an initial lambda
 219       regbank = λa.λd.λsp.
 220       -- There are no nested let expressions, just a single let at top level
 221       let
 222         -- Unpack the state by coercion (\eg, cast from
 223         -- State (Word, Word) to (Word, Word))
 224         s = sp :: (Word, Word)
 225         -- Extract both registers from the state
 226         r1 = case s of (fst, snd) -> fst
 227         r2 = case s of (fst, snd) -> snd
 228         -- Calling some other user-defined function.
 229         d' = foo d
 230         -- Conditional connections
 231         out = case a of
 232           High -> r1
 233           Low -> r2
 234         r1' = case a of
 235           High -> d'
 236           Low -> r1
 237         r2' = case a of
 238           High -> r2
 239           Low -> d'
 240         -- Packing a tuple
 241         s' = (,) r1' r2'
 242         -- pack the state by coercion (\eg, cast from
 243         -- (Word, Word) to State (Word, Word))
 244         sp' = s' :: State (Word, Word)
 245         -- Pack our return value
 246         res = (,) sp' out
 247       in
 248         -- The actual result
 249         res
 250     \stopbuffer
 251
 252     \startuseMPgraphic{NormalComplete}
 253       save a, d, r, foo, muxr, muxout, out;
 254
 255       % I/O ports
 256       newCircle.a(btex \lam{a} etex) "framed(false)";
 257       newCircle.d(btex \lam{d} etex) "framed(false)";
 258       newCircle.out(btex \lam{out} etex) "framed(false)";
 259       % Components
 260       %newCircle.add(btex + etex);
 261       newBox.foo(btex \lam{foo} etex);
 262       newReg.r1(btex $\lam{r1}$ etex) "dx(4mm)", "dy(6mm)";
 263       newReg.r2(btex $\lam{r2}$ etex) "dx(4mm)", "dy(6mm)", "reflect(true)";
 264       newMux.muxr1;
 265       % Reflect over the vertical axis
 266       reflectObj(muxr1)((0,0), (0,1));
 267       newMux.muxr2;
 268       newMux.muxout;
 269       rotateObj(muxout)(-90);
 270
 271       d.c               = foo.c + (0cm, 1.5cm);
 272       a.c               = (xpart r2.c + 2cm, ypart d.c - 0.5cm);
 273       foo.c             = midpoint(muxr1.c, muxr2.c) + (0cm, 2cm);
 274       muxr1.c           = r1.c + (0cm, 2cm);
 275       muxr2.c           = r2.c + (0cm, 2cm);
 276       r2.c              = r1.c + (4cm, 0cm);
 277       r1.c              = origin;
 278       muxout.c          = midpoint(r1.c, r2.c) - (0cm, 2cm);
 279       out.c             = muxout.c - (0cm, 1.5cm);
 280
 281     %  % Draw objects and lines
 282       drawObj(a, d, foo, r1, r2, muxr1, muxr2, muxout, out);
 283
 284       ncline(d)(foo);
 285       nccurve(foo)(muxr1) "angleA(-90)", "posB(inpa)", "angleB(180)";
 286       nccurve(foo)(muxr2) "angleA(-90)", "posB(inpb)", "angleB(0)";
 287       nccurve(muxr1)(r1) "posA(out)", "angleA(180)", "posB(d)", "angleB(0)";
 288       nccurve(r1)(muxr1) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(180)";
 289       nccurve(muxr2)(r2) "posA(out)", "angleA(0)", "posB(d)", "angleB(180)";
 290       nccurve(r2)(muxr2) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(0)";
 291       nccurve(r1)(muxout) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(-90)";
 292       nccurve(r2)(muxout) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(-90)";
 293       % Connect port a
 294       nccurve(a)(muxout) "angleA(-90)", "angleB(180)", "posB(sel)";
 295       nccurve(a)(muxr1) "angleA(180)", "angleB(-90)", "posB(sel)";
 296       nccurve(a)(muxr2) "angleA(180)", "angleB(-90)", "posB(sel)";
 297       ncline(muxout)(out) "posA(out)";
 298     \stopuseMPgraphic
 299
 300     \placeexample[here][ex:NormalComplete]{Simple architecture consisting of two
 301     registers and the multiplexers that select between them.}
 302       \startcombination[2*1]
 303         {\typebufferlam{NormalComplete}}{Core description in normal form.}
 304         {\boxedgraphic{NormalComplete}}{The architecture described by the normal form.}
 305       \stopcombination
 306
 307     \subsection{Intended normal form definition}
 308       Now that we have some intuition for the normal form, we can describe what we
 309       want the normal form to look like in a slightly more formal manner. The following
 310       EBNF-like description completely captures the intended structure (and
 311       generates a subset of GHC's core format).
 312
 313       Some clauses have an expression listed in parentheses. These are conditions
 314       that need to apply to the clause.
 315
 316       \startlambda
 317       \italic{normal} = \italic{lambda}
 318       \italic{lambda} = λvar.\italic{lambda} (representable(var))
 319                       | \italic{toplet}
 320       \italic{toplet} = letrec [\italic{binding}...] in var (representable(var))
 321       \italic{binding} = var = \italic{rhs} (representable(rhs))
 322                        -- State packing and unpacking by coercion
 323                        | var0 = var1 :: State ty (lvar(var1))
 324                        | var0 = var1 :: ty (var0 :: State ty) (lvar(var1))
 325       \italic{rhs} = userapp
 326                    | builtinapp
 327                    -- Extractor case
 328                    | case var of C a0 ... an -> ai (lvar(var))
 329                    -- Selector case
 330                    | case var of (lvar(var))
 331                       DEFAULT -> var0 (lvar(var0))
 332                       C w0 ... wn -> resvar (\forall{}i, wi \neq resvar, lvar(resvar))
 333       \italic{userapp} = \italic{userfunc}
 334                        | \italic{userapp} \italic{userarg}
 335       \italic{userfunc} = var (gvar(var))
 336       \italic{userarg} = var (lvar(var))
 337       \italic{builtinapp} = \italic{builtinfunc}
 338                           | \italic{builtinapp} \italic{builtinarg}
 339       \italic{builtinfunc} = var (bvar(var))
 340       \italic{builtinarg} = \italic{coreexpr}
 341       \stoplambda
 342
 343       \todo{Limit builtinarg further}
 344
 345       \todo{There can still be other casts around (which the code can handle,
 346       e.g., ignore), which still need to be documented here}
 347
 348       \todo{Note about the selector case. It just supports Bit and Bool
 349       currently, perhaps it should be generalized in the normal form? This is
 350       no longer true, btw}
 351
 352       When looking at such a program from a hardware perspective, the top level
 353       lambdas define the input ports. The value produced by the let expression is
 354       the output port. Most function applications bound by the let expression
 355       define a component instantiation, where the input and output ports are mapped
 356       to local signals or arguments. Some of the others use a builtin
 357       construction (\eg the \lam{case} statement) or call a builtin function
 358       (\eg \lam{add} or \lam{sub}). For these, a hardcoded \small{VHDL} translation is
 359       available.
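
      To make the structure of the grammar more tangible, the following is a
      minimal sketch of a checker for a heavily simplified variant of it. The
      \type{Expr} type and the \type{isNormal} function are hypothetical and
      exist only for this illustration: types, casts and the
      \emph{representable} conditions are left out, builtin functions are
      treated like global functions, and the extractor and selector case are
      merged into a single check.

```haskell
-- Hypothetical, untyped miniature of Core, for illustration only.
data Expr
  = Var String
  | Lam String Expr
  | App Expr Expr
  | LetRec [(String, Expr)] Expr
  | Case Expr [(String, [String], Expr)] -- (constructor, field binders, result)
  deriving (Eq, Show)

-- isNormal globals e: does e have the shape
--   lambda ::= \var. lambda | letrec bindings in var
-- where every right-hand side is an application of a global function to
-- local variables, or a case on a local variable that returns a variable?
isNormal :: [String] -> Expr -> Bool
isNormal globals = lambda []
  where
    lambda locals (Lam v body) = lambda (v : locals) body
    lambda locals (LetRec binds (Var r)) =
      let locals' = map fst binds ++ locals
      in r `elem` locals' && all (rhs locals' . snd) binds
    lambda _ _ = False

    -- Simplification: an alternative may return a field or any local.
    rhs locals (Case (Var s) alts) = s `elem` locals && all alt alts
      where
        alt (_, fields, Var r) = r `elem` (fields ++ locals)
        alt _ = False
    rhs locals e = userApp locals e

    userApp _ (Var f) = f `elem` globals
    userApp locals (App f (Var x)) = x `elem` locals && userApp locals f
    userApp _ _ = False

-- The MulSum example, encoded in this miniature.
mulSumCore :: Expr
mulSumCore =
  Lam "a" (Lam "b" (Lam "c" (LetRec
    [ ("mul", App (App (Var "*") (Var "a")) (Var "b"))
    , ("sum", App (App (Var "+") (Var "mul")) (Var "c")) ]
    (Var "sum"))))
```

      With \type{["*", "+"]} as the list of global names, \type{isNormal}
      accepts \type{mulSumCore}, while a body that applies \lam{+} directly
      without a surrounding let is rejected.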
 360
 361   \section{Transformation notation}
 362     To be able to concisely present transformations, we use a specific format for
 363     them. It is a simple format, similar to one used in logic reasoning.
 364
 365     Such a transformation description looks like the following.
 366
 367     \starttrans
 368     <context conditions>
 369     ~
 370     <original expression>
 371     --------------------------          <expression conditions>
 372     <transformed expression>
 373     ~
 374     <context additions>
 375     \stoptrans
 376
 377     This format describes a transformation that applies to \lam{original
 378     expression} and transforms it into \lam{transformed expression}, assuming
 379     that all conditions apply. In this format, there are a number of placeholders
 380     in pointy brackets, most of which should be rather obvious in their meaning.
 381     Nevertheless, we will more precisely specify their meaning below:
 382
 383       \startdesc{<original expression>} The expression pattern that will be matched
 384       against (subexpressions of) the expression to be transformed. We call this a
 385       pattern, because it can contain \emph{placeholders} (variables), which match
 386       any expression or binder. Any such placeholder is said to be \emph{bound} to
 387       the expression it matches. It is convention to use an uppercase letter (\eg
 388       \lam{M} or \lam{E}) to refer to any expression (including a simple variable
 389       reference) and lowercase letters (\eg \lam{v} or \lam{b}) to refer to
 390       (references to) binders.
 391
 392       For example, the pattern \lam{a + B} will match the expression
 393       \lam{v + (2 * w)} (and bind \lam{a} to \lam{v} and \lam{B} to
 394       \lam{(2 * w)}), but not \lam{(2 * w) + v}.
 395       \stopdesc
 396
 397       \startdesc{<expression conditions>}
 398       These are extra conditions on the expression that is matched. These
 399       conditions can be used to further limit the cases in which the
 400       transformation applies, in particular to prevent a transformation from
 401       causing a loop with itself or another transformation.
 402
 403       Only if these conditions are \emph{all} true does this transformation
 404       apply.
 405       \stopdesc
 406
 407       \startdesc{<context conditions>}
 408       These are a number of extra conditions on the context of the function. In
 409       particular, these conditions can require some other top level function to be
 410       present, whose value matches the pattern given here. The format of each of
 411       these conditions is: \lam{binder = <pattern>}.
 412
 413       Typically, the binder is some placeholder bound in the \lam{<original
 414       expression>}, while the pattern contains some placeholders that are used in
 415       the \lam{transformed expression}.
 416
 417       Only if a top level binder exists that matches each binder and pattern, this
 418       transformation applies.
 419       \stopdesc
 420
 421       \startdesc{<transformed expression>}
 422       This is the expression template that is the result of the transformation. If, looking
 423       at the above three items, the transformation applies, the \lam{original
 424       expression} is completely replaced with the \lam{<transformed expression>}.
 425       We call this a template, because it can contain placeholders, referring to
 426       any placeholder bound by the \lam{<original expression>} or the
 427       \lam{<context conditions>}. The resulting expression will have those
 428       placeholders replaced by the values bound to them.
 429
 430       Any binder (lowercase) placeholder that has no value bound to it yet will be
 431       bound to (and replaced with) a fresh binder.
 432       \stopdesc
 433
 434       \startdesc{<context additions>}
 435       These are templates for new functions to add to the context. This is a way
 436       to have a transformation create new top level functions.
 437
 438       Each addition has the form \lam{binder = template}. As above, any
 439       placeholder in the addition is replaced with the value bound to it, and any
 440       binder placeholder that has no value bound to it yet will be bound to (and
 441       replaced with) a fresh binder.
 442       \stopdesc
 443
 444     As an example, we'll look at η-abstraction:
 445
 446     \starttrans
 447     E                 \lam{E :: a -> b}
 448     --------------    \lam{E} does not occur on a function position in an application
 449     λx.E x            \lam{E} is not a lambda abstraction.
 450     \stoptrans
 451
 452     Consider the following function, which is a fairly obvious way to specify a
 453     simple ALU (note that \at{example}[ex:AddSubAlu] is the normal form of this
 454     function):
 455
 456     \startlambda
 457     alu :: Bit -> Word -> Word -> Word
 458     alu = λopcode. case opcode of
 459       Low -> (+)
 460       High -> (-)
 461     \stoplambda
 462
 463     There are a few subexpressions in this function to which we could possibly
 464     apply the transformation. Since the pattern of the transformation is only
 465     the placeholder \lam{E}, any expression will match that. Whether the
 466     transformation applies to an expression is thus solely decided by the
 467     conditions to the right of the transformation.
 468
 469     We will look at each expression in the function in a top down manner. The
 470     first expression is the entire expression the function is bound to.
 471
 472     \startlambda
 473     λopcode. case opcode of
 474       Low -> (+)
 475       High -> (-)
 476     \stoplambda
 477
 478     As said, the expression pattern matches this. The type of this expression is
 479     \lam{Bit -> Word -> Word -> Word}, which matches \lam{a -> b} (Note that in
 480     this case \lam{a = Bit} and \lam{b = Word -> Word -> Word}).
 481
 482     Since this expression is at top level, it does not occur at a function
 483     position of an application. However, the expression is a lambda abstraction,
 484     so this transformation does not apply.
 485
 486     The next expression we could apply this transformation to, is the body of
 487     the lambda abstraction:
 488
 489     \startlambda
 490     case opcode of
 491       Low -> (+)
 492       High -> (-)
 493     \stoplambda
 494
 495     The type of this expression is \lam{Word -> Word -> Word}, which again
 496     matches \lam{a -> b}. The expression is the body of a lambda expression, so
 497     it does not occur at a function position of an application. Finally, the
 498     expression is not a lambda abstraction but a case expression, so all the
 499     conditions match. There are no context conditions to match, so the
 500     transformation applies.
 501
 502     By now, the placeholder \lam{E} is bound to the entire expression. The
 503     placeholder \lam{x}, which occurs in the replacement template, is not bound
 504     yet, so we need to generate a fresh binder for that. Let's use the binder
 505     \lam{a}. This results in the following replacement expression:
 506
 507     \startlambda
 508     λa.(case opcode of
 509       Low -> (+)
 510       High -> (-)) a
 511     \stoplambda
 512
 513     Continuing with this expression, we see that the transformation does not
 514     apply again (it is a lambda expression). Next we look at the body of this
 515     lambda abstraction:
 516
 517     \startlambda
 518     (case opcode of
 519       Low -> (+)
 520       High -> (-)) a
 521     \stoplambda
 522
 523     Here, the transformation does apply, binding \lam{E} to the entire
 524     expression and \lam{x} to the fresh binder \lam{b}, resulting in the
 525     replacement:
 526
 527     \startlambda
 528     λb.(case opcode of
 529       Low -> (+)
 530       High -> (-)) a b
 531     \stoplambda
 532
 533     Again, the transformation does not apply to this lambda abstraction, so we
 534     look at its body. For brevity, we'll put the case statement on one line from
 535     now on.
 536
 537     \startlambda
 538     (case opcode of Low -> (+); High -> (-)) a b
 539     \stoplambda
 540
 541     The type of this expression is \lam{Word}, so it does not match \lam{a -> b}
 542     and the transformation does not apply. Next, we have two options for the
 543     next expression to look at: The function position and argument position of
 544     the application. The expression in the argument position is \lam{b}, which
 545     has type \lam{Word}, so the transformation does not apply. The expression in
 546     the function position is:
 547
 548     \startlambda
 549     (case opcode of Low -> (+); High -> (-)) a
 550     \stoplambda
 551
 552     Obviously, the transformation does not apply here, since it occurs in
 553     function position. In the same way, the transformation does not apply to either
 554     component of this expression (\lam{case opcode of Low -> (+); High -> (-)}
 555     and \lam{a}), so we'll skip to the components of the case expression: The
 556     scrutinee and both alternatives. Since the opcode is not a function, it does
 557     not apply here, and we'll leave both alternatives as an exercise to the
 558     reader. The final function, after all these transformations becomes:
 559
 560     \startlambda
 561     alu :: Bit -> Word -> Word -> Word
 562     alu = λopcode.λa.λb. (case opcode of
 563       Low -> λa1.λb1. (+) a1 b1
 564       High -> λa2.λb2. (-) a2 b2) a b
 565     \stoplambda
 566
 567     In this case, the transformation does not apply anymore, though this might
 568     not always be the case (e.g., the application of a transformation on a
 569     subexpression might open up possibilities to apply the transformation
 570     further up in the expression).
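
    A single application of the η-abstraction rule can be sketched in Haskell
    as follows. This is a minimal sketch, not the prototype's implementation:
    the typed \type{Expr} miniature, \type{typeOf} and \type{etaAt} are all
    hypothetical names introduced here. The caller states whether the
    expression sits in the function position of an application and supplies
    the fresh binder name.

```haskell
infixr 5 :->

-- Hypothetical miniature of the types and expressions used in the example.
data Type = TBit | TWord | Type :-> Type
  deriving (Eq, Show)

data Expr
  = Var String Type
  | Lam String Type Expr
  | App Expr Expr
  | Case Expr [(String, Expr)] -- (constructor, result)
  deriving (Eq, Show)

typeOf :: Expr -> Type
typeOf (Var _ t) = t
typeOf (Lam _ t body) = t :-> typeOf body
typeOf (App f _) = case typeOf f of
  _ :-> r -> r
  t       -> t -- ill-typed application; not handled in this sketch
typeOf (Case _ ((_, e) : _)) = typeOf e
typeOf (Case _ []) = error "typeOf: case without alternatives"

-- Apply the rule at the root of an expression, if all conditions hold:
-- E has a function type, E is not in function position, E is not a lambda.
etaAt :: Bool -> String -> Expr -> Maybe Expr
etaAt inFunctionPosition fresh e
  | inFunctionPosition = Nothing
  | otherwise = case (e, typeOf e) of
      (Lam {}, _)  -> Nothing
      (_, a :-> _) -> Just (Lam fresh a (App e (Var fresh a)))
      _            -> Nothing

-- The body of the alu function from the walkthrough above.
aluBody :: Expr
aluBody = Case (Var "opcode" TBit)
  [ ("Low",  Var "+" binOp)
  , ("High", Var "-" binOp) ]
  where
    binOp = TWord :-> TWord :-> TWord
```

    Calling \type{etaAt False "a" aluBody} wraps the case expression in a
    lambda over \lam{a}, mirroring the first step of the walkthrough, while the
    same call on the enclosing lambda abstraction yields \type{Nothing}.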
 571
 572     \subsection{Transformation application}
 573       In this chapter we define a number of transformations, but how will we apply
 574       these? As stated before, our normal form is reached as soon as no
 575       transformation applies anymore. This means our application strategy is to
 576       simply apply any transformation that applies, and to continue doing that with
 577       the result of each transformation.
 578
 579       In particular, we define no particular order of transformations. Since
 580       transformation order should not influence the resulting normal form,
 581       \todo{This is not really true, but would like it to be...} this leaves
 582       the implementation free to choose any application order that results in
 583       an efficient implementation.
 584
 585       When applying a single transformation, we try to apply it to every (sub)expression
 586       in a function, not just the top level function. This allows us to keep the
 587       transformation descriptions concise and powerful.
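
      This strategy can be sketched as a small fixpoint driver. The code below
      is an assumption-laden illustration rather than the prototype's code: the
      \type{Expr} miniature, \type{step}, \type{normalize} and the toy
      transformation \type{dropIdentity} are hypothetical names. Termination is
      not guaranteed in general; it depends on the set of transformations.

```haskell
import Data.Maybe (mapMaybe)

-- Hypothetical miniature expression type, for illustration only.
data Expr = Var String | Lam String Expr | App Expr Expr
  deriving (Eq, Show)

-- A transformation either rewrites the root of an expression or does not apply.
type Transform = Expr -> Maybe Expr

-- Perform one rewrite somewhere in the expression: try every transformation
-- at the root first, then look into the subexpressions.
step :: [Transform] -> Expr -> Maybe Expr
step ts e = case mapMaybe ($ e) ts of
  (e' : _) -> Just e'
  [] -> case e of
    Var _   -> Nothing
    Lam v b -> Lam v <$> step ts b
    App f x -> case step ts f of
      Just f' -> Just (App f' x)
      Nothing -> App f <$> step ts x

-- Keep rewriting until no transformation applies anywhere.
normalize :: [Transform] -> Expr -> Expr
normalize ts e = maybe e (normalize ts) (step ts e)

-- Toy transformation: (\x. x) E  ~>  E
dropIdentity :: Transform
dropIdentity (App (Lam x (Var y)) e) | x == y = Just e
dropIdentity _ = Nothing
```

      The order in which \type{step} tries transformations and subexpressions
      is one arbitrary choice among many, which is exactly the freedom the
      text above leaves to the implementation.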
 588
 589     \subsection{Definitions}
 590       In the following sections, we will be using a number of functions and
 591       notations, which we will define here.
 592
 593       \todo{Define substitution (notation)}
 594
 595       \subsubsection{Other concepts}
 596         A \emph{global variable} is any variable that is bound at the
 597         top level of a program, or an external module. A \emph{local variable} is any
 598         other variable (\eg, variables local to a function, which can be bound by
 599         lambda abstractions, let expressions and pattern matches of case
 600         alternatives).  Note that this is a slightly different notion of global versus
 601         local than what \small{GHC} uses internally.
 602         \defref{global variable} \defref{local variable}
 603
 604         A \emph{hardware representable} (or just \emph{representable}) type or value
 605         is (a value of) a type that we can generate a signal for in hardware. For
 606         example, a bit, a vector of bits, a 32 bit unsigned word, etc. Types that are
 607         not runtime representable notably include (but are not limited to): Types,
 608         dictionaries, functions.
 609         \defref{representable}
 610
 611         A \emph{builtin function} is a function supplied by the Cλash framework, whose
 612         implementation is not valid Cλash. The implementation is of course valid
 613         Haskell, for simulation, but it is not expressible in Cλash.
 614         \defref{builtin function} \defref{user-defined function}
 615
 616       For these functions, Cλash has a \emph{builtin hardware translation}, so calls
 617       to these functions can still be translated. These are functions like
 618       \lam{map}, \lam{hwor} and \lam{length}.
 619
 620       A \emph{user-defined} function is a function for which we do have a Cλash
 621       implementation available.
 622
 623       \subsubsection{Functions}
 624         Here, we define a number of functions that can be used below to concisely
 625         specify conditions.
 626
 627         \refdef{global variable}\emph{gvar(expr)} is true when \emph{expr} is a variable that references a
 628         global variable. It is false when it references a local variable.
 629
 630         \refdef{local variable}\emph{lvar(expr)} is the complement of \emph{gvar}; it is true when \emph{expr}
 631         references a local variable, false when it references a global variable.
 632
 633         \refdef{representable}\emph{representable(expr)} or \emph{representable(var)} is true when
 634         \emph{expr} or \emph{var} is \emph{representable}.
 635
 636     \subsection{Binder uniqueness}
 637       A common problem in transformation systems, is binder uniqueness. When not
 638       considering this problem, it is easy to create transformations that mix up
 639       bindings and cause name collisions. Take for example, the following core
 640       expression:
 641
 642       \startlambda
 643       (λa.λb.λc. a * b * c) x c
 644       \stoplambda
 645
 646       By applying β-reduction (see below) once, we can simplify this expression to:
 647
 648       \startlambda
 649       (λb.λc. x * b * c) c
 650       \stoplambda
 651
 652       Now, we have replaced the \lam{a} binder with a reference to the \lam{x}
 653       binder. No harm done here. But note that we see multiple occurrences of the
 654       \lam{c} binder. The first is a binding occurrence, to which the second refers.
 655       The last, however, refers to \emph{another} instance of \lam{c}, which is
 656       bound somewhere outside of this expression. Now, if we were to apply beta
 657       reduction without taking heed of binder uniqueness, we would get:
 658
 659       \startlambda
 660       λc. x * c * c
 661       \stoplambda
 662
 663       This is obviously not what was supposed to happen! The root of this problem is
 664       the reuse of binders: Identical binders can be bound in different scopes, such
 665       that only the inner one is \quote{visible} in the inner expression. In the example
 666       above, the \lam{c} binder was bound outside of the expression and in the inner
 667       lambda expression. Inside that lambda expression, only the inner \lam{c} is
 668       visible.
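
      The capture problem can be reproduced in a few lines of Haskell. The
      following is only a sketch on a toy expression type; the names
      \type{Expr}, \type{substNaive} and \type{betaNaive} are made up for this
      illustration and are not part of the prototype or of \small{GHC}. It shows
      a substitution that ignores binders and therefore turns the example above
      into the wrong result.

\starttyping
-- Toy lambda calculus, for illustration only (not GHC's Core type).
data Expr
  = Var String
  | Lam String Expr
  | App Expr Expr
  | Mul Expr Expr
  deriving (Eq, Show)

-- Naive substitution e[m/x]: replaces free occurrences of x by m, but does
-- NOT rename binders in e that clash with free variables of m.
substNaive :: String -> Expr -> Expr -> Expr
substNaive x m e = case e of
  Var y   | y == x    -> m
          | otherwise -> Var y
  Lam y b | y == x    -> Lam y b                   -- x is shadowed here
          | otherwise -> Lam y (substNaive x m b)  -- may capture vars of m
  App f a -> App (substNaive x m f) (substNaive x m a)
  Mul a b -> Mul (substNaive x m a) (substNaive x m b)

-- One beta-reduction step at the root, using the naive substitution.
betaNaive :: Expr -> Expr
betaNaive (App (Lam x b) m) = substNaive x m b
betaNaive e                 = e

-- The expression (\b.\c. x * b * c) c from the text.
example :: Expr
example =
  App (Lam "b" (Lam "c" (Mul (Mul (Var "x") (Var "b")) (Var "c"))))
      (Var "c")
\stoptyping

      Evaluating \type{betaNaive example} yields the term for
      \lam{λc. x * c * c}: the free \lam{c} that was passed as an argument has
      been captured by the inner binder.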
 669
      There are a number of ways to solve this. \small{GHC} has isolated this
      problem to its binder substitution code, which performs \emph{deshadowing}
 672       during its expression traversal. This means that any binding that shadows
 673       another binding on a higher level is replaced by a new binder that does not
 674       shadow any other binding. This non-shadowing invariant is enough to prevent
 675       binder uniqueness problems in \small{GHC}.
 676
      In our transformation system, maintaining this non-shadowing invariant is
      a bit harder to do (mostly due to implementation issues: the prototype does
      not use \small{GHC}'s substitution code). Also, we can observe the following
      points.
 681
 682       \startitemize
 683       \item Deshadowing does not guarantee overall uniqueness. For example, the
 684       following (slightly contrived) expression shows the identifier \lam{x} bound in
      two separate places (and to different values), even though no shadowing
 686       occurs.
 687
 688       \startlambda
 689       (let x = 1 in x) + (let x = 2 in x)
 690       \stoplambda
 691
 692       \item In our normal form (and the resulting \small{VHDL}), all binders
 693       (signals) will end up in the same scope. To allow this, all binders within the
 694       same function should be unique.
 695
 696       \item When we know that all binders in an expression are unique, moving around
 697       or removing a subexpression will never cause any binder conflicts. If we have
 698       some way to generate fresh binders, introducing new subexpressions will not
 699       cause any problems either. The only way to cause conflicts is thus to
 700       duplicate an existing subexpression.
 701       \stopitemize
 702
      Given the above, our prototype maintains a unique binder invariant. This
      means that at any given moment during normalization, all binders \emph{within
 705       a single function} must be unique. To achieve this, we apply the following
 706       technique.
 707
 708       \todo{Define fresh binders and unique supplies}
 709
 710       \startitemize
 711       \item Before starting normalization, all binders in the function are made
 712       unique. This is done by generating a fresh binder for every binder used. This
 713       also replaces binders that did not pose any conflict, but it does ensure that
      all binders within the function are generated by the same unique supply. See
      \refdef{fresh binder}.
 716       \item Whenever a new binder must be generated, we generate a fresh binder that
 717       is guaranteed to be different from \emph{all binders generated so far}. This
 718       can thus never introduce duplication and will maintain the invariant.
 719       \item Whenever (part of) an expression is duplicated (for example when
 720       inlining), all binders in the expression are replaced with fresh binders
 721       (using the same method as at the start of normalization). These fresh binders
 722       can never introduce duplication, so this will maintain the invariant.
 723       \item Whenever we move part of an expression around within the function, there
 724       is no need to do anything special. There is obviously no way to introduce
 725       duplication by moving expressions around. Since we know that each of the
 726       binders is already unique, there is no way to introduce (incorrect) shadowing
 727       either.
 728       \stopitemize
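
      The first step of this technique, making all binders unique before
      normalization starts, can be sketched in Haskell as follows. This is a
      minimal sketch under simplifying assumptions, not the prototype's code:
      the unique supply is modelled as a plain counter, fresh names are built by
      appending that counter (which assumes that no existing name already has
      that form), and \type{Expr}, \type{fresh} and \type{uniquify} are
      hypothetical names.

\starttyping
import qualified Data.Map as Map

-- Toy expression type, for illustration only (not GHC's Core type).
data Expr
  = Var String
  | Lit Int
  | Lam String Expr
  | App Expr Expr
  | Let String Expr Expr        -- non-recursive let, for brevity
  deriving (Eq, Show)

-- Assumption: the unique supply is modelled as a simple counter.
type Supply = Int

-- Produce a fresh name and the advanced supply.
fresh :: String -> Supply -> (String, Supply)
fresh base n = (base ++ "_" ++ show n, n + 1)

-- Rename every binder to a fresh one. The environment maps binders that
-- are in scope to their new names; free variables are left untouched.
uniquify :: Map.Map String String -> Expr -> Supply -> (Expr, Supply)
uniquify env e s = case e of
  Var x -> (Var (Map.findWithDefault x x env), s)
  Lit n -> (Lit n, s)
  Lam x b ->
    let (x', s1) = fresh x s
        (b', s2) = uniquify (Map.insert x x' env) b s1
    in (Lam x' b', s2)
  App f a ->
    let (f', s1) = uniquify env f s
        (a', s2) = uniquify env a s1
    in (App f' a', s2)
  Let x r b ->
    let (x', s1) = fresh x s
        (r', s2) = uniquify env r s1                    -- x not in scope in r
        (b', s3) = uniquify (Map.insert x x' env) b s2
    in (Let x' r' b', s3)

-- (let x = 1 in x) + (let x = 2 in x), with + written as a variable "plus".
twoLets :: Expr
twoLets =
  App (App (Var "plus") (Let "x" (Lit 1) (Var "x")))
      (Let "x" (Lit 2) (Var "x"))
\stoptyping

      Running \type{uniquify Map.empty twoLets 0} renames the two \lam{x}
      binders to \type{x_0} and \type{x_1} and returns the supply value 2, so
      any binder generated later from the returned supply differs from both.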
 729
 730   \section{Transform passes}
 731     In this section we describe the actual transforms. Here we're using
 732     the core language in a notation that resembles lambda calculus.
 733
    Each of these transforms is meant to be applied to every (sub)expression
    in a program, for as long as it applies. The program is in normal form (by
    definition) only when none of the transformations can be applied anymore.
    We hope to be able to prove that this form will obey all of the
    constraints defined above, but this has yet to happen (though it seems likely
    that it will).
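
    The strategy of applying every transform to every (sub)expression until
    none applies can be sketched in Haskell as below. This is an illustrative
    driver on a toy expression type, not the prototype's implementation: the
    names \type{Transform}, \type{pass} and \type{normalize} are assumptions,
    and termination is only guaranteed if the given transforms cannot keep
    undoing each other's work.

\starttyping
-- Toy expression type, for illustration only (not GHC's Core type).
data Expr
  = Var String
  | Lam String Expr
  | App Expr Expr
  | LetRec [(String, Expr)] Expr
  deriving (Eq, Show)

-- A transform rewrites the root of an expression, or does not apply.
type Transform = Expr -> Maybe Expr

-- Apply the first transform that matches at the root, if any.
applyFirst :: [Transform] -> Expr -> Maybe Expr
applyFirst ts e = case [e' | t <- ts, Just e' <- [t e]] of
  (e' : _) -> Just e'
  []       -> Nothing

-- One bottom-up pass over all subexpressions. The Bool reports whether
-- any transform was applied anywhere.
pass :: [Transform] -> Expr -> (Expr, Bool)
pass ts e = case applyFirst ts e1 of
  Just e2 -> (e2, True)
  Nothing -> (e1, changed)
  where
    (e1, changed) = children e
    children (Lam x b) =
      let (b', c) = pass ts b in (Lam x b', c)
    children (App f a) =
      let (f', cf) = pass ts f
          (a', ca) = pass ts a
      in (App f' a', cf || ca)
    children (LetRec bs b) =
      let rs       = [(x, pass ts r) | (x, r) <- bs]
          (b', cb) = pass ts b
      in ( LetRec [(x, r') | (x, (r', _)) <- rs] b'
         , cb || or [c | (_, (_, c)) <- rs] )
    children v = (v, False)

-- Repeat passes until a pass changes nothing.
normalize :: [Transform] -> Expr -> Expr
normalize ts e =
  let (e', changed) = pass ts e
  in if changed then normalize ts e' else e'

-- Two transforms from this chapter, as examples.
emptyLetRemoval :: Transform
emptyLetRemoval (LetRec [] m) = Just m
emptyLetRemoval _             = Nothing

appPropagation :: Transform
appPropagation (App (LetRec bs e) m) = Just (LetRec bs (App e m))
appPropagation _                     = Nothing
\stoptyping

    For instance, \type{normalize [emptyLetRemoval, appPropagation]} rewrites
    \lam{(letrec in f) x} to \lam{f x} in one pass and needs a second pass
    only to find that nothing else applies.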
 740
 741     Each of the transforms will be described informally first, explaining
 742     the need for and goal of the transform. Then, a formal definition is
 743     given, using a familiar syntax from the world of logic. Each transform
 744     is specified as a number of conditions (above the horizontal line) and a
 745     number of conclusions (below the horizontal line). The details of using
    this notation are still a bit fuzzy, so comments are welcome.
 747
 748     \subsection{General cleanup}
      These are general cleanup transformations that aim to make expressions
      simpler. They usually clean up the mess left behind by other
      transformations, or rewrite expressions to expose new opportunities for
      other transformations.
 753
 754        Most of these transformations are standard optimizations in other
 755        compilers as well. However, in our compiler, most of these are not just
 756        optimizations, but they are required to get our program into normal
 757        form.
 758
 759       \subsubsection{β-reduction}
 760         β-reduction is a well known transformation from lambda calculus, where it is
        the main reduction step. It reduces applications of lambda abstractions,
 762         removing both the lambda abstraction and the application.
 763
 764         In our transformation system, this step helps to remove unwanted lambda
 765         abstractions (basically all but the ones at the top level). Other
 766         transformations (application propagation, non-representable inlining) make
        sure that most lambda abstractions will eventually be reducible by
 768         β-reduction.
 769
 770         \starttrans
 771         (λx.E) M
 772         -----------------
 773         E[M/x]
 774         \stoptrans
 775
 776         % And an example
 777         \startbuffer[from]
 778         (λa. 2 * a) (2 * b)
 779         \stopbuffer
 780
 781         \startbuffer[to]
 782         2 * (2 * b)
 783         \stopbuffer
 784
 785         \transexample{β-reduction}{from}{to}
 786
 787       \subsubsection{Empty let removal}
 788         This transformation is simple: It removes recursive lets that have no bindings
 789         (which usually occurs when unused let binding removal removes the last
 790         binding from it).
 791
 792         \starttrans
 793         letrec in M
 794         --------------
 795         M
 796         \stoptrans
 797
 798         \todo{Example}
 799
 800       \subsubsection{Simple let binding removal}
 801         This transformation inlines simple let bindings (\eg a = b).
 802
 803         This transformation is not needed to get into normal form, but makes the
 804         resulting \small{VHDL} a lot shorter.
 805
 806         \starttrans
 807         letrec
 808           a0 = E0
 809           \vdots
 810           ai = b
 811           \vdots
 812           an = En
 813         in
 814           M
 815         -----------------------------  \lam{b} is a variable reference
 816         letrec
 817           a0 = E0 [b/ai]
 818           \vdots
 819           ai-1 = Ei-1 [b/ai]
 820           ai+1 = Ei+1 [b/ai]
 821           \vdots
 822           an = En [b/ai]
 823         in
 824           M[b/ai]
 825         \stoptrans
 826
 827         \todo{example}
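
        As a possible illustration, the rule above can be sketched in Haskell
        on a toy expression type. The names \type{rename}, \type{isSimple} and
        \type{simpleLetRemoval} are hypothetical, and the sketch relies on the
        unique binder invariant: it substitutes without any capture check.
        Skipping a degenerate binding of a variable to itself is an extra guard
        added here, not part of the rule.

\starttyping
-- Toy expression type, for illustration only (not GHC's Core type).
data Expr
  = Var String
  | Lam String Expr
  | App Expr Expr
  | LetRec [(String, Expr)] Expr
  deriving (Eq, Show)

-- Replace every reference to variable a by a reference to variable b.
-- No capture check is done; this assumes all binders are unique.
rename :: String -> String -> Expr -> Expr
rename a b e = case e of
  Var x | x == a    -> Var b
        | otherwise -> Var x
  Lam x body  -> Lam x (rename a b body)
  App f arg   -> App (rename a b f) (rename a b arg)
  LetRec bs m -> LetRec [(x, rename a b r) | (x, r) <- bs] (rename a b m)

-- A binding "a = b" where b is a variable other than a itself.
isSimple :: (String, Expr) -> Bool
isSimple (a, Var b) = a /= b
isSimple _          = False

-- Inline the first simple binding of a letrec, if there is one.
simpleLetRemoval :: Expr -> Maybe Expr
simpleLetRemoval (LetRec bs m) = case break isSimple bs of
  (before, (a, Var b) : after) ->
    Just (LetRec [(x, rename a b r) | (x, r) <- before ++ after]
                 (rename a b m))
  _ -> Nothing
simpleLetRemoval _ = Nothing
\stoptyping

        Applied to the term for \lam{letrec a = b; c = f a in c}, this returns
        the term for \lam{letrec c = f b in c}.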
 828
 829       \subsubsection{Unused let binding removal}
 830         This transformation removes let bindings that are never used. Usually,
 831         the desugarer introduces some unused let bindings.
 832
        This normalization pass should not really be needed to get into normal form
        (since unused bindings are not forbidden by the normal form), but in practice
        the desugarer or simplifier emits some unused bindings that cannot be
        normalized (\eg, calls to a \type{PatError}\todo{Check this name}). Also,
        this transformation makes the resulting \small{VHDL} a lot shorter.
 838
 839         \starttrans
 840         letrec
 841           a0 = E0
 842           \vdots
 843           ai = Ei
 844           \vdots
 845           an = En
 846         in
          M                             \lam{ai} does not occur free in \lam{M}
        ----------------------------    \forall j, 0 <= j <= n, j ≠ i (\lam{ai} does not occur free in \lam{Ej})
 849         letrec
 850           a0 = E0
 851           \vdots
 852           ai-1 = Ei-1
 853           ai+1 = Ei+1
 854           \vdots
 855           an = En
 856         in
 857           M
 858         \stoptrans
 859
 860         \todo{Example}
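
        The side conditions of this rule amount to a free variable check. The
        following Haskell sketch shows one way to express them on a toy
        expression type; \type{freeVars} and \type{removeUnused} are
        hypothetical names, not the prototype's. Note that, as in the rule, a
        binding that is referenced only from its own right-hand side still
        counts as unused.

\starttyping
import qualified Data.Set as Set

-- Toy expression type, for illustration only (not GHC's Core type).
data Expr
  = Var String
  | Lam String Expr
  | App Expr Expr
  | LetRec [(String, Expr)] Expr
  deriving (Eq, Show)

-- The set of variables that occur free in an expression.
freeVars :: Expr -> Set.Set String
freeVars (Var x)   = Set.singleton x
freeVars (Lam x b) = Set.delete x (freeVars b)
freeVars (App f a) = freeVars f `Set.union` freeVars a
freeVars (LetRec bs b) =
  Set.unions (freeVars b : map (freeVars . snd) bs)
    `Set.difference` Set.fromList (map fst bs)

-- Remove the first binding that is used neither in the body nor in the
-- right-hand side of any OTHER binding of the same letrec.
removeUnused :: Expr -> Maybe Expr
removeUnused (LetRec bs m) =
  case [i | (i, (a, _)) <- zip [0 ..] bs, unused i a] of
    (i : _) -> Just (LetRec (take i bs ++ drop (i + 1) bs) m)
    []      -> Nothing
  where
    unused i a =
      all (Set.notMember a)
          (freeVars m : [freeVars e | (j, (_, e)) <- zip [0 ..] bs, j /= i])
removeUnused _ = Nothing
\stoptyping

        For \lam{letrec a = one; b = a; c = f c in b}, only \lam{c} is removed:
        \lam{b} is used by the body and \lam{a} is used by \lam{b}.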
 861
 862       \subsubsection{Cast propagation / simplification}
 863         This transform pushes casts down into the expression as far as possible.
 864         Since its exact role and need is not clear yet, this transformation is
 865         not yet specified.
 866
 867         \todo{Cast propagation}
 868
 869       \subsubsection{Top level binding inlining}
        This transform inlines simple top level bindings generated by
        \small{GHC}. \small{GHC} sometimes generates very simple
        \quote{wrapper} bindings, which are bound to just a variable
        reference, or to a partial application to constants or other variable
        references.
 875
 876         Note that this transformation is completely optional. It is not
        required to get any function into normal form, but it does help make
 878         the resulting VHDL output easier to read (since it removes a bunch of
 879         components that are really boring).
 880
        This transform applies to any top level binding generated by the compiler
        whose normalized form contains only a single let binding.
 883
 884         \starttrans
 885         x = λa0 ... λan.let y = E in y
 886         ~
 887         x
 888         --------------------------------------         \lam{x} is generated by the compiler
 889         λa0 ... λan.let y = E in y
 890         \stoptrans
 891
 892         \startbuffer[from]
 893         (+) :: Word -> Word -> Word
 894         (+) = GHC.Num.(+) @Word $dNum
 895         ~
 896         (+) a b
 897         \stopbuffer
 898         \startbuffer[to]
 899         GHC.Num.(+) @ Alu.Word $dNum a b
 900         \stopbuffer
 901
 902         \transexample{Top level binding inlining}{from}{to}
 903
 904         Without this transformation, the (+) function would generate an
 905         architecture which would just add its inputs. This generates a lot of
 906         overhead in the VHDL, which is particularly annoying when browsing the
 907         generated RTL schematic (especially since + is not allowed in VHDL
 908         architecture names\footnote{Technically, it is allowed to use
 909         non-alphanumerics when using extended identifiers, but it seems that
 910         none of the tooling likes extended identifiers in filenames, so it
 911         effectively doesn't work}, so the entity would be called
 912         \quote{w7aA7f} or something similarly unreadable and autogenerated).
 913
 914     \subsection{Program structure}
 915       These transformations are aimed at normalizing the overall structure
 916       into the intended form. This means ensuring there is a lambda abstraction
 917       at the top for every argument (input port), putting all of the other
 918       value definitions in let bindings and making the final return value a
 919       simple variable reference.
 920
 921       \subsubsection{η-abstraction}
 922         This transformation makes sure that all arguments of a function-typed
 923         expression are named, by introducing lambda expressions. When combined with
 924         β-reduction and non-representable binding inlining, all function-typed
 925         expressions should be lambda abstractions or global identifiers.
 926
 927         \starttrans
 928         E                 \lam{E :: a -> b}
 929         --------------    \lam{E} is not the first argument of an application.
 930         λx.E x            \lam{E} is not a lambda abstraction.
 931                           \lam{x} is a variable that does not occur free in \lam{E}.
 932         \stoptrans
 933
 934         \startbuffer[from]
 935         foo = λa.case a of
 936           True -> λb.mul b b
 937           False -> id
 938         \stopbuffer
 939
 940         \startbuffer[to]
 941         foo = λa.λx.(case a of
 942             True -> λb.mul b b
 943             False -> λy.id y) x
 944         \stopbuffer
 945
 946         \transexample{η-abstraction}{from}{to}
 947
 948       \subsubsection{Application propagation}
 949         This transformation is meant to propagate application expressions downwards
 950         into expressions as far as possible. This allows partial applications inside
 951         expressions to become fully applied and exposes new transformation
 952         opportunities for other transformations (like β-reduction and
 953         specialization).
 954
 955         \starttrans
 956         (letrec binds in E) M
 957         ------------------------
 958         letrec binds in E M
 959         \stoptrans
 960
 961         % And an example
 962         \startbuffer[from]
 963         ( letrec
 964             val = 1
 965           in
 966             add val
 967         ) 3
 968         \stopbuffer
 969
 970         \startbuffer[to]
 971         letrec
 972           val = 1
 973         in
 974           add val 3
 975         \stopbuffer
 976
 977         \transexample{Application propagation for a let expression}{from}{to}
 978
 979         \starttrans
 980         (case x of
 981           p1 -> E1
 982           \vdots
 983           pn -> En) M
 984         -----------------
 985         case x of
 986           p1 -> E1 M
 987           \vdots
 988           pn -> En M
 989         \stoptrans
 990
 991         % And an example
 992         \startbuffer[from]
 993         ( case x of
 994             True -> id
 995             False -> neg
 996         ) 1
 997         \stopbuffer
 998
 999         \startbuffer[to]
1000         case x of
1001           True -> id 1
1002           False -> neg 1
1003         \stopbuffer
1004
1005         \transexample{Application propagation for a case expression}{from}{to}
1006
1007       \subsubsection{Let recursification}
1008         This transformation makes all non-recursive lets recursive. In the
1009         end, we want a single recursive let in our normalized program, so all
1010         non-recursive lets can be converted. This also makes other
1011         transformations simpler: They can simply assume all lets are
1012         recursive.
1013
1014         \starttrans
1015         let
1016           a = E
1017         in
1018           M
1019         ------------------------------------------
1020         letrec
1021           a = E
1022         in
1023           M
1024         \stoptrans
1025
1026       \subsubsection{Let flattening}
1027         This transformation puts nested lets in the same scope, by lifting the
1028         binding(s) of the inner let into a new let around the outer let. Eventually,
1029         this will cause all let bindings to appear in the same scope (they will all be
1030         in scope for the function return value).
1031
1032         \starttrans
1033         letrec
1034           \vdots
1035           x = (letrec bindings in M)
1036           \vdots
1037         in
1038           N
1039         ------------------------------------------
1040         letrec
1041           \vdots
1042           bindings
1043           x = M
1044           \vdots
1045         in
1046           N
1047         \stoptrans
1048
1049         \startbuffer[from]
1050         letrec
1051           a = letrec
1052             x = 1
1053             y = 2
1054           in
1055             x + y
1056         in
1057           a
1058         \stopbuffer
1059         \startbuffer[to]
1060         letrec
1061           x = 1
1062           y = 2
1063           a = x + y
1064         in
1065           a
1066         \stopbuffer
1067
1068         \transexample{Let flattening}{from}{to}
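
        A single flattening step could be sketched in Haskell as follows. The
        expression type and the name \type{flattenLet} are assumptions made for
        this illustration. Lifting the inner bindings into the outer scope
        without renaming is only safe because of the unique binder invariant.

\starttyping
-- Toy expression type, for illustration only (not GHC's Core type).
data Expr
  = Var String
  | Lit Int
  | App Expr Expr
  | LetRec [(String, Expr)] Expr
  deriving (Eq, Show)

-- Find the first binding whose right-hand side is itself a letrec and
-- lift the inner bindings into the outer letrec, just before it.
flattenLet :: Expr -> Maybe Expr
flattenLet (LetRec bs n) = go [] bs
  where
    go _ [] = Nothing
    go done ((x, LetRec inner m) : rest) =
      Just (LetRec (reverse done ++ inner ++ (x, m) : rest) n)
    go done (b : rest) = go (b : done) rest
flattenLet _ = Nothing

-- The "Original program" of the example above, with + as a variable.
nested :: Expr
nested =
  LetRec
    [ ( "a"
      , LetRec [("x", Lit 1), ("y", Lit 2)]
               (App (App (Var "+") (Var "x")) (Var "y")) ) ]
    (Var "a")
\stoptyping

        Here \type{flattenLet nested} returns a single letrec that binds
        \lam{x}, \lam{y} and \lam{a}, in that order, which corresponds to the
        transformed program shown above.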
1069
1070       \subsubsection{Return value simplification}
1071         This transformation ensures that the return value of a function is always a
1072         simple local variable reference.
1073
        This is currently implemented using lambda simplification, let
        simplification, and top simplification. It should change into something
        like the following, which works only on the result of a function instead
        of on any subexpression. This is achieved by the contexts, like
        \lam{x = E}, though this is not strictly correct: you could read this as
        \quote{if there is any function \lam{x} that binds \lam{E}, any \lam{E}
        can be transformed}, while we only mean the \lam{E} that is bound by
        \lam{x}. This might need some extra notes.
1081
        Note that the return value is not simplified if it is not representable.
1083         Otherwise, this would cause a direct loop with the inlining of
1084         unrepresentable bindings, of course. If the return value is not
1085         representable because it has a function type, η-abstraction should
1086         make sure that this transformation will eventually apply. If the value
1087         is not representable for other reasons, the function result itself is
1088         not representable, meaning this function is not representable anyway!
1089
1090         \starttrans
1091         x = E                            \lam{E} is representable
1092         ~                                \lam{E} is not a lambda abstraction
1093         E                                \lam{E} is not a let expression
1094         ---------------------------      \lam{E} is not a local variable reference
1095         letrec x = E in x
1096         \stoptrans
1097
1098         \starttrans
1099         x = λv0 ... λvn.E
1100         ~                                \lam{E} is representable
1101         E                                \lam{E} is not a let expression
1102         ---------------------------      \lam{E} is not a local variable reference
1103         letrec x = E in x
1104         \stoptrans
1105
1106         \starttrans
1107         x = λv0 ... λvn.let ... in E
1108         ~                                \lam{E} is representable
1109         E                                \lam{E} is not a local variable reference
1110         ---------------------------
1111         letrec x = E in x
1112         \stoptrans
1113
1114         \startbuffer[from]
1115         x = add 1 2
1116         \stopbuffer
1117
1118         \startbuffer[to]
1119         x = letrec x = add 1 2 in x
1120         \stopbuffer
1121
1122         \transexample{Return value simplification}{from}{to}
1123
1124     \subsection{Argument simplification}
1125       The transforms in this section deal with simplifying application
1126       arguments into normal form. The goal here is to:
1127
1128       \startitemize
       \item Make all arguments of user-defined functions (that is, functions
       for which we have a function body) simple variable references of a runtime
       representable type. This is needed, since these applications will be turned
       into component instantiations.
1133        \item Make all arguments of builtin functions one of:
1134          \startitemize
1135           \item A type argument.
1136           \item A dictionary argument.
1137           \item A type level expression.
1138           \item A variable reference of a runtime representable type.
1139           \item A variable reference or partial application of a function type.
1140          \stopitemize
1141       \stopitemize
1142
1143       When looking at the arguments of a user-defined function, we can
1144       divide them into two categories:
1145       \startitemize
1146         \item Arguments of a runtime representable type (\eg bits or vectors).
1147
1148               These arguments can be preserved in the program, since they can
1149               be translated to input ports later on.  However, since we can
1150               only connect signals to input ports, these arguments must be
1151               reduced to simple variables (for which signals will be
1152               produced). This is taken care of by the argument extraction
1153               transform.
        \item Arguments of a type that is not runtime representable.
1155
1156               These arguments cannot be preserved in the program, since we
1157               cannot represent them as input or output ports in the resulting
1158               \small{VHDL}. To remove them, we create a specialized version of the
1159               called function with these arguments filled in. This is done by
1160               the argument propagation transform.
1161
1162               Typically, these arguments are type and dictionary arguments that are
              used to make functions polymorphic. By propagating these arguments, we
              are essentially doing the same thing \small{GHC} does when it specializes
1165               functions: Creating multiple variants of the same function, one for
1166               each type for which it is used. Other common non-representable
1167               arguments are functions, e.g. when calling a higher order function
1168               with another function or a lambda abstraction as an argument.
1169
1170               The reason for doing this is similar to the reasoning provided for
1171               the inlining of non-representable let bindings above. In fact, this
1172               argument propagation could be viewed as a form of cross-function
1173               inlining.
1174       \stopitemize
1175
1176       \todo{Check the following itemization.}
1177
1178       When looking at the arguments of a builtin function, we can divide them
1179       into categories:
1180
1181       \startitemize
1182         \item Arguments of a runtime representable type.
1183
1184               As we have seen with user-defined functions, these arguments can
1185               always be reduced to a simple variable reference, by the
1186               argument extraction transform. Performing this transform for
1187               builtin functions as well, means that the translation of builtin
1188               functions can be limited to signal references, instead of
1189               needing to support all possible expressions.
1190
1191         \item Arguments of a function type.
1192
1193               These arguments are functions passed to higher order builtins,
1194               like \lam{map} and \lam{foldl}. Since implementing these
1195               functions for arbitrary function-typed expressions (\eg, lambda
              expressions) is rather complex, we reduce these arguments to
1197               (partial applications of) global functions.
1198
1199               We can still support arbitrary expressions from the user code,
1200               by creating a new global function containing that expression.
1201               This way, we can simply replace the argument with a reference to
1202               that new function. However, since the expression can contain any
1203               number of free variables we also have to include partial
1204               applications in our normal form.
1205
1206               This category of arguments is handled by the function extraction
1207               transform.
1208         \item Other unrepresentable arguments.
1209
1210               These arguments can take a few different forms:
1211               \startdesc{Type arguments}
1212                 In the core language, type arguments can only take a single
1213                 form: A type wrapped in the Type constructor. Also, there is
1214                 nothing that can be done with type expressions, except for
1215                 applying functions to them, so we can simply leave type
1216                 arguments as they are.
1217               \stopdesc
1218               \startdesc{Dictionary arguments}
1219                 In the core language, dictionary arguments are used to find
1220                 operations operating on one of the type arguments (mostly for
                finding class methods). Since we will not actually evaluate
1222                 the function body for builtin functions and can generate
1223                 code for builtin functions by just looking at the type
1224                 arguments, these arguments can be ignored and left as they
1225                 are.
1226               \stopdesc
1227               \startdesc{Type level arguments}
1228                 Sometimes, we want to pass a value to a builtin function, but
1229                 we need to know the value at compile time. Additionally, the
1230                 value has an impact on the type of the function. This is
1231                 encoded using type-level values, where the actual value of the
1232                 argument is not important, but the type encodes some integer,
1233                 for example. Since the value is not important, the actual form
1234                 of the expression does not matter either and we can leave
1235                 these arguments as they are.
1236               \stopdesc
1237               \startdesc{Other arguments}
                Technically, there is still a wide array of arguments that can
                be passed, but do not fall into any of the above categories.
                However, none of the supported builtin functions requires such
                an argument. This leaves us with passing unsupported types to
                a function, such as calling \lam{head} on a list of functions.
1243
1244                 In these cases, it would be impossible to generate hardware
1245                 for such a function call anyway, so we can ignore these
1246                 arguments.
1247
1248                 The only way to generate hardware for builtin functions with
1249                 arguments like these, is to expand the function call into an
1250                 equivalent core expression (\eg, expand map into a series of
1251                 function applications). But for now, we choose to simply not
1252                 support expressions like these.
1253               \stopdesc
1254
1255               From the above, we can conclude that we can simply ignore these
1256               other unrepresentable arguments and focus on the first two
1257               categories instead.
1258       \stopitemize
1259
1260       \subsubsection{Argument simplification}
1261         This transform deals with arguments to functions that
1262         are of a runtime representable type. It ensures that they will all become
1263         references to global variables, or local signals in the resulting \small{VHDL}.
1264
        \todo{It seems we can map an expression to a port, not only a signal.
        Perhaps this makes this transformation not needed?}
1267         \todo{Say something about dataconstructors (without arguments, like True
1268         or False), which are variable references of a runtime representable
1269         type, but do not result in a signal.}
1270
1271         To reduce a complex expression to a simple variable reference, we create
1272         a new let expression around the application, which binds the complex
1273         expression to a new variable. The original function is then applied to
1274         this variable.
1275
1276         \starttrans
1277         M N
1278         --------------------    \lam{N} is of a representable type
1279         letrec x = N in M x     \lam{N} is not a local variable reference
1280         \stoptrans
1281
1282         \startbuffer[from]
1283         add (add a 1) 1
1284         \stopbuffer
1285
1286         \startbuffer[to]
1287         letrec x = add a 1 in add x 1
1288         \stopbuffer
1289
1290         \transexample{Argument extraction}{from}{to}
1291
1292       \subsubsection{Function extraction}
1293         This transform deals with function-typed arguments to builtin functions.
1294         Since these arguments cannot be propagated, we choose to extract them
1295         into a new global function instead.
1296
        Any free variables occurring in the extracted arguments will become
1298         parameters to the new global function. The original argument is replaced
1299         with a reference to the new function, applied to any free variables from
1300         the original argument.
1301
1302         This transformation is useful when applying higher order builtin functions
1303         like \hs{map} to a lambda abstraction, for example. In this case, the code
1304         that generates \small{VHDL} for \hs{map} only needs to handle top level functions and
1305         partial applications, not any other expression (such as lambda abstractions or
1306         even more complicated expressions).
1307
1308         \starttrans
        M N                     \lam{M} is a (partial application of a) builtin function.
1310         ---------------------   \lam{f0 ... fn} = free local variables of \lam{N}
1311         M (x f0 ... fn)         \lam{N :: a -> b}
1312         ~                       \lam{N} is not a (partial application of) a top level function
1313         x = λf0 ... λfn.N
1314         \stoptrans
1315
1316         \startbuffer[from]
1317         map (λa . add a b) xs
1318
1319         map (add b) ys
1320         \stopbuffer
1321
1322         \startbuffer[to]
1323         map (x0 b) xs
1324
1325         map x1 ys
1326         ~
1327         x0 = λb.λa.add a b
1328         x1 = λb.add b
1329         \stopbuffer
1330
1331         \transexample{Function extraction}{from}{to}
1332
        Note that \lam{x0} and \lam{x1} will still need normalization after this.
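
        The core of this transform, turning the free local variables of the
        argument into parameters of a new global function, can be sketched in
        Haskell as follows. This is an illustration under assumptions, not the
        prototype's code: the set of global names and the fresh name of the new
        function are simply passed in, and \type{freeVars} and
        \type{extractFunction} are hypothetical names.

\starttyping
import qualified Data.Set as Set

-- Toy expression type, for illustration only (not GHC's Core type).
data Expr
  = Var String
  | Lam String Expr
  | App Expr Expr
  deriving (Eq, Show)

-- The set of variables that occur free in an expression.
freeVars :: Expr -> Set.Set String
freeVars (Var x)   = Set.singleton x
freeVars (Lam x b) = Set.delete x (freeVars b)
freeVars (App f a) = freeVars f `Set.union` freeVars a

-- Given the global names, a fresh name for the new global function and
-- the argument N, return the replacement argument together with the new
-- global binding.
extractFunction
  :: Set.Set String -> String -> Expr -> (Expr, (String, Expr))
extractFunction globals name n = (call, (name, body))
  where
    -- Free local variables of N, in a fixed (sorted) order.
    fs   = Set.toAscList (freeVars n `Set.difference` globals)
    -- name f0 ... fn
    call = foldl App (Var name) (map Var fs)
    -- \f0 ... \fn. N
    body = foldr Lam n fs
\stoptyping

        With \lam{add} as the only global and \type{x0} as the fresh name, the
        argument \lam{λa.add a b} is replaced by \lam{x0 b}, and the new
        binding is \lam{x0 = λb.λa.add a b}, as in the first line of the
        example above.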
1334
1335       \subsubsection{Argument propagation}
1336         \fxnote{This section should be generalized and describe
1337         specialization, so other transformations can refer to this (since
1338         specialization is really used in multiple categories).}
1339
1340         This transform deals with arguments to user-defined functions that are
1341         not representable at runtime. This means these arguments cannot be
        preserved in the final form and must be {\em propagated}.
1343
1344         Propagation means to create a specialized version of the called
1345         function, with the propagated argument already filled in. As a simple
1346         example, in the following program:
1347
1348         \startlambda
1349         f = λa.λb.a + b
1350         inc = λa.f a 1
1351         \stoplambda
1352
1353         We could {\em propagate} the constant argument 1, with the following
1354         result:
1355
1356         \startlambda
1357         f' = λa.a + 1
1358         inc = λa.f' a
1359         \stoplambda
1360
1361         Special care must be taken when the to-be-propagated expression has any
1362         free variables. If this is the case, the original argument should not be
        removed altogether, but replaced by all the free variables of the
1364         expression. In this way, the original expression can still be evaluated
1365         inside the new function. Also, this brings us closer to our goal: All
1366         these free variables will be simple variable references.
1367
1368         To prevent us from propagating the same argument over and over, a simple
1369         local variable reference is not propagated (since it has exactly one
1370         free variable, itself, we would only replace that argument with itself).
1371
1372         This shows that any free local variables that are not runtime representable
1373         cannot be brought into normal form by this transform. We rely on an
1374         inlining transformation to replace such a variable with an expression we
1375         can propagate again.
1376
1377         \starttrans
1378         x = E
1379         ~
1380         x Y0 ... Yi ... Yn                               \lam{Yi} is not of a runtime representable type
1381         ---------------------------------------------    \lam{Yi} is not a local variable reference
1382         x' Y0 ... Yi-1 f0 ... fm Yi+1 ... Yn             \lam{f0 ... fm} = free local vars of \lam{Yi}
1383         ~
1384         x' = λy0 ... yi-1 f0 ... fm yi+1 ... yn .
1385               E y0 ... yi-1 Yi yi+1 ... yn
1386
1387         \stoptrans
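
             As an illustration of the rule above, consider the following sketch.
             The functions \lam{twice} and \lam{addtwice} are hypothetical and only
             serve this example. Here, \lam{twice} receives a function-typed argument
             that has the free local variable \lam{b}:

             \startlambda
             twice = λf.λa.f (f a)
             addtwice = λb.λc.twice (λx.add x b) c
             \stoplambda

             Since \lam{λx.add x b} is not runtime representable and is not a local
             variable reference, the transform would apply. The argument is replaced
             by its free variable \lam{b} and a specialized function \lam{twice'} is
             created, whose body applies the original definition of \lam{twice} to
             the propagated expression:

             \startlambda
             twice' = λb.λy.(λf.λa.f (f a)) (λx.add x b) y
             addtwice = λb.λc.twice' b c
             \stoplambda

             Assuming ordinary β-reduction is applied afterwards, the body of
             \lam{twice'} would reduce to \lam{add (add y b) b}, which no longer
             contains a lambda abstraction as an argument.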
1388
1389         \todo{Example}
1390
1391     \subsection{Case simplification}
1392       \subsubsection{Scrutinee simplification}
1393         This transform ensures that the scrutinee of a case expression is always
1394         a simple variable reference.
1395
1396         \starttrans
1397         case E of
1398           alts
1399         -----------------        \lam{E} is not a local variable reference
1400         letrec x = E in
1401           case x of
1402             alts
1403         \stoptrans
1404
1405         \startbuffer[from]
1406         case (foo a) of
1407           True -> a
1408           False -> b
1409         \stopbuffer
1410
1411         \startbuffer[to]
1412         letrec x = foo a in
1413           case x of
1414             True -> a
1415             False -> b
1416         \stopbuffer
1417
1418         \transexample{Scrutinee simplification}{from}{to}
1419
1420
1421       \subsubsection{Case simplification}
1422         This transformation ensures that all case expressions are brought into normal
1423         form. This means they will become one of:
1424         \startitemize
1425         \item An extractor case with a single alternative that picks a single field
1426         from a datatype, \eg \lam{case x of (a, b) -> a}.
1427         \item A selector case with multiple alternatives and only wild binders, that
1428         makes a choice between expressions based on the constructor of another
1429         expression, \eg \lam{case x of Low -> a; High -> b}.
1430         \stopitemize
1431
1432         \starttrans
1433         case E of
1434           C0 v0,0 ... v0,m -> E0
1435           \vdots
1436           Cn vn,0 ... vn,m -> En
1437         --------------------------------------------------- \forall i \forall j, 0 <= i <= n, 0 <= j <= m (\lam{wi,j} is a wild (unused) binder)
1438         letrec
1439           v0,0 = case E of C0 v0,0 ... v0,m -> v0,0
1440           \vdots
1441           v0,m = case E of C0 v0,0 ... v0,m -> v0,m
1442           x0 = E0
1443           \vdots
1444           vn,m = case E of Cn vn,0 ... vn,m -> vn,m
1445           xn = En
1446         in
1447           case E of
1448             C0 w0,0 ... w0,m -> x0
1449             \vdots
1450             Cn wn,0 ... wn,m -> xn
1451         \stoptrans
1452
1453         \fxnote{This transformation specified like this is complicated and misses
1454         conditions to prevent looping with itself. Perhaps it should be split here for
1455         discussion?}
1456
1457         \startbuffer[from]
1458         case a of
1459           True -> add b 1
1460           False -> add b 2
1461         \stopbuffer
1462
1463         \startbuffer[to]
1464         letrec
1465           x0 = add b 1
1466           x1 = add b 2
1467         in
1468           case a of
1469             True -> x0
1470             False -> x1
1471         \stopbuffer
1472
1473         \transexample{Selector case simplification}{from}{to}
1474
1475         \startbuffer[from]
1476         case a of
1477           (,) b c -> add b c
1478         \stopbuffer
1479         \startbuffer[to]
1480         letrec
1481           b = case a of (,) b c -> b
1482           c = case a of (,) b c -> c
1483           x0 = add b c
1484         in
1485           case a of
1486             (,) w0 w1 -> x0
1487         \stopbuffer
1488
1489         \transexample{Extractor case simplification}{from}{to}
1490
1491       \subsubsection{Case removal}
1492         This transform removes any case expressions with a single alternative and
1493         only wild binders.
1494
1495         These \quote{useless} case expressions are usually leftovers from case
1496         simplification of extractor cases (see the previous example).
1497
1498         \starttrans
1499         case x of
1500           C v0 ... vm -> E
1501         ----------------------     \lam{\forall i, 0 <= i <= m} (\lam{vi} does not occur free in E)
1502         E
1503         \stoptrans
1504
1505         \startbuffer[from]
1506         case a of
1507           (,) w0 w1 -> x0
1508         \stopbuffer
1509
1510         \startbuffer[to]
1511         x0
1512         \stopbuffer
1513
1514         \transexample{Case removal}{from}{to}
1515
1516   \subsection{Removing polymorphism}
1517     Reference type-specialization (== argument propagation)
1518
1519     Reference polymorphic binding inlining (== non-representable binding
1520     inlining).
1521
1522   \subsection{Defunctionalization}
1523     These transformations remove most higher-order expressions from our
1524     program, making it almost completely first-order (the only exception is
1525     for arguments to builtin functions, since we can't specialize builtin
1526     functions). \todo{Talk more about this somewhere}
1527
1528     Reference higher-order-specialization (== argument propagation)
1529
1530       \subsubsection{Non-representable binding inlining}
1531         This transform inlines let bindings that have a non-representable type. Since
1532         we can never generate a signal assignment for these bindings (we cannot
1533         declare a signal with a non-representable type, for obvious reasons), we
1534         have no choice but to inline the binding to remove it.
1535
1536         If the binding is non-representable because it is a lambda abstraction, it is
1537         likely that it will be inlined into an application and β-reduction will remove
1538         the lambda abstraction and turn it into a representable expression at the
1539         inline site. The same holds for partial applications, which can be turned into
1540         full applications by inlining.
1541
1542         Other cases of non-representable bindings we see in practice are primitive
1543         Haskell types. In most cases, these will not result in a valid normalized
1544         output, but then the input would have been invalid to start with. There is one
1545         exception to this: When a builtin function is applied to a non-representable
1546         expression, things might work out in some cases. For example, when you write a
1547         literal \hs{SizedInt} in Haskell, like \hs{10 :: SizedInt D8}, this results in
1548         the following core: \lam{fromInteger (smallInteger 10)}, where for example
1549         \lam{10 :: GHC.Prim.Int\#} and \lam{smallInteger 10 :: Integer} have
1550         non-representable types. \todo{This/these paragraph(s) should probably become a
1551         separate discussion somewhere else}
1552
1553
1554         \starttrans
1555         letrec
1556           a0 = E0
1557           \vdots
1558           ai = Ei
1559           \vdots
1560           an = En
1561         in
1562           M
1563         --------------------------    \lam{Ei} has a non-representable type.
1564         letrec
1565           a0 = E0 [Ei/ai]
1566           \vdots
1567           ai-1 = Ei-1 [Ei/ai]
1568           ai+1 = Ei+1 [Ei/ai]
1569           \vdots
1570           an = En [Ei/ai]
1571         in
1572           M[Ei/ai]
1573         \stoptrans
1574
1575         \startbuffer[from]
1576         letrec
1577           a = smallInteger 10
1578           inc = λb -> add b 1
1579           inc' = add 1
1580           x = fromInteger a
1581         in
1582           inc (inc' x)
1583         \stopbuffer
1584
1585         \startbuffer[to]
1586         letrec
1587           x = fromInteger (smallInteger 10)
1588         in
1589           (λb -> add b 1) (add 1 x)
1590         \stopbuffer
1591
1592         \transexample{Non-representable binding inlining}{from}{to}
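
             The transformed program above still contains a lambda abstraction, but
             it is now in applied position. As a sketch of how this example would
             proceed, assuming ordinary β-reduction is applied next, the program
             would become:

             \startlambda
             letrec
               x = fromInteger (smallInteger 10)
             in
               add (add 1 x) 1
             \stoplambda

             In this sketch no binding with a non-representable type remains. The
             non-representable expression \lam{smallInteger 10} only occurs as an
             argument to the builtin function \lam{fromInteger}.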
1593
1594
1595   \section{Provable properties}
1596     When looking at the system of transformations outlined above, there are a
1597     number of questions that we can ask ourselves. The main question is of course:
1598     \quote{Does our system work as intended?}. We can split this question into a
1599     number of subquestions:
1600
1601     \startitemize[KR]
1602     \item[q:termination] Does our system \emph{terminate}? Since our system will
1603     keep running as long as transformations apply, there is an obvious risk that
1604     it will keep running indefinitely. For example, one transformation could
1605     produce a result that another transformation turns back into the original.
1606     \item[q:soundness] Is our system \emph{sound}? Since our transformations
1607     continuously modify the expression, there is an obvious risk that the final
1608     normal form will not be equivalent to the original program: Its meaning could
1609     have changed.
1610     \item[q:completeness] Is our system \emph{complete}? Since we have a complex
1611     system of transformations, there is an obvious risk that some expressions will
1612     not end up in our intended normal form, because we forgot some transformation.
1613     In other words: Does our transformation system result in our intended normal
1614     form for all possible inputs?
1615     \item[q:determinism] Is our system \emph{deterministic}? Since we have defined
1616     no particular order in which the transformations should be applied, there is an
1617     obvious risk that different transformation orderings will result in
1618     \emph{different} normal forms. They might still both be intended normal forms
1619     (if our system is \emph{complete}) and describe correct hardware (if our
1620     system is \emph{sound}), so this property is less important than the previous
1621     three: The translator would still function properly without it.
1622     \stopitemize
1623
1624     \subsection{Graph representation}
1625       Before looking into how to prove these properties, we'll look at our
1626       transformation system from a graph perspective. The nodes of the graph are
1627       all possible Core expressions. The (directed) edges of the graph are
1628       transformations. When a transformation α applies to an expression \lam{A} to
1629       produce an expression \lam{B}, we add an edge from the node for \lam{A} to the
1630       node for \lam{B}, labeled α.
1631
1632       \startuseMPgraphic{TransformGraph}
1633         save a, b, c, d;
1634
1635         % Nodes
1636         newCircle.a(btex \lam{(λx.λy. (+) x y) 1} etex);
1637         newCircle.b(btex \lam{λy. (+) 1 y} etex);
1638         newCircle.c(btex \lam{(λx.(+) x) 1} etex);
1639         newCircle.d(btex \lam{(+) 1} etex);
1640
1641         b.c = origin;
1642         c.c = b.c + (4cm, 0cm);
1643         a.c = midpoint(b.c, c.c) + (0cm, 4cm);
1644         d.c = midpoint(b.c, c.c) - (0cm, 3cm);
1645
1646         % β-conversion between a and b
1647         ncarc.a(a)(b) "name(bred)";
1648         ObjLabel.a(btex $\xrightarrow[normal]{}{β}$ etex) "labpathname(bred)", "labdir(rt)";
1649         ncarc.b(b)(a) "name(bexp)", "linestyle(dashed withdots)";
1650         ObjLabel.b(btex $\xleftarrow[normal]{}{β}$ etex) "labpathname(bexp)", "labdir(lft)";
1651
1652         % η-conversion between a and c
1653         ncarc.a(a)(c) "name(ered)";
1654         ObjLabel.a(btex $\xrightarrow[normal]{}{η}$ etex) "labpathname(ered)", "labdir(rt)";
1655         ncarc.c(c)(a) "name(eexp)", "linestyle(dashed withdots)";
1656         ObjLabel.c(btex $\xleftarrow[normal]{}{η}$ etex) "labpathname(eexp)", "labdir(lft)";
1657
1658         % η-conversion between b and d
1659         ncarc.b(b)(d) "name(ered)";
1660         ObjLabel.b(btex $\xrightarrow[normal]{}{η}$ etex) "labpathname(ered)", "labdir(rt)";
1661         ncarc.d(d)(b) "name(eexp)", "linestyle(dashed withdots)";
1662         ObjLabel.d(btex $\xleftarrow[normal]{}{η}$ etex) "labpathname(eexp)", "labdir(lft)";
1663
1664         % β-conversion between c and d
1665         ncarc.c(c)(d) "name(bred)";
1666         ObjLabel.c(btex $\xrightarrow[normal]{}{β}$ etex) "labpathname(bred)", "labdir(rt)";
1667         ncarc.d(d)(c) "name(bexp)", "linestyle(dashed withdots)";
1668         ObjLabel.d(btex $\xleftarrow[normal]{}{β}$ etex) "labpathname(bexp)", "labdir(lft)";
1669
1670         % Draw objects and lines
1671         drawObj(a, b, c, d);
1672       \stopuseMPgraphic
1673
1674       \placeexample[right][ex:TransformGraph]{Partial graph of a lambda calculus
1675       system with β and η reduction (solid lines) and expansion (dotted lines).}
1676           \boxedgraphic{TransformGraph}
1677
1678       Of course our graph is unbounded, since we can construct an infinite number of
1679       Core expressions. Also, there might potentially be multiple edges between two
1680       given nodes (with different labels), though this seems unlikely to actually
1681       happen in our system.
1682
1683       See \in{example}[ex:TransformGraph] for the graph representation of a very
1684       simple lambda calculus that contains just the expressions \lam{(λx.λy. (+) x
1685       y) 1}, \lam{λy. (+) 1 y}, \lam{(λx.(+) x) 1} and \lam{(+) 1}. The
1686       transformation system consists of β-reduction and η-reduction (solid edges) or
1687       β-expansion and η-expansion (dotted edges).
1688
1689       \todo{Define β-reduction and η-reduction?}
1690
1691       Note that the normal form of such a system consists of the set of nodes
1692       (expressions) without outgoing edges, since those are the expressions to which
1693       no transformation applies anymore. We call this set of nodes the \emph{normal
1694       set}.
1695
1696       From such a graph, we can derive some properties easily:
1697       \startitemize[KR]
1698         \item A system will \emph{terminate} if there is no path of infinite length
1699         in the graph (a cycle also gives rise to such a path).
1700         \item Soundness is not easily represented in the graph.
1701         \item A system is \emph{complete} if all of the nodes in the normal set have
1702         the intended normal form. The inverse (that all of the nodes outside of
1703         the normal set are \emph{not} in the intended normal form) is not
1704         strictly required.
1705         \item A system is deterministic if all paths from a node, which end in a node
1706         in the normal set, end at the same node.
1707       \stopitemize
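
           The graph properties above can be made concrete for a \emph{finite}
           graph. The following Haskell code is a minimal sketch, not part of the
           translator: the \type{Graph} encoding, the node names and all function
           names are assumptions made for this illustration. It only covers finite
           graphs such as the one in \in{example}[ex:TransformGraph], whereas the
           graph of all Core expressions is unbounded.

           \starttyping
           import qualified Data.Map as Map
           import Data.List (nub)

           -- Hypothetical encoding: every node maps to the targets of its edges.
           type Graph = Map.Map String [String]

           -- Reduction system of the example (solid edges only).
           reduction :: Graph
           reduction = Map.fromList
             [ ("a", ["b", "c"])  -- (λx.λy. (+) x y) 1
             , ("b", ["d"])       -- λy. (+) 1 y
             , ("c", ["d"])       -- (λx.(+) x) 1
             , ("d", [])          -- (+) 1
             ]

           successors :: Graph -> String -> [String]
           successors g n = Map.findWithDefault [] n g

           -- All nodes reachable from n through one or more edges.
           reachable :: Graph -> String -> [String]
           reachable g n = go [] (successors g n)
             where
               go seen [] = seen
               go seen (x:xs)
                 | x `elem` seen = go seen xs
                 | otherwise     = go (x : seen) (successors g x ++ xs)

           -- The normal set: nodes without outgoing edges.
           normalSet :: Graph -> [String]
           normalSet g = [n | (n, targets) <- Map.toList g, null targets]

           -- In a finite graph an infinite path requires a cycle, so the
           -- system terminates if no node can reach itself.
           terminates :: Graph -> Bool
           terminates g = all (\n -> n `notElem` reachable g n) (Map.keys g)

           -- Deterministic: every node reaches at most one normal set node.
           deterministic :: Graph -> Bool
           deterministic g = all ((<= 1) . length . normalForms) (Map.keys g)
             where
               normalForms n = filter (`elem` normalSet g) (nub (n : reachable g n))

           main :: IO ()
           main = do
             print (normalSet reduction)
             print (terminates reduction)
             print (deterministic reduction)
           \stoptyping

           For the reduction system this reports a normal set containing only the
           node \type{d}, and that the system terminates and is deterministic.
           Adding the reverse (expansion) edges to the same graph would introduce
           cycles, so \type{terminates} would then return \type{False}.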
1708
1709       When looking at \in{example}[ex:TransformGraph], we see that the system
1710       terminates for both the reduction and expansion systems (but note that, for
1711       expansion, this is only true because we've limited the possible expressions!
1712       In complete lambda calculus, there would be a path from \lam{(λx.λy. (+) x y)
1713       1} to \lam{(λx.λy.(λz.(+) z) x y) 1} to \lam{(λx.λy.(λz.(λq.(+) q) z) x y) 1}
1714       etc.)
1715
1716       If we considered the system with both expansion and reduction, it would no
1717       longer terminate, since the graph would contain cycles.
1718
1719       The reduction and expansion systems have a normal set containing just
1720       \lam{(+) 1} or \lam{(λx.λy. (+) x y) 1} respectively. Since all paths in
1721       either system end up in these normal forms, both systems are \emph{complete}.
1722       Also, since each system has only one normal form, both must be
1723       \emph{deterministic} as well.
1724
1725     \subsection{Termination}
1726       Approach: Counting.
1727
1728       Church-Rosser?
1729
1730     \subsection{Soundness}
1731       Needs formal definition of semantics.
1732       Prove for each transformation separately, implies soundness of the system.
1733
1734     \subsection{Completeness}
1735       Show that some transformation applies to every Core expression that is not
1736       in normal form. To prove: no transformation applies => in intended form.
1737       Show the contrapositive: Not in intended form => some transformation applies.
1738
1739     \subsection{Determinism}
1740       How to prove this?