Chapters/Normalization.tex

   1 \chapter[chap:normalization]{Normalization}
   2
   3 % A helper to print a single example in the half the page width. The example
   4 % text should be in a buffer whose name is given in an argument.
   5 %
   6 % The align=right option really does left-alignment, but without the program
   7 % will end up on a single line. The strut=no option prevents a bunch of empty
   8 % space at the start of the frame.
   9 \define[1]\example{
  10   \framed[offset=1mm,align=right,strut=no,background=box,frame=off]{
  11     \setuptyping[option=LAM,style=sans,before=,after=]
  12     \typebuffer[#1]
  13     \setuptyping[option=none,style=\tttf]
  14   }
  15 }
  16
  17 \define[3]\transexample{
  18   \placeexample[here]{#1}
  19   \startcombination[2*1]
  20     {\example{#2}}{Original program}
  21     {\example{#3}}{Transformed program}
  22   \stopcombination
  23 }
  24
  25 The first step in the core to \small{VHDL} translation process, is normalization. We
  26 aim to bring the core description into a simpler form, which we can
  27 subsequently translate into \small{VHDL} easily. This normal form is needed because
  28 the full core language is more expressive than \small{VHDL} in some areas and because
  29 core can describe expressions that do not have a direct hardware
  30 interpretation.
  31
  32 TODO: Describe core properties not supported in \small{VHDL}, and describe how the
  33 \small{VHDL} we want to generate should look like.
  34
  35 \section{Normal form}
  36 The transformations described here have a well-defined goal: To bring the
  37 program in a well-defined form that is directly translatable to hardware,
  38 while fully preserving the semantics of the program. We refer to this form as
  39 the \emph{normal form} of the program. The formal definition of this normal
  40 form is quite simple:
  41
  42 \placedefinition{}{A program is in \emph{normal form} if none of the
  43 transformations from this chapter apply.}
  44
  45 Of course, this is an \quote{easy} definition of the normal form, since our
  46 program will end up in normal form automatically. The more interesting part is
  47 to see if this normal form actually has the properties we would like it to
  48 have.
  49
  50 But, before getting into more definitions and details about this normal form,
  51 let's try to get a feeling for it first. The easiest way to do this is by
  52 describing the things we want to not have in a normal form.
  53
  54 \startitemize
  55   \item Any \emph{polymorphism} must be removed. When laying down hardware, we
  56   can't generate any signals that can have multiple types. All types must be
  57   completely known to generate hardware.
  58
  59   \item Any \emph{higher order} constructions must be removed. We can't
  60   generate a hardware signal that contains a function, so all values,
  61   arguments and returns values used must be first order.
  62
  63   \item Any complex \emph{nested scopes} must be removed. In the \small{VHDL}
  64   description, every signal is in a single scope. Also, full expressions are
  65   not supported everywhere (in particular port maps can only map signal names,
  66   not expressions). To make the \small{VHDL} generation easy, all values must be bound
  67   on the \quote{top level}.
  68 \stopitemize
  69
  70 TODO: Intermezzo: functions vs plain values
  71
  72 A very simple example of a program in normal form is given in
  73 \in{example}[ex:MulSum]. As you can see, all arguments to the function (which
  74 will become input ports in the final hardware) are at the top. This means that
  75 the body of the final lambda abstraction is never a function, but always a
  76 plain value.
  77
  78 After the lambda abstractions, we see a single let expression, that binds two
  79 variables (\lam{mul} and \lam{sum}). These variables will be signals in the
  80 final hardware, bound to the output port of the \lam{*} and \lam{+}
  81 components.
  82
  83 The final line (the \quote{return value} of the function) selects the
  84 \lam{sum} signal to be the output port of the function. This \quote{return
  85 value} can always only be a variable reference, never a more complex
  86 expression.
  87
  88 \startbuffer[MulSum]
  89 alu :: Bit -> Word -> Word -> Word
  90 alu = λa.λb.λc.
  91     let
  92       mul = (*) a b
  93       sum = (+) mul c
  94     in
  95       sum
  96 \stopbuffer
  97
  98 \startuseMPgraphic{MulSum}
  99   save a, b, c, mul, add, sum;
 100
 101   % I/O ports
 102   newCircle.a(btex $a$ etex) "framed(false)";
 103   newCircle.b(btex $b$ etex) "framed(false)";
 104   newCircle.c(btex $c$ etex) "framed(false)";
 105   newCircle.sum(btex $res$ etex) "framed(false)";
 106
 107   % Components
 108   newCircle.mul(btex - etex);
 109   newCircle.add(btex + etex);
 110
 111   a.c      - b.c   = (0cm, 2cm);
 112   b.c      - c.c   = (0cm, 2cm);
 113   add.c            = c.c + (2cm, 0cm);
 114   mul.c            = midpoint(a.c, b.c) + (2cm, 0cm);
 115   sum.c            = add.c + (2cm, 0cm);
 116   c.c              = origin;
 117
 118   % Draw objects and lines
 119   drawObj(a, b, c, mul, add, sum);
 120
 121   ncarc(a)(mul) "arcangle(15)";
 122   ncarc(b)(mul) "arcangle(-15)";
 123   ncline(c)(add);
 124   ncline(mul)(add);
 125   ncline(add)(sum);
 126 \stopuseMPgraphic
 127
 128 \placeexample[here][ex:MulSum]{Simple architecture consisting of an adder and a
 129 subtractor.}
 130   \startcombination[2*1]
 131     {\typebufferlam{MulSum}}{Core description in normal form.}
 132     {\boxedgraphic{MulSum}}{The architecture described by the normal form.}
 133   \stopcombination
 134
 135 The previous example described composing an architecture by calling other
 136 functions (operators), resulting in a simple architecture with component and
 137 connection. There is of course also some mechanism for choice in the normal
 138 form. In a normal Core program, the \emph{case} expression can be used in a
 139 few different ways to describe choice. In normal form, this is limited to a
 140 very specific form.
 141
 142 \in{Example}[ex:AddSubAlu] shows an example describing a
 143 simple \small{ALU}, which chooses between two operations based on an opcode
 144 bit. The main structure is the same as in \in{example}[ex:MulSum], but this
 145 time the \lam{res} variable is bound to a case expression. This case
 146 expression scrutinizes the variable \lam{opcode} (and scrutinizing more
 147 complex expressions is not supported). The case expression can select a
 148 different variable based on the constructor of \lam{opcode}.
 149
 150 \startbuffer[AddSubAlu]
 151 alu :: Bit -> Word -> Word -> Word
 152 alu = λopcode.λa.λb.
 153     let
 154       res1 = (+) a b
 155       res2 = (-) a b
 156       res = case opcode of
 157         Low -> res1
 158         High -> res2
 159     in
 160       res
 161 \stopbuffer
 162
 163 \startuseMPgraphic{AddSubAlu}
 164   save opcode, a, b, add, sub, mux, res;
 165
 166   % I/O ports
 167   newCircle.opcode(btex $opcode$ etex) "framed(false)";
 168   newCircle.a(btex $a$ etex) "framed(false)";
 169   newCircle.b(btex $b$ etex) "framed(false)";
 170   newCircle.res(btex $res$ etex) "framed(false)";
 171   % Components
 172   newCircle.add(btex + etex);
 173   newCircle.sub(btex - etex);
 174   newMux.mux;
 175
 176   opcode.c - a.c   = (0cm, 2cm);
 177   add.c    - a.c   = (4cm, 0cm);
 178   sub.c    - b.c   = (4cm, 0cm);
 179   a.c      - b.c   = (0cm, 3cm);
 180   mux.c            = midpoint(add.c, sub.c) + (1.5cm, 0cm);
 181   res.c    - mux.c = (1.5cm, 0cm);
 182   b.c              = origin;
 183
 184   % Draw objects and lines
 185   drawObj(opcode, a, b, res, add, sub, mux);
 186
 187   ncline(a)(add) "posA(e)";
 188   ncline(b)(sub) "posA(e)";
 189   nccurve(a)(sub) "posA(e)", "angleA(0)";
 190   nccurve(b)(add) "posA(e)", "angleA(0)";
 191   nccurve(add)(mux) "posB(inpa)", "angleB(0)";
 192   nccurve(sub)(mux) "posB(inpb)", "angleB(0)";
 193   nccurve(opcode)(mux) "posB(n)", "angleA(0)", "angleB(-90)";
 194   ncline(mux)(res) "posA(out)";
 195 \stopuseMPgraphic
 196
 197 \placeexample[here][ex:AddSubAlu]{Simple \small{ALU} supporting two operations.}
 198   \startcombination[2*1]
 199     {\typebufferlam{AddSubAlu}}{Core description in normal form.}
 200     {\boxedgraphic{AddSubAlu}}{The architecture described by the normal form.}
 201   \stopcombination
 202
 203 As a more complete example, consider \in{example}[ex:NormalComplete]. This
 204 example contains everything that is supported in normal form, with the
 205 exception of builtin higher order functions. The graphical version of the
 206 architecture contains a slightly simplified version, since the state tuple
 207 packing and unpacking have been left out. Instead, two seperate registers are
 208 drawn. Also note that most synthesis tools will further optimize this
 209 architecture by removing the multiplexers at the register input and replace
 210 them with some logic in the clock inputs, but we want to show the architecture
 211 as close to the description as possible.
 212
 213 \startbuffer[NormalComplete]
 214   regbank :: Bit
 215              -> Word
 216              -> State (Word, Word)
 217              -> (State (Word, Word), Word)
 218
 219   -- All arguments are an inital lambda
 220   regbank = λa.λd.λsp.
 221   -- There are nested let expressions at top level
 222   let
 223     -- Unpack the state by coercion (\eg, cast from
 224     -- State (Word, Word) to (Word, Word))
 225     s = sp :: (Word, Word)
 226     -- Extract both registers from the state
 227     r1 = case s of (fst, snd) -> fst
 228     r2 = case s of (fst, snd) -> snd
 229     -- Calling some other user-defined function.
 230     d' = foo d
 231     -- Conditional connections
 232     out = case a of
 233       High -> r1
 234       Low -> r2
 235     r1' = case a of
 236       High -> d'
 237       Low -> r1
 238     r2' = case a of
 239       High -> r2
 240       Low -> d'
 241     -- Packing a tuple
 242     s' = (,) r1' r2'
 243     -- pack the state by coercion (\eg, cast from
 244     -- (Word, Word) to State (Word, Word))
 245     sp' = s' :: State (Word, Word)
 246     -- Pack our return value
 247     res = (,) sp' out
 248   in
 249     -- The actual result
 250     res
 251 \stopbuffer
 252
 253 \startuseMPgraphic{NormalComplete}
 254   save a, d, r, foo, muxr, muxout, out;
 255
 256   % I/O ports
 257   newCircle.a(btex \lam{a} etex) "framed(false)";
 258   newCircle.d(btex \lam{d} etex) "framed(false)";
 259   newCircle.out(btex \lam{out} etex) "framed(false)";
 260   % Components
 261   %newCircle.add(btex + etex);
 262   newBox.foo(btex \lam{foo} etex);
 263   newReg.r1(btex $\lam{r1}$ etex) "dx(4mm)", "dy(6mm)";
 264   newReg.r2(btex $\lam{r2}$ etex) "dx(4mm)", "dy(6mm)", "reflect(true)";
 265   newMux.muxr1;
 266   % Reflect over the vertical axis
 267   reflectObj(muxr1)((0,0), (0,1));
 268   newMux.muxr2;
 269   newMux.muxout;
 270   rotateObj(muxout)(-90);
 271
 272   d.c               = foo.c + (0cm, 1.5cm);
 273   a.c               = (xpart r2.c + 2cm, ypart d.c - 0.5cm);
 274   foo.c             = midpoint(muxr1.c, muxr2.c) + (0cm, 2cm);
 275   muxr1.c           = r1.c + (0cm, 2cm);
 276   muxr2.c           = r2.c + (0cm, 2cm);
 277   r2.c              = r1.c + (4cm, 0cm);
 278   r1.c              = origin;
 279   muxout.c          = midpoint(r1.c, r2.c) - (0cm, 2cm);
 280   out.c             = muxout.c - (0cm, 1.5cm);
 281
 282 %  % Draw objects and lines
 283   drawObj(a, d, foo, r1, r2, muxr1, muxr2, muxout, out);
 284
 285   ncline(d)(foo);
 286   nccurve(foo)(muxr1) "angleA(-90)", "posB(inpa)", "angleB(180)";
 287   nccurve(foo)(muxr2) "angleA(-90)", "posB(inpb)", "angleB(0)";
 288   nccurve(muxr1)(r1) "posA(out)", "angleA(180)", "posB(d)", "angleB(0)";
 289   nccurve(r1)(muxr1) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(180)";
 290   nccurve(muxr2)(r2) "posA(out)", "angleA(0)", "posB(d)", "angleB(180)";
 291   nccurve(r2)(muxr2) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(0)";
 292   nccurve(r1)(muxout) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(-90)";
 293   nccurve(r2)(muxout) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(-90)";
 294   % Connect port a
 295   nccurve(a)(muxout) "angleA(-90)", "angleB(180)", "posB(sel)";
 296   nccurve(a)(muxr1) "angleA(180)", "angleB(-90)", "posB(sel)";
 297   nccurve(a)(muxr2) "angleA(180)", "angleB(-90)", "posB(sel)";
 298   ncline(muxout)(out) "posA(out)";
 299 \stopuseMPgraphic
 300
 301 \placeexample[here][ex:NormalComplete]{Simple architecture consisting of an adder and a
 302 subtractor.}
 303   \startcombination[2*1]
 304     {\typebufferlam{NormalComplete}}{Core description in normal form.}
 305     {\boxedgraphic{NormalComplete}}{The architecture described by the normal form.}
 306   \stopcombination
 307
 308 \subsection{Intended normal form definition}
 309 Now we have some intuition for the normal form, we can describe how we want
 310 the normal form to look like in a slightly more formal manner. The following
 311 EBNF-like description completely captures the intended structure (and
 312 generates a subset of GHC's core format).
 313
 314 Some clauses have an expression listed in parentheses. These are conditions
 315 that need to apply to the clause.
 316
 317 \startlambda
 318 \italic{normal} = \italic{lambda}
 319 \italic{lambda} = λvar.\italic{lambda} (representable(var))
 320                 | \italic{toplet}
 321 \italic{toplet} = let \italic{binding} in \italic{toplet}
 322                 | letrec [\italic{binding}] in \italic{toplet}
 323                 | var (representable(varvar))
 324 \italic{binding} = var = \italic{rhs} (representable(rhs))
 325                  -- State packing and unpacking by coercion
 326                  | var0 = var1 :: State ty (lvar(var1))
 327                  | var0 = var1 :: ty (var0 :: State ty) (lvar(var1))
 328 \italic{rhs} = userapp
 329              | builtinapp
 330              -- Extractor case
 331              | case var of C a0 ... an -> ai (lvar(var))
 332              -- Selector case
 333              | case var of (lvar(var))
 334                 DEFAULT -> var0 (lvar(var0))
 335                 C w0 ... wn -> resvar (\forall{}i, wi \neq resvar, lvar(resvar))
 336 \italic{userapp} = \italic{userfunc}
 337                  | \italic{userapp} {userarg}
 338 \italic{userfunc} = var (gvar(var))
 339 \italic{userarg} = var (lvar(var))
 340 \italic{builtinapp} = \italic{builtinfunc}
 341                     | \italic{builtinapp} \italic{builtinarg}
 342 \italic{builtinfunc} = var (bvar(var))
 343 \italic{builtinarg} = \italic{coreexpr}
 344 \stoplambda
 345
 346 -- TODO: Limit builtinarg further
 347
 348 -- TODO: There can still be other casts around (which the code can handle,
 349 e.g., ignore), which still need to be documented here.
 350
 351 -- TODO: Note about the selector case. It just supports Bit and Bool
 352 currently, perhaps it should be generalized in the normal form?
 353
 354 When looking at such a program from a hardware perspective, the top level
 355 lambda's define the input ports. The value produced by the let expression is
 356 the output port. Most function applications bound by the let expression
 357 define a component instantiation, where the input and output ports are mapped
 358 to local signals or arguments. Some of the others use a builtin
 359 construction (\eg the \lam{case} statement) or call a builtin function
 360 (\eg \lam{add} or \lam{sub}). For these, a hardcoded \small{VHDL} translation is
 361 available.
 362
 363 \section{Transformation notation}
 364 To be able to concisely present transformations, we use a specific format to
 365 them. It is a simple format, similar to one used in logic reasoning.
 366
 367 Such a transformation description looks like the following.
 368
 369 \starttrans
 370 <context conditions>
 371 ~
 372 <original expression>
 373 --------------------------          <expression conditions>
 374 <transformed expresssion>
 375 ~
 376 <context additions>
 377 \stoptrans
 378
 379 This format desribes a transformation that applies to \lam{original
 380 expresssion} and transforms it into \lam{transformed expression}, assuming
 381 that all conditions apply. In this format, there are a number of placeholders
 382 in pointy brackets, most of which should be rather obvious in their meaning.
 383 Nevertheless, we will more precisely specify their meaning below:
 384
 385   \startdesc{<original expression>} The expression pattern that will be matched
 386   against (subexpressions of) the expression to be transformed. We call this a
 387   pattern, because it can contain \emph{placeholders} (variables), which match
 388   any expression or binder. Any such placeholder is said to be \emph{bound} to
 389   the expression it matches. It is convention to use an uppercase latter (\eg
 390   \lam{M} or \lam{E} to refer to any expression (including a simple variable
 391   reference) and lowercase letters (\eg \lam{v} or \lam{b}) to refer to
 392   (references to) binders.
 393
 394   For example, the pattern \lam{a + B} will match the expression
 395   \lam{v + (2 * w)} (and bind \lam{a} to \lam{v} and \lam{B} to
 396   \lam{(2 * 2)}), but not \lam{v + (2 * w)}.
 397   \stopdesc
 398
 399   \startdesc{<expression conditions>}
 400   These are extra conditions on the expression that is matched. These
 401   conditions can be used to further limit the cases in which the
 402   transformation applies, in particular to prevent a transformation from
 403   causing a loop with itself or another transformation.
 404
 405   Only if these if these conditions are \emph{all} true, this transformation
 406   applies.
 407   \stopdesc
 408
 409   \startdesc{<context conditions>}
 410   These are a number of extra conditions on the context of the function. In
 411   particular, these conditions can require some other top level function to be
 412   present, whose value matches the pattern given here. The format of each of
 413   these conditions is: \lam{binder = <pattern>}.
 414
 415   Typically, the binder is some placeholder bound in the \lam{<original
 416   expression>}, while the pattern contains some placeholders that are used in
 417   the \lam{transformed expression}.
 418
 419   Only if a top level binder exists that matches each binder and pattern, this
 420   transformation applies.
 421   \stopdesc
 422
 423   \startdesc{<transformed expression>}
 424   This is the expression template that is the result of the transformation. If, looking
 425   at the above three items, the transformation applies, the \lam{original
 426   expression} is completely replaced with the \lam{<transformed expression>}.
 427   We call this a template, because it can contain placeholders, referring to
 428   any placeholder bound by the \lam{<original expression>} or the
 429   \lam{<context conditions>}. The resulting expression will have those
 430   placeholders replaced by the values bound to them.
 431
 432   Any binder (lowercase) placeholder that has no value bound to it yet will be
 433   bound to (and replaced with) a fresh binder.
 434   \stopdesc
 435
 436   \startdesc{<context additions>}
 437   These are templates for new functions to add to the context. This is a way
 438   to have a transformation create new top level functiosn.
 439
 440   Each addition has the form \lam{binder = template}. As above, any
 441   placeholder in the addition is replaced with the value bound to it, and any
 442   binder placeholder that has no value bound to it yet will be bound to (and
 443   replaced with) a fresh binder.
 444   \stopdesc
 445
 446   As an example, we'll look at η-abstraction:
 447
 448 \starttrans
 449 E                 \lam{E :: a -> b}
 450 --------------    \lam{E} does not occur on a function position in an application
 451 λx.E x            \lam{E} is not a lambda abstraction.
 452 \stoptrans
 453
 454   Consider the following function, which is a fairly obvious way to specify a
 455   simple ALU (Note \at{example}[ex:AddSubAlu] is the normal form of this
 456   function):
 457
 458 \startlambda
 459 alu :: Bit -> Word -> Word -> Word
 460 alu = λopcode. case opcode of
 461   Low -> (+)
 462   High -> (-)
 463 \stoplambda
 464
 465   There are a few subexpressions in this function to which we could possibly
 466   apply the transformation. Since the pattern of the transformation is only
 467   the placeholder \lam{E}, any expression will match that. Whether the
 468   transformation applies to an expression is thus solely decided by the
 469   conditions to the right of the transformation.
 470
 471   We will look at each expression in the function in a top down manner. The
 472   first expression is the entire expression the function is bound to.
 473
 474 \startlambda
 475 λopcode. case opcode of
 476   Low -> (+)
 477   High -> (-)
 478 \stoplambda
 479
 480   As said, the expression pattern matches this. The type of this expression is
 481   \lam{Bit -> Word -> Word -> Word}, which matches \lam{a -> b} (Note that in
 482   this case \lam{a = Bit} and \lam{b = Word -> Word -> Word}).
 483
 484   Since this expression is at top level, it does not occur at a function
 485   position of an application. However, The expression is a lambda abstraction,
 486   so this transformation does not apply.
 487
 488   The next expression we could apply this transformation to, is the body of
 489   the lambda abstraction:
 490
 491 \startlambda
 492 case opcode of
 493   Low -> (+)
 494   High -> (-)
 495 \stoplambda
 496
 497   The type of this expression is \lam{Word -> Word -> Word}, which again
 498   matches \lam{a -> b}. The expression is the body of a lambda expression, so
 499   it does not occur at a function position of an application. Finally, the
 500   expression is not a lambda abstraction but a case expression, so all the
 501   conditions match. There are no context conditions to match, so the
 502   transformation applies.
 503
 504   By now, the placeholder \lam{E} is bound to the entire expression. The
 505   placeholder \lam{x}, which occurs in the replacement template, is not bound
 506   yet, so we need to generate a fresh binder for that. Let's use the binder
 507   \lam{a}. This results in the following replacement expression:
 508
 509 \startlambda
 510 λa.(case opcode of
 511   Low -> (+)
 512   High -> (-)) a
 513 \stoplambda
 514
 515   Continuing with this expression, we see that the transformation does not
 516   apply again (it is a lambda expression). Next we look at the body of this
 517   labmda abstraction:
 518
 519 \startlambda
 520 (case opcode of
 521   Low -> (+)
 522   High -> (-)) a
 523 \stoplambda
 524
 525   Here, the transformation does apply, binding \lam{E} to the entire
 526   expression and \lam{x} to the fresh binder \lam{b}, resulting in the
 527   replacement:
 528
 529 \startlambda
 530 λb.(case opcode of
 531   Low -> (+)
 532   High -> (-)) a b
 533 \stoplambda
 534
 535   Again, the transformation does not apply to this lambda abstraction, so we
 536   look at its body. For brevity, we'll put the case statement on one line from
 537   now on.
 538
 539 \startlambda
 540 (case opcode of Low -> (+); High -> (-)) a b
 541 \stoplambda
 542
 543   The type of this expression is \lam{Word}, so it does not match \lam{a -> b}
 544   and the transformation does not apply. Next, we have two options for the
 545   next expression to look at: The function position and argument position of
 546   the application. The expression in the argument position is \lam{b}, which
 547   has type \lam{Word}, so the transformation does not apply. The expression in
 548   the function position is:
 549
 550 \startlambda
 551 (case opcode of Low -> (+); High -> (-)) a
 552 \stoplambda
 553
 554   Obviously, the transformation does not apply here, since it occurs in
 555   function position. In the same way the transformation does not apply to both
 556   components of this expression (\lam{case opcode of Low -> (+); High -> (-)}
 557   and \lam{a}), so we'll skip to the components of the case expression: The
 558   scrutinee and both alternatives. Since the opcode is not a function, it does
 559   not apply here, and we'll leave both alternatives as an exercise to the
 560   reader. The final function, after all these transformations becomes:
 561
 562 \startlambda
 563 alu :: Bit -> Word -> Word -> Word
 564 alu = λopcode.λa.b. (case opcode of
 565   Low -> λa1.λb1 (+) a1 b1
 566   High -> λa2.λb2 (-) a2 b2) a b
 567 \stoplambda
 568
 569   In this case, the transformation does not apply anymore, though this might
 570   not always be the case (e.g., the application of a transformation on a
 571   subexpression might open up possibilities to apply the transformation
 572   further up in the expression).
 573
 574 \subsection{Transformation application}
 575 In this chapter we define a number of transformations, but how will we apply
 576 these? As stated before, our normal form is reached as soon as no
 577 transformation applies anymore. This means our application strategy is to
 578 simply apply any transformation that applies, and continuing to do that with
 579 the result of each transformation.
 580
 581 In particular, we define no particular order of transformations. Since
 582 transformation order should not influence the resulting normal form (see TODO
 583 ref), this leaves the implementation free to choose any application order that
 584 results in an efficient implementation.
 585
 586 When applying a single transformation, we try to apply it to every (sub)expression
 587 in a function, not just the top level function. This allows us to keep the
 588 transformation descriptions concise and powerful.
 589
 590 \subsection{Definitions}
 591 In the following sections, we will be using a number of functions and
 592 notations, which we will define here.
 593
 594 \subsubsection{Other concepts}
 595 A \emph{global variable} is any variable that is bound at the
 596 top level of a program, or an external module. A \emph{local variable} is any
 597 other variable (\eg, variables local to a function, which can be bound by
 598 lambda abstractions, let expressions and pattern matches of case
 599 alternatives).  Note that this is a slightly different notion of global versus
 600 local than what \small{GHC} uses internally.
 601 \defref{global variable} \defref{local variable}
 602
 603 A \emph{hardware representable} (or just \emph{representable}) type or value
 604 is (a value of) a type that we can generate a signal for in hardware. For
 605 example, a bit, a vector of bits, a 32 bit unsigned word, etc. Types that are
 606 not runtime representable notably include (but are not limited to): Types,
 607 dictionaries, functions.
 608 \defref{representable}
 609
 610 A \emph{builtin function} is a function supplied by the Cλash framework, whose
 611 implementation is not valid Cλash. The implementation is of course valid
 612 Haskell, for simulation, but it is not expressable in Cλash.
 613 \defref{builtin function} \defref{user-defined function}
 614
 615 For these functions, Cλash has a \emph{builtin hardware translation}, so calls
 616 to these functions can still be translated. These are functions like
 617 \lam{map}, \lam{hwor} and \lam{length}.
 618
 619 A \emph{user-defined} function is a function for which we do have a Cλash
 620 implementation available.
 621
 622 \subsubsection{Functions}
 623 Here, we define a number of functions that can be used below to concisely
 624 specify conditions.
 625
 626 \refdef{global variable}\emph{gvar(expr)} is true when \emph{expr} is a variable that references a
 627 global variable. It is false when it references a local variable.
 628
 629 \refdef{local variable}\emph{lvar(expr)} is the complement of \emph{gvar}; it is true when \emph{expr}
 630 references a local variable, false when it references a global variable.
 631
 632 \refdef{representable}\emph{representable(expr)} or \emph{representable(var)} is true when
 633 \emph{expr} or \emph{var} is \emph{representable}.
 634
 635 \subsection{Binder uniqueness}
 636 A common problem in transformation systems, is binder uniqueness. When not
 637 considering this problem, it is easy to create transformations that mix up
 638 bindings and cause name collisions. Take for example, the following core
 639 expression:
 640
 641 \startlambda
 642 (λa.λb.λc. a * b * c) x c
 643 \stoplambda
 644
 645 By applying β-reduction (see below) once, we can simplify this expression to:
 646
 647 \startlambda
 648 (λb.λc. x * b * c) c
 649 \stoplambda
 650
 651 Now, we have replaced the \lam{a} binder with a reference to the \lam{x}
 652 binder. No harm done here. But note that we see multiple occurences of the
 653 \lam{c} binder. The first is a binding occurence, to which the second refers.
 654 The last, however refers to \emph{another} instance of \lam{c}, which is
 655 bound somewhere outside of this expression. Now, if we would apply beta
 656 reduction without taking heed of binder uniqueness, we would get:
 657
 658 \startlambda
 659 λc. x * c * c
 660 \stoplambda
 661
 662 This is obviously not what was supposed to happen! The root of this problem is
 663 the reuse of binders: Identical binders can be bound in different scopes, such
 664 that only the inner one is \quote{visible} in the inner expression. In the example
 665 above, the \lam{c} binder was bound outside of the expression and in the inner
 666 lambda expression. Inside that lambda expression, only the inner \lam{c} is
 667 visible.
 668
 669 There are a number of ways to solve this. \small{GHC} has isolated this
 670 problem to their binder substitution code, which performs \emph{deshadowing}
 671 during its expression traversal. This means that any binding that shadows
 672 another binding on a higher level is replaced by a new binder that does not
 673 shadow any other binding. This non-shadowing invariant is enough to prevent
 674 binder uniqueness problems in \small{GHC}.
 675
 676 In our transformation system, maintaining this non-shadowing invariant is
 677 a bit harder to do (mostly due to implementation issues, the prototype doesn't
 678 use \small{GHC}'s subsitution code). Also, we can observe the following
 679 points.
 680
 681 \startitemize
 682 \item Deshadowing does not guarantee overall uniqueness. For example, the
 683 following (slightly contrived) expression shows the identifier \lam{x} bound in
 684 two seperate places (and to different values), even though no shadowing
 685 occurs.
 686
 687 \startlambda
 688 (let x = 1 in x) + (let x = 2 in x)
 689 \stoplambda
 690
 691 \item In our normal form (and the resulting \small{VHDL}), all binders
 692 (signals) will end up in the same scope. To allow this, all binders within the
 693 same function should be unique.
 694
 695 \item When we know that all binders in an expression are unique, moving around
 696 or removing a subexpression will never cause any binder conflicts. If we have
 697 some way to generate fresh binders, introducing new subexpressions will not
 698 cause any problems either. The only way to cause conflicts is thus to
 699 duplicate an existing subexpression.
 700 \stopitemize
 701
 702 Given the above, our prototype maintains a unique binder invariant. This
 703 meanst that in any given moment during normalization, all binders \emph{within
 704 a single function} must be unique. To achieve this, we apply the following
 705 technique.
 706
 707 TODO: Define fresh binders and unique supplies
 708
 709 \startitemize
 710 \item Before starting normalization, all binders in the function are made
 711 unique. This is done by generating a fresh binder for every binder used. This
 712 also replaces binders that did not pose any conflict, but it does ensure that
 713 all binders within the function are generated by the same unique supply. See
 714 (TODO: ref fresh binder).
 715 \item Whenever a new binder must be generated, we generate a fresh binder that
 716 is guaranteed to be different from \emph{all binders generated so far}. This
 717 can thus never introduce duplication and will maintain the invariant.
 718 \item Whenever (part of) an expression is duplicated (for example when
 719 inlining), all binders in the expression are replaced with fresh binders
 720 (using the same method as at the start of normalization). These fresh binders
 721 can never introduce duplication, so this will maintain the invariant.
 722 \item Whenever we move part of an expression around within the function, there
 723 is no need to do anything special. There is obviously no way to introduce
 724 duplication by moving expressions around. Since we know that each of the
 725 binders is already unique, there is no way to introduce (incorrect) shadowing
 726 either.
 727 \stopitemize
 728
 729 \section{Transform passes}
 730 In this section we describe the actual transforms. Here we're using
 731 the core language in a notation that resembles lambda calculus.
 732
 733 Each of these transforms is meant to be applied to every (sub)expression
 734 in a program, for as long as it applies. Only when none of the
 735 transformations can be applied anymore, the program is in normal form (by
 736 definition). We hope to be able to prove that this form will obey all of the
 737 constraints defined above, but this has yet to happen (though it seems likely
 738 that it will).
 739
 740 Each of the transforms will be described informally first, explaining
 741 the need for and goal of the transform. Then, a formal definition is
 742 given, using a familiar syntax from the world of logic. Each transform
 743 is specified as a number of conditions (above the horizontal line) and a
 744 number of conclusions (below the horizontal line). The details of using
 745 this notation are still a bit fuzzy, so comments are welcom.
 746
 747   \subsection{General cleanup}
 748
 749     \subsubsection{β-reduction}
 750       β-reduction is a well known transformation from lambda calculus, where it is
 751       the main reduction step. It reduces applications of labmda abstractions,
 752       removing both the lambda abstraction and the application.
 753
 754       In our transformation system, this step helps to remove unwanted lambda
 755       abstractions (basically all but the ones at the top level). Other
 756       transformations (application propagation, non-representable inlining) make
 757       sure that most lambda abstractions will eventually be reducable by
 758       β-reduction.
 759
 760       TODO: Define substitution syntax
 761
 762       \starttrans
 763       (λx.E) M
 764       -----------------
 765       E[M/x]
 766       \stoptrans
 767
 768       % And an example
 769       \startbuffer[from]
 770       (λa. 2 * a) (2 * b)
 771       \stopbuffer
 772
 773       \startbuffer[to]
 774       2 * (2 * b)
 775       \stopbuffer
 776
 777       \transexample{β-reduction}{from}{to}
 778
 779     \subsubsection{Application propagation}
 780       This transformation is meant to propagate application expressions downwards
 781       into expressions as far as possible. This allows partial applications inside
 782       expressions to become fully applied and exposes new transformation
 783       possibilities for other transformations (like β-reduction).
 784
 785       \starttrans
 786       let binds in E) M
 787       -----------------
 788       let binds in E M
 789       \stoptrans
 790
 791       % And an example
 792       \startbuffer[from]
 793       ( let
 794           val = 1
 795         in
 796           add val
 797       ) 3
 798       \stopbuffer
 799
 800       \startbuffer[to]
 801       let
 802         val = 1
 803       in
 804         add val 3
 805       \stopbuffer
 806
 807       \transexample{Application propagation for a let expression}{from}{to}
 808
 809       \starttrans
 810       (case x of
 811         p1 -> E1
 812         \vdots
 813         pn -> En) M
 814       -----------------
 815       case x of
 816         p1 -> E1 M
 817         \vdots
 818         pn -> En M
 819       \stoptrans
 820
 821       % And an example
 822       \startbuffer[from]
 823       ( case x of
 824           True -> id
 825           False -> neg
 826       ) 1
 827       \stopbuffer
 828
 829       \startbuffer[to]
 830       case x of
 831         True -> id 1
 832         False -> neg 1
 833       \stopbuffer
 834
 835       \transexample{Application propagation for a case expression}{from}{to}
 836
 837     \subsubsection{Empty let removal}
 838       This transformation is simple: It removes recursive lets that have no bindings
 839       (which usually occurs when let derecursification removes the last binding from
 840       it).
 841
 842       \starttrans
 843       letrec in M
 844       --------------
 845       M
 846       \stoptrans
 847
 848       \subsubsection{Simple let binding removal}
 849       This transformation inlines simple let bindings (\eg a = b).
 850
 851       This transformation is not needed to get into normal form, but makes the
 852       resulting \small{VHDL} a lot shorter.
 853
 854       \starttrans
 855       letnonrec
 856         a = b
 857       in
 858         M
 859       -----------------
 860       M[b/a]
 861       \stoptrans
 862
 863       \starttrans
 864       letrec
 865         \vdots
 866         a = b
 867         \vdots
 868       in
 869         M
 870       -----------------
 871       let
 872         \vdots [b/a]
 873         \vdots [b/a]
 874       in
 875         M[b/a]
 876       \stoptrans
 877
 878     \subsubsection{Unused let binding removal}
 879       This transformation removes let bindings that are never used. Usually,
 880       the desugarer introduces some unused let bindings.
 881
 882       This normalization pass should really be unneeded to get into normal form
 883       (since ununsed bindings are not forbidden by the normal form), but in practice
 884       the desugarer or simplifier emits some unused bindings that cannot be
 885       normalized (e.g., calls to a \type{PatError} (TODO: Check this name)). Also,
 886       this transformation makes the resulting \small{VHDL} a lot shorter.
 887
 888       \starttrans
 889       let a = E in M
 890       ----------------------------    \lam{a} does not occur free in \lam{M}
 891       M
 892       \stoptrans
 893
 894       \starttrans
 895       letrec
 896         \vdots
 897         a = E
 898         \vdots
 899       in
 900         M
 901       ----------------------------    \lam{a} does not occur free in \lam{M}
 902       letrec
 903         \vdots
 904         \vdots
 905       in
 906         M
 907       \stoptrans
 908
 909     \subsubsection{Cast propagation / simplification}
 910       This transform pushes casts down into the expression as far as possible.
 911       Since its exact role and need is not clear yet, this transformation is
 912       not yet specified.
 913
 914     \subsubsection{Compiler generated top level binding inlining}
 915       TODO
 916
 917   \section{Program structure}
 918
 919     \subsubsection{η-abstraction}
 920       This transformation makes sure that all arguments of a function-typed
 921       expression are named, by introducing lambda expressions. When combined with
 922       β-reduction and function inlining below, all function-typed expressions should
 923       be lambda abstractions or global identifiers.
 924
 925       \starttrans
 926       E                 \lam{E :: a -> b}
 927       --------------    \lam{E} is not the first argument of an application.
 928       λx.E x            \lam{E} is not a lambda abstraction.
 929                         \lam{x} is a variable that does not occur free in \lam{E}.
 930       \stoptrans
 931
 932       \startbuffer[from]
 933       foo = λa.case a of
 934         True -> λb.mul b b
 935         False -> id
 936       \stopbuffer
 937
 938       \startbuffer[to]
 939       foo = λa.λx.(case a of
 940           True -> λb.mul b b
 941           False -> λy.id y) x
 942       \stopbuffer
 943
 944       \transexample{η-abstraction}{from}{to}
 945
 946     \subsubsection{Let derecursification}
 947       This transformation is meant to make lets non-recursive whenever possible.
 948       This might allow other optimizations to do their work better. TODO: Why is
 949       this needed exactly?
 950
 951     \subsubsection{Let flattening}
 952       This transformation puts nested lets in the same scope, by lifting the
 953       binding(s) of the inner let into a new let around the outer let. Eventually,
 954       this will cause all let bindings to appear in the same scope (they will all be
 955       in scope for the function return value).
 956
 957       Note that this transformation does not try to be smart when faced with
 958       recursive lets, it will just leave the lets recursive (possibly joining a
 959       recursive and non-recursive let into a single recursive let). The let
 960       rederecursification transformation will do this instead.
 961
 962       \starttrans
 963       letnonrec x = (let bindings in M) in N
 964       ------------------------------------------
 965       let bindings in (letnonrec x = M) in N
 966       \stoptrans
 967
 968       \starttrans
 969       letrec
 970         \vdots
 971         x = (let bindings in M)
 972         \vdots
 973       in
 974         N
 975       ------------------------------------------
 976       letrec
 977         \vdots
 978         bindings
 979         x = M
 980         \vdots
 981       in
 982         N
 983       \stoptrans
 984
 985       \startbuffer[from]
 986       let
 987         a = letrec
 988           x = 1
 989           y = 2
 990         in
 991           x + y
 992       in
 993         letrec
 994           b = let c = 3 in a + c
 995           d = 4
 996         in
 997           d + b
 998       \stopbuffer
 999       \startbuffer[to]
1000       letrec
1001         x = 1
1002         y = 2
1003       in
1004         let
1005           a = x + y
1006         in
1007           letrec
1008             c = 3
1009             b = a + c
1010             d = 4
1011           in
1012             d + b
1013       \stopbuffer
1014
1015       \transexample{Let flattening}{from}{to}
1016
1017     \subsubsection{Return value simplification}
1018       This transformation ensures that the return value of a function is always a
1019       simple local variable reference.
1020
1021       Currently implemented using lambda simplification, let simplification, and
1022       top simplification. Should change into something like the following, which
1023       works only on the result of a function instead of any subexpression. This is
1024       achieved by the contexts, like \lam{x = E}, though this is strictly not
1025       correct (you could read this as "if there is any function \lam{x} that binds
1026       \lam{E}, any \lam{E} can be transformed, while we only mean the \lam{E} that
1027       is bound by \lam{x}. This might need some extra notes or something).
1028
1029       \starttrans
1030       x = E                            \lam{E} is representable
1031       ~                                \lam{E} is not a lambda abstraction
1032       E                                \lam{E} is not a let expression
1033       ---------------------------      \lam{E} is not a local variable reference
1034       let x = E in x
1035       \stoptrans
1036
1037       \starttrans
1038       x = λv0 ... λvn.E
1039       ~                                \lam{E} is representable
1040       E                                \lam{E} is not a let expression
1041       ---------------------------      \lam{E} is not a local variable reference
1042       let x = E in x
1043       \stoptrans
1044
1045       \starttrans
1046       x = λv0 ... λvn.let ... in E
1047       ~                                \lam{E} is representable
1048       E                                \lam{E} is not a local variable reference
1049       ---------------------------
1050       let x = E in x
1051       \stoptrans
1052
1053       \startbuffer[from]
1054       x = add 1 2
1055       \stopbuffer
1056
1057       \startbuffer[to]
1058       x = let x = add 1 2 in x
1059       \stopbuffer
1060
1061       \transexample{Return value simplification}{from}{to}
1062
1063   \subsection{Argument simplification}
1064     The transforms in this section deal with simplifying application
1065     arguments into normal form. The goal here is to:
1066
1067     \startitemize
1068      \item Make all arguments of user-defined functions (\eg, of which
1069      we have a function body) simple variable references of a runtime
1070      representable type. This is needed, since these applications will be turned
1071      into component instantiations.
1072      \item Make all arguments of builtin functions one of:
1073        \startitemize
1074         \item A type argument.
1075         \item A dictionary argument.
1076         \item A type level expression.
1077         \item A variable reference of a runtime representable type.
1078         \item A variable reference or partial application of a function type.
1079        \stopitemize
1080     \stopitemize
1081
1082     When looking at the arguments of a user-defined function, we can
1083     divide them into two categories:
1084     \startitemize
1085       \item Arguments of a runtime representable type (\eg bits or vectors).
1086
1087             These arguments can be preserved in the program, since they can
1088             be translated to input ports later on.  However, since we can
1089             only connect signals to input ports, these arguments must be
1090             reduced to simple variables (for which signals will be
1091             produced). This is taken care of by the argument extraction
1092             transform.
1093       \item Non-runtime representable typed arguments.
1094
1095             These arguments cannot be preserved in the program, since we
1096             cannot represent them as input or output ports in the resulting
1097             \small{VHDL}. To remove them, we create a specialized version of the
1098             called function with these arguments filled in. This is done by
1099             the argument propagation transform.
1100
1101             Typically, these arguments are type and dictionary arguments that are
1102             used to make functions polymorphic. By propagating these arguments, we
1103             are essentially doing the same which GHC does when it specializes
1104             functions: Creating multiple variants of the same function, one for
1105             each type for which it is used. Other common non-representable
1106             arguments are functions, e.g. when calling a higher order function
1107             with another function or a lambda abstraction as an argument.
1108
1109             The reason for doing this is similar to the reasoning provided for
1110             the inlining of non-representable let bindings above. In fact, this
1111             argument propagation could be viewed as a form of cross-function
1112             inlining.
1113     \stopitemize
1114
1115     TODO: Check the following itemization.
1116
1117     When looking at the arguments of a builtin function, we can divide them
1118     into categories:
1119
1120     \startitemize
1121       \item Arguments of a runtime representable type.
1122
1123             As we have seen with user-defined functions, these arguments can
1124             always be reduced to a simple variable reference, by the
1125             argument extraction transform. Performing this transform for
1126             builtin functions as well, means that the translation of builtin
1127             functions can be limited to signal references, instead of
1128             needing to support all possible expressions.
1129
1130       \item Arguments of a function type.
1131
1132             These arguments are functions passed to higher order builtins,
1133             like \lam{map} and \lam{foldl}. Since implementing these
1134             functions for arbitrary function-typed expressions (\eg, lambda
1135             expressions) is rather comlex, we reduce these arguments to
1136             (partial applications of) global functions.
1137
1138             We can still support arbitrary expressions from the user code,
1139             by creating a new global function containing that expression.
1140             This way, we can simply replace the argument with a reference to
1141             that new function. However, since the expression can contain any
1142             number of free variables we also have to include partial
1143             applications in our normal form.
1144
1145             This category of arguments is handled by the function extraction
1146             transform.
1147       \item Other unrepresentable arguments.
1148
1149             These arguments can take a few different forms:
1150             \startdesc{Type arguments}
1151               In the core language, type arguments can only take a single
1152               form: A type wrapped in the Type constructor. Also, there is
1153               nothing that can be done with type expressions, except for
1154               applying functions to them, so we can simply leave type
1155               arguments as they are.
1156             \stopdesc
1157             \startdesc{Dictionary arguments}
1158               In the core language, dictionary arguments are used to find
1159               operations operating on one of the type arguments (mostly for
1160               finding class methods). Since we will not actually evaluatie
1161               the function body for builtin functions and can generate
1162               code for builtin functions by just looking at the type
1163               arguments, these arguments can be ignored and left as they
1164               are.
1165             \stopdesc
1166             \startdesc{Type level arguments}
1167               Sometimes, we want to pass a value to a builtin function, but
1168               we need to know the value at compile time. Additionally, the
1169               value has an impact on the type of the function. This is
1170               encoded using type-level values, where the actual value of the
1171               argument is not important, but the type encodes some integer,
1172               for example. Since the value is not important, the actual form
1173               of the expression does not matter either and we can leave
1174               these arguments as they are.
1175             \stopdesc
1176             \startdesc{Other arguments}
1177               Technically, there is still a wide array of arguments that can
1178               be passed, but does not fall into any of the above categories.
1179               However, none of the supported builtin functions requires such
1180               an argument. This leaves use with passing unsupported types to
1181               a function, such as calling \lam{head} on a list of functions.
1182
1183               In these cases, it would be impossible to generate hardware
1184               for such a function call anyway, so we can ignore these
1185               arguments.
1186
1187               The only way to generate hardware for builtin functions with
1188               arguments like these, is to expand the function call into an
1189               equivalent core expression (\eg, expand map into a series of
1190               function applications). But for now, we choose to simply not
1191               support expressions like these.
1192             \stopdesc
1193
1194             From the above, we can conclude that we can simply ignore these
1195             other unrepresentable arguments and focus on the first two
1196             categories instead.
1197     \stopitemize
1198
1199     \subsubsection{Argument simplification}
1200       This transform deals with arguments to functions that
1201       are of a runtime representable type. It ensures that they will all become
1202       references to global variables, or local signals in the resulting \small{VHDL}.
1203
1204       TODO: It seems we can map an expression to a port, not only a signal.
1205       Perhaps this makes this transformation not needed?
1206       TODO: Say something about dataconstructors (without arguments, like True
1207       or False), which are variable references of a runtime representable
1208       type, but do not result in a signal.
1209
1210       To reduce a complex expression to a simple variable reference, we create
1211       a new let expression around the application, which binds the complex
1212       expression to a new variable. The original function is then applied to
1213       this variable.
1214
1215       \starttrans
1216       M N
1217       --------------------    \lam{N} is of a representable type
1218       let x = N in M x        \lam{N} is not a local variable reference
1219       \stoptrans
1220
1221       \startbuffer[from]
1222       add (add a 1) 1
1223       \stopbuffer
1224
1225       \startbuffer[to]
1226       let x = add a 1 in add x 1
1227       \stopbuffer
1228
1229       \transexample{Argument extraction}{from}{to}
1230
1231     \subsubsection{Function extraction}
1232       This transform deals with function-typed arguments to builtin functions.
1233       Since these arguments cannot be propagated, we choose to extract them
1234       into a new global function instead.
1235
1236       Any free variables occuring in the extracted arguments will become
1237       parameters to the new global function. The original argument is replaced
1238       with a reference to the new function, applied to any free variables from
1239       the original argument.
1240
1241       This transformation is useful when applying higher order builtin functions
1242       like \hs{map} to a lambda abstraction, for example. In this case, the code
1243       that generates \small{VHDL} for \hs{map} only needs to handle top level functions and
1244       partial applications, not any other expression (such as lambda abstractions or
1245       even more complicated expressions).
1246
1247       \starttrans
1248       M N                     \lam{M} is a (partial aplication of a) builtin function.
1249       ---------------------   \lam{f0 ... fn} = free local variables of \lam{N}
1250       M x f0 ... fn           \lam{N :: a -> b}
1251       ~                       \lam{N} is not a (partial application of) a top level function
1252       x = λf0 ... λfn.N
1253       \stoptrans
1254
1255       \startbuffer[from]
1256       map (λa . add a b) xs
1257
1258       map (add b) ys
1259       \stopbuffer
1260
1261       \startbuffer[to]
1262       x0 = λb.λa.add a b
1263       ~
1264       map x0 xs
1265
1266       x1 = λb.add b
1267       map x1 ys
1268       \stopbuffer
1269
1270       \transexample{Function extraction}{from}{to}
1271
1272     \subsubsection{Argument propagation}
1273       This transform deals with arguments to user-defined functions that are
1274       not representable at runtime. This means these arguments cannot be
1275       preserved in the final form and most be {\em propagated}.
1276
1277       Propagation means to create a specialized version of the called
1278       function, with the propagated argument already filled in. As a simple
1279       example, in the following program:
1280
1281       \startlambda
1282       f = λa.λb.a + b
1283       inc = λa.f a 1
1284       \stoplambda
1285
1286       we could {\em propagate} the constant argument 1, with the following
1287       result:
1288
1289       \startlambda
1290       f' = λa.a + 1
1291       inc = λa.f' a
1292       \stoplambda
1293
1294       Special care must be taken when the to-be-propagated expression has any
1295       free variables. If this is the case, the original argument should not be
1296       removed alltogether, but replaced by all the free variables of the
1297       expression. In this way, the original expression can still be evaluated
1298       inside the new function. Also, this brings us closer to our goal: All
1299       these free variables will be simple variable references.
1300
1301       To prevent us from propagating the same argument over and over, a simple
1302       local variable reference is not propagated (since is has exactly one
1303       free variable, itself, we would only replace that argument with itself).
1304
1305       This shows that any free local variables that are not runtime representable
1306       cannot be brought into normal form by this transform. We rely on an
1307       inlining transformation to replace such a variable with an expression we
1308       can propagate again.
1309
1310       \starttrans
1311       x = E
1312       ~
1313       x Y0 ... Yi ... Yn                               \lam{Yi} is not of a runtime representable type
1314       ---------------------------------------------    \lam{Yi} is not a local variable reference
1315       x' y0 ... yi-1 f0 ...  fm Yi+1 ... Yn            \lam{f0 ... fm} = free local vars of \lam{Yi}
1316       ~
1317       x' = λy0 ... yi-1 f0 ... fm yi+1 ... yn .
1318             E y0 ... yi-1 Yi yi+1 ... yn
1319
1320       \stoptrans
1321
1322       TODO: Example
1323
1324   \subsection{Case simplification}
1325     \subsubsection{Scrutinee simplification}
1326       This transform ensures that the scrutinee of a case expression is always
1327       a simple variable reference.
1328
1329       \starttrans
1330       case E of
1331         alts
1332       -----------------        \lam{E} is not a local variable reference
1333       let x = E in
1334         case E of
1335           alts
1336       \stoptrans
1337
1338       \startbuffer[from]
1339       case (foo a) of
1340         True -> a
1341         False -> b
1342       \stopbuffer
1343
1344       \startbuffer[to]
1345       let x = foo a in
1346         case x of
1347           True -> a
1348           False -> b
1349       \stopbuffer
1350
1351       \transexample{Let flattening}{from}{to}
1352
1353
1354     \subsubsection{Case simplification}
1355       This transformation ensures that all case expressions become normal form. This
1356       means they will become one of:
1357       \startitemize
1358       \item An extractor case with a single alternative that picks a single field
1359       from a datatype, \eg \lam{case x of (a, b) -> a}.
1360       \item A selector case with multiple alternatives and only wild binders, that
1361       makes a choice between expressions based on the constructor of another
1362       expression, \eg \lam{case x of Low -> a; High -> b}.
1363       \stopitemize
1364
1365       \starttrans
1366       case E of
1367         C0 v0,0 ... v0,m -> E0
1368         \vdots
1369         Cn vn,0 ... vn,m -> En
1370       --------------------------------------------------- \forall i \forall j, 0 <= i <= n, 0 <= i < m (\lam{wi,j} is a wild (unused) binder)
1371       letnonrec
1372         v0,0 = case x of C0 v0,0 .. v0,m -> v0,0
1373         \vdots
1374         v0,m = case x of C0 v0,0 .. v0,m -> v0,m
1375         x0 = E0
1376         \dots
1377         vn,m = case x of Cn vn,0 .. vn,m -> vn,m
1378         xn = En
1379       in
1380         case E of
1381           C0 w0,0 ... w0,m -> x0
1382           \vdots
1383           Cn wn,0 ... wn,m -> xn
1384       \stoptrans
1385
1386       TODO: This transformation specified like this is complicated and misses
1387       conditions to prevent looping with itself. Perhaps we should split it here for
1388       discussion?
1389
1390       \startbuffer[from]
1391       case a of
1392         True -> add b 1
1393         False -> add b 2
1394       \stopbuffer
1395
1396       \startbuffer[to]
1397       letnonrec
1398         x0 = add b 1
1399         x1 = add b 2
1400       in
1401         case a of
1402           True -> x0
1403           False -> x1
1404       \stopbuffer
1405
1406       \transexample{Selector case simplification}{from}{to}
1407
1408       \startbuffer[from]
1409       case a of
1410         (,) b c -> add b c
1411       \stopbuffer
1412       \startbuffer[to]
1413       letnonrec
1414         b = case a of (,) b c -> b
1415         c = case a of (,) b c -> c
1416         x0 = add b c
1417       in
1418         case a of
1419           (,) w0 w1 -> x0
1420       \stopbuffer
1421
1422       \transexample{Extractor case simplification}{from}{to}
1423
1424     \subsubsection{Case removal}
1425       This transform removes any case statements with a single alternative and
1426       only wild binders.
1427
1428       These "useless" case statements are usually leftovers from case simplification
1429       on extractor case (see the previous example).
1430
1431       \starttrans
1432       case x of
1433         C v0 ... vm -> E
1434       ----------------------     \lam{\forall i, 0 <= i <= m} (\lam{vi} does not occur free in E)
1435       E
1436       \stoptrans
1437
1438       \startbuffer[from]
1439       case a of
1440         (,) w0 w1 -> x0
1441       \stopbuffer
1442
1443       \startbuffer[to]
1444       x0
1445       \stopbuffer
1446
1447       \transexample{Case removal}{from}{to}
1448
1449 \subsection{Monomorphisation}
1450   TODO: Better name for this section
1451
1452   Reference type-specialization (== argument propagation)
1453
1454 \subsubsection{Defunctionalization}
1455   Reference higher-order-specialization (== argument propagation)
1456
1457     \subsubsection{Non-representable binding inlining}
1458       This transform inlines let bindings that have a non-representable type. Since
1459       we can never generate a signal assignment for these bindings (we cannot
1460       declare a signal assignment with a non-representable type, for obvious
1461       reasons), we have no choice but to inline the binding to remove it.
1462
1463       If the binding is non-representable because it is a lambda abstraction, it is
1464       likely that it will inlined into an application and β-reduction will remove
1465       the lambda abstraction and turn it into a representable expression at the
1466       inline site. The same holds for partial applications, which can be turned into
1467       full applications by inlining.
1468
1469       Other cases of non-representable bindings we see in practice are primitive
1470       Haskell types. In most cases, these will not result in a valid normalized
1471       output, but then the input would have been invalid to start with. There is one
1472       exception to this: When a builtin function is applied to a non-representable
1473       expression, things might work out in some cases. For example, when you write a
1474       literal \hs{SizedInt} in Haskell, like \hs{1 :: SizedInt D8}, this results in
1475       the following core: \lam{fromInteger (smallInteger 10)}, where for example
1476       \lam{10 :: GHC.Prim.Int\#} and \lam{smallInteger 10 :: Integer} have
1477       non-representable types. TODO: This/these paragraph(s) should probably become a
1478       separate discussion somewhere else.
1479
1480       \starttrans
1481       letnonrec a = E in M
1482       --------------------------    \lam{E} has a non-representable type.
1483       M[E/a]
1484       \stoptrans
1485
1486       \starttrans
1487       letrec
1488         \vdots
1489         a = E
1490         \vdots
1491       in
1492         M
1493       --------------------------    \lam{E} has a non-representable type.
1494       letrec
1495         \vdots [E/a]
1496         \vdots [E/a]
1497       in
1498         M[E/a]
1499       \stoptrans
1500
1501       \startbuffer[from]
1502       letrec
1503         a = smallInteger 10
1504         inc = λa -> add a 1
1505         inc' = add 1
1506         x = fromInteger a
1507       in
1508         inc (inc' x)
1509       \stopbuffer
1510
1511       \startbuffer[to]
1512       letrec
1513         x = fromInteger (smallInteger 10)
1514       in
1515         (λa -> add a 1) (add 1 x)
1516       \stopbuffer
1517
1518       \transexample{Let flattening}{from}{to}
1519
1520
1521 \section{Provable properties}
1522   When looking at the system of transformations outlined above, there are a
1523   number of questions that we can ask ourselves. The main question is of course:
1524   \quote{Does our system work as intended?}. We can split this question into a
1525   number of subquestions:
1526
1527   \startitemize[KR]
1528   \item[q:termination] Does our system \emph{terminate}? Since our system will
1529   keep running as long as transformations apply, there is an obvious risk that
1530   it will keep running indefinitely. One transformation produces a result that
1531   is transformed back to the original by another transformation, for example.
1532   \item[q:soundness] Is our system \emph{sound}? Since our transformations
1533   continuously modify the expression, there is an obvious risk that the final
1534   normal form will not be equivalent to the original program: Its meaning could
1535   have changed.
1536   \item[q:completeness] Is our system \emph{complete}? Since we have a complex
1537   system of transformations, there is an obvious risk that some expressions will
1538   not end up in our intended normal form, because we forgot some transformation.
1539   In other words: Does our transformation system result in our intended normal
1540   form for all possible inputs?
1541   \item[q:determinism] Is our system \emph{deterministic}? Since we have defined
1542   no particular order in which the transformation should be applied, there is an
1543   obvious risk that different transformation orderings will result in
1544   \emph{different} normal forms. They might still both be intended normal forms
1545   (if our system is \emph{complete}) and describe correct hardware (if our
1546   system is \emph{sound}), so this property is less important than the previous
1547   three: The translator would still function properly without it.
1548   \stopitemize
1549
1550   \subsection{Graph representation}
1551     Before looking into how to prove these properties, we'll look at our
1552     transformation system from a graph perspective. The nodes of the graph are
1553     all possible Core expressions. The (directed) edges of the graph are
1554     transformations. When a transformation α applies to an expression \lam{A} to
1555     produce an expression \lam{B}, we add an edge from the node for \lam{A} to the
1556     node for \lam{B}, labeled α.
1557
1558     \startuseMPgraphic{TransformGraph}
1559       save a, b, c, d;
1560
1561       % Nodes
1562       newCircle.a(btex \lam{(λx.λy. (+) x y) 1} etex);
1563       newCircle.b(btex \lam{λy. (+) 1 y} etex);
1564       newCircle.c(btex \lam{(λx.(+) x) 1} etex);
1565       newCircle.d(btex \lam{(+) 1} etex);
1566
1567       b.c = origin;
1568       c.c = b.c + (4cm, 0cm);
1569       a.c = midpoint(b.c, c.c) + (0cm, 4cm);
1570       d.c = midpoint(b.c, c.c) - (0cm, 3cm);
1571
1572       % β-conversion between a and b
1573       ncarc.a(a)(b) "name(bred)";
1574       ObjLabel.a(btex $\xrightarrow[normal]{}{β}$ etex) "labpathname(bred)", "labdir(rt)";
1575       ncarc.b(b)(a) "name(bexp)", "linestyle(dashed withdots)";
1576       ObjLabel.b(btex $\xleftarrow[normal]{}{β}$ etex) "labpathname(bexp)", "labdir(lft)";
1577
1578       % η-conversion between a and c
1579       ncarc.a(a)(c) "name(ered)";
1580       ObjLabel.a(btex $\xrightarrow[normal]{}{η}$ etex) "labpathname(ered)", "labdir(rt)";
1581       ncarc.c(c)(a) "name(eexp)", "linestyle(dashed withdots)";
1582       ObjLabel.c(btex $\xleftarrow[normal]{}{η}$ etex) "labpathname(eexp)", "labdir(lft)";
1583
1584       % η-conversion between b and d
1585       ncarc.b(b)(d) "name(ered)";
1586       ObjLabel.b(btex $\xrightarrow[normal]{}{η}$ etex) "labpathname(ered)", "labdir(rt)";
1587       ncarc.d(d)(b) "name(eexp)", "linestyle(dashed withdots)";
1588       ObjLabel.d(btex $\xleftarrow[normal]{}{η}$ etex) "labpathname(eexp)", "labdir(lft)";
1589
1590       % β-conversion between c and d
1591       ncarc.c(c)(d) "name(bred)";
1592       ObjLabel.c(btex $\xrightarrow[normal]{}{β}$ etex) "labpathname(bred)", "labdir(rt)";
1593       ncarc.d(d)(c) "name(bexp)", "linestyle(dashed withdots)";
1594       ObjLabel.d(btex $\xleftarrow[normal]{}{β}$ etex) "labpathname(bexp)", "labdir(lft)";
1595
1596       % Draw objects and lines
1597       drawObj(a, b, c, d);
1598     \stopuseMPgraphic
1599
1600     \placeexample[right][ex:TransformGraph]{Partial graph of a labmda calculus
1601     system with β and η reduction (solid lines) and expansion (dotted lines).}
1602         \boxedgraphic{TransformGraph}
1603
1604     Of course our graph is unbounded, since we can construct an infinite amount of
1605     Core expressions. Also, there might potentially be multiple edges between two
1606     given nodes (with different labels), though seems unlikely to actually happen
1607     in our system.
1608
1609     See \in{example}[ex:TransformGraph] for the graph representation of a very
1610     simple lambda calculus that contains just the expressions \lam{(λx.λy. (+) x
1611     y) 1}, \lam{λy. (+) 1 y}, \lam{(λx.(+) x) 1} and \lam{(+) 1}. The
1612     transformation system consists of β-reduction and η-reduction (solid edges) or
1613     β-reduction and η-reduction (dotted edges).
1614
1615     TODO: Define β-reduction and η-reduction?
1616
1617     Note that the normal form of such a system consists of the set of nodes
1618     (expressions) without outgoing edges, since those are the expression to which
1619     no transformation applies anymore. We call this set of nodes the \emph{normal
1620     set}.
1621
1622     From such a graph, we can derive some properties easily:
1623     \startitemize[KR]
1624       \item A system will \emph{terminate} if there is no path of infinite length
1625       in the graph (this includes cycles).
1626       \item Soundness is not easily represented in the graph.
1627       \item A system is \emph{complete} if all of the nodes in the normal set have
1628       the intended normal form. The inverse (that all of the nodes outside of
1629       the normal set are \emph{not} in the intended normal form) is not
1630       strictly required.
1631       \item A system is deterministic if all paths from a node, which end in a node
1632       in the normal set, end at the same node.
1633     \stopitemize
1634
1635     When looking at the \in{example}[ex:TransformGraph], we see that the system
1636     terminates for both the reduction and expansion systems (but note that, for
1637     expansion, this is only true because we've limited the possible expressions!
1638     In comlete lambda calculus, there would be a path from \lam{(λx.λy. (+) x y)
1639     1} to \lam{(λx.λy.(λz.(+) z) x y) 1} to \lam{(λx.λy.(λz.(λq.(+) q) z) x y) 1}
1640     etc.)
1641
1642     If we would consider the system with both expansion and reduction, there would
1643     no longer be termination, since there would be cycles all over the place.
1644
1645     The reduction and expansion systems have a normal set of containing just
1646     \lam{(+) 1} or \lam{(λx.λy. (+) x y) 1} respectively. Since all paths in
1647     either system end up in these normal forms, both systems are \emph{complete}.
1648     Also, since there is only one normal form, it must obviously be
1649     \emph{deterministic} as well.
1650
1651   \subsection{Termination}
1652     Approach: Counting.
1653
1654     Church-Rosser?
1655
1656   \subsection{Soundness}
1657     Needs formal definition of semantics.
1658     Prove for each transformation seperately, implies soundness of the system.
1659
1660   \subsection{Completeness}
1661     Show that any transformation applies to every Core expression that is not
1662     in normal form. To prove: no transformation applies => in intended form.
1663     Show the reverse: Not in intended form => transformation applies.
1664
1665   \subsection{Determinism}
1666     How to prove this?