Chapters/Normalization.tex

   1 \chapter[chap:normalization]{Normalization}
   2
   3 % A helper to print a single example in the half the page width. The example
   4 % text should be in a buffer whose name is given in an argument.
   5 %
   6 % The align=right option really does left-alignment, but without the program
   7 % will end up on a single line. The strut=no option prevents a bunch of empty
   8 % space at the start of the frame.
   9 \define[1]\example{
  10   \framed[offset=1mm,align=right,strut=no,background=box,frame=off]{
  11     \setuptyping[option=LAM,style=sans,before=,after=]
  12     \typebuffer[#1]
  13     \setuptyping[option=none,style=\tttf]
  14   }
  15 }
  16
  17
  18 % A transformation example
  19 \definefloat[example][examples]
  20 \setupcaption[example][location=top] % Put captions on top
  21
  22 \define[3]\transexample{
  23   \placeexample[here]{#1}
  24   \startcombination[2*1]
  25     {\example{#2}}{Original program}
  26     {\example{#3}}{Transformed program}
  27   \stopcombination
  28 }
  29 %
  30 %\define[3]\transexampleh{
  31 %%  \placeexample[here]{#1}
  32 %%  \startcombination[1*2]
  33 %%    {\example{#2}}{Original program}
  34 %%    {\example{#3}}{Transformed program}
  35 %%  \stopcombination
  36 %}
  37
  38 The first step in the core to \small{VHDL} translation process, is normalization. We
  39 aim to bring the core description into a simpler form, which we can
  40 subsequently translate into \small{VHDL} easily. This normal form is needed because
  41 the full core language is more expressive than \small{VHDL} in some areas and because
  42 core can describe expressions that do not have a direct hardware
  43 interpretation.
  44
  45 TODO: Describe core properties not supported in \small{VHDL}, and describe how the
  46 \small{VHDL} we want to generate should look like.
  47
  48 \section{Normal form}
  49 The transformations described here have a well-defined goal: To bring the
  50 program in a well-defined form that is directly translatable to hardware,
  51 while fully preserving the semantics of the program. We refer to this form as
  52 the \emph{normal form} of the program. The formal definition of this normal
  53 form is quite simple:
  54
  55 \placedefinition{}{A program is in \emph{normal form} if none of the
  56 transformations from this chapter apply.}
  57
  58 Of course, this is an \quote{easy} definition of the normal form, since our
  59 program will end up in normal form automatically. The more interesting part is
  60 to see if this normal form actually has the properties we would like it to
  61 have.
  62
  63 But, before getting into more definitions and details about this normal form,
  64 let's try to get a feeling for it first. The easiest way to do this is by
  65 describing the things we want to not have in a normal form.
  66
  67 \startitemize
  68   \item Any \emph{polymorphism} must be removed. When laying down hardware, we
  69   can't generate any signals that can have multiple types. All types must be
  70   completely known to generate hardware.
  71
  72   \item Any \emph{higher order} constructions must be removed. We can't
  73   generate a hardware signal that contains a function, so all values,
  74   arguments and returns values used must be first order.
  75
  76   \item Any complex \emph{nested scopes} must be removed. In the \small{VHDL}
  77   description, every signal is in a single scope. Also, full expressions are
  78   not supported everywhere (in particular port maps can only map signal names,
  79   not expressions). To make the \small{VHDL} generation easy, all values must be bound
  80   on the \quote{top level}.
  81 \stopitemize
  82
  83 TODO: Intermezzo: functions vs plain values
  84
  85 A very simple example of a program in normal form is given in
  86 \in{example}[ex:MulSum]. As you can see, all arguments to the function (which
  87 will become input ports in the final hardware) are at the top. This means that
  88 the body of the final lambda abstraction is never a function, but always a
  89 plain value.
  90
  91 After the lambda abstractions, we see a single let expression, that binds two
  92 variables (\lam{mul} and \lam{sum}). These variables will be signals in the
  93 final hardware, bound to the output port of the \lam{*} and \lam{+}
  94 components.
  95
  96 The final line (the \quote{return value} of the function) selects the
  97 \lam{sum} signal to be the output port of the function. This \quote{return
  98 value} can always only be a variable reference, never a more complex
  99 expression.
 100
 101 \startbuffer[MulSum]
 102 alu :: Bit -> Word -> Word -> Word
 103 alu = λa.λb.λc.
 104     let
 105       mul = (*) a b
 106       sum = (+) mul c
 107     in
 108       sum
 109 \stopbuffer
 110
 111 \startuseMPgraphic{MulSum}
 112   save a, b, c, mul, add, sum;
 113
 114   % I/O ports
 115   newCircle.a(btex $a$ etex) "framed(false)";
 116   newCircle.b(btex $b$ etex) "framed(false)";
 117   newCircle.c(btex $c$ etex) "framed(false)";
 118   newCircle.sum(btex $res$ etex) "framed(false)";
 119
 120   % Components
 121   newCircle.mul(btex - etex);
 122   newCircle.add(btex + etex);
 123
 124   a.c      - b.c   = (0cm, 2cm);
 125   b.c      - c.c   = (0cm, 2cm);
 126   add.c            = c.c + (2cm, 0cm);
 127   mul.c            = midpoint(a.c, b.c) + (2cm, 0cm);
 128   sum.c            = add.c + (2cm, 0cm);
 129   c.c              = origin;
 130
 131   % Draw objects and lines
 132   drawObj(a, b, c, mul, add, sum);
 133
 134   ncarc(a)(mul) "arcangle(15)";
 135   ncarc(b)(mul) "arcangle(-15)";
 136   ncline(c)(add);
 137   ncline(mul)(add);
 138   ncline(add)(sum);
 139 \stopuseMPgraphic
 140
 141 \placeexample[here][ex:MulSum]{Simple architecture consisting of an adder and a
 142 subtractor.}
 143   \startcombination[2*1]
 144     {\typebufferlam{MulSum}}{Core description in normal form.}
 145     {\boxedgraphic{MulSum}}{The architecture described by the normal form.}
 146   \stopcombination
 147
 148 The previous example described composing an architecture by calling other
 149 functions (operators), resulting in a simple architecture with component and
 150 connection. There is of course also some mechanism for choice in the normal
 151 form. In a normal Core program, the \emph{case} expression can be used in a
 152 few different ways to describe choice. In normal form, this is limited to a
 153 very specific form.
 154
 155 \in{Example}[ex:AddSubAlu] shows an example describing a
 156 simple \small{ALU}, which chooses between two operations based on an opcode
 157 bit. The main structure is the same as in \in{example}[ex:MulSum], but this
 158 time the \lam{res} variable is bound to a case expression. This case
 159 expression scrutinizes the variable \lam{opcode} (and scrutinizing more
 160 complex expressions is not supported). The case expression can select a
 161 different variable based on the constructor of \lam{opcode}.
 162
 163 \startbuffer[AddSubAlu]
 164 alu :: Bit -> Word -> Word -> Word
 165 alu = λopcode.λa.λb.
 166     let
 167       res1 = (+) a b
 168       res2 = (-) a b
 169       res = case opcode of
 170         Low -> res1
 171         High -> res2
 172     in
 173       res
 174 \stopbuffer
 175
 176 \startuseMPgraphic{AddSubAlu}
 177   save opcode, a, b, add, sub, mux, res;
 178
 179   % I/O ports
 180   newCircle.opcode(btex $opcode$ etex) "framed(false)";
 181   newCircle.a(btex $a$ etex) "framed(false)";
 182   newCircle.b(btex $b$ etex) "framed(false)";
 183   newCircle.res(btex $res$ etex) "framed(false)";
 184   % Components
 185   newCircle.add(btex + etex);
 186   newCircle.sub(btex - etex);
 187   newMux.mux;
 188
 189   opcode.c - a.c   = (0cm, 2cm);
 190   add.c    - a.c   = (4cm, 0cm);
 191   sub.c    - b.c   = (4cm, 0cm);
 192   a.c      - b.c   = (0cm, 3cm);
 193   mux.c            = midpoint(add.c, sub.c) + (1.5cm, 0cm);
 194   res.c    - mux.c = (1.5cm, 0cm);
 195   b.c              = origin;
 196
 197   % Draw objects and lines
 198   drawObj(opcode, a, b, res, add, sub, mux);
 199
 200   ncline(a)(add) "posA(e)";
 201   ncline(b)(sub) "posA(e)";
 202   nccurve(a)(sub) "posA(e)", "angleA(0)";
 203   nccurve(b)(add) "posA(e)", "angleA(0)";
 204   nccurve(add)(mux) "posB(inpa)", "angleB(0)";
 205   nccurve(sub)(mux) "posB(inpb)", "angleB(0)";
 206   nccurve(opcode)(mux) "posB(n)", "angleA(0)", "angleB(-90)";
 207   ncline(mux)(res) "posA(out)";
 208 \stopuseMPgraphic
 209
 210 \placeexample[here][ex:AddSubAlu]{Simple \small{ALU} supporting two operations.}
 211   \startcombination[2*1]
 212     {\typebufferlam{AddSubAlu}}{Core description in normal form.}
 213     {\boxedgraphic{AddSubAlu}}{The architecture described by the normal form.}
 214   \stopcombination
 215
 216 As a more complete example, consider \in{example}[ex:NormalComplete]. This
 217 example contains everything that is supported in normal form, with the
 218 exception of builtin higher order functions.
 219
 220 \startbuffer[NormalComplete]
 221   regbank :: Bit
 222              -> Word
 223              -> State (Word, Word)
 224              -> (State (Word, Word), Word)
 225
 226   -- All arguments are an inital lambda
 227   regbank = λa.λd.λsp.
 228   -- There are nested let expressions at top level
 229   let
 230     -- Unpack the state by coercion (\eg, cast from
 231     -- State (Word, Word) to (Word, Word))
 232     s = sp :: (Word, Word)
 233     -- Extract both registers from the state
 234     r1 = case s of (fst, snd) -> fst
 235     r2 = case s of (fst, snd) -> snd
 236     -- Calling some other user-defined function.
 237     d' = foo d
 238     -- Conditional connections
 239     out = case a of
 240       High -> r1
 241       Low -> r2
 242     r1' = case a of
 243       High -> d'
 244       Low -> r1
 245     r2' = case a of
 246       High -> r2
 247       Low -> d'
 248     -- Packing a tuple
 249     s' = (,) r1' r2'
 250     -- pack the state by coercion (\eg, cast from
 251     -- (Word, Word) to State (Word, Word))
 252     sp' = s' :: State (Word, Word)
 253     -- Pack our return value
 254     res = (,) sp' out
 255   in
 256     -- The actual result
 257     res
 258 \stopbuffer
 259
 260 \startuseMPgraphic{NormalComplete}
 261   save a, d, r, foo, muxr, muxout, out;
 262
 263   % I/O ports
 264   newCircle.a(btex \lam{a} etex) "framed(false)";
 265   newCircle.d(btex \lam{d} etex) "framed(false)";
 266   newCircle.out(btex \lam{out} etex) "framed(false)";
 267   % Components
 268   %newCircle.add(btex + etex);
 269   newBox.foo(btex \lam{foo} etex);
 270   newReg.r1(btex $\lam{r1}$ etex) "dx(4mm)", "dy(6mm)";
 271   newReg.r2(btex $\lam{r2}$ etex) "dx(4mm)", "dy(6mm)", "reflect(true)";
 272   newMux.muxr1;
 273   % Reflect over the vertical axis
 274   reflectObj(muxr1)((0,0), (0,1));
 275   newMux.muxr2;
 276   newMux.muxout;
 277   rotateObj(muxout)(-90);
 278
 279   d.c               = foo.c + (0cm, 1.5cm);
 280   a.c               = (xpart r2.c + 2cm, ypart d.c - 0.5cm);
 281   foo.c             = midpoint(muxr1.c, muxr2.c) + (0cm, 2cm);
 282   muxr1.c           = r1.c + (0cm, 2cm);
 283   muxr2.c           = r2.c + (0cm, 2cm);
 284   r2.c              = r1.c + (4cm, 0cm);
 285   r1.c              = origin;
 286   muxout.c          = midpoint(r1.c, r2.c) - (0cm, 2cm);
 287   out.c             = muxout.c - (0cm, 1.5cm);
 288
 289 %  % Draw objects and lines
 290   drawObj(a, d, foo, r1, r2, muxr1, muxr2, muxout, out);
 291
 292   ncline(d)(foo);
 293   nccurve(foo)(muxr1) "angleA(-90)", "posB(inpa)", "angleB(180)";
 294   nccurve(foo)(muxr2) "angleA(-90)", "posB(inpb)", "angleB(0)";
 295   nccurve(muxr1)(r1) "posA(out)", "angleA(180)", "posB(d)", "angleB(0)";
 296   nccurve(r1)(muxr1) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(180)";
 297   nccurve(muxr2)(r2) "posA(out)", "angleA(0)", "posB(d)", "angleB(180)";
 298   nccurve(r2)(muxr2) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(0)";
 299   nccurve(r1)(muxout) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(-90)";
 300   nccurve(r2)(muxout) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(-90)";
 301   % Connect port a
 302   nccurve(a)(muxout) "angleA(-90)", "angleB(180)", "posB(sel)";
 303   nccurve(a)(muxr1) "angleA(180)", "angleB(-90)", "posB(sel)";
 304   nccurve(a)(muxr2) "angleA(180)", "angleB(-90)", "posB(sel)";
 305   ncline(muxout)(out) "posA(out)";
 306 \stopuseMPgraphic
 307
 308 \placeexample[here][ex:NormalComplete]{Simple architecture consisting of an adder and a
 309 subtractor.}
 310   \startcombination[2*1]
 311     {\typebufferlam{NormalComplete}}{Core description in normal form.}
 312     {\boxedgraphic{NormalComplete}}{The architecture described by the normal form.}
 313   \stopcombination
 314
 315 \subsection{Normal form definition}
 316 Now we have some intuition for the normal form, we can describe how we want
 317 the normal form to look like in a slightly more formal manner. The following
 318 EBNF-like description completely captures the intended structure (and
 319 generates a subset of GHC's core format).
 320
 321 Some clauses have an expression listed in parentheses. These are conditions
 322 that need to apply to the clause.
 323
 324 \startlambda
 325 \italic{normal} = \italic{lambda}
 326 \italic{lambda} = λvar.\italic{lambda} (representable(var))
 327                 | \italic{toplet}
 328 \italic{toplet} = let \italic{binding} in \italic{toplet}
 329                 | letrec [\italic{binding}] in \italic{toplet}
 330                 | var (representable(varvar))
 331 \italic{binding} = var = \italic{rhs} (representable(rhs))
 332                  -- State packing and unpacking by coercion
 333                  | var0 = var1 :: State ty (lvar(var1))
 334                  | var0 = var1 :: ty (var0 :: State ty) (lvar(var1))
 335 \italic{rhs} = userapp
 336              | builtinapp
 337              -- Extractor case
 338              | case var of C a0 ... an -> ai (lvar(var))
 339              -- Selector case
 340              | case var of (lvar(var))
 341                 DEFAULT -> var0 (lvar(var0))
 342                 C w0 ... wn -> resvar (\forall{}i, wi \neq resvar, lvar(resvar))
 343 \italic{userapp} = \italic{userfunc}
 344                  | \italic{userapp} {userarg}
 345 \italic{userfunc} = var (gvar(var))
 346 \italic{userarg} = var (lvar(var))
 347 \italic{builtinapp} = \italic{builtinfunc}
 348                     | \italic{builtinapp} \italic{builtinarg}
 349 \italic{builtinfunc} = var (bvar(var))
 350 \italic{builtinarg} = \italic{coreexpr}
 351 \stoplambda
 352
 353 -- TODO: Limit builtinarg further
 354
 355 -- TODO: There can still be other casts around (which the code can handle,
 356 e.g., ignore), which still need to be documented here.
 357
 358 -- TODO: Note about the selector case. It just supports Bit and Bool
 359 currently, perhaps it should be generalized in the normal form?
 360
 361 When looking at such a program from a hardware perspective, the top level
 362 lambda's define the input ports. The value produced by the let expression is
 363 the output port. Most function applications bound by the let expression
 364 define a component instantiation, where the input and output ports are mapped
 365 to local signals or arguments. Some of the others use a builtin
 366 construction (\eg the \lam{case} statement) or call a builtin function
 367 (\eg \lam{add} or \lam{sub}). For these, a hardcoded \small{VHDL} translation is
 368 available.
 369
 370 \subsection{Definitions}
 371 In the following sections, we will be using a number of functions and
 372 notations, which we will define here.
 373
 374 \subsubsection{Transformations}
 375 The most important notation is the one for transformation, which looks like
 376 the following:
 377
 378 \starttrans
 379 context conditions
 380 ~
 381 from
 382 ------------------------            expression conditions
 383 to
 384 ~
 385 context additions
 386 \stoptrans
 387
 388 Here, we describe a transformation. The most import parts are \lam{from} and
 389 \lam{to}, which describe the Core expresssion that should be matched and the
 390 expression that it should be replaced with. This matching can occur anywhere
 391 in function that is being normalized, so it applies to any subexpression as
 392 well.
 393
 394 The \lam{expression conditions} list a number of conditions on the \lam{from}
 395 expression that must hold for the transformation to apply.
 396
 397 Furthermore, there is some way to look into the environment (\eg, other top
 398 level bindings).  The \lam{context conditions} part specifies any number of
 399 top level bindings that must be present for the transformation to apply.
 400 Usually, this lists a top level binding that binds an identfier that is also
 401 used in the \lam{from} expression, allowing us to "access" the value of a top
 402 level binding in the \lam{to} expression (\eg, for inlining).
 403
 404 Finally, there is a way to influence the environment. The \lam{context
 405 additions} part lists any number of new top level bindings that should be
 406 added.
 407
 408 If there are no \lam{context conditions} or \lam{context additions}, they can
 409 be left out alltogether, along with the separator \lam{~}.
 410
 411 TODO: Example
 412
 413 \subsubsection{Other concepts}
 414 A \emph{global variable} is any variable that is bound at the
 415 top level of a program, or an external module. A local variable is any other
 416 variable (\eg, variables local to a function, which can be bound by lambda
 417 abstractions, let expressions and case expressions).
 418
 419 A \emph{hardware representable} type is a type that we can generate
 420 a signal for in hardware. For example, a bit, a vector of bits, a 32 bit
 421 unsigned word, etc. Types that are not runtime representable notably
 422 include (but are not limited to): Types, dictionaries, functions.
 423
 424 A \emph{builtin function} is a function for which a builtin
 425 hardware translation is available, because its actual definition is not
 426 translatable. A user-defined function is any other function.
 427
 428 \subsubsection{Functions}
 429 Here, we define a number of functions that can be used below to concisely
 430 specify conditions.
 431
 432 \emph{gvar(expr)} is true when \emph{expr} is a variable that references a
 433 global variable. It is false when it references a local variable.
 434
 435 \emph{lvar(expr)} is the inverse of \emph{gvar}; it is true when \emph{expr}
 436 references a local variable, false when it references a global variable.
 437
 438 \emph{representable(expr)} or \emph{representable(var)} is true when
 439 \emph{expr} or \emph{var} has a type that is representable at runtime.
 440
 441 \section{Transform passes}
 442 In this section we describe the actual transforms. Here we're using
 443 the core language in a notation that resembles lambda calculus.
 444
 445 Each of these transforms is meant to be applied to every (sub)expression
 446 in a program, for as long as it applies. Only when none of the
 447 transformations can be applied anymore, the program is in normal form (by
 448 definition). We hope to be able to prove that this form will obey all of the
 449 constraints defined above, but this has yet to happen (though it seems likely
 450 that it will).
 451
 452 Each of the transforms will be described informally first, explaining
 453 the need for and goal of the transform. Then, a formal definition is
 454 given, using a familiar syntax from the world of logic. Each transform
 455 is specified as a number of conditions (above the horizontal line) and a
 456 number of conclusions (below the horizontal line). The details of using
 457 this notation are still a bit fuzzy, so comments are welcom.
 458
 459 TODO: Formally describe the "apply to every (sub)expression" in terms of
 460 rules with full transformations in the conditions.
 461
 462 \subsection{η-abstraction}
 463 This transformation makes sure that all arguments of a function-typed
 464 expression are named, by introducing lambda expressions. When combined with
 465 β-reduction and function inlining below, all function-typed expressions should
 466 be lambda abstractions or global identifiers.
 467
 468 \starttrans
 469 E                 \lam{E :: * -> *}
 470 --------------    \lam{E} is not the first argument of an application.
 471 λx.E x            \lam{E} is not a lambda abstraction.
 472                   \lam{x} is a variable that does not occur free in \lam{E}.
 473 \stoptrans
 474
 475 \startbuffer[from]
 476 foo = λa.case a of
 477   True -> λb.mul b b
 478   False -> id
 479 \stopbuffer
 480
 481 \startbuffer[to]
 482 foo = λa.λx.(case a of
 483     True -> λb.mul b b
 484     False -> λy.id y) x
 485 \stopbuffer
 486
 487 \transexample{η-abstraction}{from}{to}
 488
 489 \subsection{Extended β-reduction}
 490 This transformation is meant to propagate application expressions downwards
 491 into expressions as far as possible. In lambda calculus, this reduction
 492 is known as β-reduction, but it is of course only defined for
 493 applications of lambda abstractions. We extend this reduction to also
 494 work for the rest of core (case and let expressions).
 495
 496 For let expressions:
 497 \starttrans
 498 let binds in E) M
 499 -----------------
 500 let binds in E M
 501 \stoptrans
 502
 503 For case statements:
 504 \starttrans
 505 (case x of
 506   p1 -> E1
 507   \vdots
 508   pn -> En) M
 509 -----------------
 510 case x of
 511   p1 -> E1 M
 512   \vdots
 513   pn -> En M
 514 \stoptrans
 515
 516 For lambda expressions:
 517 \starttrans
 518 (λx.E) M
 519 -----------------
 520 E[M/x]
 521 \stoptrans
 522
 523 % And an example
 524 \startbuffer[from]
 525 ( let a = (case x of
 526             True -> id
 527             False -> neg
 528           ) 1
 529       b = (let y = 3 in add y) 2
 530   in
 531     (λz.add 1 z)
 532 ) 3
 533 \stopbuffer
 534
 535 \startbuffer[to]
 536 let a = case x of
 537            True -> id 1
 538            False -> neg 1
 539     b = let y = 3 in add y 2
 540 in
 541   add 1 3
 542 \stopbuffer
 543
 544 \transexample{Extended β-reduction}{from}{to}
 545
 546 \subsection{Let derecursification}
 547 This transformation is meant to make lets non-recursive whenever possible.
 548 This might allow other optimizations to do their work better. TODO: Why is
 549 this needed exactly?
 550
 551 \subsection{Let flattening}
 552 This transformation puts nested lets in the same scope, by lifting the
 553 binding(s) of the inner let into a new let around the outer let. Eventually,
 554 this will cause all let bindings to appear in the same scope (they will all be
 555 in scope for the function return value).
 556
 557 Note that this transformation does not try to be smart when faced with
 558 recursive lets, it will just leave the lets recursive (possibly joining a
 559 recursive and non-recursive let into a single recursive let). The let
 560 rederursification transformation will do this instead.
 561
 562 \starttrans
 563 letnonrec x = (let bindings in M) in N
 564 ------------------------------------------
 565 let bindings in (letnonrec x = M) in N
 566 \stoptrans
 567
 568 \starttrans
 569 letrec
 570   \vdots
 571   x = (let bindings in M)
 572   \vdots
 573 in
 574   N
 575 ------------------------------------------
 576 letrec
 577   \vdots
 578   bindings
 579   x = M
 580   \vdots
 581 in
 582   N
 583 \stoptrans
 584
 585 \startbuffer[from]
 586 let
 587   a = letrec
 588     x = 1
 589     y = 2
 590   in
 591     x + y
 592 in
 593   letrec
 594     b = let c = 3 in a + c
 595     d = 4
 596   in
 597     d + b
 598 \stopbuffer
 599 \startbuffer[to]
 600 letrec
 601   x = 1
 602   y = 2
 603 in
 604   let
 605     a = x + y
 606   in
 607     letrec
 608       c = 3
 609       b = a + c
 610       d = 4
 611     in
 612       d + b
 613 \stopbuffer
 614
 615 \transexample{Let flattening}{from}{to}
 616
 617 \subsection{Empty let removal}
 618 This transformation is simple: It removes recursive lets that have no bindings
 619 (which usually occurs when let derecursification removes the last binding from
 620 it).
 621
 622 \starttrans
 623 letrec in M
 624 --------------
 625 M
 626 \stoptrans
 627
 628 \subsection{Simple let binding removal}
 629 This transformation inlines simple let bindings (\eg a = b).
 630
 631 This transformation is not needed to get into normal form, but makes the
 632 resulting \small{VHDL} a lot shorter.
 633
 634 \starttrans
 635 letnonrec
 636   a = b
 637 in
 638   M
 639 -----------------
 640 M[b/a]
 641 \stoptrans
 642
 643 \starttrans
 644 letrec
 645   \vdots
 646   a = b
 647   \vdots
 648 in
 649   M
 650 -----------------
 651 let
 652   \vdots [b/a]
 653   \vdots [b/a]
 654 in
 655   M[b/a]
 656 \stoptrans
 657
 658 \subsection{Unused let binding removal}
 659 This transformation removes let bindings that are never used. Usually,
 660 the desugarer introduces some unused let bindings.
 661
 662 This normalization pass should really be unneeded to get into normal form
 663 (since ununsed bindings are not forbidden by the normal form), but in practice
 664 the desugarer or simplifier emits some unused bindings that cannot be
 665 normalized (e.g., calls to a \type{PatError} (TODO: Check this name)). Also,
 666 this transformation makes the resulting \small{VHDL} a lot shorter.
 667
 668 \starttrans
 669 let a = E in M
 670 ----------------------------    \lam{a} does not occur free in \lam{M}
 671 M
 672 \stoptrans
 673
 674 \starttrans
 675 letrec
 676   \vdots
 677   a = E
 678   \vdots
 679 in
 680   M
 681 ----------------------------    \lam{a} does not occur free in \lam{M}
 682 letrec
 683   \vdots
 684   \vdots
 685 in
 686   M
 687 \stoptrans
 688
 689 \subsection{Non-representable binding inlining}
 690 This transform inlines let bindings that have a non-representable type. Since
 691 we can never generate a signal assignment for these bindings (we cannot
 692 declare a signal assignment with a non-representable type, for obvious
 693 reasons), we have no choice but to inline the binding to remove it.
 694
 695 If the binding is non-representable because it is a lambda abstraction, it is
 696 likely that it will inlined into an application and β-reduction will remove
 697 the lambda abstraction and turn it into a representable expression at the
 698 inline site. The same holds for partial applications, which can be turned into
 699 full applications by inlining.
 700
 701 Other cases of non-representable bindings we see in practice are primitive
 702 Haskell types. In most cases, these will not result in a valid normalized
 703 output, but then the input would have been invalid to start with. There is one
 704 exception to this: When a builtin function is applied to a non-representable
 705 expression, things might work out in some cases. For example, when you write a
 706 literal \hs{SizedInt} in Haskell, like \hs{1 :: SizedInt D8}, this results in
 707 the following core: \lam{fromInteger (smallInteger 10)}, where for example
 708 \lam{10 :: GHC.Prim.Int\#} and \lam{smallInteger 10 :: Integer} have
 709 non-representable types. TODO: This/these paragraph(s) should probably become a
 710 separate discussion somewhere else.
 711
 712 \starttrans
 713 letnonrec a = E in M
 714 --------------------------    \lam{E} has a non-representable type.
 715 M[E/a]
 716 \stoptrans
 717
 718 \starttrans
 719 letrec
 720   \vdots
 721   a = E
 722   \vdots
 723 in
 724   M
 725 --------------------------    \lam{E} has a non-representable type.
 726 letrec
 727   \vdots [E/a]
 728   \vdots [E/a]
 729 in
 730   M[E/a]
 731 \stoptrans
 732
 733 \startbuffer[from]
 734 letrec
 735   a = smallInteger 10
 736   inc = λa -> add a 1
 737   inc' = add 1
 738   x = fromInteger a
 739 in
 740   inc (inc' x)
 741 \stopbuffer
 742
 743 \startbuffer[to]
 744 letrec
 745   x = fromInteger (smallInteger 10)
 746 in
 747   (λa -> add a 1) (add 1 x)
 748 \stopbuffer
 749
 750 \transexample{Let flattening}{from}{to}
 751
 752 \subsection{Compiler generated top level binding inlining}
 753 TODO
 754
 755 \subsection{Scrutinee simplification}
 756 This transform ensures that the scrutinee of a case expression is always
 757 a simple variable reference.
 758
 759 \starttrans
 760 case E of
 761   alts
 762 -----------------        \lam{E} is not a local variable reference
 763 let x = E in
 764   case E of
 765     alts
 766 \stoptrans
 767
 768 \startbuffer[from]
 769 case (foo a) of
 770   True -> a
 771   False -> b
 772 \stopbuffer
 773
 774 \startbuffer[to]
 775 let x = foo a in
 776   case x of
 777     True -> a
 778     False -> b
 779 \stopbuffer
 780
 781 \transexample{Let flattening}{from}{to}
 782
 783
 784 \subsection{Case simplification}
 785 This transformation ensures that all case expressions become normal form. This
 786 means they will become one of:
 787 \startitemize
 788 \item An extractor case with a single alternative that picks a single field
 789 from a datatype, \eg \lam{case x of (a, b) -> a}.
 790 \item A selector case with multiple alternatives and only wild binders, that
 791 makes a choice between expressions based on the constructor of another
 792 expression, \eg \lam{case x of Low -> a; High -> b}.
 793 \stopitemize
 794
 795 \starttrans
 796 case E of
 797   C0 v0,0 ... v0,m -> E0
 798   \vdots
 799   Cn vn,0 ... vn,m -> En
 800 --------------------------------------------------- \forall i \forall j, 0 <= i <= n, 0 <= i < m (\lam{wi,j} is a wild (unused) binder)
 801 letnonrec
 802   v0,0 = case x of C0 v0,0 .. v0,m -> v0,0
 803   \vdots
 804   v0,m = case x of C0 v0,0 .. v0,m -> v0,m
 805   x0 = E0
 806   \dots
 807   vn,m = case x of Cn vn,0 .. vn,m -> vn,m
 808   xn = En
 809 in
 810   case E of
 811     C0 w0,0 ... w0,m -> x0
 812     \vdots
 813     Cn wn,0 ... wn,m -> xn
 814 \stoptrans
 815
 816 TODO: This transformation specified like this is complicated and misses
 817 conditions to prevent looping with itself. Perhaps we should split it here for
 818 discussion?
 819
 820 \startbuffer[from]
 821 case a of
 822   True -> add b 1
 823   False -> add b 2
 824 \stopbuffer
 825
 826 \startbuffer[to]
 827 letnonrec
 828   x0 = add b 1
 829   x1 = add b 2
 830 in
 831   case a of
 832     True -> x0
 833     False -> x1
 834 \stopbuffer
 835
 836 \transexample{Selector case simplification}{from}{to}
 837
 838 \startbuffer[from]
 839 case a of
 840   (,) b c -> add b c
 841 \stopbuffer
 842 \startbuffer[to]
 843 letnonrec
 844   b = case a of (,) b c -> b
 845   c = case a of (,) b c -> c
 846   x0 = add b c
 847 in
 848   case a of
 849     (,) w0 w1 -> x0
 850 \stopbuffer
 851
 852 \transexample{Extractor case simplification}{from}{to}
 853
 854 \subsection{Case removal}
 855 This transform removes any case statements with a single alternative and
 856 only wild binders.
 857
 858 These "useless" case statements are usually leftovers from case simplification
 859 on extractor case (see the previous example).
 860
 861 \starttrans
 862 case x of
 863   C v0 ... vm -> E
 864 ----------------------     \lam{\forall i, 0 <= i <= m} (\lam{vi} does not occur free in E)
 865 E
 866 \stoptrans
 867
 868 \startbuffer[from]
 869 case a of
 870   (,) w0 w1 -> x0
 871 \stopbuffer
 872
 873 \startbuffer[to]
 874 x0
 875 \stopbuffer
 876
 877 \transexample{Case removal}{from}{to}
 878
 879 \subsection{Argument simplification}
 880 The transforms in this section deal with simplifying application
 881 arguments into normal form. The goal here is to:
 882
 883 \startitemize
 884  \item Make all arguments of user-defined functions (\eg, of which
 885  we have a function body) simple variable references of a runtime
 886  representable type. This is needed, since these applications will be turned
 887  into component instantiations.
 888  \item Make all arguments of builtin functions one of:
 889    \startitemize
 890     \item A type argument.
 891     \item A dictionary argument.
 892     \item A type level expression.
 893     \item A variable reference of a runtime representable type.
 894     \item A variable reference or partial application of a function type.
 895    \stopitemize
 896 \stopitemize
 897
 898 When looking at the arguments of a user-defined function, we can
 899 divide them into two categories:
 900 \startitemize
 901   \item Arguments of a runtime representable type (\eg bits or vectors).
 902
 903         These arguments can be preserved in the program, since they can
 904         be translated to input ports later on.  However, since we can
 905         only connect signals to input ports, these arguments must be
 906         reduced to simple variables (for which signals will be
 907         produced). This is taken care of by the argument extraction
 908         transform.
 909   \item Non-runtime representable typed arguments.
 910
 911         These arguments cannot be preserved in the program, since we
 912         cannot represent them as input or output ports in the resulting
 913         \small{VHDL}. To remove them, we create a specialized version of the
 914         called function with these arguments filled in. This is done by
 915         the argument propagation transform.
 916
 917         Typically, these arguments are type and dictionary arguments that are
 918         used to make functions polymorphic. By propagating these arguments, we
 919         are essentially doing the same which GHC does when it specializes
 920         functions: Creating multiple variants of the same function, one for
 921         each type for which it is used. Other common non-representable
 922         arguments are functions, e.g. when calling a higher order function
 923         with another function or a lambda abstraction as an argument.
 924
 925         The reason for doing this is similar to the reasoning provided for
 926         the inlining of non-representable let bindings above. In fact, this
 927         argument propagation could be viewed as a form of cross-function
 928         inlining.
 929 \stopitemize
 930
 931 TODO: Check the following itemization.
 932
 933 When looking at the arguments of a builtin function, we can divide them
 934 into categories:
 935
 936 \startitemize
 937   \item Arguments of a runtime representable type.
 938
 939         As we have seen with user-defined functions, these arguments can
 940         always be reduced to a simple variable reference, by the
 941         argument extraction transform. Performing this transform for
 942         builtin functions as well, means that the translation of builtin
 943         functions can be limited to signal references, instead of
 944         needing to support all possible expressions.
 945
 946   \item Arguments of a function type.
 947
 948         These arguments are functions passed to higher order builtins,
 949         like \lam{map} and \lam{foldl}. Since implementing these
 950         functions for arbitrary function-typed expressions (\eg, lambda
 951         expressions) is rather comlex, we reduce these arguments to
 952         (partial applications of) global functions.
 953
 954         We can still support arbitrary expressions from the user code,
 955         by creating a new global function containing that expression.
 956         This way, we can simply replace the argument with a reference to
 957         that new function. However, since the expression can contain any
 958         number of free variables we also have to include partial
 959         applications in our normal form.
 960
 961         This category of arguments is handled by the function extraction
 962         transform.
 963   \item Other unrepresentable arguments.
 964
 965         These arguments can take a few different forms:
 966         \startdesc{Type arguments}
 967           In the core language, type arguments can only take a single
 968           form: A type wrapped in the Type constructor. Also, there is
 969           nothing that can be done with type expressions, except for
 970           applying functions to them, so we can simply leave type
 971           arguments as they are.
 972         \stopdesc
 973         \startdesc{Dictionary arguments}
 974           In the core language, dictionary arguments are used to find
 975           operations operating on one of the type arguments (mostly for
 976           finding class methods). Since we will not actually evaluatie
 977           the function body for builtin functions and can generate
 978           code for builtin functions by just looking at the type
 979           arguments, these arguments can be ignored and left as they
 980           are.
 981         \stopdesc
 982         \startdesc{Type level arguments}
 983           Sometimes, we want to pass a value to a builtin function, but
 984           we need to know the value at compile time. Additionally, the
 985           value has an impact on the type of the function. This is
 986           encoded using type-level values, where the actual value of the
 987           argument is not important, but the type encodes some integer,
 988           for example. Since the value is not important, the actual form
 989           of the expression does not matter either and we can leave
 990           these arguments as they are.
 991         \stopdesc
 992         \startdesc{Other arguments}
 993           Technically, there is still a wide array of arguments that can
 994           be passed, but does not fall into any of the above categories.
 995           However, none of the supported builtin functions requires such
 996           an argument. This leaves use with passing unsupported types to
 997           a function, such as calling \lam{head} on a list of functions.
 998
 999           In these cases, it would be impossible to generate hardware
1000           for such a function call anyway, so we can ignore these
1001           arguments.
1002
1003           The only way to generate hardware for builtin functions with
1004           arguments like these, is to expand the function call into an
1005           equivalent core expression (\eg, expand map into a series of
1006           function applications). But for now, we choose to simply not
1007           support expressions like these.
1008         \stopdesc
1009
1010         From the above, we can conclude that we can simply ignore these
1011         other unrepresentable arguments and focus on the first two
1012         categories instead.
1013 \stopitemize
1014
1015 \subsubsection{Argument simplification}
1016 This transform deals with arguments to functions that
1017 are of a runtime representable type. It ensures that they will all become
1018 references to global variables, or local signals in the resulting \small{VHDL}.
1019
1020 TODO: It seems we can map an expression to a port, not only a signal.
1021 Perhaps this makes this transformation not needed?
1022 TODO: Say something about dataconstructors (without arguments, like True
1023 or False), which are variable references of a runtime representable
1024 type, but do not result in a signal.
1025
1026 To reduce a complex expression to a simple variable reference, we create
1027 a new let expression around the application, which binds the complex
1028 expression to a new variable. The original function is then applied to
1029 this variable.
1030
1031 \starttrans
1032 M N
1033 --------------------    \lam{N} is of a representable type
1034 let x = N in M x        \lam{N} is not a local variable reference
1035 \stoptrans
1036
1037 \startbuffer[from]
1038 add (add a 1) 1
1039 \stopbuffer
1040
1041 \startbuffer[to]
1042 let x = add a 1 in add x 1
1043 \stopbuffer
1044
1045 \transexample{Argument extraction}{from}{to}
1046
1047 \subsubsection{Function extraction}
1048 This transform deals with function-typed arguments to builtin functions.
1049 Since these arguments cannot be propagated, we choose to extract them
1050 into a new global function instead.
1051
1052 Any free variables occuring in the extracted arguments will become
1053 parameters to the new global function. The original argument is replaced
1054 with a reference to the new function, applied to any free variables from
1055 the original argument.
1056
1057 This transformation is useful when applying higher order builtin functions
1058 like \hs{map} to a lambda abstraction, for example. In this case, the code
1059 that generates \small{VHDL} for \hs{map} only needs to handle top level functions and
1060 partial applications, not any other expression (such as lambda abstractions or
1061 even more complicated expressions).
1062
1063 \starttrans
1064 M N                     \lam{M} is a (partial aplication of a) builtin function.
1065 ---------------------   \lam{f0 ... fn} = free local variables of \lam{N}
1066 M x f0 ... fn           \lam{N :: a -> b}
1067 ~                       \lam{N} is not a (partial application of) a top level function
1068 x = λf0 ... λfn.N
1069 \stoptrans
1070
1071 \startbuffer[from]
1072 map (λa . add a b) xs
1073
1074 map (add b) ys
1075 \stopbuffer
1076
1077 \startbuffer[to]
1078 x0 = λb.λa.add a b
1079 ~
1080 map x0 xs
1081
1082 x1 = λb.add b
1083 map x1 ys
1084 \stopbuffer
1085
1086 \transexample{Function extraction}{from}{to}
1087
1088 \subsubsection{Argument propagation}
1089 This transform deals with arguments to user-defined functions that are
1090 not representable at runtime. This means these arguments cannot be
1091 preserved in the final form and most be {\em propagated}.
1092
1093 Propagation means to create a specialized version of the called
1094 function, with the propagated argument already filled in. As a simple
1095 example, in the following program:
1096
1097 \startlambda
1098 f = λa.λb.a + b
1099 inc = λa.f a 1
1100 \stoplambda
1101
1102 we could {\em propagate} the constant argument 1, with the following
1103 result:
1104
1105 \startlambda
1106 f' = λa.a + 1
1107 inc = λa.f' a
1108 \stoplambda
1109
1110 Special care must be taken when the to-be-propagated expression has any
1111 free variables. If this is the case, the original argument should not be
1112 removed alltogether, but replaced by all the free variables of the
1113 expression. In this way, the original expression can still be evaluated
1114 inside the new function. Also, this brings us closer to our goal: All
1115 these free variables will be simple variable references.
1116
1117 To prevent us from propagating the same argument over and over, a simple
1118 local variable reference is not propagated (since is has exactly one
1119 free variable, itself, we would only replace that argument with itself).
1120
1121 This shows that any free local variables that are not runtime representable
1122 cannot be brought into normal form by this transform. We rely on an
1123 inlining transformation to replace such a variable with an expression we
1124 can propagate again.
1125
1126 \starttrans
1127 x = E
1128 ~
1129 x Y0 ... Yi ... Yn                               \lam{Yi} is not of a runtime representable type
1130 ---------------------------------------------    \lam{Yi} is not a local variable reference
1131 x' y0 ... yi-1 f0 ...  fm Yi+1 ... Yn            \lam{f0 ... fm} = free local vars of \lam{Yi}
1132 ~
1133 x' = λy0 ... yi-1 f0 ... fm yi+1 ... yn .
1134       E y0 ... yi-1 Yi yi+1 ... yn
1135
1136 \stoptrans
1137
1138 TODO: Example
1139
1140 \subsection{Cast propagation / simplification}
1141 This transform pushes casts down into the expression as far as possible. Since
1142 its exact role and need is not clear yet, this transformation is not yet
1143 specified.
1144
1145 \subsection{Return value simplification}
1146 This transformation ensures that the return value of a function is always a
1147 simple local variable reference.
1148
1149 Currently implemented using lambda simplification, let simplification, and
1150 top simplification. Should change into something like the following, which
1151 works only on the result of a function instead of any subexpression. This is
1152 achieved by the contexts, like \lam{x = E}, though this is strictly not
1153 correct (you could read this as "if there is any function \lam{x} that binds
1154 \lam{E}, any \lam{E} can be transformed, while we only mean the \lam{E} that
1155 is bound by \lam{x}. This might need some extra notes or something).
1156
1157 \starttrans
1158 x = E                            \lam{E} is representable
1159 ~                                \lam{E} is not a lambda abstraction
1160 E                                \lam{E} is not a let expression
1161 ---------------------------      \lam{E} is not a local variable reference
1162 let x = E in x
1163 \stoptrans
1164
1165 \starttrans
1166 x = λv0 ... λvn.E
1167 ~                                \lam{E} is representable
1168 E                                \lam{E} is not a let expression
1169 ---------------------------      \lam{E} is not a local variable reference
1170 let x = E in x
1171 \stoptrans
1172
1173 \starttrans
1174 x = λv0 ... λvn.let ... in E
1175 ~                                \lam{E} is representable
1176 E                                \lam{E} is not a local variable reference
1177 ---------------------------
1178 let x = E in x
1179 \stoptrans
1180
1181 \startbuffer[from]
1182 x = add 1 2
1183 \stopbuffer
1184
1185 \startbuffer[to]
1186 x = let x = add 1 2 in x
1187 \stopbuffer
1188
1189 \transexample{Return value simplification}{from}{to}