Fix typo in buffer name.

[matthijs/master-project/report.git] / Chapters / Normalization.tex
diff --git a/Chapters/Normalization.tex b/Chapters/Normalization.tex

index cbef2c0a7322718897573c561dafc22356276ce8..1d65f4a4373adbce010df3e9f293bf01ca8ddcc6 100644 (file)
--- a/Chapters/Normalization.tex
+++ b/Chapters/Normalization.tex
@@ -1,789 +1,2561 @@
-\chapter{Normalization}
-
-% A helper to print a single example in the half the page width. The example
-% text should be in a buffer whose name is given in an argument.
-%
-% The align=right option really does left-alignment, but without the program
-% will end up on a single line. The strut=no option prevents a bunch of empty
-% space at the start of the frame.
-\define[1]\example{
-  \framed[offset=1mm,align=right,strut=no]{
-    \setuptyping[option=LAM,style=sans,before=,after=]
-    \typebuffer[#1]
-    \setuptyping[option=none,style=\tttf]
+\chapter[chap:normalization]{Normalization}
+  % A helper to print a single example in the half the page width. The example
+  % text should be in a buffer whose name is given in an argument.
+  %
+  % The align=right option really does left-alignment, but without the program
+  % will end up on a single line. The strut=no option prevents a bunch of empty
+  % space at the start of the frame.
+  \define[1]\example{
+    \framed[offset=1mm,align=right,strut=no,background=box,frame=off]{
+      \setuptyping[option=LAM,style=sans,before=,after=,strip=auto]
+      \typebuffer[#1]
+      \setuptyping[option=none,style=\tttf,strip=auto]
+    }
    }
-}
-
-
-% A transformation example
-\definefloat[example][examples]
-\setupcaption[example][location=top] % Put captions on top
-
-\define[3]\transexample{
-  \placeexample[here]{#1}
-  \startcombination[2*1]
-    {\example{#2}}{Original program}
-    {\example{#3}}{Transformed program}
-  \stopcombination
-}
-%
-%\define[3]\transexampleh{
-%%  \placeexample[here]{#1}
-%%  \startcombination[1*2]
-%%    {\example{#2}}{Original program}
-%%    {\example{#3}}{Transformed program}
-%%  \stopcombination
-%}
-
-The first step in the core to VHDL translation process, is normalization. We
-aim to bring the core description into a simpler form, which we can
-subsequently translate into VHDL easily. This normal form is needed because
-the full core language is more expressive than VHDL in some areas and because
-core can describe expressions that do not have a direct hardware
-interpretation.
-
-TODO: Describe core properties not supported in VHDL, and describe how the
-VHDL we want to generate should look like.
-
-\section{Goal}
-The transformations described here have a well-defined goal: To bring the
-program in a well-defined form that is directly translatable to hardware,
-while fully preserving the semantics of the program.
-
-This {\em normal form} is again a Core program, but with a very specific
-structure. A function in normal form has nested lambda's at the top, which
-produce a let expression. This let expression binds every function application
-in the function and produces a simple identifier. Every bound value in
-the let expression is either a simple function application or a case
-expression to extract a single element from a tuple returned by a
-function.
-
-An example of a program in canonical form would be:
-
-\startlambda
-  -- All arguments are an inital lambda
-  λa.λd.λsp.
-  -- There are nested let expressions at top level
-  let
-    -- Unpack the state by coercion
-    s = sp :: (Word, Word)
-    -- Extract both registers from the state
-    r1 = case s of (fst, snd) -> fst
-    r2 = case s of (fst, snd) -> snd
-    -- Calling some other user-defined function.
-    d' = foo d
-    -- Conditional connections
-    out = case a of
-      High -> r1
-      Low -> r2
-    r1' = case a of
-      High -> d
-      Low -> r1
-    r2' = case a of
-      High -> r2
-      Low -> d
-    -- Packing a tuple
-    s' = (,) r1' r2'
-    -- Packing the state by coercion
-    sp' = s' :: State (Word, Word)
-    -- Pack our return value
-    res = (,) sp' out
-  in
-    -- The actual result
-    res
-\stoplambda
-
-When looking at such a program from a hardware perspective, the top level
-lambda's define the input ports. The value produced by the let expression is
-the output port. Most function applications bound by the let expression
-define a component instantiation, where the input and output ports are mapped
-to local signals or arguments. Some of the others use a builtin
-construction (\eg the \lam{case} statement) or call a builtin function
-(\eg \lam{add} or \lam{sub}). For these, a hardcoded VHDL translation is
-available.
-
-\subsection{Normal definition}
-Formally, the normal form is a core program obeying the following
-constraints. TODO: Update this section, this is probably not completely
-accurate or relevant anymore.
-
-\startitemize[R,inmargin]
-%\item All top level binds must have the form $\expr{\bind{fun}{lamexpr}}$.
-%$fun$ is an identifier that will be bound as a global identifier.
-%\item A $lamexpr$ has the form $\expr{\lam{arg}{lamexpr}}$ or
-%$\expr{letexpr}$. $arg$ is an identifier which will be bound as an $argument$.
-%\item[letexpr] A $letexpr$ has the form $\expr{\letexpr{letbinds}{retexpr}}$.
-%\item $letbinds$ is a list with elements of the form
-%$\expr{\bind{res}{appexpr}}$ or $\expr{\bind{res}{builtinexpr}}$, where $res$ is
-%an identifier that will be bound as local identifier. The type of the bound
-%value must be a $hardware\;type$.
-%\item[builtinexpr] A $builtinexpr$ is an expression that can be mapped to an
-%equivalent VHDL expression. Since there are many supported forms for this,
-%these are defined in a separate table.
-%\item An $appexpr$ has the form $\expr{fun}$ or $\expr{\app{appexpr}{x}}$,
-%where $fun$ is a global identifier and $x$ is a local identifier.
-%\item[retexpr] A $retexpr$ has the form $\expr{x}$ or $\expr{tupexpr}$, where $x$ is a local identifier that is bound as an $argument$ or $result$.  A $retexpr$ must
-%be of a $hardware\;type$.
-%\item A $tupexpr$ has the form $\expr{con}$ or $\expr{\app{tupexpr}{x}}$,
-%where $con$ is a tuple constructor ({\em e.g.} $(,)$ or $(,,,)$) and $x$ is
-%a local identifier.
-%\item A $hardware\;type$ is a type that can be directly translated to
-%hardware. This includes the types $Bit$, $SizedWord$, tuples containing
-%elements of $hardware\;type$s, and will include others. This explicitely
-%excludes function types.
-\stopitemize
-
-TODO: Say something about uniqueness of identifiers
-
-\subsection{Builtin expressions}
-A $builtinexpr$, as defined at \in[builtinexpr] can have any of the following forms.
-
-\startitemize[m,inmargin]
-%\item
-%$tuple\_extract=\expr{\case{t}{\alt{\app{con}{x_0\;x_1\;..\;x_n}}{x_i}}}$,
-%where $t$ can be any local identifier, $con$ is a tuple constructor ({\em
-%e.g.} $(,)$ or $(,,,)$), $x_0$ to $x_n$ can be any identifier, and $x_i$ can
-%be any of $x_0$ to $x_n$. A case expression must have a $hardware\;type$.
-%\item TODO: Many more!
-\stopitemize
-
-\section{Transform passes}
-
-In this section we describe the actual transforms. Here we're using
-the core language in a notation that resembles lambda calculus.
-
-Each of these transforms is meant to be applied to every (sub)expression
-in a program, for as long as it applies. Only when none of the
-expressions can be applied anymore, the program is in normal form. We
-hope to be able to prove that this form will obey all of the constraints
-defined above, but this has yet to happen (though it seems likely that
-it will).
-
-Each of the transforms will be described informally first, explaining
-the need for and goal of the transform. Then, a formal definition is
-given, using a familiar syntax from the world of logic. Each transform
-is specified as a number of conditions (above the horizontal line) and a
-number of conclusions (below the horizontal line). The details of using
-this notation are still a bit fuzzy, so comments are welcom.
-
-TODO: Formally describe the "apply to every (sub)expression" in terms of
-rules with full transformations in the conditions.
-
-\subsection{η-abstraction}
-This transformation makes sure that all arguments of a function-typed
-expression are named, by introducing lambda expressions. When combined with
-β-reduction and function inlining below, all function-typed expressions should
-be lambda abstractions or global identifiers.
-
-\starttrans
-E                 \lam{E :: * -> *}
---------------    \lam{E} is not the first argument of an application.
-λx.E x            \lam{E} is not a lambda abstraction.
-                  \lam{x} is a variable that does not occur free in \lam{E}.
-\stoptrans
-
-\startbuffer[from]
-foo = λa -> case a of 
-  True -> λb.mul b b
-  False -> id
-\stopbuffer
-
-\startbuffer[to]
-foo = λa.λx -> (case a of 
-    True -> λb.mul b b
-    False -> λy.id y) x
-\stopbuffer
-
-\transexample{η-abstraction}{from}{to}
-
-\subsection{Extended β-reduction}
-This transformation is meant to propagate application expressions downwards
-into expressions as far as possible. In lambda calculus, this reduction
-is known as β-reduction, but it is of course only defined for
-applications of lambda abstractions. We extend this reduction to also
-work for the rest of core (case and let expressions).
-\startbuffer[from]
-(case x of
-  p1 -> E1
-  \vdots
-  pn -> En) M
-\stopbuffer
-\startbuffer[to]
-case x of
-  p1 -> E1 M
-  \vdots
-  pn -> En M
-\stopbuffer
-
-%\transform{Extended β-reduction}
-%{
-%\conclusion
-%\trans{(λx.E) M}{E[M/x]}
-%
-%\nextrule
-%\conclusion
-%\trans{(let binds in E) M}{let binds in E M}
-%
-%\nextrule
-%\conclusion
-%\transbuf{from}{to}
-%}
-
-\startbuffer[from]
-let a = (case x of 
-           True -> id
-           False -> neg
+
+  \define[4]\transexample{
+    \placeexample[here][ex:trans:#1]{#2}
+    \startcombination[2*1]
+      {\example{#3}}{Original program}
+      {\example{#4}}{Transformed program}
+    \stopcombination
+  }
+
+  The first step in the Core to \small{VHDL} translation process, is normalization. We
+  aim to bring the Core description into a simpler form, which we can
+  subsequently translate into \small{VHDL} easily. This normal form is needed because
+  the full Core language is more expressive than \small{VHDL} in some
+  areas (higher-order expressions, limited polymorphism using type
+  classes, etc.) and because Core can describe expressions that do not
+  have a direct hardware interpretation.
+
+  \section{Normal form}
+    The transformations described here have a well-defined goal: to bring the
+    program in a well-defined form that is directly translatable to
+    \VHDL, while fully preserving the semantics of the program. We refer
+    to this form as the \emph{normal form} of the program. The formal
+    definition of this normal form is quite simple:
+
+    \placedefinition[force]{}{\startboxed A program is in \emph{normal form} if none of the
+    transformations from this chapter apply.\stopboxed}
+
+    Of course, this is an \quote{easy} definition of the normal form, since our
+    program will end up in normal form automatically. The more interesting part is
+    to see if this normal form actually has the properties we would like it to
+    have.
+
+    But, before getting into more definitions and details about this normal
+    form, let us try to get a feeling for it first. The easiest way to do this
+    is by describing the things that are unwanted in the intended normal form.
+
+    \startitemize
+      \item Any \emph{polymorphism} must be removed. When laying down hardware, we
+      cannot generate any signals that can have multiple types. All types must be
+      completely known to generate hardware.
+      
+      \item All \emph{higher-order} constructions must be removed. We cannot
+      generate a hardware signal that contains a function, so all values,
+      arguments and return values used must be first order.
+
+      \item All complex \emph{nested scopes} must be removed. In the \small{VHDL}
+      description, every signal is in a single scope. Also, full expressions are
+      not supported everywhere (in particular port maps can only map signal
+      names and constants, not complete expressions). To make the \small{VHDL}
+      generation easy, a separate binder must be bound to ever application or
+      other expression.
+    \stopitemize
+
+    \startbuffer[MulSum]
+    alu :: Bit -> Word -> Word -> Word
+    alu = λa.λb.λc.
+        let
+          mul = (*) a b
+          sum = (+) mul c
+        in
+          sum
+    \stopbuffer
+
+    \startuseMPgraphic{MulSum}
+      save a, b, c, mul, add, sum;
+
+      % I/O ports
+      newCircle.a(btex $a$ etex) "framed(false)";
+      newCircle.b(btex $b$ etex) "framed(false)";
+      newCircle.c(btex $c$ etex) "framed(false)";
+      newCircle.sum(btex $sum$ etex) "framed(false)";
+
+      % Components
+      newCircle.mul(btex * etex);
+      newCircle.add(btex + etex);
+
+      a.c      - b.c   = (0cm, 2cm);
+      b.c      - c.c   = (0cm, 2cm);
+      add.c            = c.c + (2cm, 0cm);
+      mul.c            = midpoint(a.c, b.c) + (2cm, 0cm);
+      sum.c            = add.c + (2cm, 0cm);
+      c.c              = origin;
+
+      % Draw objects and lines
+      drawObj(a, b, c, mul, add, sum);
+
+      ncarc(a)(mul) "arcangle(15)";
+      ncarc(b)(mul) "arcangle(-15)";
+      ncline(c)(add);
+      ncline(mul)(add);
+      ncline(add)(sum);
+    \stopuseMPgraphic
+
+    \placeexample[][ex:MulSum]{Simple architecture consisting of a
+    multiplier and a subtractor.}
+      \startcombination[2*1]
+        {\typebufferlam{MulSum}}{Core description in normal form.}
+        {\boxedgraphic{MulSum}}{The architecture described by the normal form.}
+      \stopcombination
+
+    \todo{Intermezzo: functions vs plain values}
+
+    A very simple example of a program in normal form is given in
+    \in{example}[ex:MulSum]. As you can see, all arguments to the function (which
+    will become input ports in the generated \VHDL) are at the outer level.
+    This means that the body of the inner lambda abstraction is never a
+    function, but always a plain value.
+
+    As the body of the inner lambda abstraction, we see a single (recursive)
+    let expression, that binds two variables (\lam{mul} and \lam{sum}). These
+    variables will be signals in the generated \VHDL, bound to the output port
+    of the \lam{*} and \lam{+} components.
+
+    The final line (the \quote{return value} of the function) selects the
+    \lam{sum} signal to be the output port of the function. This \quote{return
+    value} can always only be a variable reference, never a more complex
+    expression.
+
+    \todo{Add generated VHDL}
+
+    \in{Example}[ex:MulSum] showed a function that just applied two
+    other functions (multiplication and addition), resulting in a simple
+    architecture with two components and some connections.  There is of
+    course also some mechanism for choice in the normal form. In a
+    normal Core program, the \emph{case} expression can be used in a few
+    different ways to describe choice. In normal form, this is limited
+    to a very specific form.
+
+    \in{Example}[ex:AddSubAlu] shows an example describing a
+    simple \small{ALU}, which chooses between two operations based on an opcode
+    bit. The main structure is similar to \in{example}[ex:MulSum], but this
+    time the \lam{res} variable is bound to a case expression. This case
+    expression scrutinizes the variable \lam{opcode} (and scrutinizing more
+    complex expressions is not supported). The case expression can select a
+    different variable based on the constructor of \lam{opcode}.
+    \refdef{case expression}
+
+    \startbuffer[AddSubAlu]
+    alu :: Bit -> Word -> Word -> Word
+    alu = λopcode.λa.λb.
+        let
+          res1 = (+) a b
+          res2 = (-) a b
+          res = case opcode of
+            Low -> res1
+            High -> res2
+        in
+          res
+    \stopbuffer
+
+    \startuseMPgraphic{AddSubAlu}
+      save opcode, a, b, add, sub, mux, res;
+
+      % I/O ports
+      newCircle.opcode(btex $opcode$ etex) "framed(false)";
+      newCircle.a(btex $a$ etex) "framed(false)";
+      newCircle.b(btex $b$ etex) "framed(false)";
+      newCircle.res(btex $res$ etex) "framed(false)";
+      % Components
+      newCircle.add(btex + etex);
+      newCircle.sub(btex - etex);
+      newMux.mux;
+
+      opcode.c - a.c   = (0cm, 2cm);
+      add.c    - a.c   = (4cm, 0cm);
+      sub.c    - b.c   = (4cm, 0cm);
+      a.c      - b.c   = (0cm, 3cm);
+      mux.c            = midpoint(add.c, sub.c) + (1.5cm, 0cm);
+      res.c    - mux.c = (1.5cm, 0cm);
+      b.c              = origin;
+
+      % Draw objects and lines
+      drawObj(opcode, a, b, res, add, sub, mux);
+
+      ncline(a)(add) "posA(e)";
+      ncline(b)(sub) "posA(e)";
+      nccurve(a)(sub) "posA(e)", "angleA(0)";
+      nccurve(b)(add) "posA(e)", "angleA(0)";
+      nccurve(add)(mux) "posB(inpa)", "angleB(0)";
+      nccurve(sub)(mux) "posB(inpb)", "angleB(0)";
+      nccurve(opcode)(mux) "posB(n)", "angleA(0)", "angleB(-90)";
+      ncline(mux)(res) "posA(out)";
+    \stopuseMPgraphic
+
+    \placeexample[][ex:AddSubAlu]{Simple \small{ALU} supporting two operations.}
+      \startcombination[2*1]
+        {\typebufferlam{AddSubAlu}}{Core description in normal form.}
+        {\boxedgraphic{AddSubAlu}}{The architecture described by the normal form.}
+      \stopcombination
+
+    As a more complete example, consider
+    \in{example}[ex:NormalComplete]. This example shows everything that
+    is allowed in normal form, except for built-in higher-order functions
+    (like \lam{map}). The graphical version of the architecture contains
+    a slightly simplified version, since the state tuple packing and
+    unpacking have been left out. Instead, two separate registers are
+    drawn. Most synthesis tools will further optimize this architecture by
+    removing the multiplexers at the register input and instead use the write
+    enable port of the register (when it is available), but we want to show
+    the architecture as close to the description as possible.
+
+    As you can see from the previous examples, the generation of the final
+    architecture from the normal form is straightforward. In each of the
+    examples, there is a direct match between the normal form structure,
+    the generated VHDL and the architecture shown in the images.
+
+    \startbuffer[NormalComplete]
+      regbank :: Bit 
+                 -> Word 
+                 -> State (Word, Word) 
+                 -> (State (Word, Word), Word)
+
+      -- All arguments are an inital lambda (address, data, packed state)
+      regbank = λa.λd.λsp.
+      -- There are nested let expressions at top level
+      let
+        -- Unpack the state by coercion (\eg, cast from
+        -- State (Word, Word) to (Word, Word))
+        s = sp ▶ (Word, Word)
+        -- Extract both registers from the state
+        r1 = case s of (a, b) -> a
+        r2 = case s of (a, b) -> b
+        -- Calling some other user-defined function.
+        d' = foo d
+        -- Conditional connections
+        out = case a of
+          High -> r1
+          Low -> r2
+        r1' = case a of
+          High -> d'
+          Low -> r1
+        r2' = case a of
+          High -> r2
+          Low -> d'
+        -- Packing a tuple
+        s' = (,) r1' r2'
+        -- pack the state by coercion (\eg, cast from
+        -- (Word, Word) to State (Word, Word))
+        sp' = s' ▶ State (Word, Word)
+        -- Pack our return value
+        res = (,) sp' out
+      in
+        -- The actual result
+        res
+    \stopbuffer
+
+    \startuseMPgraphic{NormalComplete}
+      save a, d, r, foo, muxr, muxout, out;
+
+      % I/O ports
+      newCircle.a(btex \lam{a} etex) "framed(false)";
+      newCircle.d(btex \lam{d} etex) "framed(false)";
+      newCircle.out(btex \lam{out} etex) "framed(false)";
+      % Components
+      %newCircle.add(btex + etex);
+      newBox.foo(btex \lam{foo} etex);
+      newReg.r1(btex $\lam{r1}$ etex) "dx(4mm)", "dy(6mm)";
+      newReg.r2(btex $\lam{r2}$ etex) "dx(4mm)", "dy(6mm)", "reflect(true)";
+      newMux.muxr1;
+      % Reflect over the vertical axis
+      reflectObj(muxr1)((0,0), (0,1));
+      newMux.muxr2;
+      newMux.muxout;
+      rotateObj(muxout)(-90);
+
+      d.c               = foo.c + (0cm, 1.5cm); 
+      a.c               = (xpart r2.c + 2cm, ypart d.c - 0.5cm);
+      foo.c             = midpoint(muxr1.c, muxr2.c) + (0cm, 2cm);
+      muxr1.c           = r1.c + (0cm, 2cm);
+      muxr2.c           = r2.c + (0cm, 2cm);
+      r2.c              = r1.c + (4cm, 0cm);
+      r1.c              = origin;
+      muxout.c          = midpoint(r1.c, r2.c) - (0cm, 2cm);
+      out.c             = muxout.c - (0cm, 1.5cm);
+
+    %  % Draw objects and lines
+      drawObj(a, d, foo, r1, r2, muxr1, muxr2, muxout, out);
+      
+      ncline(d)(foo);
+      nccurve(foo)(muxr1) "angleA(-90)", "posB(inpa)", "angleB(180)";
+      nccurve(foo)(muxr2) "angleA(-90)", "posB(inpb)", "angleB(0)";
+      nccurve(muxr1)(r1) "posA(out)", "angleA(180)", "posB(d)", "angleB(0)";
+      nccurve(r1)(muxr1) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(180)";
+      nccurve(muxr2)(r2) "posA(out)", "angleA(0)", "posB(d)", "angleB(180)";
+      nccurve(r2)(muxr2) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(0)";
+      nccurve(r1)(muxout) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(-90)";
+      nccurve(r2)(muxout) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(-90)";
+      % Connect port a
+      nccurve(a)(muxout) "angleA(-90)", "angleB(180)", "posB(sel)";
+      nccurve(a)(muxr1) "angleA(180)", "angleB(-90)", "posB(sel)";
+      nccurve(a)(muxr2) "angleA(180)", "angleB(-90)", "posB(sel)";
+      ncline(muxout)(out) "posA(out)";
+    \stopuseMPgraphic
+
+    \todo{Don't split registers in this image?}
+    \placeexample[][ex:NormalComplete]{Simple architecture consisting of an adder and a
+    subtractor.}
+      \startcombination[2*1]
+        {\typebufferlam{NormalComplete}}{Core description in normal form.}
+        {\boxedgraphic{NormalComplete}}{The architecture described by the normal form.}
+      \stopcombination
+    
+
+
+    \subsection[sec:normalization:intendednormalform]{Intended normal form definition}
+      Now we have some intuition for the normal form, we can describe how we want
+      the normal form to look like in a slightly more formal manner. The
+      EBNF-like description in \in{definition}[def:IntendedNormal] captures
+      most of the intended structure (and generates a subset of \GHC's Core
+      format). 
+      
+      There are two things missing from this definition: cast expressions are
+      sometimes allowed by the prototype, but not specified here and the below
+      definition allows uses of state that cannot be translated to \VHDL\
+      properly. These two problems are discussed in
+      \in{section}[sec:normalization:castproblems] and
+      \in{section}[sec:normalization:stateproblems] respectively.
+
+      Some clauses have an expression listed behind them in parentheses.
+      These are conditions that need to apply to the clause. The
+      predicates used there (\lam{lvar()}, \lam{representable()},
+      \lam{gvar()}) will be defined in
+      \in{section}[sec:normalization:predicates].
+
+      An expression is in normal form if it matches the first
+      definition, \emph{normal}.
+
+      \todo{Fix indentation}
+      \startbuffer[IntendedNormal]
+      \italic{normal} := \italic{lambda}
+      \italic{lambda} := λvar.\italic{lambda}                        (representable(var))
+                      | \italic{toplet} 
+      \italic{toplet} := letrec [\italic{binding}...] in var         (representable(var))
+      \italic{binding} := var = \italic{rhs}                         (representable(rhs))
+                       -- State packing and unpacking by coercion
+                       | var0 = var1 ▶ State ty                      (lvar(var1))
+                       | var0 = var1 ▶ ty                            (var1 :: State ty ∧ lvar(var1))
+      \italic{rhs} := \italic{userapp}
+                   | \italic{builtinapp}
+                   -- Extractor case
+                   | case var of C a0 ... an -> ai                   (lvar(var))
+                   -- Selector case
+                   | case var of                                     (lvar(var))
+                      [ DEFAULT -> var ]                             (lvar(var))
+                      C0 w0,0 ... w0,n -> var0
+                      \vdots
+                      Cm wm,0 ... wm,n -> varm                       (\forall{}i \forall{}j, wi,j \neq vari, lvar(vari))
+      \italic{userapp} := \italic{userfunc}
+                       | \italic{userapp} {userarg}
+      \italic{userfunc} := var                                       (gvar(var))
+      \italic{userarg} := var                                        (lvar(var))
+      \italic{builtinapp} := \italic{builtinfunc}
+                          | \italic{builtinapp} \italic{builtinarg}
+      \italic{built-infunc} := var                                    (bvar(var))
+      \italic{built-inarg} := var                                     (representable(var) ∧ lvar(var))
+                          | \italic{partapp}                         (partapp :: a -> b)
+                          | \italic{coreexpr}                        (¬representable(coreexpr) ∧ ¬(coreexpr :: a -> b))
+      \italic{partapp} := \italic{userapp} 
+                       | \italic{builtinapp}
+      \stopbuffer
+
+      \placedefinition[][def:IntendedNormal]{Definition of the intended nnormal form using an \small{EBNF}-like syntax.}
+          {\defref{intended normal form definition}
+           \typebufferlam{IntendedNormal}}
+
+      When looking at such a program from a hardware perspective, the top
+      level lambda abstractions (\italic{lambda}) define the input ports.
+      Lambda abstractions cannot appear anywhere else. The variable reference
+      in the body of the recursive let expression (\italic{toplet}) is the
+      output port.  Most binders bound by the let expression define a
+      component instantiation (\italic{userapp}), where the input and output
+      ports are mapped to local signals (\italic{userarg}). Some of the others
+      use a built-in construction (\eg\ the \lam{case} expression) or call a
+      built-in function (\italic{builtinapp}) such as \lam{+} or \lam{map}.
+      For these, a hardcoded \small{VHDL} translation is available.
+
+  \section[sec:normalization:transformation]{Transformation notation}
+    To be able to concisely present transformations, we use a specific format
+    for them. It is a simple format, similar to one used in logic reasoning.
+
+    Such a transformation description looks like the following.
+
+    \starttrans
+    <context conditions>
+    ~
+    <original expression>
+    --------------------------          <expression conditions>
+    <transformed expression>
+    ~
+    <context additions>
+    \stoptrans
+
+    This format describes a transformation that applies to \lam{<original
+    expression>} and transforms it into \lam{<transformed expression>}, assuming
+    that all conditions are satisfied. In this format, there are a number of placeholders
+    in pointy brackets, most of which should be rather obvious in their meaning.
+    Nevertheless, we will more precisely specify their meaning below:
+
+      \startdesc{<original expression>} The expression pattern that will be matched
+      against (subexpressions of) the expression to be transformed. We call this a
+      pattern, because it can contain \emph{placeholders} (variables), which match
+      any expression or binder. Any such placeholder is said to be \emph{bound} to
+      the expression it matches. It is convention to use an uppercase letter (\eg\
+      \lam{M} or \lam{E}) to refer to any expression (including a simple variable
+      reference) and lowercase letters (\eg\ \lam{v} or \lam{b}) to refer to
+      (references to) binders.
+
+      For example, the pattern \lam{a + B} will match the expression 
+      \lam{v + (2 * w)} (binding \lam{a} to \lam{v} and \lam{B} to 
+      \lam{(2 * w)}), but not \lam{(2 * w) + v}.
+      \stopdesc
+
+      \startdesc{<expression conditions>}
+      These are extra conditions on the expression that is matched. These
+      conditions can be used to further limit the cases in which the
+      transformation applies, commonly to prevent a transformation from
+      causing a loop with itself or another transformation.
+
+      Only if these conditions are \emph{all} satisfied, the transformation
+      applies.
+      \stopdesc
+
+      \startdesc{<context conditions>}
+      These are a number of extra conditions on the context of the function. In
+      particular, these conditions can require some (other) top level function to be
+      present, whose value matches the pattern given here. The format of each of
+      these conditions is: \lam{binder = <pattern>}.
+
+      Typically, the binder is some placeholder bound in the \lam{<original
+      expression>}, while the pattern contains some placeholders that are used in
+      the \lam{transformed expression}.
+      
+      Only if a top level binder exists that matches each binder and pattern,
+      the transformation applies.
+      \stopdesc
+
+      \startdesc{<transformed expression>}
+      This is the expression template that is the result of the transformation. If, looking
+      at the above three items, the transformation applies, the \lam{<original
+      expression>} is completely replaced by the \lam{<transformed expression>}.
+      We call this a template, because it can contain placeholders, referring to
+      any placeholder bound by the \lam{<original expression>} or the
+      \lam{<context conditions>}. The resulting expression will have those
+      placeholders replaced by the values bound to them.
+
+      Any binder (lowercase) placeholder that has no value bound to it yet will be
+      bound to (and replaced with) a fresh binder.
+      \stopdesc
+
+      \startdesc{<context additions>}
+      These are templates for new functions to be added to the context.
+      This is a way to let a transformation create new top level
+      functions.
+
+      Each addition has the form \lam{binder = template}. As above, any
+      placeholder in the addition is replaced with the value bound to it, and any
+      binder placeholder that has no value bound to it yet will be bound to (and
+      replaced with) a fresh binder.
+      \stopdesc
+
+    To understand this notation better, the step by step application of
+    the η-expansion transformation to a simple \small{ALU} will be
+    shown. Consider η-expansion, which is a common transformation from
+    labmda calculus, described using above notation as follows:
+
+    \starttrans
+    E                 \lam{E :: a -> b}
+    --------------    \lam{E} does not occur on a function position in an application
+    λx.E x            \lam{E} is not a lambda abstraction.
+    \stoptrans
+
+    η-expansion is a well known transformation from lambda calculus. What
+    this transformation does, is take any expression that has a function type
+    and turn it into a lambda expression (giving an explicit name to the
+    argument). There are some extra conditions that ensure that this
+    transformation does not apply infinitely (which are not necessarily part
+    of the conventional definition of η-expansion).
+
+    Consider the following function, in Core notation, which is a fairly obvious way to specify a
+    simple \small{ALU} (Note that it is not yet in normal form, but
+    \in{example}[ex:AddSubAlu] shows the normal form of this function).
+    The parentheses around the \lam{+} and \lam{-} operators are
+    commonly used in Haskell to show that the operators are used as
+    normal functions, instead of \emph{infix} operators (\eg, the
+    operators appear before their arguments, instead of in between).
+
+    \startlambda 
+    alu :: Bit -> Word -> Word -> Word
+    alu = λopcode. case opcode of
+      Low -> (+)
+      High -> (-)
+    \stoplambda
+
+    There are a few subexpressions in this function to which we could possibly
+    apply the transformation. Since the pattern of the transformation is only
+    the placeholder \lam{E}, any expression will match that. Whether the
+    transformation applies to an expression is thus solely decided by the
+    conditions to the right of the transformation.
+
+    We will look at each expression in the function in a top down manner. The
+    first expression is the entire expression the function is bound to.
+
+    \startlambda
+    λopcode. case opcode of
+      Low -> (+)
+      High -> (-)
+    \stoplambda
+
+    As said, the expression pattern matches this. The type of this expression is
+    \lam{Bit -> Word -> Word -> Word}, which matches \lam{a -> b} (Note that in
+    this case \lam{a = Bit} and \lam{b = Word -> Word -> Word}).
+
+    Since this expression is at top level, it does not occur at a function
+    position of an application. However, The expression is a lambda abstraction,
+    so this transformation does not apply.
+
+    The next expression we could apply this transformation to, is the body of
+    the lambda abstraction:
+
+    \startlambda
+    case opcode of
+      Low -> (+)
+      High -> (-)
+    \stoplambda
+
+    The type of this expression is \lam{Word -> Word -> Word}, which again
+    matches \lam{a -> b}. The expression is the body of a lambda expression, so
+    it does not occur at a function position of an application. Finally, the
+    expression is not a lambda abstraction but a case expression, so all the
+    conditions match. There are no context conditions to match, so the
+    transformation applies.
+
+    By now, the placeholder \lam{E} is bound to the entire expression. The
+    placeholder \lam{x}, which occurs in the replacement template, is not bound
+    yet, so we need to generate a fresh binder for that. Let us use the binder
+    \lam{a}. This results in the following replacement expression:
+
+    \startlambda
+    λa.(case opcode of
+      Low -> (+)
+      High -> (-)) a
+    \stoplambda
+
+    Continuing with this expression, we see that the transformation does not
+    apply again (it is a lambda expression). Next we look at the body of this
+    lambda abstraction:
+
+    \startlambda
+    (case opcode of
+      Low -> (+)
+      High -> (-)) a
+    \stoplambda
+    
+    Here, the transformation does apply, binding \lam{E} to the entire
+    expression (which has type \lam{Word -> Word}) and binding \lam{x}
+    to the fresh binder \lam{b}, resulting in the replacement:
+
+    \startlambda
+    λb.(case opcode of
+      Low -> (+)
+      High -> (-)) a b
+    \stoplambda
+
+    The transformation does not apply to this lambda abstraction, so we
+    look at its body. For brevity, we will put the case expression on one line from
+    now on.
+
+    \startlambda
+    (case opcode of Low -> (+); High -> (-)) a b
+    \stoplambda
+
+    The type of this expression is \lam{Word}, so it does not match \lam{a -> b}
+    and the transformation does not apply. Next, we have two options for the
+    next expression to look at: the function position and argument position of
+    the application. The expression in the argument position is \lam{b}, which
+    has type \lam{Word}, so the transformation does not apply. The expression in
+    the function position is:
+
+    \startlambda
+    (case opcode of Low -> (+); High -> (-)) a
+    \stoplambda
+
+    Obviously, the transformation does not apply here, since it occurs in
+    function position (which makes the second condition false). In the same
+    way the transformation does not apply to both components of this
+    expression (\lam{case opcode of Low -> (+); High -> (-)} and \lam{a}), so
+    we will skip to the components of the case expression: the scrutinee and
+    both alternatives. Since the opcode is not a function, it does not apply
+    here.
+
+    The first alternative is \lam{(+)}. This expression has a function type
+    (the operator still needs two arguments). It does not occur in function
+    position of an application and it is not a lambda expression, so the
+    transformation applies.
+
+    We look at the \lam{<original expression>} pattern, which is \lam{E}.
+    This means we bind \lam{E} to \lam{(+)}. We then replace the expression
+    with the \lam{<transformed expression>}, replacing all occurences of
+    \lam{E} with \lam{(+)}. In the \lam{<transformed expression>}, the This gives us the replacement expression:
+    \lam{λx.(+) x} (A lambda expression binding \lam{x}, with a body that
+    applies the addition operator to \lam{x}).
+
+    The complete function then becomes:
+    \startlambda
+    (case opcode of Low -> λa1.(+) a1; High -> (-)) a
+    \stoplambda
+
+    Now the transformation no longer applies to the complete first alternative
+    (since it is a lambda expression). It does not apply to the addition
+    operator again, since it is now in function position in an application. It
+    does, however, apply to the application of the addition operator, since
+    that is neither a lambda expression nor does it occur in function
+    position. This means after one more application of the transformation, the
+    function becomes:
+
+    \startlambda
+    (case opcode of Low -> λa1.λb1.(+) a1 b1; High -> (-)) a
+    \stoplambda
+
+    The other alternative is left as an exercise to the reader. The final
+    function, after applying η-expansion until it does no longer apply is:
+
+    \startlambda 
+    alu :: Bit -> Word -> Word -> Word
+    alu = λopcode.λa.b. (case opcode of
+      Low -> λa1.λb1 (+) a1 b1
+      High -> λa2.λb2 (-) a2 b2) a b
+    \stoplambda
+
+    \subsection{Transformation application}
+      In this chapter we define a number of transformations, but how will we apply
+      these? As stated before, our normal form is reached as soon as no
+      transformation applies anymore. This means our application strategy is to
+      simply apply any transformation that applies, and continuing to do that with
+      the result of each transformation.
+
+      In particular, we define no particular order of transformations. Since
+      transformation order should not influence the resulting normal form,
+      this leaves the implementation free to choose any application order that
+      results in an efficient implementation. Unfortunately this is not
+      entirely true for the current set of transformations. See
+      \in{section}[sec:normalization:non-determinism] for a discussion of this
+      problem.
+
+      When applying a single transformation, we try to apply it to every (sub)expression
+      in a function, not just the top level function body. This allows us to
+      keep the transformation descriptions concise and powerful.
+
+    \subsection{Definitions}
+      A \emph{global variable} is any variable (binder) that is bound at the
+      top level of a program, or an external module. A \emph{local variable} is any
+      other variable (\eg, variables local to a function, which can be bound by
+      lambda abstractions, let expressions and pattern matches of case
+      alternatives). This is a slightly different notion of global versus
+      local than what \small{GHC} uses internally, but for our purposes
+      the distinction \GHC\ makes is not useful.
+      \defref{global variable} \defref{local variable}
+
+      A \emph{hardware representable} (or just \emph{representable}) type or value
+      is (a value of) a type that we can generate a signal for in hardware. For
+      example, a bit, a vector of bits, a 32 bit unsigned word, etc. Values that are
+      not runtime representable notably include (but are not limited to): types,
+      dictionaries, functions.
+      \defref{representable}
+
+      A \emph{built-in function} is a function supplied by the Cλash
+      framework, whose implementation is not used to generate \VHDL. This is
+      either because it is no valid Cλash (like most list functions that need
+      recursion) or because a Cλash implementation would be unwanted (for the
+      addition operator, for example, we would rather use the \VHDL addition
+      operator to let the synthesis tool decide what kind of adder to use
+      instead of explicitly describing one in Cλash). \defref{built-in
+      function}
+
+      These are functions like \lam{map}, \lam{hwor}, \lam{+} and \lam{length}.
+
+      For these functions, Cλash has a \emph{built-in hardware translation},
+      so calls to these functions can still be translated.  Built-in functions
+      must have a valid Haskell implementation, of course, to allow
+      simulation. 
+
+      A \emph{user-defined} function is a function for which no built-in
+      translation is available and whose definition will thus need to be
+      translated to Cλash. \defref{user-defined function}
+
+      \subsubsection[sec:normalization:predicates]{Predicates}
+        Here, we define a number of predicates that can be used below to concisely
+        specify conditions.
+
+        \emph{gvar(expr)} is true when \emph{expr} is a variable that references a
+        global variable. It is false when it references a local variable.
+
+        \emph{lvar(expr)} is the complement of \emph{gvar}; it is true when \emph{expr}
+        references a local variable, false when it references a global variable.
+
+        \emph{representable(expr)} is true when \emph{expr} is \emph{representable}.
+
+    \subsection[sec:normalization:uniq]{Binder uniqueness}
+      A common problem in transformation systems, is binder uniqueness. When not
+      considering this problem, it is easy to create transformations that mix up
+      bindings and cause name collisions. Take for example, the following Core
+      expression:
+
+      \startlambda
+      (λa.λb.λc. a * b * c) x c
+      \stoplambda
+
+      By applying β-reduction (see \in{section}[sec:normalization:beta]) once,
+      we can simplify this expression to:
+
+      \startlambda
+      (λb.λc. x * b * c) c
+      \stoplambda
+
+      Now, we have replaced the \lam{a} binder with a reference to the \lam{x}
+      binder. No harm done here. But note that we see multiple occurences of the
+      \lam{c} binder. The first is a binding occurence, to which the second refers.
+      The last, however refers to \emph{another} instance of \lam{c}, which is
+      bound somewhere outside of this expression. Now, if we would apply beta
+      reduction without taking heed of binder uniqueness, we would get:
+
+      \startlambda
+      λc. x * c * c
+      \stoplambda
+
+      This is obviously not what was supposed to happen! The root of this problem is
+      the reuse of binders: identical binders can be bound in different,
+      but overlapping scopes. Any variable reference in those
+      overlapping scopes then refers to the variable bound in the inner
+      (smallest) scope. There is not way to refer to the variable in the
+      outer scope. This effect is usually referred to as
+      \emph{shadowing}: when a binder is bound in a scope where the
+      binder already had a value, the inner binding is said to
+      \emph{shadow} the outer binding. In the example above, the \lam{c}
+      binder was bound outside of the expression and in the inner lambda
+      expression. Inside that lambda expression, only the inner \lam{c}
+      can be accessed.
+
+      There are a number of ways to solve this. \small{GHC} has isolated this
+      problem to their binder substitution code, which performs \emph{deshadowing}
+      during its expression traversal. This means that any binding that shadows
+      another binding on a higher level is replaced by a new binder that does not
+      shadow any other binding. This non-shadowing invariant is enough to prevent
+      binder uniqueness problems in \small{GHC}.
+
+      In our transformation system, maintaining this non-shadowing invariant is
+      a bit harder to do (mostly due to implementation issues, the prototype
+      does not use \small{GHC}'s subsitution code). Also, the following points
+      can be observed.
+
+      \startitemize
+      \item Deshadowing does not guarantee overall uniqueness. For example, the
+      following (slightly contrived) expression shows the identifier \lam{x} bound in
+      two seperate places (and to different values), even though no shadowing
+      occurs.
+
+      \startlambda
+      (let x = 1 in x) + (let x = 2 in x)
+      \stoplambda
+
+      \item In our normal form (and the resulting \small{VHDL}), all binders
+      (signals) within the same function (entity) will end up in the same
+      scope. To allow this, all binders within the same function should be
+      unique.
+
+      \item When we know that all binders in an expression are unique, moving around
+      or removing a subexpression will never cause any binder conflicts. If we have
+      some way to generate fresh binders, introducing new subexpressions will not
+      cause any problems either. The only way to cause conflicts is thus to
+      duplicate an existing subexpression.
+      \stopitemize
+
+      Given the above, our prototype maintains a unique binder invariant. This
+      means that in any given moment during normalization, all binders \emph{within
+      a single function} must be unique. To achieve this, we apply the following
+      technique.
+
+      \todo{Define fresh binders and unique supplies}
+
+      \startitemize
+      \item Before starting normalization, all binders in the function are made
+      unique. This is done by generating a fresh binder for every binder used. This
+      also replaces binders that did not cause any conflict, but it does ensure that
+      all binders within the function are generated by the same unique supply.
+      \refdef{fresh binder}
+      \item Whenever a new binder must be generated, we generate a fresh binder that
+      is guaranteed to be different from \emph{all binders generated so far}. This
+      can thus never introduce duplication and will maintain the invariant.
+      \item Whenever (a part of) an expression is duplicated (for example when
+      inlining), all binders in the expression are replaced with fresh binders
+      (using the same method as at the start of normalization). These fresh binders
+      can never introduce duplication, so this will maintain the invariant.
+      \item Whenever we move part of an expression around within the function, there
+      is no need to do anything special. There is obviously no way to introduce
+      duplication by moving expressions around. Since we know that each of the
+      binders is already unique, there is no way to introduce (incorrect) shadowing
+      either.
+      \stopitemize
+
+  \section{Transform passes}
+    In this section we describe the actual transforms.
+
+    Each transformation will be described informally first, explaining
+    the need for and goal of the transformation. Then, we will formally define
+    the transformation using the syntax introduced in
+    \in{section}[sec:normalization:transformation].
+
+    \subsection{General cleanup}
+      \placeintermezzo{}{
+        \defref{substitution notation}
+        \startframedtext[width=8cm,background=box,frame=no]
+        \startalignment[center]
+          {\tfa Substitution notation}
+        \stopalignment
+        \blank[medium]
+
+        In some of the transformations in this chapter, we need to perform
+        substitution on an expression. Substitution means replacing every
+        occurence of some expression (usually a variable reference) with
+        another expression.
+
+        There have been a lot of different notations used in literature for
+        specifying substitution. The notation that will be used in this report
+        is the following:
+
+        \startlambda
+          E[A=>B]
+        \stoplambda
+
+        This means expression \lam{E} with all occurences of \lam{A} replaced
+        with \lam{B}.
+        \stopframedtext
+      }
+
+      These transformations are general cleanup transformations, that aim to
+      make expressions simpler. These transformations usually clean up the
+      mess left behind by other transformations or clean up expressions to
+      expose new transformation opportunities for other transformations.
+
+      Most of these transformations are standard optimizations in other
+      compilers as well. However, in our compiler, most of these are not just
+      optimizations, but they are required to get our program into intended
+      normal form.
+
+      \subsubsection[sec:normalization:beta]{β-reduction}
+        β-reduction is a well known transformation from lambda calculus, where it is
+        the main reduction step. It reduces applications of lambda abstractions,
+        removing both the lambda abstraction and the application.
+
+        In our transformation system, this step helps to remove unwanted lambda
+        abstractions (basically all but the ones at the top level). Other
+        transformations (application propagation, non-representable inlining) make
+        sure that most lambda abstractions will eventually be reducable by
+        β-reduction.
+
+        Note that β-reduction also works on type lambda abstractions and type
+        applications as well. This means the substitution below also works on
+        type variables, in the case that the binder is a type variable and teh
+        expression applied to is a type.
+
+        \starttrans
+        (λx.E) M
+        -----------------
+        E[x=>M]
+        \stoptrans
+
+        % And an example
+        \startbuffer[from]
+        (λa. 2 * a) (2 * b)
+        \stopbuffer
+
+        \startbuffer[to]
+        2 * (2 * b)
+        \stopbuffer
+
+        \transexample{beta}{β-reduction}{from}{to}
+
+        \startbuffer[from]
+        (λt.λa::t. a) @Int
+        \stopbuffer
+
+        \startbuffer[to]
+        (λa::Int. a)
+        \stopbuffer
+
+        \transexample{beta-type}{β-reduction for type abstractions}{from}{to}
+       
+      \subsubsection{Unused let binding removal}
+        This transformation removes let bindings that are never used.
+        Occasionally, \GHC's desugarer introduces some unused let bindings.
+
+        This normalization pass should really be not be necessary to get
+        into intended normal form (since the intended normal form
+        definition \refdef{intended normal form definition} does not
+        require that every binding is used), but in practice the
+        desugarer or simplifier emits some bindings that cannot be
+        normalized (e.g., calls to a
+        \hs{Control.Exception.Base.patError}) but are not used anywhere
+        either. To prevent the \VHDL\ generation from breaking on these
+        artifacts, this transformation removes them.
+
+        \todo{Do not use old-style numerals in transformations}
+        \starttrans
+        letrec
+          a0 = E0
+          \vdots
+          ai = Ei
+          \vdots
+          an = En
+        in
+          M                             \lam{ai} does not occur free in \lam{M}
+        ----------------------------    \lam{\forall j, 0 ≤ j ≤ n, j ≠ i} (\lam{ai} does not occur free in \lam{Ej})
+        letrec
+          a0 = E0
+          \vdots
+          ai-1 = Ei-1
+          ai+1 = Ei+1
+          \vdots
+          an = En
+        in
+          M
+        \stoptrans
+
+        % And an example
+        \startbuffer[from]
+        let
+          x = 1
+        in
+          2
+        \stopbuffer
+
+        \startbuffer[to]
+        let
+        in
+          2
+        \stopbuffer
+
+        \transexample{unusedlet}{Unused let binding removal}{from}{to}
+
+      \subsubsection{Empty let removal}
+        This transformation is simple: it removes recursive lets that have no bindings
+        (which usually occurs when unused let binding removal removes the last
+        binding from it).
+
+        Note that there is no need to define this transformation for
+        non-recursive lets, since they always contain exactly one binding.
+
+        \starttrans
+        letrec in M
+        --------------
+        M
+        \stoptrans
+
+        % And an example
+        \startbuffer[from]
+        let
+        in
+          2
+        \stopbuffer
+
+        \startbuffer[to]
+          2
+        \stopbuffer
+
+        \transexample{emptylet}{Empty let removal}{from}{to}
+
+      \subsubsection[sec:normalization:simplelet]{Simple let binding removal}
+        This transformation inlines simple let bindings, that bind some
+        binder to some other binder instead of a more complex expression (\ie\
+        a = b).
+
+        This transformation is not needed to get an expression into intended
+        normal form (since these bindings are part of the intended normal
+        form), but makes the resulting \small{VHDL} a lot shorter.
+       
+        \refdef{substitution notation}
+        \starttrans
+        letrec
+          a0 = E0
+          \vdots
+          ai = b
+          \vdots
+          an = En
+        in
+          M
+        -----------------------------  \lam{b} is a variable reference
+        letrec                         \lam{ai} ≠ \lam{b}
+          a0 = E0 [ai=>b]
+          \vdots
+          ai-1 = Ei-1 [ai=>b]
+          ai+1 = Ei+1 [ai=>b]
+          \vdots
+          an = En [ai=>b]
+        in
+          M[ai=>b]
+        \stoptrans
+
+        \todo{example}
+
+      \subsubsection{Cast propagation / simplification}
+        This transform pushes casts down into the expression as far as
+        possible. This transformation has been added to make a few
+        specific corner cases work, but it is not clear yet if this
+        transformation handles cast expressions completely or in the
+        right way. See \in{section}[sec:normalization:castproblems].
+
+        \starttrans
+        (let binds in E) ▶ T
+        -------------------------
+        let binds in (E ▶ T)
+        \stoptrans
+
+        \starttrans
+        (case S of
+          p0 -> E0
+          \vdots
+          pn -> En
+        ) ▶ T
+        -------------------------
+        case S of
+          p0 -> E0 ▶ T
+          \vdots
+          pn -> En ▶ T
+        \stoptrans
+
+      \subsubsection{Top level binding inlining}
+        \refdef{top level binding}
+        This transform takes simple top level bindings generated by the
+        \small{GHC} compiler. \small{GHC} sometimes generates very simple
+        \quote{wrapper} bindings, which are bound to just a variable
+        reference, or contain just a (partial) function appliation with
+        the type and dictionary arguments filled in (such as the
+        \lam{(+)} in the example below).
+
+        Note that this transformation is completely optional. It is not
+        required to get any function into intended normal form, but it does help making
+        the resulting VHDL output easier to read (since it removes components
+        that do not add any real structure, but do hide away operations and
+        cause extra clutter).
+
+        This transform takes any top level binding generated by \GHC,
+        whose normalized form contains only a single let binding.
+
+        \starttrans
+        x = λa0 ... λan.let y = E in y
+        ~
+        x
+        --------------------------------------         \lam{x} is generated by the compiler
+        λa0 ... λan.let y = E in y
+        \stoptrans
+
+        \startbuffer[from]
+        (+) :: Word -> Word -> Word
+        (+) = GHC.Num.(+) @Word \$dNum
+        ~
+        (+) a b
+        \stopbuffer
+        \startbuffer[to]
+        GHC.Num.(+) @ Alu.Word \$dNum a b
+        \stopbuffer
+
+        \transexample{toplevelinline}{Top level binding inlining}{from}{to}
+       
+        \in{Example}[ex:trans:toplevelinline] shows a typical application of
+        the addition operator generated by \GHC. The type and dictionary
+        arguments used here are described in
+        \in{Section}[section:prototype:polymorphism].
+
+        Without this transformation, there would be a \lam{(+)} entity
+        in the \VHDL\ which would just add its inputs. This generates a
+        lot of overhead in the \VHDL, which is particularly annoying
+        when browsing the generated RTL schematic (especially since most
+        non-alphanumerics, like all characters in \lam{(+)}, are not
+        allowed in \VHDL\ architecture names\footnote{Technically, it is
+        allowed to use non-alphanumerics when using extended
+        identifiers, but it seems that none of the tooling likes
+        extended identifiers in filenames, so it effectively does not
+        work.}, so the entity would be called \quote{w7aA7f} or
+        something similarly meaningless and autogenerated).
+
+    \subsection{Program structure}
+      These transformations are aimed at normalizing the overall structure
+      into the intended form. This means ensuring there is a lambda abstraction
+      at the top for every argument (input port or current state), putting all
+      of the other value definitions in let bindings and making the final
+      return value a simple variable reference.
+
+      \subsubsection[sec:normalization:eta]{η-expansion}
+        This transformation makes sure that all arguments of a function-typed
+        expression are named, by introducing lambda expressions. When combined with
+        β-reduction and non-representable binding inlining, all function-typed
+        expressions should be lambda abstractions or global identifiers.
+
+        \starttrans
+        E                 \lam{E :: a -> b}
+        --------------    \lam{E} does not occur on a function position in an application
+        λx.E x            \lam{E} is not a lambda abstraction.
+        \stoptrans
+
+        \startbuffer[from]
+        foo = λa.case a of 
+          True -> λb.mul b b
+          False -> id
+        \stopbuffer
+
+        \startbuffer[to]
+        foo = λa.λx.(case a of 
+            True -> λb.mul b b
+            False -> λy.id y) x
+        \stopbuffer
+
+        \transexample{eta}{η-expansion}{from}{to}
+
+      \subsubsection[sec:normalization:appprop]{Application propagation}
+        This transformation is meant to propagate application expressions downwards
+        into expressions as far as possible. This allows partial applications inside
+        expressions to become fully applied and exposes new transformation
+        opportunities for other transformations (like β-reduction and
+        specialization).
+
+        Since all binders in our expression are unique (see
+        \in{section}[sec:normalization:uniq]), there is no risk that we will
+        introduce unintended shadowing by moving an expression into a lower
+        scope. Also, since only move expression into smaller scopes (down into
+        our expression), there is no risk of moving a variable reference out
+        of the scope in which it is defined.
+
+        \starttrans
+        (letrec binds in E) M
+        ------------------------
+        letrec binds in E M
+        \stoptrans
+
+        % And an example
+        \startbuffer[from]
+        ( letrec
+            val = 1
+          in 
+            add val
+        ) 3
+        \stopbuffer
+
+        \startbuffer[to]
+        letrec
+          val = 1
+        in 
+          add val 3
+        \stopbuffer
+
+        \transexample{appproplet}{Application propagation for a let expression}{from}{to}
+
+        \starttrans
+        (case x of
+          p0 -> E0
+          \vdots
+          pn -> En) M
+        -----------------
+        case x of
+          p0 -> E0 M
+          \vdots
+          pn -> En M
+        \stoptrans
+
+        % And an example
+        \startbuffer[from]
+        ( case x of 
+            True -> id
+            False -> neg
          ) 1
-    b = (let y = 3 in add y) 2
-in
-  (λz.add 1 z) 3
-\stopbuffer
-
-\startbuffer[to]
-let a = case x of 
-           True -> id 1
-           False -> neg 1
-    b = let y = 3 in add y 2
-in
-  add 1 3
-\stopbuffer
-
-\transexample{Extended β-reduction}{from}{to}
-
-\subsection{Argument simplification}
-The transforms in this section deal with simplifying application
-arguments into normal form. The goal here is to:
-
-\startitemize
- \item Make all arguments of user-defined functions (\eg, of which
- we have a function body) simple variable references of a runtime
- representable type.
- \item Make all arguments of builtin functions either:
-   \startitemize
-    \item A type argument.
-    \item A dictionary argument.
-    \item A type level expression.
-    \item A variable reference of a runtime representable type.
-    \item A variable reference or partial application of a function type.
-   \stopitemize
-\stopitemize
-
-When looking at the arguments of a user-defined function, we can
-divide them into two categories:
-\startitemize
-  \item Arguments with a runtime representable type (\eg bits or vectors).
-
-        These arguments can be preserved in the program, since they can
-        be translated to input ports later on.  However, since we can
-        only connect signals to input ports, these arguments must be
-        reduced to simple variables (for which signals will be
-        produced). This is taken care of by the argument extraction
-        transform.
-  \item Non-runtime representable typed arguments.
+        \stopbuffer
+
+        \startbuffer[to]
+        case x of 
+          True -> id 1
+          False -> neg 1
+        \stopbuffer
+
+        \transexample{apppropcase}{Application propagation for a case expression}{from}{to}
+
+      \subsubsection[sec:normalization:letrecurse]{Let recursification}
+        This transformation makes all non-recursive lets recursive. In the
+        end, we want a single recursive let in our normalized program, so all
+        non-recursive lets can be converted. This also makes other
+        transformations simpler: they only need to be specified for recursive
+        let expressions (and simply will not apply to non-recursive let
+        expressions until this transformation has been applied).
+
+        \starttrans
+        let
+          a = E
+        in
+          M
+        ------------------------------------------
+        letrec
+          a = E
+        in
+          M
+        \stoptrans
+
+      \subsubsection{Let flattening}
+        This transformation puts nested lets in the same scope, by lifting the
+        binding(s) of the inner let into the outer let. Eventually, this will
+        cause all let bindings to appear in the same scope.
+
+        This transformation only applies to recursive lets, since all
+        non-recursive lets will be made recursive (see
+        \in{section}[sec:normalization:letrecurse]).
+
+        Since we are joining two scopes together, there is no risk of moving a
+        variable reference out of the scope where it is defined.
+
+        \starttrans
+        letrec 
+          a0 = E0
+          \vdots
+          ai = (letrec bindings in M)
+          \vdots
+          an = En
+        in
+          N
+        ------------------------------------------
+        letrec
+          a0 = E0
+          \vdots
+          ai = M
+          \vdots
+          an = En
+          bindings
+        in
+          N
+        \stoptrans
+
+        \startbuffer[from]
+        letrec
+          a = 1
+          b = letrec
+            x = a
+            y = c
+          in
+            x + y
+          c = 2
+        in
+          b
+        \stopbuffer
+        \startbuffer[to]
+        letrec
+          a = 1
+          b = x + y
+          c = 2
+          x = a
+          y = c
+        in
+          b
+        \stopbuffer
+
+        \transexample{letflat}{Let flattening}{from}{to}
+
+      \subsubsection{Return value simplification}
+        This transformation ensures that the return value of a function is always a
+        simple local variable reference.
+
+        The basic idea of this transformation is to take the body of a
+        function and bind it with a let expression (so the body of that let
+        expression becomes a variable reference that can be used as the output
+        port). If the body of the function happens to have lambda abstractions
+        at the top level (which is allowed by the intended normal
+        form\refdef{intended normal form definition}), we take the body of the
+        inner lambda instead. If that happens to be a let expression already
+        (which is allowed by the intended normal form), we take the body of
+        that let (which is not allowed to be anything but a variable reference
+        according the the intended normal form).
+
+        This transformation uses the context conditions in a special way.
+        These contexts, like \lam{x = λv1 ... λvn.E}, are above the dotted
+        line and provide a condition on the environment (\ie\ they require a
+        certain top level binding to be present). These ensure that
+        expressions are only transformed when they are in the functions
+        \quote{return value} directly. This means the context conditions have
+        to interpreted in the right way: not \quote{if there is any function
+        \lam{x} that binds \lam{E}, any \lam{E} can be transformed}, but we
+        mean only the \lam{E} that is bound by \lam{x}).
+
+        Be careful when reading the transformations: Not the entire function
+        from the context is transformed, just a part of it.
+
+        Note that the return value is not simplified if it is not representable.
+        Otherwise, this would cause a loop with the inlining of
+        unrepresentable bindings in
+        \in{section}[sec:normalization:nonrepinline]. If the return value is
+        not representable because it has a function type, η-expansion should
+        make sure that this transformation will eventually apply.  If the
+        value is not representable for other reasons, the function result
+        itself is not representable, meaning this function is not translatable
+        anyway.
+
+        \starttrans
+        x = λv1 ... λvn.E                \lam{n} can be zero
+        ~                                \lam{E} is representable
+        E                                \lam{E} is not a lambda abstraction
+        ---------------------------      \lam{E} is not a let expression
+        letrec y = E in y                \lam{E} is not a local variable reference
+        \stoptrans
+
+        \starttrans
+        x = λv1 ... λvn.letrec binds in E     \lam{n} can be zero
+        ~                                     \lam{E} is representable
+        letrec binds in E                     \lam{E} is not a local variable reference
+        ------------------------------------
+        letrec binds; y = E in y
+        \stoptrans
+
+        \startbuffer[from]
+        x = add 1 2
+        \stopbuffer
+
+        \startbuffer[to]
+        x = letrec y = add 1 2 in y
+        \stopbuffer
+
+        \transexample{retvalsimpl}{Return value simplification}{from}{to}
+
+        \startbuffer[from]
+        x = λa. add 1 a
+        \stopbuffer
+
+        \startbuffer[to]
+        x = λa. letrec 
+          y = add 1 a 
+        in
+          y
+        \stopbuffer
+
+        \transexample{retvalsimpllam}{Return value simplification with a lambda abstraction}{from}{to}
          
-        These arguments cannot be preserved in the program, since we
-        cannot represent them as input or output ports in the resulting
-        VHDL. To remove them, we create a specialized version of the
-        called function with these arguments filled in. This is done by
-        the argument propagation transform.
-\stopitemize
-
-When looking at the arguments of a builtin function, we can divide them
-into categories: 
-
-\startitemize
-  \item Arguments with a runtime representable type.
+        \startbuffer[from]
+        x = letrec
+          a = add 1 2 
+        in 
+          add a 3
+        \stopbuffer
+
+        \startbuffer[to]
+        x = letrec
+          a = add 1 2 
+          y = add a 3 
+        in
+          y
+        \stopbuffer
+
+        \transexample{retvalsimpllet}{Return value simplification with a let expression}{from}{to}
+
+    \subsection[sec:normalization:argsimpl]{Representable arguments simplification}
+      This section contains just a single transformation that deals with
+      representable arguments in applications. Non-representable arguments are
+      handled by the transformations in
+      \in{section}[sec:normalization:nonrep]. 
+      
+      This transformation ensures that all representable arguments will become
+      references to local variables. This ensures they will become references
+      to local signals in the resulting \small{VHDL}, which is required due to
+      limitations in the component instantiation code in \VHDL\ (one can only
+      assign a signal or constant to an input port). By ensuring that all
+      arguments are always simple variable references, we always have a signal
+      available to map to the input ports.
+
+      To reduce a complex expression to a simple variable reference, we create
+      a new let expression around the application, which binds the complex
+      expression to a new variable. The original function is then applied to
+      this variable.
+
+      \refdef{global variable}
+      Note that references to \emph{global variables} (like a top level
+      function without arguments, but also an argumentless dataconstructors
+      like \lam{True}) are also simplified. Only local variables generate
+      signals in the resulting architecture. Even though argumentless
+      dataconstructors generate constants in generated \VHDL\ code and could be
+      mapped to an input port directly, they are still simplified to make the
+      normal form more regular.
+
+      \refdef{representable}
+      \starttrans
+      M N
+      --------------------    \lam{N} is representable
+      letrec x = N in M x     \lam{N} is not a local variable reference
+      \stoptrans
+      \refdef{local variable}
+
+      \startbuffer[from]
+      add (add a 1) 1
+      \stopbuffer
+
+      \startbuffer[to]
+      letrec x = add a 1 in add x 1
+      \stopbuffer
+
+      \transexample{argsimpl}{Argument simplification}{from}{to}
+
+    \subsection[sec:normalization:built-ins]{Built-in functions}
+      This section deals with (arguments to) built-in functions.  In the
+      intended normal form definition\refdef{intended normal form definition}
+      we can see that there are three sorts of arguments a built-in function
+      can receive.
+      
+      \startitemize[KR]
+        \item A representable local variable reference. This is the most
+        common argument to any function. The argument simplification
+        transformation described in \in{section}[sec:normalization:argsimpl]
+        makes sure that \emph{any} representable argument to \emph{any}
+        function (including built-in functions) is turned into a local variable
+        reference.
+        \item (A partial application of) a top level function (either built-in on
+        user-defined). The function extraction transformation described in
+        this section takes care of turning every functiontyped argument into
+        (a partial application of) a top level function.
+        \item Any expression that is not representable and does not have a
+        function type. Since these can be any expression, there is no
+        transformation needed. Note that this category is exactly all
+        expressions that are not transformed by the transformations for the
+        previous two categories. This means that \emph{any} Core expression
+        that is used as an argument to a built-in function will be either
+        transformed into one of the above categories, or end up in this
+        categorie. In any case, the result is in normal form.
+      \stopitemize
+
+      As noted, the argument simplification will handle any representable
+      arguments to a built-in function. The following transformation is needed
+      to handle non-representable arguments with a function type, all other
+      non-representable arguments do not need any special handling.
+
+      \subsubsection[sec:normalization:funextract]{Function extraction}
+        This transform deals with function-typed arguments to built-in
+        functions. 
+        Since built-in functions cannot be specialized (see
+        \in{section}[sec:normalization:specialize]) to remove the arguments,
+        these arguments are extracted into a new global function instead. In
+        other words, we create a new top level function that has exactly the
+        extracted argument as its body. This greatly simplifies the
+        translation rules needed for built-in functions, since they only need
+        to handle (partial applications of) top level functions.
+
+        Any free variables occuring in the extracted arguments will become
+        parameters to the new global function. The original argument is replaced
+        with a reference to the new function, applied to any free variables from
+        the original argument.
+
+        This transformation is useful when applying higher-order built-in functions
+        like \hs{map} to a lambda abstraction, for example. In this case, the code
+        that generates \small{VHDL} for \hs{map} only needs to handle top level functions and
+        partial applications, not any other expression (such as lambda abstractions or
+        even more complicated expressions).
+
+        \starttrans
+        M N                     \lam{M} is (a partial aplication of) a built-in function.
+        ---------------------   \lam{f0 ... fn} are all free local variables of \lam{N}
+        M (x f0 ... fn)         \lam{N :: a -> b}
+        ~                       \lam{N} is not a (partial application of) a top level function
+        x = λf0 ... λfn.N
+        \stoptrans
+
+        \startbuffer[from]
+        addList = λb.λxs.map (λa . add a b) xs
+        \stopbuffer
+
+        \startbuffer[to]
+        addList = λb.λxs.map (f b) xs
+        ~
+        f = λb.λa.add a b
+        \stopbuffer
+
+        \transexample{funextract}{Function extraction}{from}{to}
+
+        Note that the function \lam{f} will still need normalization after
+        this.
+
+    \subsection{Case normalisation}
+      The transformations in this section ensure that case statements end up
+      in normal form.
+
+      \subsubsection{Scrutinee simplification}
+        This transform ensures that the scrutinee of a case expression is always
+        a simple variable reference.
+
+        \starttrans
+        case E of
+          alts
+        -----------------        \lam{E} is not a local variable reference
+        letrec x = E in 
+          case x of
+            alts
+        \stoptrans
+
+        \startbuffer[from]
+        case (foo a) of
+          True -> a
+          False -> b
+        \stopbuffer
+
+        \startbuffer[to]
+        letrec x = foo a in
+          case x of
+            True -> a
+            False -> b
+        \stopbuffer
+
+        \transexample{letflat}{Case normalisation}{from}{to}
+
+
+        \placeintermezzo{}{
+          \defref{wild binders}
+          \startframedtext[width=7cm,background=box,frame=no]
+          \startalignment[center]
+            {\tfa Wild binders}
+          \stopalignment
+          \blank[medium]
+            In a functional expression, a \emph{wild binder} refers to any
+            binder that is never referenced. This means that even though it
+            will be bound to a particular value, that value is never used.
+
+            The Haskell syntax offers the underscore as a wild binder that
+            cannot even be referenced (It can be seen as introducing a new,
+            anonymous, binder everytime it is used).
+            
+            In these transformations, the term wild binder will sometimes be
+            used to indicate that a binder must not be referenced.
+          \stopframedtext
+        }
+
+      \subsubsection{Scrutinee binder removal}
+        This transformation removes (or rather, makes wild) the binder to
+        which the scrutinee is bound after evaluation. This is done by
+        replacing the bndr with the scrutinee in all alternatives. To prevent
+        duplication of work, this transformation is only applied when the
+        scrutinee is already a simple variable reference (but the previous
+        transformation ensures this will eventually be the case). The
+        scrutinee binder itself is replaced by a wild binder (which is no
+        longer displayed).
+
+        Note that one could argue that this transformation can change the
+        meaning of the Core expression. In the regular Core semantics, a case
+        expression forces the evaluation of its scrutinee and can be used to
+        implement strict evaluation. However, in the generated \VHDL,
+        evaluation is always strict. So the semantics we assign to the Core
+        expression (which differ only at this particular point), this
+        transformation is completely valid.
+
+        \starttrans
+        case x of bndr
+          alts
+        -----------------        \lam{x} is a local variable reference
+        case x of
+          alts[bndr=>x]
+        \stoptrans
+
+        \startbuffer[from]
+        case x of y
+          True -> y
+          False -> not y
+        \stopbuffer
+
+        \startbuffer[to]
+        case x of
+          True -> x
+          False -> not x
+        \stopbuffer
+
+        \transexample{scrutbndrremove}{Scrutinee binder removal}{from}{to}
+
+      \subsubsection{Case normalization}
+        This transformation ensures that all case expressions get a form
+        that is allowed by the intended normal form. This means they
+        will become one of:
+
+        \startitemize
+        \item An extractor case with a single alternative that picks a field
+        from a datatype, \eg\ \lam{case x of (a, b) -> a}.
+        \item A selector case with multiple alternatives and only wild binders, that
+        makes a choice between expressions based on the constructor of another
+        expression, \eg\ \lam{case x of Low -> a; High -> b}.
+        \stopitemize
+
+        For an arbitrary case, that has \lam{n} alternatives, with
+        \lam{m} binders in each alternatives, this will result in \lam{m
+        * n} extractor case expression to get at each variable, \lam{n}
+        let bindings for each of the alternatives' value and a single
+        selector case to select the right value out of these.
+
+        Technically, the defintion of this transformation would require
+        that the constructor for every alternative has exactly the same
+        amount (\lam{m}) of arguments, but of course this transformation
+        also applies when this is not the case.
          
-        As we have seen with user-defined functions, these arguments can
-        always be reduced to a simple variable reference, by the
-        argument extraction transform. Performing this transform for
-        builtin functions as well, means that the translation of builtin
-        functions can be limited to signal references, instead of
-        needing to support all possible expressions.
-
-  \item Arguments with a function type.
+        \starttrans
+        case E of
+          C0 v0,0 ... v0,m -> E0
+          \vdots
+          Cn vn,0 ... vn,m -> En
+        --------------------------------------------------- \lam{\forall i \forall j, 0 ≤ i ≤ n, 0 ≤ i < m} (\lam{wi,j} is a wild (unused) binder)
+        letrec                                              The case expression is not an extractor case
+          v0,0 = case E of C0 x0,0 .. x0,m -> x0,0          The case expression is not a selector case
+          \vdots
+          v0,m = case E of C0 x0,0 .. x0,m -> x0,m
+          \vdots
+          vn,m = case E of Cn xn,0 .. xn,m -> xn,m
+          y0 = E0
+          \vdots
+          yn = En
+        in
+          case E of
+            C0 w0,0 ... w0,m -> y0
+            \vdots
+            Cn wn,0 ... wn,m -> yn
+        \stoptrans
+
+        Note that this transformation applies to case expressions with any
+        scrutinee. If the scrutinee is a complex expression, this might
+        result in duplication of work (hardware). An extra condition to
+        only apply this transformation when the scrutinee is already
+        simple (effectively causing this transformation to be only
+        applied after the scrutinee simplification transformation) might
+        be in order. 
+
+        \startbuffer[from]
+        case a of
+          True -> add b 1
+          False -> add b 2
+        \stopbuffer
+
+        \startbuffer[to]
+        letrec
+          x0 = add b 1
+          x1 = add b 2
+        in
+          case a of
+            True -> x0
+            False -> x1
+        \stopbuffer
+
+        \transexample{selcasesimpl}{Selector case simplification}{from}{to}
+
+        \startbuffer[from]
+        case a of
+          (,) b c -> add b c
+        \stopbuffer
+        \startbuffer[to]
+        letrec
+          b = case a of (,) b c -> b
+          c = case a of (,) b c -> c
+          x0 = add b c
+        in
+          case a of
+            (,) w0 w1 -> x0
+        \stopbuffer
+
+        \transexample{excasesimpl}{Extractor case simplification}{from}{to}
+
+        \refdef{selector case}
+        In \in{example}[ex:trans:excasesimpl] the case expression is expanded
+        into multiple case expressions, including a pretty useless expression
+        (that is neither a selector or extractor case). This case can be
+        removed by the Case removal transformation in
+        \in{section}[sec:transformation:caseremoval].
+
+      \subsubsection[sec:transformation:caseremoval]{Case removal}
+        This transform removes any case expression with a single alternative and
+        only wild binders.\refdef{wild binders}
+
+        These "useless" case expressions are usually leftovers from case simplification
+        on extractor case (see the previous example).
+
+        \starttrans
+        case x of
+          C v0 ... vm -> E
+        ----------------------     \lam{\forall i, 0 ≤ i ≤ m} (\lam{vi} does not occur free in E)
+        E
+        \stoptrans
+
+        \startbuffer[from]
+        case a of
+          (,) w0 w1 -> x0
+        \stopbuffer
+
+        \startbuffer[to]
+        x0
+        \stopbuffer
+
+        \transexample{caserem}{Case removal}{from}{to}
+
+    \subsection[sec:normalization:nonrep]{Removing unrepresentable values}
+      The transformations in this section are aimed at making all the
+      values used in our expression representable. There are two main
+      transformations that are applied to \emph{all} unrepresentable let
+      bindings and function arguments. These are meant to address three
+      different kinds of unrepresentable values: polymorphic values,
+      higher-order values and literals. The transformation are described
+      generically: they apply to all non-representable values. However,
+      non-representable values that do not fall into one of these three
+      categories will be moved around by these transformations but are
+      unlikely to completely disappear. They usually mean the program was not
+      valid in the first place, because unsupported types were used (for
+      example, a program using strings).
+     
+      Each of these three categories will be detailed below, followed by the
+      actual transformations.
+
+      \subsubsection{Removing Polymorphism}
+        As noted in \in{section}[sec:prototype:polymporphism],
+        polymorphism is made explicit in Core through type and
+        dictionary arguments. To remove the polymorphism from a
+        function, we can simply specialize the polymorphic function for
+        the particular type applied to it. The same goes for dictionary
+        arguments. To remove polymorphism from let bound values, we
+        simply inline the let bindings that have a polymorphic type,
+        which should (eventually) make sure that the polymorphic
+        expression is applied to a type and/or dictionary, which can
+        then be removed by β-reduction (\in{section}[sec:normalization:beta]).
+
+        Since both type and dictionary arguments are not representable,
+        \refdef{representable}
+        the non-representable argument specialization and
+        non-representable let binding inlining transformations below
+        take care of exactly this.
+
+        There is one case where polymorphism cannot be completely
+        removed: built-in functions are still allowed to be polymorphic
+        (Since we have no function body that we could properly
+        specialize). However, the code that generates \VHDL\ for built-in
+        functions knows how to handle this, so this is not a problem.
+
+      \subsubsection[sec:normalization:defunctionalization]{Defunctionalization}
+        These transformations remove higher-order expressions from our
+        program, making all values first-order. The approach used for
+        defunctionalization uses a combination of specialization, inlining and
+        some cleanup transformations, was also proposed in parallel research
+        by Neil Mitchell \cite[mitchell09].
+      
+        Higher order values are always introduced by lambda abstractions, none
+        of the other Core expression elements can introduce a function type.
+        However, other expressions can \emph{have} a function type, when they
+        have a lambda expression in their body. 
          
-        These arguments are functions passed to higher order builtins,
-        like \lam{map} and \lam{foldl}. Since implementing these
-        functions for arbitrary function-typed expressions (\eg, lambda
-        expressions) is rather comlex, we reduce these arguments to
-        (partial applications of) global functions.
+        For example, the following expression is a higher-order expression
+        that is not a lambda expression itself:
          
-        We can still support arbitrary expressions from the user code,
-        by creating a new global function containing that expression.
-        This way, we can simply replace the argument with a reference to
-        that new function. However, since the expression can contain any
-        number of free variables we also have to include partial
-        applications in our normal form.
-
-        This category of arguments is handled by the function extraction
-        transform.
-  \item Other unrepresentable arguments.
+        \refdef{id function}
+        \startlambda
+          case x of
+            High -> id
+            Low -> λx.x
+        \stoplambda
+
+        The reference to the \lam{id} function shows that we can introduce a
+        higher-order expression in our program without using a lambda
+        expression directly. However, inside the definition of the \lam{id}
+        function, we can be sure that a lambda expression is present.
          
-        These arguments can take a few different forms:
-        \startdesc{Type arguments}
-          In the core language, type arguments can only take a single
-          form: A type wrapped in the Type constructor. Also, there is
-          nothing that can be done with type expressions, except for
-          applying functions to them, so we can simply leave type
-          arguments as they are.
-        \stopdesc
-        \startdesc{Dictionary arguments}
-          In the core language, dictionary arguments are used to find
-          operations operating on one of the type arguments (mostly for
-          finding class methods). Since we will not actually evaluatie
-          the function body for builtin functions and can generate
-          code for builtin functions by just looking at the type
-          arguments, these arguments can be ignored and left as they
-          are.
-        \stopdesc
-        \startdesc{Type level arguments}
-          Sometimes, we want to pass a value to a builtin function, but
-          we need to know the value at compile time. Additionally, the
-          value has an impact on the type of the function. This is
-          encoded using type-level values, where the actual value of the
-          argument is not important, but the type encodes some integer,
-          for example. Since the value is not important, the actual form
-          of the expression does not matter either and we can leave
-          these arguments as they are.
-        \stopdesc
-        \startdesc{Other arguments}
-          Technically, there is still a wide array of arguments that can
-          be passed, but does not fall into any of the above categories.
-          However, none of the supported builtin functions requires such
-          an argument. This leaves use with passing unsupported types to
-          a function, such as calling \lam{head} on a list of functions.
-
-          In these cases, it would be impossible to generate hardware
-          for such a function call anyway, so we can ignore these
-          arguments.
-
-          The only way to generate hardware for builtin functions with
-          arguments like these, is to expand the function call into an
-          equivalent core expression (\eg, expand map into a series of
-          function applications). But for now, we choose to simply not
-          support expressions like these.
-        \stopdesc
-
-        From the above, we can conclude that we can simply ignore these
-        other unrepresentable arguments and focus on the first two
-        categories instead.
-\stopitemize
-
-\subsubsection{Argument extraction}
-This transform deals with arguments to functions that
-are of a runtime representable type. 
-
-TODO: It seems we can map an expression to a port, not only a signal.
-Perhaps this makes this transformation not needed?
-TODO: Say something about dataconstructors (without arguments, like True
-or False), which are variable references of a runtime representable
-type, but do not result in a signal.
-
-To reduce a complex expression to a simple variable reference, we create
-a new let expression around the application, which binds the complex
-expression to a new variable. The original function is then applied to
-this variable.
-
-%\transform{Argument extract}
-%{
-%\lam{Y} is of a hardware representable type
-%
-%\lam{Y} is not a variable referene
-%
-%\conclusion
-%
-%\trans{X Y}{let z = Y in X z}
-%}
-
-\subsubsection{Function extraction}
-This transform deals with function-typed arguments to builtin functions.
-Since these arguments cannot be propagated, we choose to extract them
-into a new global function instead.
-
-Any free variables occuring in the extracted arguments will become
-parameters to the new global function. The original argument is replaced
-with a reference to the new function, applied to any free variables from
-the original argument.
-
-%\transform{Function extraction}
-%{
-%\lam{X} is a (partial application of) a builtin function
-%
-%\lam{Y} is not an application
-%
-%\lam{Y} is not a variable reference
-%
-%\conclusion
-%
-%\lam{f0 ... fm} = free local vars of \lam{Y}
-%
-%\lam{y} is a new global variable
-%
-%\lam{y = λf0 ... fn.Y}
-%
-%\trans{X Y}{X (y f0 ... fn)}
-%}
-
-\subsubsection{Argument propagation}
-This transform deals with arguments to user-defined functions that are
-not representable at runtime. This means these arguments cannot be
-preserved in the final form and most be {\em propagated}.
-
-Propagation means to create a specialized version of the called
-function, with the propagated argument already filled in. As a simple
-example, in the following program:
-
-\startlambda
-f = λa.λb.a + b
-inc = λa.f a 1
-\stoplambda
-
-we could {\em propagate} the constant argument 1, with the following
-result:
-
-\startlambda
-f' = λa.a + 1
-inc = λa.f' a
-\stoplambda
-
-Special care must be taken when the to-be-propagated expression has any
-free variables. If this is the case, the original argument should not be
-removed alltogether, but replaced by all the free variables of the
-expression. In this way, the original expression can still be evaluated
-inside the new function. Also, this brings us closer to our goal: All
-these free variables will be simple variable references.
-
-To prevent us from propagating the same argument over and over, a simple
-local variable reference is not propagated (since is has exactly one
-free variable, itself, we would only replace that argument with itself).
-
-This shows that any free local variables that are not runtime representable
-cannot be brought into normal form by this transform. We rely on an
-inlining transformation to replace such a variable with an expression we
-can propagate again.
-
-TODO: Move these definitions somewhere sensible.
-
-Definition: A global variable is any variable that is bound at the
-top level of a program. A local variable is any other variable.
-
-Definition: A hardware representable type is a type that we can generate
-a signal for in hardware. For example, a bit, a vector of bits, a 32 bit
-unsigned word, etc. Types that are not runtime representable notably
-include (but are not limited to): Types, dictionaries, functions.
-
-Definition: A builtin function is a function for which a builtin
-hardware translation is available, because its actual definition is not
-translatable. A user-defined function is any other function.
-
-\starttrans
-x = E
-~
-x Y0 ... Yi ... Yn                               \lam{Yi} is not of a runtime representable type
----------------------------------------------    \lam{Yi} is not a local variable reference
-x' = λy0 ... yi-1 f0 ... fm yi+1 ... yn .        \lam{f0 ... fm} = free local vars of \lam{Y_i}
-      E y0 ... yi-1 Yi yi+1 ... yn   
-~
-x' y0 ... yi-1 f0 ...  fm Yi+1 ... Yn
-\stoptrans
-
-%\transform{Argument propagation}
-%{
-%\lam{x} is a global variable, bound to a user-defined function
-%
-%\lam{x = E}
-%
-%\lam{Y_i} is not of a runtime representable type
-%
-%\lam{Y_i} is not a local variable reference
-%
-%\conclusion
-%
-%\lam{f0 ... fm} = free local vars of \lam{Y_i}
-%
-%\lam{x'} is a new global variable
-%
-%\lam{x' = λy0 ... yi-1 f0 ... fm yi+1 ... yn . E y0 ... yi-1 Yi yi+1 ... yn}
-%
-%\trans{x Y0 ... Yi ... Yn}{x' y0 ... yi-1 f0 ... fm Yi+1 ... Yn}
-%}
-%
-%TODO: The above definition looks too complicated... Can we find
-%something more concise?
-
-\subsection{Cast propagation}
-This transform pushes casts down into the expression as far as possible.
-\subsection{Let recursification}
-This transform makes all lets recursive.
-\subsection{Let simplification}
-This transform makes the result value of all let expressions a simple
-variable reference.
-\subsection{Let flattening}
-This transform turns two nested lets (\lam{let x = (let ... in ...) in
-...}) into a single let.
-\subsection{Simple let binding removal}
-This transforms inlines simple let bindings (\eg a = b).
-\subsection{Function inlining}
-This transform inlines let bindings of a funtion type. TODO: This should
-be generelized to anything that is non representable at runtime, or
-something like that.
-\subsection{Scrutinee simplification}
-This transform ensures that the scrutinee of a case expression is always
-a simple variable reference.
-\subsection{Case binder wildening}
-This transform replaces all binders of a each case alternative with a
-wild binder (\ie, one that is never referred to). This will possibly
-introduce a number of new "selector" case statements, that only select
-one element from an algebraic datatype and bind it to a variable.
-\subsection{Case value simplification}
-This transform simplifies the result value of each case alternative by
-binding the value in a let expression and replacing the value by a
-simple variable reference.
-\subsection{Case removal}
-This transform removes any case statements with a single alternative and
-only wild binders.
-
-\subsection{Example sequence}
-
-This section lists an example expression, with a sequence of transforms
-applied to it. The exact transforms given here probably don't exactly
-match the transforms given above anymore, but perhaps this can clarify
-the big picture a bit.
-
-TODO: Update or remove this section.
-
-\startlambda
-  λx.
-    let s = foo x
-    in
-      case s of
-        (a, b) ->
-          case a of
-            High -> add
-            Low -> let
-              op' = case b of
-                High -> sub
-                Low  -> λc.λd.c
-              in
-                λc.λd.op' d c
-\stoplambda
-
-After top-level η-abstraction:
-
-\startlambda
-  λx.λc.λd.
-    (let s = foo x
-    in
-      case s of
-        (a, b) ->
-          case a of
-            High -> add
-            Low -> let
-              op' = case b of
-                High -> sub
-                Low  -> λc.λd.c
-              in
-                λc.λd.op' d c
-    ) c d
-\stoplambda
-
-After (extended) β-reduction:
-
-\startlambda
-  λx.λc.λd.
-    let s = foo x
-    in
-      case s of
-        (a, b) ->
-          case a of
-            High -> add c d
-            Low -> let
-              op' = case b of
-                High -> sub
-                Low  -> λc.λd.c
-              in
-                op' d c
-\stoplambda
-
-After return value extraction:
-
-\startlambda
-  λx.λc.λd.
-    let s = foo x
-        r = case s of
-              (a, b) ->
-                case a of
-                  High -> add c d
-                  Low -> let
-                    op' = case b of
-                      High -> sub
-                      Low  -> λc.λd.c
-                    in
-                      op' d c
-    in
-      r
-\stoplambda
-
-Scrutinee simplification does not apply.
-
-After case binder wildening:
-
-\startlambda
-  λx.λc.λd.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        r = case s of (_, _) ->
-              case a of
-                High -> add c d
-                Low -> let op' = case b of
-                             High -> sub
-                             Low  -> λc.λd.c
-                       in
-                         op' d c
-    in
-      r
-\stoplambda
-
-After case value simplification
-
-\startlambda
-  λx.λc.λd.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        r = case s of (_, _) -> r'
-        rh = add c d
-        rl = let rll = λc.λd.c
-                 op' = case b of
-                   High -> sub
-                   Low  -> rll
-             in
-               op' d c
-        r' = case a of
-               High -> rh
-               Low -> rl
-    in
-      r
-\stoplambda
-
-After let flattening:
-
-\startlambda
-  λx.λc.λd.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        r = case s of (_, _) -> r'
-        rh = add c d
-        rl = op' d c
-        rll = λc.λd.c
-        op' = case b of
-          High -> sub
-          Low  -> rll
-        r' = case a of
-               High -> rh
-               Low -> rl
-    in
-      r
-\stoplambda
-
-After function inlining:
-
-\startlambda
-  λx.λc.λd.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        r = case s of (_, _) -> r'
-        rh = add c d
-        rl = (case b of
-          High -> sub
-          Low  -> λc.λd.c) d c
-        r' = case a of
-          High -> rh
-          Low -> rl
-    in
-      r
-\stoplambda
-
-After (extended) β-reduction again:
-
-\startlambda
-  λx.λc.λd.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        r = case s of (_, _) -> r'
-        rh = add c d
-        rl = case b of
-          High -> sub d c
-          Low  -> d
-        r' = case a of
-          High -> rh
-          Low -> rl
-    in
-      r
-\stoplambda
-
-After case value simplification again:
-
-\startlambda
-  λx.λc.λd.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        r = case s of (_, _) -> r'
-        rh = add c d
-        rlh = sub d c
-        rl = case b of
-          High -> rlh
-          Low  -> d
-        r' = case a of
-          High -> rh
-          Low -> rl
-    in
-      r
-\stoplambda
-
-After case removal:
-
-\startlambda
-  λx.λc.λd.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        r = r'
-        rh = add c d
-        rlh = sub d c
-        rl = case b of
-          High -> rlh
-          Low  -> d
-        r' = case a of
-          High -> rh
-          Low -> rl
-    in
-      r
-\stoplambda
-
-After let bind removal:
-
-\startlambda
-  λx.λc.λd.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        rh = add c d
-        rlh = sub d c
-        rl = case b of
-          High -> rlh
-          Low  -> d
-        r' = case a of
-          High -> rh
-          Low -> rl
-    in
-      r'
-\stoplambda
-
-Application simplification is not applicable.
+        Looking closely at the definition of our normal form in
+        \in{section}[sec:normalization:intendednormalform], we can see that
+        there are three possibilities for higher-order values to appear in our
+        intended normal form:
+
+        \startitemize[KR]
+          \item[item:toplambda] Lambda abstractions can appear at the highest level of a
+          top level function. These lambda abstractions introduce the
+          arguments (input ports / current state) of the function.
+          \item[item:built-inarg] (Partial applications of) top level functions can appear as an
+          argument to a built-in function.
+          \item[item:completeapp] (Partial applications of) top level functions can appear in
+          function position of an application. Since a partial application
+          cannot appear anywhere else (except as built-in function arguments),
+          all partial applications are applied, meaning that all applications
+          will become complete applications. However, since application of
+          arguments happens one by one, in the expression:
+          \startlambda
+            f 1 2
+          \stoplambda
+          the subexpression \lam{f 1} has a function type. But this is
+          allowed, since it is inside a complete application.
+        \stopitemize
+
+        We will take a typical function with some higher-order values as an
+        example. The following function takes two arguments: a \lam{Bit} and a
+        list of numbers. Depending on the first argument, each number in the
+        list is doubled, or the list is returned unmodified. For the sake of
+        the example, no polymorphism is shown. In reality, at least map would
+        be polymorphic.
+
+        \startlambda
+        λy.let double = λx. x + x in
+             case y of
+                Low -> map double
+                High -> λz. z
+        \stoplambda
+
+        This example shows a number of higher-order values that we cannot
+        translate to \VHDL\ directly. The \lam{double} binder bound in the let
+        expression has a function type, as well as both of the alternatives of
+        the case expression. The first alternative is a partial application of
+        the \lam{map} built-in function, whereas the second alternative is a
+        lambda abstraction.
+
+        To reduce all higher-order values to one of the above items, a number
+        of transformations we have already seen are used. The η-expansion
+        transformation from \in{section}[sec:normalization:eta] ensures all
+        function arguments are introduced by lambda abstraction on the highest
+        level of a function. These lambda arguments are allowed because of
+        \in{item}[item:toplambda] above. After η-expansion, our example
+        becomes a bit bigger:
+
+        \startlambda
+        λy.λq.(let double = λx. x + x in
+                 case y of
+                   Low -> map double
+                   High -> λz. z
+              ) q
+        \stoplambda
+
+        η-expansion also introduces extra applications (the application of
+        the let expression to \lam{q} in the above example). These
+        applications can then propagated down by the application propagation
+        transformation (\in{section}[sec:normalization:appprop]). In our
+        example, the \lam{q} and \lam{r} variable will be propagated into the
+        let expression and then into the case expression:
+
+        \startlambda
+        λy.λq.let double = λx. x + x in
+                case y of
+                  Low -> map double q
+                  High -> (λz. z) q
+        \stoplambda
+        
+        This propagation makes higher-order values become applied (in
+        particular both of the alternatives of the case now have a
+        representable type). Completely applied top level functions (like the
+        first alternative) are now no longer invalid (they fall under
+        \in{item}[item:completeapp] above). (Completely) applied lambda
+        abstractions can be removed by β-expansion. For our example,
+        applying β-expansion results in the following:
+
+        \startlambda
+        λy.λq.let double = λx. x + x in
+                case y of
+                  Low -> map double q
+                  High -> q
+        \stoplambda
+
+        As you can see in our example, all of this moves applications towards
+        the higher-order values, but misses higher-order functions bound by
+        let expressions. The applications cannot be moved towards these values
+        (since they can be used in multiple places), so the values will have
+        to be moved towards the applications. This is achieved by inlining all
+        higher-order values bound by let applications, by the
+        non-representable binding inlining transformation below. When applying
+        it to our example, we get the following:
+        
+        \startlambda
+        λy.λq.case y of
+                Low -> map (λx. x + x) q
+                High -> q
+        \stoplambda
+
+        We have nearly eliminated all unsupported higher-order values from this
+        expressions. The one that is remaining is the first argument to the
+        \lam{map} function. Having higher-order arguments to a built-in
+        function like \lam{map} is allowed in the intended normal form, but
+        only if the argument is a (partial application) of a top level
+        function. This is easily done by introducing a new top level function
+        and put the lambda abstraction inside. This is done by the function
+        extraction transformation from
+        \in{section}[sec:normalization:funextract].
+
+        \startlambda
+        λy.λq.case y of
+                Low -> map func q
+                High -> q
+        \stoplambda
+
+        This also introduces a new function, that we have called \lam{func}:
+
+        \startlambda
+        func = λx. x + x
+        \stoplambda
+
+        Note that this does not actually remove the lambda, but now it is a
+        lambda at the highest level of a function, which is allowed in the
+        intended normal form.
+
+        There is one case that has not been discussed yet. What if the
+        \lam{map} function in the example above was not a built-in function
+        but a user-defined function? Then extracting the lambda expression
+        into a new function would not be enough, since user-defined functions
+        can never have higher-order arguments. For example, the following
+        expression shows an example:
+
+        \startlambda
+        twice :: (Word -> Word) -> Word -> Word
+        twice = λf.λa.f (f a)
+
+        main = λa.app (λx. x + x) a
+        \stoplambda
+
+        This example shows a function \lam{twice} that takes a function as a
+        first argument and applies that function twice to the second argument.
+        Again, we have made the function monomorphic for clarity, even though
+        this function would be a lot more useful if it was polymorphic. The
+        function \lam{main} uses \lam{twice} to apply a lambda epression twice.
+
+        When faced with a user defined function, a body is available for that
+        function. This means we could create a specialized version of the
+        function that only works for this particular higher-order argument
+        (\ie, we can just remove the argument and call the specialized
+        function without the argument). This transformation is detailed below.
+        Applying this transformation to the example gives:
+
+        \startlambda
+        twice' :: Word -> Word
+        twice' = λb.(λf.λa.f (f a)) (λx. x + x) b
+
+        main = λa.app' a
+        \stoplambda
+
+        The \lam{main} function is now in normal form, since the only
+        higher-order value there is the top level lambda expression. The new
+        \lam{twice'} function is a bit complex, but the entire original body
+        of the original \lam{twice} function is wrapped in a lambda
+        abstraction and applied to the argument we have specialized for
+        (\lam{λx. x + x}) and the other arguments. This complex expression can
+        fortunately be effectively reduced by repeatedly applying β-reduction: 
+
+        \startlambda
+        twice' :: Word -> Word
+        twice' = λb.(b + b) + (b + b)
+        \stoplambda
+
+        This example also shows that the resulting normal form might not be as
+        efficient as we might hope it to be (it is calculating \lam{(b + b)}
+        twice). This is discussed in more detail in
+        \in{section}[sec:normalization:duplicatework].
+
+      \subsubsection{Literals}
+        There are a limited number of literals available in Haskell and Core.
+        \refdef{enumerated types} When using (enumerating) algebraic
+        datatypes, a literal is just a reference to the corresponding data
+        constructor, which has a representable type (the algebraic datatype)
+        and can be translated directly. This also holds for literals of the
+        \hs{Bool} Haskell type, which is just an enumerated type.
+
+        There is, however, a second type of literal that does not have a
+        representable type: integer literals. Cλash supports using integer
+        literals for all three integer types supported (\hs{SizedWord},
+        \hs{SizedInt} and \hs{RangedWord}). This is implemented using
+        Haskell's \hs{Num} type class, which offers a \hs{fromInteger} method
+        that converts any \hs{Integer} to the Cλash datatypes.
+
+        When \GHC\ sees integer literals, it will automatically insert calls to
+        the \hs{fromInteger} method in the resulting Core expression. For
+        example, the following expression in Haskell creates a 32 bit unsigned
+        word with the value 1. The explicit type signature is needed, since
+        there is no context for \GHC\ to determine the type from otherwise.
+
+        \starthaskell
+        1 :: SizedWord D32
+        \stophaskell
+
+        This Haskell code results in the following Core expression:
+
+        \startlambda
+        fromInteger @(SizedWord D32) \$dNum (smallInteger 10)
+        \stoplambda
+
+        The literal 10 will have the type \lam{GHC.Prim.Int\#}, which is
+        converted into an \lam{Integer} by \lam{smallInteger}. Finally, the
+        \lam{fromInteger} function will finally convert this into a
+        \lam{SizedWord D32}.
+
+        Both the \lam{GHC.Prim.Int\#} and \lam{Integer} types are not
+        representable, and cannot be translated directly. Fortunately, there
+        is no need to translate them, since \lam{fromInteger} is a built-in
+        function that knows how to handle these values. However, this does
+        require that the \lam{fromInteger} function is directly applied to
+        these non-representable literal values, otherwise errors will occur.
+        For example, the following expression is not in the intended normal
+        form, since one of the let bindings has an unrepresentable type
+        (\lam{Integer}):
+
+        \startlambda
+        let l = smallInteger 10 in fromInteger @(SizedWord D32) \$dNum l
+        \stoplambda
+
+        By inlining these let-bindings, we can ensure that unrepresentable
+        literals bound by a let binding end up in an application of the
+        appropriate built-in function, where they are allowed. Since it is
+        possible that the application of that function is in a different
+        function than the definition of the literal value, we will always need
+        to specialize away any unrepresentable literals that are used as
+        function arguments. The following two transformations do exactly this.
+
+      \subsubsection[sec:normalization:nonrepinline]{Non-representable binding inlining}
+        This transform inlines let bindings that are bound to a
+        non-representable value. Since we can never generate a signal
+        assignment for these bindings (we cannot declare a signal assignment
+        with a non-representable type, for obvious reasons), we have no choice
+        but to inline the binding to remove it.
+
+        As we have seen in the previous sections, inlining these bindings
+        solves (part of) the polymorphism, higher-order values and
+        unrepresentable literals in an expression.
+
+        \refdef{substitution notation}
+        \starttrans
+        letrec 
+          a0 = E0
+          \vdots
+          ai = Ei
+          \vdots
+          an = En
+        in
+          M
+        --------------------------    \lam{Ei} has a non-representable type.
+        letrec
+          a0 = E0 [ai=>Ei] \vdots
+          ai-1 = Ei-1 [ai=>Ei]
+          ai+1 = Ei+1 [ai=>Ei]
+          \vdots
+          an = En [ai=>Ei]
+        in
+          M[ai=>Ei]
+        \stoptrans
+
+        \startbuffer[from]
+        letrec
+          a = smallInteger 10
+          inc = λb -> add b 1
+          inc' = add 1
+          x = fromInteger a 
+        in
+          inc (inc' x)
+        \stopbuffer
+
+        \startbuffer[to]
+        letrec
+          x = fromInteger (smallInteger 10)
+        in
+          (λb -> add b 1) (add 1 x)
+        \stopbuffer
+
+        \transexample{nonrepinline}{Nonrepresentable binding inlining}{from}{to}
+
+      \subsubsection[sec:normalization:specialize]{Function specialization}
+        This transform removes arguments to user-defined functions that are
+        not representable at runtime. This is done by creating a
+        \emph{specialized} version of the function that only works for one
+        particular value of that argument (in other words, the argument can be
+        removed).
+
+        Specialization means to create a specialized version of the called
+        function, with one argument already filled in. As a simple example, in
+        the following program (this is not actual Core, since it directly uses
+        a literal with the unrepresentable type \lam{GHC.Prim.Int\#}).
+
+        \startlambda
+        f = λa.λb.a + b
+        inc = λa.f a 1
+        \stoplambda
+
+        We could specialize the function \lam{f} against the literal argument
+        1, with the following result:
+
+        \startlambda
+        f' = λa.a + 1
+        inc = λa.f' a
+        \stoplambda
+
+        In some way, this transformation is similar to β-reduction, but it
+        operates across function boundaries. It is also similar to
+        non-representable let binding inlining above, since it sort of
+        \quote{inlines} an expression into a called function.
+
+        Special care must be taken when the argument has any free variables.
+        If this is the case, the original argument should not be removed
+        completely, but replaced by all the free variables of the expression.
+        In this way, the original expression can still be evaluated inside the
+        new function.
+
+        To prevent us from propagating the same argument over and over, a
+        simple local variable reference is not propagated (since is has
+        exactly one free variable, itself, we would only replace that argument
+        with itself).
+
+        This shows that any free local variables that are not runtime
+        representable cannot be brought into normal form by this transform. We
+        rely on an inlining or β-reduction transformation to replace such a
+        variable with an expression we can propagate again.
+
+        \starttrans
+        x = E
+        ~
+        x Y0 ... Yi ... Yn                               \lam{Yi} is not representable
+        ---------------------------------------------    \lam{Yi} is not a local variable reference
+        x' Y0 ... Yi-1 f0 ...  fm Yi+1 ... Yn            \lam{f0 ... fm} are all free local vars of \lam{Yi}
+        ~                                                \lam{T0 ... Tn} are the types of \lam{Y0 ... Yn}
+        x' = λ(y0 :: T0) ... λ(yi-1 :: Ty-1). 
+             λf0 ... λfm.
+             λ(yi+1 :: Ty+1) ...  λ(yn :: Tn).       
+               E y0 ... yi-1 Yi yi+1 ... yn   
+        \stoptrans
+
+        This is a bit of a complex transformation. It transforms an
+        application of the function \lam{x}, where one of the arguments
+        (\lam{Y_i}) is not representable. A new
+        function \lam{x'} is created that wraps the body of the old function.
+        The body of the new function becomes a number of nested lambda
+        abstractions, one for each of the original arguments that are left
+        unchanged.
+        
+        The ith argument is replaced with the free variables of
+        \lam{Y_i}. Note that we reuse the same binders as those used in
+        \lam{Y_i}, since we can then just use \lam{Y_i} inside the new
+        function body and have all of the variables it uses be in scope.
+
+        The argument that we are specializing for, \lam{Y_i}, is put inside
+        the new function body. The old function body is applied to it. Since
+        we use this new function only in place of an application with that
+        particular argument \lam{Y_i}, behaviour should not change.
+        
+        Note that the types of the arguments of our new function are taken
+        from the types of the \emph{actual} arguments (\lam{T0 ... Tn}). This
+        means that any polymorphism in the arguments is removed, even when the
+        corresponding explicit type lambda is not removed
+        yet.
+
+        \todo{Examples. Perhaps reference the previous sections}
+
+  \section{Unsolved problems}
+    The above system of transformations has been implemented in the prototype
+    and seems to work well to compile simple and more complex examples of
+    hardware descriptions. \todo{Ref christiaan?} However, this normalization
+    system has not seen enough review and work to be complete and work for
+    every Core expression that is supplied to it. A number of problems
+    have already been identified and are discussed in this section.
+
+    \subsection[sec:normalization:duplicatework]{Work duplication}
+        A possible problem of β-reduction is that it could duplicate work.
+        When the expression applied is not a simple variable reference, but
+        requires calculation and the binder the lambda abstraction binds to
+        is used more than once, more hardware might be generated than strictly
+        needed. 
+
+        As an example, consider the expression:
+
+        \startlambda
+        (λx. x + x) (a * b)
+        \stoplambda
+
+        When applying β-reduction to this expression, we get:
+
+        \startlambda
+        (a * b) + (a * b)
+        \stoplambda
+
+        which of course calculates \lam{(a * b)} twice.
+        
+        A possible solution to this would be to use the following alternative
+        transformation, which is of course no longer normal β-reduction. The
+        followin transformation has not been tested in the prototype, but is
+        given here for future reference:
+
+        \starttrans
+        (λx.E) M
+        -----------------
+        letrec x = M in E
+        \stoptrans
+        
+        This does not seem like much of an improvement, but it does get rid of
+        the lambda expression (and the associated higher-order value), while
+        at the same time introducing a new let binding. Since the result of
+        every application or case expression must be bound by a let expression
+        in the intended normal form anyway, this is probably not a problem. If
+        the argument happens to be a variable reference, then simple let
+        binding removal (\in{section}[sec:normalization:simplelet]) will
+        remove it, making the result identical to that of the original
+        β-reduction transformation.
+
+        When also applying argument simplification to the above example, we
+        get the following expression:
+
+        \startlambda
+        let y = (a * b)
+            z = (a * b)
+        in y + z
+        \stoplambda
+
+        Looking at this, we could imagine an alternative approach: create a
+        transformation that removes let bindings that bind identical values.
+        In the above expression, the \lam{y} and \lam{z} variables could be
+        merged together, resulting in the more efficient expression:
+
+        \startlambda
+        let y = (a * b) in y + y
+        \stoplambda
+
+      \subsection[sec:normalization:non-determinism]{Non-determinism}
+        As an example, again consider the following expression:
+
+        \startlambda
+        (λx. x + x) (a * b)
+        \stoplambda
+
+        We can apply both β-reduction (\in{section}[sec:normalization:beta])
+        as well as argument simplification
+        (\in{section}[sec:normalization:argsimpl]) to this expression.
+
+        When applying argument simplification first and then β-reduction, we
+        get the following expression:
+
+        \startlambda
+        let y = (a * b) in y + y
+        \stoplambda
+
+        When applying β-reduction first and then argument simplification, we
+        get the following expression:
+
+        \startlambda
+        let y = (a * b)
+            z = (a * b)
+        in y + z
+        \stoplambda
+
+        As you can see, this is a different expression. This means that the
+        order of expressions, does in fact change the resulting normal form,
+        which is something that we would like to avoid. In this particular
+        case one of the alternatives is even clearly more efficient, so we
+        would of course like the more efficient form to be the normal form.
+
+        For this particular problem, the solutions for duplication of work
+        seem from the previous section seem to fix the determinism of our
+        transformation system as well. However, it is likely that there are
+        other occurences of this problem.
+
+      \subsection[sec:normalization:castproblems]{Casts}
+        We do not fully understand the use of cast expressions in Core, so
+        there are probably expressions involving cast expressions that cannot
+        be brought into intended normal form by this transformation system.
+
+        The uses of casts in the Core system should be investigated more and
+        transformations will probably need updating to handle them in all
+        cases.
+
+      \subsection[sec:normalization:stateproblems]{Normalization of stateful descriptions}
+        Currently, the intended normal form definition\refdef{intended
+        normal form definition} offers enough freedom to describe all
+        valid stateful descriptions, but is not limiting enough. It is
+        possible to write descriptions which are in intended normal
+        form, but cannot be translated into \VHDL\ in a meaningful way
+        (\eg, a function that swaps two substates in its result, or a
+        function that changes a substate itself instead of passing it to
+        a subfunction).
+
+        It is now up to the programmer to not do anything funny with
+        these state values, whereas the normalization just tries not to
+        mess up the flow of state values. In practice, there are
+        situations where a Core program that \emph{could} be a valid
+        stateful description is not translateable by the prototype. This
+        most often happens when statefulness is mixed with pattern
+        matching, causing a state input to be unpacked multiple times or
+        be unpacked and repacked only in some of the code paths.
+
+        Without going into detail about the exact problems (of which
+        there are probably more than have shown up so far), it seems
+        unlikely that these problems can be solved entirely by just
+        improving the \VHDL\ state generation in the final stage. The
+        normalization stage seems the best place to apply the rewriting
+        needed to support more complex stateful descriptions. This does
+        of course mean that the intended normal form definition must be
+        extended as well to be more specific about how state handling
+        should look like in normal form.
+        \in{Section}[sec:prototype:statelimits] already contains a
+        tight description of the limitations on the use of state
+        variables, which could be adapted into the intended normal form.
+
+  \section[sec:normalization:properties]{Provable properties}
+    When looking at the system of transformations outlined above, there are a
+    number of questions that we can ask ourselves. The main question is of course:
+    \quote{Does our system work as intended?}. We can split this question into a
+    number of subquestions:
+
+    \startitemize[KR]
+    \item[q:termination] Does our system \emph{terminate}? Since our system will
+    keep running as long as transformations apply, there is an obvious risk that
+    it will keep running indefinitely. This typically happens when one
+    transformation produces a result that is transformed back to the original
+    by another transformation, or when one or more transformations keep
+    expanding some expression.
+    \item[q:soundness] Is our system \emph{sound}? Since our transformations
+    continuously modify the expression, there is an obvious risk that the final
+    normal form will not be equivalent to the original program: its meaning could
+    have changed.
+    \item[q:completeness] Is our system \emph{complete}? Since we have a complex
+    system of transformations, there is an obvious risk that some expressions will
+    not end up in our intended normal form, because we forgot some transformation.
+    In other words: does our transformation system result in our intended normal
+    form for all possible inputs?
+    \item[q:determinism] Is our system \emph{deterministic}? Since we have defined
+    no particular order in which the transformation should be applied, there is an
+    obvious risk that different transformation orderings will result in
+    \emph{different} normal forms. They might still both be intended normal forms
+    (if our system is \emph{complete}) and describe correct hardware (if our
+    system is \emph{sound}), so this property is less important than the previous
+    three: the translator would still function properly without it.
+    \stopitemize
+
+    Unfortunately, the final transformation system has only been
+    developed in the final part of the research, leaving no more time
+    for verifying these properties. In fact, it is likely that the
+    current transformation system still violates some of these
+    properties in some cases (see
+    \in{section}[sec:normalization:non-determinism] and
+    \in{section}[sec:normalization:stateproblems]) and should be improved (or
+    extra conditions on the input hardware descriptions should be formulated).
+
+    This is most likely the case with the completeness and determinism
+    properties, perhaps also the termination property. The soundness
+    property probably holds, since it is easier to manually verify (each
+    transformation can be reviewed separately).
+
+    Even though no complete proofs have been made, some ideas for
+    possible proof strategies are shown below.
+
+    \subsection{Graph representation}
+      Before looking into how to prove these properties, we will look at
+      transformation systems from a graph perspective. We will first define
+      the graph view and then illustrate it using a simple example from lambda
+      calculus (which is a different system than the Cλash normalization
+      system). The nodes of the graph are all possible Core expressions. The
+      (directed) edges of the graph are transformations. When a transformation
+      α applies to an expression \lam{A} to produce an expression \lam{B}, we
+      add an edge from the node for \lam{A} to the node for \lam{B}, labeled
+      α.
+
+      \startuseMPgraphic{TransformGraph}
+        save a, b, c, d;
+
+        % Nodes
+        newCircle.a(btex \lam{(λx.λy. (+) x y) 1} etex);
+        newCircle.b(btex \lam{λy. (+) 1 y} etex);
+        newCircle.c(btex \lam{(λx.(+) x) 1} etex);
+        newCircle.d(btex \lam{(+) 1} etex);
+
+        b.c = origin;
+        c.c = b.c + (4cm, 0cm);
+        a.c = midpoint(b.c, c.c) + (0cm, 4cm);
+        d.c = midpoint(b.c, c.c) - (0cm, 3cm);
+
+        % β-conversion between a and b
+        ncarc.a(a)(b) "name(bred)";
+        ObjLabel.a(btex $\xrightarrow[normal]{}{β}$ etex) "labpathname(bred)", "labdir(rt)";
+        ncarc.b(b)(a) "name(bexp)", "linestyle(dashed withdots)";
+        ObjLabel.b(btex $\xleftarrow[normal]{}{β}$ etex) "labpathname(bexp)", "labdir(lft)";
+
+        % η-conversion between a and c
+        ncarc.a(a)(c) "name(ered)";
+        ObjLabel.a(btex $\xrightarrow[normal]{}{η}$ etex) "labpathname(ered)", "labdir(rt)";
+        ncarc.c(c)(a) "name(eexp)", "linestyle(dashed withdots)";
+        ObjLabel.c(btex $\xleftarrow[normal]{}{η}$ etex) "labpathname(eexp)", "labdir(lft)";
+
+        % η-conversion between b and d
+        ncarc.b(b)(d) "name(ered)";
+        ObjLabel.b(btex $\xrightarrow[normal]{}{η}$ etex) "labpathname(ered)", "labdir(rt)";
+        ncarc.d(d)(b) "name(eexp)", "linestyle(dashed withdots)";
+        ObjLabel.d(btex $\xleftarrow[normal]{}{η}$ etex) "labpathname(eexp)", "labdir(lft)";
+
+        % β-conversion between c and d
+        ncarc.c(c)(d) "name(bred)";
+        ObjLabel.c(btex $\xrightarrow[normal]{}{β}$ etex) "labpathname(bred)", "labdir(rt)";
+        ncarc.d(d)(c) "name(bexp)", "linestyle(dashed withdots)";
+        ObjLabel.d(btex $\xleftarrow[normal]{}{β}$ etex) "labpathname(bexp)", "labdir(lft)";
+
+        % Draw objects and lines
+        drawObj(a, b, c, d);
+      \stopuseMPgraphic
+
+      \placeexample[right][ex:TransformGraph]{Partial graph of a lambda calculus
+      system with β and η reduction (solid lines) and expansion (dotted lines).}
+          \boxedgraphic{TransformGraph}
+
+      Of course the graph for Cλash is unbounded, since we can construct an
+      infinite amount of Core expressions. Also, there might potentially be
+      multiple edges between two given nodes (with different labels), though
+      this seems unlikely to actually happen in our system.
+
+      See \in{example}[ex:TransformGraph] for the graph representation of a very
+      simple lambda calculus that contains just the expressions \lam{(λx.λy. (+) x
+      y) 1}, \lam{λy. (+) 1 y}, \lam{(λx.(+) x) 1} and \lam{(+) 1}. The
+      transformation system consists of β-reduction and η-reduction (solid edges) or
+      β-expansion and η-expansion (dotted edges).
+
+      \todo{Define β-reduction and η-reduction?}
+
+      In such a graph a node (expression) is in normal form if it has no
+      outgoing edges (meaning no transformation applies to it). The set of
+      nodes without outgoing edges is called the \emph{normal set}. Similarly,
+      the set of nodes containing expressions in intended normal form
+      \refdef{intended normal form} is called the \emph{intended normal set}.
+
+      From such a graph, we can derive some properties easily:
+      \startitemize[KR]
+        \item A system will \emph{terminate} if there is no walk (sequence of
+        edges, or transformations) of infinite length in the graph (this
+        includes cycles, but can also happen without cycles).
+        \item Soundness is not easily represented in the graph.
+        \item A system is \emph{complete} if all of the nodes in the normal set have
+        the intended normal form. The inverse (that all of the nodes outside of
+        the normal set are \emph{not} in the intended normal form) is not
+        strictly required. In other words, our normal set must be a
+        subset of the intended normal form, but they do not need to be
+        the same set.
+        form.
+        \item A system is deterministic if all paths starting at a particular
+        node, which end in a node in the normal set, end at the same node.
+      \stopitemize
+
+      When looking at the \in{example}[ex:TransformGraph], we see that the system
+      terminates for both the reduction and expansion systems (but note that, for
+      expansion, this is only true because we have limited the possible
+      expressions.  In complete lambda calculus, there would be a path from
+      \lam{(λx.λy. (+) x y) 1} to \lam{(λx.λy.(λz.(+) z) x y) 1} to
+      \lam{(λx.λy.(λz.(λq.(+) q) z) x y) 1} etc.)
+
+      If we would consider the system with both expansion and reduction, there
+      would no longer be termination either, since there would be cycles all
+      over the place.
+
+      The reduction and expansion systems have a normal set of containing just
+      \lam{(+) 1} or \lam{(λx.λy. (+) x y) 1} respectively. Since all paths in
+      either system end up in these normal forms, both systems are \emph{complete}.
+      Also, since there is only one node in the normal set, it must obviously be
+      \emph{deterministic} as well.
+
+    \subsection{Termination}
+      In general, proving termination of an arbitrary program is a very
+      hard problem. \todo{Ref about arbitrary termination} Fortunately,
+      we only have to prove termination for our specific transformation
+      system.
+
+      A common approach for these kinds of proofs is to associate a
+      measure with each possible expression in our system. If we can
+      show that each transformation strictly decreases this measure
+      (\ie, the expression transformed to has a lower measure than the
+      expression transformed from).  \todo{ref about measure-based
+      termination proofs / analysis}
+      
+      A good measure for a system consisting of just β-reduction would
+      be the number of lambda expressions in the expression. Since every
+      application of β-reduction removes a lambda abstraction (and there
+      is always a bounded number of lambda abstractions in every
+      expression) we can easily see that a transformation system with
+      just β-reduction will always terminate.
+
+      For our complete system, this measure would be fairly complex
+      (probably the sum of a lot of things). Since the (conditions on)
+      our transformations are pretty complex, we would need to include
+      both simple things like the number of let expressions as well as
+      more complex things like the number of case expressions that are
+      not yet in normal form.
+
+      No real attempt has been made at finding a suitable measure for
+      our system yet.
+
+    \subsection{Soundness}
+      Soundness is a property that can be proven for each transformation
+      separately. Since our system only runs separate transformations
+      sequentially, if each of our transformations leaves the
+      \emph{meaning} of the expression unchanged, then the entire system
+      will of course leave the meaning unchanged and is thus
+      \emph{sound}.
+
+      The current prototype has only been verified in an ad-hoc fashion
+      by inspecting (the code for) each transformation. A more formal
+      verification would be more appropriate.
+
+      To be able to formally show that each transformation properly
+      preserves the meaning of every expression, we require an exact
+      definition of the \emph{meaning} of every expression, so we can
+      compare them. A definition of the operational semantics of \GHC's Core
+      language is available \cite[sulzmann07], but this does not seem
+      sufficient for our goals (but it is a good start).
+
+      It should be possible to have a single formal definition of
+      meaning for Core for both normal Core compilation by \GHC\ and for
+      our compilation to \VHDL. The main difference seems to be that in
+      hardware every expression is always evaluated, while in software
+      it is only evaluated if needed, but it should be possible to
+      assign a meaning to Core expressions that assumes neither.
+      
+      Since each of the transformations can be applied to any
+      subexpression as well, there is a constraint on our meaning
+      definition: the meaning of an expression should depend only on the
+      meaning of subexpressions, not on the expressions themselves. For
+      example, the meaning of the application in \lam{f (let x = 4 in
+      x)} should be the same as the meaning of the application in \lam{f
+      4}, since the argument subexpression has the same meaning (though
+      the actual expression is different).
+      
+    \subsection{Completeness}
+      Proving completeness is probably not hard, but it could be a lot
+      of work. We have seen above that to prove completeness, we must
+      show that the normal set of our graph representation is a subset
+      of the intended normal set.
+
+      However, it is hard to systematically generate or reason about the
+      normal set, since it is defined as any nodes to which no
+      transformation applies. To determine this set, each transformation
+      must be considered and when a transformation is added, the entire
+      set should be re-evaluated. This means it is hard to show that
+      each node in the normal set is also in the intended normal set.
+      Reasoning about our intended normal set is easier, since we know
+      how to generate it from its definition. \refdef{intended normal
+      form definition}
+
+      Fortunately, we can also prove the complement (which is
+      equivalent, since $A \subseteq B \Leftrightarrow \overline{B}
+      \subseteq \overline{A}$): show that the set of nodes not in
+      intended normal form is a subset of the set of nodes not in normal
+      form. In other words, show that for every expression that is not
+      in intended normal form, that there is at least one transformation
+      that applies to it (since that means it is not in normal form
+      either and since $A \subseteq C \Leftrightarrow \forall x (x \in A
+      \rightarrow x \in C)$).
+
+      By systematically reviewing the entire Core language definition
+      along with the intended normal form definition (both of which have
+      a similar structure), it should be possible to identify all
+      possible (sets of) Core expressions that are not in intended
+      normal form and identify a transformation that applies to it.
+      
+      This approach is especially useful for proving completeness of our
+      system, since if expressions exist to which none of the
+      transformations apply (\ie\ if the system is not yet complete), it
+      is immediately clear which expressions these are and adding
+      (or modifying) transformations to fix this should be relatively
+      easy.
+
+      As observed above, applying this approach is a lot of work, since
+      we need to check every (set of) transformation(s) separately.
+
+      \todo{Perhaps do a few steps of the proofs as proof-of-concept}
+
+% vim: set sw=2 sts=2 expandtab: