X-Git-Url: https://git.stderr.nl/gitweb?p=matthijs%2Fmaster-project%2Freport.git;a=blobdiff_plain;f=Chapters%2FNormalization.tex;h=b74333ae103c15d51b08fa37cc708cc7c3f1503b;hp=cbef2c0a7322718897573c561dafc22356276ce8;hb=a5a906640aceb70689f66e53b5175d93bc9505ff;hpb=841a7f87134c7216e32ea002aae01ed8c357fec5

diff --git a/Chapters/Normalization.tex b/Chapters/Normalization.tex
index cbef2c0..b74333a 100644
--- a/Chapters/Normalization.tex
+++ b/Chapters/Normalization.tex
@@ -1,4 +1,4 @@
-\chapter{Normalization}
+\chapter[chap:normalization]{Normalization}
 
 % A helper to print a single example in the half the page width. The example
 % text should be in a buffer whose name is given in an argument.
@@ -7,7 +7,7 @@
 % will end up on a single line. The strut=no option prevents a bunch of empty
 % space at the start of the frame.
 \define[1]\example{
-  \framed[offset=1mm,align=right,strut=no]{
+  \framed[offset=1mm,align=right,strut=no,background=box,frame=off]{
     \setuptyping[option=LAM,style=sans,before=,after=]
     \typebuffer[#1]
     \setuptyping[option=none,style=\tttf]
@@ -15,10 +15,6 @@
 }
 
 
-% A transformation example
-\definefloat[example][examples]
-\setupcaption[example][location=top] % Put captions on top
-
 \define[3]\transexample{
   \placeexample[here]{#1}
   \startcombination[2*1]
@@ -35,37 +31,206 @@
 %%  \stopcombination
 %}
 
-The first step in the core to VHDL translation process, is normalization. We
+The first step in the core to \small{VHDL} translation process, is normalization. We
 aim to bring the core description into a simpler form, which we can
-subsequently translate into VHDL easily. This normal form is needed because
-the full core language is more expressive than VHDL in some areas and because
+subsequently translate into \small{VHDL} easily. This normal form is needed because
+the full core language is more expressive than \small{VHDL} in some areas and because
 core can describe expressions that do not have a direct hardware
 interpretation.
 
-TODO: Describe core properties not supported in VHDL, and describe how the
-VHDL we want to generate should look like.
+TODO: Describe core properties not supported in \small{VHDL}, and describe how the
+\small{VHDL} we want to generate should look like.
 
-\section{Goal}
+\section{Normal form}
 The transformations described here have a well-defined goal: To bring the
 program in a well-defined form that is directly translatable to hardware,
-while fully preserving the semantics of the program.
+while fully preserving the semantics of the program. We refer to this form as
+the \emph{normal form} of the program. The formal definition of this normal
+form is quite simple:
 
-This {\em normal form} is again a Core program, but with a very specific
-structure. A function in normal form has nested lambda's at the top, which
-produce a let expression. This let expression binds every function application
-in the function and produces a simple identifier. Every bound value in
-the let expression is either a simple function application or a case
-expression to extract a single element from a tuple returned by a
-function.
+\placedefinition{}{A program is in \emph{normal form} if none of the
+transformations from this chapter apply.}
 
-An example of a program in canonical form would be:
+Of course, this is an \quote{easy} definition of the normal form, since our
+program will end up in normal form automatically. The more interesting part is
+to see if this normal form actually has the properties we would like it to
+have.
+
+But, before getting into more definitions and details about this normal form,
+let's try to get a feeling for it first. The easiest way to do this is by
+describing the things we want to not have in a normal form.
+
+\startitemize
+  \item Any \emph{polymorphism} must be removed. When laying down hardware, we
+  can't generate any signals that can have multiple types. All types must be
+  completely known to generate hardware.
+  
+  \item Any \emph{higher order} constructions must be removed. We can't
+  generate a hardware signal that contains a function, so all values,
+  arguments and returns values used must be first order.
+
+  \item Any complex \emph{nested scopes} must be removed. In the \small{VHDL}
+  description, every signal is in a single scope. Also, full expressions are
+  not supported everywhere (in particular port maps can only map signal names,
+  not expressions). To make the \small{VHDL} generation easy, all values must be bound
+  on the \quote{top level}.
+\stopitemize
+
+TODO: Intermezzo: functions vs plain values
+
+A very simple example of a program in normal form is given in
+\in{example}[ex:MulSum]. As you can see, all arguments to the function (which
+will become input ports in the final hardware) are at the top. This means that
+the body of the final lambda abstraction is never a function, but always a
+plain value.
+
+After the lambda abstractions, we see a single let expression, that binds two
+variables (\lam{mul} and \lam{sum}). These variables will be signals in the
+final hardware, bound to the output port of the \lam{*} and \lam{+}
+components.
+
+The final line (the \quote{return value} of the function) selects the
+\lam{sum} signal to be the output port of the function. This \quote{return
+value} can always only be a variable reference, never a more complex
+expression.
+
+\startbuffer[MulSum]
+alu :: Bit -> Word -> Word -> Word
+alu = Î»a.Î»b.Î»c.
+    let
+      mul = (*) a b
+      sum = (+) mul c
+    in
+      sum
+\stopbuffer
+
+\startuseMPgraphic{MulSum}
+  save a, b, c, mul, add, sum;
+
+  % I/O ports
+  newCircle.a(btex $a$ etex) "framed(false)";
+  newCircle.b(btex $b$ etex) "framed(false)";
+  newCircle.c(btex $c$ etex) "framed(false)";
+  newCircle.sum(btex $res$ etex) "framed(false)";
+
+  % Components
+  newCircle.mul(btex - etex);
+  newCircle.add(btex + etex);
+
+  a.c      - b.c   = (0cm, 2cm);
+  b.c      - c.c   = (0cm, 2cm);
+  add.c            = c.c + (2cm, 0cm);
+  mul.c            = midpoint(a.c, b.c) + (2cm, 0cm);
+  sum.c            = add.c + (2cm, 0cm);
+  c.c              = origin;
+
+  % Draw objects and lines
+  drawObj(a, b, c, mul, add, sum);
+
+  ncarc(a)(mul) "arcangle(15)";
+  ncarc(b)(mul) "arcangle(-15)";
+  ncline(c)(add);
+  ncline(mul)(add);
+  ncline(add)(sum);
+\stopuseMPgraphic
+
+\placeexample[here][ex:MulSum]{Simple architecture consisting of an adder and a
+subtractor.}
+  \startcombination[2*1]
+    {\typebufferlam{MulSum}}{Core description in normal form.}
+    {\boxedgraphic{MulSum}}{The architecture described by the normal form.}
+  \stopcombination
+
+The previous example described composing an architecture by calling other
+functions (operators), resulting in a simple architecture with component and
+connection. There is of course also some mechanism for choice in the normal
+form. In a normal Core program, the \emph{case} expression can be used in a
+few different ways to describe choice. In normal form, this is limited to a
+very specific form.
+
+\in{Example}[ex:AddSubAlu] shows an example describing a
+simple \small{ALU}, which chooses between two operations based on an opcode
+bit. The main structure is the same as in \in{example}[ex:MulSum], but this
+time the \lam{res} variable is bound to a case expression. This case
+expression scrutinizes the variable \lam{opcode} (and scrutinizing more
+complex expressions is not supported). The case expression can select a
+different variable based on the constructor of \lam{opcode}.
+
+\startbuffer[AddSubAlu]
+alu :: Bit -> Word -> Word -> Word
+alu = Î»opcode.Î»a.Î»b.
+    let
+      res1 = (+) a b
+      res2 = (-) a b
+      res = case opcode of
+        Low -> res1
+        High -> res2
+    in
+      res
+\stopbuffer
+
+\startuseMPgraphic{AddSubAlu}
+  save opcode, a, b, add, sub, mux, res;
+
+  % I/O ports
+  newCircle.opcode(btex $opcode$ etex) "framed(false)";
+  newCircle.a(btex $a$ etex) "framed(false)";
+  newCircle.b(btex $b$ etex) "framed(false)";
+  newCircle.res(btex $res$ etex) "framed(false)";
+  % Components
+  newCircle.add(btex + etex);
+  newCircle.sub(btex - etex);
+  newMux.mux;
+
+  opcode.c - a.c   = (0cm, 2cm);
+  add.c    - a.c   = (4cm, 0cm);
+  sub.c    - b.c   = (4cm, 0cm);
+  a.c      - b.c   = (0cm, 3cm);
+  mux.c            = midpoint(add.c, sub.c) + (1.5cm, 0cm);
+  res.c    - mux.c = (1.5cm, 0cm);
+  b.c              = origin;
+
+  % Draw objects and lines
+  drawObj(opcode, a, b, res, add, sub, mux);
+
+  ncline(a)(add) "posA(e)";
+  ncline(b)(sub) "posA(e)";
+  nccurve(a)(sub) "posA(e)", "angleA(0)";
+  nccurve(b)(add) "posA(e)", "angleA(0)";
+  nccurve(add)(mux) "posB(inpa)", "angleB(0)";
+  nccurve(sub)(mux) "posB(inpb)", "angleB(0)";
+  nccurve(opcode)(mux) "posB(n)", "angleA(0)", "angleB(-90)";
+  ncline(mux)(res) "posA(out)";
+\stopuseMPgraphic
+
+\placeexample[here][ex:AddSubAlu]{Simple \small{ALU} supporting two operations.}
+  \startcombination[2*1]
+    {\typebufferlam{AddSubAlu}}{Core description in normal form.}
+    {\boxedgraphic{AddSubAlu}}{The architecture described by the normal form.}
+  \stopcombination
+
+As a more complete example, consider \in{example}[ex:NormalComplete]. This
+example contains everything that is supported in normal form, with the
+exception of builtin higher order functions. The graphical version of the
+architecture contains a slightly simplified version, since the state tuple
+packing and unpacking have been left out. Instead, two seperate registers are
+drawn. Also note that most synthesis tools will further optimize this
+architecture by removing the multiplexers at the register input and replace
+them with some logic in the clock inputs, but we want to show the architecture
+as close to the description as possible.
+
+\startbuffer[NormalComplete]
+  regbank :: Bit 
+             -> Word 
+             -> State (Word, Word) 
+             -> (State (Word, Word), Word)
 
-\startlambda
   -- All arguments are an inital lambda
-  Î»a.Î»d.Î»sp.
+  regbank = Î»a.Î»d.Î»sp.
   -- There are nested let expressions at top level
   let
-    -- Unpack the state by coercion
+    -- Unpack the state by coercion (\eg, cast from
+    -- State (Word, Word) to (Word, Word))
     s = sp :: (Word, Word)
     -- Extract both registers from the state
     r1 = case s of (fst, snd) -> fst
@@ -77,87 +242,509 @@ An example of a program in canonical form would be:
       High -> r1
       Low -> r2
     r1' = case a of
-      High -> d
+      High -> d'
       Low -> r1
     r2' = case a of
       High -> r2
-      Low -> d
+      Low -> d'
     -- Packing a tuple
     s' = (,) r1' r2'
-    -- Packing the state by coercion
+    -- pack the state by coercion (\eg, cast from
+    -- (Word, Word) to State (Word, Word))
     sp' = s' :: State (Word, Word)
     -- Pack our return value
     res = (,) sp' out
   in
     -- The actual result
     res
+\stopbuffer
+
+\startuseMPgraphic{NormalComplete}
+  save a, d, r, foo, muxr, muxout, out;
+
+  % I/O ports
+  newCircle.a(btex \lam{a} etex) "framed(false)";
+  newCircle.d(btex \lam{d} etex) "framed(false)";
+  newCircle.out(btex \lam{out} etex) "framed(false)";
+  % Components
+  %newCircle.add(btex + etex);
+  newBox.foo(btex \lam{foo} etex);
+  newReg.r1(btex $\lam{r1}$ etex) "dx(4mm)", "dy(6mm)";
+  newReg.r2(btex $\lam{r2}$ etex) "dx(4mm)", "dy(6mm)", "reflect(true)";
+  newMux.muxr1;
+  % Reflect over the vertical axis
+  reflectObj(muxr1)((0,0), (0,1));
+  newMux.muxr2;
+  newMux.muxout;
+  rotateObj(muxout)(-90);
+
+  d.c               = foo.c + (0cm, 1.5cm); 
+  a.c               = (xpart r2.c + 2cm, ypart d.c - 0.5cm);
+  foo.c             = midpoint(muxr1.c, muxr2.c) + (0cm, 2cm);
+  muxr1.c           = r1.c + (0cm, 2cm);
+  muxr2.c           = r2.c + (0cm, 2cm);
+  r2.c              = r1.c + (4cm, 0cm);
+  r1.c              = origin;
+  muxout.c          = midpoint(r1.c, r2.c) - (0cm, 2cm);
+  out.c             = muxout.c - (0cm, 1.5cm);
+
+%  % Draw objects and lines
+  drawObj(a, d, foo, r1, r2, muxr1, muxr2, muxout, out);
+  
+  ncline(d)(foo);
+  nccurve(foo)(muxr1) "angleA(-90)", "posB(inpa)", "angleB(180)";
+  nccurve(foo)(muxr2) "angleA(-90)", "posB(inpb)", "angleB(0)";
+  nccurve(muxr1)(r1) "posA(out)", "angleA(180)", "posB(d)", "angleB(0)";
+  nccurve(r1)(muxr1) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(180)";
+  nccurve(muxr2)(r2) "posA(out)", "angleA(0)", "posB(d)", "angleB(180)";
+  nccurve(r2)(muxr2) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(0)";
+  nccurve(r1)(muxout) "posA(out)", "angleA(0)", "posB(inpb)", "angleB(-90)";
+  nccurve(r2)(muxout) "posA(out)", "angleA(180)", "posB(inpa)", "angleB(-90)";
+  % Connect port a
+  nccurve(a)(muxout) "angleA(-90)", "angleB(180)", "posB(sel)";
+  nccurve(a)(muxr1) "angleA(180)", "angleB(-90)", "posB(sel)";
+  nccurve(a)(muxr2) "angleA(180)", "angleB(-90)", "posB(sel)";
+  ncline(muxout)(out) "posA(out)";
+\stopuseMPgraphic
+
+\placeexample[here][ex:NormalComplete]{Simple architecture consisting of an adder and a
+subtractor.}
+  \startcombination[2*1]
+    {\typebufferlam{NormalComplete}}{Core description in normal form.}
+    {\boxedgraphic{NormalComplete}}{The architecture described by the normal form.}
+  \stopcombination
+
+\subsection{Normal form definition}
+Now we have some intuition for the normal form, we can describe how we want
+the normal form to look like in a slightly more formal manner. The following
+EBNF-like description completely captures the intended structure (and
+generates a subset of GHC's core format).
+
+Some clauses have an expression listed in parentheses. These are conditions
+that need to apply to the clause.
+
+\startlambda
+\italic{normal} = \italic{lambda}
+\italic{lambda} = Î»var.\italic{lambda} (representable(var))
+                | \italic{toplet} 
+\italic{toplet} = let \italic{binding} in \italic{toplet} 
+                | letrec [\italic{binding}] in \italic{toplet}
+                | var (representable(varvar))
+\italic{binding} = var = \italic{rhs} (representable(rhs))
+                 -- State packing and unpacking by coercion
+                 | var0 = var1 :: State ty (lvar(var1))
+                 | var0 = var1 :: ty (var0 :: State ty) (lvar(var1))
+\italic{rhs} = userapp
+             | builtinapp
+             -- Extractor case
+             | case var of C a0 ... an -> ai (lvar(var))
+             -- Selector case
+             | case var of (lvar(var))
+                DEFAULT -> var0 (lvar(var0))
+                C w0 ... wn -> resvar (\forall{}i, wi \neq resvar, lvar(resvar))
+\italic{userapp} = \italic{userfunc}
+                 | \italic{userapp} {userarg}
+\italic{userfunc} = var (gvar(var))
+\italic{userarg} = var (lvar(var))
+\italic{builtinapp} = \italic{builtinfunc}
+                    | \italic{builtinapp} \italic{builtinarg}
+\italic{builtinfunc} = var (bvar(var))
+\italic{builtinarg} = \italic{coreexpr}
 \stoplambda
 
+-- TODO: Limit builtinarg further
+
+-- TODO: There can still be other casts around (which the code can handle,
+e.g., ignore), which still need to be documented here.
+
+-- TODO: Note about the selector case. It just supports Bit and Bool
+currently, perhaps it should be generalized in the normal form?
+
 When looking at such a program from a hardware perspective, the top level
 lambda's define the input ports. The value produced by the let expression is
 the output port. Most function applications bound by the let expression
 define a component instantiation, where the input and output ports are mapped
 to local signals or arguments. Some of the others use a builtin
 construction (\eg the \lam{case} statement) or call a builtin function
-(\eg \lam{add} or \lam{sub}). For these, a hardcoded VHDL translation is
+(\eg \lam{add} or \lam{sub}). For these, a hardcoded \small{VHDL} translation is
 available.
 
-\subsection{Normal definition}
-Formally, the normal form is a core program obeying the following
-constraints. TODO: Update this section, this is probably not completely
-accurate or relevant anymore.
-
-\startitemize[R,inmargin]
-%\item All top level binds must have the form $\expr{\bind{fun}{lamexpr}}$.
-%$fun$ is an identifier that will be bound as a global identifier.
-%\item A $lamexpr$ has the form $\expr{\lam{arg}{lamexpr}}$ or
-%$\expr{letexpr}$. $arg$ is an identifier which will be bound as an $argument$.
-%\item[letexpr] A $letexpr$ has the form $\expr{\letexpr{letbinds}{retexpr}}$.
-%\item $letbinds$ is a list with elements of the form
-%$\expr{\bind{res}{appexpr}}$ or $\expr{\bind{res}{builtinexpr}}$, where $res$ is
-%an identifier that will be bound as local identifier. The type of the bound
-%value must be a $hardware\;type$.
-%\item[builtinexpr] A $builtinexpr$ is an expression that can be mapped to an
-%equivalent VHDL expression. Since there are many supported forms for this,
-%these are defined in a separate table.
-%\item An $appexpr$ has the form $\expr{fun}$ or $\expr{\app{appexpr}{x}}$,
-%where $fun$ is a global identifier and $x$ is a local identifier.
-%\item[retexpr] A $retexpr$ has the form $\expr{x}$ or $\expr{tupexpr}$, where $x$ is a local identifier that is bound as an $argument$ or $result$.  A $retexpr$ must
-%be of a $hardware\;type$.
-%\item A $tupexpr$ has the form $\expr{con}$ or $\expr{\app{tupexpr}{x}}$,
-%where $con$ is a tuple constructor ({\em e.g.} $(,)$ or $(,,,)$) and $x$ is
-%a local identifier.
-%\item A $hardware\;type$ is a type that can be directly translated to
-%hardware. This includes the types $Bit$, $SizedWord$, tuples containing
-%elements of $hardware\;type$s, and will include others. This explicitely
-%excludes function types.
+\section{Transformation notation}
+To be able to concisely present transformations, we use a specific format to
+them. It is a simple format, similar to one used in logic reasoning.
+
+Such a transformation description looks like the following.
+
+\starttrans
+<context conditions>
+~
+<original expression>
+--------------------------          <expression conditions>
+<transformed expresssion>
+~
+<context additions>
+\stoptrans
+
+This format desribes a transformation that applies to \lam{original
+expresssion} and transforms it into \lam{transformed expression}, assuming
+that all conditions apply. In this format, there are a number of placeholders
+in pointy brackets, most of which should be rather obvious in their meaning.
+Nevertheless, we will more precisely specify their meaning below:
+
+  \startdesc{<original expression>} The expression pattern that will be matched
+  against (subexpressions of) the expression to be transformed. We call this a
+  pattern, because it can contain \emph{placeholders} (variables), which match
+  any expression or binder. Any such placeholder is said to be \emph{bound} to
+  the expression it matches. It is convention to use an uppercase latter (\eg
+  \lam{M} or \lam{E} to refer to any expression (including a simple variable
+  reference) and lowercase letters (\eg \lam{v} or \lam{b}) to refer to
+  (references to) binders.
+
+  For example, the pattern \lam{a + B} will match the expression 
+  \lam{v + (2 * w)} (and bind \lam{a} to \lam{v} and \lam{B} to 
+  \lam{(2 * 2)}), but not \lam{v + (2 * w)}.
+  \stopdesc
+
+  \startdesc{<expression conditions>}
+  These are extra conditions on the expression that is matched. These
+  conditions can be used to further limit the cases in which the
+  transformation applies, in particular to prevent a transformation from
+  causing a loop with itself or another transformation.
+
+  Only if these if these conditions are \emph{all} true, this transformation
+  applies.
+  \stopdesc
+
+  \startdesc{<context conditions>}
+  These are a number of extra conditions on the context of the function. In
+  particular, these conditions can require some other top level function to be
+  present, whose value matches the pattern given here. The format of each of
+  these conditions is: \lam{binder = <pattern>}.
+
+  Typically, the binder is some placeholder bound in the \lam{<original
+  expression>}, while the pattern contains some placeholders that are used in
+  the \lam{transformed expression}.
+  
+  Only if a top level binder exists that matches each binder and pattern, this
+  transformation applies.
+  \stopdesc
+
+  \startdesc{<transformed expression>}
+  This is the expression template that is the result of the transformation. If, looking
+  at the above three items, the transformation applies, the \lam{original
+  expression} is completely replaced with the \lam{<transformed expression>}.
+  We call this a template, because it can contain placeholders, referring to
+  any placeholder bound by the \lam{<original expression>} or the
+  \lam{<context conditions>}. The resulting expression will have those
+  placeholders replaced by the values bound to them.
+
+  Any binder (lowercase) placeholder that has no value bound to it yet will be
+  bound to (and replaced with) a fresh binder.
+  \stopdesc
+
+  \startdesc{<context additions>}
+  These are templates for new functions to add to the context. This is a way
+  to have a transformation create new top level functiosn.
+
+  Each addition has the form \lam{binder = template}. As above, any
+  placeholder in the addition is replaced with the value bound to it, and any
+  binder placeholder that has no value bound to it yet will be bound to (and
+  replaced with) a fresh binder.
+  \stopdesc
+
+  As an example, we'll look at Î·-abstraction:
+
+\starttrans
+E                 \lam{E :: a -> b}
+--------------    \lam{E} does not occur on a function position in an application
+Î»x.E x            \lam{E} is not a lambda abstraction.
+\stoptrans
+
+  Consider the following function, which is a fairly obvious way to specify a
+  simple ALU (Note \at{example}[ex:AddSubAlu] is the normal form of this
+  function):
+
+\startlambda 
+alu :: Bit -> Word -> Word -> Word
+alu = Î»opcode. case opcode of
+  Low -> (+)
+  High -> (-)
+\stoplambda
+
+  There are a few subexpressions in this function to which we could possibly
+  apply the transformation. Since the pattern of the transformation is only
+  the placeholder \lam{E}, any expression will match that. Whether the
+  transformation applies to an expression is thus solely decided by the
+  conditions to the right of the transformation.
+
+  We will look at each expression in the function in a top down manner. The
+  first expression is the entire expression the function is bound to.
+
+\startlambda
+Î»opcode. case opcode of
+  Low -> (+)
+  High -> (-)
+\stoplambda
+
+  As said, the expression pattern matches this. The type of this expression is
+  \lam{Bit -> Word -> Word -> Word}, which matches \lam{a -> b} (Note that in
+  this case \lam{a = Bit} and \lam{b = Word -> Word -> Word}).
+
+  Since this expression is at top level, it does not occur at a function
+  position of an application. However, The expression is a lambda abstraction,
+  so this transformation does not apply.
+
+  The next expression we could apply this transformation to, is the body of
+  the lambda abstraction:
+
+\startlambda
+case opcode of
+  Low -> (+)
+  High -> (-)
+\stoplambda
+
+  The type of this expression is \lam{Word -> Word -> Word}, which again
+  matches \lam{a -> b}. The expression is the body of a lambda expression, so
+  it does not occur at a function position of an application. Finally, the
+  expression is not a lambda abstraction but a case expression, so all the
+  conditions match. There are no context conditions to match, so the
+  transformation applies.
+
+  By now, the placeholder \lam{E} is bound to the entire expression. The
+  placeholder \lam{x}, which occurs in the replacement template, is not bound
+  yet, so we need to generate a fresh binder for that. Let's use the binder
+  \lam{a}. This results in the following replacement expression:
+
+\startlambda
+Î»a.(case opcode of
+  Low -> (+)
+  High -> (-)) a
+\stoplambda
+
+  Continuing with this expression, we see that the transformation does not
+  apply again (it is a lambda expression). Next we look at the body of this
+  labmda abstraction:
+
+\startlambda
+(case opcode of
+  Low -> (+)
+  High -> (-)) a
+\stoplambda
+  
+  Here, the transformation does apply, binding \lam{E} to the entire
+  expression and \lam{x} to the fresh binder \lam{b}, resulting in the
+  replacement:
+
+\startlambda
+Î»b.(case opcode of
+  Low -> (+)
+  High -> (-)) a b
+\stoplambda
+
+  Again, the transformation does not apply to this lambda abstraction, so we
+  look at its body. For brevity, we'll put the case statement on one line from
+  now on.
+
+\startlambda
+(case opcode of Low -> (+); High -> (-)) a b
+\stoplambda
+
+  The type of this expression is \lam{Word}, so it does not match \lam{a -> b}
+  and the transformation does not apply. Next, we have two options for the
+  next expression to look at: The function position and argument position of
+  the application. The expression in the argument position is \lam{b}, which
+  has type \lam{Word}, so the transformation does not apply. The expression in
+  the function position is:
+
+\startlambda
+(case opcode of Low -> (+); High -> (-)) a
+\stoplambda
+
+  Obviously, the transformation does not apply here, since it occurs in
+  function position. In the same way the transformation does not apply to both
+  components of this expression (\lam{case opcode of Low -> (+); High -> (-)}
+  and \lam{a}), so we'll skip to the components of the case expression: The
+  scrutinee and both alternatives. Since the opcode is not a function, it does
+  not apply here, and we'll leave both alternatives as an exercise to the
+  reader. The final function, after all these transformations becomes:
+
+\startlambda 
+alu :: Bit -> Word -> Word -> Word
+alu = Î»opcode.Î»a.b. (case opcode of
+  Low -> Î»a1.Î»b1 (+) a1 b1
+  High -> Î»a2.Î»b2 (-) a2 b2) a b
+\stoplambda
+
+  In this case, the transformation does not apply anymore, though this might
+  not always be the case (e.g., the application of a transformation on a
+  subexpression might open up possibilities to apply the transformation
+  further up in the expression).
+
+\subsection{Transformation application}
+In this chapter we define a number of transformations, but how will we apply
+these? As stated before, our normal form is reached as soon as no
+transformation applies anymore. This means our application strategy is to
+simply apply any transformation that applies, and continuing to do that with
+the result of each transformation.
+
+In particular, we define no particular order of transformations. Since
+transformation order should not influence the resulting normal form (see TODO
+ref), this leaves the implementation free to choose any application order that
+results in an efficient implementation.
+
+When applying a single transformation, we try to apply it to every (sub)expression
+in a function, not just the top level function. This allows us to keep the
+transformation descriptions concise and powerful.
+
+\subsection{Definitions}
+In the following sections, we will be using a number of functions and
+notations, which we will define here.
+
+\subsubsection{Other concepts}
+A \emph{global variable} is any variable that is bound at the
+top level of a program, or an external module. A \emph{local variable} is any
+other variable (\eg, variables local to a function, which can be bound by
+lambda abstractions, let expressions and pattern matches of case
+alternatives).  Note that this is a slightly different notion of global versus
+local than what \small{GHC} uses internally.
+\defref{global variable} \defref{local variable}
+
+A \emph{hardware representable} (or just \emph{representable}) type or value
+is (a value of) a type that we can generate a signal for in hardware. For
+example, a bit, a vector of bits, a 32 bit unsigned word, etc. Types that are
+not runtime representable notably include (but are not limited to): Types,
+dictionaries, functions.
+\defref{representable}
+
+A \emph{builtin function} is a function supplied by the CÎ»ash framework, whose
+implementation is not valid CÎ»ash. The implementation is of course valid
+Haskell, for simulation, but it is not expressable in CÎ»ash.
+\defref{builtin function} \defref{user-defined function}
+
+For these functions, CÎ»ash has a \emph{builtin hardware translation}, so calls
+to these functions can still be translated. These are functions like
+\lam{map}, \lam{hwor} and \lam{length}.
+
+A \emph{user-defined} function is a function for which we do have a CÎ»ash
+implementation available.
+
+\subsubsection{Functions}
+Here, we define a number of functions that can be used below to concisely
+specify conditions.
+
+\refdef{global variable}\emph{gvar(expr)} is true when \emph{expr} is a variable that references a
+global variable. It is false when it references a local variable.
+
+\refdef{local variable}\emph{lvar(expr)} is the complement of \emph{gvar}; it is true when \emph{expr}
+references a local variable, false when it references a global variable.
+
+\refdef{representable}\emph{representable(expr)} or \emph{representable(var)} is true when
+\emph{expr} or \emph{var} is \emph{representable}.
+
+\subsection{Binder uniqueness}
+A common problem in transformation systems, is binder uniqueness. When not
+considering this problem, it is easy to create transformations that mix up
+bindings and cause name collisions. Take for example, the following core
+expression:
+
+\startlambda
+(Î»a.Î»b.Î»c. a * b * c) x c
+\stoplambda
+
+By applying Î²-reduction (see below) once, we can simplify this expression to:
+
+\startlambda
+(Î»b.Î»c. x * b * c) c
+\stoplambda
+
+Now, we have replaced the \lam{a} binder with a reference to the \lam{x}
+binder. No harm done here. But note that we see multiple occurences of the
+\lam{c} binder. The first is a binding occurence, to which the second refers.
+The last, however refers to \emph{another} instance of \lam{c}, which is
+bound somewhere outside of this expression. Now, if we would apply beta
+reduction without taking heed of binder uniqueness, we would get:
+
+\startlambda
+Î»c. x * c * c
+\stoplambda
+
+This is obviously not what was supposed to happen! The root of this problem is
+the reuse of binders: Identical binders can be bound in different scopes, such
+that only the inner one is \quote{visible} in the inner expression. In the example
+above, the \lam{c} binder was bound outside of the expression and in the inner
+lambda expression. Inside that lambda expression, only the inner \lam{c} is
+visible.
+
+There are a number of ways to solve this. \small{GHC} has isolated this
+problem to their binder substitution code, which performs \emph{deshadowing}
+during its expression traversal. This means that any binding that shadows
+another binding on a higher level is replaced by a new binder that does not
+shadow any other binding. This non-shadowing invariant is enough to prevent
+binder uniqueness problems in \small{GHC}.
+
+In our transformation system, maintaining this non-shadowing invariant is
+a bit harder to do (mostly due to implementation issues, the prototype doesn't
+use \small{GHC}'s subsitution code). Also, we can observe the following
+points.
+
+\startitemize
+\item Deshadowing does not guarantee overall uniqueness. For example, the
+following (slightly contrived) expression shows the identifier \lam{x} bound in
+two seperate places (and to different values), even though no shadowing
+occurs.
+
+\startlambda
+(let x = 1 in x) + (let x = 2 in x)
+\stoplambda
+
+\item In our normal form (and the resulting \small{VHDL}), all binders
+(signals) will end up in the same scope. To allow this, all binders within the
+same function should be unique.
+
+\item When we know that all binders in an expression are unique, moving around
+or removing a subexpression will never cause any binder conflicts. If we have
+some way to generate fresh binders, introducing new subexpressions will not
+cause any problems either. The only way to cause conflicts is thus to
+duplicate an existing subexpression.
 \stopitemize
 
-TODO: Say something about uniqueness of identifiers
+Given the above, our prototype maintains a unique binder invariant. This
+meanst that in any given moment during normalization, all binders \emph{within
+a single function} must be unique. To achieve this, we apply the following
+technique.
 
-\subsection{Builtin expressions}
-A $builtinexpr$, as defined at \in[builtinexpr] can have any of the following forms.
+TODO: Define fresh binders and unique supplies
 
-\startitemize[m,inmargin]
-%\item
-%$tuple\_extract=\expr{\case{t}{\alt{\app{con}{x_0\;x_1\;..\;x_n}}{x_i}}}$,
-%where $t$ can be any local identifier, $con$ is a tuple constructor ({\em
-%e.g.} $(,)$ or $(,,,)$), $x_0$ to $x_n$ can be any identifier, and $x_i$ can
-%be any of $x_0$ to $x_n$. A case expression must have a $hardware\;type$.
-%\item TODO: Many more!
+\startitemize
+\item Before starting normalization, all binders in the function are made
+unique. This is done by generating a fresh binder for every binder used. This
+also replaces binders that did not pose any conflict, but it does ensure that
+all binders within the function are generated by the same unique supply. See
+(TODO: ref fresh binder).
+\item Whenever a new binder must be generated, we generate a fresh binder that
+is guaranteed to be different from \emph{all binders generated so far}. This
+can thus never introduce duplication and will maintain the invariant.
+\item Whenever (part of) an expression is duplicated (for example when
+inlining), all binders in the expression are replaced with fresh binders
+(using the same method as at the start of normalization). These fresh binders
+can never introduce duplication, so this will maintain the invariant.
+\item Whenever we move part of an expression around within the function, there
+is no need to do anything special. There is obviously no way to introduce
+duplication by moving expressions around. Since we know that each of the
+binders is already unique, there is no way to introduce (incorrect) shadowing
+either.
 \stopitemize
 
 \section{Transform passes}
-
 In this section we describe the actual transforms. Here we're using
 the core language in a notation that resembles lambda calculus.
 
 Each of these transforms is meant to be applied to every (sub)expression
 in a program, for as long as it applies. Only when none of the
-expressions can be applied anymore, the program is in normal form. We
-hope to be able to prove that this form will obey all of the constraints
-defined above, but this has yet to happen (though it seems likely that
-it will).
+transformations can be applied anymore, the program is in normal form (by
+definition). We hope to be able to prove that this form will obey all of the
+constraints defined above, but this has yet to happen (though it seems likely
+that it will).
 
 Each of the transforms will be described informally first, explaining
 the need for and goal of the transform. Then, a formal definition is
@@ -166,9 +753,6 @@ is specified as a number of conditions (above the horizontal line) and a
 number of conclusions (below the horizontal line). The details of using
 this notation are still a bit fuzzy, so comments are welcom.
 
-TODO: Formally describe the "apply to every (sub)expression" in terms of
-rules with full transformations in the conditions.
-
 \subsection{Î·-abstraction}
 This transformation makes sure that all arguments of a function-typed
 expression are named, by introducing lambda expressions. When combined with
@@ -176,79 +760,446 @@ expression are named, by introducing lambda expressions. When combined with
 be lambda abstractions or global identifiers.
 
 \starttrans
-E                 \lam{E :: * -> *}
+E                 \lam{E :: a -> b}
 --------------    \lam{E} is not the first argument of an application.
 Î»x.E x            \lam{E} is not a lambda abstraction.
                   \lam{x} is a variable that does not occur free in \lam{E}.
 \stoptrans
 
 \startbuffer[from]
-foo = Î»a -> case a of 
+foo = Î»a.case a of 
   True -> Î»b.mul b b
   False -> id
 \stopbuffer
 
 \startbuffer[to]
-foo = Î»a.Î»x -> (case a of 
+foo = Î»a.Î»x.(case a of 
     True -> Î»b.mul b b
     False -> Î»y.id y) x
 \stopbuffer
 
 \transexample{Î·-abstraction}{from}{to}
 
-\subsection{Extended Î²-reduction}
+\subsection{Î²-reduction}
+Î²-reduction is a well known transformation from lambda calculus, where it is
+the main reduction step. It reduces applications of labmda abstractions,
+removing both the lambda abstraction and the application.
+
+In our transformation system, this step helps to remove unwanted lambda
+abstractions (basically all but the ones at the top level). Other
+transformations (application propagation, non-representable inlining) make
+sure that most lambda abstractions will eventually be reducable by
+Î²-reduction.
+
+TODO: Define substitution syntax
+
+\starttrans
+(Î»x.E) M
+-----------------
+E[M/x]
+\stoptrans
+
+% And an example
+\startbuffer[from]
+(Î»a. 2 * a) (2 * b)
+\stopbuffer
+
+\startbuffer[to]
+2 * (2 * b)
+\stopbuffer
+
+\transexample{Î²-reduction}{from}{to}
+
+\subsection{Application propagation}
 This transformation is meant to propagate application expressions downwards
-into expressions as far as possible. In lambda calculus, this reduction
-is known as Î²-reduction, but it is of course only defined for
-applications of lambda abstractions. We extend this reduction to also
-work for the rest of core (case and let expressions).
+into expressions as far as possible. This allows partial applications inside
+expressions to become fully applied and exposes new transformation
+possibilities for other transformations (like Î²-reduction).
+
+\starttrans
+let binds in E) M
+-----------------
+let binds in E M
+\stoptrans
+
+% And an example
 \startbuffer[from]
+( let 
+    val = 1
+  in 
+    add val
+) 3
+\stopbuffer
+
+\startbuffer[to]
+let 
+  val = 1
+in 
+  add val 3
+\stopbuffer
+
+\transexample{Application propagation for a let expression}{from}{to}
+
+\starttrans
 (case x of
   p1 -> E1
   \vdots
   pn -> En) M
-\stopbuffer
-\startbuffer[to]
+-----------------
 case x of
   p1 -> E1 M
   \vdots
   pn -> En M
+\stoptrans
+
+% And an example
+\startbuffer[from]
+( case x of 
+    True -> id
+    False -> neg
+) 1
 \stopbuffer
 
-%\transform{Extended Î²-reduction}
-%{
-%\conclusion
-%\trans{(Î»x.E) M}{E[M/x]}
-%
-%\nextrule
-%\conclusion
-%\trans{(let binds in E) M}{let binds in E M}
-%
-%\nextrule
-%\conclusion
-%\transbuf{from}{to}
-%}
+\startbuffer[to]
+case x of 
+  True -> id 1
+  False -> neg 1
+\stopbuffer
+
+\transexample{Application propagation for a case expression}{from}{to}
+
+\subsection{Let derecursification}
+This transformation is meant to make lets non-recursive whenever possible.
+This might allow other optimizations to do their work better. TODO: Why is
+this needed exactly?
+
+\subsection{Let flattening}
+This transformation puts nested lets in the same scope, by lifting the
+binding(s) of the inner let into a new let around the outer let. Eventually,
+this will cause all let bindings to appear in the same scope (they will all be
+in scope for the function return value).
+
+Note that this transformation does not try to be smart when faced with
+recursive lets, it will just leave the lets recursive (possibly joining a
+recursive and non-recursive let into a single recursive let). The let
+rederecursification transformation will do this instead.
+
+\starttrans
+letnonrec x = (let bindings in M) in N
+------------------------------------------
+let bindings in (letnonrec x = M) in N
+\stoptrans
+
+\starttrans
+letrec 
+  \vdots
+  x = (let bindings in M)
+  \vdots
+in
+  N
+------------------------------------------
+letrec
+  \vdots
+  bindings
+  x = M
+  \vdots
+in
+  N
+\stoptrans
+
+\startbuffer[from]
+let
+  a = letrec
+    x = 1
+    y = 2
+  in
+    x + y
+in
+  letrec
+    b = let c = 3 in a + c
+    d = 4
+  in
+    d + b
+\stopbuffer
+\startbuffer[to]
+letrec
+  x = 1
+  y = 2
+in
+  let
+    a = x + y
+  in
+    letrec
+      c = 3
+      b = a + c
+      d = 4
+    in
+      d + b
+\stopbuffer
+
+\transexample{Let flattening}{from}{to}
+
+\subsection{Empty let removal}
+This transformation is simple: It removes recursive lets that have no bindings
+(which usually occurs when let derecursification removes the last binding from
+it).
+
+\starttrans
+letrec in M
+--------------
+M
+\stoptrans
+
+\subsection{Simple let binding removal}
+This transformation inlines simple let bindings (\eg a = b).
+
+This transformation is not needed to get into normal form, but makes the
+resulting \small{VHDL} a lot shorter.
+
+\starttrans
+letnonrec
+  a = b
+in
+  M
+-----------------
+M[b/a]
+\stoptrans
+
+\starttrans
+letrec
+  \vdots
+  a = b
+  \vdots
+in
+  M
+-----------------
+let
+  \vdots [b/a]
+  \vdots [b/a]
+in
+  M[b/a]
+\stoptrans
+
+\subsection{Unused let binding removal}
+This transformation removes let bindings that are never used. Usually,
+the desugarer introduces some unused let bindings.
+
+This normalization pass should really be unneeded to get into normal form
+(since ununsed bindings are not forbidden by the normal form), but in practice
+the desugarer or simplifier emits some unused bindings that cannot be
+normalized (e.g., calls to a \type{PatError} (TODO: Check this name)). Also,
+this transformation makes the resulting \small{VHDL} a lot shorter.
+
+\starttrans
+let a = E in M
+----------------------------    \lam{a} does not occur free in \lam{M}
+M
+\stoptrans
+
+\starttrans
+letrec
+  \vdots
+  a = E
+  \vdots
+in
+  M
+----------------------------    \lam{a} does not occur free in \lam{M}
+letrec
+  \vdots
+  \vdots
+in
+  M
+\stoptrans
+
+\subsection{Non-representable binding inlining}
+This transform inlines let bindings that have a non-representable type. Since
+we can never generate a signal assignment for these bindings (we cannot
+declare a signal assignment with a non-representable type, for obvious
+reasons), we have no choice but to inline the binding to remove it.
+
+If the binding is non-representable because it is a lambda abstraction, it is
+likely that it will inlined into an application and Î²-reduction will remove
+the lambda abstraction and turn it into a representable expression at the
+inline site. The same holds for partial applications, which can be turned into
+full applications by inlining.
+
+Other cases of non-representable bindings we see in practice are primitive
+Haskell types. In most cases, these will not result in a valid normalized
+output, but then the input would have been invalid to start with. There is one
+exception to this: When a builtin function is applied to a non-representable
+expression, things might work out in some cases. For example, when you write a
+literal \hs{SizedInt} in Haskell, like \hs{1 :: SizedInt D8}, this results in
+the following core: \lam{fromInteger (smallInteger 10)}, where for example
+\lam{10 :: GHC.Prim.Int\#} and \lam{smallInteger 10 :: Integer} have
+non-representable types. TODO: This/these paragraph(s) should probably become a
+separate discussion somewhere else.
+
+\starttrans
+letnonrec a = E in M
+--------------------------    \lam{E} has a non-representable type.
+M[E/a]
+\stoptrans
+
+\starttrans
+letrec 
+  \vdots
+  a = E
+  \vdots
+in
+  M
+--------------------------    \lam{E} has a non-representable type.
+letrec
+  \vdots [E/a]
+  \vdots [E/a]
+in
+  M[E/a]
+\stoptrans
 
 \startbuffer[from]
-let a = (case x of 
-           True -> id
-           False -> neg
-        ) 1
-    b = (let y = 3 in add y) 2
+letrec
+  a = smallInteger 10
+  inc = Î»a -> add a 1
+  inc' = add 1
+  x = fromInteger a 
+in
+  inc (inc' x)
+\stopbuffer
+
+\startbuffer[to]
+letrec
+  x = fromInteger (smallInteger 10)
+in
+  (Î»a -> add a 1) (add 1 x)
+\stopbuffer
+
+\transexample{Let flattening}{from}{to}
+
+\subsection{Compiler generated top level binding inlining}
+TODO
+
+\subsection{Scrutinee simplification}
+This transform ensures that the scrutinee of a case expression is always
+a simple variable reference.
+
+\starttrans
+case E of
+  alts
+-----------------        \lam{E} is not a local variable reference
+let x = E in 
+  case E of
+    alts
+\stoptrans
+
+\startbuffer[from]
+case (foo a) of
+  True -> a
+  False -> b
+\stopbuffer
+
+\startbuffer[to]
+let x = foo a in
+  case x of
+    True -> a
+    False -> b
+\stopbuffer
+
+\transexample{Let flattening}{from}{to}
+
+
+\subsection{Case simplification}
+This transformation ensures that all case expressions become normal form. This
+means they will become one of:
+\startitemize
+\item An extractor case with a single alternative that picks a single field
+from a datatype, \eg \lam{case x of (a, b) -> a}.
+\item A selector case with multiple alternatives and only wild binders, that
+makes a choice between expressions based on the constructor of another
+expression, \eg \lam{case x of Low -> a; High -> b}.
+\stopitemize
+
+\starttrans
+case E of
+  C0 v0,0 ... v0,m -> E0
+  \vdots
+  Cn vn,0 ... vn,m -> En
+--------------------------------------------------- \forall i \forall j, 0 <= i <= n, 0 <= i < m (\lam{wi,j} is a wild (unused) binder)
+letnonrec
+  v0,0 = case x of C0 v0,0 .. v0,m -> v0,0
+  \vdots
+  v0,m = case x of C0 v0,0 .. v0,m -> v0,m
+  x0 = E0
+  \dots
+  vn,m = case x of Cn vn,0 .. vn,m -> vn,m
+  xn = En
+in
+  case E of
+    C0 w0,0 ... w0,m -> x0
+    \vdots
+    Cn wn,0 ... wn,m -> xn
+\stoptrans
+
+TODO: This transformation specified like this is complicated and misses
+conditions to prevent looping with itself. Perhaps we should split it here for
+discussion?
+
+\startbuffer[from]
+case a of
+  True -> add b 1
+  False -> add b 2
+\stopbuffer
+
+\startbuffer[to]
+letnonrec
+  x0 = add b 1
+  x1 = add b 2
 in
-  (Î»z.add 1 z) 3
+  case a of
+    True -> x0
+    False -> x1
 \stopbuffer
 
+\transexample{Selector case simplification}{from}{to}
+
+\startbuffer[from]
+case a of
+  (,) b c -> add b c
+\stopbuffer
 \startbuffer[to]
-let a = case x of 
-           True -> id 1
-           False -> neg 1
-    b = let y = 3 in add y 2
+letnonrec
+  b = case a of (,) b c -> b
+  c = case a of (,) b c -> c
+  x0 = add b c
 in
-  add 1 3
+  case a of
+    (,) w0 w1 -> x0
 \stopbuffer
 
-\transexample{Extended Î²-reduction}{from}{to}
+\transexample{Extractor case simplification}{from}{to}
+
+\subsection{Case removal}
+This transform removes any case statements with a single alternative and
+only wild binders.
+
+These "useless" case statements are usually leftovers from case simplification
+on extractor case (see the previous example).
+
+\starttrans
+case x of
+  C v0 ... vm -> E
+----------------------     \lam{\forall i, 0 <= i <= m} (\lam{vi} does not occur free in E)
+E
+\stoptrans
+
+\startbuffer[from]
+case a of
+  (,) w0 w1 -> x0
+\stopbuffer
+
+\startbuffer[to]
+x0
+\stopbuffer
+
+\transexample{Case removal}{from}{to}
 
 \subsection{Argument simplification}
 The transforms in this section deal with simplifying application
@@ -257,8 +1208,9 @@ arguments into normal form. The goal here is to:
 \startitemize
  \item Make all arguments of user-defined functions (\eg, of which
  we have a function body) simple variable references of a runtime
- representable type.
- \item Make all arguments of builtin functions either:
+ representable type. This is needed, since these applications will be turned
+ into component instantiations.
+ \item Make all arguments of builtin functions one of:
    \startitemize
     \item A type argument.
     \item A dictionary argument.
@@ -271,7 +1223,7 @@ arguments into normal form. The goal here is to:
 When looking at the arguments of a user-defined function, we can
 divide them into two categories:
 \startitemize
-  \item Arguments with a runtime representable type (\eg bits or vectors).
+  \item Arguments of a runtime representable type (\eg bits or vectors).
 
         These arguments can be preserved in the program, since they can
         be translated to input ports later on.  However, since we can
@@ -283,16 +1235,31 @@ divide them into two categories:
         
         These arguments cannot be preserved in the program, since we
         cannot represent them as input or output ports in the resulting
-        VHDL. To remove them, we create a specialized version of the
+        \small{VHDL}. To remove them, we create a specialized version of the
         called function with these arguments filled in. This is done by
         the argument propagation transform.
+
+        Typically, these arguments are type and dictionary arguments that are
+        used to make functions polymorphic. By propagating these arguments, we
+        are essentially doing the same which GHC does when it specializes
+        functions: Creating multiple variants of the same function, one for
+        each type for which it is used. Other common non-representable
+        arguments are functions, e.g. when calling a higher order function
+        with another function or a lambda abstraction as an argument.
+
+        The reason for doing this is similar to the reasoning provided for
+        the inlining of non-representable let bindings above. In fact, this
+        argument propagation could be viewed as a form of cross-function
+        inlining.
 \stopitemize
 
+TODO: Check the following itemization.
+
 When looking at the arguments of a builtin function, we can divide them
 into categories: 
 
 \startitemize
-  \item Arguments with a runtime representable type.
+  \item Arguments of a runtime representable type.
         
         As we have seen with user-defined functions, these arguments can
         always be reduced to a simple variable reference, by the
@@ -301,7 +1268,7 @@ into categories:
         functions can be limited to signal references, instead of
         needing to support all possible expressions.
 
-  \item Arguments with a function type.
+  \item Arguments of a function type.
         
         These arguments are functions passed to higher order builtins,
         like \lam{map} and \lam{foldl}. Since implementing these
@@ -370,9 +1337,10 @@ into categories:
         categories instead.
 \stopitemize
 
-\subsubsection{Argument extraction}
+\subsubsection{Argument simplification}
 This transform deals with arguments to functions that
-are of a runtime representable type. 
+are of a runtime representable type. It ensures that they will all become
+references to global variables, or local signals in the resulting \small{VHDL}. 
 
 TODO: It seems we can map an expression to a port, not only a signal.
 Perhaps this makes this transformation not needed?
@@ -385,16 +1353,21 @@ a new let expression around the application, which binds the complex
 expression to a new variable. The original function is then applied to
 this variable.
 
-%\transform{Argument extract}
-%{
-%\lam{Y} is of a hardware representable type
-%
-%\lam{Y} is not a variable referene
-%
-%\conclusion
-%
-%\trans{X Y}{let z = Y in X z}
-%}
+\starttrans
+M N
+--------------------    \lam{N} is of a representable type
+let x = N in M x        \lam{N} is not a local variable reference
+\stoptrans
+
+\startbuffer[from]
+add (add a 1) 1
+\stopbuffer
+
+\startbuffer[to]
+let x = add a 1 in add x 1
+\stopbuffer
+
+\transexample{Argument extraction}{from}{to}
 
 \subsubsection{Function extraction}
 This transform deals with function-typed arguments to builtin functions.
@@ -406,24 +1379,36 @@ parameters to the new global function. The original argument is replaced
 with a reference to the new function, applied to any free variables from
 the original argument.
 
-%\transform{Function extraction}
-%{
-%\lam{X} is a (partial application of) a builtin function
-%
-%\lam{Y} is not an application
-%
-%\lam{Y} is not a variable reference
-%
-%\conclusion
-%
-%\lam{f0 ... fm} = free local vars of \lam{Y}
-%
-%\lam{y} is a new global variable
-%
-%\lam{y = Î»f0 ... fn.Y}
-%
-%\trans{X Y}{X (y f0 ... fn)}
-%}
+This transformation is useful when applying higher order builtin functions
+like \hs{map} to a lambda abstraction, for example. In this case, the code
+that generates \small{VHDL} for \hs{map} only needs to handle top level functions and
+partial applications, not any other expression (such as lambda abstractions or
+even more complicated expressions).
+
+\starttrans
+M N                     \lam{M} is a (partial aplication of a) builtin function.
+---------------------   \lam{f0 ... fn} = free local variables of \lam{N}
+M x f0 ... fn           \lam{N :: a -> b}
+~                       \lam{N} is not a (partial application of) a top level function
+x = Î»f0 ... Î»fn.N
+\stoptrans
+
+\startbuffer[from]
+map (Î»a . add a b) xs
+
+map (add b) ys
+\stopbuffer
+
+\startbuffer[to]
+x0 = Î»b.Î»a.add a b
+~
+map x0 xs
+
+x1 = Î»b.add b
+map x1 ys
+\stopbuffer
+
+\transexample{Function extraction}{from}{to}
 
 \subsubsection{Argument propagation}
 This transform deals with arguments to user-defined functions that are
@@ -463,327 +1448,67 @@ cannot be brought into normal form by this transform. We rely on an
 inlining transformation to replace such a variable with an expression we
 can propagate again.
 
-TODO: Move these definitions somewhere sensible.
-
-Definition: A global variable is any variable that is bound at the
-top level of a program. A local variable is any other variable.
-
-Definition: A hardware representable type is a type that we can generate
-a signal for in hardware. For example, a bit, a vector of bits, a 32 bit
-unsigned word, etc. Types that are not runtime representable notably
-include (but are not limited to): Types, dictionaries, functions.
-
-Definition: A builtin function is a function for which a builtin
-hardware translation is available, because its actual definition is not
-translatable. A user-defined function is any other function.
-
 \starttrans
 x = E
 ~
 x Y0 ... Yi ... Yn                               \lam{Yi} is not of a runtime representable type
 ---------------------------------------------    \lam{Yi} is not a local variable reference
-x' = Î»y0 ... yi-1 f0 ... fm yi+1 ... yn .        \lam{f0 ... fm} = free local vars of \lam{Y_i}
-      E y0 ... yi-1 Yi yi+1 ... yn   
+x' y0 ... yi-1 f0 ...  fm Yi+1 ... Yn            \lam{f0 ... fm} = free local vars of \lam{Yi}
 ~
-x' y0 ... yi-1 f0 ...  fm Yi+1 ... Yn
-\stoptrans
-
-%\transform{Argument propagation}
-%{
-%\lam{x} is a global variable, bound to a user-defined function
-%
-%\lam{x = E}
-%
-%\lam{Y_i} is not of a runtime representable type
-%
-%\lam{Y_i} is not a local variable reference
-%
-%\conclusion
-%
-%\lam{f0 ... fm} = free local vars of \lam{Y_i}
-%
-%\lam{x'} is a new global variable
-%
-%\lam{x' = Î»y0 ... yi-1 f0 ... fm yi+1 ... yn . E y0 ... yi-1 Yi yi+1 ... yn}
-%
-%\trans{x Y0 ... Yi ... Yn}{x' y0 ... yi-1 f0 ... fm Yi+1 ... Yn}
-%}
-%
-%TODO: The above definition looks too complicated... Can we find
-%something more concise?
-
-\subsection{Cast propagation}
-This transform pushes casts down into the expression as far as possible.
-\subsection{Let recursification}
-This transform makes all lets recursive.
-\subsection{Let simplification}
-This transform makes the result value of all let expressions a simple
-variable reference.
-\subsection{Let flattening}
-This transform turns two nested lets (\lam{let x = (let ... in ...) in
-...}) into a single let.
-\subsection{Simple let binding removal}
-This transforms inlines simple let bindings (\eg a = b).
-\subsection{Function inlining}
-This transform inlines let bindings of a funtion type. TODO: This should
-be generelized to anything that is non representable at runtime, or
-something like that.
-\subsection{Scrutinee simplification}
-This transform ensures that the scrutinee of a case expression is always
-a simple variable reference.
-\subsection{Case binder wildening}
-This transform replaces all binders of a each case alternative with a
-wild binder (\ie, one that is never referred to). This will possibly
-introduce a number of new "selector" case statements, that only select
-one element from an algebraic datatype and bind it to a variable.
-\subsection{Case value simplification}
-This transform simplifies the result value of each case alternative by
-binding the value in a let expression and replacing the value by a
-simple variable reference.
-\subsection{Case removal}
-This transform removes any case statements with a single alternative and
-only wild binders.
-
-\subsection{Example sequence}
-
-This section lists an example expression, with a sequence of transforms
-applied to it. The exact transforms given here probably don't exactly
-match the transforms given above anymore, but perhaps this can clarify
-the big picture a bit.
-
-TODO: Update or remove this section.
-
-\startlambda
-  Î»x.
-    let s = foo x
-    in
-      case s of
-        (a, b) ->
-          case a of
-            High -> add
-            Low -> let
-              op' = case b of
-                High -> sub
-                Low  -> Î»c.Î»d.c
-              in
-                Î»c.Î»d.op' d c
-\stoplambda
-
-After top-level Î·-abstraction:
-
-\startlambda
-  Î»x.Î»c.Î»d.
-    (let s = foo x
-    in
-      case s of
-        (a, b) ->
-          case a of
-            High -> add
-            Low -> let
-              op' = case b of
-                High -> sub
-                Low  -> Î»c.Î»d.c
-              in
-                Î»c.Î»d.op' d c
-    ) c d
-\stoplambda
-
-After (extended) Î²-reduction:
-
-\startlambda
-  Î»x.Î»c.Î»d.
-    let s = foo x
-    in
-      case s of
-        (a, b) ->
-          case a of
-            High -> add c d
-            Low -> let
-              op' = case b of
-                High -> sub
-                Low  -> Î»c.Î»d.c
-              in
-                op' d c
-\stoplambda
-
-After return value extraction:
-
-\startlambda
-  Î»x.Î»c.Î»d.
-    let s = foo x
-        r = case s of
-              (a, b) ->
-                case a of
-                  High -> add c d
-                  Low -> let
-                    op' = case b of
-                      High -> sub
-                      Low  -> Î»c.Î»d.c
-                    in
-                      op' d c
-    in
-      r
-\stoplambda
-
-Scrutinee simplification does not apply.
-
-After case binder wildening:
-
-\startlambda
-  Î»x.Î»c.Î»d.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        r = case s of (_, _) ->
-              case a of
-                High -> add c d
-                Low -> let op' = case b of
-                             High -> sub
-                             Low  -> Î»c.Î»d.c
-                       in
-                         op' d c
-    in
-      r
-\stoplambda
-
-After case value simplification
-
-\startlambda
-  Î»x.Î»c.Î»d.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        r = case s of (_, _) -> r'
-        rh = add c d
-        rl = let rll = Î»c.Î»d.c
-                 op' = case b of
-                   High -> sub
-                   Low  -> rll
-             in
-               op' d c
-        r' = case a of
-               High -> rh
-               Low -> rl
-    in
-      r
-\stoplambda
-
-After let flattening:
-
-\startlambda
-  Î»x.Î»c.Î»d.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        r = case s of (_, _) -> r'
-        rh = add c d
-        rl = op' d c
-        rll = Î»c.Î»d.c
-        op' = case b of
-          High -> sub
-          Low  -> rll
-        r' = case a of
-               High -> rh
-               Low -> rl
-    in
-      r
-\stoplambda
+x' = Î»y0 ... yi-1 f0 ... fm yi+1 ... yn .       
+      E y0 ... yi-1 Yi yi+1 ... yn   
 
-After function inlining:
+\stoptrans
 
-\startlambda
-  Î»x.Î»c.Î»d.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        r = case s of (_, _) -> r'
-        rh = add c d
-        rl = (case b of
-          High -> sub
-          Low  -> Î»c.Î»d.c) d c
-        r' = case a of
-          High -> rh
-          Low -> rl
-    in
-      r
-\stoplambda
+TODO: Example
 
-After (extended) Î²-reduction again:
+\subsection{Cast propagation / simplification}
+This transform pushes casts down into the expression as far as possible. Since
+its exact role and need is not clear yet, this transformation is not yet
+specified.
 
-\startlambda
-  Î»x.Î»c.Î»d.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        r = case s of (_, _) -> r'
-        rh = add c d
-        rl = case b of
-          High -> sub d c
-          Low  -> d
-        r' = case a of
-          High -> rh
-          Low -> rl
-    in
-      r
-\stoplambda
+\subsection{Return value simplification}
+This transformation ensures that the return value of a function is always a
+simple local variable reference.
 
-After case value simplification again:
+Currently implemented using lambda simplification, let simplification, and
+top simplification. Should change into something like the following, which
+works only on the result of a function instead of any subexpression. This is
+achieved by the contexts, like \lam{x = E}, though this is strictly not
+correct (you could read this as "if there is any function \lam{x} that binds
+\lam{E}, any \lam{E} can be transformed, while we only mean the \lam{E} that
+is bound by \lam{x}. This might need some extra notes or something).
 
-\startlambda
-  Î»x.Î»c.Î»d.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        r = case s of (_, _) -> r'
-        rh = add c d
-        rlh = sub d c
-        rl = case b of
-          High -> rlh
-          Low  -> d
-        r' = case a of
-          High -> rh
-          Low -> rl
-    in
-      r
-\stoplambda
+\starttrans
+x = E                            \lam{E} is representable
+~                                \lam{E} is not a lambda abstraction
+E                                \lam{E} is not a let expression
+---------------------------      \lam{E} is not a local variable reference
+let x = E in x
+\stoptrans
 
-After case removal:
+\starttrans
+x = Î»v0 ... Î»vn.E
+~                                \lam{E} is representable
+E                                \lam{E} is not a let expression
+---------------------------      \lam{E} is not a local variable reference
+let x = E in x
+\stoptrans
 
-\startlambda
-  Î»x.Î»c.Î»d.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        r = r'
-        rh = add c d
-        rlh = sub d c
-        rl = case b of
-          High -> rlh
-          Low  -> d
-        r' = case a of
-          High -> rh
-          Low -> rl
-    in
-      r
-\stoplambda
+\starttrans
+x = Î»v0 ... Î»vn.let ... in E
+~                                \lam{E} is representable
+E                                \lam{E} is not a local variable reference
+---------------------------
+let x = E in x
+\stoptrans
 
-After let bind removal:
+\startbuffer[from]
+x = add 1 2
+\stopbuffer
 
-\startlambda
-  Î»x.Î»c.Î»d.
-    let s = foo x
-        a = case s of (a, _) -> a
-        b = case s of (_, b) -> b
-        rh = add c d
-        rlh = sub d c
-        rl = case b of
-          High -> rlh
-          Low  -> d
-        r' = case a of
-          High -> rh
-          Low -> rl
-    in
-      r'
-\stoplambda
+\startbuffer[to]
+x = let x = add 1 2 in x
+\stopbuffer
 
-Application simplification is not applicable.
+\transexample{Return value simplification}{from}{to}