Add a section on binder uniqueness.

[matthijs/master-project/report.git] / Chapters / Normalization.tex
diff --git a/Chapters/Normalization.tex b/Chapters/Normalization.tex

index f64f7909aab0fbedf59e4d5c5a0c7d2777a5f927..a2ccfa4c9b6111f27acf6a23d486244d1d0c7b1b 100644 (file)
--- a/Chapters/Normalization.tex
+++ b/Chapters/Normalization.tex
@@ -35,15 +35,15 @@
  %%  \stopcombination
  %}
  
-The first step in the core to VHDL translation process, is normalization. We
+The first step in the core to \small{VHDL} translation process, is normalization. We
  aim to bring the core description into a simpler form, which we can
-subsequently translate into VHDL easily. This normal form is needed because
-the full core language is more expressive than VHDL in some areas and because
+subsequently translate into \small{VHDL} easily. This normal form is needed because
+the full core language is more expressive than \small{VHDL} in some areas and because
  core can describe expressions that do not have a direct hardware
  interpretation.
  
-TODO: Describe core properties not supported in VHDL, and describe how the
-VHDL we want to generate should look like.
+TODO: Describe core properties not supported in \small{VHDL}, and describe how the
+\small{VHDL} we want to generate should look like.
  
  \section{Normal form}
  The transformations described here have a well-defined goal: To bring the
@@ -73,10 +73,10 @@ describing the things we want to not have in a normal form.
    generate a hardware signal that contains a function, so all values,
    arguments and returns values used must be first order.
  
-  \item Any complex \emph{nested scopes} must be removed. In the VHDL
+  \item Any complex \emph{nested scopes} must be removed. In the \small{VHDL}
    description, every signal is in a single scope. Also, full expressions are
    not supported everywhere (in particular port maps can only map signal names,
-  not expressions). To make the VHDL generation easy, all values must be bound
+  not expressions). To make the \small{VHDL} generation easy, all values must be bound
    on the \quote{top level}.
  \stopitemize
  
@@ -215,7 +215,13 @@ alu = λopcode.λa.λb.
  
  As a more complete example, consider \in{example}[ex:NormalComplete]. This
  example contains everything that is supported in normal form, with the
-exception of builtin higher order functions.
+exception of builtin higher order functions. The graphical version of the
+architecture contains a slightly simplified version, since the state tuple
+packing and unpacking have been left out. Instead, two seperate registers are
+drawn. Also note that most synthesis tools will further optimize this
+architecture by removing the multiplexers at the register input and replace
+them with some logic in the clock inputs, but we want to show the architecture
+as close to the description as possible.
  
  \startbuffer[NormalComplete]
    regbank :: Bit 
@@ -364,7 +370,7 @@ the output port. Most function applications bound by the let expression
  define a component instantiation, where the input and output ports are mapped
  to local signals or arguments. Some of the others use a builtin
  construction (\eg the \lam{case} statement) or call a builtin function
-(\eg \lam{add} or \lam{sub}). For these, a hardcoded VHDL translation is
+(\eg \lam{add} or \lam{sub}). For these, a hardcoded \small{VHDL} translation is
  available.
  
  \subsection{Definitions}
@@ -459,6 +465,100 @@ this notation are still a bit fuzzy, so comments are welcom.
  TODO: Formally describe the "apply to every (sub)expression" in terms of
  rules with full transformations in the conditions.
  
+\subsection{Binder uniqueness}
+A common problem in transformation systems, is binder uniqueness. When not
+considering this problem, it is easy to create transformations that mix up
+bindings and cause name collisions. Take for example, the following core
+expression:
+
+\startlambda
+(λa.λb.λc. a * b * c) x c
+\stoplambda
+
+By applying β-reduction (see below) once, we can simplify this expression to:
+
+\startlambda
+(λb.λc. x * b * c) c
+\stoplambda
+
+Now, we have replaced the \lam{a} binder with a reference to the \lam{x}
+binder. No harm done here. But note that we see multiple occurences of the
+\lam{c} binder. The first is a binding occurence, to which the second refers.
+The last, however refers to \emph{another} instance of \lam{c}, which is
+bound somewhere outside of this expression. Now, if we would apply beta
+reduction without taking heed of binder uniqueness, we would get:
+
+\startlambda
+λc. x * c * c
+\stoplambda
+
+This is obviously not what was supposed to happen! The root of this problem is
+the reuse of binders: Identical binders can be bound in different scopes, such
+that only the inner one is \quote{visible} in the inner expression. In the example
+above, the \lam{c} binder was bound outside of the expression and in the inner
+lambda expression. Inside that lambda expression, only the inner \lam{c} is
+visible.
+
+There are a number of ways to solve this. \small{GHC} has isolated this
+problem to their binder substitution code, which performs \emph{deshadowing}
+during its expression traversal. This means that any binding that shadows
+another binding on a higher level is replaced by a new binder that does not
+shadow any other binding. This non-shadowing invariant is enough to prevent
+binder uniqueness problems in \small{GHC}.
+
+In our transformation system, maintaining this non-shadowing invariant is
+a bit harder to do (mostly due to implementation issues, the prototype doesn't
+use \small{GHC}'s subsitution code). Also, we can observe the following
+points.
+
+\startitemize
+\item Deshadowing does not guarantee overall uniqueness. For example, the
+following (slightly contrived) expression shows the identifier \lam{x} bound in
+two seperate places (and to different values), even though no shadowing
+occurs.
+
+\startlambda
+(let x = 1 in x) + (let x = 2 in x)
+\stoplambda
+
+\item In our normal form (and the resulting \small{VHDL}), all binders
+(signals) will end up in the same scope. To allow this, all binders within the
+same function should be unique.
+
+\item When we know that all binders in an expression are unique, moving around
+or removing a subexpression will never cause any binder conflicts. If we have
+some way to generate fresh binders, introducing new subexpressions will not
+cause any problems either. The only way to cause conflicts is thus to
+duplicate an existing subexpression.
+\stopitemize
+
+Given the above, our prototype maintains a unique binder invariant. This
+meanst that in any given moment during normalization, all binders \emph{within
+a single function} must be unique. To achieve this, we apply the following
+technique.
+
+TODO: Define fresh binders and unique supplies
+
+\startitemize
+\item Before starting normalization, all binders in the function are made
+unique. This is done by generating a fresh binder for every binder used. This
+also replaces binders that did not pose any conflict, but it does ensure that
+all binders within the function are generated by the same unique supply. See
+(TODO: ref fresh binder).
+\item Whenever a new binder must be generated, we generate a fresh binder that
+is guaranteed to be different from \emph{all binders generated so far}. This
+can thus never introduce duplication and will maintain the invariant.
+\item Whenever (part of) an expression is duplicated (for example when
+inlining), all binders in the expression are replaced with fresh binders
+(using the same method as at the start of normalization). These fresh binders
+can never introduce duplication, so this will maintain the invariant.
+\item Whenever we move part of an expression around within the function, there
+is no need to do anything special. There is obviously no way to introduce
+duplication by moving expressions around. Since we know that each of the
+binders is already unique, there is no way to introduce (incorrect) shadowing
+either.
+\stopitemize
+
  \subsection{η-abstraction}
  This transformation makes sure that all arguments of a function-typed
  expression are named, by introducing lambda expressions. When combined with
@@ -629,7 +729,7 @@ M
  This transformation inlines simple let bindings (\eg a = b).
  
  This transformation is not needed to get into normal form, but makes the
-resulting VHDL a lot shorter.
+resulting \small{VHDL} a lot shorter.
  
  \starttrans
  letnonrec
@@ -663,7 +763,7 @@ This normalization pass should really be unneeded to get into normal form
  (since ununsed bindings are not forbidden by the normal form), but in practice
  the desugarer or simplifier emits some unused bindings that cannot be
  normalized (e.g., calls to a \type{PatError} (TODO: Check this name)). Also,
-this transformation makes the resulting VHDL a lot shorter.
+this transformation makes the resulting \small{VHDL} a lot shorter.
  
  \starttrans
  let a = E in M
@@ -910,7 +1010,7 @@ divide them into two categories:
          
          These arguments cannot be preserved in the program, since we
          cannot represent them as input or output ports in the resulting
-        VHDL. To remove them, we create a specialized version of the
+        \small{VHDL}. To remove them, we create a specialized version of the
          called function with these arguments filled in. This is done by
          the argument propagation transform.
  
@@ -1015,7 +1115,7 @@ into categories:
  \subsubsection{Argument simplification}
  This transform deals with arguments to functions that
  are of a runtime representable type. It ensures that they will all become
-references to global variables, or local signals in the resulting VHDL. 
+references to global variables, or local signals in the resulting \small{VHDL}. 
  
  TODO: It seems we can map an expression to a port, not only a signal.
  Perhaps this makes this transformation not needed?
@@ -1056,7 +1156,7 @@ the original argument.
  
  This transformation is useful when applying higher order builtin functions
  like \hs{map} to a lambda abstraction, for example. In this case, the code
-that generates VHDL for \hs{map} only needs to handle top level functions and
+that generates \small{VHDL} for \hs{map} only needs to handle top level functions and
  partial applications, not any other expression (such as lambda abstractions or
  even more complicated expressions).