Add note about β-reduction for type variables.

[matthijs/master-project/report.git] / Chapters / Normalization.tex
diff --git a/Chapters/Normalization.tex b/Chapters/Normalization.tex

index ed55016047d2f4c8e33cd1bddb89ef4d8962c080..cdd57bb9c52da6a0d885ae2cadf02d114f76f548 100644 (file)
--- a/Chapters/Normalization.tex
+++ b/Chapters/Normalization.tex
@@ -325,6 +325,7 @@
        Some clauses have an expression listed in parentheses. These are conditions
        that need to apply to the clause.
  
+      \defref{intended normal form definition}
        \todo{Fix indentation}
        \startlambda
        \italic{normal} = \italic{lambda}
@@ -800,6 +801,7 @@
         normal form.
  
        \subsubsection[sec:normalization:beta]{β-reduction}
+        \defref{beta-reduction}
          β-reduction is a well known transformation from lambda calculus, where it is
          the main reduction step. It reduces applications of lambda abstractions,
          removing both the lambda abstraction and the application.
@@ -810,6 +812,11 @@
          sure that most lambda abstractions will eventually be reducable by
          β-reduction.
  
+        Note that β-reduction also works on type lambda abstractions and type
+        applications as well. This means the substitution below also works on
+        type variables, in the case that the binder is a type variable and teh
+        expression applied to is a type.
+
          \starttrans
          (λx.E) M
          -----------------
@@ -827,6 +834,17 @@
  
          \transexample{beta}{β-reduction}{from}{to}
  
+        \startbuffer[from]
+        (λt.λa::t. a) @Int
+        \stopbuffer
+
+        \startbuffer[to]
+        (λa::Int. a)
+        \stopbuffer
+
+        \transexample{beta-type}{β-reduction for type abstractions}{from}{to}
+
+
        \subsubsection{Empty let removal}
          This transformation is simple: It removes recursive lets that have no bindings
          (which usually occurs when unused let binding removal removes the last
@@ -951,20 +969,22 @@
  
          \transexample{toplevelinline}{Top level binding inlining}{from}{to}
         
-        Example \in{ex:trans:toplevelinline} shows a typical application of
+        \in{Example}[ex:trans:toplevelinline] shows a typical application of
          the addition operator generated by \GHC. The type and dictionary
          arguments used here are described in
-        \in{section:prototype:polymorphism}.
-
-        Without this transformation, there would be a (+) entity in the
-        architecture which would just add its inputs. This generates a lot of
-        overhead in the VHDL, which is particularly annoying when browsing the
-        generated RTL schematic (especially since + is not allowed in VHDL
-        architecture names\footnote{Technically, it is allowed to use
-        non-alphanumerics when using extended identifiers, but it seems that
-        none of the tooling likes extended identifiers in filenames, so it
-        effectively doesn't work}, so the entity would be called
-        \quote{w7aA7f} or something similarly unreadable and autogenerated).
+        \in{Section}[section:prototype:polymorphism].
+
+        Without this transformation, there would be a \lam{(+)} entity
+        in the \VHDL which would just add its inputs. This generates a
+        lot of overhead in the \VHDL, which is particularly annoying
+        when browsing the generated RTL schematic (especially since most
+        non-alphanumerics, like all characters in \lam{(+)}, are not
+        allowed in \VHDL architecture names\footnote{Technically, it is
+        allowed to use non-alphanumerics when using extended
+        identifiers, but it seems that none of the tooling likes
+        extended identifiers in filenames, so it effectively doesn't
+        work.}, so the entity would be called \quote{w7aA7f} or
+        something similarly unreadable and autogenerated).
  
      \subsection{Program structure}
        These transformations are aimed at normalizing the overall structure
@@ -1380,7 +1400,7 @@
  
          \transexample{argextract}{Argument extraction}{from}{to}
        
-      \subsubsection{Function extraction}
+      \subsubsection[sec:normalization:funextract]{Function extraction}
          \todo{Move to section about builtin functions}
          This transform deals with function-typed arguments to builtin
          functions.  Since builtin functions cannot be specialized to remove
@@ -1428,64 +1448,7 @@
  
          Note that \lam{x0} and {x1} will still need normalization after this.
  
-      \subsubsection{Argument propagation}
-        \todo{Rename this section to specialization and move it into a
-        separate section}
-
-        This transform deals with arguments to user-defined functions that are
-        not representable at runtime. This means these arguments cannot be
-        preserved in the final form and most be {\em propagated}.
-
-        Propagation means to create a specialized version of the called
-        function, with the propagated argument already filled in. As a simple
-        example, in the following program:
-
-        \startlambda
-        f = λa.λb.a + b
-        inc = λa.f a 1
-        \stoplambda
-
-        We could {\em propagate} the constant argument 1, with the following
-        result:
-
-        \startlambda
-        f' = λa.a + 1
-        inc = λa.f' a
-        \stoplambda
-
-        Special care must be taken when the to-be-propagated expression has any
-        free variables. If this is the case, the original argument should not be
-        removed completely, but replaced by all the free variables of the
-        expression. In this way, the original expression can still be evaluated
-        inside the new function. Also, this brings us closer to our goal: All
-        these free variables will be simple variable references.
-
-        To prevent us from propagating the same argument over and over, a simple
-        local variable reference is not propagated (since is has exactly one
-        free variable, itself, we would only replace that argument with itself).
-
-        This shows that any free local variables that are not runtime representable
-        cannot be brought into normal form by this transform. We rely on an
-        inlining transformation to replace such a variable with an expression we
-        can propagate again.
-
-        \starttrans
-        x = E
-        ~
-        x Y0 ... Yi ... Yn                               \lam{Yi} is not of a runtime representable type
-        ---------------------------------------------    \lam{Yi} is not a local variable reference
-        x' y0 ... yi-1 f0 ...  fm Yi+1 ... Yn            \lam{f0 ... fm} are all free local vars of \lam{Yi}
-        ~
-        x' = λy0 ... λyi-1. λf0 ... λfm. λyi+1 ... λyn .       
-              E y0 ... yi-1 Yi yi+1 ... yn   
-        \stoptrans
-
-        \todo{Describe what the formal specification means}
-        \todo{Note that we don't change the sepcialised function body, only
-        wrap it}
-
-        \todo{Example}
-
+      \todo{Fill the gap left by moving argument propagation away}
  
      \subsection{Case normalisation}
        \subsubsection{Scrutinee simplification}
@@ -1629,20 +1592,58 @@
  
          \transexample{caserem}{Case removal}{from}{to}
  
-  \todo{Move these two sections somewhere? Perhaps not?}
-  \subsection{Removing polymorphism}
-    Reference type-specialization (== argument propagation)
+    \subsection{Removing unrepresentable values}
+      The transformations in this section are aimed at making all the
+      values used in our expression representable. There are two main
+      transformations that are applied to \emph{all} unrepresentable let
+      bindings and function arguments, but these are really meant to
+      address three different kinds of unrepresentable values:
+      Polymorphic values, higher order values and literals. Each of these
+      will be detailed below, followed by the actual transformations.
+
+      \subsubsection{Removing Polymorphism}
+        As noted in \in{section}[sec:prototype:polymporphism],
+        polymorphism is made explicit in Core through type and
+        dictionary arguments. To remove the polymorphism from a
+        function, we can simply specialize the polymorphic function for
+        the particular type applied to it. The same goes for dictionary
+        arguments. To remove polymorphism from let bound values, we
+        simply inline the let bindings that have a polymorphic type,
+        which should (eventually) make sure that the polymorphic
+        expression is applied to a type and/or dictionary, which can
+        \refdef{beta-reduction}
+        then be removed by β-reduction.
+
+        Since both type and dictionary arguments are not representable,
+        \refdef{representable}
+        the non-representable argument specialization and
+        non-representable let binding inlining transformations below
+        take care of exactly this.
  
-    Reference polymporphic binding inlining (== non-representable binding
-    inlining).
+        There is one case where polymorphism cannot be completely
+        removed: Builtin functions are still allowed to be polymorphic
+        (Since we have no function body that we could properly
+        specialize). However, the code that generates \VHDL for builtin
+        functions knows how to handle this, so this is not a problem.
  
-  \subsection{Defunctionalization}
-    These transformations remove most higher order expressions from our
-    program, making it completely first-order (the only exception here is for
-    arguments to builtin functions, since we can't specialize builtin
-    function. \todo{Talk more about this somewhere}
+      \subsubsection{Defunctionalization}
+        These transformations remove higher order expressions from our
+        program, making all values first-order.
  
-    Reference higher-order-specialization (== argument propagation)
+        \todo{Finish this section}
+        
+        There is one case where higher order values cannot be completely
+        removed: Builtin functions are still allowed to have higher
+        order arguments (Since we have no function body that we could
+        properly specialize). These are limited to (partial applications
+        of) top level functions, however, which is handled by the
+        top-level function extraction (see
+        \in{section}[sec:normalization:funextract]). However, the code
+        that generates \VHDL for builtin functions knows how to handle
+        these, so this is not a problem.
+
+      \subsubsection{Literals}
+        \todo{Fill this section}
  
        \subsubsection{Non-representable binding inlining}
          \todo{Move this section into a new section (together with
@@ -1670,7 +1671,9 @@
          non-representable types. \todo{Expand on this. This/these paragraph(s)
          should probably become a separate discussion somewhere else}
  
-        \todo{Can this duplicate work?}
+        \todo{Can this duplicate work? -- For polymorphism probably, for
+        higher order expressions only if they are inlined before they
+        are themselves normalized.}
  
          \starttrans
          letrec 
@@ -1711,8 +1714,70 @@
  
          \transexample{nonrepinline}{Nonrepresentable binding inlining}{from}{to}
  
+      \subsubsection{Argument propagation}
+        \todo{Rename this section to specialization}
  
-  \section{Provable properties}
+        This transform deals with arguments to user-defined functions that are
+        not representable at runtime. This means these arguments cannot be
+        preserved in the final form and most be {\em propagated}.
+
+        Propagation means to create a specialized version of the called
+        function, with the propagated argument already filled in. As a simple
+        example, in the following program:
+
+        \startlambda
+        f = λa.λb.a + b
+        inc = λa.f a 1
+        \stoplambda
+
+        We could {\em propagate} the constant argument 1, with the following
+        result:
+
+        \startlambda
+        f' = λa.a + 1
+        inc = λa.f' a
+        \stoplambda
+
+        Special care must be taken when the to-be-propagated expression has any
+        free variables. If this is the case, the original argument should not be
+        removed completely, but replaced by all the free variables of the
+        expression. In this way, the original expression can still be evaluated
+        inside the new function. Also, this brings us closer to our goal: All
+        these free variables will be simple variable references.
+
+        To prevent us from propagating the same argument over and over, a simple
+        local variable reference is not propagated (since is has exactly one
+        free variable, itself, we would only replace that argument with itself).
+
+        This shows that any free local variables that are not runtime representable
+        cannot be brought into normal form by this transform. We rely on an
+        inlining transformation to replace such a variable with an expression we
+        can propagate again.
+
+        \starttrans
+        x = E
+        ~
+        x Y0 ... Yi ... Yn                               \lam{Yi} is not of a runtime representable type
+        ---------------------------------------------    \lam{Yi} is not a local variable reference
+        x' y0 ... yi-1 f0 ...  fm Yi+1 ... Yn            \lam{f0 ... fm} are all free local vars of \lam{Yi}
+        ~
+        x' = λy0 ... λyi-1. λf0 ... λfm. λyi+1 ... λyn .       
+              E y0 ... yi-1 Yi yi+1 ... yn   
+        \stoptrans
+
+        \todo{Describe what the formal specification means}
+        \todo{Note that we don't change the sepcialised function body, only
+        wrap it}
+        \todo{This does not take care of updating the types of y0 ...
+        yn. The code uses the types of Y0 ... Yn for this, regardless of
+        whether the type arguments were properly propagated...}
+
+        \todo{Example}
+
+
+
+
+  \section[sec:normalization:properties]{Provable properties}
      When looking at the system of transformations outlined above, there are a
      number of questions that we can ask ourselves. The main question is of course:
      \quote{Does our system work as intended?}. We can split this question into a
@@ -1743,6 +1808,21 @@
      three: The translator would still function properly without it.
      \stopitemize
  
+    Unfortunately, the final transformation system has only been
+    developed in the final part of the research, leaving no more time
+    for verifying these properties. In fact, it is likely that the
+    current transformation system still violates some of these
+    properties in some cases and should be improved (or extra conditions
+    on the input hardware descriptions should be formulated).
+
+    This is most likely the case with the completeness and determinism
+    properties, perhaps als the termination property. The soundness
+    property probably holds, since it is easier to manually verify (each
+    transformation can be reviewed separately).
+
+    Even though no complete proofs have been made, some ideas for
+    possible proof strategies are shown below.
+
      \subsection{Graph representation}
        Before looking into how to prove these properties, we'll look at our
        transformation system from a graph perspective. The nodes of the graph are
@@ -1813,7 +1893,9 @@
        Note that the normal form of such a system consists of the set of nodes
        (expressions) without outgoing edges, since those are the expression to which
        no transformation applies anymore. We call this set of nodes the \emph{normal
-      set}.
+      set}. The set of nodes containing expressions in intended normal
+      form \refdef{intended normal form} is called the \emph{intended
+      normal set}.
  
        From such a graph, we can derive some properties easily:
        \startitemize[KR]
@@ -1823,7 +1905,10 @@
          \item A system is \emph{complete} if all of the nodes in the normal set have
          the intended normal form. The inverse (that all of the nodes outside of
          the normal set are \emph{not} in the intended normal form) is not
-        strictly required.
+        strictly required. In other words, our normal set must be a
+        subset of the intended normal form, but they do not need to be
+        the same set.
+        form.
          \item A system is deterministic if all paths starting at a particular
          node, which end in a node in the normal set, end at the same node.
        \stopitemize
@@ -1847,18 +1932,112 @@
  
      \todo{Add content to these sections}
      \subsection{Termination}
-      Approach: Counting.
-
-      Church-Rosser?
+      In general, proving termination of an arbitrary program is a very
+      hard problem. \todo{Ref about arbitrary termination} Fortunately,
+      we only have to prove termination for our specific transformation
+      system.
+
+      A common approach for these kinds of proofs is to associate a
+      measure with each possible expression in our system. If we can
+      show that each transformation strictly decreases this measure
+      (\ie, the expression transformed to has a lower measure than the
+      expression transformed from).  \todo{ref about measure-based
+      termination proofs / analysis}
+      
+      A good measure for a system consisting of just β-reduction would
+      be the number of lambda expressions in the expression. Since every
+      application of β-reduction removes a lambda abstraction (and there
+      is always a bounded number of lambda abstractions in every
+      expression) we can easily see that a transformation system with
+      just β-reduction will always terminate.
+
+      For our complete system, this measure would be fairly complex
+      (probably the sum of a lot of things). Since the (conditions on)
+      our transformations are pretty complex, we would need to include
+      both simple things like the number of let expressions as well as
+      more complex things like the number of case expressions that are
+      not yet in normal form.
+
+      No real attempt has been made at finding a suitable measure for
+      our system yet.
  
      \subsection{Soundness}
-      Needs formal definition of semantics.
-      Prove for each transformation seperately, implies soundness of the system.
-     
+      Soundness is a property that can be proven for each transformation
+      separately. Since our system only runs separate transformations
+      sequentially, if each of our transformations leaves the
+      \emph{meaning} of the expression unchanged, then the entire system
+      will of course leave the meaning unchanged and is thus
+      \emph{sound}.
+
+      The current prototype has only been verified in an ad-hoc fashion
+      by inspecting (the code for) each transformation. A more formal
+      verification would be more appropriate.
+
+      To be able to formally show that each transformation properly
+      preserves the meaning of every expression, we require an exact
+      definition of the \emph{meaning} of every expression, so we can
+      compare them. Currently there seems to be no formal definition of
+      the meaning or semantics of \GHC's core language, only informal
+      descriptions are available.
+
+      It should be possible to have a single formal definition of
+      meaning for Core for both normal Core compilation by \GHC and for
+      our compilation to \VHDL. The main difference seems to be that in
+      hardware every expression is always evaluated, while in software
+      it is only evaluated if needed, but it should be possible to
+      assign a meaning to core expressions that assumes neither.
+      
+      Since each of the transformations can be applied to any
+      subexpression as well, there is a constraint on our meaning
+      definition: The meaning of an expression should depend only on the
+      meaning of subexpressions, not on the expressions themselves. For
+      example, the meaning of the application in \lam{f (let x = 4 in
+      x)} should be the same as the meaning of the application in \lam{f
+      4}, since the argument subexpression has the same meaning (though
+      the actual expression is different).
+      
      \subsection{Completeness}
-      Show that any transformation applies to every Core expression that is not
-      in normal form. To prove: no transformation applies => in intended form.
-      Show the reverse: Not in intended form => transformation applies.
+      Proving completeness is probably not hard, but it could be a lot
+      of work. We have seen above that to prove completeness, we must
+      show that the normal set of our graph representation is a subset
+      of the intended normal set.
+
+      However, it is hard to systematically generate or reason about the
+      normal set, since it is defined as any nodes to which no
+      transformation applies. To determine this set, each transformation
+      must be considered and when a transformation is added, the entire
+      set should be re-evaluated. This means it is hard to show that
+      each node in the normal set is also in the intended normal set.
+      Reasoning about our intended normal set is easier, since we know
+      how to generate it from its definition. \refdef{intended normal
+      form definition}.
+
+      Fortunately, we can also prove the complement (which is
+      equivalent, since $A \subseteq B \Leftrightarrow \overline{B}
+      \subseteq \overline{A}$): Show that the set of nodes not in
+      intended normal form is a subset of the set of nodes not in normal
+      form. In other words, show that for every expression that is not
+      in intended normal form, that there is at least one transformation
+      that applies to it (since that means it is not in normal form
+      either and since $A \subseteq C \Leftrightarrow \forall x (x \in A
+      \rightarrow x \in C)$).
+
+      By systematically reviewing the entire Core language definition
+      along with the intended normal form definition (both of which have
+      a similar structure), it should be possible to identify all
+      possible (sets of) core expressions that are not in intended
+      normal form and identify a transformation that applies to it.
+      
+      This approach is especially useful for proving completeness of our
+      system, since if expressions exist to which none of the
+      transformations apply (\ie if the system is not yet complete), it
+      is immediately clear which expressions these are and adding
+      (or modifying) transformations to fix this should be relatively
+      easy.
+
+      As observed above, applying this approach is a lot of work, since
+      we need to check every (set of) transformation(s) separately.
+
+      \todo{Perhaps do a few steps of the proofs as proof-of-concept}
  
-    \subsection{Determinism}
-      How to prove this?
+% vim: set sw=2 sts=2 expandtab: