X-Git-Url: https://git.stderr.nl/gitweb?p=matthijs%2Fmaster-project%2Freport.git;a=blobdiff_plain;f=Chapters%2FNormalization.tex;h=37895b3a5543f6c189625da9b4645ae9d418bccc;hp=fbad0ea11f481bdf1b0c061ea92c0f0b2fea7895;hb=b01be2e1ed672b31da67a7a24228f8442d918f9a;hpb=bff5a598be513f497ad61d29b7c7584f94d2b993

diff --git a/Chapters/Normalization.tex b/Chapters/Normalization.tex
index fbad0ea..37895b3 100644
--- a/Chapters/Normalization.tex
+++ b/Chapters/Normalization.tex
@@ -28,11 +28,7 @@
   core can describe expressions that do not have a direct hardware
   interpretation.
 
-  \todo{Describe core properties not supported in \VHDL, and describe how the
-  \VHDL we want to generate should look like.}
-
   \section{Normal form}
-    \todo{Refresh or refer to distinct hardware per application principle}
     The transformations described here have a well-defined goal: To bring the
     program in a well-defined form that is directly translatable to hardware,
     while fully preserving the semantics of the program. We refer to this form as
@@ -328,41 +324,40 @@
       \defref{intended normal form definition}
       \todo{Fix indentation}
       \startlambda
-      \italic{normal} = \italic{lambda}
-      \italic{lambda} = Î»var.\italic{lambda} (representable(var))
+      \italic{normal} := \italic{lambda}
+      \italic{lambda} := Î»var.\italic{lambda} (representable(var))
                       | \italic{toplet} 
-      \italic{toplet} = letrec [\italic{binding}...] in var (representable(var))
-      \italic{binding} = var = \italic{rhs} (representable(rhs))
+      \italic{toplet} := letrec [\italic{binding}...] in var (representable(var))
+      \italic{binding} := var = \italic{rhs} (representable(rhs))
                        -- State packing and unpacking by coercion
                        | var0 = var1 â¶ State ty (lvar(var1))
-                       | var0 = var1 â¶ ty (var1 :: State ty) (lvar(var1))
-      \italic{rhs} = userapp
+                       | var0 = var1 â¶ ty (var1 :: State ty â§ lvar(var1))
+      \italic{rhs} := userapp
                    | builtinapp
                    -- Extractor case
                    | case var of C a0 ... an -> ai (lvar(var))
                    -- Selector case
                    | case var of (lvar(var))
-                      DEFAULT -> var0 (lvar(var0))
-                      C w0 ... wn -> resvar (\forall{}i, wi \neq resvar, lvar(resvar))
-      \italic{userapp} = \italic{userfunc}
+                      [ DEFAULT -> var ]  (lvar(var))
+                      C0 w0,0 ... w0,n -> var0
+                      \vdots
+                      Cm wm,0 ... wm,n -> varm       (\forall{}i \forall{}j, wi,j \neq vari, lvar(vari))
+      \italic{userapp} := \italic{userfunc}
                        | \italic{userapp} {userarg}
-      \italic{userfunc} = var (gvar(var))
-      \italic{userarg} = var (lvar(var))
-      \italic{builtinapp} = \italic{builtinfunc}
+      \italic{userfunc} := var (gvar(var))
+      \italic{userarg} := var (lvar(var))
+      \italic{builtinapp} := \italic{builtinfunc}
                           | \italic{builtinapp} \italic{builtinarg}
-      \italic{builtinfunc} = var (bvar(var))
-      \italic{builtinarg} = \italic{coreexpr}
+      \italic{builtinfunc} := var (bvar(var))
+      \italic{builtinarg} := var (representable(var) â§ lvar(var))
+                          | \italic{partapp} (partapp :: a -> b)
+                          | \italic{coreexpr} (Â¬representable(coreexpr) â§ Â¬(coreexpr :: a -> b))
+      \italic{partapp} := \italic{userapp} | \italic{builtinapp}
       \stoplambda
 
-      \todo{Limit builtinarg further}
-
       \todo{There can still be other casts around (which the code can handle,
       e.g., ignore), which still need to be documented here}
 
-      \todo{Note about the selector case. It just supports Bit and Bool
-      currently, perhaps it should be generalized in the normal form? This is
-      no longer true, btw}
-
       When looking at such a program from a hardware perspective, the top level
       lambda's define the input ports. The variable reference in the body of
       the recursive let expression is the output port. Most function
@@ -630,9 +625,11 @@
 
       In particular, we define no particular order of transformations. Since
       transformation order should not influence the resulting normal form,
-      \todo{This is not really true, but would like it to be...} this leaves
-      the implementation free to choose any application order that results in
-      an efficient implementation.
+      this leaves the implementation free to choose any application order that
+      results in an efficient implementation. Unfortunately this is not
+      entirely true for the current set of transformations. See
+      \in{section}[sec:normalization:non-determinism] for a discussion of this
+      problem.
 
       When applying a single transformation, we try to apply it to every (sub)expression
       in a function, not just the top level function body. This allows us to
@@ -642,8 +639,6 @@
       In the following sections, we will be using a number of functions and
       notations, which we will define here.
 
-      \todo{Define substitution (notation)}
-
       \subsubsection{Concepts}
         A \emph{global variable} is any variable (binder) that is bound at the
         top level of a program, or an external module. A \emph{local variable} is any
@@ -800,8 +795,34 @@
        optimizations, but they are required to get our program into intended
        normal form.
 
+        \placeintermezzo{}{
+          \defref{substitution notation}
+          \startframedtext[width=8cm,background=box,frame=no]
+          \startalignment[center]
+            {\tfa Substitution notation}
+          \stopalignment
+          \blank[medium]
+
+          In some of the transformations in this chapter, we need to perform
+          substitution on an expression. Substitution means replacing every
+          occurence of some expression (usually a variable reference) with
+          another expression.
+
+          There have been a lot of different notations used in literature for
+          specifying substitution. The notation that will be used in this report
+          is the following:
+
+          \startlambda
+            E[A=>B]
+          \stoplambda
+
+          This means expression \lam{E} with all occurences of \lam{A} replaced
+          with \lam{B}.
+          \stopframedtext
+        }
+
+      \defref{beta-reduction}
       \subsubsection[sec:normalization:beta]{Î²-reduction}
-        \defref{beta-reduction}
         Î²-reduction is a well known transformation from lambda calculus, where it is
         the main reduction step. It reduces applications of lambda abstractions,
         removing both the lambda abstraction and the application.
@@ -868,7 +889,8 @@
         This transformation is not needed to get an expression into intended
         normal form (since these bindings are part of the intended normal
         form), but makes the resulting \small{VHDL} a lot shorter.
-        
+       
+        \refdef{substitution notation}
         \starttrans
         letrec
           a0 = E0
@@ -1085,7 +1107,7 @@
 
         \transexample{apppropcase}{Application propagation for a case expression}{from}{to}
 
-      \subsubsection{Let recursification}
+      \subsubsection[sec:normalization:letrecurse]{Let recursification}
         This transformation makes all non-recursive lets recursive. In the
         end, we want a single recursive let in our normalized program, so all
         non-recursive lets can be converted. This also makes other
@@ -1218,195 +1240,94 @@
         
         \todo{More examples}
 
-    \subsection{Argument simplification}
-      The transforms in this section deal with simplifying application
-      arguments into normal form. The goal here is to:
-
-      \todo{This section should only talk about representable arguments. Non
-      representable arguments are treated by specialization.}
-
-      \startitemize
-       \item Make all arguments of user-defined functions (\eg, of which
-       we have a function body) simple variable references of a runtime
-       representable type. This is needed, since these applications will be turned
-       into component instantiations.
-       \item Make all arguments of builtin functions one of:
-         \startitemize
-          \item A type argument.
-          \item A dictionary argument.
-          \item A type level expression.
-          \item A variable reference of a runtime representable type.
-          \item A variable reference or partial application of a function type.
-         \stopitemize
-      \stopitemize
-
-      When looking at the arguments of a user-defined function, we can
-      divide them into two categories:
-      \startitemize
-        \item Arguments of a runtime representable type (\eg bits or vectors).
-
-              These arguments can be preserved in the program, since they can
-              be translated to input ports later on.  However, since we can
-              only connect signals to input ports, these arguments must be
-              reduced to simple variables (for which signals will be
-              produced). This is taken care of by the argument extraction
-              transform.
-        \item Non-runtime representable typed arguments. \todo{Move this
-        bullet to specialization}
-              
-              These arguments cannot be preserved in the program, since we
-              cannot represent them as input or output ports in the resulting
-              \small{VHDL}. To remove them, we create a specialized version of the
-              called function with these arguments filled in. This is done by
-              the argument propagation transform.
-
-              Typically, these arguments are type and dictionary arguments that are
-              used to make functions polymorphic. By propagating these arguments, we
-              are essentially doing the same which GHC does when it specializes
-              functions: Creating multiple variants of the same function, one for
-              each type for which it is used. Other common non-representable
-              arguments are functions, e.g. when calling a higher order function
-              with another function or a lambda abstraction as an argument.
-
-              The reason for doing this is similar to the reasoning provided for
-              the inlining of non-representable let bindings above. In fact, this
-              argument propagation could be viewed as a form of cross-function
-              inlining.
-      \stopitemize
-
-      \todo{Move this itemization into a new section about builtin functions}
-      When looking at the arguments of a builtin function, we can divide them
-      into categories: 
-
-      \startitemize
-        \item Arguments of a runtime representable type.
-              
-              As we have seen with user-defined functions, these arguments can
-              always be reduced to a simple variable reference, by the
-              argument extraction transform. Performing this transform for
-              builtin functions as well, means that the translation of builtin
-              functions can be limited to signal references, instead of
-              needing to support all possible expressions.
-
-        \item Arguments of a function type.
-              
-              These arguments are functions passed to higher order builtins,
-              like \lam{map} and \lam{foldl}. Since implementing these
-              functions for arbitrary function-typed expressions (\eg, lambda
-              expressions) is rather comlex, we reduce these arguments to
-              (partial applications of) global functions.
-              
-              We can still support arbitrary expressions from the user code,
-              by creating a new global function containing that expression.
-              This way, we can simply replace the argument with a reference to
-              that new function. However, since the expression can contain any
-              number of free variables we also have to include partial
-              applications in our normal form.
-
-              This category of arguments is handled by the function extraction
-              transform.
-        \item Other unrepresentable arguments.
-              
-              These arguments can take a few different forms:
-              \startdesc{Type arguments}
-                In the core language, type arguments can only take a single
-                form: A type wrapped in the Type constructor. Also, there is
-                nothing that can be done with type expressions, except for
-                applying functions to them, so we can simply leave type
-                arguments as they are.
-              \stopdesc
-              \startdesc{Dictionary arguments}
-                In the core language, dictionary arguments are used to find
-                operations operating on one of the type arguments (mostly for
-                finding class methods). Since we will not actually evaluatie
-                the function body for builtin functions and can generate
-                code for builtin functions by just looking at the type
-                arguments, these arguments can be ignored and left as they
-                are.
-              \stopdesc
-              \startdesc{Type level arguments}
-                Sometimes, we want to pass a value to a builtin function, but
-                we need to know the value at compile time. Additionally, the
-                value has an impact on the type of the function. This is
-                encoded using type-level values, where the actual value of the
-                argument is not important, but the type encodes some integer,
-                for example. Since the value is not important, the actual form
-                of the expression does not matter either and we can leave
-                these arguments as they are.
-              \stopdesc
-              \startdesc{Other arguments}
-                Technically, there is still a wide array of arguments that can
-                be passed, but does not fall into any of the above categories.
-                However, none of the supported builtin functions requires such
-                an argument. This leaves use with passing unsupported types to
-                a function, such as calling \lam{head} on a list of functions.
-
-                In these cases, it would be impossible to generate hardware
-                for such a function call anyway, so we can ignore these
-                arguments.
-
-                The only way to generate hardware for builtin functions with
-                arguments like these, is to expand the function call into an
-                equivalent core expression (\eg, expand map into a series of
-                function applications). But for now, we choose to simply not
-                support expressions like these.
-              \stopdesc
-
-              From the above, we can conclude that we can simply ignore these
-              other unrepresentable arguments and focus on the first two
-              categories instead.
+    \subsection[sec:normalization:argsimpl]{Representable arguments simplification}
+      This section contains just a single transformation that deals with
+      representable arguments in applications. Non-representable arguments are
+      handled by the transformations in
+      \in{section}[sec:normalization:nonrep]. 
+      
+      This transformation ensures that all representable arguments will become
+      references to local variables. This ensures they will become references
+      to local signals in the resulting \small{VHDL}, which is required due to
+      limitations in the component instantiation code in \VHDL (one can only
+      assign a signal or constant to an input port). By ensuring that all
+      arguments are always simple variable references, we always have a signal
+      available to map to the input ports.
+
+      To reduce a complex expression to a simple variable reference, we create
+      a new let expression around the application, which binds the complex
+      expression to a new variable. The original function is then applied to
+      this variable.
+
+      \refdef{global variable}
+      Note that references to \emph{global variables} (like a top level
+      function without arguments, but also an argumentless dataconstructors
+      like \lam{True}) are also simplified. Only local variables generate
+      signals in the resulting architecture. Even though argumentless
+      dataconstructors generate constants in generated \VHDL code and could be
+      mapped to an input port directly, they are still simplified to make the
+      normal form more regular.
+
+      \refdef{representable}
+      \starttrans
+      M N
+      --------------------    \lam{N} is representable
+      letrec x = N in M x     \lam{N} is not a local variable reference
+      \stoptrans
+      \refdef{local variable}
+
+      \startbuffer[from]
+      add (add a 1) 1
+      \stopbuffer
+
+      \startbuffer[to]
+      letrec x = add a 1 in add x 1
+      \stopbuffer
+
+      \transexample{argsimpl}{Argument simplification}{from}{to}
+
+    \subsection[sec:normalization:builtins]{Builtin functions}
+      This section deals with (arguments to) builtin functions.  In the
+      intended normal form definition\refdef{intended normal form definition}
+      we can see that there are three sorts of arguments a builtin function
+      can receive.
+      
+      \startitemize[KR]
+        \item A representable local variable reference. This is the most
+        common argument to any function. The argument simplification
+        transformation described in \in{section}[sec:normalization:argsimpl]
+        makes sure that \emph{any} representable argument to \emph{any}
+        function (including builtin functions) is turned into a local variable
+        reference.
+        \item (A partial application of) a top level function (either builtin on
+        user-defined). The function extraction transformation described in
+        this section takes care of turning every functiontyped argument into
+        (a partial application of) a top level function.
+        \item Any expression that is not representable and does not have a
+        function type. Since these can be any expression, there is no
+        transformation needed. Note that this category is exactly all
+        expressions that are not transformed by the transformations for the
+        previous two categories. This means that \emph{any} core expression
+        that is used as an argument to a builtin function will be either
+        transformed into one of the above categories, or end up in this
+        categorie. In any case, the result is in normal form.
       \stopitemize
 
-      \subsubsection[sec:normalization:argsimpl]{Argument simplification}
-        This transform deals with arguments to functions that
-        are of a runtime representable type. It ensures that they will all become
-        references to global variables, or local signals in the resulting
-        \small{VHDL}, which is required due to limitations in the component
-        instantiation code in \VHDL (one can only assign a signal or constant
-        to an input port). By ensuring that all arguments are always simple
-        variable references, we always have a signal available to assign to
-        input ports.
-
-        \todo{Say something about dataconstructors (without arguments, like True
-        or False), which are variable references of a runtime representable
-        type, but do not result in a signal.}
-
-        To reduce a complex expression to a simple variable reference, we create
-        a new let expression around the application, which binds the complex
-        expression to a new variable. The original function is then applied to
-        this variable.
-
-        Note that a reference to a \emph{global variable} (like a top level
-        function without arguments, but also an argumentless dataconstructors
-        like \lam{True}) is also simplified. Only local variables generate
-        signals in the resulting architecture. 
-
-        \refdef{representable}
-        \starttrans
-        M N
-        --------------------    \lam{N} is representable
-        letrec x = N in M x     \lam{N} is not a local variable reference
-        \stoptrans
-        \refdef{local variable}
-
-        \startbuffer[from]
-        add (add a 1) 1
-        \stopbuffer
+      As noted, the argument simplification will handle any representable
+      arguments to a builtin function. The following transformation is needed
+      to handle non-representable arguments with a function type, all other
+      non-representable arguments don't need any special handling.
 
-        \startbuffer[to]
-        letrec x = add a 1 in add x 1
-        \stopbuffer
-
-        \transexample{argextract}{Argument extraction}{from}{to}
-      
       \subsubsection[sec:normalization:funextract]{Function extraction}
-        \todo{Move to section about builtin functions}
         This transform deals with function-typed arguments to builtin
-        functions.  Since builtin functions cannot be specialized to remove
-        the arguments, we choose to extract these arguments into a new global
-        function instead. This greatly simplifies the translation rules needed
-        for builtin functions. \todo{Should we talk about these? Reference
-        Christiaan?}
+        functions. 
+        Since builtin functions cannot be specialized (see
+        \in{section}[sec:normalization:specialize]) to remove the arguments,
+        these arguments are extracted into a new global function instead. In
+        other words, we create a new top level function that has exactly the
+        extracted argument as its body. This greatly simplifies the
+        translation rules needed for builtin functions, since they only need
+        to handle (partial applications of) top level functions.
 
         Any free variables occuring in the extracted arguments will become
         parameters to the new global function. The original argument is replaced
@@ -1427,27 +1348,20 @@
         x = Î»f0 ... Î»fn.N
         \stoptrans
 
-        \todo{Split this example}
         \startbuffer[from]
-        map (Î»a . add a b) xs
-
-        map (add b) ys
+        addList = Î»b.Î»xs.map (Î»a . add a b) xs
         \stopbuffer
 
         \startbuffer[to]
-        map (x0 b) xs
-
-        map x1 ys
+        addList = Î»b.Î»xs.map (f b) xs
         ~
-        x0 = Î»b.Î»a.add a b
-        x1 = Î»b.add b
+        f = Î»b.Î»a.add a b
         \stopbuffer
 
         \transexample{funextract}{Function extraction}{from}{to}
 
-        Note that \lam{x0} and {x1} will still need normalization after this.
-
-      \todo{Fill the gap left by moving argument propagation away}
+        Note that the function \lam{f} will still need normalization after
+        this.
 
     \subsection{Case normalisation}
       \subsubsection{Scrutinee simplification}
@@ -1476,7 +1390,7 @@
             False -> b
         \stopbuffer
 
-        \transexample{letflat}{Let flattening}{from}{to}
+        \transexample{letflat}{Case normalisation}{from}{to}
 
 
       \subsubsection{Case simplification}
@@ -1591,14 +1505,22 @@
 
         \transexample{caserem}{Case removal}{from}{to}
 
-    \subsection{Removing unrepresentable values}
+    \subsection[sec:normalization:nonrep]{Removing unrepresentable values}
       The transformations in this section are aimed at making all the
       values used in our expression representable. There are two main
       transformations that are applied to \emph{all} unrepresentable let
-      bindings and function arguments, but these are really meant to
-      address three different kinds of unrepresentable values:
-      Polymorphic values, higher order values and literals. Each of these
-      will be detailed below, followed by the actual transformations.
+      bindings and function arguments. These are meant to address three
+      different kinds of unrepresentable values: Polymorphic values, higher
+      order values and literals. The transformation are described generically:
+      They apply to all non-representable values. However, non-representable
+      values that don't fall into one of these three categories will be moved
+      around by these transformations but are unlikely to completely
+      disappear. They usually mean the program was not valid in the first
+      place, because unsupported types were used (for example, a program using
+      strings).
+     
+      Each of these three categories will be detailed below, followed by the
+      actual transformations.
 
       \subsubsection{Removing Polymorphism}
         As noted in \in{section}[sec:prototype:polymporphism],
@@ -1610,8 +1532,7 @@
         simply inline the let bindings that have a polymorphic type,
         which should (eventually) make sure that the polymorphic
         expression is applied to a type and/or dictionary, which can
-        \refdef{beta-reduction}
-        then be removed by Î²-reduction.
+        then be removed by Î²-reduction (\in{section}[sec:normalization:beta]).
 
         Since both type and dictionary arguments are not representable,
         \refdef{representable}
@@ -1713,7 +1634,7 @@
         Î·-abstraction also introduces extra applications (the application of
         the let expression to \lam{q} in the above example). These
         applications can then propagated down by the application propagation
-        transformation (\in{section}[sec:normalization:approp]). In our
+        transformation (\in{section}[sec:normalization:appprop]). In our
         example, the \lam{q} and \lam{r} variable will be propagated into the
         let expression and then into the case expression:
 
@@ -1726,7 +1647,7 @@
         
         This propagation makes higher order values become applied (in
         particular both of the alternatives of the case now have a
-        representable type. Completely applied top level functions (like the
+        representable type). Completely applied top level functions (like the
         first alternative) are now no longer invalid (they fall under
         \in{item}[item:completeapp] above). (Completely) applied lambda
         abstractions can be removed by Î²-abstraction. For our example,
@@ -1901,6 +1822,7 @@
         solves (part of) the polymorphism, higher order values and
         unrepresentable literals in an expression.
 
+        \refdef{substitution notation}
         \starttrans
         letrec 
           a0 = E0
@@ -1940,7 +1862,7 @@
 
         \transexample{nonrepinline}{Nonrepresentable binding inlining}{from}{to}
 
-      \subsubsection{Function specialization}
+      \subsubsection[sec:normalization:specialize]{Function specialization}
         This transform removes arguments to user-defined functions that are
         not representable at runtime. This is done by creating a
         \emph{specialized} version of the function that only works for one
@@ -2023,7 +1945,6 @@
 
         \todo{Examples. Perhaps reference the previous sections}
 
-
   \section{Unsolved problems}
     The above system of transformations has been implemented in the prototype
     and seems to work well to compile simple and more complex examples of
@@ -2092,7 +2013,7 @@
         let y = (a * b) in y + y
         \stoplambda
 
-      \subsection{Non-determinism}
+      \subsection[sec:normalization:non-determinism]{Non-determinism}
         As an example, again consider the following expression:
 
         \startlambda
@@ -2139,6 +2060,37 @@
         transformations will probably need updating to handle them in all
         cases.
 
+      \subsection{Normalization of stateful descriptions}
+        Currently, the intended normal form definition\refdef{intended
+        normal form definition} offers enough freedom to describe all
+        valid stateful descriptions, but is not limiting enough. It is
+        possible to write descriptions which are in intended normal
+        form, but cannot be translated into \VHDL in a meaningful way
+        (\eg, a function that swaps two substates in its result, or a
+        function that changes a substate itself instead of passing it to
+        a subfunction).
+
+        It is now up to the programmer to not do anything funny with
+        these state values, whereas the normalization just tries not to
+        mess up the flow of state values. In practice, there are
+        situations where a Core program that \emph{could} be a valid
+        stateful description is not translateable by the prototype. This
+        most often happens when statefulness is mixed with pattern
+        matching, causing a state input to be unpacked multiple times or
+        be unpacked and repacked only in some of the code paths.
+
+        \todo{example?}
+
+        Without going into detail about the exact problems (of which
+        there are probably more than have shown up so far), it seems
+        unlikely that these problems can be solved entirely by just
+        improving the \VHDL state generation in the final stage. The
+        normalization stage seems the best place to apply the rewriting
+        needed to support more complex stateful descriptions. This does
+        of course mean that the intended normal form definition must be
+        extended as well to be more specific about how state handling
+        should look like in normal form.
+
   \section[sec:normalization:properties]{Provable properties}
     When looking at the system of transformations outlined above, there are a
     number of questions that we can ask ourselves. The main question is of course: