X-Git-Url: https://git.stderr.nl/gitweb?p=matthijs%2Fmaster-project%2Freport.git;a=blobdiff_plain;f=Chapters%2FNormalization.tex;h=37895b3a5543f6c189625da9b4645ae9d418bccc;hp=fbad0ea11f481bdf1b0c061ea92c0f0b2fea7895;hb=b01be2e1ed672b31da67a7a24228f8442d918f9a;hpb=bff5a598be513f497ad61d29b7c7584f94d2b993 diff --git a/Chapters/Normalization.tex b/Chapters/Normalization.tex index fbad0ea..37895b3 100644 --- a/Chapters/Normalization.tex +++ b/Chapters/Normalization.tex @@ -28,11 +28,7 @@ core can describe expressions that do not have a direct hardware interpretation. - \todo{Describe core properties not supported in \VHDL, and describe how the - \VHDL we want to generate should look like.} - \section{Normal form} - \todo{Refresh or refer to distinct hardware per application principle} The transformations described here have a well-defined goal: To bring the program in a well-defined form that is directly translatable to hardware, while fully preserving the semantics of the program. We refer to this form as @@ -328,41 +324,40 @@ \defref{intended normal form definition} \todo{Fix indentation} \startlambda - \italic{normal} = \italic{lambda} - \italic{lambda} = λvar.\italic{lambda} (representable(var)) + \italic{normal} := \italic{lambda} + \italic{lambda} := λvar.\italic{lambda} (representable(var)) | \italic{toplet} - \italic{toplet} = letrec [\italic{binding}...] in var (representable(var)) - \italic{binding} = var = \italic{rhs} (representable(rhs)) + \italic{toplet} := letrec [\italic{binding}...] in var (representable(var)) + \italic{binding} := var = \italic{rhs} (representable(rhs)) -- State packing and unpacking by coercion | var0 = var1 ▶ State ty (lvar(var1)) - | var0 = var1 ▶ ty (var1 :: State ty) (lvar(var1)) - \italic{rhs} = userapp + | var0 = var1 ▶ ty (var1 :: State ty ∧ lvar(var1)) + \italic{rhs} := userapp | builtinapp -- Extractor case | case var of C a0 ... an -> ai (lvar(var)) -- Selector case | case var of (lvar(var)) - DEFAULT -> var0 (lvar(var0)) - C w0 ... wn -> resvar (\forall{}i, wi \neq resvar, lvar(resvar)) - \italic{userapp} = \italic{userfunc} + [ DEFAULT -> var ] (lvar(var)) + C0 w0,0 ... w0,n -> var0 + \vdots + Cm wm,0 ... wm,n -> varm (\forall{}i \forall{}j, wi,j \neq vari, lvar(vari)) + \italic{userapp} := \italic{userfunc} | \italic{userapp} {userarg} - \italic{userfunc} = var (gvar(var)) - \italic{userarg} = var (lvar(var)) - \italic{builtinapp} = \italic{builtinfunc} + \italic{userfunc} := var (gvar(var)) + \italic{userarg} := var (lvar(var)) + \italic{builtinapp} := \italic{builtinfunc} | \italic{builtinapp} \italic{builtinarg} - \italic{builtinfunc} = var (bvar(var)) - \italic{builtinarg} = \italic{coreexpr} + \italic{builtinfunc} := var (bvar(var)) + \italic{builtinarg} := var (representable(var) ∧ lvar(var)) + | \italic{partapp} (partapp :: a -> b) + | \italic{coreexpr} (¬representable(coreexpr) ∧ ¬(coreexpr :: a -> b)) + \italic{partapp} := \italic{userapp} | \italic{builtinapp} \stoplambda - \todo{Limit builtinarg further} - \todo{There can still be other casts around (which the code can handle, e.g., ignore), which still need to be documented here} - \todo{Note about the selector case. It just supports Bit and Bool - currently, perhaps it should be generalized in the normal form? This is - no longer true, btw} - When looking at such a program from a hardware perspective, the top level lambda's define the input ports. The variable reference in the body of the recursive let expression is the output port. Most function @@ -630,9 +625,11 @@ In particular, we define no particular order of transformations. Since transformation order should not influence the resulting normal form, - \todo{This is not really true, but would like it to be...} this leaves - the implementation free to choose any application order that results in - an efficient implementation. + this leaves the implementation free to choose any application order that + results in an efficient implementation. Unfortunately this is not + entirely true for the current set of transformations. See + \in{section}[sec:normalization:non-determinism] for a discussion of this + problem. When applying a single transformation, we try to apply it to every (sub)expression in a function, not just the top level function body. This allows us to @@ -642,8 +639,6 @@ In the following sections, we will be using a number of functions and notations, which we will define here. - \todo{Define substitution (notation)} - \subsubsection{Concepts} A \emph{global variable} is any variable (binder) that is bound at the top level of a program, or an external module. A \emph{local variable} is any @@ -800,8 +795,34 @@ optimizations, but they are required to get our program into intended normal form. + \placeintermezzo{}{ + \defref{substitution notation} + \startframedtext[width=8cm,background=box,frame=no] + \startalignment[center] + {\tfa Substitution notation} + \stopalignment + \blank[medium] + + In some of the transformations in this chapter, we need to perform + substitution on an expression. Substitution means replacing every + occurence of some expression (usually a variable reference) with + another expression. + + There have been a lot of different notations used in literature for + specifying substitution. The notation that will be used in this report + is the following: + + \startlambda + E[A=>B] + \stoplambda + + This means expression \lam{E} with all occurences of \lam{A} replaced + with \lam{B}. + \stopframedtext + } + + \defref{beta-reduction} \subsubsection[sec:normalization:beta]{β-reduction} - \defref{beta-reduction} β-reduction is a well known transformation from lambda calculus, where it is the main reduction step. It reduces applications of lambda abstractions, removing both the lambda abstraction and the application. @@ -868,7 +889,8 @@ This transformation is not needed to get an expression into intended normal form (since these bindings are part of the intended normal form), but makes the resulting \small{VHDL} a lot shorter. - + + \refdef{substitution notation} \starttrans letrec a0 = E0 @@ -1085,7 +1107,7 @@ \transexample{apppropcase}{Application propagation for a case expression}{from}{to} - \subsubsection{Let recursification} + \subsubsection[sec:normalization:letrecurse]{Let recursification} This transformation makes all non-recursive lets recursive. In the end, we want a single recursive let in our normalized program, so all non-recursive lets can be converted. This also makes other @@ -1218,195 +1240,94 @@ \todo{More examples} - \subsection{Argument simplification} - The transforms in this section deal with simplifying application - arguments into normal form. The goal here is to: - - \todo{This section should only talk about representable arguments. Non - representable arguments are treated by specialization.} - - \startitemize - \item Make all arguments of user-defined functions (\eg, of which - we have a function body) simple variable references of a runtime - representable type. This is needed, since these applications will be turned - into component instantiations. - \item Make all arguments of builtin functions one of: - \startitemize - \item A type argument. - \item A dictionary argument. - \item A type level expression. - \item A variable reference of a runtime representable type. - \item A variable reference or partial application of a function type. - \stopitemize - \stopitemize - - When looking at the arguments of a user-defined function, we can - divide them into two categories: - \startitemize - \item Arguments of a runtime representable type (\eg bits or vectors). - - These arguments can be preserved in the program, since they can - be translated to input ports later on. However, since we can - only connect signals to input ports, these arguments must be - reduced to simple variables (for which signals will be - produced). This is taken care of by the argument extraction - transform. - \item Non-runtime representable typed arguments. \todo{Move this - bullet to specialization} - - These arguments cannot be preserved in the program, since we - cannot represent them as input or output ports in the resulting - \small{VHDL}. To remove them, we create a specialized version of the - called function with these arguments filled in. This is done by - the argument propagation transform. - - Typically, these arguments are type and dictionary arguments that are - used to make functions polymorphic. By propagating these arguments, we - are essentially doing the same which GHC does when it specializes - functions: Creating multiple variants of the same function, one for - each type for which it is used. Other common non-representable - arguments are functions, e.g. when calling a higher order function - with another function or a lambda abstraction as an argument. - - The reason for doing this is similar to the reasoning provided for - the inlining of non-representable let bindings above. In fact, this - argument propagation could be viewed as a form of cross-function - inlining. - \stopitemize - - \todo{Move this itemization into a new section about builtin functions} - When looking at the arguments of a builtin function, we can divide them - into categories: - - \startitemize - \item Arguments of a runtime representable type. - - As we have seen with user-defined functions, these arguments can - always be reduced to a simple variable reference, by the - argument extraction transform. Performing this transform for - builtin functions as well, means that the translation of builtin - functions can be limited to signal references, instead of - needing to support all possible expressions. - - \item Arguments of a function type. - - These arguments are functions passed to higher order builtins, - like \lam{map} and \lam{foldl}. Since implementing these - functions for arbitrary function-typed expressions (\eg, lambda - expressions) is rather comlex, we reduce these arguments to - (partial applications of) global functions. - - We can still support arbitrary expressions from the user code, - by creating a new global function containing that expression. - This way, we can simply replace the argument with a reference to - that new function. However, since the expression can contain any - number of free variables we also have to include partial - applications in our normal form. - - This category of arguments is handled by the function extraction - transform. - \item Other unrepresentable arguments. - - These arguments can take a few different forms: - \startdesc{Type arguments} - In the core language, type arguments can only take a single - form: A type wrapped in the Type constructor. Also, there is - nothing that can be done with type expressions, except for - applying functions to them, so we can simply leave type - arguments as they are. - \stopdesc - \startdesc{Dictionary arguments} - In the core language, dictionary arguments are used to find - operations operating on one of the type arguments (mostly for - finding class methods). Since we will not actually evaluatie - the function body for builtin functions and can generate - code for builtin functions by just looking at the type - arguments, these arguments can be ignored and left as they - are. - \stopdesc - \startdesc{Type level arguments} - Sometimes, we want to pass a value to a builtin function, but - we need to know the value at compile time. Additionally, the - value has an impact on the type of the function. This is - encoded using type-level values, where the actual value of the - argument is not important, but the type encodes some integer, - for example. Since the value is not important, the actual form - of the expression does not matter either and we can leave - these arguments as they are. - \stopdesc - \startdesc{Other arguments} - Technically, there is still a wide array of arguments that can - be passed, but does not fall into any of the above categories. - However, none of the supported builtin functions requires such - an argument. This leaves use with passing unsupported types to - a function, such as calling \lam{head} on a list of functions. - - In these cases, it would be impossible to generate hardware - for such a function call anyway, so we can ignore these - arguments. - - The only way to generate hardware for builtin functions with - arguments like these, is to expand the function call into an - equivalent core expression (\eg, expand map into a series of - function applications). But for now, we choose to simply not - support expressions like these. - \stopdesc - - From the above, we can conclude that we can simply ignore these - other unrepresentable arguments and focus on the first two - categories instead. + \subsection[sec:normalization:argsimpl]{Representable arguments simplification} + This section contains just a single transformation that deals with + representable arguments in applications. Non-representable arguments are + handled by the transformations in + \in{section}[sec:normalization:nonrep]. + + This transformation ensures that all representable arguments will become + references to local variables. This ensures they will become references + to local signals in the resulting \small{VHDL}, which is required due to + limitations in the component instantiation code in \VHDL (one can only + assign a signal or constant to an input port). By ensuring that all + arguments are always simple variable references, we always have a signal + available to map to the input ports. + + To reduce a complex expression to a simple variable reference, we create + a new let expression around the application, which binds the complex + expression to a new variable. The original function is then applied to + this variable. + + \refdef{global variable} + Note that references to \emph{global variables} (like a top level + function without arguments, but also an argumentless dataconstructors + like \lam{True}) are also simplified. Only local variables generate + signals in the resulting architecture. Even though argumentless + dataconstructors generate constants in generated \VHDL code and could be + mapped to an input port directly, they are still simplified to make the + normal form more regular. + + \refdef{representable} + \starttrans + M N + -------------------- \lam{N} is representable + letrec x = N in M x \lam{N} is not a local variable reference + \stoptrans + \refdef{local variable} + + \startbuffer[from] + add (add a 1) 1 + \stopbuffer + + \startbuffer[to] + letrec x = add a 1 in add x 1 + \stopbuffer + + \transexample{argsimpl}{Argument simplification}{from}{to} + + \subsection[sec:normalization:builtins]{Builtin functions} + This section deals with (arguments to) builtin functions. In the + intended normal form definition\refdef{intended normal form definition} + we can see that there are three sorts of arguments a builtin function + can receive. + + \startitemize[KR] + \item A representable local variable reference. This is the most + common argument to any function. The argument simplification + transformation described in \in{section}[sec:normalization:argsimpl] + makes sure that \emph{any} representable argument to \emph{any} + function (including builtin functions) is turned into a local variable + reference. + \item (A partial application of) a top level function (either builtin on + user-defined). The function extraction transformation described in + this section takes care of turning every functiontyped argument into + (a partial application of) a top level function. + \item Any expression that is not representable and does not have a + function type. Since these can be any expression, there is no + transformation needed. Note that this category is exactly all + expressions that are not transformed by the transformations for the + previous two categories. This means that \emph{any} core expression + that is used as an argument to a builtin function will be either + transformed into one of the above categories, or end up in this + categorie. In any case, the result is in normal form. \stopitemize - \subsubsection[sec:normalization:argsimpl]{Argument simplification} - This transform deals with arguments to functions that - are of a runtime representable type. It ensures that they will all become - references to global variables, or local signals in the resulting - \small{VHDL}, which is required due to limitations in the component - instantiation code in \VHDL (one can only assign a signal or constant - to an input port). By ensuring that all arguments are always simple - variable references, we always have a signal available to assign to - input ports. - - \todo{Say something about dataconstructors (without arguments, like True - or False), which are variable references of a runtime representable - type, but do not result in a signal.} - - To reduce a complex expression to a simple variable reference, we create - a new let expression around the application, which binds the complex - expression to a new variable. The original function is then applied to - this variable. - - Note that a reference to a \emph{global variable} (like a top level - function without arguments, but also an argumentless dataconstructors - like \lam{True}) is also simplified. Only local variables generate - signals in the resulting architecture. - - \refdef{representable} - \starttrans - M N - -------------------- \lam{N} is representable - letrec x = N in M x \lam{N} is not a local variable reference - \stoptrans - \refdef{local variable} - - \startbuffer[from] - add (add a 1) 1 - \stopbuffer + As noted, the argument simplification will handle any representable + arguments to a builtin function. The following transformation is needed + to handle non-representable arguments with a function type, all other + non-representable arguments don't need any special handling. - \startbuffer[to] - letrec x = add a 1 in add x 1 - \stopbuffer - - \transexample{argextract}{Argument extraction}{from}{to} - \subsubsection[sec:normalization:funextract]{Function extraction} - \todo{Move to section about builtin functions} This transform deals with function-typed arguments to builtin - functions. Since builtin functions cannot be specialized to remove - the arguments, we choose to extract these arguments into a new global - function instead. This greatly simplifies the translation rules needed - for builtin functions. \todo{Should we talk about these? Reference - Christiaan?} + functions. + Since builtin functions cannot be specialized (see + \in{section}[sec:normalization:specialize]) to remove the arguments, + these arguments are extracted into a new global function instead. In + other words, we create a new top level function that has exactly the + extracted argument as its body. This greatly simplifies the + translation rules needed for builtin functions, since they only need + to handle (partial applications of) top level functions. Any free variables occuring in the extracted arguments will become parameters to the new global function. The original argument is replaced @@ -1427,27 +1348,20 @@ x = λf0 ... λfn.N \stoptrans - \todo{Split this example} \startbuffer[from] - map (λa . add a b) xs - - map (add b) ys + addList = λb.λxs.map (λa . add a b) xs \stopbuffer \startbuffer[to] - map (x0 b) xs - - map x1 ys + addList = λb.λxs.map (f b) xs ~ - x0 = λb.λa.add a b - x1 = λb.add b + f = λb.λa.add a b \stopbuffer \transexample{funextract}{Function extraction}{from}{to} - Note that \lam{x0} and {x1} will still need normalization after this. - - \todo{Fill the gap left by moving argument propagation away} + Note that the function \lam{f} will still need normalization after + this. \subsection{Case normalisation} \subsubsection{Scrutinee simplification} @@ -1476,7 +1390,7 @@ False -> b \stopbuffer - \transexample{letflat}{Let flattening}{from}{to} + \transexample{letflat}{Case normalisation}{from}{to} \subsubsection{Case simplification} @@ -1591,14 +1505,22 @@ \transexample{caserem}{Case removal}{from}{to} - \subsection{Removing unrepresentable values} + \subsection[sec:normalization:nonrep]{Removing unrepresentable values} The transformations in this section are aimed at making all the values used in our expression representable. There are two main transformations that are applied to \emph{all} unrepresentable let - bindings and function arguments, but these are really meant to - address three different kinds of unrepresentable values: - Polymorphic values, higher order values and literals. Each of these - will be detailed below, followed by the actual transformations. + bindings and function arguments. These are meant to address three + different kinds of unrepresentable values: Polymorphic values, higher + order values and literals. The transformation are described generically: + They apply to all non-representable values. However, non-representable + values that don't fall into one of these three categories will be moved + around by these transformations but are unlikely to completely + disappear. They usually mean the program was not valid in the first + place, because unsupported types were used (for example, a program using + strings). + + Each of these three categories will be detailed below, followed by the + actual transformations. \subsubsection{Removing Polymorphism} As noted in \in{section}[sec:prototype:polymporphism], @@ -1610,8 +1532,7 @@ simply inline the let bindings that have a polymorphic type, which should (eventually) make sure that the polymorphic expression is applied to a type and/or dictionary, which can - \refdef{beta-reduction} - then be removed by β-reduction. + then be removed by β-reduction (\in{section}[sec:normalization:beta]). Since both type and dictionary arguments are not representable, \refdef{representable} @@ -1713,7 +1634,7 @@ η-abstraction also introduces extra applications (the application of the let expression to \lam{q} in the above example). These applications can then propagated down by the application propagation - transformation (\in{section}[sec:normalization:approp]). In our + transformation (\in{section}[sec:normalization:appprop]). In our example, the \lam{q} and \lam{r} variable will be propagated into the let expression and then into the case expression: @@ -1726,7 +1647,7 @@ This propagation makes higher order values become applied (in particular both of the alternatives of the case now have a - representable type. Completely applied top level functions (like the + representable type). Completely applied top level functions (like the first alternative) are now no longer invalid (they fall under \in{item}[item:completeapp] above). (Completely) applied lambda abstractions can be removed by β-abstraction. For our example, @@ -1901,6 +1822,7 @@ solves (part of) the polymorphism, higher order values and unrepresentable literals in an expression. + \refdef{substitution notation} \starttrans letrec a0 = E0 @@ -1940,7 +1862,7 @@ \transexample{nonrepinline}{Nonrepresentable binding inlining}{from}{to} - \subsubsection{Function specialization} + \subsubsection[sec:normalization:specialize]{Function specialization} This transform removes arguments to user-defined functions that are not representable at runtime. This is done by creating a \emph{specialized} version of the function that only works for one @@ -2023,7 +1945,6 @@ \todo{Examples. Perhaps reference the previous sections} - \section{Unsolved problems} The above system of transformations has been implemented in the prototype and seems to work well to compile simple and more complex examples of @@ -2092,7 +2013,7 @@ let y = (a * b) in y + y \stoplambda - \subsection{Non-determinism} + \subsection[sec:normalization:non-determinism]{Non-determinism} As an example, again consider the following expression: \startlambda @@ -2139,6 +2060,37 @@ transformations will probably need updating to handle them in all cases. + \subsection{Normalization of stateful descriptions} + Currently, the intended normal form definition\refdef{intended + normal form definition} offers enough freedom to describe all + valid stateful descriptions, but is not limiting enough. It is + possible to write descriptions which are in intended normal + form, but cannot be translated into \VHDL in a meaningful way + (\eg, a function that swaps two substates in its result, or a + function that changes a substate itself instead of passing it to + a subfunction). + + It is now up to the programmer to not do anything funny with + these state values, whereas the normalization just tries not to + mess up the flow of state values. In practice, there are + situations where a Core program that \emph{could} be a valid + stateful description is not translateable by the prototype. This + most often happens when statefulness is mixed with pattern + matching, causing a state input to be unpacked multiple times or + be unpacked and repacked only in some of the code paths. + + \todo{example?} + + Without going into detail about the exact problems (of which + there are probably more than have shown up so far), it seems + unlikely that these problems can be solved entirely by just + improving the \VHDL state generation in the final stage. The + normalization stage seems the best place to apply the rewriting + needed to support more complex stateful descriptions. This does + of course mean that the intended normal form definition must be + extended as well to be more specific about how state handling + should look like in normal form. + \section[sec:normalization:properties]{Provable properties} When looking at the system of transformations outlined above, there are a number of questions that we can ask ourselves. The main question is of course: