X-Git-Url: https://git.stderr.nl/gitweb?p=matthijs%2Fmaster-project%2Freport.git;a=blobdiff_plain;f=Chapters%2FNormalization.tex;h=77dd977c8e87f7175c5f16bdc375c2fa95717eda;hp=fd62fea830bb20103c8c653ef1ceeb5db745d2da;hb=28b745e6faf774843ecdfbce67bdd22b8b4fc550;hpb=58a5aa7ca5edc07ba1070f4f4ed384b42e36e8f3 diff --git a/Chapters/Normalization.tex b/Chapters/Normalization.tex index fd62fea..77dd977 100644 --- a/Chapters/Normalization.tex +++ b/Chapters/Normalization.tex @@ -21,12 +21,12 @@ \stopcombination } - The first step in the core to \small{VHDL} translation process, is normalization. We - aim to bring the core description into a simpler form, which we can + The first step in the Core to \small{VHDL} translation process, is normalization. We + aim to bring the Core description into a simpler form, which we can subsequently translate into \small{VHDL} easily. This normal form is needed because - the full core language is more expressive than \small{VHDL} in some + the full Core language is more expressive than \small{VHDL} in some areas (higher-order expressions, limited polymorphism using type - classes, etc.) and because core can describe expressions that do not + classes, etc.) and because Core can describe expressions that do not have a direct hardware interpretation. \section{Normal form} @@ -224,7 +224,8 @@ -> State (Word, Word) -> (State (Word, Word), Word) - -- All arguments are an inital lambda (address, data, packed state) + -- All arguments are an inital lambda + -- (address, data, packed state) regbank = λa.λd.λsp. -- There are nested let expressions at top level let @@ -320,7 +321,7 @@ Now we have some intuition for the normal form, we can describe how we want the normal form to look like in a slightly more formal manner. The EBNF-like description in \in{definition}[def:IntendedNormal] captures - most of the intended structure (and generates a subset of \GHC's core + most of the intended structure (and generates a subset of \GHC's Core format). There are two things missing from this definition: cast expressions are @@ -712,7 +713,7 @@ \subsection[sec:normalization:uniq]{Binder uniqueness} A common problem in transformation systems, is binder uniqueness. When not considering this problem, it is easy to create transformations that mix up - bindings and cause name collisions. Take for example, the following core + bindings and cause name collisions. Take for example, the following Core expression: \startlambda @@ -796,7 +797,6 @@ unique. This is done by generating a fresh binder for every binder used. This also replaces binders that did not cause any conflict, but it does ensure that all binders within the function are generated by the same unique supply. - \refdef{fresh binder} \item Whenever a new binder must be generated, we generate a fresh binder that is guaranteed to be different from \emph{all binders generated so far}. This can thus never introduce duplication and will maintain the invariant. @@ -820,41 +820,41 @@ \in{section}[sec:normalization:transformation]. \subsection{General cleanup} - These transformations are general cleanup transformations, that aim to - make expressions simpler. These transformations usually clean up the - mess left behind by other transformations or clean up expressions to - expose new transformation opportunities for other transformations. - - Most of these transformations are standard optimizations in other - compilers as well. 
However, in our compiler, most of these are not just
-    optimizations, but they are required to get our program into intended
-    normal form.
-
-    \placeintermezzo{}{
-      \defref{substitution notation}
-      \startframedtext[width=8cm,background=box,frame=no]
-      \startalignment[center]
-        {\tfa Substitution notation}
-      \stopalignment
-      \blank[medium]
+    \placeintermezzo{}{
+      \defref{substitution notation}
+      \startframedtext[width=8cm,background=box,frame=no]
+      \startalignment[center]
+        {\tfa Substitution notation}
+      \stopalignment
+      \blank[medium]
+
+      In some of the transformations in this chapter, we need to perform
+      substitution on an expression. Substitution means replacing every
+      occurrence of some expression (usually a variable reference) with
+      another expression.
+
+      There have been a lot of different notations used in literature for
+      specifying substitution. The notation that will be used in this report
+      is the following:
 
-      In some of the transformations in this chapter, we need to perform
-      substitution on an expression. Substitution means replacing every
-      occurence of some expression (usually a variable reference) with
-      another expression.
+      \startlambda
+        E[A=>B]
+      \stoplambda
 
-      There have been a lot of different notations used in literature for
-      specifying substitution. The notation that will be used in this report
-      is the following:
+      This means expression \lam{E} with all occurrences of \lam{A} replaced
+      with \lam{B}.
+      \stopframedtext
+    }
 
-      \startlambda
-        E[A=>B]
-      \stoplambda
+    These transformations are general cleanup transformations that aim to
+    make expressions simpler. These transformations usually clean up the
+    mess left behind by other transformations or clean up expressions to
+    expose new transformation opportunities for other transformations.
 
-      This means expression \lam{E} with all occurences of \lam{A} replaced
-      with \lam{B}.
-      \stopframedtext
-    }
+    Most of these transformations are standard optimizations in other
+    compilers as well. However, in our compiler, most of these are not just
+    optimizations, but they are required to get our program into intended
+    normal form.
 
   \subsubsection[sec:normalization:beta]{β-reduction}
     β-reduction is a well known transformation from lambda calculus, where it is
@@ -1078,7 +1078,7 @@
     \in{Example}[ex:trans:toplevelinline] shows a typical application of
     the addition operator generated by \GHC. The type and
     dictionary arguments used here are described in
-    \in{Section}[section:prototype:polymorphism].
+    \in{Section}[sec:prototype:coretypes].
 
     Without this transformation, there would be a \lam{(+)} entity
     in the \VHDL\ which would just add its inputs. This generates a
@@ -1273,45 +1273,54 @@
                This transformation ensures that the return value of a function
                is always a simple local variable reference.
 
-                This transformation only applies to the entire body of a
-                function instead of any subexpression in a function. This is
-                achieved by the contexts, like \lam{x = E}, though this is
-                strictly not correct (you could read this as "if there is any
-                function \lam{x} that binds \lam{E}, any \lam{E} can be
-                transformed, while we only mean the \lam{E} that is bound by
-                \lam{x}).
-
-                Note that the return value is not simplified if its not
-                representable. Otherwise, this would cause a direct loop with
-                the inlining of unrepresentable bindings. If the return value is
-                not representable because it has a function type, η-expansion
-                should make sure that this transformation will eventually apply.
-                If the value is not representable for other reasons, the
-                function result itself is not representable, meaning this
-                function is not translatable anyway.
-
-                \starttrans
-                x = E                            \lam{E} is representable
-                ~                                \lam{E} is not a lambda abstraction
-                E                                \lam{E} is not a let expression
-                ---------------------------      \lam{E} is not a local variable reference
-                letrec x = E in x
-                \stoptrans
+                The basic idea of this transformation is to take the body of a
+                function and bind it with a let expression (so the body of that let
+                expression becomes a variable reference that can be used as the output
+                port). If the body of the function happens to have lambda abstractions
+                at the top level (which is allowed by the intended normal
+                form\refdef{intended normal form definition}), we take the body of the
+                inner lambda instead. If that happens to be a let expression already
+                (which is allowed by the intended normal form), we take the body of
+                that let (which is not allowed to be anything but a variable reference
+                according to the intended normal form).
+
+                This transformation uses the context conditions in a special way.
+                These contexts, like \lam{x = λv1 ... λvn.E}, are above the dotted
+                line and provide a condition on the environment (\ie\ they require a
+                certain top level binding to be present). These ensure that
+                expressions are only transformed when they are in the function's
+                \quote{return value} directly. This means the context conditions have
+                to be interpreted in the right way: not \quote{if there is any function
+                \lam{x} that binds \lam{E}, any \lam{E} can be transformed}, but we
+                mean only the \lam{E} that is bound by \lam{x}.
+
+                Be careful when reading the transformations: not the entire function
+                from the context is transformed, just a part of it.
+
+                Note that the return value is not simplified if it is not representable.
+                Otherwise, this would cause a loop with the inlining of
+                unrepresentable bindings in
+                \in{section}[sec:normalization:nonrepinline]. If the return value is
+                not representable because it has a function type, η-expansion should
+                make sure that this transformation will eventually apply. If the
+                value is not representable for other reasons, the function result
+                itself is not representable, meaning this function is not translatable
+                anyway.
 
                 \starttrans
-                x = λv0 ... λvn.E                \lam{E} is representable
-                ~                                \lam{E} is not a lambda abstraction
-                E                                \lam{E} is not a let expression
-                ---------------------------      \lam{E} is not a local variable reference
-                letrec x = E in x
+                x = λv1 ... λvn.E                \lam{n} can be zero
+                ~                                \lam{E} is representable
+                E                                \lam{E} is not a lambda abstraction
+                ---------------------------      \lam{E} is not a let expression
+                letrec y = E in y                \lam{E} is not a local variable reference
                 \stoptrans
 
                 \starttrans
-                x = λv0 ... λvn.let ... in E
-                ~                                \lam{E} is representable
-                E                                \lam{E} is not a local variable reference
-                -----------------------------
-                letrec x = E in x
+                x = λv1 ... λvn.letrec binds in E     \lam{n} can be zero
+                ~                                     \lam{E} is representable
+                letrec binds in E                     \lam{E} is not a local variable reference
+                ------------------------------------
+                letrec binds; y = E in y
                 \stoptrans
 
                 \startbuffer[from]
@@ -1319,12 +1328,40 @@
                 \stopbuffer
 
                 \startbuffer[to]
-                x = letrec x = add 1 2 in x
+                x = letrec y = add 1 2 in y
                 \stopbuffer
 
                 \transexample{retvalsimpl}{Return value simplification}{from}{to}
+
+                \startbuffer[from]
+                x = λa. add 1 a
+                \stopbuffer
+
+                \startbuffer[to]
+                x = λa. 
letrec + y = add 1 a + in + y + \stopbuffer + + \transexample{retvalsimpllam}{Return value simplification with a lambda abstraction}{from}{to} - \todo{More examples} + \startbuffer[from] + x = letrec + a = add 1 2 + in + add a 3 + \stopbuffer + + \startbuffer[to] + x = letrec + a = add 1 2 + y = add a 3 + in + y + \stopbuffer + + \transexample{retvalsimpllet}{Return value simplification with a let expression}{from}{to} \subsection[sec:normalization:argsimpl]{Representable arguments simplification} This section contains just a single transformation that deals with @@ -1393,7 +1430,7 @@ function type. Since these can be any expression, there is no transformation needed. Note that this category is exactly all expressions that are not transformed by the transformations for the - previous two categories. This means that \emph{any} core expression + previous two categories. This means that \emph{any} Core expression that is used as an argument to a built-in function will be either transformed into one of the above categories, or end up in this categorie. In any case, the result is in normal form. @@ -1549,10 +1586,12 @@ \startitemize \item An extractor case with a single alternative that picks a field - from a datatype, \eg\ \lam{case x of (a, b) -> a}. + from a datatype, \eg\ \lam{case x of (a, b) -> + a}.\defref{extractor case} \item A selector case with multiple alternatives and only wild binders, that makes a choice between expressions based on the constructor of another - expression, \eg\ \lam{case x of Low -> a; High -> b}. + expression, \eg\ \lam{case x of Low -> a; High -> + b}.\defref{selector case} \stopitemize For an arbitrary case, that has \lam{n} alternatives, with @@ -1680,7 +1719,7 @@ actual transformations. \subsubsection{Removing Polymorphism} - As noted in \in{section}[sec:prototype:polymporphism], + As noted in \in{section}[sec:prototype:coretypes], polymorphism is made explicit in Core through type and dictionary arguments. To remove the polymorphism from a function, we can simply specialize the polymorphic function for @@ -1971,7 +2010,7 @@ to specialize away any unrepresentable literals that are used as function arguments. The following two transformations do exactly this. - \subsubsection{Non-representable binding inlining} + \subsubsection[sec:normalization:nonrepinline]{Non-representable binding inlining} This transform inlines let bindings that are bound to a non-representable value. Since we can never generate a signal assignment for these bindings (we cannot declare a signal assignment @@ -2218,7 +2257,7 @@ there are probably expressions involving cast expressions that cannot be brought into intended normal form by this transformation system. - The uses of casts in the core system should be investigated more and + The uses of casts in the Core system should be investigated more and transformations will probably need updating to handle them in all cases. @@ -2376,7 +2415,7 @@ outgoing edges (meaning no transformation applies to it). The set of nodes without outgoing edges is called the \emph{normal set}. Similarly, the set of nodes containing expressions in intended normal form - \refdef{intended normal form} is called the \emph{intended normal set}. + \refdef{intended normal form definition} is called the \emph{intended normal set}. From such a graph, we can derive some properties easily: \startitemize[KR] @@ -2466,7 +2505,7 @@ our compilation to \VHDL. 
The main difference seems to be that in hardware every expression is always evaluated, while in software it is only evaluated if needed, but it should be possible to - assign a meaning to core expressions that assumes neither. + assign a meaning to Core expressions that assumes neither. Since each of the transformations can be applied to any subexpression as well, there is a constraint on our meaning @@ -2506,7 +2545,7 @@ By systematically reviewing the entire Core language definition along with the intended normal form definition (both of which have a similar structure), it should be possible to identify all - possible (sets of) core expressions that are not in intended + possible (sets of) Core expressions that are not in intended normal form and identify a transformation that applies to it. This approach is especially useful for proving completeness of our