X-Git-Url: https://git.stderr.nl/gitweb?p=matthijs%2Fmaster-project%2Freport.git;a=blobdiff_plain;f=Core2Core.tex;h=5df24873befeb7b28d0cf018dfb0924cc3efd85a;hp=384c84d970e67cc232c2c0fd48b0eb2e2c31efe4;hb=32d52cda513c45334ab256c9f42d41ed6938fc48;hpb=c5928a5f3125abb89a1a9d09d7762d0ac12f95a9 diff --git a/Core2Core.tex b/Core2Core.tex index 384c84d..5df2487 100644 --- a/Core2Core.tex +++ b/Core2Core.tex @@ -45,12 +45,24 @@ \stopframedtext } +% A shortcut for italicized e.g. and i.e. +\define[0]\eg{{\em e.g.}} +\define[0]\ie{{\em i.e.}} + +\definedescription + [desc] + [location=hanging,hang=20,width=broad] + %command=\hskip-1cm,margin=1cm] + % Install the lambda calculus pretty-printer, as defined in pret-lam.lua. \installprettytype [LAM] [LAM] \definetyping[lambda][option=LAM,style=sans] \definetype[lam][option=LAM,style=sans] +\installprettytype [TRANS] [TRANS] +\definetyping[trans][option=TRANS,style=normal,before=,after=] + % An (invisible) frame to hold a lambda expression \define[1]\lamframe{ % Put a frame around lambda expressions, so they can have multiple @@ -131,42 +143,31 @@ Matthijs Kooijman \section{Introduction} As a new approach to translating Core to VHDL, we investigate a number of -transformations on our Core program, which should bring the program into a -well-defined "canonical" form, which is subsequently trivial to translate to -VHDL. - -The transformations as presented here are far from complete, but are meant as -an exploration of possible transformations. In the running example below, we -apply each of the transformations exactly once, in sequence. As will be -apparent from the end result, there will be additional transformations needed -to fully reach our goal, and some transformations must be applied more than -once. How exactly to (efficiently) do this, has not been investigated. - -Lastly, I hope to be able to state a number of pre- and postconditions for -each transformation. If these can be proven for each transformation, and it -can be shown that there exists some ordering of transformations for which the -postcondition implies the canonical form, we can show that the transformations -do indeed transform any program (probably satisfying a number of -preconditions) to the canonical form. +transforms on our Core program, which should bring the program into a +well-defined {\em normal} form, which is subsequently trivial to +translate to VHDL. + +The transforms as presented here are far from complete, but are meant as +an exploration of possible transformations. \section{Goal} The transformations described here have a well-defined goal: To bring the program in a well-defined form that is directly translatable to hardware, while fully preserving the semantics of the program. -This {\em canonical form} is again a Core program, but with a very specific -structure. A function in canonical form has nested lambda's at the top, which +This {\em normal form} is again a Core program, but with a very specific +structure. A function in normal form has nested lambda's at the top, which produce a let expression. This let expression binds every function application -in the function and produces either a simple identifier or a tuple of -identifiers. Every bound value in the let expression is either a simple -function application or a case expression to extract a single element from a -tuple returned by a function. +in the function and produces a simple identifier. Every bound value in +the let expression is either a simple function application or a case +expression to extract a single element from a tuple returned by a +function. An example of a program in canonical form would be: -\starttyping +\startlambda -- All arguments are an inital lambda - \x c d -> + λx.λc.λd. -- There is one let expression at the top level let -- Calling some other user-defined function. @@ -187,29 +188,21 @@ An example of a program in canonical form would be: in -- The actual result r -\stoptyping - -In this and all following programs, the following definitions are assumed to -be available: - -\starttyping -data Bit = Low | High - -foo :: Int -> (Bit, Bit) -add :: Int -> Int -> Int -sub :: Int -> Int -> Int -\stoptyping +\stoplambda When looking at such a program from a hardware perspective, the top level -lambda's define the input ports. The value produced by the let expression are -the output ports. Each function application bound by the let expression -defines a component instantiation, where the input and output ports are mapped -to local signals or arguments. The tuple extracting case expressions don't map -to - -\subsection{Canonical form definition} -Formally, the canonical form is a core program obeying the following -constraints. +lambda's define the input ports. The value produced by the let expression is +the output port. Most function applications bound by the let expression +define a component instantiation, where the input and output ports are mapped +to local signals or arguments. Some of the others use a builtin +construction (\eg the \lam{case} statement) or call a builtin function +(\eg \lam{add} or \lam{sub}). For these, a hardcoded VHDL translation is +available. + +\subsection{Normal definition} +Formally, the normal form is a core program obeying the following +constraints. TODO: Update this section, this is probably not completely +accurate or relevant anymore. \startitemize[R,inmargin] \item All top level binds must have the form $\expr{\bind{fun}{lamexpr}}$. @@ -251,47 +244,27 @@ be any of $x_0$ to $x_n$. A case expression must have a $hardware\;type$. \item TODO: Many more! \stopitemize -\section{Transformation passes} +\section{Transform passes} -In this section we describe the actual transformations. Here we're using -mostly Core-like notation, with a few notable points. +In this section we describe the actual transforms. Here we're using +the core language in a notation that resembles lambda calculus. -\startitemize -\item A core expression (in contrast with a transformation function, for -example), is enclosed in pipes. For example, $\app{transform}{\expr{\lam{z}{z+1}}}$ -is the transform function applied to a lambda core expression. - -Note that this notation might not be consistently applied everywhere. In -particular where a non-core function is used inside a core expression, things -get slightly confusing. -\item A bind is written as $\expr{\bind{x}{expr}}$. This means binding the identifier -$x$ to the expression $expr$. -\item In the core examples, the layout rule from Haskell is loosely applied. -It should be evident what was meant from indentation, even though it might nog -be strictly correct. -\stopitemize +Each of these transforms is meant to be applied to every (sub)expression +in a program, for as long as it applies. Only when none of the +expressions can be applied anymore, the program is in normal form. We +hope to be able to prove that this form will obey all of the constraints +defined above, but this has yet to happen (though it seems likely that +it will). -\subsection{Example} -In the descriptions of transformations below, the following (slightly -contrived) example program will be used as the running example. Note that this -the example for the canonical form given above is the same program, but in -canonical form. +Each of the transforms will be described informally first, explaining +the need for and goal of the transform. Then, a formal definition is +given, using a familiar syntax from the world of logic. Each transform +is specified as a number of conditions (above the horizontal line) and a +number of conclusions (below the horizontal line). The details of using +this notation are still a bit fuzzy, so comments are welcom. -\starttyping - \x -> - let s = foo x - in - case s of - (a, b) -> - case a of - High -> add - Low -> let - op' = case b of - High -> sub - Low -> \c d -> c - in - \c d -> op' d c -\stoptyping +TODO: Formally describe the "apply to every (sub)expression" in terms of +rules with full transformations in the conditions. \subsection{η-abstraction} This transformation makes sure that all arguments of a function-typed @@ -299,20 +272,12 @@ expression are named, by introducing lambda expressions. When combined with β-reduction and function inlining below, all function-typed expressions should be lambda abstractions or global identifiers. -\transform{η-abstraction} -{ -\lam{E :: * -> *} - -\lam{E} is not the first argument of an application. - -\lam{E} is not a lambda abstraction. - -\lam{x} is a variable that does not occur free in E. - -\conclusion - -\trans{E}{λx.E x} -} +\starttrans +E \lam{E :: * -> *} +-------------- \lam{E} is not the first argument of an application. +λx.E x \lam{E} is not a lambda abstraction. + \lam{x} is a variable that does not occur free in \lam{E}. +\stoptrans \startbuffer[from] foo = λa -> case a of @@ -382,124 +347,308 @@ in \transexample{Extended β-reduction}{from}{to} -\subsection{Introducing main scope} -This transformation is meant to introduce a single let expression that will be -the "main scope". This is the let expression as described under requirement -\ref[letexpr]. This let expression will contain only a single binding, which -binds the original expression to some identifier and then evalutes to just -this identifier (to comply with requirement \in[retexpr]). +\subsection{Argument simplification} +The transforms in this section deal with simplifying application +arguments into normal form. The goal here is to: + +\startitemize + \item Make all arguments of user-defined functions (\eg, of which + we have a function body) simple variable references of a runtime + representable type. + \item Make all arguments of builtin functions either: + \startitemize + \item A type argument. + \item A dictionary argument. + \item A type level expression. + \item A variable reference of a runtime representable type. + \item A variable reference or partial application of a function type. + \stopitemize +\stopitemize + +When looking at the arguments of a user-defined function, we can +divide them into two categories: +\startitemize + \item Arguments with a runtime representable type (\eg bits or vectors). + + These arguments can be preserved in the program, since they can + be translated to input ports later on. However, since we can + only connect signals to input ports, these arguments must be + reduced to simple variables (for which signals will be + produced). This is taken care of by the argument extraction + transform. + \item Non-runtime representable typed arguments. + + These arguments cannot be preserved in the program, since we + cannot represent them as input or output ports in the resulting + VHDL. To remove them, we create a specialized version of the + called function with these arguments filled in. This is done by + the argument propagation transform. +\stopitemize + +When looking at the arguments of a builtin function, we can divide them +into categories: + +\startitemize + \item Arguments with a runtime representable type. + + As we have seen with user-defined functions, these arguments can + always be reduced to a simple variable reference, by the + argument extraction transform. Performing this transform for + builtin functions as well, means that the translation of builtin + functions can be limited to signal references, instead of + needing to support all possible expressions. + + \item Arguments with a function type. + + These arguments are functions passed to higher order builtins, + like \lam{map} and \lam{foldl}. Since implementing these + functions for arbitrary function-typed expressions (\eg, lambda + expressions) is rather comlex, we reduce these arguments to + (partial applications of) global functions. + + We can still support arbitrary expressions from the user code, + by creating a new global function containing that expression. + This way, we can simply replace the argument with a reference to + that new function. However, since the expression can contain any + number of free variables we also have to include partial + applications in our normal form. + + This category of arguments is handled by the function extraction + transform. + \item Other unrepresentable arguments. + + These arguments can take a few different forms: + \startdesc{Type arguments} + In the core language, type arguments can only take a single + form: A type wrapped in the Type constructor. Also, there is + nothing that can be done with type expressions, except for + applying functions to them, so we can simply leave type + arguments as they are. + \stopdesc + \startdesc{Dictionary arguments} + In the core language, dictionary arguments are used to find + operations operating on one of the type arguments (mostly for + finding class methods). Since we will not actually evaluatie + the function body for builtin functions and can generate + code for builtin functions by just looking at the type + arguments, these arguments can be ignored and left as they + are. + \stopdesc + \startdesc{Type level arguments} + Sometimes, we want to pass a value to a builtin function, but + we need to know the value at compile time. Additionally, the + value has an impact on the type of the function. This is + encoded using type-level values, where the actual value of the + argument is not important, but the type encodes some integer, + for example. Since the value is not important, the actual form + of the expression does not matter either and we can leave + these arguments as they are. + \stopdesc + \startdesc{Other arguments} + Technically, there is still a wide array of arguments that can + be passed, but does not fall into any of the above categories. + However, none of the supported builtin functions requires such + an argument. This leaves use with passing unsupported types to + a function, such as calling \lam{head} on a list of functions. + + In these cases, it would be impossible to generate hardware + for such a function call anyway, so we can ignore these + arguments. + + The only way to generate hardware for builtin functions with + arguments like these, is to expand the function call into an + equivalent core expression (\eg, expand map into a series of + function applications). But for now, we choose to simply not + support expressions like these. + \stopdesc + + From the above, we can conclude that we can simply ignore these + other unrepresentable arguments and focus on the first two + categories instead. +\stopitemize + +\subsubsection{Argument extraction} +This transform deals with arguments to functions that +are of a runtime representable type. + +TODO: It seems we can map an expression to a port, not only a signal. +Perhaps this makes this transformation not needed? +TODO: Say something about dataconstructors (without arguments, like True +or False), which are variable references of a runtime representable +type, but do not result in a signal. -Formally, we can describe the transformation as follows. +To reduce a complex expression to a simple variable reference, we create +a new let expression around the application, which binds the complex +expression to a new variable. The original function is then applied to +this variable. -\transformold{Main scope introduction} +\transform{Argument extract} { -\NC \app{transform}{\expr{\bind{f}{expr}}} \NC = \expr{\bind{f}{\app{transform'(expr)}}}\NR -\NR -\NC \app{transform'}{\expr{\lam{v}{expr}}} \NC = \expr{\lam{v}{\app{transform'}{expr}}}\NR -\NC \app{transform'}{\expr{expr}} \NC = \expr{\letexpr{\bind{x}{expr}}{x}} \NR +\lam{Y} is of a hardware representable type + +\lam{Y} is not a variable referene + +\conclusion + +\trans{X Y}{let z = Y in X z} } -When applying this transformation to our running example, we get the following -program. +\subsubsection{Function extraction} +This transform deals with function-typed arguments to builtin functions. +Since these arguments cannot be propagated, we choose to extract them +into a new global function instead. -\starttyping - \x c d -> - let r = (let s = foo x - in - case s of - (a, b) -> - case a of - High -> add c d - Low -> let - op' = case b of - High -> sub - Low -> \c d -> c - in - op' d c - ) - in - r -\stoptyping - -\subsection{Scope flattening} -This transformation tries to flatten the topmost let expression in a bind, -{\em i.e.}, bind all identifiers in the same scope, and bind them to simple -expressions only (so simplify deeply nested expressions). - -Formally, we can describe the transformation as follows. - -\transformold{Main scope introduction} { \NC \app{transform}{\expr{\bind{f}{expr}}} \NC = \expr{\bind{f}{\app{transform'(expr)}}}\NR -\NR -\NC \app{transform'}{\expr{\lam{v}{expr}}} \NC = \expr{\lam{v}{\app{transform'}{expr}}}\NR -\NC \app{transform'}{\expr{\letexpr{binds}{expr}}} \NC = \expr{\letexpr{\app{concat . map . flatten}{binds}}{expr}}\NR -\NR -\NC \app{flatten}{\expr{\bind{x}{\letexpr{binds}{expr}}}} \NC = (\app{concat . map . flatten}{binds}) \cup \{\app{flatten}{\expr{\bind{x}{expr}}}\}\NR -\NC \app{flatten}{\expr{\bind{x}{\case{s}{alts}}}} \NC = \app{concat}{binds'} \cup \{\bind{x}{\case{s}{alts'}}\}\NR -\NC \NC \where{(binds', alts')=\app{unzip.map.(flattenalt s)}{alts}}\NR -\NC \app{\app{flattenalt}{s}}{\expr{\alt{\app{con}{x_0\;..\;x_n}}{expr}}} \NC = (extracts \cup \{\app{flatten}{bind}\}, alt)\NR -\NC \NC \where{extracts =\{\expr{\case{s}{\alt{\app{con}{x_0\;..\;x_n}}{x_0}}},}\NR -\NC \NC \;..\;,\expr{\case{s}{\alt{\app{con}{x_0\;..\;x_n}}{x_n}}}\} \NR -\NC \NC bind = \expr{\bind{y}{expr}}\NR -\NC \NC alt = \expr{\alt{\app{con}{\_\;..\;\_}}{y}}\NR +Any free variables occuring in the extracted arguments will become +parameters to the new global function. The original argument is replaced +with a reference to the new function, applied to any free variables from +the original argument. + +\transform{Function extraction} +{ +\lam{X} is a (partial application of) a builtin function + +\lam{Y} is not an application + +\lam{Y} is not a variable reference + +\conclusion + +\lam{f0 ... fm} = free local vars of \lam{Y} + +\lam{y} is a new global variable + +\lam{y = λf0 ... fn.Y} + +\trans{X Y}{X (y f0 ... fn)} } -When applying this transformation to our running example, we get the following -program. +\subsubsection{Argument propagation} +This transform deals with arguments to user-defined functions that are +not representable at runtime. This means these arguments cannot be +preserved in the final form and most be {\em propagated}. -\starttyping - \x c d -> - let s = foo x - r = case s of - (_, _) -> y - a = case s of (a, b) -> a - b = case s of (a, b) -> b - y = case a of - High -> g - Low -> h - g = add c d - h = op' d c - op' = case b of - High -> i - Low -> j - i = sub - j = \c d -> c - in - r -\stoptyping +Propagation means to create a specialized version of the called +function, with the propagated argument already filled in. As a simple +example, in the following program: -\subsection{More transformations} -As noted before, the above transformations are not complete. Other needed -transformations include: -\startitemize -\item Inlining of local identifiers with a function type. Since these cannot -be represented in hardware directly, they must be transformed into something -else. Inlining them should always be possible without loss of semantics (TODO: -How true is this?) and can expose new possibilities for other transformations -passes (such as application propagation when inlining {\tt j} above). -\item A variation on inlining local identifiers is the propagation of -function arguments with a function type. This will probably be initiated when -transforming the caller (and not the callee), but it will also deal with -identifiers with a function type that are unrepresentable in hardware. - -Special care must be taken here, since the expression that is propagated into -the callee comes from a different scope. The function typed argument must thus -be replaced by any identifiers from the callers scope that the propagated -expression uses. - -Note that propagating an argument will change both a function's interface and -implementation. For this to work, a new function should be created instead of -modifying the original function, so any other callers will not be broken. -\item Something similar should happen with return values with function types. -\item Polymorphism must be removed from all user-defined functions. This is -again similar to propagation function typed arguments, since this will also -create duplicates of functions (for a specific type). This is an operation -that is commonly known as "specialization" and already happens in GHC (since -non-polymorph functions can be a lot faster than generic ones). -\item More builtin expressions should be added and these should be evaluated -by the compiler. For example, map, fold, +. -\stopitemize +\startlambda +f = λa.λb.a + b +inc = λa.f a 1 +\stoplambda + +we could {\em propagate} the constant argument 1, with the following +result: + +\startlambda +f' = λa.a + 1 +inc = λa.f' a +\stoplambda + +Special care must be taken when the to-be-propagated expression has any +free variables. If this is the case, the original argument should not be +removed alltogether, but replaced by all the free variables of the +expression. In this way, the original expression can still be evaluated +inside the new function. Also, this brings us closer to our goal: All +these free variables will be simple variable references. + +To prevent us from propagating the same argument over and over, a simple +local variable reference is not propagated (since is has exactly one +free variable, itself, we would only replace that argument with itself). + +This shows that any free local variables that are not runtime representable +cannot be brought into normal form by this transform. We rely on an +inlining transformation to replace such a variable with an expression we +can propagate again. + +TODO: Move these definitions somewhere sensible. + +Definition: A global variable is any variable that is bound at the +top level of a program. A local variable is any other variable. + +Definition: A hardware representable type is a type that we can generate +a signal for in hardware. For example, a bit, a vector of bits, a 32 bit +unsigned word, etc. Types that are not runtime representable notably +include (but are not limited to): Types, dictionaries, functions. + +Definition: A builtin function is a function for which a builtin +hardware translation is available, because its actual definition is not +translatable. A user-defined function is any other function. + +\starttrans +x = E +~ +x Y0 ... Yi ... Yn \lam{Y_i} is not of a runtime representable type +--------------------------------------------- \lam{Y_i} is not a local variable reference +x' = λy0 ... yi-1 f0 ... fm yi+1 ... yn . \lam{f0 ... fm} = free local vars of \lam{Y_i} + E y0 ... yi-1 Yi yi+1 ... yn +~ +x' y0 ... yi-1 f0 ... fm Yi+1 ... Yn +\stoptrans + +\transform{Argument propagation} +{ +\lam{x} is a global variable, bound to a user-defined function + +\lam{x = E} + +\lam{Y_i} is not of a runtime representable type + +\lam{Y_i} is not a local variable reference + +\conclusion + +\lam{f0 ... fm} = free local vars of \lam{Y_i} + +\lam{x'} is a new global variable + +\lam{x' = λy0 ... yi-1 f0 ... fm yi+1 ... yn . E y0 ... yi-1 Yi yi+1 ... yn} + +\trans{x Y0 ... Yi ... Yn}{x' y0 ... yi-1 f0 ... fm Yi+1 ... Yn} +} -Initial example +TODO: The above definition looks too complicated... Can we find +something more concise? + +\subsection{Cast propagation} +This transform pushes casts down into the expression as far as possible. +\subsection{Let recursification} +This transform makes all lets recursive. +\subsection{Let simplification} +This transform makes the result value of all let expressions a simple +variable reference. +\subsection{Let flattening} +This transform turns two nested lets (\lam{let x = (let ... in ...) in +...}) into a single let. +\subsection{Simple let binding removal} +This transforms inlines simple let bindings (\eg a = b). +\subsection{Function inlining} +This transform inlines let bindings of a funtion type. TODO: This should +be generelized to anything that is non representable at runtime, or +something like that. +\subsection{Scrutinee simplification} +This transform ensures that the scrutinee of a case expression is always +a simple variable reference. +\subsection{Case binder wildening} +This transform replaces all binders of a each case alternative with a +wild binder (\ie, one that is never referred to). This will possibly +introduce a number of new "selector" case statements, that only select +one element from an algebraic datatype and bind it to a variable. +\subsection{Case value simplification} +This transform simplifies the result value of each case alternative by +binding the value in a let expression and replacing the value by a +simple variable reference. +\subsection{Case removal} +This transform removes any case statements with a single alternative and +only wild binders. + +\subsection{Example sequence} + +This section lists an example expression, with a sequence of transforms +applied to it. The exact transforms given here probably don't exactly +match the transforms given above anymore, but perhaps this can clarify +the big picture a bit. + +TODO: Update or remove this section. \startlambda λx.