X-Git-Url: https://git.stderr.nl/gitweb?p=matthijs%2Fmaster-project%2Freport.git;a=blobdiff_plain;f=Chapters%2FPrototype.tex;h=a18d30d1ad5c48911808a2606044d47796c9128b;hp=b391d4fcbc6951c2c86af451b805cec8187aae28;hb=4485b899f0fb8ddd68a08fa977b4766a0d801086;hpb=39d85ed266bd73954be058cd713f16ea9bada3b0 diff --git a/Chapters/Prototype.tex b/Chapters/Prototype.tex index b391d4f..a18d30d 100644 --- a/Chapters/Prototype.tex +++ b/Chapters/Prototype.tex @@ -49,7 +49,7 @@ } Considering that we required a prototype which should be working quickly, and that implementing parsers, semantic checkers and especially - typcheckers is not exactly the core of this research (but it is lots and + typecheckers is not exactly the core of this research (but it is lots and lots of work!), using an existing language is the obvious choice. This also has the advantage that a large set of language features is available to experiment with and it is easy to find which features apply well and @@ -73,7 +73,7 @@ Haskell an obvious choice. \section[sec:prototype:output]{Output format} - The second important question is: What will be our output format? + The second important question is: what will be our output format? This output format should at least allow for programming the hardware design into a field-programmable gate array (\small{FPGA}).
The choice of output format is thus limited by what hardware @@ -188,7 +188,7 @@ % Draw the objects (and deferred labels) drawObj (inp, front, desugar, simpl, back, out); \stopuseMPgraphic - \placefigure[right]{GHC compiler pipeline}{\useMPgraphic{ghc-pipeline}} + \placefigure[right]{GHC compiler pipeline}{\startboxed \useMPgraphic{ghc-pipeline}\stopboxed} \startdesc{Frontend} This step takes the Haskell source files and parses them into an @@ -220,7 +220,7 @@ Assuming that we do not want to deal with (or modify) parsing, typechecking and other frontend business and that native code is not really a useful format anymore, we are left with the choice between the full Haskell - \small{AST}, or the smaller (simplified) core representation. + \small{AST}, or the smaller (simplified) Core representation. The advantage of taking the full \small{AST} is that the exact structure of the source program is preserved. We can see exactly what the hardware @@ -228,15 +228,15 @@ the full \small{AST} is a very complicated datastructure. If we are to handle everything it offers, we will quickly get a big compiler. - Using the core representation gives us a much more compact datastructure - (a core expression only uses 9 constructors). Note that this does not mean - that the core representation itself is smaller, on the contrary. - Since the core language has less constructs, most Core expressions + Using the Core representation gives us a much more compact datastructure + (a Core expression only uses 9 constructors). Note that this does not mean + that the Core representation itself is smaller, on the contrary. + Since the Core language has less constructs, most Core expressions are larger than the equivalent versions in Haskell. - However, the fact that the core language is so much smaller, means it is a + However, the fact that the Core language is so much smaller, means it is a lot easier to analyze and translate it into something else. 
For the same - reason, \small{GHC} runs its simplifications and optimizations on the core + reason, \small{GHC} runs its simplifications and optimizations on the Core representation as well \cite[jones96]. We will use the normal Core representation, not the simplified Core. Even @@ -279,22 +279,22 @@ % Draw the objects (and deferred labels) drawObj (inp, front, norm, vhdl, out); \stopuseMPgraphic - \placefigure[right]{Cλash compiler pipeline}{\useMPgraphic{clash-pipeline}} + \placefigure[right]{Cλash compiler pipeline}{\startboxed \useMPgraphic{clash-pipeline}\stopboxed} \startdesc{Frontend} This is exactly the frontend from the \small{GHC} pipeline, that translates Haskell sources to a typed Core representation. \stopdesc \startdesc{Normalization} - This is a step that transforms the core representation into a normal - form. This normal form is still expressed in the core language, but has + This is a step that transforms the Core representation into a normal + form. This normal form is still expressed in the Core language, but has to adhere to an additional set of constraints. This normal form is less - expressive than the full core language (e.g., it can have limited + expressive than the full Core language (e.g., it can have limited higher-order expressions, has a specific structure, etc.), but is also very close to directly describing hardware. \stopdesc \startdesc{\small{VHDL} generation} - The last step takes the normal formed core representation and generates + The last step takes the normal formed Core representation and generates \small{VHDL} for it. Since the normal form has a specific, hardware-like structure, this final step is very straightforward. \stopdesc @@ -312,7 +312,7 @@ any functions used by the entry functions (recursively). \section[sec:prototype:core]{The Core language} - \defreftxt{core}{the Core language} + \defreftxt{Core}{the Core language} Most of the prototype deals with handling the program in the Core language. 
In this section we will show what this language looks like and how it works. @@ -379,11 +379,11 @@ func arg \stoplambda This is function application. Each application consists of two - parts: The function part and the argument part. Applications are used + parts: the function part and the argument part. Applications are used for normal function \quote{calls}, but also for applying type abstractions and data constructors. - In core, there is no distinction between an operator and a + In Core, there is no distinction between an operator and a function. This means that, for example the addition of two numbers looks like the following in Core: @@ -516,12 +516,12 @@ A case expression evaluates its scrutinee, which should have an algebraic datatype, into weak head normal form (\small{WHNF}) and - (optionally) binds it to \lam{bndr}. Every alternative lists a - single constructor (\lam{C0 ... Cn}). Based on the actual - constructor of the scrutinee, the corresponding alternative is - chosen. The binders in the chosen alternative (\lam{bndr0,0 .... - bndr0,m} are bound to the actual arguments to the constructor in - the scrutinee. + (optionally) binds it to \lam{bndr}. If \lam{bndr} is wild (\refdef{wild + binders}), it is left out. Every alternative lists a single constructor + (\lam{C0 ... Cn}). Based on the actual constructor of the scrutinee, the + corresponding alternative is chosen. The binders in the chosen + alternative (\lam{bndr0,0 .... bndr0,m}) are bound to the actual + arguments to the constructor in the scrutinee. This is best illustrated with an example. Assume there is an algebraic datatype declared as follows\footnote{This @@ -626,7 +626,7 @@ \startdesc{Note} The Core language in \small{GHC} allows adding \emph{notes}, which serve - as hints to the inliner or add custom (string) annotations to a core + as hints to the inliner or add custom (string) annotations to a Core expression.
These should not be generated normally, so these are not handled in any way in the prototype. \stopdesc @@ -698,7 +698,7 @@ (though you could of course construct invalidly typed expressions through the \GHC\ API). - Any type in core is one of the following: + Any type in Core is one of the following: \startdesc{A type variable} \startlambda @@ -825,9 +825,8 @@ \stopdesc Using this set of types, all types in basic Haskell can be represented. - \todo{Overview of polymorphism with more examples (or move examples - here)}. + here)} \section[sec:prototype:statetype]{State annotations in Haskell} As noted in \in{section}[sec:description:stateann], Cλash needs some @@ -842,35 +841,42 @@ equal to an existing type (or rather, a new name for an existing type). This allows both the original type and the synonym to be used interchangedly in a Haskell program. This means no explicit conversion - is needed either. For example, a simple accumulator would become: + is needed. For example, a simple accumulator would become: \starthaskell + -- This type synonym would become part of Cλash, it is shown here + -- just for clarity. type State s = s + acc :: Word -> State Word -> (State Word, Word) acc i s = let sum = s + i in (sum, sum) \stophaskell This looks nice in Haskell, but turns out to be hard to implement. There - are no explicit conversion in Haskell, but not in Core either. This - means the type of a value might be show as \hs{AccState} in some places, - but \hs{Word} in others (and this can even change due to - transformations). Since every binder has an explicit type associated - with it, the type of every function type will be properly preserved and - could be used to track down the statefulness of each value by the - compiler. However, this makes the implementation a lot more complicated - than it currently is using \hs{newtypes}. + is no explicit conversion in Haskell, but not in Core either. 
This + means the type of a value might be shown as \hs{State Word} in + some places, but \hs{Word} in others (and this can even change due + to transformations). Since every binder has an explicit type + associated with it, the type of every function will be + properly preserved and could be used to track down the + statefulness of each value by the compiler. However, this would make + the implementation a lot more complicated than when using type + renamings as described in the next section. % Use \type instead of \hs here, since the latter breaks inside % section headings. \subsection{Type renaming (\type{newtype})} Haskell also supports type renamings as a way to declare a new type that has the same (runtime) representation as an existing type (but is in - fact a different type to the typechecker). With type renaming, an + fact a different type to the typechecker). With type renaming, explicit conversion between values of the two types is needed. The accumulator would then become: \starthaskell + -- This type renaming would become part of Cλash, it is shown here + -- just for clarity. newtype State s = State s + acc :: Word -> State Word -> (State Word, Word) acc i (State s) = let sum = s + i in (State sum, sum) \stophaskell @@ -882,13 +888,13 @@ never cause name collisions with values). The difference with the type synonym example is in the explicit conversion between the \hs{State Word} and \hs{Word} types by pattern matching and by using the explicit - the \hs{State constructor}. + \hs{State} constructor. - This explicit conversion makes the \VHDL\ generation easier: Whenever we + This explicit conversion makes the \VHDL\ generation easier: whenever we remove (unpack) the \hs{State} type, this means we are accessing the - current state (\eg, accessing the register output). Whenever we are a + current state (\ie, accessing the register output).
Whenever we are adding (packing) the \hs{State} type, we are producing a new value for - the state (\eg, providing the register input). + the state (\ie, providing the register input). When dealing with nested states (a stateful function that calls stateful functions, which might call stateful functions, etc.) the state type @@ -896,9 +902,9 @@ needed. For example, consider the following state type (this is just the state type, not the entire function type): - \starttyping + \starthaskell State (State Bit, State (State Word, Bit), Word) - \stoptyping + \stophaskell We cannot leave all these \hs{State} type constructors out, since that would change the type (unlike when using type synonyms). However, when @@ -912,25 +918,28 @@ then become something like: \starthaskell + -- These type renamings would become part of Cλash, they are shown + -- here just for clarity. newtype StateIn s = StateIn s newtype StateOut s = StateOut s + acc :: Word -> StateIn Word -> (StateIn Word, Word) acc i (StateIn s) = let sum = s + i in (StateIn sum, sum) \stophaskell This could make the implementation easier and the hardware - descriptions less errorprone (you can no longer \quote{forget} to + descriptions less error-prone (you can no longer \quote{forget} to unpack and repack a state variable and just return it directly, which can be a problem in the current prototype). However, it also means we need twice as many type synonyms to hide away substates, making this - approach a bit cumbersome. It also makes it harder to copmare input - and output state types, possible reducing the type safety of the + approach a bit cumbersome. It also makes it harder to compare input + and output state types, possibly reducing the type-safety of the descriptions. \subsection[sec:prototype:substatesynonyms]{Type synonyms for substates} As noted above, when using nested (hierarchical) states, the state types of the \quote{upper} functions (those that call other functions, which - call other functions, etc.)
quickly becomes complicated. Also, when the + call other functions, etc.) quickly become complicated. Also, when the state type of one of the \quote{lower} functions changes, the state types of all the upper functions changes as well. If the state type for each function is explicitly and completely specified, this means that a @@ -954,15 +963,16 @@ losing any expressivity. \subsubsection{Example} - As an example of the used approach, there is a simple averaging circuit in - \in{example}[ex:AvgState]. This circuit lets the accumulation of the - inputs be done by a subcomponent, \hs{acc}, but keeps a count of value - accumulated in its own state.\footnote{Currently, the prototype - is not able to compile this example, since the built-in function - for division has not been added.} + As an example of the used approach, a simple averaging circuit + is shown in \in{example}[ex:AvgState]. This circuit lets the + accumulation of the inputs be done by a subcomponent, \hs{acc}, + but keeps a count of value accumulated in its own + state.\footnote{Currently, the prototype is not able to compile + this example, since there is no built-in function for division.} \startbuffer[AvgState] - -- The state type annotation + -- This type renaming would become part of Cλash, it is shown + -- here just for clarity newtype State s = State s -- The accumulator state type @@ -994,37 +1004,38 @@ %\stopcombination \todo{Picture} - \section{Implementing state} + \section{\VHDL\ generation for state} Now its clear how to put state annotations in the Haskell source, there is the question of how to implement this state translation. As we have seen in \in{section}[sec:prototype:design], the translation to \VHDL\ happens as a simple, final step in the compilation process. - This step works on a core expression in normal form. The specifics + This step works on a Core expression in normal form. 
The specifics of normal form will be explained in \in{chapter}[chap:normalization], but the examples given should be - easy to understand using the definitin of Core given above. + easy to understand using the definition of Core given above. The + conversion to and from the \hs{State} type is done using the cast + operator, \lam{▶}. \startbuffer[AvgStateNormal] acc = λi.λspacked. let -- Remove the State newtype s = spacked ▶ Word - s' = s + i - o = s + i + sum = s + i -- Add the State newtype again - spacked' = s' ▶ State Word - res = (spacked', o) + spacked' = sum ▶ State Word + res = (spacked', sum) in res avg = λi.λspacked. let s = spacked ▶ (AccState, Word) - accs = case s of (accs, _) -> accs - count = case s of (_, count) -> count + accs = case s of (a, b) -> a + count = case s of (c, d) -> d accres = acc i accs - accs' = case accres of (accs', _) -> accs' - sum = case accres of (_, sum) -> sum + accs' = case accres of (e, f) -> e + sum = case accres of (g, h) -> h count' = count + 1 o = sum / count' s' = (accs', count') @@ -1040,10 +1051,10 @@ \subsection[sec:prototype:statelimits]{State in normal form} Before describing how to translate state from normal form to \VHDL, we will first see how state handling looks in normal form. - What limitations are there on their use to guarantee that proper - \VHDL\ can be generated? + How must their use be limited to guarantee that proper \VHDL\ can + be generated? - We will try to formulate a number of rules about what operations are + We will formulate a number of rules about what operations are allowed with state variables. These rules apply to the normalized Core representation, but will in practice apply to the original Haskell hardware description as well. Ideally, these rules would become part @@ -1063,6 +1074,11 @@ (state) variables} and \emph{substate variables}, which will be defined in the rules themselves. 
+ These rules describe everything that can be done with state + variables and state-containing variables. Everything else is + invalid. For every rule, the corresponding part of + \in{example}[ex:AvgStateNormal] is shown. + \startdesc{State variables can appear as an argument.} \startlambda avg = λi.λspacked. ... @@ -1082,12 +1098,13 @@ \lam{State} type. If the result of this unpacking does not have a state type and does - not contain state variables, there are no limitations on its use. - Otherwise if it does not have a state type but does contain - substates, we refer to it as a \emph{state-containing input - variable} and the limitations below apply. If it has a state type - itself, we refer to it as an \emph{input substate variable} and the - below limitations apply as well. + not contain state variables, there are no limitations on its + use (this is the function's own state). Otherwise if it does + not have a state type but does contain substates, we refer to it + as a \emph{state-containing input variable} and the limitations + below apply. If it has a state type itself, we refer to it as an + \emph{input substate variable} and the below limitations apply + as well. It may seem strange to consider a variable that still has a state type directly after unpacking, but consider the case where a @@ -1105,7 +1122,7 @@ \startdesc{Variables can be extracted from state-containing input variables.} \startlambda - accs = case s of (accs, _) -> accs + accs = case s of (a, b) -> a \stoplambda A state-containing input variable is typically a tuple containing @@ -1115,17 +1132,18 @@ multiple, when the input variable is nested). If the result has no state type and does not contain any state - variables either, there are no further limitations on its use. If - the result has no state type but does contain state variables we - refer to it as a \emph{state-containing input variable} and this - limitation keeps applying. 
If the variable has a state type itself, - we refer to it as an \emph{input substate variable} and below - limitations apply. + variables either, there are no further limitations on its use + (this is the function's own state). If the result has no state + type but does contain state variables we refer to it as a + \emph{state-containing input variable} and this limitation keeps + applying. If the variable has a state type itself, we refer to + it as an \emph{input substate variable} and below limitations + apply. \startdesc{Input substate variables can be passed to functions.} \startlambda accres = acc i accs - accs' = case accres of (accs', _) -> accs' + accs' = case accres of (e, f) -> e \stoplambda An input substate variable can (only) be passed to a function. @@ -1147,7 +1165,8 @@ A function's output state is usually a tuple containing its own updated state variables and all output substates. This result is - built up using any single-constructor algebraic datatype. + built up using any single-constructor algebraic datatype + (possibly nested). The result of these expressions is referred to as a \emph{state-containing output variable}, which are subject to these @@ -1162,7 +1181,7 @@ As soon as all a functions own update state and output substate variables have been joined together, the resulting state-containing output variable can be packed into an output - state variable. Packing is done by casting into a state type. + state variable. Packing is done by casting to a state type. \stopdesc \startdesc{Output state variables can appear as (part of) a function result.} @@ -1184,7 +1203,7 @@ to be passed to functions, the corresponding output substates should be inserted into the output state in the same way. In other words, each pair of corresponding substates in the input and - output states should be passed / returned from the same called + output states should be passed to / returned from the same called function. 
The prototype currently does not check much of the above @@ -1197,7 +1216,7 @@ As noted above, the basic approach when generating \VHDL\ for stateful functions is to generate a single register for every stateful function. We look around the normal form to find the let binding that removes the - \lam{State} newtype (using a cast). We also find the let binding that + \lam{State} type renaming (using a cast). We also find the let binding that adds a \lam{State} type. These are connected to the output and the input of the generated let binding respectively. This means that there can only be one let binding that adds and one that removes the \lam{State} @@ -1228,8 +1247,8 @@ To keep the function definition correct until the very end of the process, we will not deal with (sub)states until we get to the \small{VHDL} generation. Then, we are translating from Core to - \small{VHDL}, and we can simply ignore substates, effectively removing - the substate components altogether. + \small{VHDL}, and we can simply generate no \VHDL\ for substates, + effectively removing them altogether. But, how will we know what exactly is a substate? Since any state argument or return value that represents state must be of the @@ -1237,8 +1256,8 @@ must be careful to ignore only \emph{substates}, and not a function's own state. - In \in{example}[ex:AvgStateNorm] above, we should generate a register - connected with its output connected to \lam{s} and its input connected + For \in{example}[ex:AvgStateNormal] above, we should generate a register + with its output connected to \lam{s} and its input connected to \lam{s'}. However, \lam{s'} is build up from both \lam{accs'} and \lam{count'}, while only \lam{count'} should end up in the register. \lam{accs'} is a substate for the \lam{acc} function, for which a @@ -1246,14 +1265,15 @@ function. Fortunately, the \lam{accs'} variable (and any other substate) has a - property that we can easily check: It has a \lam{State} type - annotation.
This means that whenever \VHDL\ is generated for a tuple - (or other algebraic type), we can simply leave out all elements that - have a \lam{State} type. This will leave just the parts of the state - that do not have a \lam{State} type themselves, like \lam{count'}, - which is exactly a function's own state. This approach also means that - the state part of the result is automatically excluded when generating - the output port, which is also required. + property that we can easily check: it has a \lam{State} type. This + means that whenever \VHDL\ is generated for a tuple (or other + algebraic type), we can simply leave out all elements that have a + \lam{State} type. This will leave just the parts of the state that + do not have a \lam{State} type themselves, like \lam{count'}, + which is exactly a function's own state. This approach also means + that the state part of the result (\eg\ \lam{s'} in \lam{res}) is + automatically excluded when generating the output port, which is + also required. We can formalize this translation a bit, using the following rules. @@ -1278,44 +1298,46 @@ register specification. \stopitemize - When applying these rules to the description in + When applying these rules to the function \lam{avg} from \in{example}[ex:AvgStateNormal], we be left with the description in \in{example}[ex:AvgStateRemoved]. All the parts that do not generate any \VHDL\ directly are crossed out, leaving just the - actual flow of values in the final hardware. + actual flow of values in the final hardware. To illustrate the + change of the types of \lam{s} and \lam{s'}, their types are also + shown. 
- \startlambda + \startbuffer[AvgStateRemoved] avg = iλ.λ--spacked.-- let + s :: (--AccState,-- Word) s = --spacked ▶ (AccState, Word)-- - --accs = case s of (accs, _) -> accs-- - count = case s of (--_,-- count) -> count + --accs = case s of (a, b) -> a-- + count = case s of (--c,-- d) -> d accres = acc i --accs-- - --accs' = case accres of (accs', _) -> accs'-- - sum = case accres of (--_,-- sum) -> sum + --accs' = case accres of (e, f) -> e-- + sum = case accres of (--g,-- h) -> h count' = count + 1 o = sum / count' + s' :: (--AccState,-- Word) s' = (--accs',-- count') --spacked' = s' ▶ State (AccState, Word)-- res = (--spacked',-- o) in res - \stoplambda + \stopbuffer + \placeexample[here][ex:AvgStateRemoved]{Normalized version of \in{example}[ex:AvgState] with ignored parts crossed out} + {\typebufferlam{AvgStateRemoved}} - When we would really leave out the crossed out parts, we get a slightly - weird program: There is a variable \lam{s} which has no value, and there - is a variable \lam{s'} that is never used. Together, these two will form + When we actually leave out the crossed out parts, we get a slightly + weird program: there is a variable \lam{s} which has no value, and there + is a variable \lam{s'} that is never used. But together, these two will form the state process of the function. \lam{s} contains the "current" state, \lam{s'} is assigned the "next" state. So, at the end of each clock cycle, \lam{s'} should be assigned to \lam{s}. - In the example the definition of \lam{s'} is still present, since - it does not have a state type. The \lam{accums'} substate has been - removed, leaving us just with the state of \lam{avg} itself. - As an illustration of the result of this function, \in{example}[ex:AccStateVHDL] and \in{example}[ex:AvgStateVHDL] show the the \VHDL\ that is - generated from the examples is this section. + generated by Cλash from the examples in this section.
\startbuffer[AvgStateVHDL] entity avgComponent_0 is @@ -1365,6 +1387,17 @@ end process state; end architecture structural; \stopbuffer + + \startbuffer[AvgStateTypes] + package types is + subtype \unsigned_31\ is unsigned (0 to 31); + + type \(,)unsigned_31\ is record + A : \unsigned_31\; + end record; + end package types; + \stopbuffer + \startbuffer[AccStateVHDL] entity accComponent_1 is port (\izAob3\ : in \unsigned_31\; @@ -1373,7 +1406,6 @@ resetn : in std_logic); end entity accComponent_1; - architecture structural of accComponent_1 is signal \szAod3\ : \unsigned_31\; signal \reszAonzAor3\ : \unsigned_31\; @@ -1392,10 +1424,28 @@ end architecture structural; \stopbuffer - \placeexample[][ex:AccStateVHDL]{\VHDL\ generated for acc from \in{example}[ex:AvgState]} - {\typebuffer[AccStateVHDL]} - \placeexample[][ex:AvgStateVHDL]{\VHDL\ generated for avg from \in{example}[ex:AvgState]} - {\typebuffer[AvgStateVHDL]} + \placeexample[][ex:AvgStateTypes]{\VHDL\ types generated for \hs{acc} and \hs{avg} from \in{example}[ex:AvgState]} + {\typebuffervhdl{AvgStateTypes}} + \placeexample[][ex:AccStateVHDL]{\VHDL\ generated for \hs{acc} from \in{example}[ex:AvgState]} + {\typebuffervhdl{AccStateVHDL}} + \placeexample[][ex:AvgStateVHDL]{\VHDL\ generated for \hs{avg} from \in{example}[ex:AvgState]} + {\typebuffervhdl{AvgStateVHDL}} + \section{Prototype implementation} + The prototype has been implemented using Haskell as its + implementation language, just like \GHC. This allows the prototype + to directly use parts of \GHC\ through the \small{API} it exposes + (which essentially taps directly into the internals of \GHC, making + this \small{API} not really a stable interface). + + Cλash can be run from a separate library, but has also been + integrated into \type{ghci} \cite[baaij09]. The latter does require + a custom \GHC\ build, however. + + The latest version and all history of the Cλash code can be browsed + online or retrieved using the \type{git} program.
+ + http://git.stderr.nl/gitweb?p=matthijs/projects/cλash.git + % \subsection{Initial state} % How to specify the initial state? Cannot be done inside a hardware % function, since the initial state is its own state argument for the first