X-Git-Url: https://git.stderr.nl/gitweb?p=matthijs%2Fmaster-project%2Freport.git;a=blobdiff_plain;f=Chapters%2FHardwareDescription.tex;h=508c11bef4a4dd3f5028ae08cf7befc99a7c4d56;hp=1571ff6acc31148a5bd75d27ddbdfb490ba71a29;hb=dde172499703d199e396b84bb6d5c13dae8cdd8d;hpb=38af31872d0108220e14104f75f816fcd4e326dd diff --git a/Chapters/HardwareDescription.tex b/Chapters/HardwareDescription.tex index 1571ff6..508c11b 100644 --- a/Chapters/HardwareDescription.tex +++ b/Chapters/HardwareDescription.tex @@ -144,6 +144,120 @@ quadruple n = mul (mul n) function boundaries), but eventually, the partial application will become completely applied. + \section{Costless specialization} + Each (complete) function application in our description generates a + component instantiation, or a specific piece of hardware in the final + design. It is interesting to note that each application of a function + generates a \emph{separate} piece of hardware. In the final design, none + of the hardware is shared between applications, even when the applied + function is the same. + + This is distinctly different from normal program compilation: Two separate + calls to the same function share the same machine code. Having more + machine code has implications for speed (due to less efficient caching) + and memory usage. For normal compilation, it is therefore important to + keep the amount of functions limited and maximize the code sharing. + + When generating hardware, this is hardly an issue. Having more \quote{code + sharing} does reduce the amount of \small{VHDL} output (Since different + component instantiations still share the same component), but after + synthesis, the amount of hardware generated is not affected. + + In particular, if we would duplicate all functions so that there is a + duplicate for every application in the program (\eg, each function is then + only applied exactly once), there would be no increase in hardware size + whatsoever. + + TODO: Perhaps these next two sections are a bit too + implementation-oriented? + + \subsection{Specialization} + Because of this, a common optimization technique called + \emph{specialization} is as good as free for hardware generation. Given + some function that has a \emph{domain} of $D$ (\eg, the set of all + possible arguments that could be applied), we create a specialized + function with exactly the same behaviour, but with an domain of $D' + \subset D$. This subset can be derived in all sort of ways, but commonly + this is done by limiting a polymorphic argument to a single type (\eg, + removing polymorphism) or by limiting an argument to just a single value + (\eg, cross-function constant propagation, effectively removing the + argument). + + Since we limit the argument domain of the specialized function, its + definition can often be optimized further (since now more types or even + values of arguments are already know). By replacing any application of + the function that falls within the reduced domain by an application of + the specialized version, the code gets faster (but the code also gets + bigger, since we now have two versions instead of one!). If we apply + this technique often enough, we can often replace all applications of a + function by specialized versions, allowing the original function to be + removed (in some cases, this can even give a net reduction of the code + compared to the non-specialized version). + + Specialization is useful for our hardware descriptions for functions + that contain arguments that cannot be translated to hardware directly + (polymorphic or higher order arguments, for example). If we can create + specialized functions that remove the argument, or make it translatable, + we can use specialization to make the original, untranslatable, function + obsolete. + + \section{Higher order values} + What holds for partial application, can be easily generalized to any + higher order expression. This includes partial applications, plain + variables (e.g., a binder referring to a top level function), lambda + expressions and more complex expressions with a function type (a case + expression returning lambda's, for example). + + Each of these values cannot be directly represented in hardware (just like + partial applications). Also, to make them representable, they need to be + applied: function variables and partial applications will then eventually + become complete applications, applied lambda expressions disappear by + applying β-reduction, etc. + + So any higher order value will be \quote{pushed down} towards its + application just like partial applications. Whenever a function boundary + needs to be crossed, the called function can be specialized. + + TODO: This is section should be improved + + \section{Polymorphism} + In Haskell, values can be polymorphic: They can have multiple types. For + example, the function \hs{fst :: (a, b) -> a} is an example of a + polymorphic function: It works for tuples with any element types. Haskell + typeclasses allow a function to work on a specific set of types, but the + general idea is the same. + +% A type class is a collection of types for which some operations are +% defined. It is thus possible for a value to be polymorphic while having +% any number of \emph{class constraints}: The value is not defined for +% every type, but only for types in the type class. An example of this is +% the \hs{even :: (Integral a) => a -> Bool} function, which can map any +% value of a type that is member of the \hs{Integral} type class + + When generating hardware, polymorphism can't be easily translated. How + many wire will you lay down for a value that could have any type? When + type classes are involved, what hardware components will you lay down for + a class method (whose behaviour depends on the type of its arguments)? + + Fortunately, we can again use the principle of specialization: Since every + function application generates separate pieces of hardware, we can know + the types of all arguments exactly. Provided that we don't use existential + typing, all of the polymorphic types in a function must depend on the + types of the arguments (In other words, the only way to introduce a type + variable is in a lambda abstraction). Our top level function must not have + a polymorphic type (otherwise we wouldn't know the hardware interface to + our top level function). + + If a function is monomorphic, all values inside it are monomorphic as + well, so any function that is applied within the function can only be + applied to monomorphic values. The applied functions can then be + specialized to work just for these specific types, removing the + polymorphism from the applied functions as well. + + By induction, this means that all functions that are (indirectly) called + by our top level function (meaning all functions that are translated in + the final hardware) become monomorphic. + \section{State} A very important concept in hardware designs is \emph{state}. In a stateless (or, \emph{combinatoric}) design, every output is a directly and solely dependent on the @@ -391,261 +505,32 @@ acc in (State s) = (State s', out) TODO: Note about conditions on state variables and checking them. - \subsection{Explicit state implementation} - Recording state variables at the type level. - - Ideal: Type synonyms, since there is no additional code overhead for - packing and unpacking. Downside: there is no explicit conversion in Core - either, so type synonyms tend to get lost in expressions (they can be - preserved in binders, but this makes implementation harder, since that - statefulness of a value must be manually tracked). - - Less ideal: Newtype. Requires explicit packing and unpacking of function - arguments. If you don't unpack substates, there is no overhead for - (un)packing substates. This will result in many nested State constructors - in a nested state type. \eg: - - \starttyping - State (State Bit, State (State Word, Bit), Word) - \stoptyping - - Alternative: Provide different newtypes for input and output state. This - makes the code even more explicit, and typechecking can find even more - errors. However, this requires defining two type synomyms for each - stateful function instead of just one. \eg: - \starttyping - type AccumStateIn = StateIn Bit - type AccumStateOut = StateOut Bit - \stoptyping - This also increases the possibility of having different input and output - states. Checking for identical input and output state types is also - harder, since each element in the state must be unpacked and compared - separately. - - Alternative: Provide a type for the entire result type of a stateful - function, not just the state part. \eg: - - \starttyping - newtype Result state result = Result (state, result) - \stoptyping - - This makes it easy to say "Any stateful function must return a - \type{Result} type, without having to sort out result from state. However, - this either requires a second type for input state (similar to - \type{StateIn} / \type{StateOut} above), or requires the compiler to - select the right argument for input state by looking at types (which works - for complex states, but when that state has the same type as an argument, - things get ambiguous) or by selecting a fixed (\eg, the last) argument, - which might be limiting. - - \subsubsection{Example} - As an example of the used approach, a simple averaging circuit, that lets - the accumulation of the inputs be done by a subcomponent. - - \starttyping - newtype State s = State s - - type AccumState = State Bit - accum :: Word -> AccumState -> (AccumState, Word) - accum i (State s) = (State (s + i), s + i) - - type AvgState = (AccumState, Word) - avg :: Word -> AvgState -> (AvgState, Word) - avg i (State s) = (State s', o) - where - (accums, count) = s - -- Pass our input through the accumulator, which outputs a sum - (accums', sum) = accum i accums - -- Increment the count (which will be our new state) - count' = count + 1 - -- Compute the average - o = sum / count' - s' = (accums', count') - \stoptyping - - And the normalized, core-like versions: - - \starttyping - accum i spacked = res - where - s = case spacked of (State s) -> s - s' = s + i - spacked' = State s' - o = s + i - res = (spacked', o) - - avg i spacked = res - where - s = case spacked of (State s) -> s - accums = case s of (accums, \_) -> accums - count = case s of (\_, count) -> count - accumres = accum i accums - accums' = case accumres of (accums', \_) -> accums' - sum = case accumres of (\_, sum) -> sum - count' = count + 1 - o = sum / count' - s' = (accums', count') - spacked' = State s' - res = (spacked', o) - \stoptyping - - - - As noted above, any component of a function's state that is a substate, - \eg passed on as the state of another function, should have no influence - on the hardware generated for the calling function. Any state-specific - \small{VHDL} for this component can be generated entirely within the called - function. So,we can completely leave out substates from any function. - - From this observation, we might think to remove the substates from a - function's states alltogether, and leave only the state components which - are actual states of the current function. While doing this would not - remove any information needed to generate \small{VHDL} from the function, it would - cause the function definition to become invalid (since we won't have any - substate to pass to the functions anymore). We could solve the syntactic - problems by passing \type{undefined} for state variables, but that would - still break the code on the semantic level (\ie, the function would no - longer be semantically equivalent to the original input). - - To keep the function definition correct until the very end of the process, - we will not deal with (sub)states until we get to the \small{VHDL} generation. - Here, we are translating from Core to \small{VHDL}, and we can simply not generate - \small{VHDL} for substates, effectively removing the substate components - alltogether. - - There are a few important points when ignore substates. - - First, we have to have some definition of "substate". Since any state - argument or return value that represents state must be of the \type{State} - type, we can simply look at its type. However, we must be careful to - ignore only {\em substates}, and not a function's own state. - - In the example above, this means we should remove \type{accums'} from - \type{s'}, but not throw away \type{s'} entirely. We should, however, - remove \type{s'} from the output port of the function, since the state - will be handled by a \small{VHDL} procedure within the function. - - When looking at substates, these can appear in two places: As part of an - argument and as part of a return value. As noted above, these substates - can only be used in very specific ways. - - \desc{State variables can appear as an argument.} When generating \small{VHDL}, we - completely ignore the argument and generate no input port for it. - - \desc{State variables can be extracted from other state variables.} When - extracting a state variable from another state variable, this always means - we're extracting a substate, which we can ignore. So, we simply generate no - \small{VHDL} for any extraction operation that has a state variable as a result. - - \desc{State variables can be passed to functions.} When passing a - state variable to a function, this always means we're passing a substate - to a subcomponent. The entire argument can simply be ingored in the - resulting port map. - - \desc{State variables can be returned from functions.} When returning a - state variable from a function (probably as a part of an algebraic - datatype), this always mean we're returning a substate from a - subcomponent. The entire state variable should be ignored in the resulting - port map. The type binder of the binder that the function call is bound - to should not include the state type either. - - \startdesc{State variables can be inserted into other variables.} When inserting - a state variable into another variable (usually by constructing that new - variable using its constructor), we can identify two cases: - - \startitemize - \item The state is inserted into another state variable. In this case, - the inserted state is a substate, and can be safely left out of the - constructed variable. - \item The state is inserted into a non-state variable. This happens when - building up the return value of a function, where you put state and - retsult variables together in an algebraic type (usually a tuple). In - this case, we should leave the state variable out as well, since we - don't want it to be included as an output port. - \stopitemize - - So, in both cases, we can simply leave out the state variable from the - resulting value. In the latter case, however, we should generate a state - proc instead, which assigns the state variable to the input state variable - at each clock tick. - \stopdesc - - \desc{State variables can appear as (part of) a function result.} When - generating \small{VHDL}, we can completely ignore any part of a function result - that has a state type. If the entire result is a state type, this will - mean the entity will not have an output port. Otherwise, the state - elements will be removed from the type of the output port. - - - Now, we know how to handle each use of a state variable separately. If we - look at the whole, we can conclude the following: - - \startitemize - \item A state unpack operation should not generate any \small{VHDL}. The binder - to which the unpacked state is bound should still be declared, this signal - will become the register and will hold the current state. - \item A state pack operation should not generate any \small{VHDL}. The binder th - which the packed state is bound should not be declared. The binder that is - packed is the signal that will hold the new state. - \item Any values of a State type should not be translated to \small{VHDL}. In - particular, State elements should be removed from tuples (and other - datatypes) and arguments with a state type should not generate ports. - \item To make the state actually work, a simple \small{VHDL} proc should be - generated. This proc updates the state at every clockcycle, by assigning - the new state to the current state. This will be recognized by synthesis - tools as a register specification. - \stopitemize - - - When applying these rules to the example program (in normal form), we will - get the following result. All the parts that don't generate any value are - crossed out, leaving some very boring assignments here and there. - + \subsection{Explicit state annotation} + To make our stateful descriptions unambigious and easier to translate, + we need some way for the developer to describe which arguments and + results are intended to become stateful. - \starthaskell - avg i --spacked-- = res - where - s = --case spacked of (State s) -> s-- - --accums = case s of (accums, \_) -> accums-- - count = case s of (--\_,-- count) -> count - accumres = accum i --accums-- - --accums' = case accumres of (accums', \_) -> accums'-- - sum = case accumres of (--\_,-- sum) -> sum - count' = count + 1 - o = sum / count' - s' = (--accums',-- count') - --spacked' = State s'-- - res = (--spacked',-- o) - \stophaskell - - When we would really leave out the crossed out parts, we get a slightly - weird program: There is a variable \type{s} which has no value, and there - is a variable \type{s'} that is never used. Together, these two will form - the state proc of the function. \type{s} contains the "current" state, - \type{s'} is assigned the "next" state. So, at the end of each clock - cycle, \type{s'} should be assigned to \type{s}. - - Note that the definition of \type{s'} is not removed, even though one - might think it as having a state type. Since the state type has a single - argument constructor \type{State}, some type that should be the resulting - state should always be explicitly packed with the State constructor, - allowing us to remove the packed version, but still generate \small{VHDL} for the - unpacked version (of course with any substates removed). - - As you can see, the definition of \type{s'} is still present, since it - does not have a state type (The State constructor. The \type{accums'} substate has been removed, - leaving us just with the state of \type{avg} itself. - \subsection{Initial state} - How to specify the initial state? Cannot be done inside a hardware - function, since the initial state is its own state argument for the first - call (unless you add an explicit, synchronous reset port). - - External init state is natural for simulation. + Roughly, we have two ways to achieve this: + \startitemize[KR] + \item Use some kind of annotation method or syntactic construction in + the language to indicate exactly which argument and (part of the) + result is stateful. This means that the annotation lives + \quote{outside} of the function, it is completely invisible when + looking at the function body. + \item Use some kind of annotation on the type level, \eg give stateful + arguments and (part of) results a different type. This has the + potential to make this annotation visible inside the function as well, + such that when looking at a value inside the function body you can + tell if it's stateful by looking at its type. This could possibly make + the translation process a lot easier, since less analysis of the + program flow might be required. + \stopitemize - External init state works for hardware generation as well. + From these approaches, the type level \quote{annotations} have been + implemented in Cλash. \in{Section}[sec:prototype:statetype] expands on + the possible ways this could have been implemented. - Implementation issues: state splitting, linking input to output state, - checking usage constraints on state variables. + TODO: Say something about dependent types and fixed size vectors \section[sec:recursion]{Recursion} An import concept in functional languages is recursion. In it's most basic @@ -748,3 +633,6 @@ acc in (State s) = (State s', out) Due to these complications, we leave other forms of recursion as future work as well. + + \section{Supported types} + TODO