X-Git-Url: https://git.stderr.nl/gitweb?p=matthijs%2Fmaster-project%2Freport.git;a=blobdiff_plain;f=Chapters%2FHardwareDescription.tex;h=9e4061c9a4d059ee7fd8182c9d1e04565ecf94df;hp=0266490bf21c51bac670b20317846507883cccd0;hb=7095d53c2ec805554837714da3df3a458ebfb2bb;hpb=ca520674308c1949f94efa4b45e3b90076fcf05b diff --git a/Chapters/HardwareDescription.tex b/Chapters/HardwareDescription.tex index 0266490..9e4061c 100644 --- a/Chapters/HardwareDescription.tex +++ b/Chapters/HardwareDescription.tex @@ -27,17 +27,329 @@ to the corresponding input port. The output port of the function is also mapped to a signal, which is used as the result of the application. - An example of a simple program using only function application would be: + \in{Example}[ex:And3] shows a simple program using only function + application and the corresponding architecture. + +\startbuffer[And3] +-- | A simple function that returns +-- the and of three bits +and3 :: Bit -> Bit -> Bit -> Bit +and3 a b c = and (and a b) c +\stopbuffer + + \startuseMPgraphic{And3} + save a, b, c, anda, andb, out; + + % I/O ports + newCircle.a(btex $a$ etex) "framed(false)"; + newCircle.b(btex $b$ etex) "framed(false)"; + newCircle.c(btex $c$ etex) "framed(false)"; + newCircle.out(btex $out$ etex) "framed(false)"; + + % Components + newCircle.anda(btex $and$ etex); + newCircle.andb(btex $and$ etex); + + a.c = origin; + b.c = a.c + (0cm, 1cm); + c.c = b.c + (0cm, 1cm); + anda.c = midpoint(a.c, b.c) + (2cm, 0cm); + andb.c = midpoint(b.c, c.c) + (4cm, 0cm); + + out.c = andb.c + (2cm, 0cm); + + % Draw objects and lines + drawObj(a, b, c, anda, andb, out); + + ncarc(a)(anda) "arcangle(-10)"; + ncarc(b)(anda); + ncarc(anda)(andb); + ncarc(c)(andb); + ncline(andb)(out); + \stopuseMPgraphic + + \placeexample[here][ex:And3]{Simple three port and.} + \startcombination[2*1] + {\typebufferhs{And3}}{Haskell description using function applications.} + {\boxedgraphic{And3}}{The architecture described by the Haskell description.} + \stopcombination + + TODO: Define top level function and subfunctions/circuits. + + \section{Choice} + Since describing components and connections allows us to describe a lot of + hardware designs already, there is an obvious thing missing: Choice. We + need some way to be able to choose between values based on another value. + In Haskell, choice is achieved by case expressions, if expressions or + pattern matching (all of which can be reduced to case expressions). + + An obvious way to add choice to our language without having to recognize + any of Haskell's syntax, would be to add a primivite \quote{\hs{if}} + function. This function would take three arguments: The condition, the + value to return when the condition is true and the value to return when + the condition is false. + + This \hs{if} function would then essentially describe a multiplexer and + allows us to describe any architecture that uses multiplexers. (TODO: Are + there other mechanisms of choice in hardware?) + + However, to be able to describe our hardware in a more convenient way, we + also need to translate Haskell's choice mechanisms. The easiest of these + are of course case expressions (and if expressions, which can be very + directly translated to case expressions). A case expression can in turn + simply be translated to a conditional assignment, where the conditions use + equality comparisons against the constructors in the case expressions. + + In \in{example}[ex:CaseInv] a simple case statement is shown, + scrutinizing a boolean value. If we would translate a Boolean to a bit + value, we could of course remove the comparator and directly feed 'in' + into the multiplex (or even use an inverter instead of a multiplexer). + However, we will try to make a general translation, which works for all + possible case statements. Optimizations such as these are left for the + \VHDL synthesizer, which handles them very well. + + \startbuffer[CaseInv] + inv :: Bool -> Bool + inv in = case in of + True -> False + False -> True + \stopbuffer + + \startuseMPgraphic{CaseInv} + save in, truecmp, falseout, trueout, out, cmp, mux; + + % I/O ports + newCircle.in(btex $in$ etex) "framed(false)"; + newCircle.out(btex $out$ etex) "framed(false)"; + % Constants + newBox.truecmp(btex $True$ etex) "framed(false)"; + newBox.trueout(btex $True$ etex) "framed(false)"; + newBox.falseout(btex $False$ etex) "framed(false)"; + + % Components + newCircle.cmp(btex $==$ etex); + newMux.mux; + + in.c = origin; + cmp.c = in.c + (3cm, 0cm); + truecmp.c = cmp.c + (-1cm, 1cm); + mux.sel = cmp.e + (1cm, -1cm); + falseout.c = mux.inpa - (2cm, 0cm); + trueout.c = mux.inpb - (2cm, 0cm); + out.c = mux.out + (2cm, 0cm); + + % Draw objects and lines + drawObj(in, out, truecmp, trueout, falseout, cmp, mux); + + ncline(in)(cmp); + ncarc(truecmp)(cmp); + nccurve(cmp.e)(mux.sel) "angleA(0)", "angleB(-90)"; + ncline(falseout)(mux) "posB(inpa)"; + ncline(trueout)(mux) "posB(inpb)"; + ncline(mux)(out) "posA(out)"; + \stopuseMPgraphic + + \placeexample[here][ex:CaseInv]{Simple inverter.} + \startcombination[2*1] + {\typebufferhs{CaseInv}}{Haskell description using a Case expression.} + {\boxedgraphic{CaseInv}}{The architecture described by the Haskell description.} + \stopcombination + + Slightly more complex (but very powerful) forms of choice are pattern + matches. A function can be defined in multiple clauses, where each clause + specifies a pattern. When the arguments match the pattern, the + corresponding clause will be used. + + \startbuffer[PatternInv] + inv :: Bool -> Bool + inv True = False + inv False = True + \stopbuffer + + \placeexample[here][ex:PatternInv]{Simple inverter using pattern matching.} + {\typebufferhs{CaseInv}} + + The architecture described in \in{example}[ex:PatternInv] is of course the + same one as the one in \in{example}[ex:CaseInv]. The general interpration + of pattern matching is also similar to that of case expressions: Generate + hardware for each of the clause (like each of the clauses of a case + expression) and connect them to the function output through (a number of + nested) multiplexers. These multiplexers are driven by comparators and + other logic, that check each pattern in turn. + + \section{Types} + Apart from giving structure to our hardware, we'll also need to provide + some way to translate the values used to hardware equivalents. In + particular, this means having to come up with a hardware equivalent for + every \emph{type} used in our program. + + Since most functional languages have a lot of standard types that are + hard to translate (integers without a fixed size, lists without a static + length, etc.), we will start out by defining a number of \quote{builtin} + types ourselves. These types are builtin in the sense that our compiler + will have a fixed VHDL type for these. User defined types, on the other + hand, will have their hardware type derived directly from their Haskell + declaration automatically, according to the rules we sketch here. + + \subsection{Builtin types} + \startdesc{Bit} + This is the most basic type available. It is mapped directly onto + the \type{std_logic} \small{VHDL} type. Mapping this to the + \type{bit} type might make more sense (since the Haskell version + only has two values), but using \type{std_logic} is more standard + (and allowed for some experimentation with don't care values) + \stopdesc + \startdesc{Bool} + This is a builtin Haskell type, but is translated exactly like the + Bit type. Supporting the Bool type is particularly useful to support + \hs{if ... then ... else ...} expressions, which always have a + \hs{Bool} value for the condition. - \starthaskell - -- | A simple function that returns the and of three bits - and3 :: Bit -> Bit -> Bit -> Bit - and3 a b c = and (and a b) c - \stophaskell + A \hs{Bool} is translated to a \type{std_logic}, just like \hs{Bit}. + \stopdesc + \startdesc{SizedWord, SizedInt} + These are types to represent integers. \hs{SizedWord} is unsigned, + while \hs{SizedInt} is signed. These types are parameterized by a + length type, so you can define an unsigned word of 32 bits wide as + follows: + + \starthaskell + type Word32 = SizedWord D32 + \stophaskell + + Here, \hs{D32} is the \emph{type level representation} of the number + 32. TODO: Introduce dependent types somewhere. + + These types are translated to the \small{VHDL} \type{unsigned} and + \type{signed} respectively. + \stopdesc + \startdesc{RangedWord} + This is another type to describe integers, but unlike the previous + two it has no specific width, but an upper bound. This means that + its range is not limited to powers of two, but can be any number. + A \hs{RangedWord} only has an upper bound, its lower bound is + implicitly zero. There is a lot of added implementation complexity + when adding a lower bound and having just an upper bound was enough + for the primary purpose of this type: Typesafely indexing vectors. + + This type is translated to the \type{unsigned} \small{VHDL} type. + \stopdesc + \startdesc{TFVec} + This is a vector type, with a fixed length. It has two type + parameters: Its length and the type of the elements contained in it. + + A fixed size vector is translated to a \small{VHDL} array type. + + The \hs{TF} in its name is a reference to the implementation used in + the prototype, which uses \emph{type families}. + \stopdesc - This results in the following hardware: - - TODO: Pretty picture + TODO: Reference Christiaan / describe dependent typing + \subsection{User-defined types} + There are three ways to define new types in Haskell: Algebraic + datatypes with the \hs{data} keyword, type synonyms with the \hs{type} + keyword and type renamings with the \hs{newtype} keyword. This + expclitely excludes more advanced type creation from \GHC extensions + such as type families, existential typing, \small{GADT}s, etc. + + The first of these actually introduces a new type, for which we provide + the \VHDL translation below. The latter two only define new names for + existing types (where synonyms are completely interchangeable and + renamings need explicit conversion). Therefore, these don't need any + particular \VHDL translation, a synonym or renamed type will just use + the same representation as the equivalent type. + + For algebraic types, we can make the following distinction: + + \startdesc{Product types} + A product type is an algebraic datatype with a single constructor with + two or more fields. This is essentially a way to pack a few values + together in a record-like structure. In fact, the builtin tuple types + are just product types (and are thus supported in exactly the same + way). + + The "product" in its name refers to the collection of values belonging + to this type. The collection for a product type is the cartesic + product of the collections for the types of its fields. + + These types are translated to \VHDL record types, with one field for + every field in the constructor. This translation applies to all single + constructor algebraic datatypes, including those with no fields (unit + types) and just one field (which are technically not a product). + \stopdesc + \startdesc{Enumerated types} + An enumerated type is an algebraic datatype with multiple constructors, but + none of them have fields. This is essentially a way to get an + enum-like type containing alternatives. + + Note that Haskell's \hs{Bool} type is also defined as an + enumeration type, but we have a fixed translation for that. + + These types are translated to \VHDL enumerations, with one value for + each constructor. This allows references to these constructors to be + translated to the corresponding enumeration value. + \stopdesc + \startdesc{Sum types} + A sum type is an algebraic datatype with multiple constructors, where + the constructors have one or more fields. Technically, a type with + more than one field per constructor is a sum of products type, but + for our purposes this distinction does not really make a difference, + so we'll leave it out. + + Sum types are currently not supported by the prototype, since its + translation is not so obvious as before. The can easily be emulated, + as we will see from an example: + + \starthaskell + data Sum = A Bit Word | B Word + \stophaskell + + An obvious way to translate this would be to create an enumeration to + distinguish the constructors and then create a big record that + contains all the fields of all the constructors. This is the same + translation that would result from the following enumeration and + product type (using a tuple for clarity): + + \starthaskell + data SumC = A | B + type Sum = (SumC, Bit, Word, Word) + \stophaskell + + Here, the \hs{SumC} type effectively signals which of the fields of + the \hs{Sum} type are valid, all the other ones have no useful value. + + An obvious problem with this naive approach is the space usage: The + example above generates a fairly big \VHDL type. However, we can be + sure that the two \hs{Word}s in the \hs{Sum} type will never be valid + at the same time, so this is a waste of space. + + Obviously, we could do some duplication detection here to reuse a + particular field for another constructor, but this would only + partially solve the problem. If I would, for example, have an array of + 8 bits and a 8 bit unsiged word, these are different types and could + not be shared. However, in the final hardware, both of these types + would simply be 8 bit connections, so we have a 100\% size increase by + not sharing these! + \stopdesc + + Another interesting case is that of recursive types. In Haskell, an + algebraic datatype can be recursive: Any of its field types can be (or + contain) the type being defined. The most well-known recursive type is + probably the list type, which is defined is: + + \starthaskell + data List a = Empty | Cons a (List a) + \stophaskell + + Where \hs{Empty} is usually written as \hs{[]} and \hs{Cons} as \hs{:}. + This immediately shows the problem with recursive types: What hardware + type to allocate here? There is no way to statically determine how long + the list will be + + In general, we can say we can never properly translated recursive types: + All recursive types have a potentially infinite value (even though in + practice they will have a bounded value, there is no way for the + compiler to determine an upper bound on its size. \subsection{Partial application} It should be obvious that we cannot generate hardware signals for all @@ -47,19 +359,50 @@ represented as a signal or i/o port to a component. From this, we can see that the above translation rules do not apply to a - partial application. Let's look at an example: - - \starthaskell - -- | Multiply the input word by four. - quadruple :: Word -> Word - quadruple n = mul (mul n) - where - mul = (*) 2 - \stophaskell - - It should be clear that the above code describes the following hardware: - - TODO: Pretty picture + partial application. \in{Example}[ex:Quadruple] shows an example use of + partial application and the corresponding architecture. + +\startbuffer[Quadruple] +-- | Multiply the input word by four. +quadruple :: Word -> Word +quadruple n = mul (mul n) + where + mul = (*) 2 +\stopbuffer + + \startuseMPgraphic{Quadruple} + save in, two, mula, mulb, out; + + % I/O ports + newCircle.in(btex $n$ etex) "framed(false)"; + newCircle.two(btex $2$ etex) "framed(false)"; + newCircle.out(btex $out$ etex) "framed(false)"; + + % Components + newCircle.mula(btex $\times$ etex); + newCircle.mulb(btex $\times$ etex); + + two.c = origin; + in.c = two.c + (0cm, 1cm); + mula.c = in.c + (2cm, 0cm); + mulb.c = mula.c + (2cm, 0cm); + out.c = mulb.c + (2cm, 0cm); + + % Draw objects and lines + drawObj(in, two, mula, mulb, out); + + nccurve(two)(mula) "angleA(0)", "angleB(45)"; + nccurve(two)(mulb) "angleA(0)", "angleB(45)"; + ncline(in)(mula); + ncline(mula)(mulb); + ncline(mulb)(out); + \stopuseMPgraphic + + \placeexample[here][ex:Quadruple]{Simple three port and.} + \startcombination[2*1] + {\typebufferhs{Quadruple}}{Haskell description using function applications.} + {\boxedgraphic{Quadruple}}{The architecture described by the Haskell description.} + \stopcombination Here, the definition of mul is a partial function application: It applies \hs{2 :: Word} to the function \hs{(*) :: Word -> Word -> Word} resulting in @@ -76,284 +419,395 @@ function boundaries), but eventually, the partial application will become completely applied. - \section{State} - \subsection{Introduction} - Provide some examples + \section{Costless specialization} + Each (complete) function application in our description generates a + component instantiation, or a specific piece of hardware in the final + design. It is interesting to note that each application of a function + generates a \emph{separate} piece of hardware. In the final design, none + of the hardware is shared between applications, even when the applied + function is the same. + + This is distinctly different from normal program compilation: Two separate + calls to the same function share the same machine code. Having more + machine code has implications for speed (due to less efficient caching) + and memory usage. For normal compilation, it is therefore important to + keep the amount of functions limited and maximize the code sharing. + + When generating hardware, this is hardly an issue. Having more \quote{code + sharing} does reduce the amount of \small{VHDL} output (Since different + component instantiations still share the same component), but after + synthesis, the amount of hardware generated is not affected. + + In particular, if we would duplicate all functions so that there is a + duplicate for every application in the program (\eg, each function is then + only applied exactly once), there would be no increase in hardware size + whatsoever. + + TODO: Perhaps these next two sections are a bit too + implementation-oriented? + + \subsection{Specialization} + Because of this, a common optimization technique called + \emph{specialization} is as good as free for hardware generation. Given + some function that has a \emph{domain} of $D$ (\eg, the set of all + possible arguments that could be applied), we create a specialized + function with exactly the same behaviour, but with an domain of $D' + \subset D$. This subset can be derived in all sort of ways, but commonly + this is done by limiting a polymorphic argument to a single type (\eg, + removing polymorphism) or by limiting an argument to just a single value + (\eg, cross-function constant propagation, effectively removing the + argument). + + Since we limit the argument domain of the specialized function, its + definition can often be optimized further (since now more types or even + values of arguments are already know). By replacing any application of + the function that falls within the reduced domain by an application of + the specialized version, the code gets faster (but the code also gets + bigger, since we now have two versions instead of one!). If we apply + this technique often enough, we can often replace all applications of a + function by specialized versions, allowing the original function to be + removed (in some cases, this can even give a net reduction of the code + compared to the non-specialized version). + + Specialization is useful for our hardware descriptions for functions + that contain arguments that cannot be translated to hardware directly + (polymorphic or higher order arguments, for example). If we can create + specialized functions that remove the argument, or make it translatable, + we can use specialization to make the original, untranslatable, function + obsolete. + + \section{Higher order values} + What holds for partial application, can be easily generalized to any + higher order expression. This includes partial applications, plain + variables (e.g., a binder referring to a top level function), lambda + expressions and more complex expressions with a function type (a case + expression returning lambda's, for example). + + Each of these values cannot be directly represented in hardware (just like + partial applications). Also, to make them representable, they need to be + applied: function variables and partial applications will then eventually + become complete applications, applied lambda expressions disappear by + applying β-reduction, etc. + + So any higher order value will be \quote{pushed down} towards its + application just like partial applications. Whenever a function boundary + needs to be crossed, the called function can be specialized. + + TODO: This is section should be improved + + \section{Polymorphism} + In Haskell, values can be polymorphic: They can have multiple types. For + example, the function \hs{fst :: (a, b) -> a} is an example of a + polymorphic function: It works for tuples with any element types. Haskell + typeclasses allow a function to work on a specific set of types, but the + general idea is the same. + +% A type class is a collection of types for which some operations are +% defined. It is thus possible for a value to be polymorphic while having +% any number of \emph{class constraints}: The value is not defined for +% every type, but only for types in the type class. An example of this is +% the \hs{even :: (Integral a) => a -> Bool} function, which can map any +% value of a type that is member of the \hs{Integral} type class + + When generating hardware, polymorphism can't be easily translated. How + many wire will you lay down for a value that could have any type? When + type classes are involved, what hardware components will you lay down for + a class method (whose behaviour depends on the type of its arguments)? + + Fortunately, we can again use the principle of specialization: Since every + function application generates separate pieces of hardware, we can know + the types of all arguments exactly. Provided that we don't use existential + typing, all of the polymorphic types in a function must depend on the + types of the arguments (In other words, the only way to introduce a type + variable is in a lambda abstraction). Our top level function must not have + a polymorphic type (otherwise we wouldn't know the hardware interface to + our top level function). + + If a function is monomorphic, all values inside it are monomorphic as + well, so any function that is applied within the function can only be + applied to monomorphic values. The applied functions can then be + specialized to work just for these specific types, removing the + polymorphism from the applied functions as well. + + By induction, this means that all functions that are (indirectly) called + by our top level function (meaning all functions that are translated in + the final hardware) become monomorphic. + \section{State} + A very important concept in hardware designs is \emph{state}. In a + stateless (or, \emph{combinatoric}) design, every output is a directly and solely dependent on the + inputs. In a stateful design, the outputs can depend on the history of + inputs, or the \emph{state}. State is usually stored in \emph{registers}, + which retain their value during a clockcycle, and are typically updated at + the start of every clockcycle. Since the updating of the state is tightly + coupled (synchronized) to the clock signal, these state updates are often + called \emph{synchronous}. + + To make our hardware description language useful to describe more that + simple combinatoric designs, we'll need to be able to describe state in + some way. \subsection{Approaches to state} - Explain impact of state (or rather, temporal behaviour) on function signature. + In Haskell, functions are always pure (except when using unsafe + functions like \hs{unsafePerformIO}, which should be prevented whenever + possible). This means that the output of a function solely depends on + its inputs. If you evaluate a given function with given inputs, it will + always provide the same output. + + TODO: Define pure + + This is a perfect match for a combinatoric circuit, where the output + also soley depend on the inputs. However, when state is involved, this + no longer holds. Since we're in charge of our own language, we could + remove this purity constraint and allow a function to return different + values depending on the cycle in which it is evaluated (or rather, the + current state). However, this means that all kinds of interesting + properties of our functional language get lost, and all kinds of + transformations and optimizations might no longer be meaning preserving. + + Provided that we want to keep the function pure, the current state has + to be present in the function's arguments in some way. There seem to be + two obvious ways to do this: Adding the current state as an argument, or + including the full history of each argument. + \subsubsection{Stream arguments and results} + Including the entire history of each input (\eg, the value of that + input for each previous clockcycle) is an obvious way to make outputs + depend on all previous input. This is easily done by making every + input a list instead of a single value, containing all previous values + as well as the current value. + + An obvious downside of this solution is that on each cycle, all the + previous cycles must be resimulated to obtain the current state. To do + this, it might be needed to have a recursive helper function as well, + wich might be hard to properly analyze by the compiler. + + A slight variation on this approach is one taken by some of the other + functional \small{HDL}s in the field (TODO: References to Lava, + ForSyDe, ...): Make functions operate on complete streams. This means + that a function is no longer called on every cycle, but just once. It + takes stream as inputs instead of values, where each stream contains + all the values for every clockcycle since system start. This is easily + modeled using an (infinite) list, with one element for each clock + cycle. Since the funciton is only evaluated once, its output is also a + stream. Note that, since we are working with infinite lists and still + want to be able to simulate the system cycle-by-cycle, this relies + heavily on the lazy semantics of Haskell. + + Since our inputs and outputs are streams, all other (intermediate) + values must be streams. All of our primitive operators (\eg, addition, + substraction, bitwise operations, etc.) must operate on streams as + well (note that changing a single-element operation to a stream + operation can done with \hs{map}, \hs{zipwith}, etc.). + + Note that the concept of \emph{state} is no more than having some way + to communicate a value from one cycle to the next. By introducing a + \hs{delay} function, we can do exactly that: Delay (each value in) a + stream so that we can "look into" the past. This \hs{delay} function + simply outputs a stream where each value is the same as the input + value, but shifted one cycle. This causes a \quote{gap} at the + beginning of the stream: What is the value of the delay output in the + first cycle? For this, the \hs{delay} function has a second input + (which is a value, not a stream!). + + \in{Example}[ex:DelayAcc] shows a simple accumulator expressed in this + style. + +\startbuffer[DelayAcc] +acc :: Stream Word -> Stream Word +acc in = out + where + out = (delay out 0) + in +\stopbuffer + +\startuseMPgraphic{DelayAcc} + save in, out, add, reg; + + % I/O ports + newCircle.in(btex $in$ etex) "framed(false)"; + newCircle.out(btex $out$ etex) "framed(false)"; + + % Components + newReg.reg("") "dx(4mm)", "dy(6mm)", "reflect(true)"; + newCircle.add(btex + etex); + + in.c = origin; + add.c = in.c + (2cm, 0cm); + out.c = add.c + (2cm, 0cm); + reg.c = add.c + (0cm, 2cm); + + % Draw objects and lines + drawObj(in, out, add, reg); + + nccurve(add)(reg) "angleA(0)", "angleB(180)", "posB(d)"; + nccurve(reg)(add) "angleA(180)", "angleB(-45)", "posA(out)"; + ncline(in)(add); + ncline(add)(out); +\stopuseMPgraphic + + + \placeexample[here][ex:DelayAcc]{Simple accumulator architecture.} + \startcombination[2*1] + {\typebufferhs{DelayAcc}}{Haskell description using streams.} + {\boxedgraphic{DelayAcc}}{The architecture described by the Haskell description.} + \stopcombination + + + This notation can be confusing (especially due to the loop in the + definition of out), but is essentially easy to interpret. There is a + single call to delay, resulting in a circuit with a single register, + whose input is connected to \hs{outl (which is the output of the + adder)}, and it's output is the \hs{delay out 0} (which is connected + to one of the adder inputs). + + This notation has a number of downsides, amongst which are limited + readability and ambiguity in the interpretation. TODO: Reference + Christiaan. + \subsubsection{Explicit state arguments and results} - Nested state for called functions. + A more explicit way to model state, is to simply add an extra argument + containing the current state value. This allows an output to depend on + both the inputs as well as the current state while keeping the + function pure (letting the result depend only on the arguments), since + the current state is now an argument. + + In Haskell, this would look like \in{example}[ex:ExplicitAcc]. + +\startbuffer[ExplicitAcc] +-- input -> current state -> (new state, output) +acc :: Word -> Word -> (Word, Word) +acc in (State s) = (State s', out) + where + out = s + in + s' = out +\stopbuffer + + \placeexample[here][ex:ExplicitAcc]{Simple accumulator architecture.} + \startcombination[2*1] + {\typebufferhs{ExplicitAcc}}{Haskell description using explicit state arguments.} + % Picture is identical to the one we had just now. + {\boxedgraphic{DelayAcc}}{The architecture described by the Haskell description.} + \stopcombination + + This approach makes a function's state very explicit, which state + variables are used by a function can be completely determined from its + type signature (as opposed to the stream approach, where a function + looks the same from the outside, regardless of what state variables it + uses (or wether it's stateful at all). + + This approach is the one chosen for Cλash and will be examined more + closely below. \subsection{Explicit state specification} - Note about semantic correctness of top level state. - - Note about automatic ``down-pushing'' of state. - - Note about explicit state specification as the best solution. - - Note about substates - - Note about conditions on state variables and checking them. - - \subsection{Explicit state implementation} - Recording state variables at the type level. - - Ideal: Type synonyms, since there is no additional code overhead for - packing and unpacking. Downside: there is no explicit conversion in Core - either, so type synonyms tend to get lost in expressions (they can be - preserved in binders, but this makes implementation harder, since that - statefulness of a value must be manually tracked). - - Less ideal: Newtype. Requires explicit packing and unpacking of function - arguments. If you don't unpack substates, there is no overhead for - (un)packing substates. This will result in many nested State constructors - in a nested state type. \eg: - - \starttyping - State (State Bit, State (State Word, Bit), Word) - \stoptyping - - Alternative: Provide different newtypes for input and output state. This - makes the code even more explicit, and typechecking can find even more - errors. However, this requires defining two type synomyms for each - stateful function instead of just one. \eg: - \starttyping - type AccumStateIn = StateIn Bit - type AccumStateOut = StateOut Bit - \stoptyping - This also increases the possibility of having different input and output - states. Checking for identical input and output state types is also - harder, since each element in the state must be unpacked and compared - separately. - - Alternative: Provide a type for the entire result type of a stateful - function, not just the state part. \eg: - - \starttyping - newtype Result state result = Result (state, result) - \stoptyping - - This makes it easy to say "Any stateful function must return a - \type{Result} type, without having to sort out result from state. However, - this either requires a second type for input state (similar to - \type{StateIn} / \type{StateOut} above), or requires the compiler to - select the right argument for input state by looking at types (which works - for complex states, but when that state has the same type as an argument, - things get ambiguous) or by selecting a fixed (\eg, the last) argument, - which might be limiting. - - \subsubsection{Example} - As an example of the used approach, a simple averaging circuit, that lets - the accumulation of the inputs be done by a subcomponent. - - \starttyping - newtype State s = State s - - type AccumState = State Bit - accum :: Word -> AccumState -> (AccumState, Word) - accum i (State s) = (State (s + i), s + i) - - type AvgState = (AccumState, Word) - avg :: Word -> AvgState -> (AvgState, Word) - avg i (State s) = (State s', o) - where - (accums, count) = s - -- Pass our input through the accumulator, which outputs a sum - (accums', sum) = accum i accums - -- Increment the count (which will be our new state) - count' = count + 1 - -- Compute the average - o = sum / count' - s' = (accums', count') - \stoptyping - - And the normalized, core-like versions: - - \starttyping - accum i spacked = res - where - s = case spacked of (State s) -> s - s' = s + i - spacked' = State s' - o = s + i - res = (spacked', o) - - avg i spacked = res - where - s = case spacked of (State s) -> s - accums = case s of (accums, \_) -> accums - count = case s of (\_, count) -> count - accumres = accum i accums - accums' = case accumres of (accums', \_) -> accums' - sum = case accumres of (\_, sum) -> sum - count' = count + 1 - o = sum / count' - s' = (accums', count') - spacked' = State s' - res = (spacked', o) - \stoptyping - - - - As noted above, any component of a function's state that is a substate, - \eg passed on as the state of another function, should have no influence - on the hardware generated for the calling function. Any state-specific - \small{VHDL} for this component can be generated entirely within the called - function. So,we can completely leave out substates from any function. - - From this observation, we might think to remove the substates from a - function's states alltogether, and leave only the state components which - are actual states of the current function. While doing this would not - remove any information needed to generate \small{VHDL} from the function, it would - cause the function definition to become invalid (since we won't have any - substate to pass to the functions anymore). We could solve the syntactic - problems by passing \type{undefined} for state variables, but that would - still break the code on the semantic level (\ie, the function would no - longer be semantically equivalent to the original input). - - To keep the function definition correct until the very end of the process, - we will not deal with (sub)states until we get to the \small{VHDL} generation. - Here, we are translating from Core to \small{VHDL}, and we can simply not generate - \small{VHDL} for substates, effectively removing the substate components - alltogether. - - There are a few important points when ignore substates. - - First, we have to have some definition of "substate". Since any state - argument or return value that represents state must be of the \type{State} - type, we can simply look at its type. However, we must be careful to - ignore only {\em substates}, and not a function's own state. - - In the example above, this means we should remove \type{accums'} from - \type{s'}, but not throw away \type{s'} entirely. We should, however, - remove \type{s'} from the output port of the function, since the state - will be handled by a \small{VHDL} procedure within the function. - - When looking at substates, these can appear in two places: As part of an - argument and as part of a return value. As noted above, these substates - can only be used in very specific ways. - - \desc{State variables can appear as an argument.} When generating \small{VHDL}, we - completely ignore the argument and generate no input port for it. - - \desc{State variables can be extracted from other state variables.} When - extracting a state variable from another state variable, this always means - we're extracting a substate, which we can ignore. So, we simply generate no - \small{VHDL} for any extraction operation that has a state variable as a result. - - \desc{State variables can be passed to functions.} When passing a - state variable to a function, this always means we're passing a substate - to a subcomponent. The entire argument can simply be ingored in the - resulting port map. - - \desc{State variables can be returned from functions.} When returning a - state variable from a function (probably as a part of an algebraic - datatype), this always mean we're returning a substate from a - subcomponent. The entire state variable should be ignored in the resulting - port map. The type binder of the binder that the function call is bound - to should not include the state type either. - - \startdesc{State variables can be inserted into other variables.} When inserting - a state variable into another variable (usually by constructing that new - variable using its constructor), we can identify two cases: - - \startitemize - \item The state is inserted into another state variable. In this case, - the inserted state is a substate, and can be safely left out of the - constructed variable. - \item The state is inserted into a non-state variable. This happens when - building up the return value of a function, where you put state and - retsult variables together in an algebraic type (usually a tuple). In - this case, we should leave the state variable out as well, since we - don't want it to be included as an output port. - \stopitemize - - So, in both cases, we can simply leave out the state variable from the - resulting value. In the latter case, however, we should generate a state - proc instead, which assigns the state variable to the input state variable - at each clock tick. - \stopdesc - - \desc{State variables can appear as (part of) a function result.} When - generating \small{VHDL}, we can completely ignore any part of a function result - that has a state type. If the entire result is a state type, this will - mean the entity will not have an output port. Otherwise, the state - elements will be removed from the type of the output port. - - - Now, we know how to handle each use of a state variable separately. If we - look at the whole, we can conclude the following: - - \startitemize - \item A state unpack operation should not generate any \small{VHDL}. The binder - to which the unpacked state is bound should still be declared, this signal - will become the register and will hold the current state. - \item A state pack operation should not generate any \small{VHDL}. The binder th - which the packed state is bound should not be declared. The binder that is - packed is the signal that will hold the new state. - \item Any values of a State type should not be translated to \small{VHDL}. In - particular, State elements should be removed from tuples (and other - datatypes) and arguments with a state type should not generate ports. - \item To make the state actually work, a simple \small{VHDL} proc should be - generated. This proc updates the state at every clockcycle, by assigning - the new state to the current state. This will be recognized by synthesis - tools as a register specification. - \stopitemize - - - When applying these rules to the example program (in normal form), we will - get the following result. All the parts that don't generate any value are - crossed out, leaving some very boring assignments here and there. - + We've seen the concept of explicit state in a simple example below, but + what are the implications of this approach? + + \subsubsection{Substates} + Since a function's state is reflected directly in its type signature, + if a function calls other stateful functions (\eg, has subcircuits) it + has to somehow know the current state for these called functions. The + only way to do this, is to put these \emph{substates} inside the + caller's state. This means that a function's state is the sum of the + states of all functions it calls, and its own state. + + This also means that the type of a function (at least the "state" + part) is dependent on its implementation and the functions it calls. + This is the major downside of this approach: The separation between + interface and implementation is limited. However, since Cλash is not + very suitable for separate compilation (see + \in{section}[sec:prototype:separate]) this is not a big problem in + practice. Additionally, when using a type synonym for the state type + of each function, we can still provide explicit type signatures + while keeping the state specification for a function near its + definition only. - \starthaskell - avg i --spacked-- = res - where - s = --case spacked of (State s) -> s-- - --accums = case s of (accums, \_) -> accums-- - count = case s of (--\_,-- count) -> count - accumres = accum i --accums-- - --accums' = case accumres of (accums', \_) -> accums'-- - sum = case accumres of (--\_,-- sum) -> sum - count' = count + 1 - o = sum / count' - s' = (--accums',-- count') - --spacked' = State s'-- - res = (--spacked',-- o) - \stophaskell - - When we would really leave out the crossed out parts, we get a slightly - weird program: There is a variable \type{s} which has no value, and there - is a variable \type{s'} that is never used. Together, these two will form - the state proc of the function. \type{s} contains the "current" state, - \type{s'} is assigned the "next" state. So, at the end of each clock - cycle, \type{s'} should be assigned to \type{s}. - - Note that the definition of \type{s'} is not removed, even though one - might think it as having a state type. Since the state type has a single - argument constructor \type{State}, some type that should be the resulting - state should always be explicitly packed with the State constructor, - allowing us to remove the packed version, but still generate \small{VHDL} for the - unpacked version (of course with any substates removed). - - As you can see, the definition of \type{s'} is still present, since it - does not have a state type (The State constructor. The \type{accums'} substate has been removed, - leaving us just with the state of \type{avg} itself. - \subsection{Initial state} - How to specify the initial state? Cannot be done inside a hardware - function, since the initial state is its own state argument for the first - call (unless you add an explicit, synchronous reset port). + \subsubsection{...} + We need some way to know which arguments should become input ports and + which argument(s?) should become the current state (\eg, be bound to + the register outputs). This does not hold holds not just for the top + level function, but also for any subfunctions. Or could we perhaps + deduce the statefulness of subfunctions by analyzing the flow of data + in the calling functions? + + To explore this matter, we make an interesting observation: We get + completely correct behaviour when we put all state registers in the + top level entity (or even outside of it). All of the state arguments + and results on subfunctions are treated as normal input and output + ports. Effectively, a stateful function results in a stateless + hardware component that has one of its input ports connected to the + output of a register and one of its output ports connected to the + input of the same register. + + TODO: Example? + + Of course, even though the hardware described like this has the + correct behaviour, unless the layout tool does smart optimizations, + there will be a lot of extra wire in the design (since registers will + not be close to the component that uses them). Also, when working with + the generated \small{VHDL} code, there will be a lot of extra ports + just to pass one state values, which can get quite confusing. + + To fix this, we can simply \quote{push} the registers down into the + subcircuits. When we see a register that is connected directly to a + subcircuit, we remove the corresponding input and output port and put + the register inside the subcircuit instead. This is slightly less + trivial when looking at the Haskell code instead of the resulting + circuit, but the idea is still the same. + + TODO: Example? + + However, when applying this technique, we might push registers down + too far. When you intend to store a result of a stateless subfunction + in the caller's state and pass the current value of that state + variable to that same function, the register might get pushed down too + far. It is impossible to distinguish this case from similar code where + the called function is in fact stateful. From this we can conclude + that we have to either: + + \startitemize + \item accept that the generated hardware might not be exactly what we + intended, in some specific cases. In most cases, the hardware will be + what we intended. + \item explicitely annotate state arguments and results in the input + description. + \stopitemize + + The first option causes (non-obvious) exceptions in the language + intepretation. Also, automatically determining where registers should + end up is easier to implement correctly with explicit annotations, so + for these reasons we will look at how this annotations could work. + + + TODO: Note about conditions on state variables and checking them. + + \subsection{Explicit state annotation} + To make our stateful descriptions unambigious and easier to translate, + we need some way for the developer to describe which arguments and + results are intended to become stateful. + + Roughly, we have two ways to achieve this: + \startitemize[KR] + \item Use some kind of annotation method or syntactic construction in + the language to indicate exactly which argument and (part of the) + result is stateful. This means that the annotation lives + \quote{outside} of the function, it is completely invisible when + looking at the function body. + \item Use some kind of annotation on the type level, \eg give stateful + arguments and (part of) results a different type. This has the + potential to make this annotation visible inside the function as well, + such that when looking at a value inside the function body you can + tell if it's stateful by looking at its type. This could possibly make + the translation process a lot easier, since less analysis of the + program flow might be required. + \stopitemize - External init state is natural for simulation. + From these approaches, the type level \quote{annotations} have been + implemented in Cλash. \in{Section}[sec:prototype:statetype] expands on + the possible ways this could have been implemented. - External init state works for hardware generation as well. + TODO: Say something about dependent types and fixed size vectors - Implementation issues: state splitting, linking input to output state, - checking usage constraints on state variables. - \section{Recursion} + \section[sec:recursion]{Recursion} An import concept in functional languages is recursion. In it's most basic form, recursion is a function that is defined in terms of itself. This usually requires multiple evaluations of this function, with changing