to the corresponding input port. The output port of the function is also
mapped to a signal, which is used as the result of the application.
- An example of a simple program using only function application would be:
+ \in{Example}[ex:And3] shows a simple program using only function
+ application and the corresponding architecture.
+
+\startbuffer[And3]
+-- | A simple function that returns
+-- the and of three bits
+and3 :: Bit -> Bit -> Bit -> Bit
+and3 a b c = and (and a b) c
+\stopbuffer
- \starthaskell
- -- | A simple function that returns the and of three bits
- and3 :: Bit -> Bit -> Bit -> Bit
- and3 a b c = and (and a b) c
- \stophaskell
+ \startuseMPgraphic{And3}
+ save a, b, c, anda, andb, out;
+
+ % I/O ports
+ newCircle.a(btex $a$ etex) "framed(false)";
+ newCircle.b(btex $b$ etex) "framed(false)";
+ newCircle.c(btex $c$ etex) "framed(false)";
+ newCircle.out(btex $out$ etex) "framed(false)";
+
+ % Components
+ newCircle.anda(btex $and$ etex);
+ newCircle.andb(btex $and$ etex);
+
+ a.c = origin;
+ b.c = a.c + (0cm, 1cm);
+ c.c = b.c + (0cm, 1cm);
+ anda.c = midpoint(a.c, b.c) + (2cm, 0cm);
+ andb.c = midpoint(b.c, c.c) + (4cm, 0cm);
+
+ out.c = andb.c + (2cm, 0cm);
+
+ % Draw objects and lines
+ drawObj(a, b, c, anda, andb, out);
+
+ ncarc(a)(anda) "arcangle(-10)";
+ ncarc(b)(anda);
+ ncarc(anda)(andb);
+ ncarc(c)(andb);
+ ncline(andb)(out);
+ \stopuseMPgraphic
+
+ \placeexample[here][ex:And3]{Simple three port and.}
+ \startcombination[2*1]
+ {\typebufferhs{And3}}{Haskell description using function applications.}
+ {\boxedgraphic{And3}}{The architecture described by the Haskell description.}
+ \stopcombination
+
+ TODO: Define top level function and subfunctions/circuits.
+
+ \section{Choice}
+ Since describing components and connections allows us to describe a lot of
+ hardware designs already, there is an obvious thing missing: Choice. We
+ need some way to be able to choose between values based on another value.
+ In Haskell, choice is achieved by case expressions, if expressions or
+ pattern matching (all of which can be reduced to case expressions).
+
+ An obvious way to add choice to our language without having to recognize
+ any of Haskell's syntax, would be to add a primivite \quote{\hs{if}}
+ function. This function would take three arguments: The condition, the
+ value to return when the condition is true and the value to return when
+ the condition is false.
+
+ This \hs{if} function would then essentially describe a multiplexer and
+ allows us to describe any architecture that uses multiplexers. (TODO: Are
+ there other mechanisms of choice in hardware?)
+
+ However, to be able to describe our hardware in a more convenient way, we
+ also need to translate Haskell's choice mechanisms. The easiest of these
+ are of course case expressions (and if expressions, which can be very
+ directly translated to case expressions). A case expression can in turn
+ simply be translated to a conditional assignment, where the conditions use
+ equality comparisons against the constructors in the case expressions.
+
+ In \in{example}[ex:CaseInv] a simple case statement is shown,
+ scrutinizing a boolean value. If we would translate a Boolean to a bit
+ value, we could of course remove the comparator and directly feed 'in'
+ into the multiplex (or even use an inverter instead of a multiplexer).
+ However, we will try to make a general translation, which works for all
+ possible case statements. Optimizations such as these are left for the
+ \VHDL synthesizer, which handles them very well.
+
+ \startbuffer[CaseInv]
+ inv :: Bool -> Bool
+ inv in = case in of
+ True -> False
+ False -> True
+ \stopbuffer
+
+ \startuseMPgraphic{CaseInv}
+ save in, truecmp, falseout, trueout, out, cmp, mux;
+
+ % I/O ports
+ newCircle.in(btex $in$ etex) "framed(false)";
+ newCircle.out(btex $out$ etex) "framed(false)";
+ % Constants
+ newBox.truecmp(btex $True$ etex) "framed(false)";
+ newBox.trueout(btex $True$ etex) "framed(false)";
+ newBox.falseout(btex $False$ etex) "framed(false)";
+
+ % Components
+ newCircle.cmp(btex $==$ etex);
+ newMux.mux;
+
+ in.c = origin;
+ cmp.c = in.c + (3cm, 0cm);
+ truecmp.c = cmp.c + (-1cm, 1cm);
+ mux.sel = cmp.e + (1cm, -1cm);
+ falseout.c = mux.inpa - (2cm, 0cm);
+ trueout.c = mux.inpb - (2cm, 0cm);
+ out.c = mux.out + (2cm, 0cm);
+
+ % Draw objects and lines
+ drawObj(in, out, truecmp, trueout, falseout, cmp, mux);
+
+ ncline(in)(cmp);
+ ncarc(truecmp)(cmp);
+ nccurve(cmp.e)(mux.sel) "angleA(0)", "angleB(-90)";
+ ncline(falseout)(mux) "posB(inpa)";
+ ncline(trueout)(mux) "posB(inpb)";
+ ncline(mux)(out) "posA(out)";
+ \stopuseMPgraphic
+
+ \placeexample[here][ex:CaseInv]{Simple inverter.}
+ \startcombination[2*1]
+ {\typebufferhs{CaseInv}}{Haskell description using a Case expression.}
+ {\boxedgraphic{CaseInv}}{The architecture described by the Haskell description.}
+ \stopcombination
+
+ Slightly more complex (but very powerful) forms of choice are pattern
+ matches. A function can be defined in multiple clauses, where each clause
+ specifies a pattern. When the arguments match the pattern, the
+ corresponding clause will be used.
+
+ \startbuffer[PatternInv]
+ inv :: Bool -> Bool
+ inv True = False
+ inv False = True
+ \stopbuffer
+
+ \placeexample[here][ex:PatternInv]{Simple inverter using pattern matching.}
+ {\typebufferhs{CaseInv}}
+
+ The architecture described in \in{example}[ex:PatternInv] is of course the
+ same one as the one in \in{example}[ex:CaseInv]. The general interpration
+ of pattern matching is also similar to that of case expressions: Generate
+ hardware for each of the clause (like each of the clauses of a case
+ expression) and connect them to the function output through (a number of
+ nested) multiplexers. These multiplexers are driven by comparators and
+ other logic, that check each pattern in turn.
+
+ \section{Types}
+ Apart from giving structure to our hardware, we'll also need to provide
+ some way to translate the values used to hardware equivalents. In
+ particular, this means having to come up with a hardware equivalent for
+ every \emph{type} used in our program.
+
+ Since most functional languages have a lot of standard types that are
+ hard to translate (integers without a fixed size, lists without a static
+ length, etc.), we will start out by defining a number of \quote{builtin}
+ types ourselves. These types are builtin in the sense that our compiler
+ will have a fixed VHDL type for these. User defined types, on the other
+ hand, will have their hardware type derived directly from their Haskell
+ declaration automatically, according to the rules we sketch here.
+
+ \subsection{Builtin types}
+ \startdesc{Bit}
+ This is the most basic type available. It is mapped directly onto
+ the \type{std_logic} \small{VHDL} type. Mapping this to the
+ \type{bit} type might make more sense (since the Haskell version
+ only has two values), but using \type{std_logic} is more standard
+ (and allowed for some experimentation with don't care values)
+ \stopdesc
+ \startdesc{Bool}
+ This is a builtin Haskell type, but is translated exactly like the
+ Bit type. Supporting the Bool type is particularly useful to support
+ \hs{if ... then ... else ...} expressions, which always have a
+ \hs{Bool} value for the condition.
- This results in the following hardware:
-
- TODO: Pretty picture
+ A \hs{Bool} is translated to a \type{std_logic}, just like \hs{Bit}.
+ \stopdesc
+ \startdesc{SizedWord, SizedInt}
+ These are types to represent integers. \hs{SizedWord} is unsigned,
+ while \hs{SizedInt} is signed. These types are parameterized by a
+ length type, so you can define an unsigned word of 32 bits wide as
+ follows:
+
+ \starthaskell
+ type Word32 = SizedWord D32
+ \stophaskell
+
+ Here, \hs{D32} is the \emph{type level representation} of the number
+ 32. TODO: Introduce dependent types somewhere.
+
+ These types are translated to the \small{VHDL} \type{unsigned} and
+ \type{signed} respectively.
+ \stopdesc
+ \startdesc{RangedWord}
+ This is another type to describe integers, but unlike the previous
+ two it has no specific width, but an upper bound. This means that
+ its range is not limited to powers of two, but can be any number.
+ A \hs{RangedWord} only has an upper bound, its lower bound is
+ implicitly zero. There is a lot of added implementation complexity
+ when adding a lower bound and having just an upper bound was enough
+ for the primary purpose of this type: Typesafely indexing vectors.
+
+ This type is translated to the \type{unsigned} \small{VHDL} type.
+ \stopdesc
+ \startdesc{TFVec}
+ This is a vector type, with a fixed length. It has two type
+ parameters: Its length and the type of the elements contained in it.
+
+ A fixed size vector is translated to a \small{VHDL} array type.
+
+ The \hs{TF} in its name is a reference to the implementation used in
+ the prototype, which uses \emph{type families}.
+ \stopdesc
+
+ TODO: Reference Christiaan
+ \subsection{User-defined types}
+ There are three ways to define new types in Haskell: Algebraic
+ datatypes with the \hs{data} keyword, type synonyms with the \hs{type}
+ keyword and type renamings with the \hs{newtype} keyword. This
+ expclitely excludes more advanced type creation from \GHC extensions
+ such as type families, existential typing, \small{GADT}s, etc.
+
+ The first of these actually introduces a new type, for which we provide
+ the \VHDL translation below. The latter two only define new names for
+ existing types (where synonyms are completely interchangeable and
+ renamings need explicit conversion). Therefore, these don't need any
+ particular \VHDL translation, a synonym or renamed type will just use
+ the same representation as the equivalent type.
+
+ For algebraic types, we can make the following distinction:
+
+ \startdesc{Product types}
+ A product type is an algebraic datatype with a single constructor with
+ two or more fields. This is essentially a way to pack a few values
+ together in a record-like structure. In fact, the builtin tuple types
+ are just product types (and are thus supported in exactly the same
+ way).
+
+ The "product" in its name refers to the collection of values belonging
+ to this type. The collection for a product type is the cartesic
+ product of the collections for the types of its fields.
+
+ These types are translated to \VHDL record types, with one field for
+ every field in the constructor. This translation applies to all single
+ constructor algebraic datatypes, including those with no fields (unit
+ types) and just one field (which are technically not a product).
+ \stopdesc
+ \startdesc{Enumerated types}
+ An enumerated type is an algebraic datatype with multiple constructors, but
+ none of them have fields. This is essentially a way to get an
+ enum-like type containing alternatives.
+
+ Note that Haskell's \hs{Bool} type is also defined as an
+ enumeration type, but we have a fixed translation for that.
+
+ These types are translated to \VHDL enumerations, with one value for
+ each constructor. This allows references to these constructors to be
+ translated to the corresponding enumeration value.
+ \stopdesc
+ \startdesc{Sum types}
+ A sum type is an algebraic datatype with multiple constructors, where
+ the constructors have one or more fields. Technically, a type with
+ more than one field per constructor is a sum of products type, but
+ for our purposes this distinction does not really make a difference,
+ so we'll leave it out.
+
+ Sum types are currently not supported by the prototype, since its
+ translation is not so obvious as before. The can easily be emulated,
+ as we will see from an example:
+
+ \starthaskell
+ data Sum = A Bit Word | B Word
+ \stophaskell
+
+ An obvious way to translate this would be to create an enumeration to
+ distinguish the constructors and then create a big record that
+ contains all the fields of all the constructors. This is the same
+ translation that would result from the following enumeration and
+ product type (using a tuple for clarity):
+
+ \starthaskell
+ data SumC = A | B
+ type Sum = (SumC, Bit, Word, Word)
+ \stophaskell
+
+ Here, the \hs{SumC} type effectively signals which of the fields of
+ the \hs{Sum} type are valid, all the other ones have no useful value.
+
+ An obvious problem with this naive approach is the space usage: The
+ example above generates a fairly big \VHDL type. However, we can be
+ sure that the two \hs{Word}s in the \hs{Sum} type will never be valid
+ at the same time, so this is a waste of space.
+
+ Obviously, we could do some duplication detection here to reuse a
+ particular field for another constructor, but this would only
+ partially solve the problem. If I would, for example, have an array of
+ 8 bits and a 8 bit unsiged word, these are different types and could
+ not be shared. However, in the final hardware, both of these types
+ would simply be 8 bit connections, so we have a 100\% size increase by
+ not sharing these!
+ \stopdesc
+
+ Another interesting case is that of recursive types. In Haskell, an
+ algebraic datatype can be recursive: Any of its field types can be (or
+ contain) the type being defined. The most well-known recursive type is
+ probably the list type, which is defined is:
+
+ \starthaskell
+ data List a = Empty | Cons a (List a)
+ \stophaskell
+
+ Where \hs{Empty} is usually written as \hs{[]} and \hs{Cons} as \hs{:}.
+ This immediately shows the problem with recursive types: What hardware
+ type to allocate here? There is no way to statically determine how long
+ the list will be
+
+ In general, we can say we can never properly translated recursive types:
+ All recursive types have a potentially infinite value (even though in
+ practice they will have a bounded value, there is no way for the
+ compiler to determine an upper bound on its size.
\subsection{Partial application}
It should be obvious that we cannot generate hardware signals for all
represented as a signal or i/o port to a component.
From this, we can see that the above translation rules do not apply to a
- partial application. Let's look at an example:
+ partial application. \in{Example}[ex:Quadruple] shows an example use of
+ partial application and the corresponding architecture.
- \starthaskell
- -- | Multiply the input word by four.
- quadruple :: Word -> Word
- quadruple n = mul (mul n)
- where
- mul = (*) 2
- \stophaskell
+\startbuffer[Quadruple]
+-- | Multiply the input word by four.
+quadruple :: Word -> Word
+quadruple n = mul (mul n)
+ where
+ mul = (*) 2
+\stopbuffer
+
+ \startuseMPgraphic{Quadruple}
+ save in, two, mula, mulb, out;
+
+ % I/O ports
+ newCircle.in(btex $n$ etex) "framed(false)";
+ newCircle.two(btex $2$ etex) "framed(false)";
+ newCircle.out(btex $out$ etex) "framed(false)";
+
+ % Components
+ newCircle.mula(btex $\times$ etex);
+ newCircle.mulb(btex $\times$ etex);
- It should be clear that the above code describes the following hardware:
+ two.c = origin;
+ in.c = two.c + (0cm, 1cm);
+ mula.c = in.c + (2cm, 0cm);
+ mulb.c = mula.c + (2cm, 0cm);
+ out.c = mulb.c + (2cm, 0cm);
- TODO: Pretty picture
+ % Draw objects and lines
+ drawObj(in, two, mula, mulb, out);
+
+ nccurve(two)(mula) "angleA(0)", "angleB(45)";
+ nccurve(two)(mulb) "angleA(0)", "angleB(45)";
+ ncline(in)(mula);
+ ncline(mula)(mulb);
+ ncline(mulb)(out);
+ \stopuseMPgraphic
+
+ \placeexample[here][ex:Quadruple]{Simple three port and.}
+ \startcombination[2*1]
+ {\typebufferhs{Quadruple}}{Haskell description using function applications.}
+ {\boxedgraphic{Quadruple}}{The architecture described by the Haskell description.}
+ \stopcombination
Here, the definition of mul is a partial function application: It applies
\hs{2 :: Word} to the function \hs{(*) :: Word -> Word -> Word} resulting in
function boundaries), but eventually, the partial application will become
completely applied.
+ \section{Costless specialization}
+ Each (complete) function application in our description generates a
+ component instantiation, or a specific piece of hardware in the final
+ design. It is interesting to note that each application of a function
+ generates a \emph{separate} piece of hardware. In the final design, none
+ of the hardware is shared between applications, even when the applied
+ function is the same.
+
+ This is distinctly different from normal program compilation: Two separate
+ calls to the same function share the same machine code. Having more
+ machine code has implications for speed (due to less efficient caching)
+ and memory usage. For normal compilation, it is therefore important to
+ keep the amount of functions limited and maximize the code sharing.
+
+ When generating hardware, this is hardly an issue. Having more \quote{code
+ sharing} does reduce the amount of \small{VHDL} output (Since different
+ component instantiations still share the same component), but after
+ synthesis, the amount of hardware generated is not affected.
+
+ In particular, if we would duplicate all functions so that there is a
+ duplicate for every application in the program (\eg, each function is then
+ only applied exactly once), there would be no increase in hardware size
+ whatsoever.
+
+ TODO: Perhaps these next two sections are a bit too
+ implementation-oriented?
+
+ \subsection{Specialization}
+ Because of this, a common optimization technique called
+ \emph{specialization} is as good as free for hardware generation. Given
+ some function that has a \emph{domain} of $D$ (\eg, the set of all
+ possible arguments that could be applied), we create a specialized
+ function with exactly the same behaviour, but with an domain of $D'
+ \subset D$. This subset can be derived in all sort of ways, but commonly
+ this is done by limiting a polymorphic argument to a single type (\eg,
+ removing polymorphism) or by limiting an argument to just a single value
+ (\eg, cross-function constant propagation, effectively removing the
+ argument).
+
+ Since we limit the argument domain of the specialized function, its
+ definition can often be optimized further (since now more types or even
+ values of arguments are already know). By replacing any application of
+ the function that falls within the reduced domain by an application of
+ the specialized version, the code gets faster (but the code also gets
+ bigger, since we now have two versions instead of one!). If we apply
+ this technique often enough, we can often replace all applications of a
+ function by specialized versions, allowing the original function to be
+ removed (in some cases, this can even give a net reduction of the code
+ compared to the non-specialized version).
+
+ Specialization is useful for our hardware descriptions for functions
+ that contain arguments that cannot be translated to hardware directly
+ (polymorphic or higher order arguments, for example). If we can create
+ specialized functions that remove the argument, or make it translatable,
+ we can use specialization to make the original, untranslatable, function
+ obsolete.
+
+ \section{Higher order values}
+ What holds for partial application, can be easily generalized to any
+ higher order expression. This includes partial applications, plain
+ variables (e.g., a binder referring to a top level function), lambda
+ expressions and more complex expressions with a function type (a case
+ expression returning lambda's, for example).
+
+ Each of these values cannot be directly represented in hardware (just like
+ partial applications). Also, to make them representable, they need to be
+ applied: function variables and partial applications will then eventually
+ become complete applications, applied lambda expressions disappear by
+ applying β-reduction, etc.
+
+ So any higher order value will be \quote{pushed down} towards its
+ application just like partial applications. Whenever a function boundary
+ needs to be crossed, the called function can be specialized.
+
+ TODO: This is section should be improved
+
+ \section{Polymorphism}
+ In Haskell, values can be polymorphic: They can have multiple types. For
+ example, the function \hs{fst :: (a, b) -> a} is an example of a
+ polymorphic function: It works for tuples with any element types. Haskell
+ typeclasses allow a function to work on a specific set of types, but the
+ general idea is the same.
+
+% A type class is a collection of types for which some operations are
+% defined. It is thus possible for a value to be polymorphic while having
+% any number of \emph{class constraints}: The value is not defined for
+% every type, but only for types in the type class. An example of this is
+% the \hs{even :: (Integral a) => a -> Bool} function, which can map any
+% value of a type that is member of the \hs{Integral} type class
+
+ When generating hardware, polymorphism can't be easily translated. How
+ many wire will you lay down for a value that could have any type? When
+ type classes are involved, what hardware components will you lay down for
+ a class method (whose behaviour depends on the type of its arguments)?
+
+ Fortunately, we can again use the principle of specialization: Since every
+ function application generates separate pieces of hardware, we can know
+ the types of all arguments exactly. Provided that we don't use existential
+ typing, all of the polymorphic types in a function must depend on the
+ types of the arguments (In other words, the only way to introduce a type
+ variable is in a lambda abstraction). Our top level function must not have
+ a polymorphic type (otherwise we wouldn't know the hardware interface to
+ our top level function).
+
+ If a function is monomorphic, all values inside it are monomorphic as
+ well, so any function that is applied within the function can only be
+ applied to monomorphic values. The applied functions can then be
+ specialized to work just for these specific types, removing the
+ polymorphism from the applied functions as well.
+
+ By induction, this means that all functions that are (indirectly) called
+ by our top level function (meaning all functions that are translated in
+ the final hardware) become monomorphic.
+
\section{State}
A very important concept in hardware designs is \emph{state}. In a
stateless (or, \emph{combinatoric}) design, every output is a directly and solely dependent on the
In Haskell, this would look like \in{example}[ex:ExplicitAcc].
\startbuffer[ExplicitAcc]
-acc :: Word -> (State Word) -> (State Word, Word)
+-- input -> current state -> (new state, output)
+acc :: Word -> Word -> (Word, Word)
acc in (State s) = (State s', out)
where
out = s + in
looks the same from the outside, regardless of what state variables it
uses (or wether it's stateful at all).
- A direct consequence of this, is that if a function calls other
- stateful functions (\eg, has subcircuits), it has to somehow know the
- current state for these called functions. The only way to do this, is
- to put these \emph{substates} inside the caller's state. This means
- that a function's state is the sum of the states of all functions it
- calls, and its own state.
-
This approach is the one chosen for Cλash and will be examined more
closely below.
\subsection{Explicit state specification}
- Note about semantic correctness of top level state.
-
- Note about automatic ``down-pushing'' of state.
-
- Note about explicit state specification as the best solution.
-
- Note about substates
-
- Note about conditions on state variables and checking them.
-
- \subsection{Explicit state implementation}
- Recording state variables at the type level.
-
- Ideal: Type synonyms, since there is no additional code overhead for
- packing and unpacking. Downside: there is no explicit conversion in Core
- either, so type synonyms tend to get lost in expressions (they can be
- preserved in binders, but this makes implementation harder, since that
- statefulness of a value must be manually tracked).
-
- Less ideal: Newtype. Requires explicit packing and unpacking of function
- arguments. If you don't unpack substates, there is no overhead for
- (un)packing substates. This will result in many nested State constructors
- in a nested state type. \eg:
-
- \starttyping
- State (State Bit, State (State Word, Bit), Word)
- \stoptyping
-
- Alternative: Provide different newtypes for input and output state. This
- makes the code even more explicit, and typechecking can find even more
- errors. However, this requires defining two type synomyms for each
- stateful function instead of just one. \eg:
- \starttyping
- type AccumStateIn = StateIn Bit
- type AccumStateOut = StateOut Bit
- \stoptyping
- This also increases the possibility of having different input and output
- states. Checking for identical input and output state types is also
- harder, since each element in the state must be unpacked and compared
- separately.
-
- Alternative: Provide a type for the entire result type of a stateful
- function, not just the state part. \eg:
-
- \starttyping
- newtype Result state result = Result (state, result)
- \stoptyping
-
- This makes it easy to say "Any stateful function must return a
- \type{Result} type, without having to sort out result from state. However,
- this either requires a second type for input state (similar to
- \type{StateIn} / \type{StateOut} above), or requires the compiler to
- select the right argument for input state by looking at types (which works
- for complex states, but when that state has the same type as an argument,
- things get ambiguous) or by selecting a fixed (\eg, the last) argument,
- which might be limiting.
-
- \subsubsection{Example}
- As an example of the used approach, a simple averaging circuit, that lets
- the accumulation of the inputs be done by a subcomponent.
-
- \starttyping
- newtype State s = State s
-
- type AccumState = State Bit
- accum :: Word -> AccumState -> (AccumState, Word)
- accum i (State s) = (State (s + i), s + i)
-
- type AvgState = (AccumState, Word)
- avg :: Word -> AvgState -> (AvgState, Word)
- avg i (State s) = (State s', o)
- where
- (accums, count) = s
- -- Pass our input through the accumulator, which outputs a sum
- (accums', sum) = accum i accums
- -- Increment the count (which will be our new state)
- count' = count + 1
- -- Compute the average
- o = sum / count'
- s' = (accums', count')
- \stoptyping
-
- And the normalized, core-like versions:
-
- \starttyping
- accum i spacked = res
- where
- s = case spacked of (State s) -> s
- s' = s + i
- spacked' = State s'
- o = s + i
- res = (spacked', o)
-
- avg i spacked = res
- where
- s = case spacked of (State s) -> s
- accums = case s of (accums, \_) -> accums
- count = case s of (\_, count) -> count
- accumres = accum i accums
- accums' = case accumres of (accums', \_) -> accums'
- sum = case accumres of (\_, sum) -> sum
- count' = count + 1
- o = sum / count'
- s' = (accums', count')
- spacked' = State s'
- res = (spacked', o)
- \stoptyping
-
-
-
- As noted above, any component of a function's state that is a substate,
- \eg passed on as the state of another function, should have no influence
- on the hardware generated for the calling function. Any state-specific
- \small{VHDL} for this component can be generated entirely within the called
- function. So,we can completely leave out substates from any function.
-
- From this observation, we might think to remove the substates from a
- function's states alltogether, and leave only the state components which
- are actual states of the current function. While doing this would not
- remove any information needed to generate \small{VHDL} from the function, it would
- cause the function definition to become invalid (since we won't have any
- substate to pass to the functions anymore). We could solve the syntactic
- problems by passing \type{undefined} for state variables, but that would
- still break the code on the semantic level (\ie, the function would no
- longer be semantically equivalent to the original input).
-
- To keep the function definition correct until the very end of the process,
- we will not deal with (sub)states until we get to the \small{VHDL} generation.
- Here, we are translating from Core to \small{VHDL}, and we can simply not generate
- \small{VHDL} for substates, effectively removing the substate components
- alltogether.
-
- There are a few important points when ignore substates.
-
- First, we have to have some definition of "substate". Since any state
- argument or return value that represents state must be of the \type{State}
- type, we can simply look at its type. However, we must be careful to
- ignore only {\em substates}, and not a function's own state.
-
- In the example above, this means we should remove \type{accums'} from
- \type{s'}, but not throw away \type{s'} entirely. We should, however,
- remove \type{s'} from the output port of the function, since the state
- will be handled by a \small{VHDL} procedure within the function.
-
- When looking at substates, these can appear in two places: As part of an
- argument and as part of a return value. As noted above, these substates
- can only be used in very specific ways.
-
- \desc{State variables can appear as an argument.} When generating \small{VHDL}, we
- completely ignore the argument and generate no input port for it.
-
- \desc{State variables can be extracted from other state variables.} When
- extracting a state variable from another state variable, this always means
- we're extracting a substate, which we can ignore. So, we simply generate no
- \small{VHDL} for any extraction operation that has a state variable as a result.
-
- \desc{State variables can be passed to functions.} When passing a
- state variable to a function, this always means we're passing a substate
- to a subcomponent. The entire argument can simply be ingored in the
- resulting port map.
-
- \desc{State variables can be returned from functions.} When returning a
- state variable from a function (probably as a part of an algebraic
- datatype), this always mean we're returning a substate from a
- subcomponent. The entire state variable should be ignored in the resulting
- port map. The type binder of the binder that the function call is bound
- to should not include the state type either.
-
- \startdesc{State variables can be inserted into other variables.} When inserting
- a state variable into another variable (usually by constructing that new
- variable using its constructor), we can identify two cases:
-
- \startitemize
- \item The state is inserted into another state variable. In this case,
- the inserted state is a substate, and can be safely left out of the
- constructed variable.
- \item The state is inserted into a non-state variable. This happens when
- building up the return value of a function, where you put state and
- retsult variables together in an algebraic type (usually a tuple). In
- this case, we should leave the state variable out as well, since we
- don't want it to be included as an output port.
- \stopitemize
-
- So, in both cases, we can simply leave out the state variable from the
- resulting value. In the latter case, however, we should generate a state
- proc instead, which assigns the state variable to the input state variable
- at each clock tick.
- \stopdesc
-
- \desc{State variables can appear as (part of) a function result.} When
- generating \small{VHDL}, we can completely ignore any part of a function result
- that has a state type. If the entire result is a state type, this will
- mean the entity will not have an output port. Otherwise, the state
- elements will be removed from the type of the output port.
-
-
- Now, we know how to handle each use of a state variable separately. If we
- look at the whole, we can conclude the following:
-
- \startitemize
- \item A state unpack operation should not generate any \small{VHDL}. The binder
- to which the unpacked state is bound should still be declared, this signal
- will become the register and will hold the current state.
- \item A state pack operation should not generate any \small{VHDL}. The binder th
- which the packed state is bound should not be declared. The binder that is
- packed is the signal that will hold the new state.
- \item Any values of a State type should not be translated to \small{VHDL}. In
- particular, State elements should be removed from tuples (and other
- datatypes) and arguments with a state type should not generate ports.
- \item To make the state actually work, a simple \small{VHDL} proc should be
- generated. This proc updates the state at every clockcycle, by assigning
- the new state to the current state. This will be recognized by synthesis
- tools as a register specification.
- \stopitemize
-
-
- When applying these rules to the example program (in normal form), we will
- get the following result. All the parts that don't generate any value are
- crossed out, leaving some very boring assignments here and there.
-
+ We've seen the concept of explicit state in a simple example below, but
+ what are the implications of this approach?
+
+ \subsubsection{Substates}
+ Since a function's state is reflected directly in its type signature,
+ if a function calls other stateful functions (\eg, has subcircuits) it
+ has to somehow know the current state for these called functions. The
+ only way to do this, is to put these \emph{substates} inside the
+ caller's state. This means that a function's state is the sum of the
+ states of all functions it calls, and its own state.
+
+ This also means that the type of a function (at least the "state"
+ part) is dependent on its implementation and the functions it calls.
+ This is the major downside of this approach: The separation between
+ interface and implementation is limited. However, since Cλash is not
+ very suitable for separate compilation (see
+ \in{section}[sec:prototype:separate]) this is not a big problem in
+ practice. Additionally, when using a type synonym for the state type
+ of each function, we can still provide explicit type signatures
+ while keeping the state specification for a function near its
+ definition only.
- \starthaskell
- avg i --spacked-- = res
- where
- s = --case spacked of (State s) -> s--
- --accums = case s of (accums, \_) -> accums--
- count = case s of (--\_,-- count) -> count
- accumres = accum i --accums--
- --accums' = case accumres of (accums', \_) -> accums'--
- sum = case accumres of (--\_,-- sum) -> sum
- count' = count + 1
- o = sum / count'
- s' = (--accums',-- count')
- --spacked' = State s'--
- res = (--spacked',-- o)
- \stophaskell
-
- When we would really leave out the crossed out parts, we get a slightly
- weird program: There is a variable \type{s} which has no value, and there
- is a variable \type{s'} that is never used. Together, these two will form
- the state proc of the function. \type{s} contains the "current" state,
- \type{s'} is assigned the "next" state. So, at the end of each clock
- cycle, \type{s'} should be assigned to \type{s}.
-
- Note that the definition of \type{s'} is not removed, even though one
- might think it as having a state type. Since the state type has a single
- argument constructor \type{State}, some type that should be the resulting
- state should always be explicitly packed with the State constructor,
- allowing us to remove the packed version, but still generate \small{VHDL} for the
- unpacked version (of course with any substates removed).
-
- As you can see, the definition of \type{s'} is still present, since it
- does not have a state type (The State constructor. The \type{accums'} substate has been removed,
- leaving us just with the state of \type{avg} itself.
- \subsection{Initial state}
- How to specify the initial state? Cannot be done inside a hardware
- function, since the initial state is its own state argument for the first
- call (unless you add an explicit, synchronous reset port).
-
- External init state is natural for simulation.
+ \subsubsection{...}
+ We need some way to know which arguments should become input ports and
+ which argument(s?) should become the current state (\eg, be bound to
+ the register outputs). This does not hold holds not just for the top
+ level function, but also for any subfunctions. Or could we perhaps
+ deduce the statefulness of subfunctions by analyzing the flow of data
+ in the calling functions?
+
+ To explore this matter, we make an interesting observation: We get
+ completely correct behaviour when we put all state registers in the
+ top level entity (or even outside of it). All of the state arguments
+ and results on subfunctions are treated as normal input and output
+ ports. Effectively, a stateful function results in a stateless
+ hardware component that has one of its input ports connected to the
+ output of a register and one of its output ports connected to the
+ input of the same register.
+
+ TODO: Example?
+
+ Of course, even though the hardware described like this has the
+ correct behaviour, unless the layout tool does smart optimizations,
+ there will be a lot of extra wire in the design (since registers will
+ not be close to the component that uses them). Also, when working with
+ the generated \small{VHDL} code, there will be a lot of extra ports
+ just to pass one state values, which can get quite confusing.
+
+ To fix this, we can simply \quote{push} the registers down into the
+ subcircuits. When we see a register that is connected directly to a
+ subcircuit, we remove the corresponding input and output port and put
+ the register inside the subcircuit instead. This is slightly less
+ trivial when looking at the Haskell code instead of the resulting
+ circuit, but the idea is still the same.
+
+ TODO: Example?
+
+ However, when applying this technique, we might push registers down
+ too far. When you intend to store a result of a stateless subfunction
+ in the caller's state and pass the current value of that state
+ variable to that same function, the register might get pushed down too
+ far. It is impossible to distinguish this case from similar code where
+ the called function is in fact stateful. From this we can conclude
+ that we have to either:
+
+ \startitemize
+ \item accept that the generated hardware might not be exactly what we
+ intended, in some specific cases. In most cases, the hardware will be
+ what we intended.
+ \item explicitely annotate state arguments and results in the input
+ description.
+ \stopitemize
+
+ The first option causes (non-obvious) exceptions in the language
+ intepretation. Also, automatically determining where registers should
+ end up is easier to implement correctly with explicit annotations, so
+ for these reasons we will look at how this annotations could work.
+
+
+ TODO: Note about conditions on state variables and checking them.
+
+ \subsection{Explicit state annotation}
+ To make our stateful descriptions unambigious and easier to translate,
+ we need some way for the developer to describe which arguments and
+ results are intended to become stateful.
+
+ Roughly, we have two ways to achieve this:
+ \startitemize[KR]
+ \item Use some kind of annotation method or syntactic construction in
+ the language to indicate exactly which argument and (part of the)
+ result is stateful. This means that the annotation lives
+ \quote{outside} of the function, it is completely invisible when
+ looking at the function body.
+ \item Use some kind of annotation on the type level, \eg give stateful
+ arguments and (part of) results a different type. This has the
+ potential to make this annotation visible inside the function as well,
+ such that when looking at a value inside the function body you can
+ tell if it's stateful by looking at its type. This could possibly make
+ the translation process a lot easier, since less analysis of the
+ program flow might be required.
+ \stopitemize
- External init state works for hardware generation as well.
+ From these approaches, the type level \quote{annotations} have been
+ implemented in Cλash. \in{Section}[sec:prototype:statetype] expands on
+ the possible ways this could have been implemented.
- Implementation issues: state splitting, linking input to output state,
- checking usage constraints on state variables.
+ TODO: Say something about dependent types and fixed size vectors
\section[sec:recursion]{Recursion}
An import concept in functional languages is recursion. In it's most basic