\chapter[chap:description]{Hardware description}
This chapter provides an overview of the hardware description language
that was created and the issues that arose in the process. It focuses
on the issues of the language, not the implementation.

When translating Haskell to hardware, we need to choose what kind of
hardware to generate for which Haskell constructs. When faced with such
choices, we have tried to stick with the most obvious choice wherever
possible. In a lot of cases, when you look at a hardware description it is
completely clear what hardware is described. We want our translator to
generate exactly that hardware whenever possible, to minimize the amount of
surprise for people working with it.

In this chapter we try to describe how we interpret a Haskell program from a
hardware perspective. We provide a description of each Haskell language
element that needs translation, to provide a clear picture of what hardware
is generated for it.
\section{Function application}
The basic syntactic elements of a functional program are functions and
function application. These have a single obvious \small{VHDL} translation: each
function becomes a hardware component, where each argument is an input port
and the result value is the output port.

Each function application in turn becomes a component instantiation. Here, the
result of each argument expression is assigned to a signal, which is mapped
to the corresponding input port. The output port of the function is also
mapped to a signal, which is used as the result of the application.
An example of a simple program using only function application would be:

-- | A simple function that returns the and of three bits
and3 :: Bit -> Bit -> Bit -> Bit
and3 a b c = and (and a b) c

This results in the following hardware:
\subsection{Partial application}
It should be obvious that we cannot generate hardware signals for all
expressions we can express in Haskell. The most obvious criterion for this
is the type of an expression. We will see more of this below, but for now it
should be obvious that any expression of a function type cannot be
represented as a signal or I/O port of a component.

From this, we can see that the above translation rules do not apply to a
partial application. Let's look at an example:
-- | Multiply the input word by four.
quadruple :: Word -> Word
quadruple n = mul (mul n)
  where
    mul = (*) 2
It should be clear that the above code describes the following hardware:

Here, the definition of \hs{mul} is a partial function application: it applies
\hs{2 :: Word} to the function \hs{(*) :: Word -> Word -> Word}, resulting in
the expression \hs{(*) 2 :: Word -> Word}. Since this resulting expression
is again a function, we can't generate hardware for it directly. This is
because the hardware to generate for \hs{mul} depends completely on where
and how it is used. In this example, it is even used twice!
However, it is clear that the above hardware description actually describes
valid hardware. In general, we can see that any partially applied function
must eventually become completely applied, at which point we can generate
hardware for it using the rules for function application above. It might
mean that a partial application is passed around quite a bit (even beyond
function boundaries), but eventually, the partial application will become
completely applied.
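To illustrate, the idea can be sketched in plain Haskell (a hypothetical example using \hs{Int} instead of \hs{Word}; \hs{applyTwice} is not part of the described language): a partially applied function is passed around as a value, even across a function boundary, and hardware could only be generated at the sites where it is fully applied.

```haskell
-- applyTwice receives a function value (possibly a partial application)
-- and fully applies it twice; only at these application sites is the
-- function completely applied.
applyTwice :: (Int -> Int) -> Int -> Int
applyTwice f x = f (f x)

-- mul is a partial application of (*); quadruple passes it along
-- unapplied to applyTwice.
mul :: Int -> Int
mul = (2 *)

quadruple :: Int -> Int
quadruple = applyTwice mul
```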
\section{State}
A very important concept in hardware design is \emph{state}. In a
stateless (or \emph{combinatoric}) design, every output is directly and solely
dependent on the inputs. In a stateful design, the outputs can depend on the
history of inputs, or the \emph{state}. State is usually stored in
\emph{registers}, which retain their value during a clock cycle, and are
typically updated at the start of every clock cycle. Since the updating of the
state is tightly coupled (synchronized) to the clock signal, these state
updates are often called \emph{synchronous}.
To make our hardware description language useful for describing more than
simple combinatoric designs, we'll need to be able to describe state in
some way.
\subsection{Approaches to state}
In Haskell, functions are always pure (except when using unsafe
functions like \hs{unsafePerformIO}, which should be avoided whenever
possible). This means that the output of a function solely depends on
its inputs. If you evaluate a given function with given inputs, it will
always provide the same output.
This is a perfect match for a combinatoric circuit, where the outputs
also depend solely on the inputs. However, when state is involved, this
no longer holds. Since we're in charge of our own language, we could
remove this purity constraint and allow a function to return different
values depending on the cycle in which it is evaluated (or rather, the
current state). However, this means that all kinds of interesting
properties of our functional language get lost, and all kinds of
transformations and optimizations might no longer be meaning preserving.
Provided that we want to keep the function pure, the current state has
to be present in the function's arguments in some way. There seem to be
two obvious ways to do this: adding the current state as an argument, or
including the full history of each argument.
\subsubsection{Stream arguments and results}
Including the entire history of each input (\eg, the value of that
input for each previous clock cycle) is an obvious way to make outputs
depend on all previous inputs. This is easily done by making every
input a list instead of a single value, containing all previous values
as well as the current value.
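As a sketch of this idea (with hypothetical names, using plain Haskell lists and \hs{Bool} for bits), an output that depends on the whole input history can be computed from the list of all values seen so far:

```haskell
-- Count how often the input has been True, up to and including the
-- current cycle. The argument is the full input history, oldest first.
countHighs :: [Bool] -> Int
countHighs history = length (filter id history)
```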
An obvious downside of this solution is that on each cycle, all the
previous cycles must be resimulated to obtain the current state. To do
this, a recursive helper function might be needed as well,
which might be hard for the compiler to properly analyze.
A slight variation on this approach is one taken by some of the other
functional \small{HDL}s in the field (TODO: References to Lava,
ForSyDe, ...): make functions operate on complete streams. This means
that a function is no longer called on every cycle, but just once. It
takes streams as inputs instead of values, where each stream contains
all the values for every clock cycle since system start. This is easily
modeled using an (infinite) list, with one element for each clock
cycle. Since the function is only evaluated once, its output is also a
stream. Note that, since we are working with infinite lists and still
want to be able to simulate the system cycle-by-cycle, this relies
heavily on the lazy semantics of Haskell.
Since our inputs and outputs are streams, all other (intermediate)
values must be streams as well. All of our primitive operators (\eg, addition,
subtraction, bitwise operations, etc.) must operate on streams
(note that changing a single-element operation into a stream
operation can be done with \hs{map}, \hs{zipWith}, etc.).
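A minimal sketch of such liftings on plain Haskell lists (the names \hs{addS} and \hs{notS} are hypothetical, not part of the described language):

```haskell
-- A stream is modeled as an (infinite) list, one element per clock cycle.
type Stream a = [a]

-- Lift single-value addition to a stream operation: one result per cycle.
addS :: Stream Int -> Stream Int -> Stream Int
addS = zipWith (+)

-- Lift single-value inversion to a stream operation.
notS :: Stream Bool -> Stream Bool
notS = map not
```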
Note that the concept of \emph{state} is no more than having some way
to communicate a value from one cycle to the next. By introducing a
\hs{delay} function, we can do exactly that: delay (each value in) a
stream so that we can \quote{look into} the past. This \hs{delay} function
simply outputs a stream where each value is the same as the input
value, but shifted by one cycle. This causes a \quote{gap} at the
beginning of the stream: what is the value of the delay output in the
first cycle? For this, the \hs{delay} function has a second input
(which is a value, not a stream!).
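A possible model of \hs{delay} on plain Haskell lists, together with the accumulator it enables (a sketch; the \hs{Stream} type is modeled here as a list, and \hs{accumulate} is a hypothetical name):

```haskell
-- Shift a stream one cycle into the future; the second argument fills
-- the gap in the first cycle.
delay :: [a] -> a -> [a]
delay xs initial = initial : xs

-- An accumulator: the feedback loop through delay corresponds to a
-- register in the described hardware. Laziness makes the self-reference
-- in out well-defined.
accumulate :: [Int] -> [Int]
accumulate inp = out
  where
    out = zipWith (+) (delay out 0) inp
```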
\in{Example}[ex:DelayAcc] shows a simple accumulator expressed in this
notation.

\startbuffer[DelayAcc]
acc :: Stream Word -> Stream Word
acc in = out
  where
    out = (delay out 0) + in
\stopbuffer
\startuseMPgraphic{DelayAcc}
save in, out, add, reg;

newCircle.in(btex $in$ etex) "framed(false)";
newCircle.out(btex $out$ etex) "framed(false)";
newReg.reg("") "dx(4mm)", "dy(6mm)", "reflect(true)";
newCircle.add(btex + etex);

add.c = in.c + (2cm, 0cm);
out.c = add.c + (2cm, 0cm);
reg.c = add.c + (0cm, 2cm);

% Draw objects and lines
drawObj(in, out, add, reg);

nccurve(add)(reg) "angleA(0)", "angleB(180)", "posB(d)";
nccurve(reg)(add) "angleA(180)", "angleB(-45)", "posA(out)";
\stopuseMPgraphic
\placeexample[here][ex:DelayAcc]{Simple accumulator architecture.}
\startcombination[2*1]
{\typebufferhs{DelayAcc}}{Haskell description using streams.}
{\boxedgraphic{DelayAcc}}{The architecture described by the Haskell description.}
\stopcombination
This notation can be confusing (especially due to the loop in the
definition of \hs{out}), but is essentially easy to interpret. There is a
single call to \hs{delay}, resulting in a circuit with a single register,
whose input is connected to \hs{out} (which is the output of the
adder), and whose output is the \hs{delay out 0} expression (which is connected
to one of the adder inputs).

This notation has a number of downsides, amongst which are limited
readability and ambiguity of interpretation. TODO: Reference
\subsubsection{Explicit state arguments and results}
A more explicit way to model state is to simply add an extra argument
containing the current state value. This allows an output to depend on
both the inputs and the current state while keeping the
function pure (letting the result depend only on the arguments), since
the current state is now an argument.

In Haskell, this would look like \in{example}[ex:ExplicitAcc].
\startbuffer[ExplicitAcc]
acc :: Word -> (State Word) -> (State Word, Word)
acc in (State s) = (State s', out)
  where
    out = s + in
    s' = out
\stopbuffer
\placeexample[here][ex:ExplicitAcc]{Simple accumulator architecture.}
\startcombination[2*1]
{\typebufferhs{ExplicitAcc}}{Haskell description using explicit state arguments.}
% Picture is identical to the one we had just now.
{\boxedgraphic{DelayAcc}}{The architecture described by the Haskell description.}
\stopcombination
This approach makes a function's state very explicit: which state
variables are used by a function can be completely determined from its
type signature (as opposed to the stream approach, where a function
looks the same from the outside regardless of what state variables it
uses, or whether it's stateful at all).
A direct consequence of this is that if a function calls other
stateful functions (\eg, has subcircuits), it has to somehow know the
current state for these called functions. The only way to do this is
to put these \emph{substates} inside the caller's state. This means
that a function's state is the sum of the states of all functions it
calls, plus its own state.

This approach is the one chosen for Cλash and will be examined more
closely below.
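Such a pure, explicit-state function can be simulated cycle by cycle by threading the state through a helper (a sketch with hypothetical names; the \hs{State} wrapper is left out for brevity):

```haskell
-- Run a state-passing function over a list of per-cycle inputs,
-- threading the state from each cycle into the next.
simulate :: (i -> s -> (s, o)) -> s -> [i] -> [o]
simulate _ _ []       = []
simulate f s (x : xs) = let (s', y) = f x s in y : simulate f s' xs

-- The explicit-state accumulator from the example, unwrapped:
acc :: Int -> Int -> (Int, Int)
acc i s = let s' = s + i in (s', s')
```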
\subsection{Explicit state specification}
Note about semantic correctness of top level state.

Note about automatic ``down-pushing'' of state.

Note about explicit state specification as the best solution.

Note about conditions on state variables and checking them.
\subsection{Explicit state implementation}
Recording state variables at the type level.

Ideal: type synonyms, since there is no additional code overhead for
packing and unpacking. Downside: there is no explicit conversion in Core
either, so type synonyms tend to get lost in expressions (they can be
preserved in binders, but this makes implementation harder, since the
statefulness of a value must be manually tracked).

Less ideal: a newtype. Requires explicit packing and unpacking of function
arguments. If you don't unpack substates, there is no overhead for
(un)packing substates. This will result in many nested State constructors
in a nested state type. \eg:

State (State Bit, State (State Word, Bit), Word)
Alternative: provide different newtypes for input and output state. This
makes the code even more explicit, and typechecking can find even more
errors. However, this requires defining two type synonyms for each
stateful function instead of just one. \eg:

type AccumStateIn = StateIn Bit
type AccumStateOut = StateOut Bit

This also increases the possibility of having different input and output
states. Checking for identical input and output state types is also
harder, since each element in the state must be unpacked and compared
separately.
Alternative: provide a type for the entire result type of a stateful
function, not just the state part. \eg:

newtype Result state result = Result (state, result)

This makes it easy to say \quote{any stateful function must return a
\type{Result} type}, without having to sort out result from state. However,
this either requires a second type for the input state (similar to
\type{StateIn} / \type{StateOut} above), or requires the compiler to
select the right argument for the input state by looking at types (which works
for complex states, but when that state has the same type as an argument,
things get ambiguous) or by selecting a fixed (\eg, the last) argument,
which might be limiting.
\subsubsection{Example}
As an example of this approach, consider a simple averaging circuit that lets
the accumulation of the inputs be done by a subcomponent.

newtype State s = State s

type AccumState = State Word
accum :: Word -> AccumState -> (AccumState, Word)
accum i (State s) = (State (s + i), s + i)

type AvgState = State (AccumState, Word)
avg :: Word -> AvgState -> (AvgState, Word)
avg i (State s) = (State s', o)
  where
    (accums, count) = s
    -- Pass our input through the accumulator, which outputs a sum
    (accums', sum) = accum i accums
    -- Increment the count (which will be our new state)
    count' = count + 1
    -- Compute the average
    o = sum `div` count'
    s' = (accums', count')
And the normalized, core-like versions:

accum i spacked = res
  where
    s = case spacked of (State s) -> s
    sum = s + i
    spacked' = State sum
    res = (spacked', sum)

avg i spacked = res
  where
    s = case spacked of (State s) -> s
    accums = case s of (accums, \_) -> accums
    count = case s of (\_, count) -> count
    accumres = accum i accums
    accums' = case accumres of (accums', \_) -> accums'
    sum = case accumres of (\_, sum) -> sum
    count' = count + 1
    o = sum `div` count'
    s' = (accums', count')
    spacked' = State s'
    res = (spacked', o)
As noted above, any component of a function's state that is a substate,
\eg, passed on as the state of another function, should have no influence
on the hardware generated for the calling function. Any state-specific
\small{VHDL} for this component can be generated entirely within the called
function. So, we can completely leave out substates from any function.
From this observation, we might think to remove the substates from a
function's states altogether, and leave only the state components which
are actual states of the current function. While doing this would not
remove any information needed to generate \small{VHDL} from the function, it would
cause the function definition to become invalid (since we won't have any
substate to pass to the functions anymore). We could solve the syntactic
problems by passing \type{undefined} for state variables, but that would
still break the code on the semantic level (\ie, the function would no
longer be semantically equivalent to the original input).
To keep the function definition correct until the very end of the process,
we will not deal with (sub)states until we get to the \small{VHDL} generation.
Here, we are translating from Core to \small{VHDL}, and we can simply not generate
\small{VHDL} for substates, effectively removing the substate components
altogether.

There are a few important points when ignoring substates.
First, we have to have some definition of \quote{substate}. Since any state
argument or return value that represents state must be of the \type{State}
type, we can simply look at its type. However, we must be careful to
ignore only {\em substates}, and not a function's own state.
In the example above, this means we should remove \type{accums'} from
\type{s'}, but not throw away \type{s'} entirely. We should, however,
remove \type{s'} from the output port of the function, since the state
will be handled by a \small{VHDL} procedure within the function.
When looking at substates, these can appear in two places: as part of an
argument and as part of a return value. As noted above, these substates
can only be used in very specific ways.
\desc{State variables can appear as an argument.} When generating \small{VHDL}, we
completely ignore the argument and generate no input port for it.

\desc{State variables can be extracted from other state variables.} When
extracting a state variable from another state variable, this always means
we're extracting a substate, which we can ignore. So, we simply generate no
\small{VHDL} for any extraction operation that has a state variable as a result.
\desc{State variables can be passed to functions.} When passing a
state variable to a function, this always means we're passing a substate
to a subcomponent. The entire argument can simply be ignored in the
resulting port map.

\desc{State variables can be returned from functions.} When returning a
state variable from a function (probably as a part of an algebraic
datatype), this always means we're returning a substate from a
subcomponent. The entire state variable should be ignored in the resulting
port map. The type of the binder that the function call is bound
to should not include the state type either.
\startdesc{State variables can be inserted into other variables.} When inserting
a state variable into another variable (usually by constructing that new
variable using its constructor), we can identify two cases:

\startitemize
\item The state is inserted into another state variable. In this case,
the inserted state is a substate, and can be safely left out of the
constructed variable.
\item The state is inserted into a non-state variable. This happens when
building up the return value of a function, where you put state and
result variables together in an algebraic type (usually a tuple). In
this case, we should leave the state variable out as well, since we
don't want it to be included as an output port.
\stopitemize

So, in both cases, we can simply leave out the state variable from the
resulting value. In the latter case, however, we should generate a state
proc instead, which assigns the state variable to the input state variable
at the end of each clock cycle.
\stopdesc
\desc{State variables can appear as (part of) a function result.} When
generating \small{VHDL}, we can completely ignore any part of a function result
that has a state type. If the entire result is a state type, this will
mean the entity will not have an output port. Otherwise, the state
elements will be removed from the type of the output port.
Now we know how to handle each use of a state variable separately. If we
look at the whole, we can conclude the following:

\startitemize
\item A state unpack operation should not generate any \small{VHDL}. The binder
to which the unpacked state is bound should still be declared; this signal
will become the register and will hold the current state.
\item A state pack operation should not generate any \small{VHDL}. The binder to
which the packed state is bound should not be declared. The binder that is
packed is the signal that will hold the new state.
\item Any values of a State type should not be translated to \small{VHDL}. In
particular, State elements should be removed from tuples (and other
datatypes) and arguments with a state type should not generate ports.
\item To make the state actually work, a simple \small{VHDL} proc should be
generated. This proc updates the state at every clock cycle, by assigning
the new state to the current state. This will be recognized by synthesis
tools as a register specification.
\stopitemize
When applying these rules to the example program (in normal form), we will
get the following result. All the parts that don't generate any value are
crossed out, leaving some very boring assignments here and there.
avg i --spacked-- = res
  where
    s = --case spacked of (State s) -> s--
    --accums = case s of (accums, \_) -> accums--
    count = case s of (--\_,-- count) -> count
    accumres = accum i --accums--
    --accums' = case accumres of (accums', \_) -> accums'--
    sum = case accumres of (--\_,-- sum) -> sum
    count' = count + 1
    o = sum `div` count'
    s' = (--accums',-- count')
    --spacked' = State s'--
    res = (--spacked',-- o)
If we were to really leave out the crossed out parts, we would get a slightly
weird program: there is a variable \type{s} which has no value, and there
is a variable \type{s'} that is never used. Together, these two will form
the state proc of the function. \type{s} contains the \quote{current} state,
\type{s'} is assigned the \quote{next} state. So, at the end of each clock
cycle, \type{s'} should be assigned to \type{s}.
Note that the definition of \type{s'} is not removed, even though one
might think of it as having a state type. Since the state type has a single
argument constructor \type{State}, any value that is to be the resulting
state must always be explicitly packed with the State constructor,
allowing us to remove the packed version, but still generate \small{VHDL} for the
unpacked version (of course with any substates removed).

As you can see, the definition of \type{s'} is still present, since it
does not have a state type (it is not wrapped in the \type{State}
constructor). The \type{accums'} substate has been removed,
leaving us just with the state of \type{avg} itself.
\subsection{Initial state}
How to specify the initial state? Cannot be done inside a hardware
function, since the initial state is its own state argument for the first
call (unless you add an explicit, synchronous reset port).

External init state is natural for simulation.

External init state works for hardware generation as well.

Implementation issues: state splitting, linking input to output state,
checking usage constraints on state variables.
\section[sec:recursion]{Recursion}
An important concept in functional languages is recursion. In its most basic
form, recursion means a function that is defined in terms of itself. This
usually requires multiple evaluations of this function, with changing
arguments, until eventually an evaluation of the function no longer requires
a recursive call.

Recursion in a hardware description is a bit of a funny thing. Usually,
recursion is associated with a lot of nondeterminism and stack overflows, but
also with flexibility and expressive power.
Given the notion that each function application will translate to a
component instantiation, we are presented with a problem. A recursive
function would translate to a component that contains itself. Or, more
precisely, that contains an instance of itself. This instance would again
contain an instance of itself, and again, into infinity. This is obviously a
problem for generating hardware.
This is expected for functions that describe infinite recursion. In that
case, we can't generate hardware that shows correct behaviour in a single
cycle (at best, we could generate hardware that needs an infinite number of
cycles to produce a result).
However, most recursive hardware descriptions will describe finite
recursion. This is because the recursive call is done conditionally. There
is usually a case expression where at least one alternative does not contain
the recursive call, which we call the \quote{base case}. If, for each call to the
recursive function, we would be able to detect which alternative applies,
we would be able to remove the case expression and leave only the base case
when it applies. This will ensure that expanding the recursive functions
will terminate after a bounded number of expansions.

This does imply the extra requirement that the base case is detectable at
compile time. In particular, this means that the decision between the base
case and the recursive case must not depend on runtime data.
\subsection{List recursion}
The most common deciding factor in recursion is the length of a list that is
passed in as an argument. Since we represent lists as vectors that encode
the length in the vector type, it seems easy to determine the base case. We
can simply look at the argument type for this. However, it turns out that
this is rather non-trivial to write down in Haskell in the first place. As
an example, we would like to write down something like this:

sum :: Vector n Word -> Word
sum xs = case null xs of
  True -> 0
  False -> head xs + sum (tail xs)
However, the typechecker will now use the following reasoning (the element type
of the vector is left out):

\startitemize
\item tail has the type \hs{(n > 0) => Vector n -> Vector (n - 1)}
\item This means that xs must have the type \hs{(n > 0) => Vector n}
\item This means that sum must have the type \hs{(n > 0) => Vector n -> a}
\item sum is called with the result of tail as an argument, which has the
type \hs{Vector n} (since \hs{(n > 0) => n - 1 == m}).
\item This means that sum must have the type \hs{Vector n -> a}
\item This is a contradiction between the type deduced from the body of sum
(the input vector must be non-empty) and the use of sum (the input vector
could have any length).
\stopitemize
As you can see, using a simple case expression at the value level causes the
typechecker to always typecheck both alternatives, which can't be done! This is a
fundamental problem, that would seem perfectly suited for a type class.
Considering that we need to switch between two implementations of the sum
function based on the type of the argument, this sounds like the perfect
problem to solve with a type class. However, this approach has its own
problems (not the least of which is that you need to define a new type class for
every recursive function you want to define).

Another approach tried involved using GADTs to be able to do pattern
matching on empty / non-empty lists. While this worked partially, it also
created problems with more complex expressions.
TODO: How much detail should there be here? I can probably refer to
Evaluating all possible (and non-possible) ways to add recursion to our
descriptions, it seems better to leave out list recursion altogether. This
allows us to focus on other interesting areas instead. By including
(built-in) support for a number of higher order functions like map and fold,
we can still express most of the things we would use list recursion for.
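For instance, the \hs{sum} function from above needs no explicit recursion once a fold is available (sketched on plain Haskell lists with hypothetical names; in the actual language these would be the built-in vector operations):

```haskell
-- Summing all elements with a fold instead of explicit recursion.
sumWords :: [Int] -> Int
sumWords = foldl (+) 0

-- Pointwise scaling with map instead of explicit recursion.
scaleAll :: Int -> [Int] -> [Int]
scaleAll k = map (k *)
```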
\subsection{General recursion}
Of course there are other forms of recursion that do not depend on the
length (and thus type) of a list. For example, simple recursion using a
counter could be expressed, but only translated to hardware for a fixed
number of iterations. Also, this would require extensive support for compile
time simplification (constant propagation) and compile time evaluation
(evaluation of constant comparisons) to ensure termination. Even then, it
is hard to really guarantee termination, since the user (or GHC desugarer)
might use some obscure notation that results in a corner case of the
simplifier that is not caught, and thus in non-termination.
Due to these complications, we leave other forms of recursion as
future work.