1 \chapter[chap:description]{Hardware description}
2 This chapter will provide an overview of the hardware description language
3 that was created and the issues that have arisen in the process. It will
4 focus on the issues of the language, not the implementation.
When translating Haskell to hardware, we need to make choices about what
kind of hardware to generate for which Haskell constructs. When faced with
8 choices, we've tried to stick with the most obvious choice wherever
possible. In many cases, when looking at a hardware description it is
completely clear what hardware is described. We want our translator to
11 generate exactly that hardware whenever possible, to minimize the amount of
12 surprise for people working with it.
In this chapter we try to describe how we interpret a Haskell program from a
hardware perspective. We provide a description of each Haskell language
element that needs translation, to provide a clear picture of what hardware
is generated for each construct.
19 \section{Function application}
The basic syntactic elements of a functional program are functions and
function application. These have a single obvious \small{VHDL} translation: Each
22 function becomes a hardware component, where each argument is an input port
23 and the result value is the output port.
Each function application in turn becomes a component instantiation. Here, the
26 result of each argument expression is assigned to a signal, which is mapped
27 to the corresponding input port. The output port of the function is also
28 mapped to a signal, which is used as the result of the application.
30 \in{Example}[ex:And3] shows a simple program using only function
31 application and the corresponding architecture.
34 -- | A simple function that returns
35 -- the and of three bits
36 and3 :: Bit -> Bit -> Bit -> Bit
37 and3 a b c = and (and a b) c
40 \startuseMPgraphic{And3}
41 save a, b, c, anda, andb, out;
44 newCircle.a(btex $a$ etex) "framed(false)";
45 newCircle.b(btex $b$ etex) "framed(false)";
46 newCircle.c(btex $c$ etex) "framed(false)";
47 newCircle.out(btex $out$ etex) "framed(false)";
50 newCircle.anda(btex $and$ etex);
51 newCircle.andb(btex $and$ etex);
54 b.c = a.c + (0cm, 1cm);
55 c.c = b.c + (0cm, 1cm);
56 anda.c = midpoint(a.c, b.c) + (2cm, 0cm);
57 andb.c = midpoint(b.c, c.c) + (4cm, 0cm);
59 out.c = andb.c + (2cm, 0cm);
61 % Draw objects and lines
62 drawObj(a, b, c, anda, andb, out);
64 ncarc(a)(anda) "arcangle(-10)";
\placeexample[here][ex:And3]{Simple three-port and gate.}
72 \startcombination[2*1]
73 {\typebufferhs{And3}}{Haskell description using function applications.}
74 {\boxedgraphic{And3}}{The architecture described by the Haskell description.}
77 TODO: Define top level function and subfunctions/circuits.
79 \subsection{Partial application}
It should be obvious that we cannot generate hardware signals for all
expressions we can express in Haskell. The most obvious criterion for this
is the type of an expression. We will see more of this below, but for now it
suffices to say that any expression of a function type cannot be
represented as a signal or I/O port of a component.
86 From this, we can see that the above translation rules do not apply to a
87 partial application. \in{Example}[ex:Quadruple] shows an example use of
88 partial application and the corresponding architecture.
90 \startbuffer[Quadruple]
91 -- | Multiply the input word by four.
92 quadruple :: Word -> Word
quadruple n = mul (mul n)
  where
    mul = (*) 2
98 \startuseMPgraphic{Quadruple}
99 save in, two, mula, mulb, out;
102 newCircle.in(btex $n$ etex) "framed(false)";
103 newCircle.two(btex $2$ etex) "framed(false)";
104 newCircle.out(btex $out$ etex) "framed(false)";
107 newCircle.mula(btex $\times$ etex);
108 newCircle.mulb(btex $\times$ etex);
111 in.c = two.c + (0cm, 1cm);
112 mula.c = in.c + (2cm, 0cm);
113 mulb.c = mula.c + (2cm, 0cm);
114 out.c = mulb.c + (2cm, 0cm);
116 % Draw objects and lines
117 drawObj(in, two, mula, mulb, out);
119 nccurve(two)(mula) "angleA(0)", "angleB(45)";
120 nccurve(two)(mulb) "angleA(0)", "angleB(45)";
\placeexample[here][ex:Quadruple]{Multiplying the input word by four.}
127 \startcombination[2*1]
128 {\typebufferhs{Quadruple}}{Haskell description using function applications.}
129 {\boxedgraphic{Quadruple}}{The architecture described by the Haskell description.}
Here, the definition of \hs{mul} is a partial function application: It applies
133 \hs{2 :: Word} to the function \hs{(*) :: Word -> Word -> Word} resulting in
134 the expression \hs{(*) 2 :: Word -> Word}. Since this resulting expression
135 is again a function, we can't generate hardware for it directly. This is
136 because the hardware to generate for \hs{mul} depends completely on where
137 and how it is used. In this example, it is even used twice!
139 However, it is clear that the above hardware description actually describes
valid hardware. In general, we can see that any partially applied function
141 must eventually become completely applied, at which point we can generate
142 hardware for it using the rules for function application above. It might
143 mean that a partial application is passed around quite a bit (even beyond
function boundaries), but eventually, the partial application will become
completely applied.
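To make this concrete, here is a small executable sketch (plain Haskell,
with \hs{Int} standing in for \hs{Word}) of how the partial application
from the Quadruple example reduces to complete applications once it is
pushed to its use sites:

```haskell
-- Sketch only: 'Int' stands in for the 'Word' type from the text.
quadruple :: Int -> Int
quadruple n = mul (mul n)
  where
    mul = (*) 2  -- partial application: no hardware by itself

-- After inlining 'mul', every application of (*) is complete, so each
-- multiplication can become its own component instantiation:
quadruple' :: Int -> Int
quadruple' n = (*) 2 ((*) 2 n)
```

Both versions compute the same result; only the second makes the two
multiplier instantiations directly visible.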
147 \section{Costless specialization}
148 Each (complete) function application in our description generates a
149 component instantiation, or a specific piece of hardware in the final
150 design. It is interesting to note that each application of a function
151 generates a \emph{separate} piece of hardware. In the final design, none
152 of the hardware is shared between applications, even when the applied
153 function is the same.
155 This is distinctly different from normal program compilation: Two separate
156 calls to the same function share the same machine code. Having more
157 machine code has implications for speed (due to less efficient caching)
and memory usage. For normal compilation, it is therefore important to
keep the number of functions limited and to maximize code sharing.
161 When generating hardware, this is hardly an issue. Having more \quote{code
sharing} does reduce the amount of \small{VHDL} output (since different
163 component instantiations still share the same component), but after
164 synthesis, the amount of hardware generated is not affected.
In particular, if we were to duplicate all functions so that there is a
duplicate for every application in the program (that is, each function is
applied exactly once), there would be no increase in hardware size at all.
171 TODO: Perhaps these next two sections are a bit too
172 implementation-oriented?
174 \subsection{Specialization}
175 Because of this, a common optimization technique called
176 \emph{specialization} is as good as free for hardware generation. Given
some function that has a \emph{domain} of $D$ (that is, the set of all
possible arguments that could be applied), we create a specialized
function with exactly the same behaviour, but with a domain of $D'
\subset D$. This subset can be derived in all sorts of ways, but commonly
this is done by limiting a polymorphic argument to a single type (that is,
removing polymorphism) or by limiting an argument to just a single value
(that is, cross-function constant propagation, effectively removing the
argument).
186 Since we limit the argument domain of the specialized function, its
187 definition can often be optimized further (since now more types or even
values of arguments are already known). By replacing any application of
189 the function that falls within the reduced domain by an application of
190 the specialized version, the code gets faster (but the code also gets
191 bigger, since we now have two versions instead of one!). If we apply
192 this technique often enough, we can often replace all applications of a
193 function by specialized versions, allowing the original function to be
194 removed (in some cases, this can even give a net reduction of the code
195 compared to the non-specialized version).
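As a sketch (with hypothetical names, and \hs{Int} standing in for
\hs{Word}), both kinds of specialization look like this in plain Haskell:

```haskell
-- Limiting a polymorphic argument to a single type:
first :: (a, b) -> a
first (x, _) = x

firstIntBool :: (Int, Bool) -> Int  -- specialized: smaller domain D'
firstIntBool (x, _) = x

-- Limiting an argument to a single value (constant propagation):
plus :: Int -> Int -> Int
plus a b = a + b

plusOne :: Int -> Int               -- specialized for a == 1
plusOne b = 1 + b
```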
197 Specialization is useful for our hardware descriptions for functions
198 that contain arguments that cannot be translated to hardware directly
199 (polymorphic or higher order arguments, for example). If we can create
200 specialized functions that remove the argument, or make it translatable,
we can use specialization to make the original, untranslatable, function
obsolete.
204 \section{Higher order values}
What holds for partial application can easily be generalized to any
higher order expression. This includes partial applications, plain
variables (\eg, a binder referring to a top level function), lambda
expressions and more complex expressions with a function type (a case
expression returning lambdas, for example).
None of these values can be directly represented in hardware (just like
partial applications). To make them representable, they need to be
213 applied: function variables and partial applications will then eventually
214 become complete applications, applied lambda expressions disappear by
215 applying β-reduction, etc.
217 So any higher order value will be \quote{pushed down} towards its
218 application just like partial applications. Whenever a function boundary
219 needs to be crossed, the called function can be specialized.
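A sketch (with hypothetical functions, using \hs{Int} for \hs{Word}) of
specializing away a higher order argument at a function boundary:

```haskell
-- 'applyTwice' has a higher order argument, which cannot become a port.
applyTwice :: (Int -> Int) -> Int -> Int
applyTwice f x = f (f x)

-- A caller that passes a known function:
caller :: Int -> Int
caller x = applyTwice (+ 1) x

-- Specializing 'applyTwice' for the argument (+ 1) removes the higher
-- order argument, so hardware can be generated for it:
applyTwiceInc :: Int -> Int
applyTwiceInc x = (x + 1) + 1
```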
TODO: This section should be improved
223 \section{Polymorphism}
224 In Haskell, values can be polymorphic: They can have multiple types. For
225 example, the function \hs{fst :: (a, b) -> a} is an example of a
226 polymorphic function: It works for tuples with any element types. Haskell
227 typeclasses allow a function to work on a specific set of types, but the
228 general idea is the same.
230 % A type class is a collection of types for which some operations are
231 % defined. It is thus possible for a value to be polymorphic while having
232 % any number of \emph{class constraints}: The value is not defined for
233 % every type, but only for types in the type class. An example of this is
234 % the \hs{even :: (Integral a) => a -> Bool} function, which can map any
235 % value of a type that is member of the \hs{Integral} type class
When generating hardware, polymorphism can't be easily translated. How
many wires should be laid down for a value that could have any type? When
type classes are involved, what hardware components should be laid down for
a class method (whose behaviour depends on the type of its arguments)?
242 Fortunately, we can again use the principle of specialization: Since every
243 function application generates separate pieces of hardware, we can know
244 the types of all arguments exactly. Provided that we don't use existential
245 typing, all of the polymorphic types in a function must depend on the
types of the arguments (in other words, the only way to introduce a type
variable is in a lambda abstraction). Our top level function must not have
248 a polymorphic type (otherwise we wouldn't know the hardware interface to
249 our top level function).
251 If a function is monomorphic, all values inside it are monomorphic as
252 well, so any function that is applied within the function can only be
253 applied to monomorphic values. The applied functions can then be
254 specialized to work just for these specific types, removing the
255 polymorphism from the applied functions as well.
257 By induction, this means that all functions that are (indirectly) called
258 by our top level function (meaning all functions that are translated in
259 the final hardware) become monomorphic.
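As a small sketch (with hypothetical names), this is how a polymorphic
function becomes monomorphic through its use in a monomorphic top level
function:

```haskell
-- A polymorphic multiplexer: works for any element type a.
choose :: Bool -> a -> a -> a
choose c t f = if c then t else f

-- The top level function must be monomorphic, which fixes the type at
-- which 'choose' is applied ...
top :: Bool -> Int -> Int -> Int
top c a b = choose c a b

-- ... so 'choose' can be specialized into a monomorphic version:
chooseInt :: Bool -> Int -> Int -> Int
chooseInt c t f = if c then t else f
```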
\section{State}
A very important concept in hardware designs is \emph{state}. In a
stateless (or \emph{combinatoric}) design, every output depends directly
and solely on the inputs. In a stateful design, the outputs can depend on
the history of
265 inputs, or the \emph{state}. State is usually stored in \emph{registers},
which retain their value during a clock cycle, and are typically updated at
the start of every clock cycle. Since the updating of the state is tightly
268 coupled (synchronized) to the clock signal, these state updates are often
269 called \emph{synchronous}.
To make our hardware description language useful for describing more than
simple combinatoric designs, we'll need to be able to describe state in
some way.
275 \subsection{Approaches to state}
In Haskell, functions are always pure (except when using unsafe
functions like \hs{unsafePerformIO}, which should be avoided whenever
possible). This means that the output of a function solely depends on
279 its inputs. If you evaluate a given function with given inputs, it will
280 always provide the same output.
This is a perfect match for a combinatoric circuit, where the outputs
also depend solely on the inputs. However, when state is involved, this
286 no longer holds. Since we're in charge of our own language, we could
287 remove this purity constraint and allow a function to return different
288 values depending on the cycle in which it is evaluated (or rather, the
289 current state). However, this means that all kinds of interesting
290 properties of our functional language get lost, and all kinds of
transformations and optimizations might no longer be meaning-preserving.
293 Provided that we want to keep the function pure, the current state has
294 to be present in the function's arguments in some way. There seem to be
295 two obvious ways to do this: Adding the current state as an argument, or
296 including the full history of each argument.
298 \subsubsection{Stream arguments and results}
Including the entire history of each input (that is, the value of that
input for each previous clock cycle) is an obvious way to make outputs
depend on all previous inputs. This is easily done by making every
302 input a list instead of a single value, containing all previous values
303 as well as the current value.
An obvious downside of this solution is that on each cycle, all the
previous cycles must be resimulated to obtain the current state. To do
this, a recursive helper function might be needed as well, which might be
hard for the compiler to analyze properly.
310 A slight variation on this approach is one taken by some of the other
311 functional \small{HDL}s in the field (TODO: References to Lava,
312 ForSyDe, ...): Make functions operate on complete streams. This means
313 that a function is no longer called on every cycle, but just once. It
takes streams as inputs instead of values, where each stream contains
all the values for every clock cycle since system start. This is easily
modeled using an (infinite) list, with one element for each clock
cycle. Since the function is only evaluated once, its output is also a
318 stream. Note that, since we are working with infinite lists and still
319 want to be able to simulate the system cycle-by-cycle, this relies
320 heavily on the lazy semantics of Haskell.
322 Since our inputs and outputs are streams, all other (intermediate)
323 values must be streams. All of our primitive operators (\eg, addition,
subtraction, bitwise operations, etc.) must operate on streams as
well (note that changing a single-element operation to a stream
operation can be done with \hs{map}, \hs{zipWith}, etc.).
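For instance, modelling streams as lists (a sketch, not the actual Cλash
primitives), single-element operations lift to streams like this:

```haskell
type Stream a = [a]  -- one element per clock cycle

-- A single-element operation lifted with 'map':
streamNot :: Stream Bool -> Stream Bool
streamNot = map not

-- A two-input operation lifted with 'zipWith':
streamAdd :: Stream Int -> Stream Int -> Stream Int
streamAdd = zipWith (+)
```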
328 Note that the concept of \emph{state} is no more than having some way
329 to communicate a value from one cycle to the next. By introducing a
330 \hs{delay} function, we can do exactly that: Delay (each value in) a
stream so that we can \quote{look into} the past. This \hs{delay} function
332 simply outputs a stream where each value is the same as the input
333 value, but shifted one cycle. This causes a \quote{gap} at the
334 beginning of the stream: What is the value of the delay output in the
335 first cycle? For this, the \hs{delay} function has a second input
336 (which is a value, not a stream!).
\in{Example}[ex:DelayAcc] shows a simple accumulator expressed in this
style.
341 \startbuffer[DelayAcc]
342 acc :: Stream Word -> Stream Word
acc in = out
  where
    out = (delay out 0) + in
348 \startuseMPgraphic{DelayAcc}
349 save in, out, add, reg;
352 newCircle.in(btex $in$ etex) "framed(false)";
353 newCircle.out(btex $out$ etex) "framed(false)";
356 newReg.reg("") "dx(4mm)", "dy(6mm)", "reflect(true)";
357 newCircle.add(btex + etex);
360 add.c = in.c + (2cm, 0cm);
361 out.c = add.c + (2cm, 0cm);
362 reg.c = add.c + (0cm, 2cm);
364 % Draw objects and lines
365 drawObj(in, out, add, reg);
367 nccurve(add)(reg) "angleA(0)", "angleB(180)", "posB(d)";
368 nccurve(reg)(add) "angleA(180)", "angleB(-45)", "posA(out)";
374 \placeexample[here][ex:DelayAcc]{Simple accumulator architecture.}
375 \startcombination[2*1]
376 {\typebufferhs{DelayAcc}}{Haskell description using streams.}
377 {\boxedgraphic{DelayAcc}}{The architecture described by the Haskell description.}
This notation can be confusing (especially due to the feedback loop in
the definition of \hs{out}), but is essentially easy to interpret. There
is a single call to \hs{delay}, resulting in a circuit with a single
register, whose input is connected to \hs{out} (which is the output of
the adder), and whose output is the \hs{delay out 0} expression (which
is connected to one of the adder inputs).
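This behaviour is easy to simulate in plain Haskell, modelling streams as
lazy lists (a sketch: \hs{in} is renamed \hs{inp} because \hs{in} is a
reserved word in Haskell, \hs{Int} stands in for \hs{Word}, and
\hs{zipWith (+)} stands in for stream addition):

```haskell
type Stream a = [a]

-- 'delay' shifts its input stream one cycle; the second argument (a
-- value, not a stream) fills the gap in the first cycle.
delay :: Stream a -> a -> Stream a
delay xs ini = ini : xs

-- The accumulator: laziness makes the feedback loop well-defined,
-- since each output value only depends on earlier output values.
acc :: Stream Int -> Stream Int
acc inp = out
  where
    out = zipWith (+) (delay out 0) inp
```

For example, \hs{take 4 (acc [1, 2, 3, 4])} evaluates to
\hs{[1, 3, 6, 10]}.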
388 This notation has a number of downsides, amongst which are limited
389 readability and ambiguity in the interpretation. TODO: Reference
392 \subsubsection{Explicit state arguments and results}
A more explicit way to model state is to simply add an extra argument
containing the current state value. This allows an output to depend on
395 both the inputs as well as the current state while keeping the
396 function pure (letting the result depend only on the arguments), since
397 the current state is now an argument.
In Haskell, this would look like \in{Example}[ex:ExplicitAcc].
401 \startbuffer[ExplicitAcc]
402 -- input -> current state -> (new state, output)
acc :: Word -> State Word -> (State Word, Word)
acc in (State s) = (State s', out)
  where
    s'  = s + in
    out = s'
410 \placeexample[here][ex:ExplicitAcc]{Simple accumulator architecture.}
411 \startcombination[2*1]
412 {\typebufferhs{ExplicitAcc}}{Haskell description using explicit state arguments.}
413 % Picture is identical to the one we had just now.
414 {\boxedgraphic{DelayAcc}}{The architecture described by the Haskell description.}
This approach makes a function's state very explicit: which state
variables are used by a function can be completely determined from its
type signature (as opposed to the stream approach, where a function
looks the same from the outside regardless of which state variables it
uses, or whether it is stateful at all).
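A sketch of this style in plain Haskell (\hs{Int} standing in for
\hs{Word}, without the \hs{State} wrapper discussed later, and with a
hypothetical \hs{simulate} helper that threads the state from cycle to
cycle):

```haskell
-- input -> current state -> (new state, output)
acc :: Int -> Int -> (Int, Int)
acc inp s = (s', out)
  where
    s'  = s + inp
    out = s'

-- Thread the state through a list of inputs, one element per cycle.
simulate :: (i -> s -> (s, o)) -> s -> [i] -> [o]
simulate _ _ []       = []
simulate f s (i : is) = let (s', o) = f i s in o : simulate f s' is
```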
This approach is the one chosen for Cλash and will be examined more
closely below.
426 \subsection{Explicit state specification}
We've seen the concept of explicit state in a simple example above, but
what are the implications of this approach?
430 \subsubsection{Substates}
431 Since a function's state is reflected directly in its type signature,
if a function calls other stateful functions (that is, has stateful
subcircuits) it
433 has to somehow know the current state for these called functions. The
434 only way to do this, is to put these \emph{substates} inside the
435 caller's state. This means that a function's state is the sum of the
436 states of all functions it calls, and its own state.
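A sketch with hypothetical functions (\hs{Int} for \hs{Word}): a
caller's state is the product of its callees' substates, plus any state
of its own:

```haskell
-- A stateful subfunction: input -> state -> (new state, output).
accum :: Int -> Int -> (Int, Int)
accum i s = (s + i, s + i)

-- 'twoAccs' calls 'accum' twice, so its own state must contain the two
-- substates; its type signature reveals this implementation detail.
twoAccs :: Int -> (Int, Int) -> ((Int, Int), Int)
twoAccs i (sa, sb) = ((sa', sb'), outA + outB)
  where
    (sa', outA) = accum i sa
    (sb', outB) = accum (2 * i) sb
```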
This also means that the type of a function (at least the \quote{state}
part) is dependent on its implementation and the functions it calls.
440 This is the major downside of this approach: The separation between
441 interface and implementation is limited. However, since Cλash is not
442 very suitable for separate compilation (see
443 \in{section}[sec:prototype:separate]) this is not a big problem in
444 practice. Additionally, when using a type synonym for the state type
445 of each function, we can still provide explicit type signatures
while keeping the state specification for a function near its
definition.
\subsubsection{Stateful arguments}
We need some way to know which arguments should become input ports and
which argument(s) should become the current state (that is, be bound to
the register outputs). This holds not just for the top
level function, but also for any subfunctions. Or could we perhaps
454 deduce the statefulness of subfunctions by analyzing the flow of data
455 in the calling functions?
457 To explore this matter, we make an interesting observation: We get
458 completely correct behaviour when we put all state registers in the
459 top level entity (or even outside of it). All of the state arguments
and results of subfunctions are then treated as normal input and output
461 ports. Effectively, a stateful function results in a stateless
462 hardware component that has one of its input ports connected to the
463 output of a register and one of its output ports connected to the
464 input of the same register.
Of course, even though the hardware described like this has the
correct behaviour, unless the layout tool performs smart optimizations,
there will be a lot of extra wiring in the design (since registers will
not be close to the components that use them). Also, when working with
the generated \small{VHDL} code, there will be a lot of extra ports
just to pass on state values, which can get quite confusing.
475 To fix this, we can simply \quote{push} the registers down into the
476 subcircuits. When we see a register that is connected directly to a
477 subcircuit, we remove the corresponding input and output port and put
478 the register inside the subcircuit instead. This is slightly less
479 trivial when looking at the Haskell code instead of the resulting
480 circuit, but the idea is still the same.
However, when applying this technique, we might push registers down
too far: when you intend to store the result of a stateless subfunction
in the caller's state and pass the current value of that state variable
to that same function, it is impossible to distinguish this case from
similar code where the called function is in fact stateful. From this we
can conclude that we have to either:

\startitemize
\item accept that the generated hardware might not be exactly what we
intended, in some specific cases. In most cases, the hardware will be
as intended.
\item explicitly annotate state arguments and results in the input
description.
\stopitemize

The first option causes (non-obvious) exceptions in the language
interpretation. Also, determining where registers should end up is
easier to implement correctly with explicit annotations, so for these
reasons we will look at how these annotations could work.
506 TODO: Note about conditions on state variables and checking them.
508 \subsection{Explicit state annotation}
To make our stateful descriptions unambiguous and easier to translate,
510 we need some way for the developer to describe which arguments and
511 results are intended to become stateful.
Roughly, we have two ways to achieve this:

\startitemize
\item Use some kind of annotation method or syntactic construction in
the language to indicate exactly which argument and (part of the)
result is stateful. This means that the annotation lives
\quote{outside} of the function; it is completely invisible when
looking at the function body.
\item Use some kind of annotation on the type level, \eg, give stateful
arguments and (part of) results a different type. This has the
potential to make this annotation visible inside the function as well,
such that when looking at a value inside the function body you can
tell if it is stateful by looking at its type. This could possibly make
the translation process a lot easier, since less analysis of the
program flow might be required.
\stopitemize
529 From these approaches, the type level \quote{annotations} have been
530 implemented in Cλash. \in{Section}[sec:prototype:statetype] expands on
531 the possible ways this could have been implemented.
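One possible form of such a type level annotation (a sketch with
\hs{Int} for \hs{Word}; see the prototype chapter for how Cλash actually
implements it) is a simple wrapper type:

```haskell
-- Stateful arguments and results are marked by wrapping them in
-- 'State'; their statefulness is then visible in any type signature.
newtype State s = State s

-- input -> current state -> (new state, output)
acc :: Int -> State Int -> (State Int, Int)
acc inp (State s) = (State s', out)
  where
    s'  = s + inp
    out = s'
```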
533 TODO: Say something about dependent types and fixed size vectors
535 \section[sec:recursion]{Recursion}
An important concept in functional languages is recursion. In its most basic
form, recursion is a function that is defined in terms of itself. This
usually requires multiple evaluations of this function, with changing
arguments, until eventually an evaluation of the function no longer requires
a recursive call.
Recursion in a hardware description is a bit of a funny thing. Usually,
recursion is associated with a lot of nondeterminism and stack overflows,
but also with flexibility and expressive power.
546 Given the notion that each function application will translate to a
547 component instantiation, we are presented with a problem. A recursive
548 function would translate to a component that contains itself. Or, more
549 precisely, that contains an instance of itself. This instance would again
550 contain an instance of itself, and again, into infinity. This is obviously a
551 problem for generating hardware.
553 This is expected for functions that describe infinite recursion. In that
554 case, we can't generate hardware that shows correct behaviour in a single
cycle (at best, we could generate hardware that needs an infinite number of
cycles to compute its result).
558 However, most recursive hardware descriptions will describe finite
559 recursion. This is because the recursive call is done conditionally. There
is usually a case expression where at least one alternative does not contain
the recursive call, which we call the \quote{base case}. If, for each call to the
562 recursive function, we would be able to detect which alternative applies,
563 we would be able to remove the case expression and leave only the base case
564 when it applies. This will ensure that expanding the recursive functions
565 will terminate after a bounded number of expansions.
567 This does imply the extra requirement that the base case is detectable at
568 compile time. In particular, this means that the decision between the base
569 case and the recursive case must not depend on runtime data.
571 \subsection{List recursion}
572 The most common deciding factor in recursion is the length of a list that is
573 passed in as an argument. Since we represent lists as vectors that encode
574 the length in the vector type, it seems easy to determine the base case. We
575 can simply look at the argument type for this. However, it turns out that
576 this is rather non-trivial to write down in Haskell in the first place. As
577 an example, we would like to write down something like this:
580 sum :: Vector n Word -> Word
581 sum xs = case null xs of
  True  -> 0
  False -> head xs + sum (tail xs)
However, the typechecker will now use the following reasoning (the element
type of the vector is left out):
\startitemize
\item \hs{tail} has the type \hs{(n > 0) => Vector n -> Vector (n - 1)}
\item This means that \hs{xs} must have the type \hs{(n > 0) => Vector n}
\item This means that \hs{sum} must have the type
\hs{(n > 0) => Vector n -> a}
\item \hs{sum} is called with the result of \hs{tail} as an argument, which
has the type \hs{Vector m} (with \hs{(n > 0) => m == n - 1}).
\item This means that \hs{sum} must also have the type \hs{Vector m -> a},
without any constraint on \hs{m}.
\item This is a contradiction between the type deduced from the body of
\hs{sum} (the input vector must be non-empty) and the use of \hs{sum} (the
input vector could have any length).
\stopitemize
As you can see, using a simple case expression at the value level causes
the type checker to always typecheck both alternatives, which can't be
done. This is a fundamental problem that seems perfectly suited for a
type class: we need to switch between two implementations of the sum
function based on the type of the argument. However, this approach has its
own problems (not the least of which is that you need to define a new type
class for every recursive function you want to define).
Another approach we tried involved using \small{GADT}s to be able to do
pattern matching on empty / non-empty lists. While this worked partially,
it also
612 created problems with more complex expressions.
614 TODO: How much detail should there be here? I can probably refer to
Having evaluated all possible (and impossible) ways to add recursion to our
descriptions, it seems better to leave out list recursion altogether. This
allows us to focus on other interesting areas instead. By including
(builtin) support for a number of higher order functions like \hs{map} and
\hs{fold}, we can still express most of the things we would use list
recursion for.
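For instance (a sketch using plain lists instead of fixed-length
vectors), the \hs{sum} from above needs no recursion in the description
itself when expressed with a fold, and similar patterns cover many other
uses of list recursion:

```haskell
-- Summing without explicit recursion, using a fold:
sumV :: [Int] -> Int
sumV = foldl (+) 0

-- Element-wise operations with 'map' and 'zipWith':
scale :: Int -> [Int] -> [Int]
scale k = map (k *)

dotProduct :: [Int] -> [Int] -> Int
dotProduct xs ys = sumV (zipWith (*) xs ys)
```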
623 \subsection{General recursion}
Of course there are other forms of recursion that do not depend on the
length (and thus type) of a list. For example, simple recursion using a
counter could be expressed, but only translated to hardware for a fixed
number of iterations. Also, this would require extensive support for
compile time simplification (constant propagation) and compile time
evaluation (evaluation of constant comparisons), to ensure termination.
Even then, it is hard to really guarantee termination, since the user (or
\small{GHC} desugarer) might use some obscure notation that results in a
corner case of the simplifier that is not caught, and thus in
non-termination.
Due to these complications, we leave other forms of recursion as
future work.
637 \section{Supported types}