Chapters/HardwareDescription.tex

   1 \chapter[chap:description]{Hardware description}
   2   This chapter will provide an overview of the hardware description language
   3   that was created and the issues that have arisen in the process. It will
   4   focus on the issues of the language, not the implementation.
   5
   6   When translating Haskell to hardware, we need to make choices in what kind
   7   of hardware to generate for what Haskell constructs. When faced with
   8   choices, we've tried to stick with the most obvious choice wherever
   9   possible. In a lot of cases, when you look at a hardware description it is
  10   comletely clear what hardware is described. We want our translator to
  11   generate exactly that hardware whenever possible, to minimize the amount of
  12   surprise for people working with it.
  13
  14   In this chapter we try to describe how we interpret a Haskell program from a
  15   hardware perspective. We provide a description of each Haskell language
  16   element that needs translation, to provide a clear picture of what is
  17   supported and how.
  18
  19   \section{Function application}
  20   The basic syntactic element of a functional program are functions and
  21   function application. These have a single obvious \small{VHDL} translation: Each
  22   function becomes a hardware component, where each argument is an input port
  23   and the result value is the output port.
  24
  25   Each function application in turn becomes component instantiation. Here, the
  26   result of each argument expression is assigned to a signal, which is mapped
  27   to the corresponding input port. The output port of the function is also
  28   mapped to a signal, which is used as the result of the application.
  29
  30   \in{Example}[ex:And3] shows a simple program using only function
  31   application and the corresponding architecture.
  32
  33 \startbuffer[And3]
  34 -- | A simple function that returns
  35 --   the and of three bits
  36 and3 :: Bit -> Bit -> Bit -> Bit
  37 and3 a b c = and (and a b) c
  38 \stopbuffer
  39
  40   \startuseMPgraphic{And3}
  41     save a, b, c, anda, andb, out;
  42
  43     % I/O ports
  44     newCircle.a(btex $a$ etex) "framed(false)";
  45     newCircle.b(btex $b$ etex) "framed(false)";
  46     newCircle.c(btex $c$ etex) "framed(false)";
  47     newCircle.out(btex $out$ etex) "framed(false)";
  48
  49     % Components
  50     newCircle.anda(btex $and$ etex);
  51     newCircle.andb(btex $and$ etex);
  52
  53     a.c    = origin;
  54     b.c    = a.c + (0cm, 1cm);
  55     c.c    = b.c + (0cm, 1cm);
  56     anda.c = midpoint(a.c, b.c) + (2cm, 0cm);
  57     andb.c = midpoint(b.c, c.c) + (4cm, 0cm);
  58
  59     out.c   = andb.c + (2cm, 0cm);
  60
  61     % Draw objects and lines
  62     drawObj(a, b, c, anda, andb, out);
  63
  64     ncarc(a)(anda) "arcangle(-10)";
  65     ncarc(b)(anda);
  66     ncarc(anda)(andb);
  67     ncarc(c)(andb);
  68     ncline(andb)(out);
  69   \stopuseMPgraphic
  70
  71   \placeexample[here][ex:And3]{Simple three port and.}
  72     \startcombination[2*1]
  73       {\typebufferhs{And3}}{Haskell description using function applications.}
  74       {\boxedgraphic{And3}}{The architecture described by the Haskell description.}
  75     \stopcombination
  76
  77   TODO: Define top level function and subfunctions/circuits.
  78
  79   \subsection{Partial application}
  80   It should be obvious that we cannot generate hardware signals for all
  81   expressions we can express in Haskell. The most obvious criterium for this
  82   is the type of an expression. We will see more of this below, but for now it
  83   should be obvious that any expression of a function type cannot be
  84   represented as a signal or i/o port to a component.
  85
  86   From this, we can see that the above translation rules do not apply to a
  87   partial application. \in{Example}[ex:Quadruple] shows an example use of
  88   partial application and the corresponding architecture.
  89
  90 \startbuffer[Quadruple]
  91 -- | Multiply the input word by four.
  92 quadruple :: Word -> Word
  93 quadruple n = mul (mul n)
  94   where
  95     mul = (*) 2
  96 \stopbuffer
  97
  98   \startuseMPgraphic{Quadruple}
  99     save in, two, mula, mulb, out;
 100
 101     % I/O ports
 102     newCircle.in(btex $n$ etex) "framed(false)";
 103     newCircle.two(btex $2$ etex) "framed(false)";
 104     newCircle.out(btex $out$ etex) "framed(false)";
 105
 106     % Components
 107     newCircle.mula(btex $\times$ etex);
 108     newCircle.mulb(btex $\times$ etex);
 109
 110     two.c    = origin;
 111     in.c     = two.c + (0cm, 1cm);
 112     mula.c  = in.c + (2cm, 0cm);
 113     mulb.c  = mula.c + (2cm, 0cm);
 114     out.c   = mulb.c + (2cm, 0cm);
 115
 116     % Draw objects and lines
 117     drawObj(in, two, mula, mulb, out);
 118
 119     nccurve(two)(mula) "angleA(0)", "angleB(45)";
 120     nccurve(two)(mulb) "angleA(0)", "angleB(45)";
 121     ncline(in)(mula);
 122     ncline(mula)(mulb);
 123     ncline(mulb)(out);
 124   \stopuseMPgraphic
 125
 126   \placeexample[here][ex:Quadruple]{Simple three port and.}
 127     \startcombination[2*1]
 128       {\typebufferhs{Quadruple}}{Haskell description using function applications.}
 129       {\boxedgraphic{Quadruple}}{The architecture described by the Haskell description.}
 130     \stopcombination
 131
 132   Here, the definition of mul is a partial function application: It applies
 133   \hs{2 :: Word} to the function \hs{(*) :: Word -> Word -> Word} resulting in
 134   the expression \hs{(*) 2 :: Word -> Word}. Since this resulting expression
 135   is again a function, we can't generate hardware for it directly. This is
 136   because the hardware to generate for \hs{mul} depends completely on where
 137   and how it is used. In this example, it is even used twice!
 138
 139   However, it is clear that the above hardware description actually describes
 140   valid hardware. In general, we can see that any partial applied function
 141   must eventually become completely applied, at which point we can generate
 142   hardware for it using the rules for function application above. It might
 143   mean that a partial application is passed around quite a bit (even beyond
 144   function boundaries), but eventually, the partial application will become
 145   completely applied.
 146
 147   \section{State}
 148     A very important concept in hardware designs is \emph{state}. In a
 149     stateless (or, \emph{combinatoric}) design, every output is a directly and solely dependent on the
 150     inputs. In a stateful design, the outputs can depend on the history of
 151     inputs, or the \emph{state}. State is usually stored in \emph{registers},
 152     which retain their value during a clockcycle, and are typically updated at
 153     the start of every clockcycle. Since the updating of the state is tightly
 154     coupled (synchronized) to the clock signal, these state updates are often
 155     called \emph{synchronous}.
 156
 157     To make our hardware description language useful to describe more that
 158     simple combinatoric designs, we'll need to be able to describe state in
 159     some way.
 160
 161     \subsection{Approaches to state}
 162       In Haskell, functions are always pure (except when using unsafe
 163       functions like \hs{unsafePerformIO}, which should be prevented whenever
 164       possible). This means that the output of a function solely depends on
 165       its inputs. If you evaluate a given function with given inputs, it will
 166       always provide the same output.
 167
 168       TODO: Define pure
 169
 170       This is a perfect match for a combinatoric circuit, where the output
 171       also soley depend on the inputs. However, when state is involved, this
 172       no longer holds. Since we're in charge of our own language, we could
 173       remove this purity constraint and allow a function to return different
 174       values depending on the cycle in which it is evaluated (or rather, the
 175       current state). However, this means that all kinds of interesting
 176       properties of our functional language get lost, and all kinds of
 177       transformations and optimizations might no longer be meaning preserving.
 178
 179       Provided that we want to keep the function pure, the current state has
 180       to be present in the function's arguments in some way. There seem to be
 181       two obvious ways to do this: Adding the current state as an argument, or
 182       including the full history of each argument.
 183
 184       \subsubsection{Stream arguments and results}
 185         Including the entire history of each input (\eg, the value of that
 186         input for each previous clockcycle) is an obvious way to make outputs
 187         depend on all previous input. This is easily done by making every
 188         input a list instead of a single value, containing all previous values
 189         as well as the current value.
 190
 191         An obvious downside of this solution is that on each cycle, all the
 192         previous cycles must be resimulated to obtain the current state. To do
 193         this, it might be needed to have a recursive helper function as well,
 194         wich might be hard to properly analyze by the compiler.
 195
 196         A slight variation on this approach is one taken by some of the other
 197         functional \small{HDL}s in the field (TODO: References to Lava,
 198         ForSyDe, ...): Make functions operate on complete streams. This means
 199         that a function is no longer called on every cycle, but just once. It
 200         takes stream as inputs instead of values, where each stream contains
 201         all the values for every clockcycle since system start. This is easily
 202         modeled using an (infinite) list, with one element for each clock
 203         cycle. Since the funciton is only evaluated once, its output is also a
 204         stream. Note that, since we are working with infinite lists and still
 205         want to be able to simulate the system cycle-by-cycle, this relies
 206         heavily on the lazy semantics of Haskell.
 207
 208         Since our inputs and outputs are streams, all other (intermediate)
 209         values must be streams. All of our primitive operators (\eg, addition,
 210         substraction, bitwise operations, etc.) must operate on streams as
 211         well (note that changing a single-element operation to a stream
 212         operation can done with \hs{map}, \hs{zipwith}, etc.).
 213
 214         Note that the concept of \emph{state} is no more than having some way
 215         to communicate a value from one cycle to the next. By introducing a
 216         \hs{delay} function, we can do exactly that: Delay (each value in) a
 217         stream so that we can "look into" the past. This \hs{delay} function
 218         simply outputs a stream where each value is the same as the input
 219         value, but shifted one cycle. This causes a \quote{gap} at the
 220         beginning of the stream: What is the value of the delay output in the
 221         first cycle? For this, the \hs{delay} function has a second input
 222         (which is a value, not a stream!).
 223
 224         \in{Example}[ex:DelayAcc] shows a simple accumulator expressed in this
 225         style.
 226
 227 \startbuffer[DelayAcc]
 228 acc :: Stream Word -> Stream Word
 229 acc in = out
 230   where
 231     out = (delay out 0) + in
 232 \stopbuffer
 233
 234 \startuseMPgraphic{DelayAcc}
 235   save in, out, add, reg;
 236
 237   % I/O ports
 238   newCircle.in(btex $in$ etex) "framed(false)";
 239   newCircle.out(btex $out$ etex) "framed(false)";
 240
 241   % Components
 242   newReg.reg("") "dx(4mm)", "dy(6mm)", "reflect(true)";
 243   newCircle.add(btex + etex);
 244
 245   in.c    = origin;
 246   add.c   = in.c + (2cm, 0cm);
 247   out.c   = add.c + (2cm, 0cm);
 248   reg.c   = add.c + (0cm, 2cm);
 249
 250   % Draw objects and lines
 251   drawObj(in, out, add, reg);
 252
 253   nccurve(add)(reg) "angleA(0)", "angleB(180)", "posB(d)";
 254   nccurve(reg)(add) "angleA(180)", "angleB(-45)", "posA(out)";
 255   ncline(in)(add);
 256   ncline(add)(out);
 257 \stopuseMPgraphic
 258
 259
 260         \placeexample[here][ex:DelayAcc]{Simple accumulator architecture.}
 261           \startcombination[2*1]
 262             {\typebufferhs{DelayAcc}}{Haskell description using streams.}
 263             {\boxedgraphic{DelayAcc}}{The architecture described by the Haskell description.}
 264           \stopcombination
 265
 266
 267         This notation can be confusing (especially due to the loop in the
 268         definition of out), but is essentially easy to interpret. There is a
 269         single call to delay, resulting in a circuit with a single register,
 270         whose input is connected to \hs{outl (which is the output of the
 271         adder)}, and it's output is the \hs{delay out 0} (which is connected
 272         to one of the adder inputs).
 273
 274         This notation has a number of downsides, amongst which are limited
 275         readability and ambiguity in the interpretation. TODO: Reference
 276         Christiaan.
 277
 278       \subsubsection{Explicit state arguments and results}
 279         A more explicit way to model state, is to simply add an extra argument
 280         containing the current state value. This allows an output to depend on
 281         both the inputs as well as the current state while keeping the
 282         function pure (letting the result depend only on the arguments), since
 283         the current state is now an argument.
 284
 285         In Haskell, this would look like \in{example}[ex:ExplicitAcc].
 286
 287 \startbuffer[ExplicitAcc]
 288 -- input -> current state -> (new state, output)
 289 acc :: Word -> Word -> (Word, Word)
 290 acc in (State s) = (State s', out)
 291   where
 292     out = s + in
 293     s'  = out
 294 \stopbuffer
 295
 296         \placeexample[here][ex:ExplicitAcc]{Simple accumulator architecture.}
 297           \startcombination[2*1]
 298             {\typebufferhs{ExplicitAcc}}{Haskell description using explicit state arguments.}
 299             % Picture is identical to the one we had just now.
 300             {\boxedgraphic{DelayAcc}}{The architecture described by the Haskell description.}
 301           \stopcombination
 302
 303         This approach makes a function's state very explicit, which state
 304         variables are used by a function can be completely determined from its
 305         type signature (as opposed to the stream approach, where a function
 306         looks the same from the outside, regardless of what state variables it
 307         uses (or wether it's stateful at all).
 308
 309         This approach is the one chosen for Cλash and will be examined more
 310         closely below.
 311
 312     \subsection{Explicit state specification}
 313       We've seen the concept of explicit state in a simple example below, but
 314       what are the implications of this approach?
 315
 316       \subsubsection{Substates}
 317         Since a function's state is reflected directly in its type signature,
 318         if a function calls other stateful functions (\eg, has subcircuits) it
 319         has to somehow know the current state for these called functions. The
 320         only way to do this, is to put these \emph{substates} inside the
 321         caller's state. This means that a function's state is the sum of the
 322         states of all functions it calls, and its own state.
 323
 324         This also means that the type of a function (at least the "state"
 325         part) is dependent on its implementation and the functions it calls.
 326         This is the major downside of this approach: The separation between
 327         interface and implementation is limited. However, since Cλash is not
 328         very suitable for separate compilation (see
 329         \in{section}[sec:prototype:separate]) this is not a big problem in
 330         practice. Additionally, when using a type synonym for the state type
 331         of each function, we can still provide explicit type signatures
 332         while keeping the state specification for a function near its
 333         definition only.
 334
 335       \subsubsection{...}
 336         We need some way to know which arguments should become input ports and
 337         which argument(s?) should become the current state (\eg, be bound to
 338         the register outputs). This does not hold holds not just for the top
 339         level function, but also for any subfunctions. Or could we perhaps
 340         deduce the statefulness of subfunctions by analyzing the flow of data
 341         in the calling functions?
 342
 343         To explore this matter, we make an interesting observation: We get
 344         completely correct behaviour when we put all state registers in the
 345         top level entity (or even outside of it). All of the state arguments
 346         and results on subfunctions are treated as normal input and output
 347         ports. Effectively, a stateful function results in a stateless
 348         hardware component that has one of its input ports connected to the
 349         output of a register and one of its output ports connected to the
 350         input of the same register.
 351
 352         TODO: Example?
 353
 354         Of course, even though the hardware described like this has the
 355         correct behaviour, unless the layout tool does smart optimizations,
 356         there will be a lot of extra wire in the design (since registers will
 357         not be close to the component that uses them). Also, when working with
 358         the generated \small{VHDL} code, there will be a lot of extra ports
 359         just to pass one state values, which can get quite confusing.
 360
 361         To fix this, we can simply \quote{push} the registers down into the
 362         subcircuits. When we see a register that is connected directly to a
 363         subcircuit, we remove the corresponding input and output port and put
 364         the register inside the subcircuit instead. This is slightly less
 365         trivial when looking at the Haskell code instead of the resulting
 366         circuit, but the idea is still the same.
 367
 368         TODO: Example?
 369
 370         However, when applying this technique, we might push registers down
 371         too far. When you intend to store a result of a stateless subfunction
 372         in the caller's state and pass the current value of that state
 373         variable to that same function, the register might get pushed down too
 374         far. It is impossible to distinguish this case from similar code where
 375         the called function is in fact stateful. From this we can conclude
 376         that we have to either:
 377
 378         \startitemize
 379         \item accept that the generated hardware might not be exactly what we
 380         intended, in some specific cases. In most cases, the hardware will be
 381         what we intended.
 382         \item explicitely annotate state arguments and results in the input
 383         description.
 384         \stopitemize
 385
 386         The first option causes (non-obvious) exceptions in the language
 387         intepretation. Also, automatically determining where registers should
 388         end up is easier to implement correctly with explicit annotations, so
 389         for these reasons we will look at how this annotations could work.
 390
 391
 392       TODO: Note about conditions on state variables and checking them.
 393
 394     \subsection{Explicit state implementation}
 395       Recording state variables at the type level.
 396
 397       Ideal: Type synonyms, since there is no additional code overhead for
 398       packing and unpacking. Downside: there is no explicit conversion in Core
 399       either, so type synonyms tend to get lost in expressions (they can be
 400       preserved in binders, but this makes implementation harder, since that
 401       statefulness of a value must be manually tracked).
 402
 403       Less ideal: Newtype. Requires explicit packing and unpacking of function
 404       arguments. If you don't unpack substates, there is no overhead for
 405       (un)packing substates. This will result in many nested State constructors
 406       in a nested state type. \eg:
 407
 408   \starttyping
 409   State (State Bit, State (State Word, Bit), Word)
 410   \stoptyping
 411
 412       Alternative: Provide different newtypes for input and output state. This
 413       makes the code even more explicit, and typechecking can find even more
 414       errors. However, this requires defining two type synomyms for each
 415       stateful function instead of just one. \eg:
 416   \starttyping
 417   type AccumStateIn = StateIn Bit
 418   type AccumStateOut = StateOut Bit
 419   \stoptyping
 420       This also increases the possibility of having different input and output
 421       states. Checking for identical input and output state types is also
 422       harder, since each element in the state must be unpacked and compared
 423       separately.
 424
 425       Alternative: Provide a type for the entire result type of a stateful
 426       function, not just the state part. \eg:
 427
 428   \starttyping
 429   newtype Result state result = Result (state, result)
 430   \stoptyping
 431
 432       This makes it easy to say "Any stateful function must return a
 433       \type{Result} type, without having to sort out result from state. However,
 434       this either requires a second type for input state (similar to
 435       \type{StateIn} / \type{StateOut} above), or requires the compiler to
 436       select the right argument for input state by looking at types (which works
 437       for complex states, but when that state has the same type as an argument,
 438       things get ambiguous) or by selecting a fixed (\eg, the last) argument,
 439       which might be limiting.
 440
 441       \subsubsection{Example}
 442       As an example of the used approach, a simple averaging circuit, that lets
 443       the accumulation of the inputs be done by a subcomponent.
 444
 445       \starttyping
 446         newtype State s = State s
 447
 448         type AccumState = State Bit
 449         accum :: Word -> AccumState -> (AccumState, Word)
 450         accum i (State s) = (State (s + i), s + i)
 451
 452         type AvgState = (AccumState, Word)
 453         avg :: Word -> AvgState -> (AvgState, Word)
 454         avg i (State s) = (State s', o)
 455           where
 456             (accums, count) = s
 457             -- Pass our input through the accumulator, which outputs a sum
 458             (accums', sum) = accum i accums
 459             -- Increment the count (which will be our new state)
 460             count' = count + 1
 461             -- Compute the average
 462             o = sum / count'
 463             s' = (accums', count')
 464       \stoptyping
 465
 466       And the normalized, core-like versions:
 467
 468       \starttyping
 469         accum i spacked = res
 470           where
 471             s = case spacked of (State s) -> s
 472             s' = s + i
 473             spacked' = State s'
 474             o = s + i
 475             res = (spacked', o)
 476
 477         avg i spacked = res
 478           where
 479             s = case spacked of (State s) -> s
 480             accums = case s of (accums, \_) -> accums
 481             count = case s of (\_, count) -> count
 482             accumres = accum i accums
 483             accums' = case accumres of (accums', \_) -> accums'
 484             sum = case accumres of (\_, sum) -> sum
 485             count' = count + 1
 486             o = sum / count'
 487             s' = (accums', count')
 488             spacked' = State s'
 489             res = (spacked', o)
 490       \stoptyping
 491
 492
 493
 494       As noted above, any component of a function's state that is a substate,
 495       \eg passed on as the state of another function, should have no influence
 496       on the hardware generated for the calling function. Any state-specific
 497       \small{VHDL} for this component can be generated entirely within the called
 498       function. So,we can completely leave out substates from any function.
 499
 500       From this observation, we might think to remove the substates from a
 501       function's states alltogether, and leave only the state components which
 502       are actual states of the current function. While doing this would not
 503       remove any information needed to generate \small{VHDL} from the function, it would
 504       cause the function definition to become invalid (since we won't have any
 505       substate to pass to the functions anymore). We could solve the syntactic
 506       problems by passing \type{undefined} for state variables, but that would
 507       still break the code on the semantic level (\ie, the function would no
 508       longer be semantically equivalent to the original input).
 509
 510       To keep the function definition correct until the very end of the process,
 511       we will not deal with (sub)states until we get to the \small{VHDL} generation.
 512       Here, we are translating from Core to \small{VHDL}, and we can simply not generate
 513       \small{VHDL} for substates, effectively removing the substate components
 514       alltogether.
 515
 516       There are a few important points when ignore substates.
 517
 518       First, we have to have some definition of "substate". Since any state
 519       argument or return value that represents state must be of the \type{State}
 520       type, we can simply look at its type. However, we must be careful to
 521       ignore only {\em substates}, and not a function's own state.
 522
 523       In the example above, this means we should remove \type{accums'} from
 524       \type{s'}, but not throw away \type{s'} entirely. We should, however,
 525       remove \type{s'} from the output port of the function, since the state
 526       will be handled by a \small{VHDL} procedure within the function.
 527
 528       When looking at substates, these can appear in two places: As part of an
 529       argument and as part of a return value. As noted above, these substates
 530       can only be used in very specific ways.
 531
 532       \desc{State variables can appear as an argument.} When generating \small{VHDL}, we
 533       completely ignore the argument and generate no input port for it.
 534
 535       \desc{State variables can be extracted from other state variables.} When
 536       extracting a state variable from another state variable, this always means
 537       we're extracting a substate, which we can ignore. So, we simply generate no
 538       \small{VHDL} for any extraction operation that has a state variable as a result.
 539
 540       \desc{State variables can be passed to functions.} When passing a
 541       state variable to a function, this always means we're passing a substate
 542       to a subcomponent. The entire argument can simply be ingored in the
 543       resulting port map.
 544
 545       \desc{State variables can be returned from functions.} When returning a
 546       state variable from a function (probably as a part of an algebraic
 547       datatype), this always mean we're returning a substate from a
 548       subcomponent. The entire state variable should be ignored in the resulting
 549       port map. The type binder of the binder that the function call is bound
 550       to should not include the state type either.
 551
 552       \startdesc{State variables can be inserted into other variables.} When inserting
 553       a state variable into another variable (usually by constructing that new
 554       variable using its constructor), we can identify two cases:
 555
 556       \startitemize
 557         \item The state is inserted into another state variable. In this case,
 558         the inserted state is a substate, and can be safely left out of the
 559         constructed variable.
 560         \item The state is inserted into a non-state variable. This happens when
 561         building up the return value of a function, where you put state and
 562         retsult variables together in an algebraic type (usually a tuple). In
 563         this case, we should leave the state variable out as well, since we
 564         don't want it to be included as an output port.
 565       \stopitemize
 566
 567       So, in both cases, we can simply leave out the state variable from the
 568       resulting value. In the latter case, however, we should generate a state
 569       proc instead, which assigns the state variable to the input state variable
 570       at each clock tick.
 571       \stopdesc
 572
 573       \desc{State variables can appear as (part of) a function result.} When
 574       generating \small{VHDL}, we can completely ignore any part of a function result
 575       that has a state type. If the entire result is a state type, this will
 576       mean the entity will not have an output port. Otherwise, the state
 577       elements will be removed from the type of the output port.
 578
 579
 580       Now, we know how to handle each use of a state variable separately. If we
 581       look at the whole, we can conclude the following:
 582
 583       \startitemize
 584       \item A state unpack operation should not generate any \small{VHDL}. The binder
 585       to which the unpacked state is bound should still be declared, this signal
 586       will become the register and will hold the current state.
 587       \item A state pack operation should not generate any \small{VHDL}. The binder th
 588       which the packed state is bound should not be declared. The binder that is
 589       packed is the signal that will hold the new state.
 590       \item Any values of a State type should not be translated to \small{VHDL}. In
 591       particular, State elements should be removed from tuples (and other
 592       datatypes) and arguments with a state type should not generate ports.
 593       \item To make the state actually work, a simple \small{VHDL} proc should be
 594       generated. This proc updates the state at every clockcycle, by assigning
 595       the new state to the current state. This will be recognized by synthesis
 596       tools as a register specification.
 597       \stopitemize
 598
 599
 600       When applying these rules to the example program (in normal form), we will
 601       get the following result. All the parts that don't generate any value are
 602       crossed out, leaving some very boring assignments here and there.
 603
 604
 605   \starthaskell
 606     avg i --spacked-- = res
 607       where
 608         s = --case spacked of (State s) -> s--
 609         --accums = case s of (accums, \_) -> accums--
 610         count = case s of (--\_,-- count) -> count
 611         accumres = accum i --accums--
 612         --accums' = case accumres of (accums', \_) -> accums'--
 613         sum = case accumres of (--\_,-- sum) -> sum
 614         count' = count + 1
 615         o = sum / count'
 616         s' = (--accums',-- count')
 617         --spacked' = State s'--
 618         res = (--spacked',-- o)
 619   \stophaskell
 620
 621       When we would really leave out the crossed out parts, we get a slightly
 622       weird program: There is a variable \type{s} which has no value, and there
 623       is a variable \type{s'} that is never used. Together, these two will form
 624       the state proc of the function. \type{s} contains the "current" state,
 625       \type{s'} is assigned the "next" state. So, at the end of each clock
 626       cycle, \type{s'} should be assigned to \type{s}.
 627
 628       Note that the definition of \type{s'} is not removed, even though one
 629       might think it as having a state type. Since the state type has a single
 630       argument constructor \type{State}, some type that should be the resulting
 631       state should always be explicitly packed with the State constructor,
 632       allowing us to remove the packed version, but still generate \small{VHDL} for the
 633       unpacked version (of course with any substates removed).
 634
 635       As you can see, the definition of \type{s'} is still present, since it
 636       does not have a state type (The State constructor. The \type{accums'} substate has been removed,
 637       leaving us just with the state of \type{avg} itself.
 638     \subsection{Initial state}
 639       How to specify the initial state? Cannot be done inside a hardware
 640       function, since the initial state is its own state argument for the first
 641       call (unless you add an explicit, synchronous reset port).
 642
 643       External init state is natural for simulation.
 644
 645       External init state works for hardware generation as well.
 646
 647       Implementation issues: state splitting, linking input to output state,
 648       checking usage constraints on state variables.
 649
 650   \section[sec:recursion]{Recursion}
 651   An import concept in functional languages is recursion. In it's most basic
 652   form, recursion is a function that is defined in terms of itself. This
 653   usually requires multiple evaluations of this function, with changing
 654   arguments, until eventually an evaluation of the function no longer requires
 655   itself.
 656
 657   Recursion in a hardware description is a bit of a funny thing. Usually,
 658   recursion is associated with a lot of nondeterminism, stack overflows, but
 659   also flexibility and expressive power.
 660
 661   Given the notion that each function application will translate to a
 662   component instantiation, we are presented with a problem. A recursive
 663   function would translate to a component that contains itself. Or, more
 664   precisely, that contains an instance of itself. This instance would again
 665   contain an instance of itself, and again, into infinity. This is obviously a
 666   problem for generating hardware.
 667
 668   This is expected for functions that describe infinite recursion. In that
 669   case, we can't generate hardware that shows correct behaviour in a single
 670   cycle (at best, we could generate hardware that needs an infinite number of
 671   cycles to complete).
 672
 673   However, most recursive hardware descriptions will describe finite
 674   recursion. This is because the recursive call is done conditionally. There
 675   is usually a case statement where at least one alternative does not contain
 676   the recursive call, which we call the "base case". If, for each call to the
 677   recursive function, we would be able to detect which alternative applies,
 678   we would be able to remove the case expression and leave only the base case
 679   when it applies. This will ensure that expanding the recursive functions
 680   will terminate after a bounded number of expansions.
 681
 682   This does imply the extra requirement that the base case is detectable at
 683   compile time. In particular, this means that the decision between the base
 684   case and the recursive case must not depend on runtime data.
 685
 686   \subsection{List recursion}
 687   The most common deciding factor in recursion is the length of a list that is
 688   passed in as an argument. Since we represent lists as vectors that encode
 689   the length in the vector type, it seems easy to determine the base case. We
 690   can simply look at the argument type for this. However, it turns out that
 691   this is rather non-trivial to write down in Haskell in the first place. As
 692   an example, we would like to write down something like this:
 693
 694   \starthaskell
 695     sum :: Vector n Word -> Word
 696     sum xs = case null xs of
 697       True -> 0
 698       False -> head xs + sum (tail xs)
 699   \stophaskell
 700
 701   However, the typechecker will now use the following reasoning (element type
 702   of the vector is left out):
 703
 704   \startitemize
 705   \item tail has the type \hs{(n > 0) => Vector n -> Vector (n - 1)}
 706   \item This means that xs must have the type \hs{(n > 0) => Vector n}
 707   \item This means that sum must have the type \hs{(n > 0) => Vector n -> a}
 708   \item sum is called with the result of tail as an argument, which has the
 709   type \hs{Vector n} (since \hs{(n > 0) => n - 1 == m}).
 710   \item This means that sum must have the type \hs{Vector n -> a}
 711   \item This is a contradiction between the type deduced from the body of sum
 712   (the input vector must be non-empty) and the use of sum (the input vector
 713   could have any length).
 714   \stopitemize
 715
 716   As you can see, using a simple case at value level causes the type checker
 717   to always typecheck both alternatives, which can't be done! This is a
 718   fundamental problem, that would seem perfectly suited for a type class.
 719   Considering that we need to switch between to implementations of the sum
 720   function, based on the type of the argument, this sounds like the perfect
 721   problem to solve with a type class. However, this approach has its own
 722   problems (not the least of them that you need to define a new typeclass for
 723   every recursive function you want to define).
 724
 725   Another approach tried involved using GADTs to be able to do pattern
 726   matching on empty / non empty lists. While this worked partially, it also
 727   created problems with more complex expressions.
 728
 729   TODO: How much detail should there be here? I can probably refer to
 730   Christiaan instead.
 731
 732   Evaluating all possible (and non-possible) ways to add recursion to our
 733   descriptions, it seems better to leave out list recursion alltogether. This
 734   allows us to focus on other interesting areas instead. By including
 735   (builtin) support for a number of higher order functions like map and fold,
 736   we can still express most of the things we would use list recursion for.
 737
 738   \subsection{General recursion}
 739   Of course there are other forms of recursion, that do not depend on the
 740   length (and thus type) of a list. For example, simple recursion using a
 741   counter could be expressed, but only translated to hardware for a fixed
 742   number of iterations. Also, this would require extensive support for compile
 743   time simplification (constant propagation) and compile time evaluation
 744   (evaluation constant comparisons), to ensure non-termination. Even then, it
 745   is hard to really guarantee termination, since the user (or GHC desugarer)
 746   might use some obscure notation that results in a corner case of the
 747   simplifier that is not caught and thus non-termination.
 748
 749   Due to these complications, we leave other forms of recursion as
 750   future work as well.