to the corresponding input port. The output port of the function is also
mapped to a signal, which is used as the result of the application.
- An example of a simple program using only function application would be:
-
- \starthaskell
- -- | A simple function that returns the and of three bits
- and3 :: Bit -> Bit -> Bit -> Bit
- and3 a b c = and (and a b) c
- \stophaskell
-
- This results in the following hardware:
-
- TODO: Pretty picture
+ \in{Example}[ex:And3] shows a simple program using only function
+ application and the corresponding architecture.
+
+\startbuffer[And3]
+-- | A simple function that returns
+-- the and of three bits
+and3 :: Bit -> Bit -> Bit -> Bit
+and3 a b c = and (and a b) c
+\stopbuffer
+
+ \startuseMPgraphic{And3}
+ save a, b, c, anda, andb, out;
+
+ % I/O ports
+ newCircle.a(btex $a$ etex) "framed(false)";
+ newCircle.b(btex $b$ etex) "framed(false)";
+ newCircle.c(btex $c$ etex) "framed(false)";
+ newCircle.out(btex $out$ etex) "framed(false)";
+
+ % Components
+ newCircle.anda(btex $and$ etex);
+ newCircle.andb(btex $and$ etex);
+
+ a.c = origin;
+ b.c = a.c + (0cm, 1cm);
+ c.c = b.c + (0cm, 1cm);
+ anda.c = midpoint(a.c, b.c) + (2cm, 0cm);
+ andb.c = midpoint(b.c, c.c) + (4cm, 0cm);
+
+ out.c = andb.c + (2cm, 0cm);
+
+ % Draw objects and lines
+ drawObj(a, b, c, anda, andb, out);
+
+ ncarc(a)(anda) "arcangle(-10)";
+ ncarc(b)(anda);
+ ncarc(anda)(andb);
+ ncarc(c)(andb);
+ ncline(andb)(out);
+ \stopuseMPgraphic
+
+ \placeexample[here][ex:And3]{Simple three port and.}
+ \startcombination[2*1]
+ {\typebufferhs{And3}}{Haskell description using function applications.}
+ {\boxedgraphic{And3}}{The architecture described by the Haskell description.}
+ \stopcombination
+
+ TODO: Define top level function and subfunctions/circuits.
\subsection{Partial application}
It should be obvious that we cannot generate hardware signals for all
represented as a signal or i/o port to a component.
From this, we can see that the above translation rules do not apply to a
- partial application. Let's look at an example:
-
- \starthaskell
- -- | Multiply the input word by four.
- quadruple :: Word -> Word
- quadruple n = mul (mul n)
- where
- mul = (*) 2
- \stophaskell
-
- It should be clear that the above code describes the following hardware:
-
- TODO: Pretty picture
+ partial application. \in{Example}[ex:Quadruple] shows an example use of
+ partial application and the corresponding architecture.
+
+\startbuffer[Quadruple]
+-- | Multiply the input word by four.
+quadruple :: Word -> Word
+quadruple n = mul (mul n)
+ where
+ mul = (*) 2
+\stopbuffer
+
+ \startuseMPgraphic{Quadruple}
+ save in, two, mula, mulb, out;
+
+ % I/O ports
+ newCircle.in(btex $n$ etex) "framed(false)";
+ newCircle.two(btex $2$ etex) "framed(false)";
+ newCircle.out(btex $out$ etex) "framed(false)";
+
+ % Components
+ newCircle.mula(btex $\times$ etex);
+ newCircle.mulb(btex $\times$ etex);
+
+ two.c = origin;
+ in.c = two.c + (0cm, 1cm);
+ mula.c = in.c + (2cm, 0cm);
+ mulb.c = mula.c + (2cm, 0cm);
+ out.c = mulb.c + (2cm, 0cm);
+
+ % Draw objects and lines
+ drawObj(in, two, mula, mulb, out);
+
+ nccurve(two)(mula) "angleA(0)", "angleB(45)";
+ nccurve(two)(mulb) "angleA(0)", "angleB(45)";
+ ncline(in)(mula);
+ ncline(mula)(mulb);
+ ncline(mulb)(out);
+ \stopuseMPgraphic
+
+ \placeexample[here][ex:Quadruple]{Simple three port and.}
+ \startcombination[2*1]
+ {\typebufferhs{Quadruple}}{Haskell description using function applications.}
+ {\boxedgraphic{Quadruple}}{The architecture described by the Haskell description.}
+ \stopcombination
Here, the definition of mul is a partial function application: It applies
\hs{2 :: Word} to the function \hs{(*) :: Word -> Word -> Word} resulting in
completely applied.
\section{State}
- \subsection{Introduction}
- Provide some examples
-
+ A very important concept in hardware designs is \emph{state}. In a
+ stateless (or, \emph{combinatoric}) design, every output is a directly and solely dependent on the
+ inputs. In a stateful design, the outputs can depend on the history of
+ inputs, or the \emph{state}. State is usually stored in \emph{registers},
+ which retain their value during a clockcycle, and are typically updated at
+ the start of every clockcycle. Since the updating of the state is tightly
+ coupled (synchronized) to the clock signal, these state updates are often
+ called \emph{synchronous}.
+
+ To make our hardware description language useful to describe more that
+ simple combinatoric designs, we'll need to be able to describe state in
+ some way.
\subsection{Approaches to state}
- Explain impact of state (or rather, temporal behaviour) on function signature.
+ In Haskell, functions are always pure (except when using unsafe
+ functions like \hs{unsafePerformIO}, which should be prevented whenever
+ possible). This means that the output of a function solely depends on
+ its inputs. If you evaluate a given function with given inputs, it will
+ always provide the same output.
+
+ TODO: Define pure
+
+ This is a perfect match for a combinatoric circuit, where the output
+ also soley depend on the inputs. However, when state is involved, this
+ no longer holds. Since we're in charge of our own language, we could
+ remove this purity constraint and allow a function to return different
+ values depending on the cycle in which it is evaluated (or rather, the
+ current state). However, this means that all kinds of interesting
+ properties of our functional language get lost, and all kinds of
+ transformations and optimizations might no longer be meaning preserving.
+
+ Provided that we want to keep the function pure, the current state has
+ to be present in the function's arguments in some way. There seem to be
+ two obvious ways to do this: Adding the current state as an argument, or
+ including the full history of each argument.
+
\subsubsection{Stream arguments and results}
+ Including the entire history of each input (\eg, the value of that
+ input for each previous clockcycle) is an obvious way to make outputs
+ depend on all previous input. This is easily done by making every
+ input a list instead of a single value, containing all previous values
+ as well as the current value.
+
+ An obvious downside of this solution is that on each cycle, all the
+ previous cycles must be resimulated to obtain the current state. To do
+ this, it might be needed to have a recursive helper function as well,
+ wich might be hard to properly analyze by the compiler.
+
+ A slight variation on this approach is one taken by some of the other
+ functional \small{HDL}s in the field (TODO: References to Lava,
+ ForSyDe, ...): Make functions operate on complete streams. This means
+ that a function is no longer called on every cycle, but just once. It
+ takes stream as inputs instead of values, where each stream contains
+ all the values for every clockcycle since system start. This is easily
+ modeled using an (infinite) list, with one element for each clock
+ cycle. Since the funciton is only evaluated once, its output is also a
+ stream. Note that, since we are working with infinite lists and still
+ want to be able to simulate the system cycle-by-cycle, this relies
+ heavily on the lazy semantics of Haskell.
+
+ Since our inputs and outputs are streams, all other (intermediate)
+ values must be streams. All of our primitive operators (\eg, addition,
+ substraction, bitwise operations, etc.) must operate on streams as
+ well (note that changing a single-element operation to a stream
+ operation can done with \hs{map}, \hs{zipwith}, etc.).
+
+ Note that the concept of \emph{state} is no more than having some way
+ to communicate a value from one cycle to the next. By introducing a
+ \hs{delay} function, we can do exactly that: Delay (each value in) a
+ stream so that we can "look into" the past. This \hs{delay} function
+ simply outputs a stream where each value is the same as the input
+ value, but shifted one cycle. This causes a \quote{gap} at the
+ beginning of the stream: What is the value of the delay output in the
+ first cycle? For this, the \hs{delay} function has a second input
+ (which is a value, not a stream!).
+
+ \in{Example}[ex:DelayAcc] shows a simple accumulator expressed in this
+ style.
+
+\startbuffer[DelayAcc]
+acc :: Stream Word -> Stream Word
+acc in = out
+ where
+ out = (delay out 0) + in
+\stopbuffer
+
+\startuseMPgraphic{DelayAcc}
+ save in, out, add, reg;
+
+ % I/O ports
+ newCircle.in(btex $in$ etex) "framed(false)";
+ newCircle.out(btex $out$ etex) "framed(false)";
+
+ % Components
+ newReg.reg("") "dx(4mm)", "dy(6mm)", "reflect(true)";
+ newCircle.add(btex + etex);
+
+ in.c = origin;
+ add.c = in.c + (2cm, 0cm);
+ out.c = add.c + (2cm, 0cm);
+ reg.c = add.c + (0cm, 2cm);
+
+ % Draw objects and lines
+ drawObj(in, out, add, reg);
+
+ nccurve(add)(reg) "angleA(0)", "angleB(180)", "posB(d)";
+ nccurve(reg)(add) "angleA(180)", "angleB(-45)", "posA(out)";
+ ncline(in)(add);
+ ncline(add)(out);
+\stopuseMPgraphic
+
+
+ \placeexample[here][ex:DelayAcc]{Simple accumulator architecture.}
+ \startcombination[2*1]
+ {\typebufferhs{DelayAcc}}{Haskell description using streams.}
+ {\boxedgraphic{DelayAcc}}{The architecture described by the Haskell description.}
+ \stopcombination
+
+
+ This notation can be confusing (especially due to the loop in the
+ definition of out), but is essentially easy to interpret. There is a
+ single call to delay, resulting in a circuit with a single register,
+ whose input is connected to \hs{outl (which is the output of the
+ adder)}, and it's output is the \hs{delay out 0} (which is connected
+ to one of the adder inputs).
+
+ This notation has a number of downsides, amongst which are limited
+ readability and ambiguity in the interpretation. TODO: Reference
+ Christiaan.
+
\subsubsection{Explicit state arguments and results}
- Nested state for called functions.
+ A more explicit way to model state, is to simply add an extra argument
+ containing the current state value. This allows an output to depend on
+ both the inputs as well as the current state while keeping the
+ function pure (letting the result depend only on the arguments), since
+ the current state is now an argument.
+
+ In Haskell, this would look like \in{example}[ex:ExplicitAcc].
+
+\startbuffer[ExplicitAcc]
+-- input -> current state -> (new state, output)
+acc :: Word -> Word -> (Word, Word)
+acc in (State s) = (State s', out)
+ where
+ out = s + in
+ s' = out
+\stopbuffer
+
+ \placeexample[here][ex:ExplicitAcc]{Simple accumulator architecture.}
+ \startcombination[2*1]
+ {\typebufferhs{ExplicitAcc}}{Haskell description using explicit state arguments.}
+ % Picture is identical to the one we had just now.
+ {\boxedgraphic{DelayAcc}}{The architecture described by the Haskell description.}
+ \stopcombination
+
+ This approach makes a function's state very explicit, which state
+ variables are used by a function can be completely determined from its
+ type signature (as opposed to the stream approach, where a function
+ looks the same from the outside, regardless of what state variables it
+ uses (or wether it's stateful at all).
+
+ This approach is the one chosen for Cλash and will be examined more
+ closely below.
\subsection{Explicit state specification}
- Note about semantic correctness of top level state.
-
- Note about automatic ``down-pushing'' of state.
-
- Note about explicit state specification as the best solution.
-
- Note about substates
-
- Note about conditions on state variables and checking them.
+ We've seen the concept of explicit state in a simple example below, but
+ what are the implications of this approach?
+
+ \subsubsection{Substates}
+ Since a function's state is reflected directly in its type signature,
+ if a function calls other stateful functions (\eg, has subcircuits) it
+ has to somehow know the current state for these called functions. The
+ only way to do this, is to put these \emph{substates} inside the
+ caller's state. This means that a function's state is the sum of the
+ states of all functions it calls, and its own state.
+
+ This also means that the type of a function (at least the "state"
+ part) is dependent on its implementation and the functions it calls.
+ This is the major downside of this approach: The separation between
+ interface and implementation is limited. However, since Cλash is not
+ very suitable for separate compilation (see
+ \in{section}[sec:prototype:separate]) this is not a big problem in
+ practice. Additionally, when using a type synonym for the state type
+ of each function, we can still provide explicit type signatures
+ while keeping the state specification for a function near its
+ definition only.
+
+ \subsubsection{...}
+ We need some way to know which arguments should become input ports and
+ which argument(s?) should become the current state (\eg, be bound to
+ the register outputs). This does not hold holds not just for the top
+ level function, but also for any subfunctions. Or could we perhaps
+ deduce the statefulness of subfunctions by analyzing the flow of data
+ in the calling functions?
+
+ To explore this matter, we make an interesting observation: We get
+ completely correct behaviour when we put all state registers in the
+ top level entity (or even outside of it). All of the state arguments
+ and results on subfunctions are treated as normal input and output
+ ports. Effectively, a stateful function results in a stateless
+ hardware component that has one of its input ports connected to the
+ output of a register and one of its output ports connected to the
+ input of the same register.
+
+ TODO: Example?
+
+ Of course, even though the hardware described like this has the
+ correct behaviour, unless the layout tool does smart optimizations,
+ there will be a lot of extra wire in the design (since registers will
+ not be close to the component that uses them). Also, when working with
+ the generated \small{VHDL} code, there will be a lot of extra ports
+ just to pass one state values, which can get quite confusing.
+
+ To fix this, we can simply \quote{push} the registers down into the
+ subcircuits. When we see a register that is connected directly to a
+ subcircuit, we remove the corresponding input and output port and put
+ the register inside the subcircuit instead. This is slightly less
+ trivial when looking at the Haskell code instead of the resulting
+ circuit, but the idea is still the same.
+
+ TODO: Example?
+
+ However, when applying this technique, we might push registers down
+ too far. When you intend to store a result of a stateless subfunction
+ in the caller's state and pass the current value of that state
+ variable to that same function, the register might get pushed down too
+ far. It is impossible to distinguish this case from similar code where
+ the called function is in fact stateful. From this we can conclude
+ that we have to either:
+
+ \startitemize
+ \item accept that the generated hardware might not be exactly what we
+ intended, in some specific cases. In most cases, the hardware will be
+ what we intended.
+ \item explicitely annotate state arguments and results in the input
+ description.
+ \stopitemize
+
+ The first option causes (non-obvious) exceptions in the language
+ intepretation. Also, automatically determining where registers should
+ end up is easier to implement correctly with explicit annotations, so
+ for these reasons we will look at how this annotations could work.
+
+
+ TODO: Note about conditions on state variables and checking them.
\subsection{Explicit state implementation}
Recording state variables at the type level.