+ the function \hs{(*) :: Word -> Word -> Word} to the value \hs{2 :: Word},
+ resulting in the expression \hs{(*) 2 :: Word -> Word}. Since this resulting
+ expression is again a function, hardware cannot be generated for it
+ directly. This is because the hardware to generate for \hs{mul}
+ depends completely on where and how it is used. In this example, it is
+ even used twice.
+
+ However, it is clear that the above hardware description actually
+ describes valid hardware. In general, any partially applied function
+ must eventually become completely applied, at which point hardware for
+ it can be generated using the rules for function application given in
+ \in{section}[sec:description:application]. It might mean that a
+ partial application is passed around quite a bit (even beyond function
+ boundaries), but eventually, the partial application will become
+ completely applied. An example of this principle is given in
+ \in{section}[sec:normalization:defunctionalization].
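As a small sketch in plain Haskell (not Cλash syntax; the helper \hs{applyTwice} is hypothetical), a partial application can be passed across a function boundary and still end up completely applied:

```haskell
-- Plain Haskell sketch (not Cλash syntax). 'applyTwice' is a
-- hypothetical helper: it receives the partial application 'mul 2'
-- across a function boundary and completes it.
mul :: Word -> Word -> Word
mul = (*)

applyTwice :: (Word -> Word) -> Word -> Word
applyTwice f x = f (f x)

-- 'mul 2' is a partial application; inside 'applyTwice' it becomes
-- completely applied, so hardware can be generated at those points.
doubled :: Word -> Word
doubled = applyTwice (mul 2)
```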
+
+ \section{Costless specialization}
+ Each (complete) function application in our description generates a
+ component instantiation, or a specific piece of hardware in the final
+ design. It is interesting to note that each application of a function
+ generates a \emph{separate} piece of hardware. In the final design, none
+ of the hardware is shared between applications, even when the applied
+ function is the same (of course, if a particular value, such as the result
+ of a function application, is used twice, it is not calculated twice).
+
+ This is distinctly different from normal program compilation: Two separate
+ calls to the same function share the same machine code. Having more
+ machine code has implications for speed (due to less efficient caching)
+ and memory usage. For normal compilation, it is therefore important to
+ keep the number of functions limited and maximize code sharing
+ (though there is a tradeoff between speed and memory usage here).
+
+ When generating hardware, this is hardly an issue. Having more \quote{code
+ sharing} does reduce the amount of \small{VHDL} output (since different
+ component instantiations still share the same component), but after
+ synthesis, the amount of hardware generated is not affected. This
+ means there is no tradeoff between speed and memory (or rather,
+ chip area) usage.
+
+ In particular, if we were to duplicate all functions so that there is a
+ separate function for every application in the program (\ie, each function
+ is then applied exactly once), there would be no increase in hardware
+ size whatsoever.
+
+ Because of this, a common optimization technique called
+ \emph{specialization} can be applied to hardware generation without any
+ performance or area cost (unlike for software).
+
+ \fxnote{Perhaps these next three sections are a bit too
+ implementation-oriented?}
+
+ \subsection{Specialization}
+ \defref{specialization}
+ Given some function that has a \emph{domain} $D$ (\ie, the set of
+ all possible arguments that the function could be applied to), we
+ create a specialized function with exactly the same behaviour, but
+ with a domain $D' \subset D$. This subset can be chosen in all
+ sorts of ways. Any subset is valid for the general definition of
+ specialization, but in practice only some of them provide useful
+ optimization opportunities.
+
+ Common subsets include limiting a polymorphic argument to a single type
+ (\ie, removing polymorphism) or limiting an argument to just a single
+ value (\ie, cross-function constant propagation, effectively removing
+ the argument).
+
+ Since we limit the argument domain of the specialized function, its
+ definition can often be optimized further (since now more types or even
+ values of arguments are already known). By replacing any application of
+ the function that falls within the reduced domain by an application of
+ the specialized version, the code gets faster (but the code also gets
+ bigger, since we now have two versions instead of one). If we apply
+ this technique often enough, we can often replace all applications of a
+ function by specialized versions, allowing the original function to be
+ removed (in some cases, this can even give a net reduction of the code
+ compared to the non-specialized version).
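As a hand-written sketch of this technique in plain Haskell (the names are hypothetical), specializing an argument to a single value effectively removes that argument:

```haskell
-- General function: both arguments are unknown.
scale :: Word -> Word -> Word
scale factor x = factor * x

-- Specialized version for the subdomain where factor == 2. The
-- argument is gone, and the body can now be optimized further.
scale2 :: Word -> Word
scale2 x = 2 * x

-- Every application 'scale 2 e' can be replaced by 'scale2 e'; if all
-- applications of 'scale' are replaced, 'scale' itself can be removed.
```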
+
+ Specialization is useful for our hardware descriptions for functions
+ that contain arguments that cannot be translated to hardware directly
+ (polymorphic or higher-order arguments, for example). If we can create
+ specialized functions that remove the argument, or make it translatable,
+ we can use specialization to make the original, untranslatable, function
+ obsolete.
+
+ \section{Higher order values}
+ What holds for partial applications can easily be generalized to any
+ higher-order expression. This includes partial applications, plain
+ variables (\eg, a binder referring to a top level function), lambda
+ expressions and more complex expressions with a function type (a \hs{case}
+ expression returning lambdas, for example).
+
+ None of these values can be represented in hardware directly (just
+ like partial applications). To make them representable, they need to
+ be applied: function variables and partial applications will then
+ eventually become complete applications, applied lambda expressions
+ disappear by applying β-reduction, etc.
+
+ So any higher-order value will be \quote{pushed down} towards its
+ application just like partial applications. Whenever a function boundary
+ needs to be crossed, the called function can be specialized.
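As a sketch in plain Haskell (a hypothetical example, not taken from the Cλash prototype), a \hs{case} expression returning lambdas can be pushed towards its application, after which the higher-order values disappear through β-reduction:

```haskell
-- A 'case' expression with a function type: the selected lambda is a
-- higher-order value that cannot be represented in hardware directly.
step :: Bool -> Word -> Word
step up = case up of
  True  -> \x -> x + 1
  False -> \x -> x - 1

-- Pushing the application of the second argument into the 'case' and
-- β-reducing the lambdas yields an equivalent, first-order definition:
step' :: Bool -> Word -> Word
step' up x = case up of
  True  -> x + 1
  False -> x - 1
```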
+
+ \fxnote{This section needs improvement and an example}
+
+ \section{Polymorphism}
+ In Haskell, values can be \emph{polymorphic}: They can have multiple types. For
+ example, the function \hs{fst :: (a, b) -> a} is an example of a
+ polymorphic function: It works for tuples with any two element types. Haskell
+ type classes allow a function to work on a specific set of types, but the
+ general idea is the same. The opposite of this is a \emph{monomorphic}
+ value, which has a single, fixed, type.
+
+% A type class is a collection of types for which some operations are
+% defined. It is thus possible for a value to be polymorphic while having
+% any number of \emph{class constraints}: The value is not defined for
+% every type, but only for types in the type class. An example of this is
+% the \hs{even :: (Integral a) => a -> Bool} function, which can map any
+% value of a type that is member of the \hs{Integral} type class
+
+ When generating hardware, polymorphism cannot be easily translated. How
+ many wires will you lay down for a value that could have any type? When
+ type classes are involved, what hardware components will you lay down for
+ a class method (whose behaviour depends on the type of its arguments)?
+ Note that Cλash currently does not allow user-defined type classes,
+ but does partly support some of the built-in type classes (like \hs{Num}).
+
+ Fortunately, we can again use the principle of specialization: Since every
+ function application generates a separate piece of hardware, we can know
+ the types of all arguments exactly. Provided that existential typing
+ (which is a \GHC extension) is not used, all of the
+ polymorphic types in a function must depend on the types of the
+ arguments (in other words, the only way to introduce a type variable
+ is in a lambda abstraction).
+
+ If a function is monomorphic, all values inside it are monomorphic as
+ well, so any function that is applied within the function can only be
+ applied to monomorphic values. The applied functions can then be
+ specialized to work just for these specific types, removing the
+ polymorphism from the applied functions as well.
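A sketch of this monomorphization by specialization, in plain Haskell (the names are hypothetical):

```haskell
-- Polymorphic function: no fixed number of wires can represent the
-- type variables 'a' and 'b'.
first :: (a, b) -> a
first (x, _) = x

-- At a concrete application site the argument types are known, so the
-- function can be specialized into a monomorphic version:
firstWordBool :: (Word, Bool) -> Word
firstWordBool (x, _) = x
```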
+
+ \defref{entry function}The entry function must not have a
+ polymorphic type (otherwise no hardware interface could be generated
+ for the entry function).
+
+ By induction, this means that all functions that are (indirectly) called
+ by our top level function (meaning all functions that are translated in
+ the final hardware) become monomorphic.
+
+ \section{State}
+ A very important concept in hardware designs is \emph{state}. In a
+ stateless (or, \emph{combinational}) design, every output is directly and solely dependent on the
+ inputs. In a stateful design, the outputs can depend on the history of
+ inputs, or the \emph{state}. State is usually stored in \emph{registers},
+ which retain their value during a clockcycle, and are typically updated at
+ the start of every clockcycle. Since the updating of the state is tightly
+ coupled (synchronized) to the clock signal, these state updates are often
+ called \emph{synchronous} behaviour.
+
+ \todo{Sidenote? Registers can contain any (complex) type}
+
+ To make Cλash useful to describe more than simple combinational
+ designs, it needs to be able to describe state in some way.
+
+ \subsection{Approaches to state}
+ In Haskell, functions are always pure (except when using unsafe
+ functions with the \hs{IO} monad, which is not supported by Cλash). This
+ means that the output of a function solely depends on its inputs. If you
+ evaluate a given function with given inputs, it will always provide the
+ same output.
+
+ \placeintermezzo{}{
+ \defref{purity}
+ \startframedtext[width=8cm,background=box,frame=no]
+ \startalignment[center]
+ {\tfa Purity}
+ \stopalignment
+ \blank[medium]
+
+ A function is said to be pure if it satisfies two conditions:
+
+ \startitemize[KR]
+ \item When a pure function is called with the same arguments twice, it should
+ return the same value in both cases.
+ \item When a pure function is called, it should have no
+ observable side-effects.
+ \stopitemize
+
+ Purity is an important property in functional languages, since
+ it enables all kinds of mathematical reasoning and
+ optimizations with pure functions that are not guaranteed to
+ be correct for impure functions.
+
+ An example of a pure function is the square root function
+ \hs{sqrt}. An example of an impure function is the \hs{today}
+ function that returns the current date (which of course cannot
+ exist like this in Haskell).
+ \stopframedtext
+ }
+
+ This is a perfect match for a combinational circuit, where the output
+ also solely depends on the inputs. However, when state is involved, this
+ no longer holds. Of course this purity constraint cannot just be
+ removed from Haskell. But even when designing a completely new (hardware
+ description) language, it does not seem to be a good idea to
+ remove this purity. This would mean that all kinds of interesting
+ properties of the functional language get lost, and all kinds of
+ transformations and optimizations are no longer meaning preserving.
+
+ So our functions must remain pure, meaning the current state has
+ to be present in the function's arguments in some way. There seem
+ to be two obvious ways to do this: Adding the current state as an
+ argument, or including the full history of each argument.
+
+ \subsubsection{Stream arguments and results}
+ Including the entire history of each input (\ie, the value of that
+ input for each previous clockcycle) is an obvious way to make outputs
+ depend on all previous input. This is easily done by making every
+ input a list instead of a single value, containing all previous values
+ as well as the current value.
+
+ An obvious downside of this solution is that on each cycle, all the
+ previous cycles must be resimulated to obtain the current state. To do
+ this, a recursive helper function might be needed as well, which
+ can be hard for the compiler to analyze properly.
+
+ A slight variation on this approach is one taken by some of the other
+ functional \small{HDL}s in the field: \todo{References to Lava,
+ ForSyDe, ...} Make functions operate on complete streams. This means
+ that a function is no longer called on every cycle, but just once. It
+ takes streams as inputs instead of values, where each stream contains
+ all the values for every clockcycle since system start. This is easily
+ modeled using an (infinite) list, with one element for each clock
+ cycle. Since the function is only evaluated once, its output must also
+ be a stream. Note that, since we are working with infinite lists and
+ still want to be able to simulate the system cycle-by-cycle, this
+ relies heavily on the lazy semantics of Haskell.
+
+ Since our inputs and outputs are streams, all other (intermediate)
+ values must be streams. All of our primitive operators (\eg, addition,
+ subtraction, bitwise operations, etc.) must operate on streams as
+ well (note that changing a single-element operation to a stream
+ operation can be done with \hs{map}, \hs{zipWith}, etc.).
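A minimal sketch of such liftings in plain Haskell (not Cλash syntax), modeling a stream as an (infinite) list with one element per clock cycle:

```haskell
-- One element per clock cycle; infinite in a real simulation.
type Stream a = [a]

-- Lifting a two-argument operation with zipWith:
addS :: Stream Word -> Stream Word -> Stream Word
addS = zipWith (+)

-- Lifting a single-argument operation with map:
notS :: Stream Bool -> Stream Bool
notS = map not
```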
+
+ This also means that primitive operations from an existing
+ language such as Haskell cannot be used directly (including some
+ syntax constructs, like \hs{case} expressions and \hs{if}
+ expressions). This makes this approach well suited for use in
+ \small{EDSL}s, since those already impose these same
+ limitations. \refdef{EDSL}
+
+ Note that the concept of \emph{state} is no more than having some way
+ to communicate a value from one cycle to the next. By introducing a
+ \hs{delay} function, we can do exactly that: Delay (each value in) a
+ stream so that we can \quote{look into} the past. This \hs{delay} function
+ simply outputs a stream where each value is the same as the input
+ value, but shifted one cycle. This causes a \quote{gap} at the
+ beginning of the stream: What is the value of the delay output in the
+ first cycle? For this, the \hs{delay} function has a second input
+ (which is a value, not a stream!).
+
+ \in{Example}[ex:DelayAcc] shows a simple accumulator expressed in this
+ style.
+
+\startbuffer[DelayAcc]
+acc :: Stream Word -> Stream Word
+acc inp = out
+  where
+    out = (delay out 0) + inp
+\stopbuffer
+
+\startuseMPgraphic{DelayAcc}
+ save in, out, add, reg;
+
+ % I/O ports
+ newCircle.in(btex $in$ etex) "framed(false)";
+ newCircle.out(btex $out$ etex) "framed(false)";
+
+ % Components
+ newReg.reg("") "dx(4mm)", "dy(6mm)", "reflect(true)";
+ newCircle.add(btex + etex);
+
+ in.c = origin;
+ add.c = in.c + (2cm, 0cm);
+ out.c = add.c + (2cm, 0cm);
+ reg.c = add.c + (0cm, 2cm);
+
+ % Draw objects and lines
+ drawObj(in, out, add, reg);
+
+ nccurve(add)(reg) "angleA(0)", "angleB(180)", "posB(d)";
+ nccurve(reg)(add) "angleA(180)", "angleB(-45)", "posA(out)";
+ ncline(in)(add);
+ ncline(add)(out);
+\stopuseMPgraphic
+
+
+ \placeexample[here][ex:DelayAcc]{Simple accumulator architecture.}
+ \startcombination[2*1]
+ {\typebufferhs{DelayAcc}}{Haskell description using streams.}
+ {\boxedgraphic{DelayAcc}}{The architecture described by the Haskell description.}
+ \stopcombination
+
+
+ This notation can be confusing (especially due to the loop in the
+ definition of \hs{out}), but is essentially easy to interpret. There is a
+ single call to \hs{delay}, resulting in a circuit with a single register,
+ whose input is connected to \hs{out} (which is the output of the
+ adder), and whose output is the expression \hs{delay out 0} (which is
+ connected to one of the adder inputs).
+
+ \subsubsection{Explicit state arguments and results}
+ A more explicit way to model state is to simply add an extra argument
+ containing the current state value. This allows an output to depend on
+ both the inputs as well as the current state while keeping the
+ function pure (letting the result depend only on the arguments), since
+ the current state is now an argument.
+
+ In Haskell, this would look like
+ \in{example}[ex:ExplicitAcc]\footnote[notfinalsyntax]{This
+ example is not in the final Cλash syntax}. \todo{Referencing
+ notfinalsyntax from Introduction.tex doesn't work}
+
+\startbuffer[ExplicitAcc]
+-- input -> current state -> (new state, output)
+acc :: Word -> Word -> (Word, Word)
+acc inp s = (s', out)
+  where
+    out = s + inp
+    s' = out
+\stopbuffer
+
+ \placeexample[here][ex:ExplicitAcc]{Simple accumulator architecture.}
+ \startcombination[2*1]
+ {\typebufferhs{ExplicitAcc}}{Haskell description using explicit state arguments.}
+ % Picture is identical to the one we had just now.
+ {\boxedgraphic{DelayAcc}}{The architecture described by the Haskell description.}
+ \stopcombination
+
+ This approach makes a function's state very explicit: which state
+ variables are used by a function can be determined completely from its
+ type signature (as opposed to the stream approach, where a function
+ looks the same from the outside, regardless of what state variables it
+ uses or whether it is stateful at all).
+
+ This approach to state has been one of the initial drivers behind
+ this research. Unlike a stream-based approach, it is well suited
+ to reusing existing code and language features (like
+ \hs{if} and \hs{case} expressions), because it operates on normal
+ values. For these reasons, this is the approach chosen
+ for Cλash. It will be examined more closely below.
+
+ \subsection{Explicit state specification}
+ The concept of explicit state has been introduced with some
+ examples above, but what are the implications of this approach?
+
+ \subsubsection{Substates}
+ Since a function's state is reflected directly in its type signature,
+ if a function calls other stateful functions (\ie, has stateful
+ subcircuits), it has to somehow know the current state of these
+ called functions. The only way to do this is to put these
+ \emph{substates} inside the caller's state. This means that a
+ function's state is the sum of the states of all functions it calls,
+ and its own state. This sum
+ can be obtained using something simple like a tuple, or possibly
+ custom algebraic types for clarity.
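A sketch of such a substate in plain Haskell (not the final Cλash syntax; all names hypothetical): a caller using a stateful accumulator stores the accumulator's state inside its own state tuple.

```haskell
-- A stateful subfunction: input -> current state -> (new state, output).
accSub :: Word -> Word -> (Word, Word)
accSub i s = (s + i, s + i)

-- The caller's state is the sum of its own state (a toggle bit) and
-- the substate of 'accSub', combined in a plain tuple.
type CallerState = (Bool, Word)

caller :: Word -> CallerState -> (CallerState, Word)
caller i (toggle, accState) =
  let (accState', accOut) = accSub i accState
  in ((not toggle, accState'), if toggle then accOut else 0)
```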
+
+ This also means that the type of a function (at least the "state"
+ part) is dependent on its own implementation and of the functions it
+ calls.
+
+ This is the major downside of this approach: The separation between
+ interface and implementation is limited. However, since Cλash is not
+ very suitable for separate compilation (see
+ \in{section}[sec:prototype:separate]) this is not a big problem in
+ practice.
+
+ Additionally, when using a type synonym for the state type
+ of each function, we can still provide explicit type signatures
+ while keeping the state specification for a function near its
+ definition only.
+ \todo{Example}
+
+ \subsubsection{Which arguments and results are stateful?}
+ \fxnote{This section should get some examples}
+ We need some way to know which arguments should become input ports and
+ which argument(s?) should become the current state (\ie, be bound to
+ the register outputs). This does not hold just for the top
+ level function, but also for any subfunction. Or could we perhaps
+ deduce the statefulness of subfunctions by analyzing the flow of data
+ in the calling functions?
+
+ To explore this matter, the following observation is interesting: We
+ get completely correct behaviour when we put all state registers in
+ the top level entity (or even outside of it). All of the state
+ arguments and results on subfunctions are treated as normal input and
+ output ports. Effectively, a stateful function results in a stateless
+ hardware component that has one of its input ports connected to the
+ output of a register and one of its output ports connected to the
+ input of the same register.
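This observation can be sketched in plain Haskell (hypothetical, not Cλash syntax) with a \hs{run} function that plays the role of the top-level register, feeding each new state back as the current state of the next cycle:

```haskell
-- A stateful description: input -> current state -> (new state, output).
accum :: Word -> Word -> (Word, Word)
accum i s = (s + i, s + i)

-- 'run' closes the register loop at the top level: 'accum' itself
-- remains a pure (stateless) component with state input/output ports.
run :: (i -> s -> (s, o)) -> s -> [i] -> [o]
run f s (i:is) = let (s', o) = f i s in o : run f s' is
run _ _ []     = []
```

For example, \hs{run accum 0 [1, 2, 3]} yields the accumulated outputs \hs{[1, 3, 6]}.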
+
+ \todo{Example}
+
+ Of course, even though the hardware described like this has the
+ correct behaviour, unless the layout tool does smart optimizations,
+ there will be a lot of extra wire in the design (since registers will
+ not be close to the component that uses them). Also, when working with
+ the generated \small{VHDL} code, there will be a lot of extra ports
+ just to pass on state values, which can get quite confusing.
+
+ To fix this, we can simply \quote{push} the registers down into the
+ subcircuits. When we see a register that is connected directly to a
+ subcircuit, we remove the corresponding input and output port and put
+ the register inside the subcircuit instead. This is slightly less
+ trivial when looking at the Haskell code instead of the resulting
+ circuit, but the idea is still the same.
+
+ \todo{Example}
+
+ However, when applying this technique, we might push registers down
+ too far. When you intend to store a result of a stateless subfunction
+ in the caller's state and pass the current value of that state
+ variable to that same function, the register might get pushed down too
+ far. It is impossible to distinguish this case from similar code where
+ the called function is in fact stateful. From this we can conclude
+ that we have to either:
+
+ \todo{Example of wrong downpushing}
+
+ \startitemize
+ \item accept that the generated hardware might not be exactly what we
+ intended, in some specific cases. In most cases, the hardware will be
+ what we intended.
+ \item explicitly annotate state arguments and results in the input
+ description.
+ \stopitemize
+
+ The first option causes (non-obvious) exceptions in the language
+ interpretation. Also, automatically determining where registers should
+ end up is easier to implement correctly with explicit annotations, so
+ for these reasons we will look at how these annotations could work.
+
+ \todo{Sidenote: One or more state arguments?}
+
+ \subsection[sec:description:stateann]{Explicit state annotation}
+ To make our stateful descriptions unambiguous and easier to translate,
+ we need some way for the developer to describe which arguments and
+ results are intended to become stateful.
+
+ Roughly, we have two ways to achieve this:
+ \startitemize[KR]
+ \item Use some kind of annotation method or syntactic construction in
+ the language to indicate exactly which argument and (part of the)
+ result is stateful. This means that the annotation lives
+ \quote{outside} of the function, it is completely invisible when
+ looking at the function body.
+ \item Use some kind of annotation on the type level, \ie, give stateful
+ arguments and stateful (parts of) results a different type. This has the
+ potential to make this annotation visible inside the function as well,
+ such that when looking at a value inside the function body you can
+ tell if it is stateful by looking at its type. This could possibly make
+ the translation process a lot easier, since less analysis of the
+ program flow might be required.
+ \stopitemize
+
+ From these approaches, the type level \quote{annotations} have been
+ implemented in Cλash. \in{Section}[sec:prototype:statetype] expands on
+ the possible ways this could have been implemented.
+
+ \todo{Note about conditions on state variables and checking them}
+
+ \section[sec:recursion]{Recursion}
+ An important concept in functional languages is recursion. In its most basic
+ form, recursion is a definition that is described in terms of itself. A
+ recursive function is thus a function that uses itself in its body. This