+ \subsection{Polymorphism}\label{sec:polymorhpism}
+ A powerful feature of most (functional) programming languages is
+ polymorphism, which allows a function to handle values of different data
+ types in a uniform way. Haskell supports \emph{parametric
+ polymorphism}~\cite{polymorphism}, meaning functions can be written
+ without mention of any specific type and can be used transparently with
+ any number of new types.
+
+ As an example of a parametrically polymorphic function, consider the type of
+ the following \hs{append} function, which appends an element to a
+ vector:\footnote{The \hs{::} operator is used to annotate a function
+ with its type.}
+
+ \begin{code}
+ append :: [a|n] -> a -> [a|n + 1]
+ \end{code}
+
+ This type is parameterized by \hs{a}, which can be instantiated to any
+ type at all. This means that \hs{append} can append an element to a vector,
+ regardless of the type of the elements in the vector (as long as the value
+ to be added has the same type as the values in the vector).
+ This kind of polymorphism is extremely useful in hardware designs to make
+ operations work on a vector without knowing exactly what elements are
+ inside, routing signals without knowing exactly what kinds of signals
+ these are, or working with a vector without knowing exactly how long it
+ is. Polymorphism also plays an important role in most higher-order
+ functions, as we will see in the next section.
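+
+ The behaviour described above can be tried out in plain Haskell. The
+ following is a minimal sketch, assuming that an ordinary list stands in
+ for the fixed-length vector (so the length is not tracked in the type);
+ the names \hs{bits} and \hs{nums} are only illustrative:
+
+ \begin{code}
+ -- Assumption: a plain list replaces the sized vector type.
+ append :: [a] -> a -> [a]
+ append xs x = xs ++ [x]
+
+ -- The same function is used at two different element types.
+ bits :: [Bool]
+ bits = append [True, False] True
+
+ nums :: [Int]
+ nums = append [1, 2] 3
+ \end{code}
+
+ An application such as \hs{append [True, False] 3} is rejected by the type
+ checker, because the new element does not have the element type of the
+ list.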
+
+ Another type of polymorphism is \emph{ad-hoc
+ polymorphism}~\cite{polymorphism}, which refers to polymorphic
+ functions which can be applied to arguments of different types, but which
+ behave differently depending on the type of the argument to which they are
+ applied. In Haskell, ad-hoc polymorphism is achieved through the use of
+ type classes, where a class definition provides the general interface of a
+ function, and class instances define the functionality for the specific
+ types. An example of such a type class is the \hs{Num} class, which
+ contains all of Haskell's numerical operations. A designer can make use
+ of this ad-hoc polymorphism by adding a constraint to a parametrically
+ polymorphic type variable. Such a constraint indicates that the type
+ variable can only be instantiated to a type whose members support the
+ overloaded functions associated with the type class.
+
+ An example of a type signature that includes such a constraint is the
+ signature of the \hs{sum} function, which sums the values in a vector:
+ \begin{code}
+ sum :: Num a => [a|n] -> a
+ \end{code}
+
+ This type is again parameterized by \hs{a}, but it can only be instantiated
+ to types that are \emph{instances} of the \emph{type class} \hs{Num}, so that
+ the compiler knows that the addition (+) operator is defined for that
+ type.
+ % \CLaSH's built-in numerical types are also instances of the \hs{Num}
+ % class.
+ % so we can use the addition operator (and thus the \hs{sum}
+ % function) with \hs{Signed} as well as with \hs{Unsigned}.
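+
+ The effect of the constraint can be sketched in plain Haskell. The sketch
+ below makes several assumptions: a list stands in for the vector, the
+ function is called \hs{sumV} to avoid a clash with the standard \hs{sum},
+ and the standard fixed-width types \hs{Int8} and \hs{Word8} serve as
+ stand-ins for hardware number types:
+
+ \begin{code}
+ import Data.Int  (Int8)
+ import Data.Word (Word8)
+
+ -- Assumption: a plain list replaces the sized vector type.
+ sumV :: Num a => [a] -> a
+ sumV xs = foldl (+) 0 xs
+
+ -- One definition, used at a signed and at an unsigned type.
+ signedTotal :: Int8
+ signedTotal = sumV [-1, 2, 3]
+
+ unsignedTotal :: Word8
+ unsignedTotal = sumV [1, 2, 3]
+ \end{code}
+
+ Applying \hs{sumV} to a list of booleans is a type error, since \hs{Bool}
+ is not an instance of \hs{Num}.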
+
+ \CLaSH\ supports both parametric polymorphism and ad-hoc polymorphism. Any
+ function defined can have any number of unconstrained type parameters. A
+ developer can also specify their own type classes and corresponding
+ instances. The \CLaSH\ compiler will infer the type of every polymorphic
+ argument depending on how the function is applied. There is however one
+ constraint: the top-level function that is being translated cannot have
+ any polymorphic arguments. The arguments of the top-level function cannot
+ be polymorphic, as this function is never applied and consequently there
+ is no way to determine the actual types for the type parameters.
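+
+ The restriction on the top-level function can be illustrated with a small
+ sketch in plain Haskell. The name \hs{topLevel} and the choice of
+ \hs{Int8} are assumptions made for this illustration only; the point is
+ that the polymorphic helper receives concrete types through the
+ monomorphic function that uses it:
+
+ \begin{code}
+ import Data.Int (Int8)
+
+ -- A polymorphic helper: usable at any numeric type.
+ mac :: Num a => a -> a -> a -> a
+ mac a b c = a * b + c
+
+ -- Hypothetical top-level function: every type is concrete,
+ -- which fixes the type parameter of the helper to Int8.
+ topLevel :: Int8 -> Int8 -> Int8 -> Int8
+ topLevel = mac
+ \end{code}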
+
+ With regard to the built-in types, it should be noted that members of
+ some of the standard Haskell type classes are supported as built-in
+ functions. These include: the numerical operators of \hs{Num}, the equality
+ operators of \hs{Eq}, and the comparison/order operators of \hs{Ord}.
+
+ \subsection{Higher-order functions \& values}
+ Another powerful abstraction mechanism in functional languages is
+ the concept of \emph{functions as first-class values}, which enables
+ \emph{higher-order functions}. This allows a function to be treated as a
+ value and be passed around, even as the argument of another
+ function. The following example should clarify this concept:
+
+ \hspace{-1.7em}
+ \begin{minipage}{0.93\linewidth}
+ %format not = "\mathit{not}"
+ \begin{code}
+ negateVector xs = map not xs
+ \end{code}
+ \end{minipage}
+ \begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{lst:code6}
+ \end{example}
+ \end{minipage}
+
+ The code above defines the \hs{negateVector} function, which takes a
+ vector of booleans, \hs{xs}, and returns a vector where all the values are
+ negated. It achieves this by calling the \hs{map} function, and passing it
+ \emph{another function}, boolean negation, and the vector of booleans,
+ \hs{xs}. The \hs{map} function applies the negation function to all the
+ elements in the vector.
+
+ The \hs{map} function is called a higher-order function, since it takes
+ another function as an argument. Also note that \hs{map} is again a
+ parametrically polymorphic function: it does not pose any constraints on the
+ type of the input vector, other than that its elements must have the same
+ type as the first argument of the function passed to \hs{map}. The element
+ type of the resulting vector is equal to the return type of the function
+ passed, which need not necessarily be the same as the element type of the
+ input vector. All of these characteristics can readily be inferred from
+ the type signature belonging to \hs{map}:
+
+ \begin{code}
+ map :: (a -> b) -> [a|n] -> [b|n]
+ \end{code}
+
+ So far, only functions have been used as higher-order values. In
+ Haskell, there are two more ways to obtain a function-typed value:
+ partial application and lambda abstraction. Partial application
+ means that a function that takes multiple arguments can be applied
+ to a single argument, and the result will again be a function (but
+ one that takes one argument fewer). As an example, consider the following
+ expression, which adds one to every element of a vector:
+
+ \hspace{-1.7em}
+ \begin{minipage}{0.93\linewidth}
+ \begin{code}
+ map (add 1) xs
+ \end{code}
+ \end{minipage}
+ \begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{lst:code7}
+ \end{example}
+ \end{minipage}
+
+ Here, the expression \hs{(add 1)} is the partial application of the
+ addition function to the value \hs{1}, which is again a function that
+ adds one to its (next) argument. A lambda expression allows one to
+ introduce an anonymous function in any expression. Consider the following
+ expression, which again adds one to every element of a vector:
+
+ \hspace{-1.7em}
+ \begin{minipage}{0.93\linewidth}
+ \begin{code}
+ map (\x -> x + 1) xs
+ \end{code}
+ \end{minipage}
+ \begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{lst:code8}
+ \end{example}
+ \end{minipage}
+
+ Finally, not only built-in functions (such as the \hs{map} function) can
+ have higher-order arguments; any function defined in \CLaSH\ may have
+ functions as arguments. This gives the circuit designer a powerful
+ mechanism for code reuse. The only exception is again the top-level
+ function: if a function-typed argument is not supplied with an actual
+ function, no hardware can be generated.
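+
+ The three ways of obtaining a function-typed value can be compared side by
+ side in plain Haskell. This is a minimal sketch under some assumptions:
+ lists stand in for vectors, \hs{add} is taken to be ordinary addition, and
+ \hs{twice} is a hypothetical user-defined higher-order function:
+
+ \begin{code}
+ -- Assumption: add is plain addition on Int.
+ add :: Int -> Int -> Int
+ add = (+)
+
+ -- A named function passed as an argument.
+ negateVector :: [Bool] -> [Bool]
+ negateVector xs = map not xs
+
+ -- Partial application and a lambda abstraction; both add one.
+ incPartial :: [Int] -> [Int]
+ incPartial xs = map (add 1) xs
+
+ incLambda :: [Int] -> [Int]
+ incLambda xs = map (\x -> x + 1) xs
+
+ -- A user-defined higher-order function: apply f two times.
+ twice :: (a -> a) -> a -> a
+ twice f = f . f
+ \end{code}
+
+ Here \hs{incPartial} and \hs{incLambda} compute the same result for every
+ input, and \hs{twice (add 1)} is a function that adds two.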
+
+ % \comment{TODO: Describe ALU example (no code)}
+
+ \subsection{State}
+ A very important concept in hardware is the concept of state. In a
+ stateful design, the outputs depend on the history of the inputs, or the
+ state. State is usually stored in registers, which retain their value
+ during a clock cycle. As we want to describe more than simple
+ combinational designs, \CLaSH\ needs an abstraction mechanism for state.
+
+ An important property in Haskell, and in most other functional languages,
+ is \emph{purity}. A function is said to be \emph{pure} if it satisfies two
+ conditions:
+ \begin{inparaenum}
+ \item given the same arguments twice, it should return the same value in
+ both cases, and
+ \item that the function has no observable side-effects.
+ \end{inparaenum}
+ % This purity property is important for functional languages, since it
+ % enables all kinds of mathematical reasoning that could not be guaranteed
+ % correct for impure functions.
+ Pure functions are as such a perfect match for combinational circuits,
+ where the output solely depends on the inputs. When a circuit has state
+ however, it can no longer be simply described by a pure function.
+ % Simply removing the purity property is not a valid option, as the
+ % language would then lose many of it mathematical properties.
+ In \CLaSH\ we deal with the concept of state in pure functions by making
+ the current state an additional argument of the function, and the
+ updated state part of the result. In this sense the descriptions made in
+ \CLaSH\ are the combinational parts of a Mealy machine.
+
+ A simple example is adding an accumulator register to the earlier
+ multiply-accumulate circuit, of which the resulting netlist can be seen in
+ \Cref{img:mac-state}:
+
+ \hspace{-1.7em}
+ \begin{minipage}{0.93\linewidth}
+ \begin{code}
+ macS (State c) a b = (State c', c')
+   where
+     c' = mac a b c
+ \end{code}
+ \end{minipage}
+ \begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{lst:code9}
+ \end{example}
+ \end{minipage}
+
+ \begin{figure}
+ \centerline{\includegraphics{mac-state.svg}}
+ \caption{Stateful Multiply-Accumulate}
+ \label{img:mac-state}
+ \vspace{-1.5em}
+ \end{figure}
+
+ Note that the \hs{macS} function returns both the new state and the value
+ of the output port. The \hs{State} keyword indicates which arguments are
+ part of the current state, and what part of the output is part of the
+ updated state. This aspect will also be reflected in the type signature of
+ the function. Abstracting the state of a circuit in this way makes it very
+ explicit: which variables are part of the state is completely determined
+ by the type signature. This approach to state is well suited to be used in
+ combination with the existing code and language features, such as all the
+ choice elements, as state values are just normal values. We can simulate
+ stateful descriptions using the recursive \hs{run} function:
+
+ \hspace{-1.7em}
+ \begin{minipage}{0.93\linewidth}
+ \begin{code}
+ run f s (i : inps) = o : (run f s' inps)
+   where
+     (s', o) = f s i
+ \end{code}
+ \end{minipage}
+ \begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{lst:code10}
+ \end{example}
+ \end{minipage}
+
+ The \hs{(:)} operator is the list construction (cons) operator, where the
+ left-hand side is the head of a list and the right-hand side is the
+ remainder of the list. The \hs{run} function applies the function the
+ developer wants to simulate, \hs{f}, to the current state, \hs{s}, and the
+ first input value, \hs{i}. The result is the first output value, \hs{o},
+ and the updated state \hs{s'}. The next iteration of the \hs{run} function
+ is then called with the updated state, \hs{s'}, and the rest of the
+ inputs, \hs{inps}. For the time being, and in the context of this paper,
+ it is assumed that there is one input per clock cycle. Also note how the
+ order of the input, output, and state in the \hs{run} function corresponds
+ with the order of the input, output and state of the \hs{macS} function
+ described earlier.
+
+ As the \hs{run} function, the hardware description, and the test
+ inputs are all valid Haskell, the complete simulation can be compiled to
+ an executable binary by an optimizing Haskell compiler, or executed in a
+ Haskell interpreter. Both simulation paths are much faster than first
+ translating the description to \VHDL\ and then running a \VHDL\
+ simulation.
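+
+ Such a simulation can be sketched as a self-contained Haskell program.
+ The sketch makes a few assumptions that are not part of the description
+ above: \hs{State} is modelled as a simple wrapper type, \hs{mac} is given
+ a plain arithmetic definition on \hs{Int}, the two inputs of \hs{macS}
+ are combined into a pair so that there is one input value per clock
+ cycle, and \hs{run} is given an extra clause so that it stops when the
+ inputs run out:
+
+ \begin{code}
+ -- Assumption: State is a plain wrapper around the state value.
+ newtype State s = State s
+
+ mac :: Int -> Int -> Int -> Int
+ mac a b c = a * b + c
+
+ -- Both inputs of one clock cycle are passed as a pair.
+ macS :: State Int -> (Int, Int) -> (State Int, Int)
+ macS (State c) (a, b) = (State c', c')
+   where
+     c' = mac a b c
+
+ run :: (s -> i -> (s, o)) -> s -> [i] -> [o]
+ run _ _ []         = []   -- added: no inputs left
+ run f s (i : inps) = o : run f s' inps
+   where
+     (s', o) = f s i
+
+ -- Three clock cycles, starting from an accumulator of zero.
+ outputs :: [Int]
+ outputs = run macS (State 0) [(1, 2), (3, 4), (5, 6)]
+ \end{code}
+
+ With these inputs the accumulator takes the values 2, 14 and 44, which is
+ also the list of outputs.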
+
+\section{The \CLaSH\ compiler}
+An important aspect of this research is the creation of the prototype
+compiler, which allows us to translate descriptions made in the \CLaSH\
+language as described in the previous section to synthesizable \VHDL.
+% , allowing a designer to actually run a \CLaSH\ design on an \acro{FPGA}.
+
+The Glasgow Haskell Compiler (\GHC)~\cite{ghc} is an open-source Haskell
+compiler that also provides a high-level API to most of its internals. The
+availability of this high-level API obviated the need to design many of the
+tedious parts of the prototype compiler, such as the parser, semantics
+checker, and especially the type-checker. These parts together form the
+front-end of the prototype compiler pipeline, as seen in
+\Cref{img:compilerpipeline}.
+
+\begin{figure}
+\centerline{\includegraphics{compilerpipeline.svg}}
+\caption{\CLaSHtiny\ compiler pipeline}
+\label{img:compilerpipeline}
+\vspace{-1.5em}
+\end{figure}
+
+The output of the \GHC\ front-end consists of the translation of the original
+Haskell description in \emph{Core}~\cite{Sulzmann2007}, which is a smaller,
+typed, functional language. This \emph{Core} language is relatively easy to
+process compared to the larger Haskell language. A description in \emph{Core}
+can still contain elements which have no direct translation to hardware, such
+as polymorphic types and function-valued arguments. Such a description needs
+to be transformed to a \emph{normal form}, which only contains elements that
+have a direct translation. The second stage of the compiler, the
+\emph{normalization} phase, exhaustively applies a set of
+\emph{meaning-preserving} transformations on the \emph{Core} description until
+this description is in a \emph{normal form}. This set of transformations
+includes transformations typically found in reduction systems and lambda
+calculus~\cite{lambdacalculus}, such as $\beta$-reduction and
+$\eta$-expansion. It also includes self-defined transformations that are
+responsible for the reduction of higher-order functions to `regular'
+first-order functions, and specializing polymorphic types to concrete types.
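+
+As a small textbook illustration of the two standard transformations
+mentioned above (a sketch of the general rules, not of the compiler's
+actual implementation), $\beta$-reduction substitutes the argument of an
+applied lambda abstraction into its body, whereas $\eta$-expansion wraps a
+function-valued expression $f$ in an explicit lambda abstraction, provided
+that $x$ does not occur free in $f$:
+\[
+(\lambda x.\; x + 1)\; 2 \;\rightarrow_{\beta}\; 2 + 1
+\qquad\qquad
+f \;\rightarrow_{\eta}\; \lambda x.\; f\; x
+\]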
+
+The final step in the compiler pipeline is the translation to a \VHDL\
+\emph{netlist}, which is a straightforward process due to the resemblance
+between a normalized description and a set of concurrent signal assignments. We call the
+end-product of the \CLaSH\ compiler a \VHDL\ \emph{netlist} as the resulting
+\VHDL\ resembles an actual netlist description and not idiomatic \VHDL.
+
+\section{Use cases}
+\label{sec:usecases}
+\subsection{FIR Filter}
+An example of a common hardware design where the use of higher-order
+functions leads to a very natural description is a \acro{FIR} filter, which is
+basically the dot-product of two vectors:
+
+\begin{equation}
+y_t = \sum\nolimits_{i = 0}^{n - 1} {x_{t - i} \cdot h_i }
+\end{equation}
+
+A \acro{FIR} filter multiplies fixed constants ($h$) with the current
+and a few previous input samples ($x$). The resulting products are
+summed to produce the result at time $t$. The equation of a \acro{FIR}
+filter is indeed equivalent to the equation of the dot-product, which is
+shown below:
+
+\begin{equation}
+\mathbf{a}\bullet\mathbf{b} = \sum\nolimits_{i = 0}^{n - 1} {a_i \cdot b_i }
+\end{equation}
+
+We can easily and directly implement the equation for the dot-product
+using higher-order functions:
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+as *+* bs = foldl1 (+) (zipWith (*) as bs)
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{lst:code13}
+ \end{example}
+\end{minipage}
+
+The \hs{zipWith} function is very similar to the \hs{map} function seen
+earlier: it takes a function and two vectors, and then applies the function
+pairwise to the elements of the two vectors (\emph{e.g.}, \hs{zipWith (*)
+[1, 2] [3, 4]} becomes \hs{[1 * 3, 2 * 4]}).
+
+The \hs{foldl1} function takes a binary function, a single vector, and applies
+the function to the first two elements of the vector. It then applies the
+function to the result of the first application and the next element in the
+vector. This continues until the end of the vector is reached. The result of
+the \hs{foldl1} function is the result of the last application. It is obvious
+that the \hs{zipWith (*)} function is pairwise multiplication and that the
+\hs{foldl1 (+)} function is summation.
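+
+The dot-product can be checked on a small example in plain Haskell. The
+following is a minimal sketch, assuming that lists stand in for vectors;
+\hs{foldl1} and \hs{zipWith} are the standard Prelude functions, and the
+name \hs{example} is only illustrative:
+
+\begin{code}
+-- Assumption: plain lists replace the sized vector type.
+(*+*) :: Num a => [a] -> [a] -> a
+as *+* bs = foldl1 (+) (zipWith (*) as bs)
+
+-- zipWith (*) gives [4, 10, 18]; foldl1 (+) sums this to 32.
+example :: Int
+example = [1, 2, 3] *+* [4, 5, 6]
+\end{code}
+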
+% Returning to the actual \acro{FIR} filter, we will slightly change the
+% equation describing it, so as to make the translation to code more obvious and
+% concise. What we do is change the definition of the vector of input samples
+% and delay the computation by one sample. Instead of having the input sample
+% received at time $t$ stored in $x_t$, $x_0$ now always stores the newest
+% sample, and $x_i$ stores the $ith$ previous sample. This changes the equation
+% to the following (note that this is completely equivalent to the original
+% equation, just with a different definition of $x$ that will better suit the
+% transformation to code):
+%
+% \begin{equation}
+% y_t = \sum\nolimits_{i = 0}^{n - 1} {x_i \cdot h_i }
+% \end{equation}
+The complete definition of the \acro{FIR} filter in code then becomes:
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+fir (State (xs,hs)) x =
+ (State (x >> xs,hs), (x +> xs) *+* hs)
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{lst:code14}
+ \end{example}
+\end{minipage}
+
+Where the vector \hs{xs} contains the previous input samples, the vector
+\hs{hs} contains the \acro{FIR} coefficients, and \hs{x} is the current input
+sample. The concatenate operator (\hs{+>}) creates a new vector by placing the
+current sample (\hs{x}) in front of the previous samples vector (\hs{xs}). The
+code for the shift (\hs{>>}) operator, which adds the new input sample (\hs{x})
+to the vector of previous input samples (\hs{xs}) and removes the oldest sample,
+is shown below:
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+x >> xs = x +> init xs
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{lst:code15}
+ \end{example}
+\end{minipage}
+
+Where the \hs{init} function returns all but the last element of a vector.
+The resulting netlist of a 4-tap \acro{FIR} filter, created by specializing
+the vectors of the \acro{FIR} code to a length of 4, is depicted in
+\Cref{img:4tapfir}.
+
+\begin{figure}
+\centerline{\includegraphics{4tapfir.svg}}
+\caption{4-tap \acrotiny{FIR} Filter}
+\label{img:4tapfir}
+\vspace{-1.5em}
+\end{figure}
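+
+The behaviour of the filter can be checked by simulating its impulse
+response in plain Haskell. The sketch below rests on several assumptions:
+lists stand in for vectors, \hs{State} is modelled as a simple wrapper
+type, the Prelude operator \hs{>>} is hidden so that the shift operator
+can reuse its name, and a \hs{run} function with a clause for the empty
+input list drives the simulation. The state holds three previous samples
+and four coefficients:
+
+\begin{code}
+import Prelude hiding ((>>))
+
+-- Assumption: State is a plain wrapper around the state value.
+newtype State s = State s
+
+(+>) :: a -> [a] -> [a]
+x +> xs = x : xs
+
+(>>) :: a -> [a] -> [a]
+x >> xs = x +> init xs
+
+(*+*) :: Num a => [a] -> [a] -> a
+as *+* bs = foldl1 (+) (zipWith (*) as bs)
+
+fir :: Num a => State ([a], [a]) -> a -> (State ([a], [a]), a)
+fir (State (xs, hs)) x =
+  (State (x >> xs, hs), (x +> xs) *+* hs)
+
+run :: (s -> i -> (s, o)) -> s -> [i] -> [o]
+run _ _ []         = []
+run f s (i : inps) = o : run f s' inps
+  where
+    (s', o) = f s i
+
+-- A single one followed by zeros reproduces the coefficients.
+impulseResponse :: [Int]
+impulseResponse =
+  run fir (State ([0, 0, 0], [1, 2, 3, 4])) [1, 0, 0, 0, 0]
+\end{code}
+
+The outputs are 1, 2, 3, 4 and 0: the impulse travels through the sample
+window and is multiplied by each coefficient in turn.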
+
+\subsection{Higher-order CPU}
+The following simple \acro{CPU} is an example of user-defined higher-order
+functions and pattern matching. The \acro{CPU} consists of four function
+units, of which three have a fixed function and one can perform certain less
+common operations.
+
+The \acro{CPU} contains a number of data sources, represented by the
+horizontal wires in \Cref{img:highordcpu}. These data sources offer the
+previous outputs of each of the function units, along with the single data input the
+\acro{CPU} has and two fixed initialization values.
+
+Each of the function units has both its operands connected to all data
+sources, and can be programmed to select any data source for either
+operand. In addition, the leftmost function unit has an additional
+opcode input to select the operation it performs. The output of the rightmost
+function unit is also the output of the entire \acro{CPU}.
+
+Looking at the code, the function unit (\hs{fu}) is the simplest. It
+arranges the operand selection for the function unit. Note that it does not
+define the actual operation that takes place inside the function unit,
+but simply accepts the (higher-order) argument \hs{op} which is a function
+of two arguments that defines the operation.
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+fu op inputs (addr1, addr2) = regIn
+ where
+ in1 = inputs!addr1
+ in2 = inputs!addr2
+ regIn = op in1 in2
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{lst:code16}
+ \end{example}
+\end{minipage}
+
+The \hs{multiop} function defines the operation that takes place in the
+leftmost function unit. It is essentially a simple three-operation \acro{ALU}
+that makes good use of pattern matching and guards in its description.
+The \hs{shift} function used here shifts its first operand by the number
+of bits indicated in the second operand; the \hs{xor} function produces
+the bitwise xor of its operands.
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+data Opcode = Shift | Xor | Equal
+
+multiop :: Opcode -> Word -> Word -> Word
+multiop Shift a b = shift a b
+multiop Xor a b = xor a b
+multiop Equal a b | a == b = 1
+ | otherwise = 0
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{lst:code17}
+ \end{example}
+\end{minipage}
+
+The \acro{CPU} function ties everything together. It applies the \hs{fu}
+function four times, to create a different function unit each time. The
+first application is interesting, because it does not just pass a
+function to \hs{fu}, but a partial application of \hs{multiop}. This
+shows how the first function unit effectively gets an extra input,
+compared to the others.
+
+The vector \hs{inputs} is the set of data sources, which is passed to
+each function unit as a set of possible operands. The \acro{CPU} also receives
+a vector of address pairs, which are used by each function unit to select
+its operands. The application of the function units to the \hs{inputs} and
+\hs{addrs} arguments seems quite repetitive and could be rewritten to use
+a combination of the \hs{map} and \hs{zipWith} functions instead.
+However, the prototype compiler does not currently support working with lists
+of functions, so a more explicit version of the code is given instead.
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+type CpuState = State [Word | 4]
+
+cpu :: CpuState -> Word -> [(Index 6, Index 6) | 4]
+ -> Opcode -> (CpuState, Word)
+cpu (State s) input addrs opc = (State s', out)
+ where
+ s' = [ fu (multiop opc) inputs (addrs!0)
+ , fu add inputs (addrs!1)
+ , fu sub inputs (addrs!2)
+ , fu mul inputs (addrs!3)
+ ]
+ inputs = 0 +> (1 +> (input +> s))
+ out = last s'
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{lst:code18}
+ \end{example}
+\end{minipage}
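+
+The repetition in the application of the function units can be avoided in
+ordinary Haskell, even though the prototype compiler does not support it.
+The following is a minimal sketch under stated assumptions: lists replace
+vectors, \hs{Int} replaces \hs{Word}, the standard arithmetic operators
+replace \hs{add}, \hs{sub} and \hs{mul}, \hs{shift} and \hs{xor} are taken
+from \hs{Data.Bits}, and the name \hs{cpuStep} is hypothetical. It only
+computes the new register contents and is meant for simulation, not for
+translation to hardware:
+
+\begin{code}
+import Data.Bits (shift, xor)
+
+data Opcode = Shift | Xor | Equal
+
+multiop :: Opcode -> Int -> Int -> Int
+multiop Shift a b = shift a b
+multiop Xor   a b = xor a b
+multiop Equal a b
+  | a == b    = 1
+  | otherwise = 0
+
+-- Select two operands by address and apply the given operation.
+fu :: (Int -> Int -> Int) -> [Int] -> (Int, Int) -> Int
+fu op inputs (addr1, addr2) = op (inputs !! addr1) (inputs !! addr2)
+
+-- One clock cycle: pair each operation with its address pair.
+cpuStep :: [Int] -> Int -> [(Int, Int)] -> Opcode -> [Int]
+cpuStep s input addrs opc =
+  zipWith (\op addr -> fu op inputs addr) ops addrs
+  where
+    ops    = [multiop opc, (+), (-), (*)]
+    inputs = 0 : 1 : input : s
+\end{code}
+
+For example, with all registers at zero, an input of 5, the address pairs
+\hs{[(2,1), (2,1), (2,1), (2,2)]} and the opcode \hs{Xor}, the new register
+contents are \hs{[4, 6, 4, 25]}.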
+
+This is still a simple example, but it could form the basis
+of an actual design, in which the same techniques can be reused.