+ The \hs{State} keyword indicates which arguments are part of the current
+ state, and what part of the output is part of the updated state. This
+ aspect will also be reflected in the type signature of the function.
+ Abstracting the state of a circuit in this way makes it very explicit:
+ which variables are part of the state is completely determined by the
+ type signature. This approach to state is well suited to be used in
+ combination with the existing code and language features, such as all the
+ choice elements, as state values are just normal values. We can simulate
+ stateful descriptions using the recursive \hs{run} function:
+
+ \begin{code}
+ run f s (i : inps) = o : (run f s' inps)
+ where
+ (s', o) = f s i
+ \end{code}
+
+ The \hs{(:)} operator is the list concatenation operator, where the
+ left-hand side is the head of a list and the right-hand side is the
+ remainder of the list. The \hs{run} function applies the function the
+ developer wants to simulate, \hs{f}, to the current state, \hs{s}, and the
+ first input value, \hs{i}. The result is the first output value, \hs{o},
+ and the updated state \hs{s'}. The next iteration of the \hs{run} function
+ is then called with the updated state, \hs{s'}, and the rest of the
+ inputs, \hs{inps}. It is assumed that there is one input per clock cycle.
+ Also note how the order of the input, output, and state in the \hs{run}
+ function corresponds with the order of the input, output and state of the
+ \hs{macS} function described earlier.
+
+ As both the \hs{run} function, the hardware description, and the test
+ inputs are plain Haskell, the complete simulation can be compiled to an
+ executable binary by an optimizing Haskell compiler, or executed in an
+ Haskell interpreter. Both simulation paths are much faster than first
+ translating the description to \VHDL\ and then running a \VHDL\
+ simulation, where the executable binary has an additional simulation speed
+ bonus in case there is a large set of test inputs.
+
+\section{\CLaSH\ compiler}
+An important aspect in this research is the creation of the prototype
+compiler, which allows us to translate descriptions made in the \CLaSH\
+language as described in the previous section to synthesizable \VHDL, allowing
+a designer to actually run a \CLaSH\ design on an \acro{FPGA}.
+
+The Glasgow Haskell Compiler (\GHC) is an open-source Haskell compiler that
+also provides a high level API to most of its internals. The availability of
+this high-level API obviated the need to design many of the tedious parts of
+the prototype compiler, such as the parser, semantic checker, and especially
+the type-checker. The parser, semantic checker, and type-checker together form
+the front-end of the prototype compiler pipeline, as depicted in
+\Cref{img:compilerpipeline}.
+
+\begin{figure}
+\centerline{\includegraphics{compilerpipeline.svg}}
+\caption{\CLaSHtiny\ compiler pipeline}
+\label{img:compilerpipeline}
+\end{figure}
+
+The output of the \GHC\ front-end is the original Haskell description
+translated to \emph{Core}~\cite{Sulzmann2007}, which is smaller, typed,
+functional language that is relatively easier to process than the larger
+Haskell language. A description in \emph{Core} can still contain properties
+which have no direct translation to hardware, such as polymorphic types and
+function-valued arguments. Such a description needs to be transformed to a
+\emph{normal form}, which only contains properties that have a direct
+translation. The second stage of the compiler, the \emph{normalization} phase,
+exhaustively applies a set of \emph{meaning-preserving} transformations on the
+\emph{Core} description until this description is in a \emph{normal form}.
+This set of transformations includes transformations typically found in
+reduction systems for lambda calculus~\cite{lambdacalculus}, such a
+$\beta$-reduction and $\eta$-expansion, but also includes self-defined
+transformations that are responsible for the reduction of higher-order
+functions to `regular' first-order functions.
+
+The final step in the compiler pipeline is the translation to a \VHDL\
+\emph{netlist}, which is a straightforward process due to resemblance of a
+normalized description and a set of concurrent signal assignments. We call the
+end-product of the \CLaSH\ compiler a \VHDL\ \emph{netlist} as the resulting
+\VHDL\ resembles an actual netlist description and not idiomatic \VHDL.
+
+\section{Use cases}
+\label{sec:usecases}
+\subsection{FIR Filter}
+As an example of a common hardware design where the use of higher-order
+functions leads to a very natural description is a \acro{FIR} filter, which is
+basically the dot-product of two vectors:
+
+\begin{equation}
+y_t = \sum\nolimits_{i = 0}^{n - 1} {x_{t - i} \cdot h_i }
+\end{equation}
+
+A \acro{FIR} filter multiplies fixed constants ($h$) with the current
+and a few previous input samples ($x$). Each of these multiplications
+are summed, to produce the result at time $t$. The equation of a \acro{FIR}
+filter is indeed equivalent to the equation of the dot-product, which is
+shown below:
+
+\begin{equation}
+\mathbf{a}\bullet\mathbf{b} = \sum\nolimits_{i = 0}^{n - 1} {a_i \cdot b_i }
+\end{equation}
+
+We can easily and directly implement the equation for the dot-product
+using higher-order functions:
+
+\begin{code}
+as *+* bs = foldl1 (+) (zipWith (*) as bs)
+\end{code}
+
+The \hs{zipWith} function is very similar to the \hs{map} function seen
+earlier: It takes a function, two vectors, and then applies the function to
+each of the elements in the two vectors pairwise (\emph{e.g.}, \hs{zipWith (*)
+[1, 2] [3, 4]} becomes \hs{[1 * 3, 2 * 4]}).
+
+The \hs{foldl1} function takes a binary function, a single vector, and applies
+the function to the first two elements of the vector. It then applies the
+function to the result of the first application and the next element in the
+vector. This continues until the end of the vector is reached. The result of
+the \hs{foldl1} function is the result of the last application. It is obvious
+that the \hs{zipWith (*)} function is pairwise multiplication and that the
+\hs{foldl1 (+)} function is summation.
+
+Returning to the actual \acro{FIR} filter, we will slightly change the
+equation describing it, so as to make the translation to code more obvious and
+concise. What we do is change the definition of the vector of input samples
+and delay the computation by one sample. Instead of having the input sample
+received at time $t$ stored in $x_t$, $x_0$ now always stores the newest
+sample, and $x_i$ stores the $ith$ previous sample. This changes the equation
+to the following (note that this is completely equivalent to the original
+equation, just with a different definition of $x$ that will better suit the
+transformation to code):
+
+\begin{equation}
+y_t = \sum\nolimits_{i = 0}^{n - 1} {x_i \cdot h_i }
+\end{equation}
+
+The complete definition of the \acro{FIR} filter in code then becomes:
+
+\begin{code}
+fir (State (xs,hs)) x = (State (x >> xs,hs), xs *+* hs)
+\end{code}
+
+Where the vector \hs{hs} contains the \acro{FIR} coefficients and the vector
+\hs{xs} contains the previous input sample in front and older samples behind.
+The code for the shift (\hs{>>}) operator, that adds the new input sample
+(\hs{x}) to the list of previous input samples (\hs{xs}) and removes the
+oldest sample, is shown below:
+
+\begin{code}
+x >> xs = x +> init xs
+\end{code}
+
+The \hs{init} function returns all but the last element of a vector, and the
+concatenate operator (\hs{+>}) adds a new element to the front of a vector.
+The resulting netlist of a 4-taps \acro{FIR} filter, created by specializing
+the vectors of the \acro{FIR} code to a length of 4, is depicted in
+\Cref{img:4tapfir}.
+
+\begin{figure}
+\centerline{\includegraphics{4tapfir.svg}}
+\caption{4-taps \acrotiny{FIR} Filter}
+\label{img:4tapfir}
+\end{figure}
+
+\subsection{Higher order CPU}
+
+\begin{code}
+fu op inputs (addr1, addr2) = regIn
+ where
+ in1 = inputs!addr1
+ in2 = inputs!addr2
+ regIn = op in1 in2
+\end{code}
+
+\begin{code}
+type CpuState = State [Word | 4]
+
+cpu :: CpuState -> Word -> [(Index 6, Index 6) | 4]
+ -> (CpuState, Word)
+cpu (State regsOut) input addrs = (State regsIn, out)
+ where
+ regsIn = [ fu const inputs (addrs!0)
+ , fu (+) inputs (addrs!1)
+ , fu (-) inputs (addrs!2)
+ , fu (*) inputs (addrs!3)
+ ]
+ inputs = 0 +> (1 +> (input +> regsOut))
+ out = head regsOut
+\end{code}