+ \end{minipage}
+ \begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:runmacs}
+ \end{example}
+ \end{minipage}
+
+ \begin{figure}
+ \centerline{\includegraphics{mac-state.svg}}
+ \caption{Stateful Multiply-Accumulate}
+ \label{img:mac-state}
+ \vspace{-1.5em}
+ \end{figure}
+
+ The complete simulation can be compiled to an executable binary by a
+ Haskell compiler, or executed in a Haskell interpreter. Both
+ simulation paths require less effort from a circuit designer than first
+ translating the description to \VHDL\ and then running a \VHDL\
+ simulation; both are also very likely to run much faster than a \VHDL\
+ simulation.
+
+\section{The \CLaSH\ compiler}
+\label{sec:compiler}
+The prototype \CLaSH\ compiler translates descriptions made in the \CLaSH\
+language as described in the previous section to synthesizable \VHDL.
+% , allowing a designer to actually run a \CLaSH\ design on an \acro{FPGA}.
+
+The Glasgow Haskell Compiler (\GHC)~\cite{ghc} is an open-source Haskell
+compiler that also provides a high-level \acro{API} to most of its internals.
+Furthermore, it provides several parts of the prototype compiler for free,
+such as the parser, the semantics checker, and the type checker. These parts
+together form the front-end of the prototype compiler pipeline, as seen in
+\Cref{img:compilerpipeline}.
+
+\begin{figure}
+\vspace{1em}
+\centerline{\includegraphics{compilerpipeline.svg}}
+\caption{\CLaSHtiny\ compiler pipeline}
+\label{img:compilerpipeline}
+\vspace{-1.5em}
+\end{figure}
+
+The output of the \GHC\ front-end consists of the translation of the original
+Haskell description to \emph{Core}~\cite{Sulzmann2007}, which is a small
+typed functional language. This \emph{Core} language is relatively easy to
+process compared to the larger Haskell language. A description in \emph{Core}
+can still contain elements which have no direct translation to hardware, such
+as polymorphic types and function-valued arguments. Such a description needs
+to be transformed to a \emph{normal form}, which corresponds directly to
+hardware. The second stage of the compiler, the \emph{normalization} phase,
+exhaustively applies a set of \emph{meaning-preserving} transformations on the
+\emph{Core} description until this description is in a \emph{normal form}.
+This set of transformations includes transformations typically found in
+reduction systems and lambda calculus~\cite{lambdacalculus}, such as
+$\beta$-reduction and $\eta$-expansion. It also includes transformations that
+specialize higher-order functions to `regular' first-order functions, and
+polymorphic types to concrete types.
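+
+As a small illustration (these instances are chosen for exposition and are
+not taken from the compiler's rule set), $\beta$-reduction substitutes an
+argument into a function body, while $\eta$-expansion makes the argument of a
+function-typed expression $f$ explicit:
+
+\[
+(\lambda x.\, x + 1)\; y \;\rightarrow_{\beta}\; y + 1
+\qquad\qquad
+f \;\rightarrow_{\eta}\; \lambda x.\, f\; x
+\]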
+
+The final step in the compiler pipeline is the translation to a \VHDL\
+\emph{netlist}, which is a straightforward process due to the resemblance
+between a normalized description and a set of concurrent signal assignments.
+The end-product of the \CLaSH\ compiler is called a \VHDL\ \emph{netlist}
+because the result resembles an actual netlist description; that it is
+expressed in \VHDL\ is only an implementation detail, and the output could
+equally have been Verilog or even \acro{EDIF}.
+
+\section{Use cases}
+\label{sec:usecases}
+\subsection{FIR Filter}
+An example of a common hardware design where the relation between
+functional languages and mathematical functions, combined with the use of
+higher-order functions, leads to a very natural description is a \acro{FIR}
+filter:
+
+\begin{equation}
+y_t = \sum\nolimits_{i = 0}^{n - 1} {x_{t - i} \cdot h_i }
+\end{equation}
+
+A \acro{FIR} filter multiplies fixed constants ($h$) with the current
+and a few previous input samples ($x$). The resulting products are summed
+to produce the result at time $t$. The equation of a \acro{FIR}
+filter is equivalent to the equation of the dot-product of two vectors, which
+is shown below:
+
+\begin{equation}
+\mathbf{a}\bullet\mathbf{b} = \sum\nolimits_{i = 0}^{n - 1} {a_i \cdot b_i }
+\end{equation}
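+
+For instance, for $n = 4$ the \acro{FIR} equation expands to a sum of four
+products, which is the dot-product above with
+$\mathbf{a} = (x_t, x_{t-1}, x_{t-2}, x_{t-3})$ and
+$\mathbf{b} = (h_0, h_1, h_2, h_3)$:
+
+\[
+y_t = x_t \cdot h_0 + x_{t-1} \cdot h_1 + x_{t-2} \cdot h_2 + x_{t-3} \cdot h_3
+\]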
+
+The equation for the dot-product is easily and directly implemented using
+higher-order functions:
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+as *+* bs = fold (+) (zip{-"\!\!\!"-}With (*) as bs)
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:dotproduct}
+ \end{example}
+\end{minipage}
+
+The \hs{zip{-"\!\!\!"-}With} function is very similar to the \hs{map} function
+seen earlier: it takes a function and two vectors, and applies the function
+pairwise to the elements of the two vectors (\emph{e.g.},
+\hs{zip{-"\!\!\!"-}With (*) [1, 2] [3, 4]} becomes \hs{[1 * 3, 2 * 4]}).
+
+The \hs{fold} function takes a binary function and a single vector, and
+applies the function to the first two elements of the vector. It then applies
+the function to the result of the first application and the next element in
+the vector. This continues until the end of the vector is reached. The result
+of the \hs{fold} function is the result of the last application. Hence,
+\hs{zip{-"\!\!\!\!"-}With (*)} is pairwise multiplication and \hs{fold (+)} is
+summation.
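+
+A minimal sketch of these two functions on plain Haskell lists is given
+below. It assumes that lists stand in for \CLaSH\ vectors and that \hs{fold}
+behaves like the Prelude function \hs{foldl1}; the names \hs{products} and
+\hs{total} are only illustrative.
+
+\begin{code}
+-- List-based stand-in for fold on vectors (assumption: left-to-right order).
+fold :: (a -> a -> a) -> [a] -> a
+fold = foldl1
+
+-- Pairwise multiplication: [1 * 4, 2 * 5, 3 * 6]
+products :: [Int]
+products = zipWith (*) [1, 2, 3] [4, 5, 6]
+
+-- Summation of the products: (4 + 10) + 18
+total :: Int
+total = fold (+) products
+\end{code}
+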
+% Returning to the actual \acro{FIR} filter, we will slightly change the
+% equation describing it, so as to make the translation to code more obvious and
+% concise. What we do is change the definition of the vector of input samples
+% and delay the computation by one sample. Instead of having the input sample
+% received at time $t$ stored in $x_t$, $x_0$ now always stores the newest
+% sample, and $x_i$ stores the $ith$ previous sample. This changes the equation
+% to the following (note that this is completely equivalent to the original
+% equation, just with a different definition of $x$ that will better suit the
+% transformation to code):
+%
+% \begin{equation}
+% y_t = \sum\nolimits_{i = 0}^{n - 1} {x_i \cdot h_i }
+% \end{equation}
+The complete definition of the \acro{FIR} filter in \CLaSH\ is:
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+fir (State (xs,hs)) x =
+ (State (shiftInto x xs,hs), (x +> xs) *+* hs)
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:fir}
+ \end{example}
+\end{minipage}
+
+where the vector \hs{xs} contains the previous input samples, the vector
+\hs{hs} contains the \acro{FIR} coefficients, and \hs{x} is the current input
+sample. The concatenate operator (\hs{+>}) creates a new vector by placing the
+current sample (\hs{x}) in front of the previous samples vector (\hs{xs}). The
+code for the \hs{shiftInto} function, which adds the new input sample (\hs{x})
+to the vector of previous input samples (\hs{xs}) and removes the oldest
+sample, is shown below:
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+shiftInto x xs = x +> init xs
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:shiftinto}
+ \end{example}
+\end{minipage}
+
+where the \hs{init} function returns all but the last element of a vector.
+The resulting netlist of a 4-tap \acro{FIR} filter, created by specializing
+the vectors of the \acro{FIR} code to a length of 4, is depicted in
+\Cref{img:4tapfir}.
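+
+The behaviour of this filter can be checked with a list-based model in plain
+Haskell. The sketch below is an approximation under stated assumptions, not
+the \CLaSH\ code itself: lists replace fixed-size vectors, \hs{(:)} replaces
+\hs{+>}, a plain tuple replaces the \hs{State} wrapper, and the helper
+\hs{runFir} is hypothetical.
+
+\begin{code}
+import Data.List (mapAccumL)
+
+-- Dot-product on lists; foldl1 stands in for fold.
+(*+*) :: Num a => [a] -> [a] -> a
+as *+* bs = foldl1 (+) (zipWith (*) as bs)
+
+-- Add the newest sample at the front and drop the oldest one.
+shiftInto :: a -> [a] -> [a]
+shiftInto x xs = x : init xs
+
+-- One clock cycle: returns the updated state and the output sample.
+fir :: Num a => ([a], [a]) -> a -> (([a], [a]), a)
+fir (xs, hs) x = ((shiftInto x xs, hs), (x : xs) *+* hs)
+
+-- Hypothetical helper: feed a list of input samples through the filter.
+runFir :: Num a => ([a], [a]) -> [a] -> [a]
+runFir s0 = snd . mapAccumL fir s0
+
+-- Impulse response of a 4-tap filter with three stored samples.
+impulse :: [Int]
+impulse = runFir ([0, 0, 0], [1, 2, 3, 4]) [1, 0, 0, 0, 0]
+\end{code}
+
+For a unit impulse, the output reproduces the coefficients in order and then
+returns to zero.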
+
+\begin{figure}
+\centerline{\includegraphics{4tapfir.svg}}
+\caption{4-tap \acrotiny{FIR} Filter}
+\label{img:4tapfir}
+\vspace{-1.5em}
+\end{figure}
+
+\subsection{Higher-order CPU}
+%format fun x = "\textit{fu}_" x
+This section discusses a somewhat more elaborate example in which user-defined
+higher-order functions, partial application, lambda expressions, and pattern
+matching are exploited. The example concerns a \acro{CPU} which consists of
+four function units, \hs{fun 0,{-"\ldots"-},fun 3} (see
+\Cref{img:highordcpu}), each of which performs some binary operation.
+
+\begin{figure}
+\centerline{\includegraphics{highordcpu.svg}}
+\caption{CPU with higher-order Function Units}
+\label{img:highordcpu}
+\vspace{-1.5em}
+\end{figure}
+
+Every function unit has seven data inputs (of type \hs{Signed 16}), and two
+address inputs (of type \hs{Index 6}) that indicate which data inputs have to
+be chosen as arguments for the binary operation that the unit performs.
+These data inputs consist of one external input \hs{x}, two fixed
+initialization values (0 and 1), and the previous outputs of the four function
+units. The output of the \acro{CPU} as a whole is the previous output of
+\hs{fun 3}.
+
+The function units \hs{fun 1}, \hs{fun 2}, and \hs{fun 3} each perform a fixed
+binary operation, whereas \hs{fun 0} has an additional input for an opcode to
+choose a binary operation out of a few possibilities. Each function unit
+writes its result to a register; together, these registers form the state of
+the \acro{CPU}. This state can, for example, be defined as follows:
+
+\begin{code}
+type CpuState = State [Signed 16 | 4]
+\end{code}