+ \subsection{Polymorphism}
+ A powerful feature of most (functional) programming languages is
+ polymorphism, it allows a function to handle values of different data
+ types in a uniform way. Haskell supports \emph{parametric
+ polymorphism}~\cite{polymorphism}, meaning functions can be written
+ without mention of any specific type and can be used transparently with
+ any number of new types.
+
+ As an example of a parametric polymorphic function, consider the type of
+ the following \hs{append} function, which appends an element to a vector:
+
+ \begin{code}
+ append :: [a|n] -> a -> [a|n + 1]
+ \end{code}
+
+ This type is parameterized by \hs{a}, which can contain any type at
+ all. This means that \hs{append} can append an element to a vector,
+ regardless of the type of the elements in the list (as long as the type of
+ the value to be added is of the same type as the values in the vector).
+ This kind of polymorphism is extremely useful in hardware designs to make
+ operations work on a vector without knowing exactly what elements are
+ inside, routing signals without knowing exactly what kinds of signals
+ these are, or working with a vector without knowing exactly how long it
+ is. Polymorphism also plays an important role in most higher order
+ functions, as we will see in the next section.
+
+ Another type of polymorphism is \emph{ad-hoc
+ polymorphism}~\cite{polymorphism}, which refers to polymorphic
+ functions which can be applied to arguments of different types, but which
+ behave differently depending on the type of the argument to which they are
+ applied. In Haskell, ad-hoc polymorphism is achieved through the use of
+ type classes, where a class definition provides the general interface of a
+ function, and class instances define the functionality for the specific
+ types. An example of such a type class is the \hs{Num} class, which
+ contains all of Haskell's numerical operations. A designer can make use
+ of this ad-hoc polymorphism by adding a constraint to a parametrically
+ polymorphic type variable. Such a constraint indicates that the type
+ variable can only be instantiated to a type whose members supports the
+ overloaded functions associated with the type class.
+
+ As an example we will take a look at type signature of the function
+ \hs{sum}, which sums the values in a vector:
+ \begin{code}
+ sum :: Num a => [a|n] -> a
+ \end{code}
+
+ This type is again parameterized by \hs{a}, but it can only contain
+ types that are \emph{instances} of the \emph{type class} \hs{Num}, so that
+ we know that the addition (+) operator is defined for that type.
+ \CLaSH's built-in numerical types are also instances of the \hs{Num}
+ class, so we can use the addition operator on \hs{SizedWords} as
+ well as on \hs{SizedInts}.
+
+ In \CLaSH, parametric polymorphism is completely supported. Any function
+ defined can have any number of unconstrained type parameters. The \CLaSH\
+ compiler will infer the type of every such argument depending on how the
+ function is applied. There is however one constraint: the top level
+ function that is being translated can not have any polymorphic arguments.
+ The arguments can not be polymorphic as they are never applied and
+ consequently there is no way to determine the actual types for the type
+ parameters.
+
+ \CLaSH\ does not support user-defined type classes, but does use some
+ of the standard Haskell type classes for its built-in function, such as:
+ \hs{Num} for numerical operations, \hs{Eq} for the equality operators, and
+ \hs{Ord} for the comparison/order operators.
+
+ \subsection{Higher-order functions \& values}
+ Another powerful abstraction mechanism in functional languages, is
+ the concept of \emph{higher-order functions}, or \emph{functions as
+ a first class value}. This allows a function to be treated as a
+ value and be passed around, even as the argument of another
+ function. The following example should clarify this concept:
+
+ \begin{code}
+ negateVector xs = map not xs
+ \end{code}
+
+ The code above defines the \hs{negateVector} function, which takes a
+ vector of booleans, \hs{xs}, and returns a vector where all the values are
+ negated. It achieves this by calling the \hs{map} function, and passing it
+ \emph{another function}, boolean negation, and the vector of booleans,
+ \hs{xs}. The \hs{map} function applies the negation function to all the
+ elements in the vector.
+
+ The \hs{map} function is called a higher-order function, since it takes
+ another function as an argument. Also note that \hs{map} is again a
+ parametric polymorphic function: it does not pose any constraints on the
+ type of the vector elements, other than that it must be the same type as
+ the input type of the function passed to \hs{map}. The element type of the
+ resulting vector is equal to the return type of the function passed, which
+ need not necessarily be the same as the element type of the input vector.
+ All of these characteristics can readily be inferred from the type
+ signature belonging to \hs{map}:
+
+ \begin{code}
+ map :: (a -> b) -> [a|n] -> [b|n]
+ \end{code}
+
+ So far, only functions have been used as higher-order values. In
+ Haskell, there are two more ways to obtain a function-typed value:
+ partial application and lambda abstraction. Partial application
+ means that a function that takes multiple arguments can be applied
+ to a single argument, and the result will again be a function (but
+ that takes one argument less). As an example, consider the following
+ expression, that adds one to every element of a vector:
+
+ \begin{code}
+ map (+ 1) xs
+ \end{code}
+
+ Here, the expression \hs{(+) 1} is the partial application of the
+ plus operator to the value \hs{1}, which is again a function that
+ adds one to its argument. A lambda expression allows one to introduce an
+ anonymous function in any expression. Consider the following expression,
+ which again adds one to every element of a vector:
+
+ \begin{code}
+ map (\x -> x + 1) xs
+ \end{code}
+
+ Finally, higher order arguments are not limited to just built-in
+ functions, but any function defined by a developer can have function
+ arguments. This allows the hardware designer to use a powerful
+ abstraction mechanism in his designs and have an optimal amount of
+ code reuse. The only exception is again the top-level function: if a
+ function-typed argument is not applied with an actual function, no
+ hardware can be generated.
+
+ % \comment{TODO: Describe ALU example (no code)}
+
+ \subsection{State}
+ A very important concept in hardware is the concept of state. In a
+ stateful design, the outputs depend on the history of the inputs, or the
+ state. State is usually stored in registers, which retain their value
+ during a clock cycle. As we want to describe more than simple
+ combinatorial designs, \CLaSH\ needs an abstraction mechanism for state.
+
+ An important property in Haskell, and in most other functional languages,
+ is \emph{purity}. A function is said to be \emph{pure} if it satisfies two
+ conditions:
+ \begin{inparaenum}
+ \item given the same arguments twice, it should return the same value in
+ both cases, and
+ \item when the function is called, it should not have observable
+ side-effects.
+ \end{inparaenum}
+ % This purity property is important for functional languages, since it
+ % enables all kinds of mathematical reasoning that could not be guaranteed
+ % correct for impure functions.
+ Pure functions are as such a perfect match for combinatorial circuits,
+ where the output solely depends on the inputs. When a circuit has state
+ however, it can no longer be simply described by a pure function.
+ % Simply removing the purity property is not a valid option, as the
+ % language would then lose many of it mathematical properties.
+ In \CLaSH\ we deal with the concept of state in pure functions by making
+ current value of the state an additional argument of the function and the
+ updated state part of result. In this sense the descriptions made in
+ \CLaSH\ are the combinatorial parts of a mealy machine.
+
+ A simple example is adding an accumulator register to the earlier
+ multiply-accumulate circuit, of which the resulting netlist can be seen in
+ \Cref{img:mac-state}:
+
+ \begin{code}
+ macS (State c) a b = (State c', c')
+ where
+ c' = mac a b c
+ \end{code}
+
+ \begin{figure}
+ \centerline{\includegraphics{mac-state.svg}}
+ \caption{Stateful Multiply-Accumulate}
+ \label{img:mac-state}
+ \end{figure}
+
+ The \hs{State} keyword indicates which arguments are part of the current
+ state, and what part of the output is part of the updated state. This
+ aspect will also be reflected in the type signature of the function.
+ Abstracting the state of a circuit in this way makes it very explicit:
+ which variables are part of the state is completely determined by the
+ type signature. This approach to state is well suited to be used in
+ combination with the existing code and language features, such as all the
+ choice constructs, as state values are just normal values. We can simulate
+ stateful descriptions using the recursive \hs{run} function:
+
+ \begin{code}
+ run f s (i : inps) = o : (run f s' inps)
+ where
+ (s', o) = f s i
+ \end{code}
+
+ The \hs{(:)} operator is the list concatenation operator, where the
+ left-hand side is the head of a list and the right-hand side is the
+ remainder of the list. The \hs{run} function applies the function the
+ developer wants to simulate, \hs{f}, to the current state, \hs{s}, and the
+ first input value, \hs{i}. The result is the first output value, \hs{o},
+ and the updated state \hs{s'}. The next iteration of the \hs{run} function
+ is then called with the updated state, \hs{s'}, and the rest of the
+ inputs, \hs{inps}. It is assumed that there is one input per clock cycle.
+ Also note how the order of the input, output, and state in the \hs{run}
+ function corresponds with the order of the input, output and state of the
+ \hs{macS} function described earlier.
+
+ As both the \hs{run} function, the hardware description, and the test
+ inputs are plain Haskell, the complete simulation can be compiled to an
+ executable binary by an optimizing Haskell compiler, or executed in an
+ Haskell interpreter. Both simulation paths are much faster than first
+ translating the description to \VHDL\ and then running a \VHDL\
+ simulation, where the executable binary has an additional simulation speed
+ bonus in case there is a large set of test inputs.
+
+\section{\CLaSH\ compiler}
+
+The \CLaSH\ language as presented above can be translated to \VHDL\ using
+the prototype \CLaSH\ compiler. This compiler allows experimentation with
+the \CLaSH\ language and allows for running \CLaSH\ designs on actual FPGA
+hardware.
+
+\begin{figure}
+\centerline{\includegraphics{compilerpipeline.svg}}
+\caption{\CLaSHtiny\ compiler pipeline}
+\label{img:compilerpipeline}
+\end{figure}
+
+The prototype heavily uses \GHC, the Glasgow Haskell Compiler.
+\Cref{img:compilerpipeline} shows the \CLaSH\ compiler pipeline. As you can
+see, the front-end is completely reused from \GHC, which allows the \CLaSH\
+prototype to support most of the Haskell Language. The \GHC\ front-end
+produces the program in the \emph{Core}~\cite{Sulzmann2007} format, which is a very small,
+functional, typed language which is relatively easy to process.
+
+The second step in the compilation process is \emph{normalization}. This
+step runs a number of \emph{meaning preserving} transformations on the
+Core program, to bring it into a \emph{normal form}. This normal form
+has a number of restrictions that make the program similar to hardware.
+In particular, a program in normal form no longer has any polymorphism
+or higher order functions.
+
+The final step is a simple translation to \VHDL.
+
+\section{Use cases}
+\label{sec:usecases}
+As an example of a common hardware design where the use of higher-order
+functions leads to a very natural description is a FIR filter, which is
+basically the dot-product of two vectors:
+
+\begin{equation}
+y_t = \sum\nolimits_{i = 0}^{n - 1} {x_{t - i} \cdot h_i }
+\end{equation}
+
+A FIR filter multiplies fixed constants ($h$) with the current
+and a few previous input samples ($x$). Each of these multiplications
+are summed, to produce the result at time $t$. The equation of a FIR
+filter is indeed equivalent to the equation of the dot-product, which is
+shown below:
+
+\begin{equation}
+\mathbf{x}\bullet\mathbf{y} = \sum\nolimits_{i = 0}^{n - 1} {x_i \cdot y_i }
+\end{equation}
+
+We can easily and directly implement the equation for the dot-product
+using higher-order functions:
+
+\begin{code}
+xs *+* ys = foldl1 (+) (zipWith (*) xs hs)
+\end{code}
+
+The \hs{zipWith} function is very similar to the \hs{map} function seen
+earlier: It takes a function, two vectors, and then applies the function to
+each of the elements in the two vectors pairwise (\emph{e.g.}, \hs{zipWith (*)
+[1, 2] [3, 4]} becomes \hs{[1 * 3, 2 * 4]} $\equiv$ \hs{[3,8]}).
+
+The \hs{foldl1} function takes a function, a single vector, and applies
+the function to the first two elements of the vector. It then applies the
+function to the result of the first application and the next element from
+the vector. This continues until the end of the vector is reached. The
+result of the \hs{foldl1} function is the result of the last application.
+As you can see, the \hs{zipWith (*)} function is just pairwise
+multiplication and the \hs{foldl1 (+)} function is just summation.
+
+Returning to the actual FIR filter, we will slightly change the
+equation belong to it, so as to make the translation to code more obvious.
+What we will do is change the definition of the vector of input samples.
+So, instead of having the input sample received at time
+$t$ stored in $x_t$, $x_0$ now always stores the current sample, and $x_i$
+stores the $ith$ previous sample. This changes the equation to the
+following (Note that this is completely equivalent to the original
+equation, just with a different definition of $x$ that will better suit
+the transformation to code):
+
+\begin{equation}
+y_t = \sum\nolimits_{i = 0}^{n - 1} {x_i \cdot h_i }
+\end{equation}
+
+Consider that the vector \hs{hs} contains the FIR coefficients and the
+vector \hs{xs} contains the current input sample in front and older
+samples behind. The function that shifts the input samples is shown below:
+
+\begin{code}
+x >> xs = x +> tail xs
+\end{code}
+
+Where the \hs{tail} function returns all but the first element of a
+vector, and the concatenate operator ($\succ$) adds a new element to the
+left of a vector. The complete definition of the FIR filter then becomes:
+
+\begin{code}
+fir (State (xs,hs)) x = (State (x >> xs,hs), xs *+* hs)
+\end{code}
+
+The resulting netlist of a 4-taps FIR filter based on the above definition
+is depicted in \Cref{img:4tapfir}.
+
+\begin{figure}
+\centerline{\includegraphics{4tapfir.svg}}
+\caption{4-taps \acrotiny{FIR} Filter}
+\label{img:4tapfir}
+\end{figure}
+
+
+\subsection{Higher order CPU}
+
+
+\begin{code}
+type FuState = State Word
+fu :: (a -> a -> a)
+ -> [a]:n
+ -> (RangedWord n, RangedWord n)
+ -> FuState
+ -> (FuState, a)
+fu op inputs (addr1, addr2) (State out) =
+ (State out', out)
+ where
+ in1 = inputs!addr1
+ in2 = inputs!addr2
+ out' = op in1 in2
+\end{code}
+
+\begin{code}
+type CpuState = State [FuState]:4
+cpu :: Word
+ -> [(RangedWord 7, RangedWord 7)]:4
+ -> CpuState
+ -> (CpuState, Word)
+cpu input addrs (State fuss) =
+ (State fuss', out)
+ where
+ fures = [ fu const inputs!0 fuss!0
+ , fu (+) inputs!1 fuss!1
+ , fu (-) inputs!2 fuss!2
+ , fu (*) inputs!3 fuss!3
+ ]
+ (fuss', outputs) = unzip fures
+ inputs = 0 +> 1 +> input +> outputs
+ out = head outputs
+\end{code}