- Translation of two most basic functional concepts has been
- discussed: function application and choice. Before looking further
- into less obvious concepts like higher-order expressions and
- polymorphism, the possible types that can be used in hardware
- descriptions will be discussed.
-
- Some way is needed to translate every values used to its hardware
- equivalents. In particular, this means a hardware equivalent for
- every \emph{type} used in a hardware description is needed
-
- Since most functional languages have a lot of standard types that
- are hard to translate (integers without a fixed size, lists without
- a static length, etc.), a number of \quote{built-in} types will be
- defined first. These types are built-in in the sense that our
- compiler will have a fixed \VHDL\ type for these. User defined types,
- on the other hand, will have their hardware type derived directly
- from their Haskell declaration automatically, according to the rules
- sketched here.
-
- \subsection{Built-in types}
- The language currently supports the following built-in types. Of these,
- only the \hs{Bool} type is supported by Haskell out of the box (the
- others are defined by the \CLaSH\ package, so they are user-defined types
- from Haskell's point of view).
-
- \begin{description}
- \item[\hs{Bit}]
- This is the most basic type available. It is mapped directly onto
- the \texttt{std\_logic} \VHDL\ type. Mapping this to the
- \texttt{bit} type might make more sense (since the Haskell version
- only has two values), but using \texttt{std\_logic} is more standard
- (and allowed for some experimentation with don't care values)
-
- \item[\hs{Bool}]
- This is the only built-in Haskell type supported and is translated
- exactly like the Bit type (where a value of \hs{True} corresponds to a
- value of \hs{High}). Supporting the Bool type is particularly
- useful to support \hs{if ... then ... else ...} expressions, which
- always have a \hs{Bool} value for the condition.
-
- A \hs{Bool} is translated to a \texttt{std\_logic}, just like \hs{Bit}.
- \item[\hs{SizedWord}, \hs{SizedInt}]
- These are types to represent integers. A \hs{SizedWord} is unsigned,
- while a \hs{SizedInt} is signed. These types are parametrized by a
- length type, so you can define an unsigned word of 32 bits wide as
- ollows:
-
- \begin{verbatim}
- type Word32 = SizedWord D32
- \end{verbatim}
-
- Here, a type synonym \hs{Word32} is defined that is equal to the
- \hs{SizedWord} type constructor applied to the type \hs{D32}. \hs{D32}
- is the \emph{type level representation} of the decimal number 32,
- making the \hs{Word32} type a 32-bit unsigned word.
-
- These types are translated to the \VHDL\ \texttt{unsigned} and
- \texttt{signed} respectively.
- \item[\hs{Vector}]
- This is a vector type, that can contain elements of any other type and
- has a fixed length. It has two type parameters: its
- length and the type of the elements contained in it. By putting the
- length parameter in the type, the length of a vector can be determined
- at compile time, instead of only at run-time for conventional lists.
-
- The \hs{Vector} type constructor takes two type arguments: the length
- of the vector and the type of the elements contained in it. The state
- type of an 8 element register bank would then for example be:
-
- \begin{verbatim}
- type RegisterState = Vector D8 Word32
- \end{verbatim}
-
- Here, a type synonym \hs{RegisterState} is defined that is equal to
- the \hs{Vector} type constructor applied to the types \hs{D8} (The type
- level representation of the decimal number 8) and \hs{Word32} (The 32
- bit word type as defined above). In other words, the
- \hs{RegisterState} type is a vector of 8 32-bit words.
-
- A fixed size vector is translated to a \VHDL\ array type.
- \item[\hs{RangedWord}]
- This is another type to describe integers, but unlike the previous
- two it has no specific bit-width, but an upper bound. This means that
- its range is not limited to powers of two, but can be any number.
- A \hs{RangedWord} only has an upper bound, its lower bound is
- implicitly zero. There is a lot of added implementation complexity
- when adding a lower bound and having just an upper bound was enough
- for the primary purpose of this type: type-safely indexing vectors.
-
- To define an index for the 8 element vector above, we would do:
-
- \begin{verbatim}
- type RegisterIndex = RangedWord D7
- \end{verbatim}
-
- Here, a type synonym \hs{RegisterIndex} is defined that is equal to
- the \hs{RangedWord} type constructor applied to the type \hs{D7}. In
- other words, this defines an unsigned word with values from
- 0 to 7 (inclusive). This word can be be used to index the
- 8 element vector \hs{RegisterState} above.
-
- This type is translated to the \texttt{unsigned} \VHDL type.
- \end{description}
- \subsection{User-defined types}
- There are three ways to define new types in Haskell: algebraic
- data-types with the \hs{data} keyword, type synonyms with the \hs{type}
- keyword and type renamings with the \hs{newtype} keyword. \GHC\
- offers a few more advanced ways to introduce types (type families,
- existential typing, {\small{GADT}}s, etc.) which are not standard
- Haskell. These will be left outside the scope of this research.
-
- Only an algebraic datatype declaration actually introduces a
- completely new type, for which we provide the \VHDL\ translation
- below. Type synonyms and renamings only define new names for
- existing types (where synonyms are completely interchangeable and
- renamings need explicit conversion). Therefore, these do not need
- any particular \VHDL\ translation, a synonym or renamed type will
- just use the same representation as the original type. The
- distinction between a renaming and a synonym does no longer matter
- in hardware and can be disregarded in the generated \VHDL.
-
- For algebraic types, we can make the following distinction:
-
- \begin{description}
-
- \item[Product types]
- A product type is an algebraic datatype with a single constructor with
- two or more fields, denoted in practice like (a,b), (a,b,c), etc. This
- is essentially a way to pack a few values together in a record-like
- structure. In fact, the built-in tuple types are just algebraic product
- types (and are thus supported in exactly the same way).
-
- The \quote{product} in its name refers to the collection of values
- belonging to this type. The collection for a product type is the
- Cartesian product of the collections for the types of its fields.
-
- These types are translated to \VHDL\ record types, with one field for
- every field in the constructor. This translation applies to all single
- constructor algebraic data-types, including those with just one
- field (which are technically not a product, but generate a VHDL
- record for implementation simplicity).
- \item[Enumerated types]
- An enumerated type is an algebraic datatype with multiple constructors, but
- none of them have fields. This is essentially a way to get an
- enumeration-like type containing alternatives.
-
- Note that Haskell's \hs{Bool} type is also defined as an
- enumeration type, but we have a fixed translation for that.
-
- These types are translated to \VHDL\ enumerations, with one value for
- each constructor. This allows references to these constructors to be
- translated to the corresponding enumeration value.
- \item[Sum types]
- A sum type is an algebraic datatype with multiple constructors, where
- the constructors have one or more fields. Technically, a type with
- more than one field per constructor is a sum of products type, but
- for our purposes this distinction does not really make a
- difference, so this distinction is note made.
-
- The \quote{sum} in its name refers again to the collection of values
- belonging to this type. The collection for a sum type is the
- union of the the collections for each of the constructors.
-
- Sum types are currently not supported by the prototype, since there is
- no obvious \VHDL\ alternative. They can easily be emulated, however, as
- we will see from an example:
-
- \begin{verbatim}
- data Sum = A Bit Word | B Word
- \end{verbatim}
-
- An obvious way to translate this would be to create an enumeration to
- distinguish the constructors and then create a big record that
- contains all the fields of all the constructors. This is the same
- translation that would result from the following enumeration and
- product type (using a tuple for clarity):
-
- \begin{verbatim}
- data SumC = A | B
- type Sum = (SumC, Bit, Word, Word)
- \end{verbatim}
-
- Here, the \hs{SumC} type effectively signals which of the latter three
- fields of the \hs{Sum} type are valid (the first two if \hs{A}, the
- last one if \hs{B}), all the other ones have no useful value.
-
- An obvious problem with this naive approach is the space usage: the
- example above generates a fairly big \VHDL\ type. Since we can be
- sure that the two \hs{Word}s in the \hs{Sum} type will never be valid
- at the same time, this is a waste of space.
-
- Obviously, duplication detection could be used to reuse a
- particular field for another constructor, but this would only
- partially solve the problem. If two fields would be, for
- example, an array of 8 bits and an 8 bit unsigned word, these are
- different types and could not be shared. However, in the final
- hardware, both of these types would simply be 8 bit connections,
- so we have a 100\% size increase by not sharing these.
- \end{description}
-
-
-\section{\CLaSH\ prototype}
-
-foo\par bar
+ Haskell is a statically-typed language, meaning that the type of a
+ variable or function is determined at compile-time. Not all of
+ Haskell's typing constructs have a clear translation to hardware;
+ therefore, this section only deals with those types that do have a clear
+ correspondence to hardware.
+ categories: \emph{built-in} types and \emph{user-defined} types. Built-in
+ types are those types for which a fixed translation is defined within the
+ \CLaSH\ compiler. The \CLaSH\ compiler has generic translation rules to
+ translate the user-defined types, which are described later on.
+
+ Type annotations (entities in \VHDL) are optional, since the \CLaSH\
+ compiler can derive them when the top-level function \emph{is} annotated
+ with its type.
+
+ % Translation of two most basic functional concepts has been
+ % discussed: function application and choice. Before looking further
+ % into less obvious concepts like higher-order expressions and
+ % polymorphism, the possible types that can be used in hardware
+ % descriptions will be discussed.
+ %
+ % Some way is needed to translate every value used to its hardware
+ % equivalents. In particular, this means a hardware equivalent for
+ % every \emph{type} used in a hardware description is needed.
+ %
+ % The following types are \emph{built-in}, meaning that their hardware
+ % translation is fixed into the \CLaSH\ compiler. A designer can also
+ % define his own types, which will be translated into hardware types
+ % using translation rules that are discussed later on.
+
+ \subsubsection{Built-in types}
+ The following types have fixed translations defined within the \CLaSH\
+ compiler:
+ \begin{xlist}
+ \item[\bf{Bit}]
+ the most basic type available. It can have two values:
+ \hs{Low} or \hs{High}.
+ % It is mapped directly onto the \texttt{std\_logic} \VHDL\ type.
+ \item[\bf{Bool}]
+ this is a basic logic type. It can have two values: \hs{True}
+ or \hs{False}.
+ % It is translated to \texttt{std\_logic} exactly like the \hs{Bit}
+ % type (where a value of \hs{True} corresponds to a value of
+ % \hs{High}).
+ Supporting the \hs{Bool} type is required in order to support the
+ \hs{if-then-else} expression.
+ \item[\bf{Signed}, \bf{Unsigned}]
+ these are types to represent integers, and both are parametrizable in
+ their size. The overflow behavior of the numeric operators defined for
+ these types is \emph{wrap-around}.
+ % , so you can define an unsigned word of 32 bits wide as follows:
+
+ % \begin{code}
+ % type Word32 = SizedWord D32
+ % \end{code}
+
+ % Here, a type synonym \hs{Word32} is defined that is equal to the
+ % \hs{SizedWord} type constructor applied to the type \hs{D32}.
+ % \hs{D32} is the \emph{type level representation} of the decimal
+ % number 32, making the \hs{Word32} type a 32-bit unsigned word. These
+ % types are translated to the \VHDL\ \texttt{unsigned} and
+ % \texttt{signed} respectively.
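+ As an illustration of this wrap-around behavior, Haskell's fixed-width
+ integer types behave analogously; in the following sketch, \hs{Word8} and
+ \hs{Int8} from the standard libraries stand in for an 8-bit \hs{Unsigned}
+ and \hs{Signed} (this is plain Haskell for illustration, not \CLaSH\
+ code):
+
+ \begin{code}
+ import Data.Word (Word8)
+ import Data.Int  (Int8)
+
+ wrapU :: Word8    -- 255 + 1 wraps around to 0
+ wrapU = 255 + 1
+
+ wrapS :: Int8     -- 127 + 1 wraps around to -128
+ wrapS = 127 + 1
+ \end{code}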
+ \item[\bf{Vector}]
+ this type can contain elements of any type and has a static length.
+ The \hs{Vector} type constructor takes two arguments: the length of
+ the vector and the type of the elements contained in it. The
+ short-hand notation used for the vector type in the rest of this paper
+ is:
+ \hs{[a|n]}, where \hs{a} is the element type, and \hs{n} is the length
+ of the vector.
+ % Note that this is a notation used in this paper only, vectors are
+ % slightly more verbose in real \CLaSH\ descriptions.
+ % The state type of an 8 element register bank would then for example
+ % be:
+
+ % \begin{code}
+ % type RegisterState = Vector D8 Word32
+ % \end{code}
+
+ % Here, a type synonym \hs{RegisterState} is defined that is equal to
+ % the \hs{Vector} type constructor applied to the types \hs{D8} (The
+ % type level representation of the decimal number 8) and \hs{Word32}
+ % (The 32 bit word type as defined above). In other words, the
+ % \hs{RegisterState} type is a vector of 8 32-bit words. A fixed size
+ % vector is translated to a \VHDL\ array type.
+ \item[\bf{Index}]
+ the main purpose of the \hs{Index} type is to be used as an index into
+ a \hs{Vector}; it has an integer range from zero to a specified upper
+ bound.
+ % This means that its range is not limited to powers of two, but
+ % can be any number.
+ If a value of this type exceeds either bound, an error will be thrown
+ during simulation.
+
+ % \comment{TODO: Perhaps remove this example?} To define an index for
+ % the 8 element vector above, we would do:
+
+ % \begin{code}
+ % type RegisterIndex = RangedWord D7
+ % \end{code}
+
+ % Here, a type synonym \hs{RegisterIndex} is defined that is equal to
+ % the \hs{RangedWord} type constructor applied to the type \hs{D7}. In
+ % other words, this defines an unsigned word with values from
+ % 0 to 7 (inclusive). This word can be be used to index the
+ % 8 element vector \hs{RegisterState} above. This type is translated
+ % to the \texttt{unsigned} \VHDL type.
+ \end{xlist}
+
+ \subsubsection{User-defined types}
+ % There are three ways to define new types in Haskell: algebraic
+ % data-types with the \hs{data} keyword, type synonyms with the \hs{type}
+ % keyword and datatype renaming constructs with the \hs{newtype} keyword.
+ % \GHC\ offers a few more advanced ways to introduce types (type families,
+ % existential typing, {\acro{GADT}}s, etc.) which are not standard
+ % Haskell. As it is currently unclear how these advanced type constructs
+ % correspond to hardware, they are for now unsupported by the \CLaSH\
+ % compiler.
+ A designer may define a completely new type by an algebraic datatype
+ declaration using the \hs{data} keyword. Type synonyms can be introduced
+ using the \hs{type} keyword.
+ % Only an algebraic datatype declaration actually introduces a
+ % completely new type. Type synonyms and type renaming only define new
+ % names for existing types, where synonyms are completely interchangeable
+ % and a type renaming requires an explicit conversion.
+ Type synonyms do not need any particular translation, as a synonym will
+ use the same representation as the original type.
+
+ Algebraic datatypes can be categorized as follows:
+ \begin{xlist}
+ \item[\bf{Single constructor}]
+ datatypes with a single constructor with one or more fields allow
+ values to be packed together in a record-like structure. Haskell's
+ built-in tuple types are also defined as single constructor algebraic
+ types (using some syntactic sugar). An example of a single constructor
+ type with multiple fields is the following pair of integers:
+ \begin{code}
+ data IntPair = IntPair Int Int
+ \end{code}
+ % These types are translated to \VHDL\ record types, with one field
+ % for every field in the constructor.
+ \item[\bf{Multiple constructors, no fields}]
+ datatypes with multiple constructors, but without any fields are
+ enumeration types.
+ % Note that Haskell's \hs{Bool} type is also defined as an enumeration
+ % type, but that there is a fixed translation for that type within the
+ % \CLaSH\ compiler.
+ An example of an enumeration type definition is:
+ \begin{code}
+ data TrafficLight = Red | Orange | Green
+ \end{code}
+ % These types are translated to \VHDL\ enumerations, with one
+ % value for each constructor. This allows references to these
+ % constructors to be translated to the corresponding enumeration
+ % value.
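+ Values of such a type are typically inspected by pattern matching, which
+ maps onto the choice constructs discussed earlier. A small sketch for the
+ type above (the \hs{next} function is an illustrative addition, not part
+ of the paper's running example):
+
+ \begin{code}
+ -- advance the light to its next state
+ next :: TrafficLight -> TrafficLight
+ next Red     = Green
+ next Green   = Orange
+ next Orange  = Red
+ \end{code}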
+ \item[\bf{Multiple constructors with fields}]
+ datatypes with multiple constructors, where at least
+ one of these constructors has one or more fields, are currently not
+ supported. Additional research is required to optimize the overlap of
+ fields belonging to the different constructors.
+ \end{xlist}
+
+ \subsection{Polymorphism}\label{sec:polymorhpism}
+ A powerful feature of some programming languages is polymorphism: it
+ allows a function to handle values of different data types in a uniform
+ way. Haskell supports \emph{parametric polymorphism}, meaning that
+ functions can be written without mentioning specific types, and that those
+ functions can be used for arbitrary types.
+
+ As an example of a parametric polymorphic function, consider the type of
+ the \hs{first} function, which returns the first element of a
+ tuple:\footnote{The \hs{::} operator is used to annotate a function
+ with its type.}
+
+ \begin{code}
+ first :: (a,b) -> a
+ \end{code}
+
+ This type is parameterized in \hs{a} and \hs{b}, which can both
+ represent any type that is supported by the \CLaSH\ compiler. This means
+ that \hs{first} works for any tuple, regardless of what elements it
+ contains. This kind of polymorphism is extremely useful in hardware
+ designs, for example when routing signals without knowing their exact
+ type, or specifying vector operations that work on vectors of any length
+ and element type. Polymorphism also plays an important role in most
+ higher-order functions, as will be shown in the next section.
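+ The body of \hs{first} is not shown above; a definition matching this
+ type is the following one-line pattern match (a sketch):
+
+ \begin{code}
+ first :: (a,b) -> a
+ first (x,_) = x
+ \end{code}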
+
+ % Another type of polymorphism is \emph{ad-hoc
+ % polymorphism}~\cite{polymorphism}, which refers to polymorphic
+ % functions which can be applied to arguments of different types, but
+ % which behave differently depending on the type of the argument to which
+ % they are applied. In Haskell, ad-hoc polymorphism is achieved through
+ % the use of \emph{type classes}, where a class definition provides the
+ % general interface of a function, and class \emph{instances} define the
+ % functionality for the specific types. An example of such a type class is
+ % the \hs{Num} class, which contains all of Haskell's numerical
+ % operations. A designer can make use of this ad-hoc polymorphism by
+ % adding a \emph{constraint} to a parametrically polymorphic type
+ % variable. Such a constraint indicates that the type variable can only be
+ % instantiated to a type whose members supports the overloaded functions
+ % associated with the type class.
+
+ Another type of polymorphism is \emph{ad-hoc polymorphism}, which refers
+ to functions that can be applied to arguments of a limited set of types.
+ Furthermore, how such functions work may depend on the type of their
+ arguments. For instance, multiplication only works for numeric types, and
+ it works differently for, e.g., integers and complex numbers.
+
+ In Haskell, ad-hoc polymorphism is achieved through the use of \emph{type
+ classes}, where a class definition provides the general interface of a
+ function, and class \emph{instances} define the functionality for the
+ specific types. For example, all numeric operators are gathered in the
+ \hs{Num} class, so every type that wants to use those operators must be
+ made an instance of \hs{Num}.
+
+ By prefixing a type signature with class constraints, the constrained type
+ parameters are forced to belong to that type class. For example, the
+ arguments of the \hs{add} function must belong to the \hs{Num} type class
+ because the \hs{add} function adds them with the (\hs{+}) operator:
+
+ \hspace{-1.7em}
+ \begin{minipage}{0.93\linewidth}
+ \begin{code}
+ add :: Num a => a -> a -> a
+ add a b = a + b
+ \end{code}
+ \end{minipage}
+ \begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:add}
+ \end{example}
+ \end{minipage}
+
+ % An example of a type signature that includes such a constraint if the
+ % signature of the \hs{sum} function, which sums the values in a vector:
+ % \begin{code}
+ % sum :: Num a => [a|n] -> a
+ % \end{code}
+ %
+ % This type is again parameterized by \hs{a}, but it can only contain
+ % types that are \emph{instances} of the \emph{type class} \hs{Num}, so
+ % that the compiler knows that the addition (+) operator is defined for
+ % that type.
+
+ % A place where class constraints also play a role is in the size and
+ % range parameters of the \hs{Vector} and numeric types. The reason being
+ % that these parameters have to be limited to types that can represent
+ % \emph{natural} numbers. The complete type of for example the \hs{Vector}
+ % type is:
+ % \begin{code}
+ % Natural n => Vector n a
+ % \end{code}
+
+ % \CLaSH's built-in numerical types are also instances of the \hs{Num}
+ % class.
+ % so we can use the addition operator (and thus the \hs{sum}
+ % function) with \hs{Signed} as well as with \hs{Unsigned}.
+
+ \CLaSH\ supports both parametric polymorphism and ad-hoc polymorphism. A
+ circuit designer can specify his own type classes and corresponding
+ instances. The \CLaSH\ compiler will infer the type of every polymorphic
+ argument depending on how the function is applied. There is however one
+ constraint: the arguments of the top-level function that is being
+ translated can not be polymorphic, as there is no way to infer their
+ \emph{specific} types.
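+ As a small sketch of such a designer-defined class and instance (the
+ \hs{Invert} class is purely illustrative and not part of the \CLaSH\
+ libraries):
+
+ \begin{code}
+ -- a class providing a single overloaded operation
+ class Invert a where
+   invert :: a -> a
+
+ -- the instance defines the operation for a specific type
+ instance Invert Bool where
+   invert = not
+ \end{code}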
+
+ With regard to the built-in types, it should be noted that members of
+ some of the standard Haskell type classes are supported as built-in
+ functions. These include: the numerical operators of \hs{Num}, the equality
+ operators of \hs{Eq}, and the comparison (order) operators of \hs{Ord}.
+
+ \subsection{Higher-order functions \& values}
+ Another powerful abstraction mechanism in functional languages is
+ the concept of \emph{functions as first-class values} and
+ \emph{higher-order functions}. These concepts allow a function to be
+ treated as a value and be passed around, even as the argument of another
+ function. The following example clarifies this concept:
+
+ \hspace{-1.7em}
+ \begin{minipage}{0.93\linewidth}
+ %format not = "\mathit{not}"
+ \begin{code}
+ negate{-"\!\!\!"-}Vector xs = map not xs
+ \end{code}
+ \end{minipage}
+ \begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:negatevector}
+ \end{example}
+ \end{minipage}
+
+ The code above defines the \hs{negate{-"\!\!\!"-}Vector} function, which
+ takes a vector of booleans, \hs{xs}, and returns a vector where all the
+ values are negated. It achieves this by calling the \hs{map} function, and
+ passing it another \emph{function}, boolean negation, and the vector of
+ booleans, \hs{xs}. The \hs{map} function applies the negation function to
+ all the elements in the vector.
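+ Using Haskell's built-in lists instead of vectors, the same definition
+ can be written and simulated directly (a plain-list analogue of the
+ vector code above, for illustration only):
+
+ \begin{code}
+ -- list-based analogue of the vector-based definition
+ negateList :: [Bool] -> [Bool]
+ negateList xs = map not xs
+ \end{code}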
+
+ The \hs{map} function is called a higher-order function, since it takes
+ another function as an argument. Also note that \hs{map} is again a
+ parametric polymorphic function: it does not pose any constraints on the
+ type of the input vector, other than that its elements must have the same
+ type as the first argument of the function passed to \hs{map}. The element
+ type of the resulting vector is equal to the return type of the function
+ passed, which need not necessarily be the same as the element type of the
+ input vector. All of these characteristics can be inferred from the type
+ signature of \hs{map}:
+
+ \begin{code}
+ map :: (a -> b) -> [a|n] -> [b|n]
+ \end{code}
+
+ In Haskell, there are two more ways to obtain a function-typed value:
+ partial application and lambda abstraction. Partial application means that
+ a function that takes multiple arguments can be applied to a single
+ argument; the result is again a function, but one that takes one argument
+ less. As an example, consider the following expression, which adds one to
+ every element of a vector:
+
+ \hspace{-1.7em}
+ \begin{minipage}{0.93\linewidth}
+ \begin{code}
+ map (add 1) xs
+ \end{code}
+ \end{minipage}
+ \begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:partialapplication}
+ \end{example}
+ \end{minipage}
+
+ Here, the expression \hs{(add 1)} is the partial application of the
+ addition function to the value \hs{1}, which is again a function that
+ adds 1 to its (next) argument.
+
+ A lambda expression allows a designer to introduce an anonymous function
+ in any expression. Consider the following expression, which again adds 1
+ to every element of a vector:
+
+ \hspace{-1.7em}
+ \begin{minipage}{0.93\linewidth}
+ \begin{code}
+ map (\x -> x + 1) xs
+ \end{code}
+ \end{minipage}
+ \begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:lambdaexpression}
+ \end{example}
+ \end{minipage}
+
+ Finally, not only built-in functions can have higher-order arguments (such
+ as the \hs{map} function), but any function defined in \CLaSH\ may have
+ functions as arguments. This allows the circuit designer to achieve a
+ large degree of code reuse. The only exception is again the top-level
+ function: if a function-typed argument is not instantiated with an actual
+ function, no hardware can be generated.
+
+ An example of a common circuit where higher-order functions and partial
+ application lead to a very concise and natural description is a crossbar.
+ The code (\ref{code:crossbar}) for this example can be seen below:
+
+ \hspace{-1.7em}
+ \begin{minipage}{0.93\linewidth}
+ \begin{code}
+ crossbar inputs selects = map (mux inputs) selects
+ where
+ mux inp x = (inp ! x)
+ \end{code}
+ \end{minipage}
+ \begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:crossbar}
+ \end{example}
+ \end{minipage}
+
+ The \hs{crossbar} function selects those values from \hs{inputs} that
+ are indicated by the indexes in the vector \hs{selects}. The crossbar is
+ polymorphic in the width of the input (defined by the length of
+ \hs{inputs}), the width of the output (defined by the length of
+ \hs{selects}), and the signal type (defined by the element type of
+ \hs{inputs}). The type-checker can also automatically infer that
+ \hs{selects} is a vector of \hs{Index} values due to the use of the vector
+ indexing operator (\hs{!}).
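+ The behavior of the crossbar can be tried in plain Haskell by replacing
+ vectors with lists and the vector indexing operator (\hs{!}) with the
+ list indexing operator (\hs{!!}). The following list-based sketch (named
+ \hs{crossbarL} to distinguish it from the definition above) is an
+ analogue for illustration, not synthesizable \CLaSH\ code:
+
+ \begin{code}
+ -- select the inputs indicated by the indexes in selects
+ crossbarL :: [a] -> [Int] -> [a]
+ crossbarL inputs selects = map (mux inputs) selects
+   where
+     mux inp x = inp !! x
+ \end{code}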
+
+ \subsection{State}
+ In a stateful design, the outputs depend on the history of the inputs, or
+ the state. State is usually stored in registers, which retain their value
+ during a clock cycle.
+ % As \CLaSH\ has to be able to describe more than plain combinational
+ % designs, there is a need for an abstraction mechanism for state.
+
+ An important property in Haskell, and in many other functional languages,
+ is \emph{purity}. A function is said to be \emph{pure} if it satisfies two
+ conditions:
+ \begin{inparaenum}
+ \item given the same arguments twice, it should return the same value in
+ both cases, and
+ \item it has no observable side-effects.
+ \end{inparaenum}
+ % This purity property is important for functional languages, since it
+ % enables all kinds of mathematical reasoning that could not be guaranteed
+ % correct for impure functions.
+ Pure functions are a perfect match for combinational circuits, where the
+ output solely depends on the inputs. When a circuit has state however, it
+ can no longer be described by a pure function.
+ % Simply removing the purity property is not a valid option, as the
+ % language would then lose many of it mathematical properties.
+ \CLaSH\ deals with the concept of state by making the current state an
+ additional argument of the function, and the updated state part of the
+ result. In this sense the descriptions made in \CLaSH\ are the
+ combinational parts of a Mealy machine.
+
+ A simple example is adding an accumulator register to the earlier
+ multiply-accumulate circuit, of which the resulting netlist can be seen in
+ \Cref{img:mac-state}:
+
+ \hspace{-1.7em}
+ \begin{minipage}{0.93\linewidth}
+ \begin{code}
+ macS (State c) (a, b) = (State c', c')
+ where
+ c' = mac a b c
+ \end{code}
+ \end{minipage}
+ \begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:macstate}
+ \end{example}
+ \end{minipage}
+
+ Note that the \hs{macS} function returns both the new state and the value
+ of the output port. The \hs{State} wrapper indicates which arguments are
+ part of the current state, and what part of the output is part of the
+ updated state. This aspect will also be reflected in the type signature of
+ the function. Abstracting the state of a circuit in this way makes it very
+ explicit: which variables are part of the state is completely determined
+ by the type signature. This approach to state is well suited to be used in
+ combination with the existing code and language features, such as all the
+ choice elements, as state values are just normal values from Haskell's
+ point of view. Stateful descriptions are simulated using the recursive
+ \hs{run} function:
+
+ \hspace{-1.7em}
+ \begin{minipage}{0.93\linewidth}
+ \begin{code}
+ run f s (i : inps) = o : (run f s' inps)
+ where
+ (s', o) = f s i
+ \end{code}
+ \end{minipage}
+ \begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:run}
+ \end{example}
+ \end{minipage}
+
+ The \hs{(:)} operator is the list construction (cons) operator, where the
+ left-hand side is the head of a list and the right-hand side is the
+ remainder of the list. The \hs{run} function applies the function the
+ developer wants to simulate, \hs{f}, to the current state, \hs{s}, and the
+ first input value, \hs{i}. The result is the first output value, \hs{o},
+ and the updated state \hs{s'}. The next iteration of the \hs{run} function
+ is then called with the updated state, \hs{s'}, and the rest of the
+ inputs, \hs{inps}. In the context of this paper, it is assumed that there
+ is one input per clock cycle. Note that the order of \hs{s',o,s,i} in the
+ \hs{where} clause of the \hs{run} function corresponds with the order of
+ the input, output and state of the \hs{macS} function
+ (\ref{code:macstate}). Thus, the expression below (\ref{code:runmacs})
+ simulates \hs{macS} on \hs{inputpairs} starting with the value \hs{0}:
+
+ \hspace{-1.7em}
+ \begin{minipage}{0.93\linewidth}
+ \begin{code}
+ run macS 0 inputpairs
+ \end{code}
+ \end{minipage}
+ \begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:runmacs}
+ \end{example}
+ \end{minipage}
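+ This simulation can be reproduced in plain Haskell by dropping the
+ \hs{State} wrapper and adding a base case for finite input lists (the
+ names \hs{macP}, \hs{macSP} and \hs{runP} are used in this sketch to keep
+ it separate from the definitions above):
+
+ \begin{code}
+ -- multiply-accumulate, as in the earlier combinational example
+ macP a b c = a * b + c
+
+ -- stateful variant: current state in, (new state, output) out
+ macSP c (a, b) = (c', c')
+   where c' = macP a b c
+
+ -- simulate a stateful function over a (finite) list of inputs
+ runP f s (i : inps) = o : runP f s' inps
+   where (s', o) = f s i
+ runP _ _ [] = []
+ \end{code}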
+
+ \begin{figure}
+ \centerline{\includegraphics{mac-state.svg}}
+ \caption{Stateful Multiply-Accumulate}
+ \label{img:mac-state}
+ \vspace{-1.5em}
+ \end{figure}
+
+ The complete simulation can be compiled to an executable binary by a
+ Haskell compiler, or executed in a Haskell interpreter. Both
+ simulation paths require less effort from a circuit designer than first
+ translating the description to \VHDL\ and then running a \VHDL\
+ simulation; it is also very likely that both simulation paths are much
+ faster.
+
+\section{The \CLaSH\ compiler}
+\label{sec:compiler}
+The prototype \CLaSH\ compiler translates descriptions made in the \CLaSH\
+language as described in the previous section to synthesizable \VHDL.
+% , allowing a designer to actually run a \CLaSH\ design on an \acro{FPGA}.
+
+The Glasgow Haskell Compiler (\GHC)~\cite{ghc} is an open source Haskell
+compiler that also provides a high-level \acro{API} to most of its internals.
+Furthermore, it provides several parts of the prototype compiler for free,
+such as the parser, the semantics checker, and the type checker. These parts
+together form the front-end of the prototype compiler pipeline, as seen in
+\Cref{img:compilerpipeline}.
+
+\begin{figure}
+\vspace{1em}
+\centerline{\includegraphics{compilerpipeline.svg}}
+\caption{\CLaSHtiny\ compiler pipeline}
+\label{img:compilerpipeline}
+\vspace{-1.5em}
+\end{figure}
+
+The output of the \GHC\ front-end consists of the translation of the original
+Haskell description to \emph{Core}~\cite{Sulzmann2007}, which is a small
+typed functional language. This \emph{Core} language is relatively easy to
+process compared to the larger Haskell language. A description in \emph{Core}
+can still contain elements which have no direct translation to hardware, such
+as polymorphic types and function-valued arguments. Such a description needs
+to be transformed to a \emph{normal form}, which corresponds directly to
+hardware. The second stage of the compiler, the \emph{normalization} phase,
+exhaustively applies a set of \emph{meaning-preserving} transformations on the
+\emph{Core} description until this description is in a \emph{normal form}.
+This set of transformations includes transformations typically found in
+reduction systems and lambda calculus~\cite{lambdacalculus}, such as
+$\beta$-reduction and $\eta$-expansion. It also includes transformations that
+are responsible for the specialization of higher-order functions to `regular'
+first-order functions, and for the specialization of polymorphic types to
+concrete types.
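To make the flavor of such a meaning-preserving rewrite concrete, the following toy sketch implements $\beta$-reduction on a minimal untyped lambda-calculus AST. This is emphatically not the compiler's \emph{Core} representation; it only illustrates the kind of transformation the normalization phase applies exhaustively:

```haskell
-- Toy lambda-calculus AST, purely for illustration.
data Expr
  = Var String
  | Lam String Expr
  | App Expr Expr
  deriving (Eq, Show)

-- Capture-naive substitution: adequate for this small example, but
-- a real implementation must rename bound variables to avoid
-- variable capture.
subst :: String -> Expr -> Expr -> Expr
subst x e (Var y)   | x == y    = e
                    | otherwise = Var y
subst x e (Lam y b) | x == y    = Lam y b
                    | otherwise = Lam y (subst x e b)
subst x e (App f a) = App (subst x e f) (subst x e a)

-- One beta-reduction step at the root:
--   (\x -> b) a   ==>   b[x := a]
betaStep :: Expr -> Expr
betaStep (App (Lam x b) a) = subst x a b
betaStep e                 = e
```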
+
+The final step in the compiler pipeline is the translation to a \VHDL\
+\emph{netlist}, which is a straightforward process due to the resemblance
+between a normalized description and a set of concurrent signal assignments. The
+end-product of the \CLaSH\ compiler is called a \VHDL\ \emph{netlist} because
+the result resembles an actual netlist description; the fact that it is \VHDL\
+is only an implementation detail; e.g., the output could have been Verilog or
+even \acro{EDIF}.
+
+\section{Use cases}
+\label{sec:usecases}
+\subsection{FIR Filter}
+A common hardware design in which the close relation between functional
+languages and mathematical functions, combined with the use of higher-order
+functions, leads to a very natural description is the \acro{FIR} filter:
+
+\begin{equation}
+y_t = \sum\nolimits_{i = 0}^{n - 1} {x_{t - i} \cdot h_i }
+\end{equation}
+
+A \acro{FIR} filter multiplies fixed constants ($h$) with the current
+and a few previous input samples ($x$). The results of these multiplications
+are summed to produce the output at time $t$. The equation of a \acro{FIR}
+filter is equivalent to the equation of the dot-product of two vectors, which
+is shown below:
+
+\begin{equation}
+\mathbf{a}\bullet\mathbf{b} = \sum\nolimits_{i = 0}^{n - 1} {a_i \cdot b_i }
+\end{equation}
+
+The equation for the dot-product is easily and directly implemented using
+higher-order functions:
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+as *+* bs = fold (+) (zip{-"\!\!\!"-}With (*) as bs)
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:dotproduct}
+ \end{example}
+\end{minipage}
+
+The \hs{zip{-"\!\!\!"-}With} function is very similar to the \hs{map} function
+seen earlier: it takes a function, two vectors, and then applies the function
+to each of the elements in the two vectors pairwise (\emph{e.g.},
+\hs{zip{-"\!\!\!"-}With (*) [1, 2] [3, 4]} becomes \hs{[1 * 3, 2 * 4]}).
+
+The \hs{fold} function takes a binary function, a single vector, and applies
+the function to the first two elements of the vector. It then applies the
+function to the result of the first application and the next element in the
+vector. This continues until the end of the vector is reached. The result of
+the \hs{fold} function is the result of the last application. It is obvious
+that the \hs{zip{-"\!\!\!\!"-}With (*)} function is pairwise multiplication
+and that the \hs{fold (+)} function is summation.
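The dot-product can be tried out directly in plain Haskell, using lists instead of \CLaSH's fixed-length vectors; \hs{fold (+)} on a non-empty vector corresponds to \hs{foldl1 (+)} on a non-empty list:

```haskell
-- Plain-Haskell sketch of the dot product from the paper, on lists
-- rather than fixed-length vectors: pairwise multiplication with
-- zipWith (*), then summation with foldl1 (+).
(*+*) :: Num a => [a] -> [a] -> a
as *+* bs = foldl1 (+) (zipWith (*) as bs)
```

For example, `[1,2] *+* [3,4]` evaluates to `11`.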
+% Returning to the actual \acro{FIR} filter, we will slightly change the
+% equation describing it, so as to make the translation to code more obvious and
+% concise. What we do is change the definition of the vector of input samples
+% and delay the computation by one sample. Instead of having the input sample
+% received at time $t$ stored in $x_t$, $x_0$ now always stores the newest
+% sample, and $x_i$ stores the $ith$ previous sample. This changes the equation
+% to the following (note that this is completely equivalent to the original
+% equation, just with a different definition of $x$ that will better suit the
+% transformation to code):
+%
+% \begin{equation}
+% y_t = \sum\nolimits_{i = 0}^{n - 1} {x_i \cdot h_i }
+% \end{equation}
+The complete definition of the \acro{FIR} filter in \CLaSH\ is:
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+fir (State (xs,hs)) x =
+ (State (shiftInto x xs,hs), (x +> xs) *+* hs)
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:fir}
+ \end{example}
+\end{minipage}
+
+where the vector \hs{xs} contains the previous input samples, the vector
+\hs{hs} contains the \acro{FIR} coefficients, and \hs{x} is the current input
+sample. The prepend operator (\hs{+>}) creates a new vector by placing the
+current sample (\hs{x}) in front of the vector of previous samples (\hs{xs}).
+The code for the \hs{shiftInto} function, which adds the new input sample (\hs{x})
+to the list of previous input samples (\hs{xs}) and removes the oldest sample,
+is shown below:
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+shiftInto x xs = x +> init xs
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:shiftinto}
+ \end{example}
+\end{minipage}
+
+where the \hs{init} function returns all but the last element of a vector.
+The resulting netlist of a 4-tap \acro{FIR} filter, created by specializing
+the vectors of the \acro{FIR} code to a length of 4, is depicted in
+\Cref{img:4tapfir}.
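The \acro{FIR} step can likewise be sketched in plain Haskell on lists. Since lists carry no \hs{State} wrapper, the coefficient vector \hs{hs} is passed as an explicit first argument here; as in the paper's code, the state \hs{xs} holds one sample fewer than there are coefficients:

```haskell
-- Dot product on lists, as before.
(*+*) :: Num a => [a] -> [a] -> a
as *+* bs = foldl1 (+) (zipWith (*) as bs)

-- Shift the new sample in at the front, dropping the oldest one.
shiftInto :: a -> [a] -> [a]
shiftInto x xs = x : init xs

-- One clock cycle of the FIR filter: the output combines the new
-- sample with the previous ones; the new state keeps the most
-- recent samples only.
fir :: Num a => [a] -> [a] -> a -> ([a], a)
fir hs xs x = (shiftInto x xs, (x : xs) *+* hs)
```

With a 4-tap filter `hs = [1,2,3,4]` and previous samples `[0,0,0]`, feeding the sample `1` yields the output `1` and the new state `[1,0,0]`.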
+
+\begin{figure}
+\centerline{\includegraphics{4tapfir.svg}}
+\caption{4-tap \acrotiny{FIR} Filter}
+\label{img:4tapfir}
+\vspace{-1.5em}
+\end{figure}
+
+\subsection{Higher-order CPU}
+%format fun x = "\textit{fu}_" x
+This section discusses a somewhat more elaborate example in which a
+user-defined higher-order function, partial application, lambda expressions,
+and pattern
+matching are exploited. The example concerns a \acro{CPU} which consists of
+four function units, \hs{fun 0,{-"\ldots"-},fun 3}, (see
+\Cref{img:highordcpu}) that each perform some binary operation.
+
+\begin{figure}
+\centerline{\includegraphics{highordcpu.svg}}
+\caption{CPU with higher-order Function Units}
+\label{img:highordcpu}
+\vspace{-1.5em}
+\end{figure}
+
+Every function unit has seven data inputs (of type \hs{Signed 16}), and two
+address inputs (of type \hs{Index 6}) that indicate which data inputs have to
+be chosen as arguments for the binary operation that the unit performs.
+These data inputs consist of one external input \hs{x}, two fixed
+initialization values (0 and 1), and the previous outputs of the four function
+units. The output of the \acro{CPU} as a whole is the previous output of
+\hs{fun 3}.
+
+The function units \hs{fun 1}, \hs{fun 2}, and \hs{fun 3} each perform a fixed
+binary operation, whereas \hs{fun 0} has an additional input for an opcode to
+choose a binary operation out of a few possibilities. Each function unit
+outputs its result into a register, i.e., the state of the \acro{CPU}. This
+state can, for example, be defined as follows:
+
+\begin{code}
+type CpuState = State [Signed 16 | 4]
+\end{code}
+
+Every function unit can now be defined by the following higher-order function,
+\hs{fu}, which takes three arguments: the operation \hs{op} that the function
+unit performs, the seven \hs{inputs}, and the address pair
+\hs{({-"a_0"-},{-"a_1"-})}:
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+fu op inputs ({-"a_0"-}, {-"a_1"-}) =
+ op (inputs!{-"a_0"-}) (inputs!{-"a_1"-})
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:functionunit}
+ \end{example}
+\end{minipage}
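A plain-Haskell sketch of the function unit replaces \CLaSH's vector indexing \hs{(!)} with list indexing \hs{(!!)}, and the \hs{Signed 16} and \hs{Index 6} types with plain \hs{Int}. Partial application then yields a unit with a fixed operation, analogous to the paper's \hs{fun 1 = fu add}; the name \hs{addUnit} is purely illustrative:

```haskell
-- A function unit: select two operands by index, apply the
-- (higher-order) operation to them.
fu :: (a -> a -> b) -> [a] -> (Int, Int) -> b
fu op inputs (a0, a1) = op (inputs !! a0) (inputs !! a1)

-- A fixed adder unit via partial application.
addUnit :: [Int] -> (Int, Int) -> Int
addUnit = fu (+)
```

For example, `addUnit [10,20,30] (0,2)` selects operands `10` and `30` and evaluates to `40`.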
+
+\noindent Using partial application we now define:
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+fun 1 = fu add
+fun 2 = fu sub
+fun 3 = fu mul
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:functionunits1to3}
+ \end{example}
+\end{minipage}
+
+In order to define \hs{fun 0}, the \hs{Opcode} type and the \hs{multiop}
+function, which chooses a specific operation given the opcode, are defined
+first. It is assumed that the binary functions \hs{shift} (where \hs{shift a
+b} shifts \hs{a} by the number of bits indicated by \hs{b}) and \hs{xor} (for
+the bitwise \hs{xor}) exist.
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+data Opcode = Shift | Xor | Equal
+
+multiop Shift = shift
+multiop Xor = xor
+multiop Equal = \a b -> if a == b then 1 else 0
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:multiop}
+ \end{example}
+\end{minipage}
+
+Note that the result of \hs{multiop} is a binary function; this is supported
+by \CLaSH. The complete definition of \hs{fun 0}, which takes an opcode as
+additional argument, is:
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+fun 0 c = fu (multiop c)
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:functionunit0}
+ \end{example}
+\end{minipage}
+
+\noindent Now comes the definition of the full \acro{CPU}. Its type is:
+
+\begin{code}
+cpu :: CpuState
+ -> (Signed 16, Opcode, [(Index 6, Index 6) | 4])
+ -> (CpuState, Signed 16)
+\end{code}
+
+\noindent Note that this type fits the requirements of the \hs{run} function.
+The actual definition of the \hs{cpu} function is:
+
+\hspace{-1.7em}
+\begin{minipage}{0.93\linewidth}
+\begin{code}
+cpu (State s) (x,opc,addrs) = (State s', out)
+ where
+ inputs = x +> (0 +> (1 +> s))
+ s' = [{-"\;"-}fun 0 opc inputs (addrs!0)
+ ,{-"\;"-}fun 1 inputs (addrs!1)
+ ,{-"\;"-}fun 2 inputs (addrs!2)
+ ,{-"\;"-}fun 3 inputs (addrs!3)
+ ]
+ out = last s
+\end{code}
+\end{minipage}
+\begin{minipage}{0.07\linewidth}
+ \begin{example}
+ \label{code:cpu}
+ \end{example}
+\end{minipage}
+
+Due to space restrictions, \Cref{img:highordcpu} does not depict the actual
+functionality of the \hs{fu}-components, but note that, e.g., \hs{multiop} is a
+subcomponent of \hs{fun 0}.
+
+While the \acro{CPU} has a simple (and maybe not very useful) design, it
+illustrates some possibilities that \CLaSH\ offers and suggests how to write
+actual designs.
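The whole \acro{CPU} can be simulated in plain Haskell along the same lines. The sketch below again uses lists and \hs{Int} in place of \CLaSH's vectors, \hs{Signed 16}, and \hs{Index 6}; it takes \hs{shift} to be a left shift (\hs{Data.Bits.shiftL}) and writes the fixed operations of \hs{fun 1}--\hs{fun 3} directly as \hs{(+)}, \hs{(-)}, and \hs{(*)}:

```haskell
import Data.Bits (shiftL, xor)

data Opcode = Shift | Xor | Equal

-- Choose a binary operation by opcode; the result is itself a
-- binary function.
multiop :: Opcode -> (Int -> Int -> Int)
multiop Shift = shiftL
multiop Xor   = xor
multiop Equal = \a b -> if a == b then 1 else 0

-- A function unit: select two operands by index, apply the operation.
fu :: (a -> a -> b) -> [a] -> (Int, Int) -> b
fu op inputs (a0, a1) = op (inputs !! a0) (inputs !! a1)

-- One clock cycle of the CPU. The state `s` holds the previous
-- outputs of the four function units; the overall output is the
-- previous output of the last unit.
cpu :: [Int] -> (Int, Opcode, [(Int, Int)]) -> ([Int], Int)
cpu s (x, opc, addrs) = (s', out)
  where
    inputs = x : 0 : 1 : s
    s' = [ fu (multiop opc) inputs (addrs !! 0)
         , fu (+)           inputs (addrs !! 1)
         , fu (-)           inputs (addrs !! 2)
         , fu (*)           inputs (addrs !! 3)
         ]
    out = last s
```

Since this `cpu` has the shape `state -> input -> (state, output)`, it can be simulated with the `run` function shown earlier.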
+
+% Each of the function units has both its operands connected to all data
+% sources, and can be programmed to select any data source for either
+% operand. In addition, the leftmost function unit has an additional
+% opcode input to select the operation it performs. The previous output of the
+% rightmost function unit is the output of the entire \acro{CPU}.
+%
+% The code of the function unit (\ref{code:functionunit}), which arranges the
+% operand selection for the function unit, is shown below. Note that the actual
+% operation that takes place inside the function unit is supplied as the
+% (higher-order) argument \hs{op}, which is a function that takes two arguments.
+%
+%
+%
+% The \hs{multiop} function (\ref{code:multiop}) defines the operation that takes place in the leftmost function unit. It is essentially a simple three operation \acro{ALU} that makes good use of pattern matching and guards in its description. The \hs{shift} function used here shifts its first operand by the number of bits indicated in the second operand, the \hs{xor} function produces
+% the bitwise xor of its operands.
+%
+%
+% The \acro{CPU} function (\ref{code:cpu}) ties everything together. It applies
+% the function unit (\hs{fu}) to several operations, to create a different
+% function unit each time. The first application is interesting, as it does not
+% just pass a function to \hs{fu}, but a partial application of \hs{multiop}.
+% This demonstrates how one function unit can effectively get extra inputs
+% compared to the others.
+%
+% The vector \hs{inputs} is the set of data sources, which is passed to
+% each function unit as a set of possible operants. The \acro{CPU} also receives
+% a vector of address pairs, which are used by each function unit to select
+% their operand.
+% The application of the function units to the \hs{inputs} and
+% \hs{addrs} arguments seems quite repetitive and could be rewritten to use
+% a combination of the \hs{map} and \hs{zipwith} functions instead.
+% However, the prototype compiler does not currently support working with
+% lists of functions, so a more explicit version of the code is given instead.
+
+% While this is still a simple example, it could form the basis of an actual
+% design, in which the same techniques can be reused.