Update many minor details, and include more of Arjan's comments

[matthijs/master-project/dsd-paper.git] / cλash.lhs
diff --git a/cλash.lhs b/cλash.lhs

index 45ea2968cd6ee3a91f3d53a14eb74d8db4225354..a6a3bdb855644912c4a827191ea7b42dc21c6d0e 100644
--- a/cλash.lhs
+++ b/cλash.lhs
@@ -409,12 +409,12 @@
  % author names and affiliations
  % use a multiple column layout for up to three different
  % affiliations
-\author{\IEEEauthorblockN{Matthijs Kooijman, Christiaan P.R. Baaij, Jan Kuper, Marco E.T. Gerards}%, Bert Molenkamp, Sabih H. Gerez}
+\author{\IEEEauthorblockN{Christiaan P.R. Baaij, Matthijs Kooijman, Jan Kuper, Marco E.T. Gerards}%, Bert Molenkamp, Sabih H. Gerez}
  \IEEEauthorblockA{%Computer Architecture for Embedded Systems (CAES)\\ 
  Department of EEMCS, University of Twente\\
  P.O. Box 217, 7500 AE, Enschede, The Netherlands\\
-matthijs@@stdin.nl, c.p.r.baaij@@utwente.nl, j.kuper@@utwente.nl}
-\thanks{Supported through the FP7 project: S(o)OS (248465)}
+c.p.r.baaij@@utwente.nl, matthijs@@stdin.nl, j.kuper@@utwente.nl}
+% \thanks{Supported through the FP7 project: S(o)OS (248465)}
  }
  % \and
  % \IEEEauthorblockN{Homer Simpson}
@@ -512,56 +512,52 @@ Verilog~\cite{Verilog}, allowed an engineer to describe circuits using a
  detailed hardware properties such as timing behavior, but are generally 
  cumbersome in expressing higher-level abstractions. In an attempt to raise the 
  abstraction level of the descriptions, a great number of approaches based on 
-functional languages has been proposed \cite{Cardelli1981,muFP,DAISY,FHDL,
-T-Ruby,Hydra,HML2,Hawk1,Lava,ForSyDe1,Wired,reFLect}. The idea of using 
+functional languages has been proposed \cite{Cardelli1981,muFP,DAISY,
+T-Ruby,HML2,Hydra,Hawk1,Lava,Wired,ForSyDe1,reFLect}. The idea of using 
  functional languages for hardware descriptions started in the early 1980s 
-\cite{Cardelli1981,muFP,DAISY,FHDL}, a time which also saw the birth of the 
-currently popular hardware description languages such as \VHDL. Functional 
+\cite{Cardelli1981,muFP,DAISY}, a time which also saw the birth of the 
+currently popular \acrop{HDL}, such as \VHDL. Functional 
  languages are especially well suited to describe hardware because 
-combinational circuits can be directly modeled as mathematical functions. 
-Functional languages are very good at describing and composing these 
-mathematical functions.
-
-In an attempt to decrease the amount of work involved in creating all the 
-required tooling, such as parsers and type-checkers, many functional
-\acrop{HDL} \cite{Hydra,Hawk1,Lava,ForSyDe1,Wired} are embedded as a domain 
-specific language (\acro{DSL}) inside the functional language Haskell 
+combinational circuits can be directly modeled as mathematical functions and
+functional languages are very good at describing and composing these
+functions.
+
+In an attempt to ease the prototyping process of the language, such as 
+creating all the required tooling, like parsers and type-checkers, many 
+functional \acrop{HDL} \cite{Hydra,Hawk1,Lava,Wired} are embedded as a domain 
+specific language (\acro{DSL}) within the functional language Haskell 
  \cite{Haskell}. This means that a developer is given a library of Haskell 
  functions and types that together form the language primitives of the 
  \acro{DSL}. The primitive functions used to describe a circuit do not actually 
-process any signals, but instead compose a large domain-specific datatype 
-(which is usually hidden from the designer). This datatype is then further 
-processed by an embedded circuit compiler. As Haskell's choice elements 
-(\hs{if}-expressions, \hs{case}-expressions, pattern matching, etc.) are 
-evaluated at the time the domain-specific datatype is being build, they are no 
-longer visible to the embedded compiler that processes the datatype. 
-Consequently, it is impossible the capture Haskell's choice elements within a 
-circuit description when taking the embedded language approach. However, 
-descriptions can still contain polymorphism and higher-order functions.
-
-The approach taken in this research is not to make another \acro{DSL} embedded 
-in Haskell, but to use (a subset of) the Haskell language \emph{itself} for 
-the purpose of describing hardware. By taking this approach, we \emph{can} 
-capture certain language constructs, such as Haskell's choice elements, within 
-circuit descriptions. To the best knowledge of the authors, supporting 
-polymorphism, higher-order functions and such an extensive array of 
-choice-elements is new in the domain of (functional) \acrop{HDL}. 
+process any signals; instead, they compose a large domain-specific graph 
+(which is usually hidden from the designer). This graph is then further 
+processed by an embedded circuit compiler which can perform for example 
+simulation or synthesis. As Haskell's choice elements (\hs{case}-expressions, 
+pattern-matching etc.) are evaluated at the time the domain-specific graph is 
+being built, they are no longer visible to the embedded compiler that 
+processes the graph. Consequently, it is impossible to capture Haskell's 
+choice elements within a circuit description when taking the embedded language 
+approach. This does not mean that circuits specified in an embedded language 
+cannot contain choice, just that choice elements only exist as functions, 
+e.g. a multiplexer function, and not as language elements.
+
+The approach taken in this research is to use (a subset of) the Haskell 
+language \emph{itself} for the purpose of describing hardware. By taking this 
+approach, this research \emph{can} capture certain language constructs, like 
+all of Haskell's choice elements, within circuit descriptions. The more 
+advanced features of Haskell, such as polymorphic typing and higher-order 
+functions, are also supported.
+
+% supporting polymorphism, higher-order functions and such an extensive array 
+% of choice-elements, combined with a very concise way of specifying circuits 
+% is new in the domain of (functional) \acrop{HDL}. 
  % As the hardware descriptions are plain Haskell 
  % functions, these descriptions can be compiled to an executable binary
  % for simulation using an optimizing Haskell compiler such as the Glasgow
  % Haskell Compiler (\GHC)~\cite{ghc}.
  
  Where descriptions in a conventional \acro{HDL} have an explicit clock for the 
-purposes state and synchronicity, the clock is implied in the context of the 
-research presented in this paper. A circuit designer describes the behavior of 
-the hardware between clock cycles. Many functional \acrop{HDL} model signals 
-as a stream of all values over time; state is then modeled as a delay on this 
-stream of values. The approach taken in this research is to make the current 
-state an additional input and the updated state a part of the output of a 
-function. The current abstraction of state and time limits the descriptions to 
-synchronous hardware, there is however room within the language to eventually 
-add a different abstraction mechanism that will allow for the modeling of 
-asynchronous systems.
+purposes of state and synchronicity, the clock is implicit for the descriptions and research presented in this paper. A circuit designer describes the behavior of the hardware between clock cycles. Many functional \acrop{HDL} model signals as a stream of all values over time; state is then modeled as a delay on this stream of values. Descriptions presented in this research make the current state an additional input and the updated state a part of their output. This abstraction of state and time limits the descriptions to synchronous hardware; there is, however, room within the language to eventually add a different abstraction mechanism that will allow for the modeling of asynchronous systems.
  
  Like the traditional \acrop{HDL}, descriptions made in a functional \acro{HDL} 
  must eventually be converted into a netlist. This research also features a 
@@ -571,11 +567,16 @@ prototype translator, which has the same name as the language:
  behaving synthesizable \VHDL\ code, ready to be converted to an actual netlist 
  format by an (optimizing) \VHDL\ synthesis tool.
  
-Besides trivial circuits such as variants of both the \acro{FIR} filter and 
-the simple \acro{CPU} shown in \Cref{sec:usecases}, the \CLaSH\ compiler has 
-also been able to successfully translate non-trivial functional descriptions 
-such as a streaming reduction circuit~\cite{reductioncircuit} for floating 
-point numbers.
+Besides simple circuits such as variants of both the \acro{FIR} filter and 
+the higher-order \acro{CPU} shown in \Cref{sec:usecases}, the \CLaSH\ compiler 
+has also been able to translate non-trivial functional descriptions such as a 
+streaming reduction circuit~\cite{reductioncircuit} for floating point 
+numbers.
+
+To the best knowledge of the authors, \CLaSH\ is the only (functional) 
+\acro{HDL} that allows circuit specifications to be written in a very concise 
+way and at the same time supports such advanced features as polymorphic typing, 
+higher-order functions and pattern matching.
  
  \section{Hardware description in Haskell}
  The following section describes the basic language elements of \CLaSH\ and the 
@@ -584,9 +585,8 @@ various subsections, the relation between the language elements and their
  eventual netlist representation is also highlighted. 
  
    \subsection{Function application}
-    Two basic syntactic elements of a functional program are functions
-    and function application. These have a single obvious translation to a 
-    netlist format: 
+    Two basic elements of a functional program are functions and function 
+    application. These have a single obvious translation to a netlist format: 
      \begin{inparaenum}
        \item every function is translated to a component, 
        \item every function argument is translated to an input port,
@@ -601,18 +601,18 @@ eventual netlist representation is also highlighted.
      port of the function is also mapped to a signal, which is used as the 
      result of the application itself. Since every top level function generates 
      its own component, the hierarchy of function calls is reflected in the 
-    final netlist, creating a hierarchical description of the hardware. 
+    final netlist. %, creating a hierarchical description of the hardware. 
      % The separation in different components makes it easier for a developer 
      % to understand and possibly hand-optimize the resulting \VHDL\ output of 
      % the \CLaSH\ compiler.
  
-    The short example (\ref{lst:code1}) demonstrated below gives an indication 
-    of the level of conciseness that can be achieved with functional hardware 
-    description languages when compared with the more traditional hardware 
-    description languages. The example is a combinational multiply-accumulate 
-    circuit that works for \emph{any} word length (this type of polymorphism 
-    will be further elaborated in \Cref{sec:polymorhpism}). The corresponding 
-    netlist is depicted in \Cref{img:mac-comb}.
+    The short example (\ref{lst:code1}) seen below gives a demonstration of 
+    the conciseness that can be achieved with \CLaSH\ when compared with 
+    other (more traditional) \acrop{HDL}. The example is a combinational 
+    multiply-accumulate circuit that works for \emph{any} word length (this 
+    type of polymorphism will be further elaborated in 
+    \Cref{sec:polymorhpism}). The corresponding netlist is depicted in 
+    \Cref{img:mac-comb}.
      
      \hspace{-1.7em}
      \begin{minipage}{0.93\linewidth}
@@ -636,7 +636,7 @@ eventual netlist representation is also highlighted.
      The use of a composite result value is demonstrated in the next example 
      (\ref{lst:code2}), where the multiply-accumulate circuit not only returns 
      the accumulation result, but also the intermediate multiplication result. 
-    Its corresponding netlist can be see in \Cref{img:mac-comb-composite}.
+    Its corresponding netlist can be seen in \Cref{img:mac-comb-composite}.
      
      \hspace{-1.7em}
      \begin{minipage}{0.93\linewidth}
@@ -651,8 +651,10 @@ eventual netlist representation is also highlighted.
        \label{lst:code2}
        \end{example}
      \end{minipage}
+    \vspace{-1.5em}
      
      \begin{figure}
+    \vspace{1em}
      \centerline{\includegraphics{mac-nocurry.svg}}
      \caption{Combinational Multiply-Accumulate (composite output)}
      \label{img:mac-comb-composite}
@@ -666,39 +668,46 @@ eventual netlist representation is also highlighted.
      expressions (\hs{if} expressions can be directly translated to 
      \hs{case} expressions). When transforming a \CLaSH\ description to a   
      netlist, a \hs{case} expression is translated to a multiplexer. The 
-    control value is fed into a number of comparators and their output forms 
-    the selection port of the multiplexer. The result of each alternative in 
-    the \hs{case} expression is linked to the corresponding input port on the 
-    multiplexer.
+    control value of the \hs{case} expression is fed into a number of 
+    comparators and their combined output forms the selection port of the 
+    multiplexer. The result of each alternative in the \hs{case} expression is 
+    linked to the corresponding input port of the multiplexer.
      % A \hs{case} expression can in turn simply be translated to a conditional 
      % assignment in \VHDL, where the conditions use equality comparisons 
      % against the constructors in the \hs{case} expressions. 
-    We can see two versions of a contrived example below, the first  
-    (\ref{lst:code3}) using a \hs{case} expression, and the other 
-    (\ref{lst:code4}) using an \hs{if-then-else} expression . Both examples 
-    sums two values when they are equal or non-equal (depending on the given 
-    predicate, the \hs{pred} variable) and returns 0 otherwise. The \hs{pred} 
-    variable if of the following, user-defined, enumeration datatype:
+    
+    % Two versions of a contrived example are displayed below, the first  
+    % (\ref{lst:code3}) using a \hs{case} expression and the second 
+    % (\ref{lst:code4}) using an \hs{if-then-else} expression. Both examples 
+    % sum two values when they are equal or non-equal (depending on the given 
+    % predicate, the \hs{pred} variable) and return 0 otherwise. 
+    
+    A code example (\ref{lst:code3}) that uses a \hs{case} expression and 
+    \hs{if-then-else} expressions is shown below. The function counts up or 
+    down depending on the \hs{direction} variable, and has a \hs{wrap} 
+    variable that determines both the upper bound and wrap-around point of the 
+    counter. The \hs{direction} variable is of the following, user-defined, 
+    enumeration datatype:
      
      \begin{code}
-    data Pred = Equal | NotEqual
+    data Direction = Up | Down
      \end{code}
  
-    The naive netlist corresponding to both versions of the example is 
-    depicted in \Cref{img:choice}. Note that the \hs{pred} variable is only
-    compared to the \hs{Equal} value, as an inequality immediately implies 
-    that the \hs{pred} variable has a \hs{NotEqual} value.
+    The naive netlist corresponding to this example is depicted in 
+    \Cref{img:counter}. Note that the \hs{direction} variable is only
+    compared to \hs{Up}, as an inequality immediately implies that 
+    \hs{direction} is \hs{Down}.
  
      \hspace{-1.7em}
      \begin{minipage}{0.93\linewidth}
      \begin{code}    
-    sumif pred a b = case pred of
-      Equal -> case a == b of
-        True      -> a + b
-        False     -> 0
-      NotEqual  -> case a != b of
-        True      -> a + b
-        False     -> 0
+    counter direction wrap x = case direction of
+        Up    -> if   x < wrap  then 
+                      x + 1     else 
+                      0
+        Down  -> if   x > 0   then 
+                      x - 1   else 
+                      wrap
      \end{code}
      \end{minipage}
      \begin{minipage}{0.07\linewidth}
@@ -706,29 +715,36 @@ eventual netlist representation is also highlighted.
        \label{lst:code3}
        \end{example}
      \end{minipage}
+    
+    % \hspace{-1.7em}
+    % \begin{minipage}{0.93\linewidth}
+    % \begin{code}
+    % sumif pred a b = 
+    %   if pred == Equal then 
+    %     if a == b then a + b else 0
+    %   else 
+    %     if a != b then a + b else 0
+    % \end{code}
+    % \end{minipage}
+    % \begin{minipage}{0.07\linewidth}
+    %   \begin{example}
+    %   \label{lst:code4}
+    %   \end{example}
+    % \end{minipage}
  
-    \hspace{-1.7em}
-    \begin{minipage}{0.93\linewidth}
-    \begin{code}
-    sumif pred a b = 
-      if pred == Equal then 
-        if a == b then a + b else 0
-      else 
-        if a != b then a + b else 0
-    \end{code}
-    \end{minipage}
-    \begin{minipage}{0.07\linewidth}
-      \begin{example}
-      \label{lst:code4}
-      \end{example}
-    \end{minipage}
+    % \begin{figure}
+    % \vspace{1em}
+    % \centerline{\includegraphics{choice-case.svg}}
+    % \caption{Choice - sumif}
+    % \label{img:choice}
+    % \vspace{-1.5em}
+    % \end{figure}
  
      \begin{figure}
-    \vspace{1em}
-    \centerline{\includegraphics{choice-case.svg}}
-    \caption{Choice - sumif}
-    \label{img:choice}
-    \vspace{-1.5em}
+    \centerline{\includegraphics{counter.svg}}
+    \caption{Counter netlist}
+    \label{img:counter}
+    \vspace{-2em}
      \end{figure}
  
      A user-friendly and also very powerful form of choice that is not found in 
@@ -740,23 +756,22 @@ eventual netlist representation is also highlighted.
      clause if the guard evaluates to false. Like \hs{if-then-else} 
      expressions, pattern matching and guards have a (straightforward) 
      translation to \hs{case} expressions and can as such be mapped to 
-    multiplexers. A third version (\ref{lst:code5}) of the earlier example, 
+    multiplexers. A second version (\ref{lst:code5}) of the earlier example, 
      now using both pattern matching and guards, can be seen below. The guard 
      is the expression that follows the vertical bar (\hs{|}) and precedes the 
      assignment operator (\hs{=}). The \hs{otherwise} guards always evaluate to 
      \hs{true}.
      
      The version using pattern matching and guards corresponds to the same 
-    naive netlist representation (\Cref{img:choice}) as the earlier two 
-    versions of the example.
+    naive netlist representation (\Cref{img:counter}) as the earlier example.
      
      \hspace{-1.7em}
      \begin{minipage}{0.93\linewidth}
      \begin{code}
-    sumif Equal     a b   | a == b      = a + b
-                          | otherwise   = 0
-    sumif NotEqual  a b   | a != b      = a + b
+    counter Up    wrap x  | x < wrap    = x + 1
                            | otherwise   = 0
+    counter Down  wrap x  | x > 0       = x - 1
+                          | otherwise   = wrap
      \end{code}
      \end{minipage}
      \begin{minipage}{0.07\linewidth}
@@ -780,14 +795,14 @@ eventual netlist representation is also highlighted.
      \emph{built-in} types and \emph{user-defined} types. Built-in types are 
      those types for which a fixed translation is defined within the \CLaSH\ 
      compiler. The \CLaSH\ compiler has generic translation rules to
-    translate the user-defined types described later on.
+    translate the user-defined types, which are described later on.
  
      The \CLaSH\ compiler is able to infer unspecified (polymorphic) types,
      meaning that a developer does not have to annotate every function with a 
      type signature. % (even if it is good practice to do so).
      Given that the top-level entity of a circuit design is annotated with 
-    concrete types, the \CLaSH\ compiler can specialize polymorphic functions 
-    to functions with concrete types.
+    concrete/monomorphic types, the \CLaSH\ compiler can specialize 
+    polymorphic functions to functions with concrete types.
    
      % Translation of two most basic functional concepts has been
      % discussed: function application and choice. Before looking further
@@ -839,7 +854,7 @@ eventual netlist representation is also highlighted.
          % \texttt{signed} respectively.
        \item[\bf{Vector}]
          this is a vector type that can contain elements of any other type and
-        has a fixed length. The \hs{Vector} type constructor takes two type 
+        has a static length. The \hs{Vector} type constructor takes two type 
          arguments: the length of the vector and the type of the elements 
          contained in it. The short-hand notation used for the vector type in  
          the rest of paper is: \hs{[a|n]}, where \hs{a} is the element 
@@ -860,13 +875,12 @@ eventual netlist representation is also highlighted.
          % \hs{RegisterState} type is a vector of 8 32-bit words. A fixed size 
          % vector is translated to a \VHDL\ array type.
        \item[\bf{Index}]
-        this is another type to describe integers, but unlike the previous
-        two it has no specific bit-width, but an upper bound. This means that
-        its range is not limited to powers of two, but can be any number.
-        An \hs{Index} only has an upper bound, its lower bound is
-        implicitly zero. If a value of this type exceeds either bounds, an 
-        error will be thrown at simulation-time. The main purpose of the 
-        \hs{Index} type is to be used as an index to a \hs{Vector}.
+        the main purpose of the \hs{Index} type is to be used as an index into 
+        a \hs{Vector}. It has no specified bit-size, but a specified upper 
+        bound. This means that its range is not limited to powers of two, but 
+        can be any number. An \hs{Index} only has an upper bound; its lower 
+        bound is implicitly zero. If a value of this type exceeds either 
+        bound, an error will be thrown at \emph{simulation}-time. 
  
          % \comment{TODO: Perhaps remove this example?} To define an index for 
          % the 8 element vector above, we would do:
@@ -884,31 +898,33 @@ eventual netlist representation is also highlighted.
      \end{xlist}
  
    \subsubsection{User-defined types}
-    There are three ways to define new types in Haskell: algebraic
-    data-types with the \hs{data} keyword, type synonyms with the \hs{type}
-    keyword and datatype renaming constructs with the \hs{newtype} keyword. 
+    % There are three ways to define new types in Haskell: algebraic
+    % data-types with the \hs{data} keyword, type synonyms with the \hs{type}
+    % keyword and datatype renaming constructs with the \hs{newtype} keyword. 
      % \GHC\ offers a few more advanced ways to introduce types (type families,
      % existential typing, {\acro{GADT}}s, etc.) which are not standard 
      % Haskell. As it is currently unclear how these advanced type constructs 
      % correspond to hardware, they are for now unsupported by the \CLaSH\ 
      % compiler.
-
-    Only an algebraic datatype declaration actually introduces a
-    completely new type. Type synonyms and type renaming only define new 
-    names for existing types, where synonyms are completely interchangeable 
-    and a type renaming requires an explicit conversion. Type synonyms and 
-    type renaming do not need any particular translation, a synonym or 
-    renamed type will just use the same representation as the original type. 
+    A completely new type is introduced by an algebraic datatype declaration 
+    which is defined using the \hs{data} keyword. Type synonyms can be 
+    introduced using the \hs{type} keyword.
+    % Only an algebraic datatype declaration actually introduces a
+    % completely new type. Type synonyms and type renaming only define new 
+    % names for existing types, where synonyms are completely interchangeable 
+    % and a type renaming requires an explicit conversion. 
+    Type synonyms do not need any particular translation, as a synonym will 
+    just use the same representation as the original type. 
      
      For algebraic types, we can make the following distinctions:
      \begin{xlist}
        \item[\bf{Single constructor}]
          Algebraic datatypes with a single constructor with one or more
-        fields, are essentially a way to pack a few values together in a
-        record-like structure. Haskell's built-in tuple types are also defined 
-        as single constructor algebraic types (but with a bit of
-        syntactic sugar). An example of a single constructor type with 
-        multiple fields is the following pair of integers:
+        fields allow values to be packed together in a record-like structure. 
+        Haskell's built-in tuple types are also defined as single constructor 
+        algebraic types (using a bit of syntactic sugar). An example of a 
+        single constructor type with multiple fields is the following pair of 
+        integers:
          \begin{code}
          data IntPair = IntPair Int Int
          \end{code}
@@ -916,12 +932,11 @@ eventual netlist representation is also highlighted.
          % for every field in the constructor.
        \item[\bf{No fields}]
          Algebraic datatypes with multiple constructors, but without any
-        fields are essentially a way to get an enumeration-like type
-        containing alternatives. Note that Haskell's \hs{Bool} type is also 
-        defined as an enumeration type, but that there is a fixed translation 
-        for that type within the \CLaSH\ compiler. An example of such an 
-        enumeration type is the type that represents the colors in a traffic 
-        light:
+        fields are essentially enumeration types. Note that Haskell's 
+        \hs{Bool} type is also defined as an enumeration type, but that there 
+        is a fixed translation for that type within the \CLaSH\ compiler. An 
+        example of an enumeration type definition is the definition for a 
+        traffic light:
          \begin{code}
          data TrafficLight = Red | Orange | Green
          \end{code}
@@ -932,7 +947,8 @@ eventual netlist representation is also highlighted.
        \item[\bf{Multiple constructors with fields}]
          Algebraic datatypes with multiple constructors, where at least
          one of these constructors has one or more fields are currently not 
-        supported.
+        supported. Additional research is required to allow for the overlap of
+        the fields belonging to the different constructors.
      \end{xlist}
  
    \subsection{Polymorphism}\label{sec:polymorhpism}
@@ -944,51 +960,64 @@ eventual netlist representation is also highlighted.
      any number of new types.
  
      As an example of a parametric polymorphic function, consider the type of 
-    the following \hs{append} function, which appends an element to a
-    vector:\footnote{The \hs{::} operator is used to annotate a function
+    the following \hs{first} function, which returns the first element of a 
+    tuple:\footnote{The \hs{::} operator is used to annotate a function
      with its type.}
      
      \begin{code}
-    append :: [a|n] -> a -> [a|n + 1]
+    first :: (a,b) -> a
      \end{code}
  
-    This type is parameterized by \hs{a}, which can contain any type at
-    all. This means that \hs{append} can append an element to a vector,
-    regardless of the type of the elements in the list (as long as the type of 
-    the value to be added is of the same type as the values in the vector). 
-    This kind of polymorphism is extremely useful in hardware designs to make 
-    operations work on a vector without knowing exactly what elements are 
-    inside, routing signals without knowing exactly what kinds of signals 
-    these are, or working with a vector without knowing exactly how long it 
-    is. Polymorphism also plays an important role in most higher order 
-    functions, as we will see in the next section.
+    This type is parameterized in both \hs{a} and \hs{b}, which can both 
+    represent any type at all (as long as that type is supported by the 
+    \CLaSH\ compiler). This means that \hs{first} works for any tuple, 
+    regardless of what elements it contains. This kind of polymorphism is 
+    extremely useful in hardware designs, for example when routing signals 
+    without knowing their exact type, or specifying vector operations that 
+    work on vectors of any length and element type. Polymorphism also plays an 
+    important role in most higher order functions, as will be shown in the 
+    next section.
  
      Another type of polymorphism is \emph{ad-hoc 
      polymorphism}~\cite{polymorphism}, which refers to polymorphic 
      functions which can be applied to arguments of different types, but which 
      behave differently depending on the type of the argument to which they are 
      applied. In Haskell, ad-hoc polymorphism is achieved through the use of 
-    type classes, where a class definition provides the general interface of a 
-    function, and class instances define the functionality for the specific 
-    types. An example of such a type class is the \hs{Num} class, which 
-    contains all of Haskell's numerical operations. A designer can make use 
-    of this ad-hoc polymorphism by adding a constraint to a parametrically 
-    polymorphic type variable. Such a constraint indicates that the type 
-    variable can only be instantiated to a type whose members supports the 
-    overloaded functions associated with the type class. 
+    \emph{type classes}, where a class definition provides the general 
+    interface of a function, and class \emph{instances} define the 
+    functionality for the specific types. An example of such a type class is 
+    the \hs{Num} class, which contains all of Haskell's numerical operations. 
+    A designer can make use of this ad-hoc polymorphism by adding a 
+    \emph{constraint} to a parametrically polymorphic type variable. Such a 
+    constraint indicates that the type variable can only be instantiated to a 
+    type whose members support the overloaded functions associated with the 
+    type class. 
      
-    As an example we will take a look at type signature of the function 
-    \hs{sum}, which sums the values in a vector:
+    An example of a type signature that includes such a constraint is the 
+    signature of the \hs{sum} function, which sums the values in a vector:
      \begin{code}
      sum :: Num a => [a|n] -> a
      \end{code}
  
      This type is again parameterized by \hs{a}, but it can only contain
      types that are \emph{instances} of the \emph{type class} \hs{Num}, so that  
-    we know that the addition (+) operator is defined for that type. 
-    \CLaSH's built-in numerical types are also instances of the \hs{Num}
-    class, so we can use the addition operator (and thus the \hs{sum}
-    function) with \hs{Signed} as well as with \hs{Unsigned}.
+    the compiler knows that the addition (+) operator is defined for that 
+    type.
+    
+    A place where class constraints also play a role is in the size and range 
+    parameters of the \hs{Vector} and numeric types. The reason is that 
+    these parameters have to be limited to types that can represent 
+    \emph{natural} numbers. This constraint will also be reflected in any of 
+    the functions that work on these types. The complete type of, for example, 
+    the \hs{Vector} type is:
+    \begin{code}
+    Natural n => Vector n a
+    \end{code}    
+    
+    % \CLaSH's built-in numerical types are also instances of the \hs{Num} 
+    % class. 
+    % so we can use the addition operator (and thus the \hs{sum}
+    % function) with \hs{Signed} as well as with \hs{Unsigned}.
  
      \CLaSH\ supports both parametric polymorphism and ad-hoc polymorphism. Any 
      function defined can have any number of unconstrained type parameters. A
@@ -996,12 +1025,14 @@ eventual netlist representation is also highlighted.
      instances. The \CLaSH\ compiler will infer the type of every polymorphic 
      argument depending on how the function is applied. There is however one 
      constraint: the top level function that is being translated can not have 
-    any polymorphic arguments. The arguments can not be polymorphic as the 
-    function is never applied and consequently there is no way to determine 
-    the actual types for the type parameters. The members of some standard 
-    Haskell type classes are supported as built-in functions, including: 
-    \hs{Num} for numerical operations, \hs{Eq} for the equality operators, and 
-    \hs{Ord} for the comparison/order operators.
+    any polymorphic arguments. The arguments of the top-level function can not be 
+    polymorphic as the function is never applied and consequently there is no 
+    way to determine the actual types for the type parameters. 
+    
+    With regard to the built-in types, it should be noted that members of 
+    some of the standard Haskell type classes are supported as built-in 
+    functions. These include: the numerical operators of \hs{Num}, the equality 
+    operators of \hs{Eq}, and the comparison/order operators of \hs{Ord}.
  
    \subsection{Higher-order functions \& values}
      Another powerful abstraction mechanism in functional languages is
@@ -1037,11 +1068,11 @@ eventual netlist representation is also highlighted.
      type as the first argument of the function passed to \hs{map}. The element 
      type of the resulting vector is equal to the return type of the function 
      passed, which need not necessarily be the same as the element type of the 
-    input vector. All of these characteristics  can readily be inferred from 
-    the type signature belonging to \hs{map}:
+    input vector. All of these characteristics can be inferred from the type 
+    signature belonging to \hs{map}:
  
      \begin{code}
-    map :: (a -> b) -> [a|n] -> [b|n]
+    map :: Natural n => (a -> b) -> [a|n] -> [b|n]
      \end{code}
  
      So far, only functions have been used as higher-order values. In
@@ -1082,24 +1113,22 @@ eventual netlist representation is also highlighted.
        \end{example}
      \end{minipage}
  
-    Finally, not only built-in functions can have higher order
-    arguments, but any function defined in \CLaSH\ may have functions as
-    arguments. This allows the hardware designer to use a powerful
-    abstraction mechanism in his designs and have an optimal amount of
-    code reuse. The only exception is again the top-level function: if a 
-    function-typed argument is not applied with an actual function, no 
-    hardware can be generated.    
+    Finally, higher-order arguments are not limited to built-in functions 
+    such as \hs{map}: any function defined in \CLaSH\ may have functions as 
+    arguments. This gives the circuit designer a powerful mechanism for code 
+    reuse. The only exception is again the top-level function: if a 
+    function-typed argument is not supplied with an actual function, no 
+    hardware can be generated.
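As a minimal sketch of the point above, written in plain Haskell rather than \CLaSH: `combineWith` and `topEntity` are hypothetical names introduced only for illustration. The user-defined function takes two function-typed arguments, and the top-level definition supplies actual functions for both and fixes all types.

```haskell
-- Hypothetical user-defined higher-order function: applies f to both
-- inputs and combines the two results with g.
combineWith :: (b -> b -> c) -> (a -> b) -> a -> a -> c
combineWith g f x y = g (f x) (f y)

-- Top-level description (name assumed): every function-typed argument
-- is supplied with an actual function and every type is concrete, so
-- nothing polymorphic or function-valued remains at the top level.
topEntity :: Int -> Int -> Int
topEntity = combineWith (+) (* 2)
```

Here `topEntity 3 4` doubles both inputs and adds the results. Leaving `g` or `f` as arguments of `topEntity` would reintroduce the situation in which no hardware can be generated.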
  
      % \comment{TODO: Describe ALU example (no code)}
  
    \subsection{State}
-    A very important concept in hardware is the concept of state. In a 
-    stateful design, the outputs depend on the history of the inputs, or the 
-    state. State is usually stored in registers, which retain their value 
+    In a stateful design, the outputs depend on the history of the inputs, or 
+    the state. State is usually stored in registers, which retain their value 
      during a clock cycle. As we want to describe more than simple 
      combinational designs, \CLaSH\ needs an abstraction mechanism for state.
  
-    An important property in Haskell, and in most other functional languages, 
+    An important property in Haskell, and in many other functional languages, 
      is \emph{purity}. A function is said to be \emph{pure} if it satisfies two
      conditions:
      \begin{inparaenum}
@@ -1186,9 +1215,10 @@ eventual netlist representation is also highlighted.
      As the \hs{run} function, the hardware description, and the test 
      inputs are also valid Haskell, the complete simulation can be compiled to 
      an executable binary by an optimizing Haskell compiler, or executed in a 
-    Haskell interpreter. Both simulation paths are much faster than first 
-    translating the description to \VHDL\ and then running a \VHDL\ 
-    simulation.
+    Haskell interpreter. Both simulation paths require less effort from a 
+    circuit designer than first translating the description to \VHDL\ and then 
+    running a \VHDL\ simulation; it is also very likely that both simulation 
+    paths are much faster.
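A simulation of this kind can be sketched in plain Haskell as follows. The shape of `run` shown here is an assumption (the paper's own definition may differ), and `State` and `acc` are hypothetical names: `acc` is a running-sum accumulator chosen only as a small stateful description.

```haskell
-- Hypothetical wrapper that marks a value as state.
newtype State s = State s

-- A stateful description: maps the current state and an input to the
-- next state and an output. This one outputs the running sum.
acc :: State Int -> Int -> (State Int, Int)
acc (State s) x = (State s', s')
  where
    s' = s + x

-- Assumed simulation driver: threads the state through a list of
-- inputs, one input per clock cycle, and collects the outputs.
run :: (s -> i -> (s, o)) -> s -> [i] -> [o]
run _ _ [] = []
run f s (i : is) = o : run f s' is
  where
    (s', o) = f s i
```

Evaluating `run acc (State 0) [1, 2, 3, 4]` in an interpreter, or compiling it into a binary, exercises the description without any translation to VHDL.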
      
  \section{The \CLaSH\ compiler}
  An important aspect in this research is the creation of the prototype 
@@ -1212,7 +1242,7 @@ front-end of the prototype compiler pipeline, as seen in
  \end{figure}
  
  The output of the \GHC\ front-end consists of the translation of the original 
-Haskell description in \emph{Core}~\cite{Sulzmann2007}, which is a smaller, 
+Haskell description to \emph{Core}~\cite{Sulzmann2007}, which is a smaller, 
  typed, functional language. This \emph{Core} language is relatively easy to 
  process compared to the larger Haskell language. A description in \emph{Core} 
  can still contain elements which have no direct translation to hardware, such 
@@ -1231,8 +1261,10 @@ first-order functions, and specializing polymorphic types to concrete types.
  The final step in the compiler pipeline is the translation to a \VHDL\ 
  \emph{netlist}, which is a straightforward process due to the resemblance between a 
  normalized description and a set of concurrent signal assignments. We call the 
-end-product of the \CLaSH\ compiler a \VHDL\ \emph{netlist} as the resulting 
-\VHDL\ resembles an actual netlist description and not idiomatic \VHDL.
+end-product of the \CLaSH\ compiler a \VHDL\ \emph{netlist} as the result
+resembles an actual netlist description, and the fact that it is \VHDL\ is 
+only an implementation detail; the output could for example also be in 
+Verilog.
  
  \section{Use cases}
  \label{sec:usecases}
@@ -1261,7 +1293,7 @@ using higher-order functions:
  \hspace{-1.7em}
  \begin{minipage}{0.93\linewidth}
  \begin{code}
-as *+* bs = foldl1 (+) (zipWith (*) as bs)
+as *+* bs = fold (+) (zipWith (*) as bs)
  \end{code}
  \end{minipage}
  \begin{minipage}{0.07\linewidth}
@@ -1275,13 +1307,13 @@ earlier: It takes a function, two vectors, and then applies the function to
  each of the elements in the two vectors pairwise (\emph{e.g.}, \hs{zipWith (*) 
  [1, 2] [3, 4]} becomes \hs{[1 * 3, 2 * 4]}).
  
-The \hs{foldl1} function takes a binary function, a single vector, and applies 
+The \hs{fold} function takes a binary function, a single vector, and applies 
  the function to the first two elements of the vector. It then applies the
  function to the result of the first application and the next element in the 
  vector. This continues until the end of the vector is reached. The result of 
-the \hs{foldl1} function is the result of the last application. It is obvious 
+the \hs{fold} function is the result of the last application. It is obvious 
  that the \hs{zipWith (*)} function is pairwise multiplication and that the 
-\hs{foldl1 (+)} function is summation.
+\hs{fold (+)} function is summation.
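The behaviour described above can be reproduced with ordinary Haskell lists. This is a sketch under an assumption: the Prelude's `foldl1` is used in place of the paper's `fold`, since it applies the function from the first two elements onward in the same way, and lists stand in for fixed-size vectors.

```haskell
-- Dot product on lists: pairwise multiplication followed by summation.
-- foldl1 is partial, so both lists are assumed to be non-empty.
(*+*) :: Num a => [a] -> [a] -> a
xs *+* ys = foldl1 (+) (zipWith (*) xs ys)
```

For the vectors used in the text, `[1, 2] *+* [3, 4]` first becomes `[1 * 3, 2 * 4]` and is then summed to `11`.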
  % Returning to the actual \acro{FIR} filter, we will slightly change the 
  % equation describing it, so as to make the translation to code more obvious and 
  % concise. What we do is change the definition of the vector of input samples 
@@ -1301,7 +1333,7 @@ The complete definition of the \acro{FIR} filter in code then becomes:
  \begin{minipage}{0.93\linewidth}
  \begin{code}
  fir (State (xs,hs)) x = 
-  (State (x >> xs,hs), (x +> xs) *+* hs)
+  (State (shiftInto x xs,hs), (x +> xs) *+* hs)
  \end{code}
  \end{minipage}
  \begin{minipage}{0.07\linewidth}
@@ -1314,14 +1346,14 @@ Where the vector \hs{xs} contains the previous input samples, the vector
  \hs{hs} contains the \acro{FIR} coefficients, and \hs{x} is the current input 
  sample. The concatenate operator (\hs{+>}) creates a new vector by placing the 
  current sample (\hs{x}) in front of the previous samples vector (\hs{xs}). The 
-code for the shift (\hs{>>}) operator, that adds the new input sample (\hs{x}) 
+code for the \hs{shiftInto} function, that adds the new input sample (\hs{x}) 
  to the list of previous input samples (\hs{xs}) and removes the oldest sample, 
  is shown below:
  
  \hspace{-1.7em}
  \begin{minipage}{0.93\linewidth}
  \begin{code}
-x >> xs = x +> init xs  
+shiftInto x xs = x +> init xs  
  \end{code}
  \end{minipage}
  \begin{minipage}{0.07\linewidth}
@@ -1416,11 +1448,12 @@ compared to the others.
  The vector \hs{inputs} is the set of data sources, which is passed to
  each function unit as a set of possible operands. The \acro{CPU} also receives 
  a vector of address pairs, which are used by each function unit to select 
-their operand. The application of the function units to the \hs{inputs} and
-\hs{addrs} arguments seems quite repetitive and could be rewritten to use
-a combination of the \hs{map} and \hs{zipwith} functions instead.
-However, the prototype compiler does not currently support working with lists 
-of functions, so a more explicit version of the code is given instead.
+their operand. 
+% The application of the function units to the \hs{inputs} and
+% \hs{addrs} arguments seems quite repetitive and could be rewritten to use
+% a combination of the \hs{map} and \hs{zipwith} functions instead.
+% However, the prototype compiler does not currently support working with 
+% lists of functions, so a more explicit version of the code is given instead.
  
  \hspace{-1.7em}
  \begin{minipage}{0.93\linewidth}
@@ -1437,7 +1470,7 @@ cpu (State s) input addrs opc = (State s', out)
                , fu mul            inputs (addrs!3)
                ]
      inputs    =   0 +> (1 +> (input +> s))
-    out       =   head s'
+    out       =   last s
  \end{code}
  \end{minipage}
  \begin{minipage}{0.07\linewidth}
@@ -1454,12 +1487,21 @@ This section describes the features of existing (functional) hardware
  description languages and highlights the advantages that this research has 
  over existing work.
  
-Many functional hardware description languages have been developed over the 
-years. Early work includes such languages as $\mu$\acro{FP}~\cite{muFP}, an 
-extension of Backus' \acro{FP} language to synchronous streams, designed 
-particularly for describing and reasoning about regular circuits. The 
-Ruby~\cite{Ruby} language uses relations, instead of functions, to describe 
-circuits, and has a particular focus on layout. 
+% Many functional hardware description languages have been developed over the 
+% years. Early work includes such languages as $\mu$\acro{FP}~\cite{muFP}, an 
+% extension of Backus' \acro{FP} language to synchronous streams, designed 
+% particularly for describing and reasoning about regular circuits. The 
+% Ruby~\cite{Ruby} language uses relations, instead of functions, to describe 
+% circuits, and has a particular focus on layout. 
+
+\acro{HML}~\cite{HML2} is a hardware modeling language based on the strict 
+functional language \acro{ML}, and has support for polymorphic types and 
+higher-order functions. There is no direct simulation support for \acro{HML}, 
+so a description in \acro{HML} has to be translated to \VHDL, and the 
+translated description can then be simulated in a \VHDL\ simulator. Certain 
+aspects of \acro{HML}, such as higher-order functions, are however not supported by 
+the \VHDL\ translator~\cite{HML3}. The \CLaSH\ compiler on the other hand can 
+correctly translate all of the language constructs mentioned in this paper.
  
  \begin{figure}
  \centerline{\includegraphics{highordcpu.svg}}
@@ -1468,22 +1510,13 @@ circuits, and has a particular focus on layout.
  \vspace{-1.5em}
  \end{figure}
  
-\acro{HML}~\cite{HML2} is a hardware modeling language based on the strict 
-functional language \acro{ML}, and has support for polymorphic types and 
-higher-order functions. Published work suggests that there is no direct 
-simulation support for \acro{HML}, but that a description in \acro{HML} has to 
-be translated to \VHDL\ and that the translated description can then be 
-simulated in a \VHDL\ simulator. Certain aspects of HML, such as higher-order
-functions are however not supported by the \VHDL\ translator~\cite{HML3}. The 
-\CLaSH\ compiler on the other hand can correctly translate all of the language 
-constructs mentioned in this paper. % to a netlist format.
-
-Like the work presented in this paper, many functional hardware description 
-languages have some sort of foundation in the functional programming language 
-Haskell. Hawk~\cite{Hawk1} uses Haskell to describe system-level executable 
-specifications used to model the behavior of superscalar microprocessors. Hawk 
-specifications can be simulated; to the best knowledge of the authors there is 
-however no support for automated circuit synthesis. 
+Like the research presented in this paper, many functional hardware 
+description languages have some sort of foundation in the functional 
+programming language Haskell. Hawk~\cite{Hawk1} is a hardware modeling 
+language embedded in Haskell and has sequential environments that make it 
+easier to specify stateful computation. Hawk specifications can be simulated; 
+to the best knowledge of the authors there is however no support for automated 
+circuit synthesis. 
  
  The ForSyDe~\cite{ForSyDe2} system uses Haskell to specify abstract system 
  models. A designer can model systems using heterogeneous models of 
@@ -1492,35 +1525,42 @@ computation. Using so-called domain interfaces a designer can simulate
  electronic systems which have both analog and digital parts. ForSyDe has 
  several backends including simulation and automated synthesis, though 
  automated synthesis is restricted to the synchronous model of computation. 
-Unlike \CLaSH\ there is no support for the automated synthesis of descriptions 
-that contain polymorphism or higher-order functions.
-
-Lava~\cite{Lava} is a hardware description language that focuses on the 
-structural representation of hardware. Besides support for simulation and 
-circuit synthesis, Lava descriptions can be interfaced with formal method 
-tools for formal verification. Lava descriptions are actually circuit 
-generators when viewed from a synthesis viewpoint, in that the language 
-elements of Haskell, such as choice, can be used to guide the circuit 
-generation. If a developer wants to insert a choice element inside an actual 
-circuit he will have to explicitly instantiate a multiplexer-like component. 
-
-In this respect \CLaSH\ differs from Lava, in that all the choice elements, 
-such as case-statements and pattern matching, are synthesized to choice 
-elements in the eventual circuit. As such, richer control structures can both 
-be specified and synthesized in \CLaSH\ compared to any of the embedded 
-languages, such as: Hawk, ForSyDe, or Lava.
-
-The merits of polymorphic typing, combined with higher-order functions, are 
-now also recognized in the `main-stream' hardware description languages, 
-exemplified by the new \VHDL-2008 standard~\cite{VHDL2008}. \VHDL-2008 support 
-for generics has been extended to types and subprograms, allowing a developer 
-to describe components with polymorphic ports and function-valued arguments. 
-Note that the types and subprograms still require an explicit generic map, 
-whereas types can be automatically inferred, and function-values can be 
-automatically propagated by the \CLaSH\ compiler. There are also no (generally 
-available) \VHDL\ synthesis tools that currently support the \VHDL-2008 
-standard, and thus the synthesis of polymorphic types and function-valued 
-arguments.
+Though ForSyDe offers higher-order functions and polymorphism, ForSyDe's 
+choice elements are limited to \hs{if} and \hs{case} expressions. ForSyDe's 
+explicit conversions, where functions have to be wrapped in processes and 
+processes have to be wrapped in systems, combined with the explicit 
+instantiations of components, also make ForSyDe more verbose than \CLaSH.
+
+Lava~\cite{Lava} is a hardware description language, embedded in Haskell, and 
+focuses on the structural representation of hardware. Like \CLaSH, Lava has 
+support for polymorphic types and higher-order functions. Besides support for 
+simulation and circuit synthesis, Lava descriptions can be interfaced with 
+formal method tools for formal verification. As discussed in the introduction, 
+taking the embedded language approach does not allow for Haskell's choice 
+elements to be captured within the circuit descriptions. In this respect 
+\CLaSH\ differs from Lava, in that all of Haskell's choice elements, such as 
+\hs{case}-expressions and pattern matching, are synthesized to choice elements 
+in the eventual circuit. Consequently, descriptions containing rich control 
+structures can be specified in a more user-friendly way in \CLaSH\ than is 
+possible within Lava, and are hence less error-prone.
+
+Bluespec~\cite{Bluespec} is a high-level synthesis language that features 
+guarded atomic transactions and allows for the automated derivation of control 
+structures based on these atomic transactions. Bluespec, like \CLaSH, supports 
+polymorphic typing and function-valued arguments. Bluespec's syntax and 
+language features \emph{had} their basis in Haskell. However, in order to 
+appeal to the users of the traditional \acrop{HDL}, Bluespec has adopted 
+imperative features and a syntax that resembles Verilog. As a result, Bluespec 
+is (unnecessarily) verbose when compared to \CLaSH.
+
+The merits of polymorphic typing and function-valued arguments are now also 
+recognized in the traditional \acrop{HDL}, exemplified by the new \VHDL-2008 
+standard~\cite{VHDL2008}. \VHDL-2008 support for generics has been extended to 
+types and subprograms, allowing a designer to describe components with 
+polymorphic ports and function-valued arguments. Note that the types and 
+subprograms still require an explicit generic map, whereas types can be 
+automatically inferred, and function-values can be automatically propagated 
+by the \CLaSH\ compiler. There are also no (generally available) \VHDL\ 
+synthesis tools that currently support the \VHDL-2008 standard.
  
  % Wired~\cite{Wired},, T-Ruby~\cite{T-Ruby}, Hydra~\cite{Hydra}. 
  % 
@@ -1633,10 +1673,10 @@ languages do not offer.
  \section{Future Work}
  The choice of describing state explicitly as extra arguments and results can 
  be seen as a mixed blessing. Even though the descriptions that use state are 
-usually very clear, one finds that dealing with unpacking, passing, receiving 
-and repacking can become tedious and even error-prone, especially in the case 
-of sub-states. Removing this boilerplate, or finding a more suitable 
-abstraction mechanism would make \CLaSH\ easier to use.
+usually very clear, one finds that distributing and collecting substate can 
+become tedious and even error-prone. Removing the required boilerplate for 
+distribution and collection, or finding a more suitable abstraction mechanism 
+for state would make \CLaSH\ easier to use.
  
  The transformations in the normalization phase of the prototype compiler were 
  developed in an ad-hoc manner, which makes the existence of many desirable