X-Git-Url: https://git.stderr.nl/gitweb?p=matthijs%2Fmaster-project%2Freport.git;a=blobdiff_plain;f=Chapters%2FPrototype.tex;h=1d87eeb50ee1cc49644ba3d787cadc5d7947c1df;hp=9e6893acccee0ff3d319545c78064c8a4365d5ae;hb=2ff5f767f17203c764a8ec9ef6711b234c1deb6d;hpb=6b779650796b6ef5c72ea261238f8576b049d925

diff --git a/Chapters/Prototype.tex b/Chapters/Prototype.tex
index 9e6893a..1d87eeb 100644
--- a/Chapters/Prototype.tex
+++ b/Chapters/Prototype.tex
@@ -11,56 +11,156 @@
   has gone through a number of design iterations which we will not completely
   describe here.
 
-  \section{Choice of language}
-    When implementing this prototype, the first question to ask is: What
-    (functional) language will we use to describe our hardware? (Note that
-    this does not concern the \emph{implementation language} of the compiler,
-    just the language \emph{translated by} the compiler).
+  \section[sec:prototype:input]{Input language}
+    When implementing this prototype, the first question to ask is:
+    Which (functional) language will be used to describe our hardware?
+    (Note that this does not concern the \emph{implementation language}
+    of the compiler, just the language \emph{translated by} the
+    compiler).
 
-    On the highest level, we have two choices:
+    Initially, we have two choices:
 
     \startitemize
       \item Create a new functional language from scratch. This has the
       advantage of having a language that contains exactly those elements that
       are convenient for describing hardware and can contain special
-      constructs that might.
-      \item Use an existing language and create a new backend for it. This has
+      constructs that allows our hardware descriptions to be more powerful or
+      concise.
+      \item Use an existing language and create a new back-end for it. This has
       the advantage that existing tools can be reused, which will speed up
       development.
     \stopitemize
 
+
+    \placeintermezzo{}{
+      \startframedtext[width=8cm,background=box,frame=no]
+      \startalignment[center]
+        {\tfa No \small{EDSL} or Template Haskell}
+      \stopalignment
+      \blank[medium]
+
+      Note that in this consideration, embedded domain-specific
+      languages (\small{EDSL}) and Template Haskell (\small{TH})
+      approaches have not been included. As we have seen in
+      \in{section}[sec:context:fhdls], these approaches have all kinds
+      of limitations on the description language that we would like to
+      avoid.
+      \stopframedtext
+    }
     Considering that we required a prototype which should be working quickly,
     and that implementing parsers, semantic checkers and especially
-    typcheckers isn't exactly the core of this research (but it is lots and
-    lots of work!), using an existing language is the obvious choice. This
+    type-checkers is not exactly the core of this research (but it is lots and
+    lots of work, using an existing language is the obvious choice. This
     also has the advantage that a large set of language features is available
     to experiment with and it is easy to find which features apply well and
-    which don't. A possible second prototype could use a custom language with
-    just the useful features (and possibly extra features that are specific to
+    which do not. Another important advantage of using an existing language, is
+    that simulation of the code becomes trivial. Since there are existing
+    compilers and interpreters that can run the hardware description directly,
+    it can be simulated without also having to write an interpreter for the
+    new language.
+    
+    A possible second prototype could use a custom language with just the useful
+    features (and possibly extra features that are specific to
     the domain of hardware description as well).
 
-    The second choice is to pick one of the many existing languages. As
-    mentioned before, this language is Haskell.  This choice has not been the
+    The second choice to be made is which of the many existing languages to use. As
+    mentioned before, the chosen language is Haskell.  This choice has not been the
     result of a thorough comparison of languages, for the simple reason that
     the requirements on the language were completely unclear at the start of
-    this language. The fact that Haskell is a language with a broad spectrum
+    this research. The fact that Haskell is a language with a broad spectrum
     of features, that it is commonly used in research projects and that the
-    primary compiler, GHC, provides a high level API to its internals, made
+    primary compiler, \GHC, provides a high level API to its internals, made
     Haskell an obvious choice.
 
-    TODO: Was Haskell really a good choice? Perhaps say this somewhere else?
+  \section[sec:prototype:output]{Output format}
+    The second important question is: what will be our output format?
+    This output format should at least allow for programming the
+    hardware design into a field-programmable gate array (\small{FPGA}).
+    The choice of output format is thus limited by what hardware
+    synthesis and programming tools can process.
+
+    Looking at other tools in the industry, the Electronic Design Interchange
+    Format (\small{EDIF}) is commonly used for storing intermediate
+    \emph{netlists} (lists of components and connections between these
+    components) and is commonly the target for \small{VHDL} and Verilog
+    compilers.
+
+    However, \small{EDIF} is not completely tool-independent. It specifies a
+    meta-format, but the hardware components that can be used vary between
+    various tool and hardware vendors, as well as the interpretation of the
+    \small{EDIF} standard. \cite[li89]
+   
+    This means that when working with \small{EDIF}, our prototype would become
+    technology dependent (\eg\ only work with \small{FPGA}s of a specific
+    vendor, or even only with specific chips). This limits the applicability
+    of our prototype. Also, the tools we would like to use for verifying,
+    simulating and draw pretty pictures of our output (like Precision, or
+    QuestaSim) are designed for \small{VHDL} or Verilog input.
+
+    For these reasons, we will not use \small{EDIF}, but \small{VHDL} as our
+    output language.  We choose \VHDL\ over Verilog simply because we are
+    familiar with \small{VHDL} already. The differences between \small{VHDL}
+    and Verilog are on the higher level, while we will be using \small{VHDL}
+    mainly to write low level, netlist-like descriptions anyway.
+
+    An added advantage of using \VHDL\ is that we can profit from existing
+    optimizations in \VHDL\ synthesizers. A lot of optimizations are done on the
+    \VHDL\ level by existing tools. These tools have been under
+    development for years, so it would not be reasonable to assume we
+    could achieve a similar amount of optimization in our prototype (nor
+    should it be a goal, considering this is just a prototype).
+
+    \placeintermezzo{}{
+      \startframedtext[width=8cm,background=box,frame=no]
+      \startalignment[center]
+        {\tfa Translation vs. compilation vs. synthesis}
+      \stopalignment
+      \blank[medium]
+        In this thesis the words \emph{translation}, \emph{compilation} and
+        sometimes \emph{synthesis} will be used interchangeably to refer to the
+        process of translating the hardware description from the Haskell
+        language to the \VHDL\ language.
+
+        Similarly, the prototype created is referred to as both the
+        \emph{translator} as well as the \emph{compiler}.
+
+        The final part of this process is usually referred to as \emph{\VHDL\
+        generation}.
+      \stopframedtext
+    }
+
+    Note that we will be using \small{VHDL} as our output language, but will
+    not use its full expressive power. Our output will be limited to using
+    simple, structural descriptions, without any complex behavioral
+    descriptions like arbitrary sequential statements (which might not
+    be supported by all tools). This ensures that any tool that works
+    with \VHDL\ will understand our output. This also leaves open the
+    option to switch to \small{EDIF} in the future, with minimal changes
+    to the prototype.
+
+  \section{Simulation and synthesis}
+    As mentioned above, by using the Haskell language, we get simulation of
+    our hardware descriptions almost for free. The only thing that is needed
+    is to provide a Haskell implementation of all built-in functions that can
+    be used by the Haskell interpreter to simulate them.
 
-  \section{Prototype design}
-    As stated above, we will use the Glasgow Haskell Compiler (\small{GHC}) to
+    The main topic of this thesis is therefore the path from the Haskell
+    hardware descriptions to \small{FPGA} synthesis, focusing of course on the
+    \VHDL\ generation. Since the \VHDL\ generation process preserves the meaning
+    of the Haskell description exactly, any simulation done in Haskell
+    \emph{should} produce identical results as the synthesized hardware.
+
+  \section[sec:prototype:design]{Prototype design}
+    As suggested above, we will use the Glasgow Haskell Compiler (\small{GHC}) to
     implement our prototype compiler. To understand the design of the
-    compiler, we will first dive into the \small{GHC} compiler a bit. It's
-    compilation consists of the following steps (slightly simplified):
+    prototype, we will first dive into the \small{GHC} compiler a bit. Its
+    compilatprototype consists of the following steps (slightly simplified):
 
     \startuseMPgraphic{ghc-pipeline}
       % Create objects
       save inp, front, desugar, simpl, back, out;
       newEmptyBox.inp(0,0);
-      newBox.front(btex Parser etex);
+      newBox.front(btex Frontend etex);
       newBox.desugar(btex Desugarer etex);
       newBox.simpl(btex Simplifier etex);
       newBox.back(btex Backend etex);
@@ -87,7 +187,7 @@
       % Draw the objects (and deferred labels)
       drawObj (inp, front, desugar, simpl, back, out);
     \stopuseMPgraphic
-    \placefigure[right]{GHC compiler pipeline}{\useMPgraphic{ghc-pipeline}}
+    \placefigure[right]{GHC compiler pipeline}{\startboxed \useMPgraphic{ghc-pipeline}\stopboxed}
 
     \startdesc{Frontend}
       This step takes the Haskell source files and parses them into an
@@ -95,61 +195,66 @@
       complete Haskell language and is thus a very complex one (in contrast
       with the Core \small{AST}, later on). All identifiers in this
       \small{AST} are resolved by the renamer and all types are checked by the
-      typechecker.
+      type-checker.
     \stopdesc
     \startdesc{Desugaring}
-      This steps takes the full \small{AST} and translates it to the
+      This step takes the full \small{AST} and translates it to the
       \emph{Core} language. Core is a very small functional language with lazy
       semantics, that can still express everything Haskell can express. Its
       simpleness makes Core very suitable for further simplification and
-      translation. Core is the language we will be working on as well.
+      translation. Core is the language we will be working with as well.
     \stopdesc
     \startdesc{Simplification}
       Through a number of simplification steps (such as inlining, common
-      subexpression elimination, etc.) the Core program is simplified to make
+      sub-expression elimination, etc.) the Core program is simplified to make
       it faster or easier to process further.
     \stopdesc
     \startdesc{Backend}
       This step takes the simplified Core program and generates an actual
       runnable program for it. This is a big and complicated step we will not
-      discuss it any further, since it is not required for our prototype.
+      discuss it any further, since it is not relevant to our prototype.
     \stopdesc
 
-    In this process, there a number of places where we can start our work.
-    Assuming that we don't want to deal with (or modify) parsing, typechecking
-    and other frontend business and that native code isn't really a useful
+    In this process, there are a number of places where we can start our work.
+    Assuming that we do not want to deal with (or modify) parsing, type-checking
+    and other front end business and that native code is not really a useful
     format anymore, we are left with the choice between the full Haskell
-    \small{AST}, or the smaller (simplified) core representation.
+    \small{AST}, or the smaller (simplified) Core representation.
 
     The advantage of taking the full \small{AST} is that the exact structure
     of the source program is preserved. We can see exactly what the hardware
-    descriiption looks like and which syntax constructs were used. However,
-    the full \small{AST} is a very complicated datastructure. If we are to
+    description looks like and which syntax constructs were used. However,
+    the full \small{AST} is a very complicated data-structure. If we are to
     handle everything it offers, we will quickly get a big compiler.
 
-    Using the core representation gives us a much more compact datastructure
-    (a core expression only uses 9 constructors). Note that this does not mean
-    that the core representation itself is smaller, on the contrary. Since the
-    core language has less constructs, a lot of things will take a larger
-    expression to express.
+    Using the Core representation gives us a much more compact data-structure
+    (a Core expression only uses 9 constructors). Note that this does not mean
+    that the Core representation itself is smaller, on the contrary.
+    Since the Core language has less constructs, most Core expressions
+    are larger than the equivalent versions in Haskell.
 
-    However, the fact that the core language is so much smaller, means it is a
+    However, the fact that the Core language is so much smaller, means it is a
     lot easier to analyze and translate it into something else. For the same
-    reason, \small{GHC} runs its simplifications and optimizations on the core
-    representation as well.
+    reason, \small{GHC} runs its simplifications and optimizations on the Core
+    representation as well \cite[jones96].
 
-    However, we will use the normal core representation, not the simplified
-    core. Reasons for this are detailed below.
-    
-    The final prototype roughly consists of three steps:
+    We will use the normal Core representation, not the simplified Core. Even
+    though the simplified Core version is an equivalent, but simpler
+    definition, some problems were encountered with it in practice. The
+    simplifier restructures some (stateful) functions in a way the normalizer
+    and the \VHDL\ generation cannot handle, leading to uncompilable programs
+    (whereas the non-simplified version more closely resembles the original
+    program, allowing the original to be written in a way that can be
+    handled). This problem is further discussed in
+    \in{section}[sec:normalization:stateproblems].
     
-    \startuseMPgraphic{ghc-pipeline}
+    \startuseMPgraphic{clash-pipeline}
       % Create objects
       save inp, front, norm, vhdl, out;
       newEmptyBox.inp(0,0);
-      newBox.front(btex \small{GHC} frontend + desugarer etex);
+      newBox.front(btex \small{GHC} front-end etex);
       newBox.norm(btex Normalization etex);
-      newBox.vhdl(btex VHDL generation etex);
+      newBox.vhdl(btex \small{VHDL} generation etex);
       newEmptyBox.out(0,0);
 
       % Space the boxes evenly
@@ -166,28 +271,31 @@
       ObjLabel.inp(btex Haskell source etex) "labpathname(haskell)", "labdir(rt)";
       ObjLabel.front(btex Core etex) "labpathname(core)", "labdir(rt)";
       ObjLabel.norm(btex Normalized core etex) "labpathname(normal)", "labdir(rt)";
-      ObjLabel.vhdl(btex VHDL description etex) "labpathname(vhdl)", "labdir(rt)";
+      ObjLabel.vhdl(btex \small{VHDL} description etex) "labpathname(vhdl)", "labdir(rt)";
 
       % Draw the objects (and deferred labels)
       drawObj (inp, front, norm, vhdl, out);
     \stopuseMPgraphic
-    \placefigure[right]{GHC compiler pipeline}{\useMPgraphic{ghc-pipeline}}
+    \placefigure[right]{CÎ»ash compiler pipeline}{\startboxed \useMPgraphic{clash-pipeline}\stopboxed}
+
+    The final prototype roughly consists of three steps:
 
+    \page[no] % suppress page break here.
     \startdesc{Frontend}
-      This is exactly the frontend and desugarer from the \small{GHC}
-      pipeline, that translates Haskell sources to a core representation.
+      This is exactly the front-end from the \small{GHC} pipeline, that
+      translates Haskell sources to a typed Core representation.
     \stopdesc
     \startdesc{Normalization}
-      This is a step that transforms the core representation into a normal
-      form. This normal form is still expressed in the core language, but has
-      to adhere to an extra set of constraints. This normal form is less
-      expressive than the full core language (e.g., it can have limited higher
-      order expressions, has a specific structure, etc.), but is also very
-      close to directly describing hardware.
-    \stopdesc
-    \startdesc{VHDL generation}
-      The last step takes the normal formed core representation and generates
-      VHDL for it. Since the normal form has a specific, hardware-like
+      This is a step that transforms the Core representation into a normal
+      form. This normal form is still expressed in the Core language, but has
+      to adhere to an additional set of constraints. This normal form is less
+      expressive than the full Core language (e.g., it can have limited 
+      higher-order expressions, has a specific structure, etc.), but is
+      also very close to directly describing hardware.
+    \stopdesc
+    \startdesc{\small{VHDL} generation}
+      The last step takes the normal formed Core representation and generates
+      \small{VHDL} for it. Since the normal form has a specific, hardware-like
       structure, this final step is very straightforward.
     \stopdesc
     
@@ -196,15 +304,1160 @@
     hardware interpretation, are removed and translated into hardware
     constructs. This step is described in a lot of detail at
     \in{chapter}[chap:normalization].
+
+    
+    \defref{entry function}Translation of a hardware description always
+    starts at a single function, which is referred to as the \emph{entry
+    function}. \VHDL\ is generated for this function first, followed by
+    any functions used by the entry functions (recursively).
+    
+  \section[sec:prototype:core]{The Core language}
+    \defreftxt{Core}{the Core language}
+    Most of the prototype deals with handling the program in the Core
+    language. In this section we will show what this language looks like and
+    how it works.
+
+    The Core language is a functional language that describes
+    \emph{expressions}. Every identifier used in Core is called a
+    \emph{binder}, since it is bound to a value somewhere. On the highest
+    level, a Core program is a collection of functions, each of which bind a
+    binder (the function name) to an expression (the function value, which has
+    a function type).
+
+    The Core language itself does not prescribe any program structure
+    (like modules, declarations, imports, etc.), only expression
+    structure. In the \small{GHC} compiler, the Haskell module structure
+    is used for the resulting Core code as well. Since this is not so
+    relevant for understanding the Core language or the Normalization
+    process, we will only look at the Core expression language here.
+
+    Each Core expression consists of one of these possible expressions.
+
+    \startdesc{Variable reference}
+      \defref{variable reference}
+      \startlambda
+      bndr :: T
+      \stoplambda
+      This is a reference to a binder. It is written down as the
+      name of the binder that is being referred to along with its type. The
+      binder name should of course be bound in a containing scope
+      (including top level scope, so a reference to a top level function
+      is also a variable reference). Additionally, constructors from
+      algebraic data-types also become variable references (\eg\
+      \lam{True}).
+
+      In our examples, binders will commonly consist of a single
+      characters, but they can have any length.
+
+      The value of this expression is the value bound to the given
+      binder.
+
+      Each binder also carries around its type (explicitly shown above), but
+      this is usually not shown in the Core expressions. Only when the type is
+      relevant (when a new binder is introduced, for example) will it be
+      shown. In other cases, the type is either not relevant, or easily
+      derived from the context of the expression. \todo{Ref sidenote on type
+      annotations}
+    \stopdesc
+
+    \startdesc{Literal}
+      \defref{literal}
+      \startlambda
+      10
+      \stoplambda
+      This is a literal. Only primitive types are supported, like
+      chars, strings, integers and doubles. The types of these literals are the
+      \quote{primitive}, unboxed versions, like \lam{Char\#} and \lam{Word\#}, not the
+      normal Haskell versions (but there are built-in conversion
+      functions). Without going into detail about these types, note that
+      a few conversion functions exist to convert these to the normal
+      (boxed) Haskell equivalents. See
+      \in{section}[sec:normalization:literals] for an example.
+    \stopdesc
+
+    \startdesc{Application}
+      \defref{application}
+      \startlambda
+      func arg
+      \stoplambda
+      This is function application. Each application consists of two
+      parts: the function part and the argument part. Applications are used
+      for normal function \quote{calls}, but also for applying type
+      abstractions and data constructors.
+
+      The value of an application is the value of the function part, with the
+      first argument binder bound to the argument part.
+
+      In Core, there is no distinction between an operator and a
+      function. This means that, for example the addition of two numbers
+      looks like the following in Core:
+      
+      \startlambda
+      (+) 1 2
+      \stoplambda
+
+      Where the function \quote{\lam{(+)}} is applied to the numbers 1
+      and 2. However, to increase readability, an application of an
+      operator like \lam{(+)} is sometimes written infix. In this case,
+      the parenthesis are also left out, just like in Haskell. In other
+      words, the following means exactly the same as the addition above:
+
+      \startlambda
+      1 + 2
+      \stoplambda
+    \stopdesc
+
+    \startdesc{Lambda abstraction}
+      \defref{lambda abstraction}
+      \startlambda
+      Î»bndr.body
+      \stoplambda
+      This is the basic lambda abstraction, as it occurs in lambda calculus.
+      It consists of a binder part and a body part.  A lambda abstraction
+      creates a function, that can be applied to an argument. The binder is
+      usually a value binder, but it can also be a \emph{type binder} (or
+      \emph{type variable}). The latter case introduces a new polymorphic
+      variable, which can be used in types later on. See
+      \in{section}[sec:prototype:coretypes] for details.
+     
+      The body of a lambda abstraction extends all the way to the end of
+      the expression, or the closing bracket surrounding the lambda. In
+      other words, the lambda abstraction \quote{operator} has the
+      lowest priority of all.
+
+      The value of an application is the value of the body part, with the
+      binder bound to the value the entire lambda abstraction is applied to.
+    \stopdesc
+
+    \startdesc{Non-recursive let expression}
+      \defref{let expression}
+      \startlambda
+      let bndr = value in body
+      \stoplambda
+      A let expression allows you to bind a binder to some value, while
+      evaluating to some other value (for which that binder is in scope). This
+      allows for sharing of sub-expressions (you can use a binder twice) and
+      explicit \quote{naming} of arbitrary expressions. A binder is not
+      in scope in the value bound it is bound to, so it is not possible
+      to make recursive definitions with a non-recursive let expression
+      (see the recursive form below).
+
+      Even though this let expression is an extension on the basic lambda
+      calculus, it is easily translated to a lambda abstraction. The let
+      expression above would then become:
+
+      \startlambda
+      (Î»bndr.body) value
+      \stoplambda
+
+      This notion might be useful for verifying certain properties on
+      transformations, since a lot of verification work has been done on
+      lambda calculus already.
+
+      The value of a let expression is the value of the body part, with the
+      binder bound to the value. 
+    \stopdesc
+
+    \startdesc{Recursive let expression}
+      \startlambda
+      letrec
+        bndr1 = value1
+        \vdots
+        bndrn = valuen
+      in 
+        body
+      \stoplambda
+      This is the recursive version of the let expression. In \small{GHC}'s
+      Core implementation, non-recursive and recursive lets are not so
+      distinct as we present them here, but this provides a clearer overview.
+      
+      The main difference with the normal let expression is that it can
+      contain multiple bindings (or even none) and each of the binders
+      is in scope in each of the values, in addition to the body. This
+      allows for self-recursive or mutually recursive definitions.
+
+      It is also possible to express a recursive let expression using
+      normal lambda calculus, if we use the \emph{least fixed-point
+      operator}, \lam{Y} (but the details are too complicated to help
+      clarify the let expression, so this will not be explored further).
+    \stopdesc
+
+    \placeintermezzo{}{
+      \startframedtext[width=8cm,background=box,frame=no]
+      \startalignment[center]
+        {\tfa Weak head normal form (\small{WHNF})}
+      \stopalignment
+      \blank[medium]
+        An expression is in weak head normal form if it is either an
+        constructor application or lambda abstraction. \cite[jones87]
+
+        Without going into detail about the differences with head
+        normal form and normal form, note that evaluating the scrutinee
+        of a case expression to normal form (evaluating any function
+        applications, variable references and case expressions) is
+        sufficient to decide which case alternatives should be chosen.
+      \stopframedtext
+
+    }
+
+    \startdesc{Case expression}
+      \defref{case expression}
+      \startlambda
+        case scrutinee of bndr
+          DEFAULT -> defaultbody
+          C0 bndr0,0 ... bndr0,m -> body0
+          \vdots
+          Cn bndrn,0 ... bndrn,m -> bodyn
+      \stoplambda
+
+      A case expression is the only way in Core to choose between values. All
+      \hs{if} expressions and pattern matchings from the original Haskell
+      program have been translated to case expressions by the desugarer. 
+      
+      A case expression evaluates its scrutinee, which should have an
+      algebraic datatype, into weak head normal form (\small{WHNF}) and
+      (optionally) binds it to \lam{bndr}. If bndr is wild, \refdef{wild
+      binders} it is left out.  Every alternative lists a single constructor
+      (\lam{C0 ... Cn}). Based on the actual constructor of the scrutinee, the
+      corresponding alternative is chosen. The binders in the chosen
+      alternative (\lam{bndr0,0 ....  bndr0,m} are bound to the actual
+      arguments to the constructor in the scrutinee.
+
+      This is best illustrated with an example. Assume
+      there is an algebraic datatype declared as follows\footnote{This
+      datatype is not supported by the current CÎ»ash implementation, but
+      serves well to illustrate the case expression}:
+
+      \starthaskell
+      data D = A Word | B Bit
+      \stophaskell
+
+      This is an algebraic datatype with two constructors, each getting
+      a single argument. A case expression scrutinizing this datatype
+      could look like the following:
+
+      \startlambda
+        case s of
+          A word -> High
+          B bit -> bit
+      \stoplambda
+
+      What this expression does is check the constructor of the
+      scrutinee \lam{s}. If it is \lam{A}, it always evaluates to
+      \lam{High}. If the constructor is \lam{B}, the binder \lam{bit} is
+      bound to the argument passed to \lam{B} and the case expression
+      evaluates to this bit.
+      
+      If none of the alternatives match, the \lam{DEFAULT} alternative
+      is chosen. A case expression must always be exhaustive, \ie\ it
+      must cover all possible constructors that the scrutinee can have
+      (if all of them are covered explicitly, the \lam{DEFAULT}
+      alternative can be left out).
+      
+      Since we can only match the top level constructor, there can be no overlap
+      in the alternatives and thus order of alternatives is not relevant (though
+      the \lam{DEFAULT} alternative must appear first for implementation
+      efficiency).
+      
+      To support strictness, the scrutinee is always evaluated into
+      \small{WHNF}, even when there is only a \lam{DEFAULT} alternative. This
+      allows application of the strict function \lam{f} to the argument \lam{a}
+      to be written like:
+
+      \startlambda
+      f (case a of arg DEFAULT -> arg)
+      \stoplambda
+
+      According to the \GHC\ documentation, this is the only use for the extra
+      binder to which the scrutinee is bound.  When not using strictness
+      annotations (which is rather pointless in hardware descriptions),
+      \small{GHC} seems to never generate any code making use of this binder.
+      In fact, \GHC\ has never been observed to generate code using this
+      binder, even when strictness was involved.  Nonetheless, the prototype
+      handles this binder as expected.
+
+      Note that these case expressions are less powerful than the full Haskell
+      case expressions. In particular, they do not support complex patterns like
+      in Haskell. Only the constructor of an expression can be matched,
+      complex patterns are implemented using multiple nested case expressions.
+
+      Case expressions are also used for unpacking of algebraic data-types, even
+      when there is only a single constructor. For examples, to add the elements
+      of a tuple, the following Core is generated:
+
+      \startlambda
+      sum = Î»tuple.case tuple of
+        (,) a b -> a + b
+      \stoplambda
     
+      Here, there is only a single alternative (but no \lam{DEFAULT}
+      alternative, since the single alternative is already exhaustive). When
+      its body is evaluated, the arguments to the tuple constructor \lam{(,)}
+      (\eg, the elements of the tuple) are bound to \lam{a} and \lam{b}.
+    \stopdesc
+
+    \startdesc{Cast expression}
+      \defref{cast expression}
+      \startlambda
+      body â¶ targettype
+      \stoplambda
+      A cast expression allows you to change the type of an expression to an
+      equivalent type. Note that this is not meant to do any actual work, like
+      conversion of data from one format to another, or force a complete type
+      change. Instead, it is meant to change between different representations
+      of the same type, \eg\ switch between types that are provably equal (but
+      look different).
+      
+      In our hardware descriptions, we typically see casts to change between a
+      Haskell newtype and its contained type, since those are effectively
+      different types (so a cast is needed) with the same representation (but
+      no work is done by the cast).
+
+      More complex are types that are proven to be equal by the type-checker,
+      but look different at first glance. To ensure that, once the type-checker
+      has proven equality, this information sticks around, explicit casts are
+      added. In our notation we only write the target type, but in reality a
+      cast expressions carries around a \emph{coercion}, which can be seen as a
+      proof of equality. \todo{Example}
+
+      The value of a cast is the value of its body, unchanged. The type of this
+      value is equal to the target type, not the type of its body.
+    \stopdesc
+
+    \startdesc{Note}
+      The Core language in \small{GHC} allows adding \emph{notes}, which serve
+      as hints to the inliner or add custom (string) annotations to a Core
+      expression. These should not be generated normally, so these are not
+      handled in any way in the prototype.
+    \stopdesc
+
+    \startdesc{Type}
+      \defref{type expression}
+      \startlambda
+      @T
+      \stoplambda
+      It is possibly to use a Core type as a Core expression. To prevent
+      confusion between types and values, the \lam{@} sign is used to
+      explicitly mark a type that is used in a Core expression.
+      
+      For the actual types supported by Core, see
+      \in{section}[sec:prototype:coretypes]. This \quote{lifting} of a
+      type into the value domain is done to allow for type abstractions
+      and applications to be handled as normal lambda abstractions and
+      applications above. This means that a type expression in Core can
+      only ever occur in the argument position of an application, and
+      only if the type of the function that is applied to expects a type
+      as the first argument. This happens in applications of all
+      polymorphic functions. Consider the \lam{fst} function:
+
+      \startlambda
+      fst :: \forall t1. \forall t2. (t1, t2) ->t1 
+      fst = Î»t1.Î»t2.Î»(tup :: (t1, t2)). case tup of (,) a b -> a
+
+      fstint :: (Int, Int) -> Int
+      fstint = Î»a.Î»b.fst @Int @Int a b
+      \stoplambda
+          
+      The type of \lam{fst} has two universally quantified type variables. When
+      \lam{fst} is applied in \lam{fstint}, it is first applied to two types.
+      (which are substituted for \lam{t1} and \lam{t2} in the type of \lam{fst}, so
+      the actual type of arguments and result of \lam{fst} can be found:
+      \lam{fst @Int @Int :: (Int, Int) -> Int}).
+    \stopdesc
+
+    \subsection[sec:prototype:coretypes]{Core type system}
+      Whereas the expression syntax of Core is very simple, its type system is
+      a bit more complicated. It turns out it is harder to \quote{desugar}
+      Haskell's complex type system into something more simple. Most of the
+      type system is thus very similar to that of Haskell.
+
+      We will slightly limit our view on Core's type system, since the more
+      complicated parts of it are only meant to support Haskell's (or rather,
+      \GHC's) type extensions, such as existential types, \small{GADT}s, type
+      families and other non-standard Haskell stuff which we do not (plan to)
+      support.
+
+      \placeintermezzo{}{
+        \defref{id function}
+        \startframedtext[width=8cm,background=box,frame=no]
+        \startalignment[center]
+          {\tfa The \hs{id} function}
+        \stopalignment
+        \blank[medium]
+          A function that is probably present in every functional language, is
+          the \emph{identity} function. This is the function that takes a
+          single argument and simply returns it unmodified. In Haskell this
+          function is called \hs{id} and can take an argument of any type
+          (\ie, it is polymorphic).
+
+          The \hs{id} function will be used in the examples every now and
+          then.
+        \stopframedtext
+      }
+      In Core, every expression is typed. The translation to Core happens
+      after the type-checker, so types in Core are always correct as well
+      (though you could of course construct invalidly typed expressions
+      through the \GHC\ API).
+
+      Any type in Core is one of the following:
+
+      \startdesc{A type variable}
+        \startlambda
+        t
+        \stoplambda
+
+        This is a reference to a type defined elsewhere. This can either be a
+        polymorphic type (like the latter two \lam{t}'s in \lam{id :: \forall t.
+        t -> t}), or a type constructor (like \lam{Bool} in \lam{not :: Bool ->
+        Bool}). Like in Haskell, polymorphic type variables always
+        start with a lowercase letter, while type constructors always start
+        with an uppercase letter.
+
+        \todo{How to define (new) type constructors?}
+
+        A special case of a type constructor is the \emph{function type
+        constructor}, \lam{->}. This is a type constructor taking two arguments
+        (using application below). The function type constructor is commonly
+        written inline, so we write \lam{a -> b} when we really mean \lam{-> a
+        b}, the function type constructor applied to \lam{a} and \lam{b}.
+
+        Polymorphic type variables can only be defined by a lambda
+        abstraction, see the forall type below.
+      \stopdesc
+
+      \startdesc{A type application}
+        \startlambda
+          Maybe Int
+        \stoplambda
+
+        This applies some type to another type. This is particularly used to
+        apply type variables (type constructors) to their arguments.
+
+        As mentioned above, applications of some type constructors have
+        special notation. In particular, these are applications of the
+        \emph{function type constructor} and \emph{tuple type constructors}:
+        \startlambda
+          foo :: t1 -> t2 
+          foo' :: -> t1 t2 
+          bar :: (t1, t2, t3)
+          bar' :: (,,) t1 t2 t3
+        \stoplambda
+      \stopdesc
+
+      \startdesc{The forall type}
+        \startlambda
+          id :: \forall t. t -> t
+        \stoplambda
+        The forall type introduces polymorphism. It is the only way to
+        introduce new type variables, which are completely unconstrained (Any
+        possible type can be assigned to it). Constraints can be added later
+        using predicate types, see below.
+
+        A forall type is always (and only) introduced by a type lambda
+        expression. For example, the Core translation of the
+        id function is:
+        \startlambda
+          id = Î»t.Î»(x :: t).x
+        \stoplambda
+
+        Here, the type of the binder \lam{x} is \lam{t}, referring to the
+        binder in the topmost lambda.
+
+        When using a value with a forall type, the actual type
+        used must be applied first. For example Haskell expression \hs{id
+        True} (the function \hs{id} applied to the data-constructor \hs{True})
+        translates to the following Core:
+
+        \startlambda
+        id @Bool True
+        \stoplambda
+
+        Here, id is first applied to the type to work with. Note that the type
+        then changes from \lam{id :: \forall t. t -> t} to \lam{id @Bool ::
+        Bool -> Bool}. Note that the type variable \lam{a} has been
+        substituted with the actual type.
+
+        In Haskell, forall types are usually not explicitly specified (The use
+        of a lowercase type variable implicitly introduces a forall type for
+        that variable). In fact, in standard Haskell there is no way to
+        explicitly specify forall types. Through a language extension, the
+        \hs{forall} keyword is available, but still optional for normal forall
+        types (it is needed for \emph{existentially quantified types}, which
+        CÎ»ash does not support).
+      \stopdesc
+
+      \startdesc{Predicate type}
+        \startlambda
+          show :: \forall t. Show t â t â String
+        \stoplambda
+       
+        \todo{Sidenote: type classes?}
+
+        A predicate type introduces a constraint on a type variable introduced
+        by a forall type (or type lambda). In the example above, the type
+        variable \lam{t} can only contain types that are an \emph{instance} of
+        the \emph{type class} \lam{Show}.
+
+        There are other sorts of predicate types, used for the type families
+        extension, which we will not discuss here.
+
+        A predicate type is introduced by a lambda abstraction. Unlike with
+        the forall type, this is a value lambda abstraction, that must be
+        applied to a value. We call this value a \emph{dictionary}.
+
+        Without going into the implementation details, a dictionary can be
+        seen as a lookup table all the methods for a given (single) type class
+        instance. This means that all the dictionaries for the same type class
+        look the same (\eg\ contain methods with the same names). However,
+        dictionaries for different instances of the same class contain
+        different methods, of course.
+
+        A dictionary is introduced by \small{GHC} whenever it encounters an
+        instance declaration. This dictionary, as well as the binder
+        introduced by a lambda that introduces a dictionary, have the
+        predicate type as their type. These binders are usually named starting
+        with a \lam{\$}. Usually the name of the type concerned is not
+        reflected in the name of the dictionary, but the name of the type
+        class is. The Haskell expression \hs{show True} thus becomes:
+
+        \startlambda
+        show @Bool \$dShow True
+        \stoplambda
+      \stopdesc
+
+      Using this set of types, all types in basic Haskell can be represented.
+      \todo{Overview of polymorphism with more examples (or move examples
+      here)}
+        
+  \section[sec:prototype:statetype]{State annotations in Haskell}
+    As noted in \in{section}[sec:description:stateann], CÎ»ash needs some
+    way to let the programmer explicitly specify which of a function's
+    arguments and which part of a function's result represent the
+    function's state.
+
+    Using the Haskell type systems, there are a few ways we can tackle this.
+
+    \subsection{Type synonyms}
+      Haskell provides type synonyms as a way to declare a new type that is
+      equal to an existing type (or rather, a new name for an existing type).
+      This allows both the original type and the synonym to be used
+      interchangeably in a Haskell program. This means no explicit conversion
+      is needed. For example, a simple accumulator would become:
+
+      \starthaskell
+      -- This type synonym would become part of CÎ»ash, it is shown here
+      -- just for clarity.
+      type State s = s
+
+      acc :: Word -> State Word -> (State Word, Word)
+      acc i s = let sum = s + i in (sum, sum)
+      \stophaskell
+
+      This looks nice in Haskell, but turns out to be hard to implement. There
+      is no explicit conversion in Haskell, but not in Core either. This
+      means the type of a value might be shown as \hs{State Word} in
+      some places, but \hs{Word} in others (and this can even change due
+      to transformations). Since every binder has an explicit type
+      associated with it, the type of every function type will be
+      properly preserved and could be used to track down the
+      statefulness of each value by the compiler. However, this would make
+      the implementation a lot more complicated than when using type
+      renamings as described in the next section.
+
+    % Use \type instead of \hs here, since the latter breaks inside
+    % section headings.
+    \subsection{Type renaming (\type{newtype})}
+      Haskell also supports type renamings as a way to declare a new type that
+      has the same (run-time) representation as an existing type (but is in
+      fact a different type to the type-checker). With type renaming,
+      explicit conversion between values of the two types is needed. The
+      accumulator would then become:
+
+      \starthaskell
+      -- This type renaming would become part of CÎ»ash, it is shown here
+      -- just for clarity.
+      newtype State s = State s
+
+      acc :: Word -> State Word -> (State Word, Word)
+      acc i (State s) = let sum = s + i in (State sum, sum)
+      \stophaskell
+
+      The \hs{newtype} line declares a new type \hs{State} that has one type
+      argument, \hs{s}. This type contains one \quote{constructor} \hs{State}
+      with a single argument of type \hs{s}. It is customary to name the
+      constructor the same as the type, which is allowed (since types can
+      never cause name collisions with values). The difference with the type
+      synonym example is in the explicit conversion between the \hs{State
+      Word} and \hs{Word} types by pattern matching and by using the explicit
+      the \hs{State} constructor.
+
+      This explicit conversion makes the \VHDL\ generation easier: whenever we
+      remove (unpack) the \hs{State} type, this means we are accessing the
+      current state (\ie, accessing the register output). Whenever we are
+      adding (packing) the \hs{State} type, we are producing a new value for
+      the state (\ie, providing the register input).
+
+      When dealing with nested states (a stateful function that calls stateful
+      functions, which might call stateful functions, etc.) the state type
+      could quickly grow complex because of all the \hs{State} type constructors
+      needed. For example, consider the following state type (this is just the
+      state type, not the entire function type):
+
+      \starthaskell
+      State (State Bit, State (State Word, Bit), Word)
+      \stophaskell
+
+      We cannot leave all these \hs{State} type constructors out, since that
+      would change the type (unlike when using type synonyms). However, when
+      using type synonyms to hide away sub-states (see
+      \in{section}[sec:prototype:substatesynonyms] below), this
+      disadvantage should be limited.
+
+      \subsubsection{Different input and output types}
+        An alternative could be to use different types for input and output
+        state (\ie\ current and updated state). The accumulator example would
+        then become something like:
+
+        \starthaskell
+        -- These type renamings would become part of CÎ»ash, it is shown
+        -- here just for clarity.
+        newtype StateIn s = StateIn s
+        newtype StateOut s = StateOut s
+
+        acc :: Word -> StateIn Word -> (StateIn Word, Word)
+        acc i (StateIn s) = let sum = s + i in (StateIn sum, sum)
+        \stophaskell
+        
+        This could make the implementation easier and the hardware
+        descriptions less error-prone (you can no longer \quote{forget} to
+        unpack and repack a state variable and just return it directly, which
+        can be a problem in the current prototype). However, it also means we
+        need twice as many type synonyms to hide away sub-states, making this
+        approach a bit cumbersome. It also makes it harder to compare input
+        and output state types, possible reducing the type-safety of the
+        descriptions.
+
+    \subsection[sec:prototype:substatesynonyms]{Type synonyms for sub-states}
+      As noted above, when using nested (hierarchical) states, the state types
+      of the \quote{upper} functions (those that call other functions, which
+      call other functions, etc.) quickly become complicated. Also, when the
+      state type of one of the \quote{lower} functions changes, the state
+      types of all the upper functions changes as well. If the state type for
+      each function is explicitly and completely specified, this means that a
+      lot of code needs updating whenever a state type changes.
+
+      To prevent this, it is recommended (but not enforced) to use a type
+      synonym for the state type of every function. Every function calling
+      other functions will then use the state type synonym of the called
+      functions in its own type, requiring no code changes when the state type
+      of a called function changes. This approach is used in
+      \in{example}[ex:AvgState] below. The \hs{AccState} and \hs{AvgState}
+      are examples of such state type synonyms.
+
+    \subsection{Chosen approach}
+      To keep implementation simple, the current prototype uses the type
+      renaming approach, with a single type for both input and output
+      states. In the future, it might be worthwhile to revisit this
+      approach if more complicated flow analysis is implemented for
+      state variables. This analysis is needed to add proper error
+      checking anyway and might allow the use of type synonyms without
+      losing any expressivity.
+
+      \subsubsection{Example}
+        As an example of the used approach, a simple averaging circuit
+        is shown in \in{example}[ex:AvgState]. This circuit lets the
+        accumulation of the inputs be done by a sub-component, \hs{acc},
+        but keeps a count of value accumulated in its own
+        state.\footnote{Currently, the prototype is not able to compile
+        this example, since there is no built-in function for division.}
+        
+        \startbuffer[AvgState]
+          -- This type renaming would become part of CÎ»ash, it is shown
+          -- here just for clarity
+          newtype State s = State s
+
+          -- The accumulator state type
+          type AccState = State Word
+          -- The accumulator
+          acc :: Word -> AccState -> (AccState, Word)
+          acc i (State s) = let sum = s + i in (State sum, sum)
+
+          -- The averaging circuit state type
+          type AvgState = State (AccState, Word)
+          -- The averaging circuit
+          avg :: Word -> AvgState -> (AvgState, Word)
+          avg i (State s) = (State s', o)
+            where
+              (accs, count) = s
+              -- Pass our input through the accumulator, which outputs a sum
+              (accs', sum) = acc i accs
+              -- Increment the count (which will be our new state)
+              count' = count + 1
+              -- Compute the average
+              o = sum / count'
+              s' = (accs', count')
+        \stopbuffer
+
+        \placeexample[here][ex:AvgState]{Simple stateful averaging circuit.}
+          %\startcombination[2*1]
+            {\typebufferhs{AvgState}}%{Haskell description using function applications.}
+          %  {\boxedgraphic{AvgState}}{The architecture described by the Haskell description.}
+          %\stopcombination
+        \todo{Picture}
+
+  \section{\VHDL\ generation for state}  
+    Now its clear how to put state annotations in the Haskell source,
+    there is the question of how to implement this state translation. As
+    we have seen in \in{section}[sec:prototype:design], the translation to
+    \VHDL\ happens as a simple, final step in the compilation process.
+    This step works on a Core expression in normal form. The specifics
+    of normal form will be explained in
+    \in{chapter}[chap:normalization], but the examples given should be
+    easy to understand using the definition of Core given above. The
+    conversion to and from the \hs{State} type is done using the cast
+    operator, \lam{â¶}.
+
+        \startbuffer[AvgStateNormal]
+          acc = Î»i.Î»spacked.
+            let
+              -- Remove the State newtype
+              s = spacked â¶ Word
+              sum = s + i
+              -- Add the State newtype again
+              spacked' = sum â¶ State Word
+              res = (spacked', sum)
+            in
+              res
+
+          avg = Î»i.Î»spacked.
+            let
+              s = spacked â¶ (AccState, Word)
+              accs = case s of (a, b) -> a
+              count = case s of (c, d) -> d
+              accres = acc i accs
+              accs' = case accres of (e, f) -> e
+              sum = case accres of (g, h) -> h
+              count' = count + 1
+              o = sum / count'
+              s' = (accs', count')
+              spacked' = s' â¶ State (AccState, Word)
+              res = (spacked', o)
+            in
+              res
+        \stopbuffer
+
+        \placeexample[here][ex:AvgStateNormal]{Normalized version of \in{example}[ex:AvgState]}
+            {\typebufferlam{AvgStateNormal}}
+
+    \subsection[sec:prototype:statelimits]{State in normal form}
+      Before describing how to translate state from normal form to
+      \VHDL, we will first see how state handling looks in normal form.
+      How must their use be limited to guarantee that proper \VHDL\ can
+      be generated?
+
+      We will formulate a number of rules about what operations are
+      allowed with state variables. These rules apply to the normalized Core
+      representation, but will in practice apply to the original Haskell
+      hardware description as well. Ideally, these rules would become part
+      of the intended normal form definition \refdef{intended normal form
+      definition}, but this is not the case right now. This can cause some
+      problems, which are detailed in
+      \in{section}[sec:normalization:stateproblems].
+
+      In these rules we use the terms \emph{state variable} to refer to any
+      variable that has a \lam{State} type. A \emph{state-containing
+      variable} is any variable whose type contains a \lam{State} type,
+      but is not one itself (like \lam{(AccState, Word)} in the example,
+      which is a tuple type, but contains \lam{AccState}, which is again
+      equal to \lam{State Word}).
+
+      We also use a distinction between \emph{input} and \emph{output
+      (state) variables} and \emph{sub-state variables}, which will be
+      defined in the rules themselves.
+
+      These rules describe everything that can be done with state
+      variables and state-containing variables. Everything else is
+      invalid. For every rule, the corresponding part of
+      \in{example}[ex:AvgStateNormal] is shown.
+
+      \startdesc{State variables can appear as an argument.}
+        \startlambda
+          avg = Î»i.Î»spacked. ...
+        \stoplambda
+
+        Any lambda that binds a variable with a state type, creates a new
+        input state variable.
+      \stopdesc
+
+      \startdesc{Input state variables can be unpacked.}
+        \startlambda
+          s = spacked â¶ (AccState, Word)
+        \stoplambda
+
+        An input state variable may be unpacked using a cast operation. This
+        removes the \lam{State} type renaming and the result has no longer a
+        \lam{State} type.
+
+        If the result of this unpacking does not have a state type and does
+        not contain state variables, there are no limitations on its
+        use (this is the function's own state).  Otherwise if it does
+        not have a state type but does contain sub-states, we refer to it
+        as a \emph{state-containing input variable} and the limitations
+        below apply. If it has a state type itself, we refer to it as an
+        \emph{input sub-state variable} and the below limitations apply
+        as well.
+
+        It may seem strange to consider a variable that still has a state
+        type directly after unpacking, but consider the case where a
+        function does not have any state of its own, but does call a single
+        stateful function. This means it must have a state argument that
+        contains just a sub-state. The function signature of such a function
+        could look like:
+
+        \starthaskell
+          type FooState = State AccState
+        \stophaskell
+
+        Which is of course equivalent to \lam{State (State Word)}.
+      \stopdesc
+
+      \startdesc{Variables can be extracted from state-containing input variables.}
+        \startlambda
+          accs = case s of (a, b) -> a
+        \stoplambda
+
+        A state-containing input variable is typically a tuple containing
+        multiple elements (like the current function's state, sub-states or
+        more tuples containing sub-states). All of these can be extracted
+        from an input variable using an extractor case (or possibly
+        multiple, when the input variable is nested).
+
+        If the result has no state type and does not contain any state
+        variables either, there are no further limitations on its use
+        (this is the function's own state). If the result has no state
+        type but does contain state variables we refer to it as a
+        \emph{state-containing input variable} and this limitation keeps
+        applying. If the variable has a state type itself, we refer to
+        it as an \emph{input sub-state variable} and below limitations
+        apply.
+
+      \startdesc{Input sub-state variables can be passed to functions.} 
+        \startlambda
+          accres = acc i accs
+          accs' = case accres of (e, f) -> e
+        \stoplambda
+        
+        An input sub-state variable can (only) be passed to a function.
+        Additionally, every input sub-state variable must be used in exactly
+        \emph{one} application, no more and no less.
+
+        The function result should contain exactly one state variable, which
+        can be extracted using (multiple) case expressions. The extracted
+        state variable is referred to the \emph{output sub-state}
+
+        The type of this output sub-state must be identical to the type of
+        the input sub-state passed to the function.
+      \stopdesc
+
+      \startdesc{Variables can be inserted into a state-containing output variable.}
+        \startlambda
+          s' = (accs', count')
+        \stoplambda
+        
+        A function's output state is usually a tuple containing its own
+        updated state variables and all output sub-states. This result is
+        built up using any single-constructor algebraic datatype
+        (possibly nested).
+
+        The result of these expressions is referred to as a
+        \emph{state-containing output variable}, which are subject to these
+        limitations.
+      \stopdesc
+
+      \startdesc{State containing output variables can be packed.}
+        \startlambda
+          spacked' = s' â¶ State (AccState, Word)
+        \stoplambda
+
+        As soon as all a functions own update state and output sub-state
+        variables have been joined together, the resulting
+        state-containing output variable can be packed into an output
+        state variable. Packing is done by casting to a state type.
+      \stopdesc
+
+      \startdesc{Output state variables can appear as (part of) a function result.}
+        \startlambda
+          avg = Î»i.Î»spacked.
+            let
+            \vdots
+            res = (spacked', o)
+          in
+            res
+        \stoplambda
+        When the output state is packed, it can be returned as a part
+        of the function result. Nothing else can be done with this
+        value (or any value that contains it).
+      \stopdesc
+
+      There is one final limitation that is hard to express in the above
+      itemization. Whenever sub-states are extracted from the input state
+      to be passed to functions, the corresponding output sub-states
+      should be inserted into the output state in the same way. In other
+      words, each pair of corresponding sub-states in the input and
+      output states should be passed to / returned from the same called
+      function.
+
+      The prototype currently does not check much of the above
+      conditions. This means that if the conditions are violated,
+      sometimes a compile error is generated, but in other cases output
+      can be generated that is not valid \VHDL\ or at the very least does
+      not correspond to the input.
+
+    \subsection{Translating to \VHDL}
+      As noted above, the basic approach when generating \VHDL\ for stateful
+      functions is to generate a single register for every stateful function.
+      We look around the normal form to find the let binding that removes the
+      \lam{State} type renaming (using a cast). We also find the let binding that
+      adds a \lam{State} type. These are connected to the output and the input
+      of the generated let binding respectively. This means that there can
+      only be one let binding that adds and one that removes the \lam{State}
+      type. It is easy to violate this constraint. This problem is detailed in
+      \in{section}[sec:normalization:stateproblems].
+
+      This approach seems simple enough, but will this also work for more
+      complex stateful functions involving sub-states?  Observe that any
+      component of a function's state that is a sub-state, \ie\ passed on as
+      the state of another function, should have no influence on the
+      hardware generated for the calling function. Any state-specific
+      \small{VHDL} for this component can be generated entirely within the
+      called function.  So, we can completely ignore sub-states when
+      generating \VHDL\ for a function.
+      
+      From this observation it might seem logical to remove the
+      sub-states from a function's states altogether and leave only the
+      state components which are actual states of the current function.
+      While doing this would not remove any information needed to
+      generate \small{VHDL} from the function, it would cause the
+      function definition to become invalid (since we will not have any
+      sub-state to pass to the functions anymore).  We could solve the
+      syntactic problems by passing \type{undefined} for state
+      variables, but that would still break the code on the semantic
+      level (\ie, the function would no longer be semantically
+      equivalent to the original input).
+
+      To keep the function definition correct until the very end of the
+      process, we will not deal with (sub)states until we get to the
+      \small{VHDL} generation.  Then, we are translating from Core to
+      \small{VHDL}, and we can simply generate no \VHDL for sub-states,
+      effectively removing them altogether.
+
+      But, how will we know what exactly is a sub-state? Since any state
+      argument or return value that represents state must be of the
+      \type{State} type, we can look at the type of a value. However, we
+      must be careful to ignore only \emph{sub-states}, and not a
+      function's own state.
+
+      For \in{example}[ex:AvgStateNormal] above, we should generate a register
+      with its output connected to \lam{s} and its input connected
+      to \lam{s'}. However, \lam{s'} is build up from both \lam{accs'} and
+      \lam{count'}, while only \lam{count'} should end up in the register.
+      \lam{accs'} is a sub-state for the \lam{acc} function, for which a
+      register will be created when generating \VHDL\ for the \lam{acc}
+      function.
+
+      Fortunately, the \lam{accs'} variable (and any other sub-state) has a
+      property that we can easily check: it has a \lam{State} type. This
+      means that whenever \VHDL\ is generated for a tuple (or other
+      algebraic type), we can simply leave out all elements that have a
+      \lam{State} type. This will leave just the parts of the state that
+      do not have a \lam{State} type themselves, like \lam{count'},
+      which is exactly a function's own state. This approach also means
+      that the state part of the result (\eg\ \lam{s'} in \lam{res}) is
+      automatically excluded when generating the output port, which is
+      also required.
+
+      We can formalize this translation a bit, using the following
+      rules.
+
+      \startitemize
+        \item A state unpack operation should not generate any \small{VHDL}.
+        The binder to which the unpacked state is bound should still be
+        declared, this signal will become the register and will hold the
+        current state.
+        \item A state pack operation should not generate any \small{VHDL}.
+        The binder to which the packed state is bound should not be
+        declared. The binder that is packed is the signal that will hold the
+        new state.
+        \item Any values of a State type should not be translated to
+        \small{VHDL}. In particular, State elements should be removed from
+        tuples (and other data-types) and arguments with a state type should
+        not generate ports.
+        \item To make the state actually work, a simple \small{VHDL}
+        (sequential) process should be generated. This process updates
+        the state at every clock cycle, by assigning the new state to the
+        current state. This will be recognized by synthesis tools as a
+        register specification.
+      \stopitemize
+
+      When applying these rules to the function \lam{avg} from
+      \in{example}[ex:AvgStateNormal], we be left with the description
+      in \in{example}[ex:AvgStateRemoved]. All the parts that do not
+      generate any \VHDL\ directly are crossed out, leaving just the
+      actual flow of values in the final hardware. To illustrate the
+      change of the types of \lam{s} and \lam{s'}, their types are also
+      shown.
+      
+      \startbuffer[AvgStateRemoved]
+        avg = iÎ».Î»--spacked.--
+          let 
+            s :: (--AccState,-- Word)
+            s = --spacked â¶ (AccState, Word)--
+            --accs = case s of (a, b) -> a--
+            count = case s of (--c,-- d) -> d
+            accres = acc i --accs--
+            --accs' = case accres of (e, f) -> e--
+            sum = case accres of (--g,-- h) -> h
+            count' = count + 1
+            o = sum / count'
+            s' :: (--AccState,-- Word)
+            s' = (--accs',-- count')
+            --spacked' = s' â¶ State (AccState, Word)--
+            res = (--spacked',-- o)
+          in
+            res
+      \stopbuffer
+      \placeexample[here][ex:AvgStateRemoved]{Normalized version of \in{example}[ex:AvgState] with ignored parts crossed out}
+          {\typebufferlam{AvgStateRemoved}}
+              
+      When we actually leave out the crossed out parts, we get a slightly
+      weird program: there is a variable \lam{s} which has no value, and there
+      is a variable \lam{s'} that is never used. But together, these two will form
+      the state process of the function. \lam{s} contains the "current" state,
+      \lam{s'} is assigned the "next" state. So, at the end of each clock
+      cycle, \lam{s'} should be assigned to \lam{s}.
+
+      As an illustration of the result of this function,
+      \in{example}[ex:AccStateVHDL] and \in{example}[ex:AvgStateVHDL] show the the \VHDL\ that is
+      generated by CÎ»ash from the examples is this section.
+
+      \startbuffer[AvgStateVHDL]
+        entity avgComponent_0 is
+             port (\izAlE2\ : in \unsigned_31\;
+                   \foozAo1zAo12\ : out \(,)unsigned_31\;
+                   clock : in std_logic;
+                   resetn : in std_logic);
+        end entity avgComponent_0;
+
+
+        architecture structural of avgComponent_0 is
+             signal \szAlG2\ : \(,)unsigned_31\;
+             signal \countzAlW2\ : \unsigned_31\;
+             signal \dszAm62\ : \(,)unsigned_31\;
+             signal \sumzAmk3\ : \unsigned_31\;
+             signal \reszAnCzAnM2\ : \unsigned_31\;
+             signal \foozAnZzAnZ2\ : \unsigned_31\;
+             signal \reszAnfzAnj3\ : \unsigned_31\;
+             signal \s'zAmC2\ : \(,)unsigned_31\;
+        begin
+             \countzAlW2\ <= \szAlG2\.A;
+
+             \comp_ins_dszAm62\ : entity accComponent_1
+                                       port map (\izAob3\ => \izAlE2\,
+                                                 \foozAoBzAoB2\ => \dszAm62\,
+                                                 clock => clock,
+                                                 resetn => resetn);
+
+             \sumzAmk3\ <= \dszAm62\.A;
+
+             \reszAnCzAnM2\ <= to_unsigned(1, 32);
+
+             \foozAnZzAnZ2\ <= \countzAlW2\ + \reszAnCzAnM2\;
+
+             \reszAnfzAnj3\ <= \sumzAmk3\ * \foozAnZzAnZ2\;
+
+             \s'zAmC2\.A <= \foozAnZzAnZ2\;
+
+             \foozAo1zAo12\.A <= \reszAnfzAnj3\;
+
+             state : process (clock, resetn)
+             begin
+                  if resetn = '0' then
+                  elseif rising_edge(clock) then
+                       \szAlG2\ <= \s'zAmC2\;
+                  end if;
+             end process state;
+        end architecture structural;
+      \stopbuffer
+
+      \startbuffer[AvgStateTypes]
+        package types is
+             subtype \unsigned_31\ is unsigned (0 to 31);
+                  
+             type \(,)unsigned_31\ is record 
+                   A : \unsigned_31\;
+             end record;
+        end package types;
+      \stopbuffer
+
+      \startbuffer[AccStateVHDL]
+        entity accComponent_1 is
+             port (\izAob3\ : in \unsigned_31\;
+                   \foozAoBzAoB2\ : out \(,)unsigned_31\;
+                   clock : in std_logic;
+                   resetn : in std_logic);
+        end entity accComponent_1;
+
+        architecture structural of accComponent_1 is
+             signal \szAod3\ : \unsigned_31\;
+             signal \reszAonzAor3\ : \unsigned_31\;
+        begin
+             \reszAonzAor3\ <= \szAod3\ + \izAob3\;
+             
+             \foozAoBzAoB2\.A <= \reszAonzAor3\;
+             
+             state : process (clock, resetn)
+             begin
+                  if resetn = '0' then
+                  elseif rising_edge(clock) then
+                       \szAod3\ <= \reszAonzAor3\;
+                  end if;
+             end process state;
+        end architecture structural;
+      \stopbuffer 
+    
+      \placeexample[][ex:AvgStateTypes]{\VHDL\ types generated for \hs{acc} and \hs{avg} from \in{example}[ex:AvgState]}
+          {\typebuffervhdl{AvgStateTypes}}
+      \placeexample[][ex:AccStateVHDL]{\VHDL\ generated for \hs{acc} from \in{example}[ex:AvgState]}
+          {\typebuffervhdl{AccStateVHDL}}
+      \placeexample[][ex:AvgStateVHDL]{\VHDL\ generated for \hs{avg} from \in{example}[ex:AvgState]}
+          {\typebuffervhdl{AvgStateVHDL}}
+  \section{Prototype implementation}
+    The prototype has been implemented using Haskell as its
+    implementation language, just like \GHC. This allows the prototype
+    do directly use parts of \GHC\ through the \small{API} it exposes
+    (which essentially taps directly into the internals of \GHC, making
+    this \small{API} not really a stable interface).
+
+    CÎ»ash can be run from a separate library, but has also been
+    integrated into \type{ghci} \cite[baaij09]. The latter does requires
+    a custom \GHC\ build, however.
 
-        Core - description of the language (appendix?)
-        Implementation issues
-        Simplified core?
+    The latest version and all history of the CÎ»ash code can be browsed
+    on-line or retrieved using the \type{git} program.
 
-        Haskell language coverage / constraints
-                Recursion
-                Builtin types
-                Custom types (Sum types, product types)
-                Function types / higher order expressions
+    http://git.stderr.nl/gitweb?p=matthijs/projects/cÎ»ash.git
 
+%    \subsection{Initial state}
+%      How to specify the initial state? Cannot be done inside a hardware
+%      function, since the initial state is its own state argument for the first
+%      call (unless you add an explicit, synchronous reset port).
+%
+%      External init state is natural for simulation.
+%
+%      External init state works for hardware generation as well.
+%
+%      Implementation issues: state splitting, linking input to output state,
+%      checking usage constraints on state variables.
+%
+%
+% vim: set sw=2 sts=2 expandtab: