Chapters/Prototype.tex

   1 \chapter[chap:prototype]{Prototype}
   2   An important step in this research is the creation of a prototype compiler.
   3   Having this prototype allows us to apply the ideas from the previous chapter
   4   to actual hardware descriptions and evaluate their usefulness. Having a
   5   prototype also helps to find new techniques and test possible
   6   interpretations.
   7
   8   Obviously the prototype was not created after all research
   9   ideas were formed, but its implementation has been interleaved with the
  10   research itself. Also, the prototype described here is the final version, it
  11   has gone through a number of design iterations which we will not completely
  12   describe here.
  13
  14   \section[sec:prototype:input]{Input language}
  15     When implementing this prototype, the first question to ask is:
  16     Which (functional) language will be used to describe our hardware?
  17     (Note that this does not concern the \emph{implementation language}
  18     of the compiler, just the language \emph{translated by} the
  19     compiler).
  20
  21     Initially, we have two choices:
  22
  23     \startitemize
  24       \item Create a new functional language from scratch. This has the
  25       advantage of having a language that contains exactly those elements that
  26       are convenient for describing hardware and can contain special
  27       constructs that allows our hardware descriptions to be more powerful or
  28       concise.
  29       \item Use an existing language and create a new backend for it. This has
  30       the advantage that existing tools can be reused, which will speed up
  31       development.
  32     \stopitemize
  33
  34
  35     \placeintermezzo{}{
  36       \startframedtext[width=8cm,background=box,frame=no]
  37       \startalignment[center]
  38         {\tfa No \small{EDSL} or Template Haskell}
  39       \stopalignment
  40       \blank[medium]
  41
  42       Note that in this consideration, embedded domain-specific
  43       languages (\small{EDSL}) and Template Haskell (\small{TH})
  44       approaches have not been included. As we have seen in
  45       \in{section}[sec:context:fhdls], these approaches have all kinds
  46       of limitations on the description language that we would like to
  47       avoid.
  48       \stopframedtext
  49     }
  50     Considering that we required a prototype which should be working quickly,
  51     and that implementing parsers, semantic checkers and especially
  52     typcheckers is not exactly the core of this research (but it is lots and
  53     lots of work!), using an existing language is the obvious choice. This
  54     also has the advantage that a large set of language features is available
  55     to experiment with and it is easy to find which features apply well and
  56     which do not. Another import advantage of using an existing language, is
  57     that simulation of the code becomes trivial. Since there are existing
  58     compilers and interpreters that can run the hardware description directly,
  59     it can be simulated without also having to write an interpreter for the
  60     new language.
  61
  62     A possible second prototype could use a custom language with just the useful
  63     features (and possibly extra features that are specific to
  64     the domain of hardware description as well).
  65
  66     The second choice to be made is which of the many existing languages to use. As
  67     mentioned before, the chosen language is Haskell.  This choice has not been the
  68     result of a thorough comparison of languages, for the simple reason that
  69     the requirements on the language were completely unclear at the start of
  70     this research. The fact that Haskell is a language with a broad spectrum
  71     of features, that it is commonly used in research projects and that the
  72     primary compiler, \GHC, provides a high level API to its internals, made
  73     Haskell an obvious choice.
  74
  75   \section[sec:prototype:output]{Output format}
  76     The second important question is: what will be our output format?
  77     This output format should at least allow for programming the
  78     hardware design into a field-programmable gate array (\small{FPGA}).
  79     The choice of output format is thus limited by what hardware
  80     synthesis and programming tools can process.
  81
  82     Looking at other tools in the industry, the Electronic Design Interchange
  83     Format (\small{EDIF}) is commonly used for storing intermediate
  84     \emph{netlists} (lists of components and connections between these
  85     components) and is commonly the target for \small{VHDL} and Verilog
  86     compilers.
  87
  88     However, \small{EDIF} is not completely tool-independent. It specifies a
  89     meta-format, but the hardware components that can be used vary between
  90     various tool and hardware vendors, as well as the interpretation of the
  91     \small{EDIF} standard. \cite[li89]
  92
  93     This means that when working with \small{EDIF}, our prototype would become
  94     technology dependent (\eg\ only work with \small{FPGA}s of a specific
  95     vendor, or even only with specific chips). This limits the applicability
  96     of our prototype. Also, the tools we would like to use for verifying,
  97     simulating and draw pretty pictures of our output (like Precision, or
  98     QuestaSim) are designed for \small{VHDL} or Verilog input.
  99
 100     For these reasons, we will not use \small{EDIF}, but \small{VHDL} as our
 101     output language.  We choose \VHDL\ over Verilog simply because we are
 102     familiar with \small{VHDL} already. The differences between \small{VHDL}
 103     and Verilog are on the higher level, while we will be using \small{VHDL}
 104     mainly to write low level, netlist-like descriptions anyway.
 105
 106     An added advantage of using VHDL is that we can profit from existing
 107     optimizations in VHDL synthesizers. A lot of optimizations are done on the
 108     VHDL level by existing tools. These tools have been under
 109     development for years, so it would not be reasonable to assume we
 110     could achieve a similar amount of optimization in our prototype (nor
 111     should it be a goal, considering this is just a prototype).
 112
 113     \placeintermezzo{}{
 114       \startframedtext[width=8cm,background=box,frame=no]
 115       \startalignment[center]
 116         {\tfa Translation vs. compilation vs. synthesis}
 117       \stopalignment
 118       \blank[medium]
 119         In this thesis the words \emph{translation}, \emph{compilation} and
 120         sometimes \emph{synthesis} will be used interchangedly to refer to the
 121         process of translating the hardware description from the Haskell
 122         language to the \VHDL\ language.
 123
 124         Similarly, the prototype created is referred to as both the
 125         \emph{translator} as well as the \emph{compiler}.
 126
 127         The final part of this process is usually referred to as \emph{\VHDL\
 128         generation}.
 129       \stopframedtext
 130     }
 131
 132     Note that we will be using \small{VHDL} as our output language, but will
 133     not use its full expressive power. Our output will be limited to using
 134     simple, structural descriptions, without any complex behavioural
 135     descriptions like arbitrary sequential statements (which might not
 136     be supported by all tools). This ensures that any tool that works
 137     with \VHDL\ will understand our output (most tools do not support
 138     synthesis of more complex \VHDL).  This also leaves open the option
 139     to switch to \small{EDIF} in the future, with minimal changes to the
 140     prototype.
 141
 142   \section{Simulation and synthesis}
 143     As mentioned above, by using the Haskell language, we get simulation of
 144     our hardware descriptions almost for free. The only thing that is needed
 145     is to provide a Haskell implementation of all built-in functions that can
 146     be used by the Haskell interpreter to simulate them.
 147
 148     The main topic of this thesis is therefore the path from the Haskell
 149     hardware descriptions to \small{FPGA} synthesis, focusing of course on the
 150     \VHDL\ generation. Since the \VHDL\ generation process preserves the meaning
 151     of the Haskell description exactly, any simulation done in Haskell
 152     \emph{should} produce identical results as the synthesized hardware.
 153
 154   \section[sec:prototype:design]{Prototype design}
 155     As suggested above, we will use the Glasgow Haskell Compiler (\small{GHC}) to
 156     implement our prototype compiler. To understand the design of the
 157     compiler, we will first dive into the \small{GHC} compiler a bit. Its
 158     compilation consists of the following steps (slightly simplified):
 159
 160     \startuseMPgraphic{ghc-pipeline}
 161       % Create objects
 162       save inp, front, desugar, simpl, back, out;
 163       newEmptyBox.inp(0,0);
 164       newBox.front(btex Frontend etex);
 165       newBox.desugar(btex Desugarer etex);
 166       newBox.simpl(btex Simplifier etex);
 167       newBox.back(btex Backend etex);
 168       newEmptyBox.out(0,0);
 169
 170       % Space the boxes evenly
 171       inp.c - front.c = front.c - desugar.c = desugar.c - simpl.c
 172         = simpl.c - back.c = back.c - out.c = (0, 1.5cm);
 173       out.c = origin;
 174
 175       % Draw lines between the boxes. We make these lines "deferred" and give
 176       % them a name, so we can use ObjLabel to draw a label beside them.
 177       ncline.inp(inp)(front) "name(haskell)";
 178       ncline.front(front)(desugar) "name(ast)";
 179       ncline.desugar(desugar)(simpl) "name(core)";
 180       ncline.simpl(simpl)(back) "name(simplcore)";
 181       ncline.back(back)(out) "name(native)";
 182       ObjLabel.inp(btex Haskell source etex) "labpathname(haskell)", "labdir(rt)";
 183       ObjLabel.front(btex Haskell AST etex) "labpathname(ast)", "labdir(rt)";
 184       ObjLabel.desugar(btex Core etex) "labpathname(core)", "labdir(rt)";
 185       ObjLabel.simpl(btex Simplified core etex) "labpathname(simplcore)", "labdir(rt)";
 186       ObjLabel.back(btex Native code etex) "labpathname(native)", "labdir(rt)";
 187
 188       % Draw the objects (and deferred labels)
 189       drawObj (inp, front, desugar, simpl, back, out);
 190     \stopuseMPgraphic
 191     \placefigure[right]{GHC compiler pipeline}{\startboxed \useMPgraphic{ghc-pipeline}\stopboxed}
 192
 193     \startdesc{Frontend}
 194       This step takes the Haskell source files and parses them into an
 195       abstract syntax tree (\small{AST}). This \small{AST} can express the
 196       complete Haskell language and is thus a very complex one (in contrast
 197       with the Core \small{AST}, later on). All identifiers in this
 198       \small{AST} are resolved by the renamer and all types are checked by the
 199       typechecker.
 200     \stopdesc
 201     \startdesc{Desugaring}
 202       This steps takes the full \small{AST} and translates it to the
 203       \emph{Core} language. Core is a very small functional language with lazy
 204       semantics, that can still express everything Haskell can express. Its
 205       simpleness makes Core very suitable for further simplification and
 206       translation. Core is the language we will be working with as well.
 207     \stopdesc
 208     \startdesc{Simplification}
 209       Through a number of simplification steps (such as inlining, common
 210       subexpression elimination, etc.) the Core program is simplified to make
 211       it faster or easier to process further.
 212     \stopdesc
 213     \startdesc{Backend}
 214       This step takes the simplified Core program and generates an actual
 215       runnable program for it. This is a big and complicated step we will not
 216       discuss it any further, since it is not required for our prototype.
 217     \stopdesc
 218
 219     In this process, there are a number of places where we can start our work.
 220     Assuming that we do not want to deal with (or modify) parsing, typechecking
 221     and other frontend business and that native code is not really a useful
 222     format anymore, we are left with the choice between the full Haskell
 223     \small{AST}, or the smaller (simplified) core representation.
 224
 225     The advantage of taking the full \small{AST} is that the exact structure
 226     of the source program is preserved. We can see exactly what the hardware
 227     description looks like and which syntax constructs were used. However,
 228     the full \small{AST} is a very complicated datastructure. If we are to
 229     handle everything it offers, we will quickly get a big compiler.
 230
 231     Using the core representation gives us a much more compact datastructure
 232     (a core expression only uses 9 constructors). Note that this does not mean
 233     that the core representation itself is smaller, on the contrary.
 234     Since the core language has less constructs, most Core expressions
 235     are larger than the equivalent versions in Haskell.
 236
 237     However, the fact that the core language is so much smaller, means it is a
 238     lot easier to analyze and translate it into something else. For the same
 239     reason, \small{GHC} runs its simplifications and optimizations on the core
 240     representation as well \cite[jones96].
 241
 242     We will use the normal Core representation, not the simplified Core. Even
 243     though the simplified Core version is an equivalent, but simpler
 244     definition, some problems were encountered with it in practice. The
 245     simplifier restructures some (stateful) functions in a way the normalizer
 246     and the \VHDL\ generation cannot handle, leading to uncompilable programs
 247     (whereas the non-simplified version more closely resembles the original
 248     program, allowing the original to be written in a way that can be
 249     handled). This problem is further discussed in
 250     \in{section}[sec:normalization:stateproblems].
 251
 252     The final prototype roughly consists of three steps:
 253
 254     \startuseMPgraphic{clash-pipeline}
 255       % Create objects
 256       save inp, front, norm, vhdl, out;
 257       newEmptyBox.inp(0,0);
 258       newBox.front(btex \small{GHC} frontend etex);
 259       newBox.norm(btex Normalization etex);
 260       newBox.vhdl(btex \small{VHDL} generation etex);
 261       newEmptyBox.out(0,0);
 262
 263       % Space the boxes evenly
 264       inp.c - front.c = front.c - norm.c = norm.c - vhdl.c
 265         = vhdl.c - out.c = (0, 1.5cm);
 266       out.c = origin;
 267
 268       % Draw lines between the boxes. We make these lines "deferred" and give
 269       % them a name, so we can use ObjLabel to draw a label beside them.
 270       ncline.inp(inp)(front) "name(haskell)";
 271       ncline.front(front)(norm) "name(core)";
 272       ncline.norm(norm)(vhdl) "name(normal)";
 273       ncline.vhdl(vhdl)(out) "name(vhdl)";
 274       ObjLabel.inp(btex Haskell source etex) "labpathname(haskell)", "labdir(rt)";
 275       ObjLabel.front(btex Core etex) "labpathname(core)", "labdir(rt)";
 276       ObjLabel.norm(btex Normalized core etex) "labpathname(normal)", "labdir(rt)";
 277       ObjLabel.vhdl(btex \small{VHDL} description etex) "labpathname(vhdl)", "labdir(rt)";
 278
 279       % Draw the objects (and deferred labels)
 280       drawObj (inp, front, norm, vhdl, out);
 281     \stopuseMPgraphic
 282     \placefigure[right]{Cλash compiler pipeline}{\startboxed \useMPgraphic{clash-pipeline}\stopboxed}
 283
 284     \startdesc{Frontend}
 285       This is exactly the frontend from the \small{GHC} pipeline, that
 286       translates Haskell sources to a typed Core representation.
 287     \stopdesc
 288     \startdesc{Normalization}
 289       This is a step that transforms the core representation into a normal
 290       form. This normal form is still expressed in the core language, but has
 291       to adhere to an additional set of constraints. This normal form is less
 292       expressive than the full core language (e.g., it can have limited
 293       higher-order expressions, has a specific structure, etc.), but is
 294       also very close to directly describing hardware.
 295     \stopdesc
 296     \startdesc{\small{VHDL} generation}
 297       The last step takes the normal formed core representation and generates
 298       \small{VHDL} for it. Since the normal form has a specific, hardware-like
 299       structure, this final step is very straightforward.
 300     \stopdesc
 301
 302     The most interesting step in this process is the normalization step. That
 303     is where more complicated functional constructs, which have no direct
 304     hardware interpretation, are removed and translated into hardware
 305     constructs. This step is described in a lot of detail at
 306     \in{chapter}[chap:normalization].
 307
 308
 309     \defref{entry function}Translation of a hardware description always
 310     starts at a single function, which is referred to as the \emph{entry
 311     function}. \VHDL\ is generated for this function first, followed by
 312     any functions used by the entry functions (recursively).
 313
 314   \section[sec:prototype:core]{The Core language}
 315     \defreftxt{core}{the Core language}
 316     Most of the prototype deals with handling the program in the Core
 317     language. In this section we will show what this language looks like and
 318     how it works.
 319
 320     The Core language is a functional language that describes
 321     \emph{expressions}. Every identifier used in Core is called a
 322     \emph{binder}, since it is bound to a value somewhere. On the highest
 323     level, a Core program is a collection of functions, each of which bind a
 324     binder (the function name) to an expression (the function value, which has
 325     a function type).
 326
 327     The Core language itself does not prescribe any program structure
 328     (like modules, declarations, imports, etc.), only expression
 329     structure. In the \small{GHC} compiler, the Haskell module structure
 330     is used for the resulting Core code as well. Since this is not so
 331     relevant for understanding the Core language or the Normalization
 332     process, we will only look at the Core expression language here.
 333
 334     Each Core expression consists of one of these possible expressions.
 335
 336     \startdesc{Variable reference}
 337       \defref{variable reference}
 338       \startlambda
 339       bndr :: T
 340       \stoplambda
 341       This is a reference to a binder. It is written down as the
 342       name of the binder that is being referred to along with its type. The
 343       binder name should of course be bound in a containing scope
 344       (including top level scope, so a reference to a top level function
 345       is also a variable reference). Additionally, constructors from
 346       algebraic datatypes also become variable references.
 347
 348       In our examples, binders will commonly consist of a single
 349       characters, but they can have any length.
 350
 351       The value of this expression is the value bound to the given
 352       binder.
 353
 354       Each binder also carries around its type (explicitly shown above), but
 355       this is usually not shown in the Core expressions. Only when the type is
 356       relevant (when a new binder is introduced, for example) will it be
 357       shown. In other cases, the binder is either not relevant, or easily
 358       derived from the context of the expression. \todo{Ref sidenote on type
 359       annotations}
 360     \stopdesc
 361
 362     \startdesc{Literal}
 363       \defref{literal}
 364       \startlambda
 365       10
 366       \stoplambda
 367       This is a literal. Only primitive types are supported, like
 368       chars, strings, ints and doubles. The types of these literals are the
 369       \quote{primitive}, unboxed versions, like \lam{Char\#} and \lam{Word\#}, not the
 370       normal Haskell versions (but there are built-in conversion
 371       functions). Without going into detail about these types, note that
 372       a few conversion functions exist to convert these to the normal
 373       (boxed) Haskell equivalents.
 374     \stopdesc
 375
 376     \startdesc{Application}
 377       \defref{application}
 378       \startlambda
 379       func arg
 380       \stoplambda
 381       This is function application. Each application consists of two
 382       parts: the function part and the argument part. Applications are used
 383       for normal function \quote{calls}, but also for applying type
 384       abstractions and data constructors.
 385
 386       In core, there is no distinction between an operator and a
 387       function. This means that, for example the addition of two numbers
 388       looks like the following in Core:
 389
 390       \startlambda
 391       (+) 1 2
 392       \stoplambda
 393
 394       Where the function \quote{\lam{(+)}} is applied to the numbers 1
 395       and 2. However, to increase readability, an application of an
 396       operator like \lam{(+)} is sometimes written infix. In this case,
 397       the parenthesis are also left out, just like in Haskell. In other
 398       words, the following means exactly the same as the addition above:
 399
 400       \startlambda
 401       1 + 2
 402       \stoplambda
 403
 404       The value of an application is the value of the function part, with the
 405       first argument binder bound to the argument part.
 406     \stopdesc
 407
 408     \startdesc{Lambda abstraction}
 409       \defref{lambda abstraction}
 410       \startlambda
 411       λbndr.body
 412       \stoplambda
 413       This is the basic lambda abstraction, as it occurs in lambda calculus.
 414       It consists of a binder part and a body part.  A lambda abstraction
 415       creates a function, that can be applied to an argument. The binder is
 416       usually a value binder, but it can also be a \emph{type binder} (or
 417       \emph{type variable}). The latter case introduces a new polymorphic
 418       variable, which can be used in types later on. See
 419       \in{section}[sec:prototype:coretypes] for details.
 420
 421       The body of a lambda abstraction extends all the way to the end of
 422       the expression, or the closing bracket surrounding the lambda. In
 423       other words, the lambda abstraction \quote{operator} has the
 424       lowest priority of all.
 425
 426       The value of an application is the value of the body part, with the
 427       binder bound to the value the entire lambda abstraction is applied to.
 428     \stopdesc
 429
 430     \startdesc{Non-recursive let expression}
 431       \defref{let expression}
 432       \startlambda
 433       let bndr = value in body
 434       \stoplambda
 435       A let expression allows you to bind a binder to some value, while
 436       evaluating to some other value (for which that binder is in scope). This
 437       allows for sharing of subexpressions (you can use a binder twice) and
 438       explicit \quote{naming} of arbitrary expressions. A binder is not
 439       in scope in the value bound it is bound to, so it is not possible
 440       to make recursive definitions with a non-recursive let expression
 441       (see the recursive form below).
 442
 443       Even though this let expression is an extension on the basic lambda
 444       calculus, it is easily translated to a lambda abstraction. The let
 445       expression above would then become:
 446
 447       \startlambda
 448       (λbndr.body) value
 449       \stoplambda
 450
 451       This notion might be useful for verifying certain properties on
 452       transformations, since a lot of verification work has been done on
 453       lambda calculus already.
 454
 455       The value of a let expression is the value of the body part, with the
 456       binder bound to the value.
 457     \stopdesc
 458
 459     \startdesc{Recursive let expression}
 460       \startlambda
 461       letrec
 462         bndr1 = value1
 463         \vdots
 464         bndrn = valuen
 465       in
 466         body
 467       \stoplambda
 468       This is the recursive version of the let expression. In \small{GHC}'s
 469       Core implementation, non-recursive and recursive lets are not so
 470       distinct as we present them here, but this provides a clearer overview.
 471
 472       The main difference with the normal let expression is that it can
 473       contain multiple bindings (or even none) and each of the binders
 474       is in scope in each of the values, in addition to the body. This
 475       allows for self-recursive or mutually recursive definitions.
 476
 477       It is also possible to express a recursive let expression using
 478       normal lambda calculus, if we use the \emph{least fixed-point
 479       operator}, \lam{Y} (but the details are too complicated to help
 480       clarify the let expression, so this will not be explored further).
 481     \stopdesc
 482
 483     \placeintermezzo{}{
 484       \startframedtext[width=8cm,background=box,frame=no]
 485       \startalignment[center]
 486         {\tfa Weak head normal form (\small{WHNF})}
 487       \stopalignment
 488       \blank[medium]
 489         An expression is in weak head normal form if it is either an
 490         constructor application or lambda abstraction. \todo{How about
 491         atoms?}
 492
 493         Without going into detail about the differences with head
 494         normal form and normal form, note that evaluating the scrutinee
 495         of a case expression to normal form (evaluating any function
 496         applications, variable references and case expressions) is
 497         sufficient to decide which case alternatives should be chosen.
 498         \todo{ref?}
 499       \stopframedtext
 500
 501     }
 502
 503     \startdesc{Case expression}
 504       \defref{case expression}
 505       \startlambda
 506         case scrutinee of bndr
 507           DEFAULT -> defaultbody
 508           C0 bndr0,0 ... bndr0,m -> body0
 509           \vdots
 510           Cn bndrn,0 ... bndrn,m -> bodyn
 511       \stoplambda
 512
 513       A case expression is the only way in Core to choose between values. All
 514       \hs{if} expressions and pattern matchings from the original Haskell
 515       PRogram have been translated to case expressions by the desugarer.
 516
 517       A case expression evaluates its scrutinee, which should have an
 518       algebraic datatype, into weak head normal form (\small{WHNF}) and
 519       (optionally) binds it to \lam{bndr}. If bndr is wild, \refdef{wild
 520       binders} it is left out.  Every alternative lists a single constructor
 521       (\lam{C0 ... Cn}). Based on the actual constructor of the scrutinee, the
 522       corresponding alternative is chosen. The binders in the chosen
 523       alternative (\lam{bndr0,0 ....  bndr0,m} are bound to the actual
 524       arguments to the constructor in the scrutinee.
 525
 526       This is best illustrated with an example. Assume
 527       there is an algebraic datatype declared as follows\footnote{This
 528       datatype is not suported by the current Cλash implementation, but
 529       serves well to illustrate the case expression}:
 530
 531       \starthaskell
 532       data D = A Word | B Bit
 533       \stophaskell
 534
 535       This is an algebraic datatype with two constructors, each getting
 536       a single argument. A case expression scrutinizing this datatype
 537       could look like the following:
 538
 539       \startlambda
 540         case s of
 541           A word -> High
 542           B bit -> bit
 543       \stoplambda
 544
 545       What this expression does is check the constructor of the
 546       scrutinee \lam{s}. If it is \lam{A}, it always evaluates to
 547       \lam{High}. If the constructor is \lam{B}, the binder \lam{bit} is
 548       bound to the argument passed to \lam{B} and the case expression
 549       evaluates to this bit.
 550
 551       If none of the alternatives match, the \lam{DEFAULT} alternative
 552       is chosen. A case expression must always be exhaustive, \ie\ it
 553       must cover all possible constructors that the scrutinee can have
 554       (if all of them are covered explicitly, the \lam{DEFAULT}
 555       alternative can be left out).
 556
 557       Since we can only match the top level constructor, there can be no overlap
 558       in the alternatives and thus order of alternatives is not relevant (though
 559       the \lam{DEFAULT} alternative must appear first for implementation
 560       efficiency).
 561
 562       To support strictness, the scrutinee is always evaluated into
 563       \small{WHNF}, even when there is only a \lam{DEFAULT} alternative. This
 564       allows aplication of the strict function \lam{f} to the argument \lam{a}
 565       to be written like:
 566
 567       \startlambda
 568       f (case a of arg DEFAULT -> arg)
 569       \stoplambda
 570
 571       According to the \GHC\ documentation, this is the only use for the extra
 572       binder to which the scrutinee is bound.  When not using strictness
 573       annotations (which is rather pointless in hardware descriptions),
 574       \small{GHC} seems to never generate any code making use of this binder.
 575       In fact, \GHC\ has never been observed to generate code using this
 576       binder, even when strictness was involved.  Nonetheless, the prototype
 577       handles this binder as expected.
 578
 579       Note that these case expressions are less powerful than the full Haskell
 580       case expressions. In particular, they do not support complex patterns like
 581       in Haskell. Only the constructor of an expression can be matched,
 582       complex patterns are implemented using multiple nested case expressions.
 583
 584       Case expressions are also used for unpacking of algebraic datatypes, even
 585       when there is only a single constructor. For examples, to add the elements
 586       of a tuple, the following Core is generated:
 587
 588       \startlambda
 589       sum = λtuple.case tuple of
 590         (,) a b -> a + b
 591       \stoplambda
 592
 593       Here, there is only a single alternative (but no \lam{DEFAULT}
 594       alternative, since the single alternative is already exhaustive). When
 595       its body is evaluated, the arguments to the tuple constructor \lam{(,)}
 596       (\eg, the elements of the tuple) are bound to \lam{a} and \lam{b}.
 597     \stopdesc
 598
 599     \startdesc{Cast expression}
 600       \defref{cast expression}
 601       \startlambda
 602       body ▶ targettype
 603       \stoplambda
 604       A cast expression allows you to change the type of an expression to an
 605       equivalent type. Note that this is not meant to do any actual work, like
 606       conversion of data from one format to another, or force a complete type
 607       change. Instead, it is meant to change between different representations
 608       of the same type, \eg\ switch between types that are provably equal (but
 609       look different).
 610
 611       In our hardware descriptions, we typically see casts to change between a
 612       Haskell newtype and its contained type, since those are effectively
 613       different types (so a cast is needed) with the same representation (but
 614       no work is done by the cast).
 615
 616       More complex are types that are proven to be equal by the typechecker,
 617       but look different at first glance. To ensure that, once the typechecker
 618       has proven equality, this information sticks around, explicit casts are
 619       added. In our notation we only write the target type, but in reality a
 620       cast expressions carries around a \emph{coercion}, which can be seen as a
 621       proof of equality. \todo{Example}
 622
 623       The value of a cast is the value of its body, unchanged. The type of this
 624       value is equal to the target type, not the type of its body.
 625     \stopdesc
 626
 627     \startdesc{Note}
 628       The Core language in \small{GHC} allows adding \emph{notes}, which serve
 629       as hints to the inliner or add custom (string) annotations to a core
 630       expression. These should not be generated normally, so these are not
 631       handled in any way in the prototype.
 632     \stopdesc
 633
 634     \startdesc{Type}
 635       \defref{type expression}
 636       \startlambda
 637       @T
 638       \stoplambda
 639       It is possibly to use a Core type as a Core expression. To prevent
 640       confusion between types and values, the \lam{@} sign is used to
 641       explicitly mark a type that is used in a Core expression.
 642
 643       For the actual types supported by Core, see
 644       \in{section}[sec:prototype:coretypes]. This \quote{lifting} of a
 645       type into the value domain is done to allow for type abstractions
 646       and applications to be handled as normal lambda abstractions and
 647       applications above. This means that a type expression in Core can
 648       only ever occur in the argument position of an application, and
 649       only if the type of the function that is applied to expects a type
 650       as the first argument. This happens in applications of all
 651       polymorphic functions. Consider the \lam{fst} function:
 652
 653       \startlambda
 654       fst :: \forall t1. \forall t2. (t1, t2) ->t1
 655       fst = λt1.λt2.λ(tup :: (t1, t2)). case tup of (,) a b -> a
 656
 657       fstint :: (Int, Int) -> Int
 658       fstint = λa.λb.fst @Int @Int a b
 659       \stoplambda
 660
 661       The type of \lam{fst} has two universally quantified type variables. When
 662       \lam{fst} is applied in \lam{fstint}, it is first applied to two types.
 663       (which are substitued for \lam{t1} and \lam{t2} in the type of \lam{fst}, so
 664       the actual type of arguments and result of \lam{fst} can be found:
 665       \lam{fst @Int @Int :: (Int, Int) -> Int}).
 666     \stopdesc
 667
 668     \subsection[sec:prototype:coretypes]{Core type system}
 669       Whereas the expression syntax of Core is very simple, its type system is
 670       a bit more complicated. It turns out it is harder to \quote{desugar}
 671       Haskell's complex type system into something more simple. Most of the
 672       type system is thus very similar to that of Haskell.
 673
 674       We will slightly limit our view on Core's type system, since the more
 675       complicated parts of it are only meant to support Haskell's (or rather,
 676       \GHC's) type extensions, such as existential types, \small{GADT}s, type
 677       families and other non-standard Haskell stuff which we do not (plan to)
 678       support.
 679
 680       \placeintermezzo{}{
 681         \startframedtext[width=8cm,background=box,frame=no]
 682         \startalignment[center]
 683           {\tfa The \hs{id} function}
 684         \stopalignment
 685         \blank[medium]
 686           A function that is probably present in every functional language, is
 687           the \emph{identity} function. This is the function that takes a
 688           single argument and simply returns it unmodified. In Haskell this
 689           function is called \hs{id} and can take an argument of any type
 690           (\ie, it is polymorphic).
 691
 692           The \hs{id} function will be used in the examples every now and
 693           then.
 694         \stopframedtext
 695       }
 696       In Core, every expression is typed. The translation to Core happens
 697       after the typechecker, so types in Core are always correct as well
 698       (though you could of course construct invalidly typed expressions
 699       through the \GHC\ API).
 700
 701       Any type in core is one of the following:
 702
 703       \startdesc{A type variable}
 704         \startlambda
 705         t
 706         \stoplambda
 707
 708         This is a reference to a type defined elsewhere. This can either be a
 709         polymorphic type (like the latter two \lam{t}'s in \lam{id :: \forall t.
 710         t -> t}), or a type constructor (like \lam{Bool} in \lam{not :: Bool ->
 711         Bool}). Like in Haskell, polymorphic type variables always
 712         start with a lowercase letter, while type constructors always start
 713         with an uppercase letter.
 714
 715         \todo{How to define (new) type constructors?}
 716
 717         A special case of a type constructor is the \emph{function type
 718         constructor}, \lam{->}. This is a type constructor taking two arguments
 719         (using application below). The function type constructor is commonly
 720         written inline, so we write \lam{a -> b} when we really mean \lam{-> a
 721         b}, the function type constructor applied to \lam{a} and \lam{b}.
 722
 723         Polymorphic type variables can only be defined by a lambda
 724         abstraction, see the forall type below.
 725       \stopdesc
 726
 727       \startdesc{A type application}
 728         \startlambda
 729           Maybe Int
 730         \stoplambda
 731
 732         This applies some type to another type. This is particularly used to
 733         apply type variables (type constructors) to their arguments.
 734
 735         As mentioned above, applications of some type constructors have
 736         special notation. In particular, these are applications of the
 737         \emph{function type constructor} and \emph{tuple type constructors}:
 738         \startlambda
 739           foo :: t1 -> t2
 740           foo' :: -> t1 t2
 741           bar :: (t1, t2, t3)
 742           bar' :: (,,) t1 t2 t3
 743         \stoplambda
 744       \stopdesc
 745
 746       \startdesc{The forall type}
 747         \startlambda
 748           id :: \forall t. t -> t
 749         \stoplambda
 750         The forall type introduces polymorphism. It is the only way to
 751         introduce new type variables, which are completely unconstrained (Any
 752         possible type can be assigned to it). Constraints can be added later
 753         using predicate types, see below.
 754
 755         A forall type is always (and only) introduced by a type lambda
 756         expression. For example, the Core translation of the
 757         id function is:
 758         \startlambda
 759           id = λt.λ(x :: t).x
 760         \stoplambda
 761
 762         Here, the type of the binder \lam{x} is \lam{t}, referring to the
 763         binder in the topmost lambda.
 764
 765         When using a value with a forall type, the actual type
 766         used must be applied first. For example Haskell expression \hs{id
 767         True} (the function \hs{id} appleid to the dataconstructor \hs{True})
 768         translates to the following Core:
 769
 770         \startlambda
 771         id @Bool True
 772         \stoplambda
 773
 774         Here, id is first applied to the type to work with. Note that the type
 775         then changes from \lam{id :: \forall t. t -> t} to \lam{id @Bool ::
 776         Bool -> Bool}. Note that the type variable \lam{a} has been
 777         substituted with the actual type.
 778
 779         In Haskell, forall types are usually not explicitly specified (The use
 780         of a lowercase type variable implicitly introduces a forall type for
 781         that variable). In fact, in standard Haskell there is no way to
 782         explicitly specify forall types. Through a language extension, the
 783         \hs{forall} keyword is available, but still optional for normal forall
 784         types (it is needed for \emph{existentially quantified types}, which
 785         Cλash does not support).
 786       \stopdesc
 787
 788       \startdesc{Predicate type}
 789         \startlambda
 790           show :: \forall t. Show t ⇒ t → String
 791         \stoplambda
 792
 793         \todo{Sidenote: type classes?}
 794
 795         A predicate type introduces a constraint on a type variable introduced
 796         by a forall type (or type lambda). In the example above, the type
 797         variable \lam{t} can only contain types that are an \emph{instance} of
 798         the \emph{type class} \lam{Show}. \refdef{type class}
 799
 800         There are other sorts of predicate types, used for the type families
 801         extension, which we will not discuss here.
 802
 803         A predicate type is introduced by a lambda abstraction. Unlike with
 804         the forall type, this is a value lambda abstraction, that must be
 805         applied to a value. We call this value a \emph{dictionary}.
 806
 807         Without going into the implementation details, a dictionary can be
 808         seen as a lookup table all the methods for a given (single) type class
 809         instance. This means that all the dictionaries for the same type class
 810         look the same (\eg\ contain methods with the same names). However,
 811         dictionaries for different instances of the same class contain
 812         different methods, of course.
 813
 814         A dictionary is introduced by \small{GHC} whenever it encounters an
 815         instance declaration. This dictionary, as well as the binder
 816         introduced by a lambda that introduces a dictionary, have the
 817         predicate type as their type. These binders are usually named starting
 818         with a \lam{\$}. Usually the name of the type concerned is not
 819         reflected in the name of the dictionary, but the name of the type
 820         class is. The Haskell expression \hs{show True} thus becomes:
 821
 822         \startlambda
 823         show @Bool \$dShow True
 824         \stoplambda
 825       \stopdesc
 826
 827       Using this set of types, all types in basic Haskell can be represented.
 828       \todo{Overview of polymorphism with more examples (or move examples
 829       here)}
 830
 831   \section[sec:prototype:statetype]{State annotations in Haskell}
 832     As noted in \in{section}[sec:description:stateann], Cλash needs some
 833     way to let the programmer explicitly specify which of a function's
 834     arguments and which part of a function's result represent the
 835     function's state.
 836
 837     Using the Haskell type systems, there are a few ways we can tackle this.
 838
 839     \subsection{Type synonyms}
 840       Haskell provides type synonyms as a way to declare a new type that is
 841       equal to an existing type (or rather, a new name for an existing type).
 842       This allows both the original type and the synonym to be used
 843       interchangedly in a Haskell program. This means no explicit conversion
 844       is needed. For example, a simple accumulator would become:
 845
 846       \starthaskell
 847       -- This type synonym would become part of Cλash, it is shown here
 848       -- just for clarity.
 849       type State s = s
 850
 851       acc :: Word -> State Word -> (State Word, Word)
 852       acc i s = let sum = s + i in (sum, sum)
 853       \stophaskell
 854
 855       This looks nice in Haskell, but turns out to be hard to implement. There
 856       is no explicit conversion in Haskell, but not in Core either. This
 857       means the type of a value might be shown as \hs{State Word} in
 858       some places, but \hs{Word} in others (and this can even change due
 859       to transformations). Since every binder has an explicit type
 860       associated with it, the type of every function type will be
 861       properly preserved and could be used to track down the
 862       statefulness of each value by the compiler. However, this would make
 863       the implementation a lot more complicated than when using type
 864       renamings as described in the next section.
 865
 866     % Use \type instead of \hs here, since the latter breaks inside
 867     % section headings.
 868     \subsection{Type renaming (\type{newtype})}
 869       Haskell also supports type renamings as a way to declare a new type that
 870       has the same (runtime) representation as an existing type (but is in
 871       fact a different type to the typechecker). With type renaming,
 872       explicit conversion between values of the two types is needed. The
 873       accumulator would then become:
 874
 875       \starthaskell
 876       -- This type renaming would become part of Cλash, it is shown here
 877       -- just for clarity.
 878       newtype State s = State s
 879
 880       acc :: Word -> State Word -> (State Word, Word)
 881       acc i (State s) = let sum = s + i in (State sum, sum)
 882       \stophaskell
 883
 884       The \hs{newtype} line declares a new type \hs{State} that has one type
 885       argument, \hs{s}. This type contains one \quote{constructor} \hs{State}
 886       with a single argument of type \hs{s}. It is customary to name the
 887       constructor the same as the type, which is allowed (since types can
 888       never cause name collisions with values). The difference with the type
 889       synonym example is in the explicit conversion between the \hs{State
 890       Word} and \hs{Word} types by pattern matching and by using the explicit
 891       the \hs{State} constructor.
 892
 893       This explicit conversion makes the \VHDL\ generation easier: whenever we
 894       remove (unpack) the \hs{State} type, this means we are accessing the
 895       current state (\ie, accessing the register output). Whenever we are
 896       adding (packing) the \hs{State} type, we are producing a new value for
 897       the state (\ie, providing the register input).
 898
 899       When dealing with nested states (a stateful function that calls stateful
 900       functions, which might call stateful functions, etc.) the state type
 901       could quickly grow complex because of all the \hs{State} type constructors
 902       needed. For example, consider the following state type (this is just the
 903       state type, not the entire function type):
 904
 905       \starttyping
 906       State (State Bit, State (State Word, Bit), Word)
 907       \stoptyping
 908
 909       We cannot leave all these \hs{State} type constructors out, since that
 910       would change the type (unlike when using type synonyms). However, when
 911       using type synonyms to hide away substates (see
 912       \in{section}[sec:prototype:substatesynonyms] below), this
 913       disadvantage should be limited.
 914
 915       \subsubsection{Different input and output types}
 916         An alternative could be to use different types for input and output
 917         state (\ie\ current and updated state). The accumulator example would
 918         then become something like:
 919
 920         \starthaskell
 921         -- These type renaminges would become part of Cλash, it is shown
 922         -- here just for clarity.
 923         newtype StateIn s = StateIn s
 924         newtype StateOut s = StateOut s
 925
 926         acc :: Word -> StateIn Word -> (StateIn Word, Word)
 927         acc i (StateIn s) = let sum = s + i in (StateIn sum, sum)
 928         \stophaskell
 929
 930         This could make the implementation easier and the hardware
 931         descriptions less error-prone (you can no longer \quote{forget} to
 932         unpack and repack a state variable and just return it directly, which
 933         can be a problem in the current prototype). However, it also means we
 934         need twice as many type synonyms to hide away substates, making this
 935         approach a bit cumbersome. It also makes it harder to compare input
 936         and output state types, possible reducing the type-safety of the
 937         descriptions.
 938
 939     \subsection[sec:prototype:substatesynonyms]{Type synonyms for substates}
 940       As noted above, when using nested (hierarchical) states, the state types
 941       of the \quote{upper} functions (those that call other functions, which
 942       call other functions, etc.) quickly become complicated. Also, when the
 943       state type of one of the \quote{lower} functions changes, the state
 944       types of all the upper functions changes as well. If the state type for
 945       each function is explicitly and completely specified, this means that a
 946       lot of code needs updating whenever a state type changes.
 947
 948       To prevent this, it is recommended (but not enforced) to use a type
 949       synonym for the state type of every function. Every function calling
 950       other functions will then use the state type synonym of the called
 951       functions in its own type, requiring no code changes when the state type
 952       of a called function changes. This approach is used in
 953       \in{example}[ex:AvgState] below. The \hs{AccState} and \hs{AvgState}
 954       are examples of such state type synonyms.
 955
 956     \subsection{Chosen approach}
 957       To keep implementation simple, the current prototype uses the type
 958       renaming approach, with a single type for both input and output
 959       states. In the future, it might be worthwhile to revisit this
 960       approach if more complicated flow analysis is implemented for
 961       state variables. This analysis is needed to add proper error
 962       checking anyway and might allow the use of type synonyms without
 963       losing any expressivity.
 964
 965       \subsubsection{Example}
 966         As an example of the used approach, a simple averaging circuit
 967         is shown in \in{example}[ex:AvgState]. This circuit lets the
 968         accumulation of the inputs be done by a subcomponent, \hs{acc},
 969         but keeps a count of value accumulated in its own
 970         state.\footnote{Currently, the prototype is not able to compile
 971         this example, since there is no built-in function for division.}
 972
 973         \startbuffer[AvgState]
 974           -- This type renaming would become part of Cλash, it is shown
 975           -- here just for clarity
 976           newtype State s = State s
 977
 978           -- The accumulator state type
 979           type AccState = State Word
 980           -- The accumulator
 981           acc :: Word -> AccState -> (AccState, Word)
 982           acc i (State s) = let sum = s + i in (State sum, sum)
 983
 984           -- The averaging circuit state type
 985           type AvgState = State (AccState, Word)
 986           -- The averaging circuit
 987           avg :: Word -> AvgState -> (AvgState, Word)
 988           avg i (State s) = (State s', o)
 989             where
 990               (accs, count) = s
 991               -- Pass our input through the accumulator, which outputs a sum
 992               (accs', sum) = acc i accs
 993               -- Increment the count (which will be our new state)
 994               count' = count + 1
 995               -- Compute the average
 996               o = sum / count'
 997               s' = (accs', count')
 998         \stopbuffer
 999
1000         \placeexample[here][ex:AvgState]{Simple stateful averaging circuit.}
1001           %\startcombination[2*1]
1002             {\typebufferhs{AvgState}}%{Haskell description using function applications.}
1003           %  {\boxedgraphic{AvgState}}{The architecture described by the Haskell description.}
1004           %\stopcombination
1005         \todo{Picture}
1006
1007   \section{\VHDL\ generation for state}
1008     Now its clear how to put state annotations in the Haskell source,
1009     there is the question of how to implement this state translation. As
1010     we have seen in \in{section}[sec:prototype:design], the translation to
1011     \VHDL\ happens as a simple, final step in the compilation process.
1012     This step works on a Core expression in normal form. The specifics
1013     of normal form will be explained in
1014     \in{chapter}[chap:normalization], but the examples given should be
1015     easy to understand using the definition of Core given above. The
1016     conversion to and from the \hs{State} type is done using the cast
1017     operator, \lam{▶}.
1018
1019         \startbuffer[AvgStateNormal]
1020           acc = λi.λspacked.
1021             let
1022               -- Remove the State newtype
1023               s = spacked ▶ Word
1024               sum = s + i
1025               -- Add the State newtype again
1026               spacked' = sum ▶ State Word
1027               res = (spacked', sum)
1028             in
1029               res
1030
1031           avg = λi.λspacked.
1032             let
1033               s = spacked ▶ (AccState, Word)
1034               accs = case s of (a, b) -> a
1035               count = case s of (c, d) -> d
1036               accres = acc i accs
1037               accs' = case accres of (e, f) -> e
1038               sum = case accres of (g, h) -> h
1039               count' = count + 1
1040               o = sum / count'
1041               s' = (accs', count')
1042               spacked' = s' ▶ State (AccState, Word)
1043               res = (spacked', o)
1044             in
1045               res
1046         \stopbuffer
1047
1048         \placeexample[here][ex:AvgStateNormal]{Normalized version of \in{example}[ex:AvgState]}
1049             {\typebufferlam{AvgStateNormal}}
1050
1051     \subsection[sec:prototype:statelimits]{State in normal form}
1052       Before describing how to translate state from normal form to
1053       \VHDL, we will first see how state handling looks in normal form.
1054       How must their use be limited to guarantee that proper \VHDL\ can
1055       be generated?
1056
1057       We will formulate a number of rules about what operations are
1058       allowed with state variables. These rules apply to the normalized Core
1059       representation, but will in practice apply to the original Haskell
1060       hardware description as well. Ideally, these rules would become part
1061       of the intended normal form definition \refdef{intended normal form
1062       definition}, but this is not the case right now. This can cause some
1063       problems, which are detailed in
1064       \in{section}[sec:normalization:stateproblems].
1065
1066       In these rules we use the terms \emph{state variable} to refer to any
1067       variable that has a \lam{State} type. A \emph{state-containing
1068       variable} is any variable whose type contains a \lam{State} type,
1069       but is not one itself (like \lam{(AccState, Word)} in the example,
1070       which is a tuple type, but contains \lam{AccState}, which is again
1071       equal to \lam{State Word}).
1072
1073       We also use a distinction between \emph{input} and \emph{output
1074       (state) variables} and \emph{substate variables}, which will be
1075       defined in the rules themselves.
1076
1077       These rules describe everything that can be done with state
1078       variables and state-containing variables. Everything else is
1079       invalid. For every rule, the corresponding part of
1080       \in{example}[ex:AvgStateNormal] is shown.
1081
1082       \startdesc{State variables can appear as an argument.}
1083         \startlambda
1084           avg = λi.λspacked. ...
1085         \stoplambda
1086
1087         Any lambda that binds a variable with a state type, creates a new
1088         input state variable.
1089       \stopdesc
1090
1091       \startdesc{Input state variables can be unpacked.}
1092         \startlambda
1093           s = spacked ▶ (AccState, Word)
1094         \stoplambda
1095
1096         An input state variable may be unpacked using a cast operation. This
1097         removes the \lam{State} type renaming and the result has no longer a
1098         \lam{State} type.
1099
1100         If the result of this unpacking does not have a state type and does
1101         not contain state variables, there are no limitations on its
1102         use (this is the function's own state).  Otherwise if it does
1103         not have a state type but does contain substates, we refer to it
1104         as a \emph{state-containing input variable} and the limitations
1105         below apply. If it has a state type itself, we refer to it as an
1106         \emph{input substate variable} and the below limitations apply
1107         as well.
1108
1109         It may seem strange to consider a variable that still has a state
1110         type directly after unpacking, but consider the case where a
1111         function does not have any state of its own, but does call a single
1112         stateful function. This means it must have a state argument that
1113         contains just a substate. The function signature of such a function
1114         could look like:
1115
1116         \starthaskell
1117           type FooState = State AccState
1118         \stophaskell
1119
1120         Which is of course equivalent to \lam{State (State Word)}.
1121       \stopdesc
1122
1123       \startdesc{Variables can be extracted from state-containing input variables.}
1124         \startlambda
1125           accs = case s of (a, b) -> a
1126         \stoplambda
1127
1128         A state-containing input variable is typically a tuple containing
1129         multiple elements (like the current function's state, substates or
1130         more tuples containing substates). All of these can be extracted
1131         from an input variable using an extractor case (or possibly
1132         multiple, when the input variable is nested).
1133
1134         If the result has no state type and does not contain any state
1135         variables either, there are no further limitations on its use
1136         (this is the function's own state). If the result has no state
1137         type but does contain state variables we refer to it as a
1138         \emph{state-containing input variable} and this limitation keeps
1139         applying. If the variable has a state type itself, we refer to
1140         it as an \emph{input substate variable} and below limitations
1141         apply.
1142
1143       \startdesc{Input substate variables can be passed to functions.}
1144         \startlambda
1145           accres = acc i accs
1146           accs' = case accres of (e, f) -> e
1147         \stoplambda
1148
1149         An input substate variable can (only) be passed to a function.
1150         Additionally, every input substate variable must be used in exactly
1151         \emph{one} application, no more and no less.
1152
1153         The function result should contain exactly one state variable, which
1154         can be extracted using (multiple) case expressions. The extracted
1155         state variable is referred to the \emph{output substate}
1156
1157         The type of this output substate must be identical to the type of
1158         the input substate passed to the function.
1159       \stopdesc
1160
1161       \startdesc{Variables can be inserted into a state-containing output variable.}
1162         \startlambda
1163           s' = (accs', count')
1164         \stoplambda
1165
1166         A function's output state is usually a tuple containing its own
1167         updated state variables and all output substates. This result is
1168         built up using any single-constructor algebraic datatype
1169         (possibly nested).
1170
1171         The result of these expressions is referred to as a
1172         \emph{state-containing output variable}, which are subject to these
1173         limitations.
1174       \stopdesc
1175
1176       \startdesc{State containing output variables can be packed.}
1177         \startlambda
1178           spacked' = s' ▶ State (AccState, Word)
1179         \stoplambda
1180
1181         As soon as all a functions own update state and output substate
1182         variables have been joined together, the resulting
1183         state-containing output variable can be packed into an output
1184         state variable. Packing is done by casting to a state type.
1185       \stopdesc
1186
1187       \startdesc{Output state variables can appear as (part of) a function result.}
1188         \startlambda
1189           avg = λi.λspacked.
1190             let
1191             \vdots
1192             res = (spacked', o)
1193           in
1194             res
1195         \stoplambda
1196         When the output state is packed, it can be returned as a part
1197         of the function result. Nothing else can be done with this
1198         value (or any value that contains it).
1199       \stopdesc
1200
1201       There is one final limitation that is hard to express in the above
1202       itemization. Whenever substates are extracted from the input state
1203       to be passed to functions, the corresponding output substates
1204       should be inserted into the output state in the same way. In other
1205       words, each pair of corresponding substates in the input and
1206       output states should be passed to / returned from the same called
1207       function.
1208
1209       The prototype currently does not check much of the above
1210       conditions. This means that if the conditions are violated,
1211       sometimes a compile error is generated, but in other cases output
1212       can be generated that is not valid \VHDL\ or at the very least does
1213       not correspond to the input.
1214
1215     \subsection{Translating to \VHDL}
1216       As noted above, the basic approach when generating \VHDL\ for stateful
1217       functions is to generate a single register for every stateful function.
1218       We look around the normal form to find the let binding that removes the
1219       \lam{State} type renaming (using a cast). We also find the let binding that
1220       adds a \lam{State} type. These are connected to the output and the input
1221       of the generated let binding respectively. This means that there can
1222       only be one let binding that adds and one that removes the \lam{State}
1223       type. It is easy to violate this constraint. This problem is detailed in
1224       \in{section}[sec:normalization:stateproblems].
1225
1226       This approach seems simple enough, but will this also work for more
1227       complex stateful functions involving substates?  Observe that any
1228       component of a function's state that is a substate, \ie\ passed on as
1229       the state of another function, should have no influence on the
1230       hardware generated for the calling function. Any state-specific
1231       \small{VHDL} for this component can be generated entirely within the
1232       called function.  So, we can completely ignore substates when
1233       generating \VHDL\ for a function.
1234
1235       From this observation it might seem logical to remove the
1236       substates from a function's states altogether and leave only the
1237       state components which are actual states of the current function.
1238       While doing this would not remove any information needed to
1239       generate \small{VHDL} from the function, it would cause the
1240       function definition to become invalid (since we will not have any
1241       substate to pass to the functions anymore).  We could solve the
1242       syntactic problems by passing \type{undefined} for state
1243       variables, but that would still break the code on the semantic
1244       level (\ie, the function would no longer be semantically
1245       equivalent to the original input).
1246
1247       To keep the function definition correct until the very end of the
1248       process, we will not deal with (sub)states until we get to the
1249       \small{VHDL} generation.  Then, we are translating from Core to
1250       \small{VHDL}, and we can simply generate no \VHDL for substates,
1251       effectively removing them altogether.
1252
1253       But, how will we know what exactly is a substate? Since any state
1254       argument or return value that represents state must be of the
1255       \type{State} type, we can look at the type of a value. However, we
1256       must be careful to ignore only \emph{substates}, and not a
1257       function's own state.
1258
1259       For \in{example}[ex:AvgStateNormal] above, we should generate a register
1260       with its output connected to \lam{s} and its input connected
1261       to \lam{s'}. However, \lam{s'} is build up from both \lam{accs'} and
1262       \lam{count'}, while only \lam{count'} should end up in the register.
1263       \lam{accs'} is a substate for the \lam{acc} function, for which a
1264       register will be created when generating \VHDL\ for the \lam{acc}
1265       function.
1266
1267       Fortunately, the \lam{accs'} variable (and any other substate) has a
1268       property that we can easily check: it has a \lam{State} type. This
1269       means that whenever \VHDL\ is generated for a tuple (or other
1270       algebraic type), we can simply leave out all elements that have a
1271       \lam{State} type. This will leave just the parts of the state that
1272       do not have a \lam{State} type themselves, like \lam{count'},
1273       which is exactly a function's own state. This approach also means
1274       that the state part of the result (\eg\ \lam{s'} in \lam{res}) is
1275       automatically excluded when generating the output port, which is
1276       also required.
1277
1278       We can formalize this translation a bit, using the following
1279       rules.
1280
1281       \startitemize
1282         \item A state unpack operation should not generate any \small{VHDL}.
1283         The binder to which the unpacked state is bound should still be
1284         declared, this signal will become the register and will hold the
1285         current state.
1286         \item A state pack operation should not generate any \small{VHDL}.
1287         The binder to which the packed state is bound should not be
1288         declared. The binder that is packed is the signal that will hold the
1289         new state.
1290         \item Any values of a State type should not be translated to
1291         \small{VHDL}. In particular, State elements should be removed from
1292         tuples (and other datatypes) and arguments with a state type should
1293         not generate ports.
1294         \item To make the state actually work, a simple \small{VHDL}
1295         (sequential) process should be generated. This process updates
1296         the state at every clockcycle, by assigning the new state to the
1297         current state. This will be recognized by synthesis tools as a
1298         register specification.
1299       \stopitemize
1300
1301       When applying these rules to the function \lam{avg} from
1302       \in{example}[ex:AvgStateNormal], we be left with the description
1303       in \in{example}[ex:AvgStateRemoved]. All the parts that do not
1304       generate any \VHDL\ directly are crossed out, leaving just the
1305       actual flow of values in the final hardware. To illustrate the
1306       change of the types of \lam{s} and \lam{s'}, their types are also
1307       shown.
1308
1309       \startbuffer[AvgStateRemoved]
1310         avg = iλ.λ--spacked.--
1311           let
1312             s :: (--AccState,-- Word)
1313             s = --spacked ▶ (AccState, Word)--
1314             --accs = case s of (a, b) -> a--
1315             count = case s of (--c,-- d) -> d
1316             accres = acc i --accs--
1317             --accs' = case accres of (e, f) -> e--
1318             sum = case accres of (--g,-- h) -> h
1319             count' = count + 1
1320             o = sum / count'
1321             s' :: (--AccState,-- Word)
1322             s' = (--accs',-- count')
1323             --spacked' = s' ▶ State (AccState, Word)--
1324             res = (--spacked',-- o)
1325           in
1326             res
1327       \stopbuffer
1328       \placeexample[here][ex:AvgStateRemoved]{Normalized version of \in{example}[ex:AvgState] with ignored parts crossed out}
1329           {\typebufferlam{AvgStatRemoved}}
1330
1331       When we actually leave out the crossed out parts, we get a slightly
1332       weird program: there is a variable \lam{s} which has no value, and there
1333       is a variable \lam{s'} that is never used. But together, these two will form
1334       the state process of the function. \lam{s} contains the "current" state,
1335       \lam{s'} is assigned the "next" state. So, at the end of each clock
1336       cycle, \lam{s'} should be assigned to \lam{s}.
1337
1338       As an illustration of the result of this function,
1339       \in{example}[ex:AccStateVHDL] and \in{example}[ex:AvgStateVHDL] show the the \VHDL\ that is
1340       generated by Cλash from the examples is this section.
1341
1342       \startbuffer[AvgStateVHDL]
1343         entity avgComponent_0 is
1344              port (\izAlE2\ : in \unsigned_31\;
1345                    \foozAo1zAo12\ : out \(,)unsigned_31\;
1346                    clock : in std_logic;
1347                    resetn : in std_logic);
1348         end entity avgComponent_0;
1349
1350
1351         architecture structural of avgComponent_0 is
1352              signal \szAlG2\ : \(,)unsigned_31\;
1353              signal \countzAlW2\ : \unsigned_31\;
1354              signal \dszAm62\ : \(,)unsigned_31\;
1355              signal \sumzAmk3\ : \unsigned_31\;
1356              signal \reszAnCzAnM2\ : \unsigned_31\;
1357              signal \foozAnZzAnZ2\ : \unsigned_31\;
1358              signal \reszAnfzAnj3\ : \unsigned_31\;
1359              signal \s'zAmC2\ : \(,)unsigned_31\;
1360         begin
1361              \countzAlW2\ <= \szAlG2\.A;
1362
1363              \comp_ins_dszAm62\ : entity accComponent_1
1364                                        port map (\izAob3\ => \izAlE2\,
1365                                                  \foozAoBzAoB2\ => \dszAm62\,
1366                                                  clock => clock,
1367                                                  resetn => resetn);
1368
1369              \sumzAmk3\ <= \dszAm62\.A;
1370
1371              \reszAnCzAnM2\ <= to_unsigned(1, 32);
1372
1373              \foozAnZzAnZ2\ <= \countzAlW2\ + \reszAnCzAnM2\;
1374
1375              \reszAnfzAnj3\ <= \sumzAmk3\ * \foozAnZzAnZ2\;
1376
1377              \s'zAmC2\.A <= \foozAnZzAnZ2\;
1378
1379              \foozAo1zAo12\.A <= \reszAnfzAnj3\;
1380
1381              state : process (clock, resetn)
1382              begin
1383                   if resetn = '0' then
1384                   elseif rising_edge(clock) then
1385                        \szAlG2\ <= \s'zAmC2\;
1386                   end if;
1387              end process state;
1388         end architecture structural;
1389       \stopbuffer
1390
1391       \startbuffer[AvgStateTypes]
1392         package types is
1393              subtype \unsigned_31\ is unsigned (0 to 31);
1394
1395              type \(,)unsigned_31\ is record
1396                    A : \unsigned_31\;
1397              end record;
1398         end package types;
1399       \stopbuffer
1400
1401       \startbuffer[AccStateVHDL]
1402         entity accComponent_1 is
1403              port (\izAob3\ : in \unsigned_31\;
1404                    \foozAoBzAoB2\ : out \(,)unsigned_31\;
1405                    clock : in std_logic;
1406                    resetn : in std_logic);
1407         end entity accComponent_1;
1408
1409         architecture structural of accComponent_1 is
1410              signal \szAod3\ : \unsigned_31\;
1411              signal \reszAonzAor3\ : \unsigned_31\;
1412         begin
1413              \reszAonzAor3\ <= \szAod3\ + \izAob3\;
1414
1415              \foozAoBzAoB2\.A <= \reszAonzAor3\;
1416
1417              state : process (clock, resetn)
1418              begin
1419                   if resetn = '0' then
1420                   elseif rising_edge(clock) then
1421                        \szAod3\ <= \reszAonzAor3\;
1422                   end if;
1423              end process state;
1424         end architecture structural;
1425       \stopbuffer
1426
1427       \placeexample[][ex:AvgStateTypes]{\VHDL\ types generated for acc and avg from \in{example}[ex:AvgState]}
1428           {\typebuffervhdl{AvgStateTypes}}
1429       \placeexample[][ex:AccStateVHDL]{\VHDL\ generated for acc from \in{example}[ex:AvgState]}
1430           {\typebuffervhdl{AccStateVHDL}}
1431       \placeexample[][ex:AvgStateVHDL]{\VHDL\ generated for avg from \in{example}[ex:AvgState]}
1432           {\typebuffervhdl{AvgStateVHDL}}
1433 %    \subsection{Initial state}
1434 %      How to specify the initial state? Cannot be done inside a hardware
1435 %      function, since the initial state is its own state argument for the first
1436 %      call (unless you add an explicit, synchronous reset port).
1437 %
1438 %      External init state is natural for simulation.
1439 %
1440 %      External init state works for hardware generation as well.
1441 %
1442 %      Implementation issues: state splitting, linking input to output state,
1443 %      checking usage constraints on state variables.
1444 %
1445 %
1446 % vim: set sw=2 sts=2 expandtab: