Chapters/Prototype.tex

   1 \chapter[chap:prototype]{Prototype}
   2   An important step in this research is the creation of a prototype compiler.
   3   Having this prototype allows us to apply the ideas from the previous chapter
   4   to actual hardware descriptions and evaluate their usefulness. Having a
   5   prototype also helps to find new techniques and test possible
   6   interpretations.
   7
   8   Obviously the prototype was not created after all research
   9   ideas were formed, but its implementation has been interleaved with the
  10   research itself. Also, the prototype described here is the final version, it
  11   has gone through a number of design iterations which we will not completely
  12   describe here.
  13
  14   \section[sec:prototype:input]{Input language}
  15     When implementing this prototype, the first question to ask is:
  16     Which (functional) language will be used to describe our hardware?
  17     (Note that this does not concern the \emph{implementation language}
  18     of the compiler, just the language \emph{translated by} the
  19     compiler).
  20
  21     Initially, we have two choices:
  22
  23     \startitemize
  24       \item Create a new functional language from scratch. This has the
  25       advantage of having a language that contains exactly those elements that
  26       are convenient for describing hardware and can contain special
  27       constructs that allows our hardware descriptions to be more powerful or
  28       concise.
  29       \item Use an existing language and create a new back-end for it. This has
  30       the advantage that existing tools can be reused, which will speed up
  31       development.
  32     \stopitemize
  33
  34
  35     \placeintermezzo{}{
  36       \startframedtext[width=8cm,background=box,frame=no]
  37       \startalignment[center]
  38         {\tfa No \small{EDSL} or Template Haskell}
  39       \stopalignment
  40       \blank[medium]
  41
  42       Note that in this consideration, embedded domain-specific
  43       languages (\small{EDSL}) and Template Haskell (\small{TH})
  44       approaches have not been included. As we have seen in
  45       \in{section}[sec:context:fhdls], these approaches have all kinds
  46       of limitations on the description language that we would like to
  47       avoid.
  48       \stopframedtext
  49     }
  50     Considering that we required a prototype which should be working quickly,
  51     and that implementing parsers, semantic checkers and especially
  52     type-checkers is not exactly the core of this research (but it is lots and
  53     lots of work, using an existing language is the obvious choice. This
  54     also has the advantage that a large set of language features is available
  55     to experiment with and it is easy to find which features apply well and
  56     which do not. Another important advantage of using an existing language, is
  57     that simulation of the code becomes trivial. Since there are existing
  58     compilers and interpreters that can run the hardware description directly,
  59     it can be simulated without also having to write an interpreter for the
  60     new language.
  61
  62     A possible second prototype could use a custom language with just the useful
  63     features (and possibly extra features that are specific to
  64     the domain of hardware description as well).
  65
  66     The second choice to be made is which of the many existing languages to use. As
  67     mentioned before, the chosen language is Haskell.  This choice has not been the
  68     result of a thorough comparison of languages, for the simple reason that
  69     the requirements on the language were completely unclear at the start of
  70     this research. The fact that Haskell is a language with a broad spectrum
  71     of features, that it is commonly used in research projects and that the
  72     primary compiler, \GHC, provides a high level API to its internals, made
  73     Haskell an obvious choice.
  74
  75   \section[sec:prototype:output]{Output format}
  76     The second important question is: what will be our output format?
  77     This output format should at least allow for programming the
  78     hardware design into a field-programmable gate array (\small{FPGA}).
  79     The choice of output format is thus limited by what hardware
  80     synthesis and programming tools can process.
  81
  82     Looking at other tools in the industry, the Electronic Design Interchange
  83     Format (\small{EDIF}) is commonly used for storing intermediate
  84     \emph{netlists} (lists of components and connections between these
  85     components) and is commonly the target for \small{VHDL} and Verilog
  86     compilers.
  87
  88     However, \small{EDIF} is not completely tool-independent. It specifies a
  89     meta-format, but the hardware components that can be used vary between
  90     various tool and hardware vendors, as well as the interpretation of the
  91     \small{EDIF} standard. \cite[li89]
  92
  93     This means that when working with \small{EDIF}, our prototype would become
  94     technology dependent (\eg\ only work with \small{FPGA}s of a specific
  95     vendor, or even only with specific chips). This limits the applicability
  96     of our prototype. Also, the tools we would like to use for verifying,
  97     simulating and draw pretty pictures of our output (like Precision, or
  98     QuestaSim) are designed for \small{VHDL} or Verilog input.
  99
 100     For these reasons, we will not use \small{EDIF}, but \small{VHDL} as our
 101     output language.  We choose \VHDL\ over Verilog simply because we are
 102     familiar with \small{VHDL} already. The differences between \small{VHDL}
 103     and Verilog are on the higher level, while we will be using \small{VHDL}
 104     mainly to write low level, netlist-like descriptions anyway.
 105
 106     An added advantage of using \VHDL\ is that we can profit from existing
 107     optimizations in \VHDL\ synthesizers. A lot of optimizations are done on the
 108     \VHDL\ level by existing tools. These tools have been under
 109     development for years, so it would not be reasonable to assume we
 110     could achieve a similar amount of optimization in our prototype (nor
 111     should it be a goal, considering this is just a prototype).
 112
 113     \placeintermezzo{}{
 114       \startframedtext[width=8cm,background=box,frame=no]
 115       \startalignment[center]
 116         {\tfa Translation vs. compilation vs. synthesis}
 117       \stopalignment
 118       \blank[medium]
 119         In this thesis the words \emph{translation}, \emph{compilation} and
 120         sometimes \emph{synthesis} will be used interchangeably to refer to the
 121         process of translating the hardware description from the Haskell
 122         language to the \VHDL\ language.
 123
 124         Similarly, the prototype created is referred to as both the
 125         \emph{translator} as well as the \emph{compiler}.
 126
 127         The final part of this process is usually referred to as \emph{\VHDL\
 128         generation}.
 129       \stopframedtext
 130     }
 131
 132     Note that we will be using \small{VHDL} as our output language, but will
 133     not use its full expressive power. Our output will be limited to using
 134     simple, structural descriptions, without any complex behavioral
 135     descriptions like arbitrary sequential statements (which might not
 136     be supported by all tools). This ensures that any tool that works
 137     with \VHDL\ will understand our output. This also leaves open the
 138     option to switch to \small{EDIF} in the future, with minimal changes
 139     to the prototype.
 140
 141   \section{Simulation and synthesis}
 142     As mentioned above, by using the Haskell language, we get simulation of
 143     our hardware descriptions almost for free. The only thing that is needed
 144     is to provide a Haskell implementation of all built-in functions that can
 145     be used by the Haskell interpreter to simulate them.
 146
 147     The main topic of this thesis is therefore the path from the Haskell
 148     hardware descriptions to \small{FPGA} synthesis, focusing of course on the
 149     \VHDL\ generation. Since the \VHDL\ generation process preserves the meaning
 150     of the Haskell description exactly, any simulation done in Haskell
 151     \emph{should} produce identical results as the synthesized hardware.
 152
 153   \section[sec:prototype:design]{Prototype design}
 154     As suggested above, we will use the Glasgow Haskell Compiler (\small{GHC}) to
 155     implement our prototype compiler. To understand the design of the
 156     prototype, we will first dive into the \small{GHC} compiler a bit. Its
 157     compilatprototype consists of the following steps (slightly simplified):
 158
 159     \startuseMPgraphic{ghc-pipeline}
 160       % Create objects
 161       save inp, front, desugar, simpl, back, out;
 162       newEmptyBox.inp(0,0);
 163       newBox.front(btex Frontend etex);
 164       newBox.desugar(btex Desugarer etex);
 165       newBox.simpl(btex Simplifier etex);
 166       newBox.back(btex Backend etex);
 167       newEmptyBox.out(0,0);
 168
 169       % Space the boxes evenly
 170       inp.c - front.c = front.c - desugar.c = desugar.c - simpl.c
 171         = simpl.c - back.c = back.c - out.c = (0, 1.5cm);
 172       out.c = origin;
 173
 174       % Draw lines between the boxes. We make these lines "deferred" and give
 175       % them a name, so we can use ObjLabel to draw a label beside them.
 176       ncline.inp(inp)(front) "name(haskell)";
 177       ncline.front(front)(desugar) "name(ast)";
 178       ncline.desugar(desugar)(simpl) "name(core)";
 179       ncline.simpl(simpl)(back) "name(simplcore)";
 180       ncline.back(back)(out) "name(native)";
 181       ObjLabel.inp(btex Haskell source etex) "labpathname(haskell)", "labdir(rt)";
 182       ObjLabel.front(btex Haskell AST etex) "labpathname(ast)", "labdir(rt)";
 183       ObjLabel.desugar(btex Core etex) "labpathname(core)", "labdir(rt)";
 184       ObjLabel.simpl(btex Simplified core etex) "labpathname(simplcore)", "labdir(rt)";
 185       ObjLabel.back(btex Native code etex) "labpathname(native)", "labdir(rt)";
 186
 187       % Draw the objects (and deferred labels)
 188       drawObj (inp, front, desugar, simpl, back, out);
 189     \stopuseMPgraphic
 190     \placefigure[right]{GHC compiler pipeline}{\startboxed \useMPgraphic{ghc-pipeline}\stopboxed}
 191
 192     \startdesc{Frontend}
 193       This step takes the Haskell source files and parses them into an
 194       abstract syntax tree (\small{AST}). This \small{AST} can express the
 195       complete Haskell language and is thus a very complex one (in contrast
 196       with the Core \small{AST}, later on). All identifiers in this
 197       \small{AST} are resolved by the renamer and all types are checked by the
 198       type-checker.
 199     \stopdesc
 200     \startdesc{Desugaring}
 201       This step takes the full \small{AST} and translates it to the
 202       \emph{Core} language. Core is a very small functional language with lazy
 203       semantics, that can still express everything Haskell can express. Its
 204       simpleness makes Core very suitable for further simplification and
 205       translation. Core is the language we will be working with as well.
 206     \stopdesc
 207     \startdesc{Simplification}
 208       Through a number of simplification steps (such as inlining, common
 209       sub-expression elimination, etc.) the Core program is simplified to make
 210       it faster or easier to process further.
 211     \stopdesc
 212     \startdesc{Backend}
 213       This step takes the simplified Core program and generates an actual
 214       runnable program for it. This is a big and complicated step we will not
 215       discuss it any further, since it is not relevant to our prototype.
 216     \stopdesc
 217
 218     In this process, there are a number of places where we can start our work.
 219     Assuming that we do not want to deal with (or modify) parsing, type-checking
 220     and other front end business and that native code is not really a useful
 221     format anymore, we are left with the choice between the full Haskell
 222     \small{AST}, or the smaller (simplified) Core representation.
 223
 224     The advantage of taking the full \small{AST} is that the exact structure
 225     of the source program is preserved. We can see exactly what the hardware
 226     description looks like and which syntax constructs were used. However,
 227     the full \small{AST} is a very complicated data-structure. If we are to
 228     handle everything it offers, we will quickly get a big compiler.
 229
 230     Using the Core representation gives us a much more compact data-structure
 231     (a Core expression only uses 9 constructors). Note that this does not mean
 232     that the Core representation itself is smaller, on the contrary.
 233     Since the Core language has less constructs, most Core expressions
 234     are larger than the equivalent versions in Haskell.
 235
 236     However, the fact that the Core language is so much smaller, means it is a
 237     lot easier to analyze and translate it into something else. For the same
 238     reason, \small{GHC} runs its simplifications and optimizations on the Core
 239     representation as well \cite[jones96].
 240
 241     We will use the normal Core representation, not the simplified Core. Even
 242     though the simplified Core version is an equivalent, but simpler
 243     definition, some problems were encountered with it in practice. The
 244     simplifier restructures some (stateful) functions in a way the normalizer
 245     and the \VHDL\ generation cannot handle, leading to uncompilable programs
 246     (whereas the non-simplified version more closely resembles the original
 247     program, allowing the original to be written in a way that can be
 248     handled). This problem is further discussed in
 249     \in{section}[sec:normalization:stateproblems].
 250
 251     \startuseMPgraphic{clash-pipeline}
 252       % Create objects
 253       save inp, front, norm, vhdl, out;
 254       newEmptyBox.inp(0,0);
 255       newBox.front(btex \small{GHC} front-end etex);
 256       newBox.norm(btex Normalization etex);
 257       newBox.vhdl(btex \small{VHDL} generation etex);
 258       newEmptyBox.out(0,0);
 259
 260       % Space the boxes evenly
 261       inp.c - front.c = front.c - norm.c = norm.c - vhdl.c
 262         = vhdl.c - out.c = (0, 1.5cm);
 263       out.c = origin;
 264
 265       % Draw lines between the boxes. We make these lines "deferred" and give
 266       % them a name, so we can use ObjLabel to draw a label beside them.
 267       ncline.inp(inp)(front) "name(haskell)";
 268       ncline.front(front)(norm) "name(core)";
 269       ncline.norm(norm)(vhdl) "name(normal)";
 270       ncline.vhdl(vhdl)(out) "name(vhdl)";
 271       ObjLabel.inp(btex Haskell source etex) "labpathname(haskell)", "labdir(rt)";
 272       ObjLabel.front(btex Core etex) "labpathname(core)", "labdir(rt)";
 273       ObjLabel.norm(btex Normalized core etex) "labpathname(normal)", "labdir(rt)";
 274       ObjLabel.vhdl(btex \small{VHDL} description etex) "labpathname(vhdl)", "labdir(rt)";
 275
 276       % Draw the objects (and deferred labels)
 277       drawObj (inp, front, norm, vhdl, out);
 278     \stopuseMPgraphic
 279     \placefigure[right]{Cλash compiler pipeline}{\startboxed \useMPgraphic{clash-pipeline}\stopboxed}
 280
 281     The final prototype roughly consists of three steps:
 282
 283     \page[no] % suppress page break here.
 284     \startdesc{Frontend}
 285       This is exactly the front-end from the \small{GHC} pipeline, that
 286       translates Haskell sources to a typed Core representation.
 287     \stopdesc
 288     \startdesc{Normalization}
 289       This is a step that transforms the Core representation into a normal
 290       form. This normal form is still expressed in the Core language, but has
 291       to adhere to an additional set of constraints. This normal form is less
 292       expressive than the full Core language (e.g., it can have limited
 293       higher-order expressions, has a specific structure, etc.), but is
 294       also very close to directly describing hardware.
 295     \stopdesc
 296     \startdesc{\small{VHDL} generation}
 297       The last step takes the normal formed Core representation and generates
 298       \small{VHDL} for it. Since the normal form has a specific, hardware-like
 299       structure, this final step is very straightforward.
 300     \stopdesc
 301
 302     The most interesting step in this process is the normalization step. That
 303     is where more complicated functional constructs, which have no direct
 304     hardware interpretation, are removed and translated into hardware
 305     constructs. This step is described in a lot of detail at
 306     \in{chapter}[chap:normalization].
 307
 308
 309     \defref{entry function}Translation of a hardware description always
 310     starts at a single function, which is referred to as the \emph{entry
 311     function}. \VHDL\ is generated for this function first, followed by
 312     any functions used by the entry functions (recursively).
 313
 314   \section[sec:prototype:core]{The Core language}
 315     \defreftxt{Core}{the Core language}
 316     Most of the prototype deals with handling the program in the Core
 317     language. In this section we will show what this language looks like and
 318     how it works.
 319
 320     The Core language is a functional language that describes
 321     \emph{expressions}. Every identifier used in Core is called a
 322     \emph{binder}, since it is bound to a value somewhere. On the highest
 323     level, a Core program is a collection of functions, each of which bind a
 324     binder (the function name) to an expression (the function value, which has
 325     a function type).
 326
 327     The Core language itself does not prescribe any program structure
 328     (like modules, declarations, imports, etc.), only expression
 329     structure. In the \small{GHC} compiler, the Haskell module structure
 330     is used for the resulting Core code as well. Since this is not so
 331     relevant for understanding the Core language or the Normalization
 332     process, we will only look at the Core expression language here.
 333
 334     Each Core expression consists of one of these possible expressions.
 335
 336     \startdesc{Variable reference}
 337       \defref{variable reference}
 338       \startlambda
 339       bndr :: T
 340       \stoplambda
 341       This is a reference to a binder. It is written down as the
 342       name of the binder that is being referred to along with its type. The
 343       binder name should of course be bound in a containing scope
 344       (including top level scope, so a reference to a top level function
 345       is also a variable reference). Additionally, constructors from
 346       algebraic data-types also become variable references (\eg\
 347       \lam{True}).
 348
 349       In our examples, binders will commonly consist of a single
 350       characters, but they can have any length.
 351
 352       The value of this expression is the value bound to the given
 353       binder.
 354
 355       Each binder also carries around its type (explicitly shown above), but
 356       this is usually not shown in the Core expressions. Only when the type is
 357       relevant (when a new binder is introduced, for example) will it be
 358       shown. In other cases, the type is either not relevant, or easily
 359       derived from the context of the expression. \todo{Ref sidenote on type
 360       annotations}
 361     \stopdesc
 362
 363     \startdesc{Literal}
 364       \defref{literal}
 365       \startlambda
 366       10
 367       \stoplambda
 368       This is a literal. Only primitive types are supported, like
 369       chars, strings, integers and doubles. The types of these literals are the
 370       \quote{primitive}, unboxed versions, like \lam{Char\#} and \lam{Word\#}, not the
 371       normal Haskell versions (but there are built-in conversion
 372       functions). Without going into detail about these types, note that
 373       a few conversion functions exist to convert these to the normal
 374       (boxed) Haskell equivalents. See
 375       \in{section}[sec:normalization:literals] for an example.
 376     \stopdesc
 377
 378     \startdesc{Application}
 379       \defref{application}
 380       \startlambda
 381       func arg
 382       \stoplambda
 383       This is function application. Each application consists of two
 384       parts: the function part and the argument part. Applications are used
 385       for normal function \quote{calls}, but also for applying type
 386       abstractions and data constructors.
 387
 388       The value of an application is the value of the function part, with the
 389       first argument binder bound to the argument part.
 390
 391       In Core, there is no distinction between an operator and a
 392       function. This means that, for example the addition of two numbers
 393       looks like the following in Core:
 394
 395       \startlambda
 396       (+) 1 2
 397       \stoplambda
 398
 399       Where the function \quote{\lam{(+)}} is applied to the numbers 1
 400       and 2. However, to increase readability, an application of an
 401       operator like \lam{(+)} is sometimes written infix. In this case,
 402       the parenthesis are also left out, just like in Haskell. In other
 403       words, the following means exactly the same as the addition above:
 404
 405       \startlambda
 406       1 + 2
 407       \stoplambda
 408     \stopdesc
 409
 410     \startdesc{Lambda abstraction}
 411       \defref{lambda abstraction}
 412       \startlambda
 413       λbndr.body
 414       \stoplambda
 415       This is the basic lambda abstraction, as it occurs in lambda calculus.
 416       It consists of a binder part and a body part.  A lambda abstraction
 417       creates a function, that can be applied to an argument. The binder is
 418       usually a value binder, but it can also be a \emph{type binder} (or
 419       \emph{type variable}). The latter case introduces a new polymorphic
 420       variable, which can be used in types later on. See
 421       \in{section}[sec:prototype:coretypes] for details.
 422
 423       The body of a lambda abstraction extends all the way to the end of
 424       the expression, or the closing bracket surrounding the lambda. In
 425       other words, the lambda abstraction \quote{operator} has the
 426       lowest priority of all.
 427
 428       The value of an application is the value of the body part, with the
 429       binder bound to the value the entire lambda abstraction is applied to.
 430     \stopdesc
 431
 432     \startdesc{Non-recursive let expression}
 433       \defref{let expression}
 434       \startlambda
 435       let bndr = value in body
 436       \stoplambda
 437       A let expression allows you to bind a binder to some value, while
 438       evaluating to some other value (for which that binder is in scope). This
 439       allows for sharing of sub-expressions (you can use a binder twice) and
 440       explicit \quote{naming} of arbitrary expressions. A binder is not
 441       in scope in the value bound it is bound to, so it is not possible
 442       to make recursive definitions with a non-recursive let expression
 443       (see the recursive form below).
 444
 445       Even though this let expression is an extension on the basic lambda
 446       calculus, it is easily translated to a lambda abstraction. The let
 447       expression above would then become:
 448
 449       \startlambda
 450       (λbndr.body) value
 451       \stoplambda
 452
 453       This notion might be useful for verifying certain properties on
 454       transformations, since a lot of verification work has been done on
 455       lambda calculus already.
 456
 457       The value of a let expression is the value of the body part, with the
 458       binder bound to the value.
 459     \stopdesc
 460
 461     \startdesc{Recursive let expression}
 462       \startlambda
 463       letrec
 464         bndr1 = value1
 465         \vdots
 466         bndrn = valuen
 467       in
 468         body
 469       \stoplambda
 470       This is the recursive version of the let expression. In \small{GHC}'s
 471       Core implementation, non-recursive and recursive lets are not so
 472       distinct as we present them here, but this provides a clearer overview.
 473
 474       The main difference with the normal let expression is that it can
 475       contain multiple bindings (or even none) and each of the binders
 476       is in scope in each of the values, in addition to the body. This
 477       allows for self-recursive or mutually recursive definitions.
 478
 479       It is also possible to express a recursive let expression using
 480       normal lambda calculus, if we use the \emph{least fixed-point
 481       operator}, \lam{Y} (but the details are too complicated to help
 482       clarify the let expression, so this will not be explored further).
 483     \stopdesc
 484
 485     \placeintermezzo{}{
 486       \startframedtext[width=8cm,background=box,frame=no]
 487       \startalignment[center]
 488         {\tfa Weak head normal form (\small{WHNF})}
 489       \stopalignment
 490       \blank[medium]
 491         An expression is in weak head normal form if it is either an
 492         constructor application or lambda abstraction. \cite[jones87]
 493
 494         Without going into detail about the differences with head
 495         normal form and normal form, note that evaluating the scrutinee
 496         of a case expression to normal form (evaluating any function
 497         applications, variable references and case expressions) is
 498         sufficient to decide which case alternatives should be chosen.
 499         \todo{ref?}
 500       \stopframedtext
 501
 502     }
 503
 504     \startdesc{Case expression}
 505       \defref{case expression}
 506       \startlambda
 507         case scrutinee of bndr
 508           DEFAULT -> defaultbody
 509           C0 bndr0,0 ... bndr0,m -> body0
 510           \vdots
 511           Cn bndrn,0 ... bndrn,m -> bodyn
 512       \stoplambda
 513
 514       A case expression is the only way in Core to choose between values. All
 515       \hs{if} expressions and pattern matchings from the original Haskell
 516       program have been translated to case expressions by the desugarer.
 517
 518       A case expression evaluates its scrutinee, which should have an
 519       algebraic datatype, into weak head normal form (\small{WHNF}) and
 520       (optionally) binds it to \lam{bndr}. If bndr is wild, \refdef{wild
 521       binders} it is left out.  Every alternative lists a single constructor
 522       (\lam{C0 ... Cn}). Based on the actual constructor of the scrutinee, the
 523       corresponding alternative is chosen. The binders in the chosen
 524       alternative (\lam{bndr0,0 ....  bndr0,m} are bound to the actual
 525       arguments to the constructor in the scrutinee.
 526
 527       This is best illustrated with an example. Assume
 528       there is an algebraic datatype declared as follows\footnote{This
 529       datatype is not supported by the current Cλash implementation, but
 530       serves well to illustrate the case expression}:
 531
 532       \starthaskell
 533       data D = A Word | B Bit
 534       \stophaskell
 535
 536       This is an algebraic datatype with two constructors, each getting
 537       a single argument. A case expression scrutinizing this datatype
 538       could look like the following:
 539
 540       \startlambda
 541         case s of
 542           A word -> High
 543           B bit -> bit
 544       \stoplambda
 545
 546       What this expression does is check the constructor of the
 547       scrutinee \lam{s}. If it is \lam{A}, it always evaluates to
 548       \lam{High}. If the constructor is \lam{B}, the binder \lam{bit} is
 549       bound to the argument passed to \lam{B} and the case expression
 550       evaluates to this bit.
 551
 552       If none of the alternatives match, the \lam{DEFAULT} alternative
 553       is chosen. A case expression must always be exhaustive, \ie\ it
 554       must cover all possible constructors that the scrutinee can have
 555       (if all of them are covered explicitly, the \lam{DEFAULT}
 556       alternative can be left out).
 557
 558       Since we can only match the top level constructor, there can be no overlap
 559       in the alternatives and thus order of alternatives is not relevant (though
 560       the \lam{DEFAULT} alternative must appear first for implementation
 561       efficiency).
 562
 563       To support strictness, the scrutinee is always evaluated into
 564       \small{WHNF}, even when there is only a \lam{DEFAULT} alternative. This
 565       allows application of the strict function \lam{f} to the argument \lam{a}
 566       to be written like:
 567
 568       \startlambda
 569       f (case a of arg DEFAULT -> arg)
 570       \stoplambda
 571
 572       According to the \GHC\ documentation, this is the only use for the extra
 573       binder to which the scrutinee is bound.  When not using strictness
 574       annotations (which is rather pointless in hardware descriptions),
 575       \small{GHC} seems to never generate any code making use of this binder.
 576       In fact, \GHC\ has never been observed to generate code using this
 577       binder, even when strictness was involved.  Nonetheless, the prototype
 578       handles this binder as expected.
 579
 580       Note that these case expressions are less powerful than the full Haskell
 581       case expressions. In particular, they do not support complex patterns like
 582       in Haskell. Only the constructor of an expression can be matched,
 583       complex patterns are implemented using multiple nested case expressions.
 584
 585       Case expressions are also used for unpacking of algebraic data-types, even
 586       when there is only a single constructor. For examples, to add the elements
 587       of a tuple, the following Core is generated:
 588
 589       \startlambda
 590       sum = λtuple.case tuple of
 591         (,) a b -> a + b
 592       \stoplambda
 593
 594       Here, there is only a single alternative (but no \lam{DEFAULT}
 595       alternative, since the single alternative is already exhaustive). When
 596       its body is evaluated, the arguments to the tuple constructor \lam{(,)}
 597       (\eg, the elements of the tuple) are bound to \lam{a} and \lam{b}.
 598     \stopdesc
 599
 600     \startdesc{Cast expression}
 601       \defref{cast expression}
 602       \startlambda
 603       body ▶ targettype
 604       \stoplambda
 605       A cast expression allows you to change the type of an expression to an
 606       equivalent type. Note that this is not meant to do any actual work, like
 607       conversion of data from one format to another, or force a complete type
 608       change. Instead, it is meant to change between different representations
 609       of the same type, \eg\ switch between types that are provably equal (but
 610       look different).
 611
 612       In our hardware descriptions, we typically see casts to change between a
 613       Haskell newtype and its contained type, since those are effectively
 614       different types (so a cast is needed) with the same representation (but
 615       no work is done by the cast).
 616
 617       More complex are types that are proven to be equal by the type-checker,
 618       but look different at first glance. To ensure that, once the type-checker
 619       has proven equality, this information sticks around, explicit casts are
 620       added. In our notation we only write the target type, but in reality a
 621       cast expressions carries around a \emph{coercion}, which can be seen as a
 622       proof of equality. \todo{Example}
 623
 624       The value of a cast is the value of its body, unchanged. The type of this
 625       value is equal to the target type, not the type of its body.
 626     \stopdesc
 627
 628     \startdesc{Note}
 629       The Core language in \small{GHC} allows adding \emph{notes}, which serve
 630       as hints to the inliner or add custom (string) annotations to a Core
 631       expression. These should not be generated normally, so these are not
 632       handled in any way in the prototype.
 633     \stopdesc
 634
 635     \startdesc{Type}
 636       \defref{type expression}
 637       \startlambda
 638       @T
 639       \stoplambda
 640       It is possibly to use a Core type as a Core expression. To prevent
 641       confusion between types and values, the \lam{@} sign is used to
 642       explicitly mark a type that is used in a Core expression.
 643
 644       For the actual types supported by Core, see
 645       \in{section}[sec:prototype:coretypes]. This \quote{lifting} of a
 646       type into the value domain is done to allow for type abstractions
 647       and applications to be handled as normal lambda abstractions and
 648       applications above. This means that a type expression in Core can
 649       only ever occur in the argument position of an application, and
 650       only if the type of the function that is applied to expects a type
 651       as the first argument. This happens in applications of all
 652       polymorphic functions. Consider the \lam{fst} function:
 653
 654       \startlambda
 655       fst :: \forall t1. \forall t2. (t1, t2) ->t1
 656       fst = λt1.λt2.λ(tup :: (t1, t2)). case tup of (,) a b -> a
 657
 658       fstint :: (Int, Int) -> Int
 659       fstint = λa.λb.fst @Int @Int a b
 660       \stoplambda
 661
 662       The type of \lam{fst} has two universally quantified type variables. When
 663       \lam{fst} is applied in \lam{fstint}, it is first applied to two types.
 664       (which are substituted for \lam{t1} and \lam{t2} in the type of \lam{fst}, so
 665       the actual type of arguments and result of \lam{fst} can be found:
 666       \lam{fst @Int @Int :: (Int, Int) -> Int}).
 667     \stopdesc
 668
 669     \subsection[sec:prototype:coretypes]{Core type system}
 670       Whereas the expression syntax of Core is very simple, its type system is
 671       a bit more complicated. It turns out it is harder to \quote{desugar}
 672       Haskell's complex type system into something more simple. Most of the
 673       type system is thus very similar to that of Haskell.
 674
 675       We will slightly limit our view on Core's type system, since the more
 676       complicated parts of it are only meant to support Haskell's (or rather,
 677       \GHC's) type extensions, such as existential types, \small{GADT}s, type
 678       families and other non-standard Haskell stuff which we do not (plan to)
 679       support.
 680
 681       \placeintermezzo{}{
 682         \defref{id function}
 683         \startframedtext[width=8cm,background=box,frame=no]
 684         \startalignment[center]
 685           {\tfa The \hs{id} function}
 686         \stopalignment
 687         \blank[medium]
 688           A function that is probably present in every functional language, is
 689           the \emph{identity} function. This is the function that takes a
 690           single argument and simply returns it unmodified. In Haskell this
 691           function is called \hs{id} and can take an argument of any type
 692           (\ie, it is polymorphic).
 693
 694           The \hs{id} function will be used in the examples every now and
 695           then.
 696         \stopframedtext
 697       }
 698       In Core, every expression is typed. The translation to Core happens
 699       after the type-checker, so types in Core are always correct as well
 700       (though you could of course construct invalidly typed expressions
 701       through the \GHC\ API).
 702
 703       Any type in Core is one of the following:
 704
 705       \startdesc{A type variable}
 706         \startlambda
 707         t
 708         \stoplambda
 709
 710         This is a reference to a type defined elsewhere. This can either be a
 711         polymorphic type (like the latter two \lam{t}'s in \lam{id :: \forall t.
 712         t -> t}), or a type constructor (like \lam{Bool} in \lam{not :: Bool ->
 713         Bool}). Like in Haskell, polymorphic type variables always
 714         start with a lowercase letter, while type constructors always start
 715         with an uppercase letter.
 716
 717         \todo{How to define (new) type constructors?}
 718
 719         A special case of a type constructor is the \emph{function type
 720         constructor}, \lam{->}. This is a type constructor taking two arguments
 721         (using application below). The function type constructor is commonly
 722         written inline, so we write \lam{a -> b} when we really mean \lam{-> a
 723         b}, the function type constructor applied to \lam{a} and \lam{b}.
 724
 725         Polymorphic type variables can only be defined by a lambda
 726         abstraction, see the forall type below.
 727       \stopdesc
 728
 729       \startdesc{A type application}
 730         \startlambda
 731           Maybe Int
 732         \stoplambda
 733
 734         This applies some type to another type. This is particularly used to
 735         apply type variables (type constructors) to their arguments.
 736
 737         As mentioned above, applications of some type constructors have
 738         special notation. In particular, these are applications of the
 739         \emph{function type constructor} and \emph{tuple type constructors}:
 740         \startlambda
 741           foo :: t1 -> t2
 742           foo' :: -> t1 t2
 743           bar :: (t1, t2, t3)
 744           bar' :: (,,) t1 t2 t3
 745         \stoplambda
 746       \stopdesc
 747
 748       \startdesc{The forall type}
 749         \startlambda
 750           id :: \forall t. t -> t
 751         \stoplambda
 752         The forall type introduces polymorphism. It is the only way to
 753         introduce new type variables, which are completely unconstrained (Any
 754         possible type can be assigned to it). Constraints can be added later
 755         using predicate types, see below.
 756
 757         A forall type is always (and only) introduced by a type lambda
 758         expression. For example, the Core translation of the
 759         id function is:
 760         \startlambda
 761           id = λt.λ(x :: t).x
 762         \stoplambda
 763
 764         Here, the type of the binder \lam{x} is \lam{t}, referring to the
 765         binder in the topmost lambda.
 766
 767         When using a value with a forall type, the actual type
 768         used must be applied first. For example Haskell expression \hs{id
 769         True} (the function \hs{id} applied to the data-constructor \hs{True})
 770         translates to the following Core:
 771
 772         \startlambda
 773         id @Bool True
 774         \stoplambda
 775
 776         Here, id is first applied to the type to work with. Note that the type
 777         then changes from \lam{id :: \forall t. t -> t} to \lam{id @Bool ::
 778         Bool -> Bool}. Note that the type variable \lam{a} has been
 779         substituted with the actual type.
 780
 781         In Haskell, forall types are usually not explicitly specified (The use
 782         of a lowercase type variable implicitly introduces a forall type for
 783         that variable). In fact, in standard Haskell there is no way to
 784         explicitly specify forall types. Through a language extension, the
 785         \hs{forall} keyword is available, but still optional for normal forall
 786         types (it is needed for \emph{existentially quantified types}, which
 787         Cλash does not support).
 788       \stopdesc
 789
 790       \startdesc{Predicate type}
 791         \startlambda
 792           show :: \forall t. Show t ⇒ t → String
 793         \stoplambda
 794
 795         \todo{Sidenote: type classes?}
 796
 797         A predicate type introduces a constraint on a type variable introduced
 798         by a forall type (or type lambda). In the example above, the type
 799         variable \lam{t} can only contain types that are an \emph{instance} of
 800         the \emph{type class} \lam{Show}.
 801
 802         There are other sorts of predicate types, used for the type families
 803         extension, which we will not discuss here.
 804
 805         A predicate type is introduced by a lambda abstraction. Unlike with
 806         the forall type, this is a value lambda abstraction, that must be
 807         applied to a value. We call this value a \emph{dictionary}.
 808
 809         Without going into the implementation details, a dictionary can be
 810         seen as a lookup table all the methods for a given (single) type class
 811         instance. This means that all the dictionaries for the same type class
 812         look the same (\eg\ contain methods with the same names). However,
 813         dictionaries for different instances of the same class contain
 814         different methods, of course.
 815
 816         A dictionary is introduced by \small{GHC} whenever it encounters an
 817         instance declaration. This dictionary, as well as the binder
 818         introduced by a lambda that introduces a dictionary, have the
 819         predicate type as their type. These binders are usually named starting
 820         with a \lam{\$}. Usually the name of the type concerned is not
 821         reflected in the name of the dictionary, but the name of the type
 822         class is. The Haskell expression \hs{show True} thus becomes:
 823
 824         \startlambda
 825         show @Bool \$dShow True
 826         \stoplambda
 827       \stopdesc
 828
 829       Using this set of types, all types in basic Haskell can be represented.
 830       \todo{Overview of polymorphism with more examples (or move examples
 831       here)}
 832
 833   \section[sec:prototype:statetype]{State annotations in Haskell}
 834     As noted in \in{section}[sec:description:stateann], Cλash needs some
 835     way to let the programmer explicitly specify which of a function's
 836     arguments and which part of a function's result represent the
 837     function's state.
 838
 839     Using the Haskell type systems, there are a few ways we can tackle this.
 840
 841     \subsection{Type synonyms}
 842       Haskell provides type synonyms as a way to declare a new type that is
 843       equal to an existing type (or rather, a new name for an existing type).
 844       This allows both the original type and the synonym to be used
 845       interchangeably in a Haskell program. This means no explicit conversion
 846       is needed. For example, a simple accumulator would become:
 847
 848       \starthaskell
 849       -- This type synonym would become part of Cλash, it is shown here
 850       -- just for clarity.
 851       type State s = s
 852
 853       acc :: Word -> State Word -> (State Word, Word)
 854       acc i s = let sum = s + i in (sum, sum)
 855       \stophaskell
 856
 857       This looks nice in Haskell, but turns out to be hard to implement. There
 858       is no explicit conversion in Haskell, but not in Core either. This
 859       means the type of a value might be shown as \hs{State Word} in
 860       some places, but \hs{Word} in others (and this can even change due
 861       to transformations). Since every binder has an explicit type
 862       associated with it, the type of every function type will be
 863       properly preserved and could be used to track down the
 864       statefulness of each value by the compiler. However, this would make
 865       the implementation a lot more complicated than when using type
 866       renamings as described in the next section.
 867
 868     % Use \type instead of \hs here, since the latter breaks inside
 869     % section headings.
 870     \subsection{Type renaming (\type{newtype})}
 871       Haskell also supports type renamings as a way to declare a new type that
 872       has the same (run-time) representation as an existing type (but is in
 873       fact a different type to the type-checker). With type renaming,
 874       explicit conversion between values of the two types is needed. The
 875       accumulator would then become:
 876
 877       \starthaskell
 878       -- This type renaming would become part of Cλash, it is shown here
 879       -- just for clarity.
 880       newtype State s = State s
 881
 882       acc :: Word -> State Word -> (State Word, Word)
 883       acc i (State s) = let sum = s + i in (State sum, sum)
 884       \stophaskell
 885
 886       The \hs{newtype} line declares a new type \hs{State} that has one type
 887       argument, \hs{s}. This type contains one \quote{constructor} \hs{State}
 888       with a single argument of type \hs{s}. It is customary to name the
 889       constructor the same as the type, which is allowed (since types can
 890       never cause name collisions with values). The difference with the type
 891       synonym example is in the explicit conversion between the \hs{State
 892       Word} and \hs{Word} types by pattern matching and by using the explicit
 893       the \hs{State} constructor.
 894
 895       This explicit conversion makes the \VHDL\ generation easier: whenever we
 896       remove (unpack) the \hs{State} type, this means we are accessing the
 897       current state (\ie, accessing the register output). Whenever we are
 898       adding (packing) the \hs{State} type, we are producing a new value for
 899       the state (\ie, providing the register input).
 900
 901       When dealing with nested states (a stateful function that calls stateful
 902       functions, which might call stateful functions, etc.) the state type
 903       could quickly grow complex because of all the \hs{State} type constructors
 904       needed. For example, consider the following state type (this is just the
 905       state type, not the entire function type):
 906
 907       \starthaskell
 908       State (State Bit, State (State Word, Bit), Word)
 909       \stophaskell
 910
 911       We cannot leave all these \hs{State} type constructors out, since that
 912       would change the type (unlike when using type synonyms). However, when
 913       using type synonyms to hide away sub-states (see
 914       \in{section}[sec:prototype:substatesynonyms] below), this
 915       disadvantage should be limited.
 916
 917       \subsubsection{Different input and output types}
 918         An alternative could be to use different types for input and output
 919         state (\ie\ current and updated state). The accumulator example would
 920         then become something like:
 921
 922         \starthaskell
 923         -- These type renamings would become part of Cλash, it is shown
 924         -- here just for clarity.
 925         newtype StateIn s = StateIn s
 926         newtype StateOut s = StateOut s
 927
 928         acc :: Word -> StateIn Word -> (StateIn Word, Word)
 929         acc i (StateIn s) = let sum = s + i in (StateIn sum, sum)
 930         \stophaskell
 931
 932         This could make the implementation easier and the hardware
 933         descriptions less error-prone (you can no longer \quote{forget} to
 934         unpack and repack a state variable and just return it directly, which
 935         can be a problem in the current prototype). However, it also means we
 936         need twice as many type synonyms to hide away sub-states, making this
 937         approach a bit cumbersome. It also makes it harder to compare input
 938         and output state types, possible reducing the type-safety of the
 939         descriptions.
 940
 941     \subsection[sec:prototype:substatesynonyms]{Type synonyms for sub-states}
 942       As noted above, when using nested (hierarchical) states, the state types
 943       of the \quote{upper} functions (those that call other functions, which
 944       call other functions, etc.) quickly become complicated. Also, when the
 945       state type of one of the \quote{lower} functions changes, the state
 946       types of all the upper functions changes as well. If the state type for
 947       each function is explicitly and completely specified, this means that a
 948       lot of code needs updating whenever a state type changes.
 949
 950       To prevent this, it is recommended (but not enforced) to use a type
 951       synonym for the state type of every function. Every function calling
 952       other functions will then use the state type synonym of the called
 953       functions in its own type, requiring no code changes when the state type
 954       of a called function changes. This approach is used in
 955       \in{example}[ex:AvgState] below. The \hs{AccState} and \hs{AvgState}
 956       are examples of such state type synonyms.
 957
 958     \subsection{Chosen approach}
 959       To keep implementation simple, the current prototype uses the type
 960       renaming approach, with a single type for both input and output
 961       states. In the future, it might be worthwhile to revisit this
 962       approach if more complicated flow analysis is implemented for
 963       state variables. This analysis is needed to add proper error
 964       checking anyway and might allow the use of type synonyms without
 965       losing any expressivity.
 966
 967       \subsubsection{Example}
 968         As an example of the used approach, a simple averaging circuit
 969         is shown in \in{example}[ex:AvgState]. This circuit lets the
 970         accumulation of the inputs be done by a sub-component, \hs{acc},
 971         but keeps a count of value accumulated in its own
 972         state.\footnote{Currently, the prototype is not able to compile
 973         this example, since there is no built-in function for division.}
 974
 975         \startbuffer[AvgState]
 976           -- This type renaming would become part of Cλash, it is shown
 977           -- here just for clarity
 978           newtype State s = State s
 979
 980           -- The accumulator state type
 981           type AccState = State Word
 982           -- The accumulator
 983           acc :: Word -> AccState -> (AccState, Word)
 984           acc i (State s) = let sum = s + i in (State sum, sum)
 985
 986           -- The averaging circuit state type
 987           type AvgState = State (AccState, Word)
 988           -- The averaging circuit
 989           avg :: Word -> AvgState -> (AvgState, Word)
 990           avg i (State s) = (State s', o)
 991             where
 992               (accs, count) = s
 993               -- Pass our input through the accumulator, which outputs a sum
 994               (accs', sum) = acc i accs
 995               -- Increment the count (which will be our new state)
 996               count' = count + 1
 997               -- Compute the average
 998               o = sum / count'
 999               s' = (accs', count')
1000         \stopbuffer
1001
1002         \placeexample[here][ex:AvgState]{Simple stateful averaging circuit.}
1003           %\startcombination[2*1]
1004             {\typebufferhs{AvgState}}%{Haskell description using function applications.}
1005           %  {\boxedgraphic{AvgState}}{The architecture described by the Haskell description.}
1006           %\stopcombination
1007         \todo{Picture}
1008
1009   \section{\VHDL\ generation for state}
1010     Now its clear how to put state annotations in the Haskell source,
1011     there is the question of how to implement this state translation. As
1012     we have seen in \in{section}[sec:prototype:design], the translation to
1013     \VHDL\ happens as a simple, final step in the compilation process.
1014     This step works on a Core expression in normal form. The specifics
1015     of normal form will be explained in
1016     \in{chapter}[chap:normalization], but the examples given should be
1017     easy to understand using the definition of Core given above. The
1018     conversion to and from the \hs{State} type is done using the cast
1019     operator, \lam{▶}.
1020
1021         \startbuffer[AvgStateNormal]
1022           acc = λi.λspacked.
1023             let
1024               -- Remove the State newtype
1025               s = spacked ▶ Word
1026               sum = s + i
1027               -- Add the State newtype again
1028               spacked' = sum ▶ State Word
1029               res = (spacked', sum)
1030             in
1031               res
1032
1033           avg = λi.λspacked.
1034             let
1035               s = spacked ▶ (AccState, Word)
1036               accs = case s of (a, b) -> a
1037               count = case s of (c, d) -> d
1038               accres = acc i accs
1039               accs' = case accres of (e, f) -> e
1040               sum = case accres of (g, h) -> h
1041               count' = count + 1
1042               o = sum / count'
1043               s' = (accs', count')
1044               spacked' = s' ▶ State (AccState, Word)
1045               res = (spacked', o)
1046             in
1047               res
1048         \stopbuffer
1049
1050         \placeexample[here][ex:AvgStateNormal]{Normalized version of \in{example}[ex:AvgState]}
1051             {\typebufferlam{AvgStateNormal}}
1052
1053     \subsection[sec:prototype:statelimits]{State in normal form}
1054       Before describing how to translate state from normal form to
1055       \VHDL, we will first see how state handling looks in normal form.
1056       How must their use be limited to guarantee that proper \VHDL\ can
1057       be generated?
1058
1059       We will formulate a number of rules about what operations are
1060       allowed with state variables. These rules apply to the normalized Core
1061       representation, but will in practice apply to the original Haskell
1062       hardware description as well. Ideally, these rules would become part
1063       of the intended normal form definition \refdef{intended normal form
1064       definition}, but this is not the case right now. This can cause some
1065       problems, which are detailed in
1066       \in{section}[sec:normalization:stateproblems].
1067
1068       In these rules we use the terms \emph{state variable} to refer to any
1069       variable that has a \lam{State} type. A \emph{state-containing
1070       variable} is any variable whose type contains a \lam{State} type,
1071       but is not one itself (like \lam{(AccState, Word)} in the example,
1072       which is a tuple type, but contains \lam{AccState}, which is again
1073       equal to \lam{State Word}).
1074
1075       We also use a distinction between \emph{input} and \emph{output
1076       (state) variables} and \emph{sub-state variables}, which will be
1077       defined in the rules themselves.
1078
1079       These rules describe everything that can be done with state
1080       variables and state-containing variables. Everything else is
1081       invalid. For every rule, the corresponding part of
1082       \in{example}[ex:AvgStateNormal] is shown.
1083
1084       \startdesc{State variables can appear as an argument.}
1085         \startlambda
1086           avg = λi.λspacked. ...
1087         \stoplambda
1088
1089         Any lambda that binds a variable with a state type, creates a new
1090         input state variable.
1091       \stopdesc
1092
1093       \startdesc{Input state variables can be unpacked.}
1094         \startlambda
1095           s = spacked ▶ (AccState, Word)
1096         \stoplambda
1097
1098         An input state variable may be unpacked using a cast operation. This
1099         removes the \lam{State} type renaming and the result has no longer a
1100         \lam{State} type.
1101
1102         If the result of this unpacking does not have a state type and does
1103         not contain state variables, there are no limitations on its
1104         use (this is the function's own state).  Otherwise if it does
1105         not have a state type but does contain sub-states, we refer to it
1106         as a \emph{state-containing input variable} and the limitations
1107         below apply. If it has a state type itself, we refer to it as an
1108         \emph{input sub-state variable} and the below limitations apply
1109         as well.
1110
1111         It may seem strange to consider a variable that still has a state
1112         type directly after unpacking, but consider the case where a
1113         function does not have any state of its own, but does call a single
1114         stateful function. This means it must have a state argument that
1115         contains just a sub-state. The function signature of such a function
1116         could look like:
1117
1118         \starthaskell
1119           type FooState = State AccState
1120         \stophaskell
1121
1122         Which is of course equivalent to \lam{State (State Word)}.
1123       \stopdesc
1124
1125       \startdesc{Variables can be extracted from state-containing input variables.}
1126         \startlambda
1127           accs = case s of (a, b) -> a
1128         \stoplambda
1129
1130         A state-containing input variable is typically a tuple containing
1131         multiple elements (like the current function's state, sub-states or
1132         more tuples containing sub-states). All of these can be extracted
1133         from an input variable using an extractor case (or possibly
1134         multiple, when the input variable is nested).
1135
1136         If the result has no state type and does not contain any state
1137         variables either, there are no further limitations on its use
1138         (this is the function's own state). If the result has no state
1139         type but does contain state variables we refer to it as a
1140         \emph{state-containing input variable} and this limitation keeps
1141         applying. If the variable has a state type itself, we refer to
1142         it as an \emph{input sub-state variable} and below limitations
1143         apply.
1144
1145       \startdesc{Input sub-state variables can be passed to functions.}
1146         \startlambda
1147           accres = acc i accs
1148           accs' = case accres of (e, f) -> e
1149         \stoplambda
1150
1151         An input sub-state variable can (only) be passed to a function.
1152         Additionally, every input sub-state variable must be used in exactly
1153         \emph{one} application, no more and no less.
1154
1155         The function result should contain exactly one state variable, which
1156         can be extracted using (multiple) case expressions. The extracted
1157         state variable is referred to the \emph{output sub-state}
1158
1159         The type of this output sub-state must be identical to the type of
1160         the input sub-state passed to the function.
1161       \stopdesc
1162
1163       \startdesc{Variables can be inserted into a state-containing output variable.}
1164         \startlambda
1165           s' = (accs', count')
1166         \stoplambda
1167
1168         A function's output state is usually a tuple containing its own
1169         updated state variables and all output sub-states. This result is
1170         built up using any single-constructor algebraic datatype
1171         (possibly nested).
1172
1173         The result of these expressions is referred to as a
1174         \emph{state-containing output variable}, which are subject to these
1175         limitations.
1176       \stopdesc
1177
1178       \startdesc{State containing output variables can be packed.}
1179         \startlambda
1180           spacked' = s' ▶ State (AccState, Word)
1181         \stoplambda
1182
1183         As soon as all a functions own update state and output sub-state
1184         variables have been joined together, the resulting
1185         state-containing output variable can be packed into an output
1186         state variable. Packing is done by casting to a state type.
1187       \stopdesc
1188
1189       \startdesc{Output state variables can appear as (part of) a function result.}
1190         \startlambda
1191           avg = λi.λspacked.
1192             let
1193             \vdots
1194             res = (spacked', o)
1195           in
1196             res
1197         \stoplambda
1198         When the output state is packed, it can be returned as a part
1199         of the function result. Nothing else can be done with this
1200         value (or any value that contains it).
1201       \stopdesc
1202
1203       There is one final limitation that is hard to express in the above
1204       itemization. Whenever sub-states are extracted from the input state
1205       to be passed to functions, the corresponding output sub-states
1206       should be inserted into the output state in the same way. In other
1207       words, each pair of corresponding sub-states in the input and
1208       output states should be passed to / returned from the same called
1209       function.
1210
1211       The prototype currently does not check much of the above
1212       conditions. This means that if the conditions are violated,
1213       sometimes a compile error is generated, but in other cases output
1214       can be generated that is not valid \VHDL\ or at the very least does
1215       not correspond to the input.
1216
1217     \subsection{Translating to \VHDL}
1218       As noted above, the basic approach when generating \VHDL\ for stateful
1219       functions is to generate a single register for every stateful function.
1220       We look around the normal form to find the let binding that removes the
1221       \lam{State} type renaming (using a cast). We also find the let binding that
1222       adds a \lam{State} type. These are connected to the output and the input
1223       of the generated let binding respectively. This means that there can
1224       only be one let binding that adds and one that removes the \lam{State}
1225       type. It is easy to violate this constraint. This problem is detailed in
1226       \in{section}[sec:normalization:stateproblems].
1227
1228       This approach seems simple enough, but will this also work for more
1229       complex stateful functions involving sub-states?  Observe that any
1230       component of a function's state that is a sub-state, \ie\ passed on as
1231       the state of another function, should have no influence on the
1232       hardware generated for the calling function. Any state-specific
1233       \small{VHDL} for this component can be generated entirely within the
1234       called function.  So, we can completely ignore sub-states when
1235       generating \VHDL\ for a function.
1236
1237       From this observation it might seem logical to remove the
1238       sub-states from a function's states altogether and leave only the
1239       state components which are actual states of the current function.
1240       While doing this would not remove any information needed to
1241       generate \small{VHDL} from the function, it would cause the
1242       function definition to become invalid (since we will not have any
1243       sub-state to pass to the functions anymore).  We could solve the
1244       syntactic problems by passing \type{undefined} for state
1245       variables, but that would still break the code on the semantic
1246       level (\ie, the function would no longer be semantically
1247       equivalent to the original input).
1248
1249       To keep the function definition correct until the very end of the
1250       process, we will not deal with (sub)states until we get to the
1251       \small{VHDL} generation.  Then, we are translating from Core to
1252       \small{VHDL}, and we can simply generate no \VHDL for sub-states,
1253       effectively removing them altogether.
1254
1255       But, how will we know what exactly is a sub-state? Since any state
1256       argument or return value that represents state must be of the
1257       \type{State} type, we can look at the type of a value. However, we
1258       must be careful to ignore only \emph{sub-states}, and not a
1259       function's own state.
1260
1261       For \in{example}[ex:AvgStateNormal] above, we should generate a register
1262       with its output connected to \lam{s} and its input connected
1263       to \lam{s'}. However, \lam{s'} is build up from both \lam{accs'} and
1264       \lam{count'}, while only \lam{count'} should end up in the register.
1265       \lam{accs'} is a sub-state for the \lam{acc} function, for which a
1266       register will be created when generating \VHDL\ for the \lam{acc}
1267       function.
1268
1269       Fortunately, the \lam{accs'} variable (and any other sub-state) has a
1270       property that we can easily check: it has a \lam{State} type. This
1271       means that whenever \VHDL\ is generated for a tuple (or other
1272       algebraic type), we can simply leave out all elements that have a
1273       \lam{State} type. This will leave just the parts of the state that
1274       do not have a \lam{State} type themselves, like \lam{count'},
1275       which is exactly a function's own state. This approach also means
1276       that the state part of the result (\eg\ \lam{s'} in \lam{res}) is
1277       automatically excluded when generating the output port, which is
1278       also required.
1279
1280       We can formalize this translation a bit, using the following
1281       rules.
1282
1283       \startitemize
1284         \item A state unpack operation should not generate any \small{VHDL}.
1285         The binder to which the unpacked state is bound should still be
1286         declared, this signal will become the register and will hold the
1287         current state.
1288         \item A state pack operation should not generate any \small{VHDL}.
1289         The binder to which the packed state is bound should not be
1290         declared. The binder that is packed is the signal that will hold the
1291         new state.
1292         \item Any values of a State type should not be translated to
1293         \small{VHDL}. In particular, State elements should be removed from
1294         tuples (and other data-types) and arguments with a state type should
1295         not generate ports.
1296         \item To make the state actually work, a simple \small{VHDL}
1297         (sequential) process should be generated. This process updates
1298         the state at every clock cycle, by assigning the new state to the
1299         current state. This will be recognized by synthesis tools as a
1300         register specification.
1301       \stopitemize
1302
1303       When applying these rules to the function \lam{avg} from
1304       \in{example}[ex:AvgStateNormal], we be left with the description
1305       in \in{example}[ex:AvgStateRemoved]. All the parts that do not
1306       generate any \VHDL\ directly are crossed out, leaving just the
1307       actual flow of values in the final hardware. To illustrate the
1308       change of the types of \lam{s} and \lam{s'}, their types are also
1309       shown.
1310
1311       \startbuffer[AvgStateRemoved]
1312         avg = iλ.λ--spacked.--
1313           let
1314             s :: (--AccState,-- Word)
1315             s = --spacked ▶ (AccState, Word)--
1316             --accs = case s of (a, b) -> a--
1317             count = case s of (--c,-- d) -> d
1318             accres = acc i --accs--
1319             --accs' = case accres of (e, f) -> e--
1320             sum = case accres of (--g,-- h) -> h
1321             count' = count + 1
1322             o = sum / count'
1323             s' :: (--AccState,-- Word)
1324             s' = (--accs',-- count')
1325             --spacked' = s' ▶ State (AccState, Word)--
1326             res = (--spacked',-- o)
1327           in
1328             res
1329       \stopbuffer
1330       \placeexample[here][ex:AvgStateRemoved]{Normalized version of \in{example}[ex:AvgState] with ignored parts crossed out}
1331           {\typebufferlam{AvgStateRemoved}}
1332
1333       When we actually leave out the crossed out parts, we get a slightly
1334       weird program: there is a variable \lam{s} which has no value, and there
1335       is a variable \lam{s'} that is never used. But together, these two will form
1336       the state process of the function. \lam{s} contains the "current" state,
1337       \lam{s'} is assigned the "next" state. So, at the end of each clock
1338       cycle, \lam{s'} should be assigned to \lam{s}.
1339
1340       As an illustration of the result of this function,
1341       \in{example}[ex:AccStateVHDL] and \in{example}[ex:AvgStateVHDL] show the the \VHDL\ that is
1342       generated by Cλash from the examples is this section.
1343
1344       \startbuffer[AvgStateVHDL]
1345         entity avgComponent_0 is
1346              port (\izAlE2\ : in \unsigned_31\;
1347                    \foozAo1zAo12\ : out \(,)unsigned_31\;
1348                    clock : in std_logic;
1349                    resetn : in std_logic);
1350         end entity avgComponent_0;
1351
1352
1353         architecture structural of avgComponent_0 is
1354              signal \szAlG2\ : \(,)unsigned_31\;
1355              signal \countzAlW2\ : \unsigned_31\;
1356              signal \dszAm62\ : \(,)unsigned_31\;
1357              signal \sumzAmk3\ : \unsigned_31\;
1358              signal \reszAnCzAnM2\ : \unsigned_31\;
1359              signal \foozAnZzAnZ2\ : \unsigned_31\;
1360              signal \reszAnfzAnj3\ : \unsigned_31\;
1361              signal \s'zAmC2\ : \(,)unsigned_31\;
1362         begin
1363              \countzAlW2\ <= \szAlG2\.A;
1364
1365              \comp_ins_dszAm62\ : entity accComponent_1
1366                                        port map (\izAob3\ => \izAlE2\,
1367                                                  \foozAoBzAoB2\ => \dszAm62\,
1368                                                  clock => clock,
1369                                                  resetn => resetn);
1370
1371              \sumzAmk3\ <= \dszAm62\.A;
1372
1373              \reszAnCzAnM2\ <= to_unsigned(1, 32);
1374
1375              \foozAnZzAnZ2\ <= \countzAlW2\ + \reszAnCzAnM2\;
1376
1377              \reszAnfzAnj3\ <= \sumzAmk3\ * \foozAnZzAnZ2\;
1378
1379              \s'zAmC2\.A <= \foozAnZzAnZ2\;
1380
1381              \foozAo1zAo12\.A <= \reszAnfzAnj3\;
1382
1383              state : process (clock, resetn)
1384              begin
1385                   if resetn = '0' then
1386                   elseif rising_edge(clock) then
1387                        \szAlG2\ <= \s'zAmC2\;
1388                   end if;
1389              end process state;
1390         end architecture structural;
1391       \stopbuffer
1392
1393       \startbuffer[AvgStateTypes]
1394         package types is
1395              subtype \unsigned_31\ is unsigned (0 to 31);
1396
1397              type \(,)unsigned_31\ is record
1398                    A : \unsigned_31\;
1399              end record;
1400         end package types;
1401       \stopbuffer
1402
1403       \startbuffer[AccStateVHDL]
1404         entity accComponent_1 is
1405              port (\izAob3\ : in \unsigned_31\;
1406                    \foozAoBzAoB2\ : out \(,)unsigned_31\;
1407                    clock : in std_logic;
1408                    resetn : in std_logic);
1409         end entity accComponent_1;
1410
1411         architecture structural of accComponent_1 is
1412              signal \szAod3\ : \unsigned_31\;
1413              signal \reszAonzAor3\ : \unsigned_31\;
1414         begin
1415              \reszAonzAor3\ <= \szAod3\ + \izAob3\;
1416
1417              \foozAoBzAoB2\.A <= \reszAonzAor3\;
1418
1419              state : process (clock, resetn)
1420              begin
1421                   if resetn = '0' then
1422                   elseif rising_edge(clock) then
1423                        \szAod3\ <= \reszAonzAor3\;
1424                   end if;
1425              end process state;
1426         end architecture structural;
1427       \stopbuffer
1428
1429       \placeexample[][ex:AvgStateTypes]{\VHDL\ types generated for \hs{acc} and \hs{avg} from \in{example}[ex:AvgState]}
1430           {\typebuffervhdl{AvgStateTypes}}
1431       \placeexample[][ex:AccStateVHDL]{\VHDL\ generated for \hs{acc} from \in{example}[ex:AvgState]}
1432           {\typebuffervhdl{AccStateVHDL}}
1433       \placeexample[][ex:AvgStateVHDL]{\VHDL\ generated for \hs{avg} from \in{example}[ex:AvgState]}
1434           {\typebuffervhdl{AvgStateVHDL}}
1435   \section{Prototype implementation}
1436     The prototype has been implemented using Haskell as its
1437     implementation language, just like \GHC. This allows the prototype
1438     do directly use parts of \GHC\ through the \small{API} it exposes
1439     (which essentially taps directly into the internals of \GHC, making
1440     this \small{API} not really a stable interface).
1441
1442     Cλash can be run from a separate library, but has also been
1443     integrated into \type{ghci} \cite[baaij09]. The latter does requires
1444     a custom \GHC\ build, however.
1445
1446     The latest version and all history of the Cλash code can be browsed
1447     on-line or retrieved using the \type{git} program.
1448
1449     http://git.stderr.nl/gitweb?p=matthijs/projects/cλash.git
1450
1451 %    \subsection{Initial state}
1452 %      How to specify the initial state? Cannot be done inside a hardware
1453 %      function, since the initial state is its own state argument for the first
1454 %      call (unless you add an explicit, synchronous reset port).
1455 %
1456 %      External init state is natural for simulation.
1457 %
1458 %      External init state works for hardware generation as well.
1459 %
1460 %      Implementation issues: state splitting, linking input to output state,
1461 %      checking usage constraints on state variables.
1462 %
1463 %
1464 % vim: set sw=2 sts=2 expandtab: