Chapters/Prototype.tex

   1 \chapter[chap:prototype]{Prototype}
   2   An important step in this research is the creation of a prototype compiler.
   3   Having this prototype allows us to apply the ideas from the previous chapter
   4   to actual hardware descriptions and evaluate their usefulness. Having a
   5   prototype also helps to find new techniques and test possible
   6   interpretations.
   7
   8   Obviously the prototype was not created after all research
   9   ideas were formed, but its implementation has been interleaved with the
  10   research itself. Also, the prototype described here is the final version; it
  11   has gone through a number of design iterations which we will not completely
  12   describe here.
  13
  14   \section[sec:prototype:input]{Input language}
  15     When implementing this prototype, the first question to ask is: What
  16     (functional) language will we use to describe our hardware? (Note that
  17     this does not concern the \emph{implementation language} of the compiler,
  18     just the language \emph{translated by} the compiler).
  19
  20     On the highest level, we have two choices:
  21
  22     \startitemize
  23       \item Create a new functional language from scratch. This has the
  24       advantage of having a language that contains exactly those elements that
  25       are convenient for describing hardware and can contain special
  26       constructs for that purpose.
  27       \item Use an existing language and create a new backend for it. This has
  28       the advantage that existing tools can be reused, which will speed up
  29       development.
  30     \stopitemize
  31
  32     Considering that we needed a working prototype quickly,
  33     and that implementing parsers, semantic checkers and especially
  34     typecheckers is not the core of this research (but it is lots and
  35     lots of work!), using an existing language is the obvious choice. This
  36     also has the advantage that a large set of language features is available
  37     to experiment with and it is easy to find which features apply well and
  38     which don't. A possible second prototype could use a custom language with
  39     just the useful features (and possibly extra features that are specific to
  40     the domain of hardware description as well).
  41
  42     The second choice is to pick one of the many existing languages. As
  43     mentioned before, this language is Haskell.  This choice has not been the
  44     result of a thorough comparison of languages, for the simple reason that
  45     the requirements on the language were completely unclear at the start of
  46     this research. The fact that Haskell is a language with a broad spectrum
  47     of features, that it is commonly used in research projects and that the
  48     primary compiler, GHC, provides a high level API to its internals, made
  49     Haskell an obvious choice.
  50
  51     TODO: Was Haskell really a good choice? Perhaps say this somewhere else?
  52
  53   \section[sec:prototype:output]{Output format}
  54     The second important question is: What will be our output format? Since
  55     our prototype won't be able to program \small{FPGA}s directly, we'll have to
  56     output our hardware in some format that can be later processed and
  57     programmed by other tools.
  58
  59     Looking at other tools in the industry, the Electronic Design Interchange
  60     Format (\small{EDIF}) is commonly used for storing intermediate
  61     \emph{netlists} (lists of components and connections between these
  62     components) and is commonly the target for \small{VHDL} and Verilog
  63     compilers.
  64
  65     However, \small{EDIF} is not completely tool-independent. It specifies a
  66     meta-format, but the hardware components that can be used vary between
  67     various tool and hardware vendors, as well as the interpretation of the
  68     \small{EDIF} standard (TODO Is this still true? Reference:
  69     http://delivery.acm.org/10.1145/80000/74534/p803-li.pdf?key1=74534\&key2=8370537521\&coll=GUIDE\&dl=GUIDE\&CFID=61207158\&CFTOKEN=61908473).
  70
  71     This means that when working with EDIF, our prototype would become
  72     technology dependent (\eg only work with \small{FPGA}s of a specific
  73     vendor, or even only with specific chips). This limits the applicability
  74     of our prototype. Also, the tools we'd like to use for verifying,
  75     simulating and drawing pretty pictures of our output (like Precision, or
  76     QuestaSim) work on \small{VHDL} or Verilog input (TODO: Is this really
  77     true?).
  78
  79     For these reasons, we will use \small{VHDL} as our output language.
  80     Verilog is not used simply because we are familiar with \small{VHDL}
  81     already. The differences between \small{VHDL} and Verilog are on the
  82     higher level, while we will be using \small{VHDL} mainly to write low
  83     level, netlist-like descriptions anyway.
  84
  85     An added advantage of using VHDL is that we can profit from existing
  86     optimizations in VHDL synthesizers. A lot of optimizations are done on the
  87     VHDL level by existing tools. These tools have years of experience in this
  88     field, so it would not be reasonable to assume we could achieve a similar
  89     amount of optimization in our prototype (nor should it be a goal,
  90     considering this is just a prototype).
  91
  92     Note that we will be using \small{VHDL} as our output language, but will
  93     not use its full expressive power. Our output will be limited to using
  94     simple, structural descriptions, without any behavioural descriptions
  95     (which might not be supported by all tools).
  96
  97   \section{Prototype design}
  98     As stated above, we will use the Glasgow Haskell Compiler (\small{GHC}) to
  99     implement our prototype compiler. To understand the design of the
 100     compiler, we will first dive into the \small{GHC} compiler a bit. Its
 101     compilation consists of the following steps (slightly simplified):
 102
 103     \startuseMPgraphic{ghc-pipeline}
 104       % Create objects
 105       save inp, front, desugar, simpl, back, out;
 106       newEmptyBox.inp(0,0);
 107       newBox.front(btex Parser etex);
 108       newBox.desugar(btex Desugarer etex);
 109       newBox.simpl(btex Simplifier etex);
 110       newBox.back(btex Backend etex);
 111       newEmptyBox.out(0,0);
 112
 113       % Space the boxes evenly
 114       inp.c - front.c = front.c - desugar.c = desugar.c - simpl.c
 115         = simpl.c - back.c = back.c - out.c = (0, 1.5cm);
 116       out.c = origin;
 117
 118       % Draw lines between the boxes. We make these lines "deferred" and give
 119       % them a name, so we can use ObjLabel to draw a label beside them.
 120       ncline.inp(inp)(front) "name(haskell)";
 121       ncline.front(front)(desugar) "name(ast)";
 122       ncline.desugar(desugar)(simpl) "name(core)";
 123       ncline.simpl(simpl)(back) "name(simplcore)";
 124       ncline.back(back)(out) "name(native)";
 125       ObjLabel.inp(btex Haskell source etex) "labpathname(haskell)", "labdir(rt)";
 126       ObjLabel.front(btex Haskell AST etex) "labpathname(ast)", "labdir(rt)";
 127       ObjLabel.desugar(btex Core etex) "labpathname(core)", "labdir(rt)";
 128       ObjLabel.simpl(btex Simplified core etex) "labpathname(simplcore)", "labdir(rt)";
 129       ObjLabel.back(btex Native code etex) "labpathname(native)", "labdir(rt)";
 130
 131       % Draw the objects (and deferred labels)
 132       drawObj (inp, front, desugar, simpl, back, out);
 133     \stopuseMPgraphic
 134     \placefigure[right]{GHC compiler pipeline}{\useMPgraphic{ghc-pipeline}}
 135
 136     \startdesc{Frontend}
 137       This step takes the Haskell source files and parses them into an
 138       abstract syntax tree (\small{AST}). This \small{AST} can express the
 139       complete Haskell language and is thus a very complex one (in contrast
 140       with the Core \small{AST}, later on). All identifiers in this
 141       \small{AST} are resolved by the renamer and all types are checked by the
 142       typechecker.
 143     \stopdesc
 144     \startdesc{Desugaring}
 145       This step takes the full \small{AST} and translates it to the
 146       \emph{Core} language. Core is a very small functional language with lazy
 147       semantics, that can still express everything Haskell can express. Its
 148       simplicity makes Core very suitable for further simplification and
 149       translation. Core is the language we will be working on as well.
 150     \stopdesc
 151     \startdesc{Simplification}
 152       Through a number of simplification steps (such as inlining, common
 153       subexpression elimination, etc.) the Core program is simplified to make
 154       it faster or easier to process further.
 155     \stopdesc
 156     \startdesc{Backend}
 157       This step takes the simplified Core program and generates an actual
 158       runnable program for it. This is a big and complicated step; we will not
 159       discuss it any further, since it is not required for our prototype.
 160     \stopdesc
 161
 162     In this process, there are a number of places where we can start our work.
 163     Assuming that we don't want to deal with (or modify) parsing, typechecking
 164     and other frontend business and that native code isn't really a useful
 165     format anymore, we are left with the choice between the full Haskell
 166     \small{AST}, or the smaller (simplified) core representation.
 167
 168     The advantage of taking the full \small{AST} is that the exact structure
 169     of the source program is preserved. We can see exactly what the hardware
 170     description looks like and which syntax constructs were used. However,
 171     the full \small{AST} is a very complicated datastructure. If we are to
 172     handle everything it offers, we will quickly get a big compiler.
 173
 174     Using the core representation gives us a much more compact datastructure
 175     (a core expression only uses 9 constructors). Note that this does not mean
 176     that the core representation of a program is smaller; on the contrary. Since the
 177     core language has fewer constructs, a lot of things will take a larger
 178     expression to express.
 179
 180     However, the fact that the core language is so much smaller means it is a
 181     lot easier to analyze and translate it into something else. For the same
 182     reason, \small{GHC} runs its simplifications and optimizations on the core
 183     representation as well.
 184
 185     However, we will use the normal core representation, not the simplified
 186     core. Reasons for this are detailed below.
 187
 188     The final prototype roughly consists of three steps:
 189
 190     \startuseMPgraphic{ghc-pipeline}
 191       % Create objects
 192       save inp, front, norm, vhdl, out;
 193       newEmptyBox.inp(0,0);
 194       newBox.front(btex \small{GHC} frontend + desugarer etex);
 195       newBox.norm(btex Normalization etex);
 196       newBox.vhdl(btex \small{VHDL} generation etex);
 197       newEmptyBox.out(0,0);
 198
 199       % Space the boxes evenly
 200       inp.c - front.c = front.c - norm.c = norm.c - vhdl.c
 201         = vhdl.c - out.c = (0, 1.5cm);
 202       out.c = origin;
 203
 204       % Draw lines between the boxes. We make these lines "deferred" and give
 205       % them a name, so we can use ObjLabel to draw a label beside them.
 206       ncline.inp(inp)(front) "name(haskell)";
 207       ncline.front(front)(norm) "name(core)";
 208       ncline.norm(norm)(vhdl) "name(normal)";
 209       ncline.vhdl(vhdl)(out) "name(vhdl)";
 210       ObjLabel.inp(btex Haskell source etex) "labpathname(haskell)", "labdir(rt)";
 211       ObjLabel.front(btex Core etex) "labpathname(core)", "labdir(rt)";
 212       ObjLabel.norm(btex Normalized core etex) "labpathname(normal)", "labdir(rt)";
 213       ObjLabel.vhdl(btex \small{VHDL} description etex) "labpathname(vhdl)", "labdir(rt)";
 214
 215       % Draw the objects (and deferred labels)
 216       drawObj (inp, front, norm, vhdl, out);
 217     \stopuseMPgraphic
 218     \placefigure[right]{Prototype compiler pipeline}{\useMPgraphic{ghc-pipeline}}
 219
 220     \startdesc{Frontend}
 221       This is exactly the frontend and desugarer from the \small{GHC}
 222       pipeline, that translates Haskell sources to a core representation.
 223     \stopdesc
 224     \startdesc{Normalization}
 225       This is a step that transforms the core representation into a normal
 226       form. This normal form is still expressed in the core language, but has
 227       to adhere to an extra set of constraints. This normal form is less
 228       expressive than the full core language (e.g., it can have limited higher
 229       order expressions, has a specific structure, etc.), but is also very
 230       close to directly describing hardware.
 231     \stopdesc
 232     \startdesc{\small{VHDL} generation}
 233       The last step takes the core representation in normal form and generates
 234       \small{VHDL} for it. Since the normal form has a specific, hardware-like
 235       structure, this final step is very straightforward.
 236     \stopdesc
 237
 238     The most interesting step in this process is the normalization step. That
 239     is where more complicated functional constructs, which have no direct
 240     hardware interpretation, are removed and translated into hardware
 241     constructs. This step is described in a lot of detail in
 242     \in{chapter}[chap:normalization].
 243
 244   \section{The Core language}
 245     Most of the prototype deals with handling the program in the Core
 246     language. In this section we will show what this language looks like and
 247     how it works.
 248
 249     The Core language is a functional language that describes
 250     \emph{expressions}. Every identifier used in Core is called a
 251     \emph{binder}, since it is bound to a value somewhere. On the highest
 252     level, a Core program is a collection of functions, each of which binds a
 253     binder (the function name) to an expression (the function value, which has
 254     a function type).
 255
 256     The Core language itself does not prescribe any program structure, only
 257     expression structure. In the \small{GHC} compiler, the Haskell module
 258     structure is used for the resulting Core code as well. Since this is not
 259     so relevant for understanding the Core language or the Normalization
 260     process, we'll only look at the Core expression language here.
 261
 262     Each Core expression consists of one of these possible expressions.
 263
 264     \startdesc{Variable reference}
 265 \startlambda
 266 a
 267 \stoplambda
 268       This is a simple reference to a binder. It's written down as the
 269       name of the binder that is being referred to, which should of course be
 270       bound in a containing scope (including top level scope, so a reference
 271       to a top level function is also a variable reference). Additionally,
 272       constructors from algebraic datatypes also become variable references.
 273
 274       The value of this expression is the value bound to the given binder.
 275
 276       Each binder also carries around its type, but this is usually not shown
 277       in the Core expressions. Occasionally, the type of an entire expression
 278       or function is shown for clarity, but this is only informational. In
 279       practice, the type of an expression is easily determined from the
 280       structure of the expression and the types of the binders and occasional
 281       cast expressions. This minimizes the amount of bookkeeping needed to keep
 282       the typing consistent.
 283     \stopdesc
 284     \startdesc{Literal}
 285 \startlambda
 286 10
 287 \stoplambda
 288       This is a simple literal. Only primitive types are supported, like
 289       chars, strings, ints and doubles. The types of these literals are the
 290       \quote{primitive} versions, like \lam{Char\#} and \lam{Word\#}, not the
 291       normal Haskell versions (but there are builtin conversion functions).
 292     \stopdesc
 293     \startdesc{Application}
 294 \startlambda
 295 func arg
 296 \stoplambda
 297       This is simple function application. Each application consists of two
 298       parts: The function part and the argument part. Applications are used
 299       for normal function \quote{calls}, but also for applying type
 300       abstractions and data constructors.
 301
 302       The value of an application is the value of the function part, with the
 303       first argument binder bound to the argument part.
 304     \stopdesc
 305     \startdesc{Lambda abstraction}
 306 \startlambda
 307 λbndr.body
 308 \stoplambda
 309       This is the basic lambda abstraction, as it occurs in lambda calculus.
 310       It consists of a binder part and a body part.  A lambda abstraction
 311       creates a function, that can be applied to an argument.
 312
 313       Note that the body of a lambda abstraction extends all the way to the
 314       end of the expression, or the closing bracket surrounding the lambda. In
 315       other words, the lambda abstraction \quote{operator} has the lowest
 316       priority of all.
 317
 318       The value of an applied lambda abstraction is the value of the body part,
 319       with the binder bound to the value the lambda abstraction is applied to.
 320     \stopdesc
 321     \startdesc{Non-recursive let expression}
 322 \startlambda
 323 let bndr = value in body
 324 \stoplambda
 325       A let expression allows you to bind a binder to some value, while
 326       evaluating to some other value (where that binder is in scope). This
 327       allows for sharing of subexpressions (you can use a binder twice) and
 328       explicit \quote{naming} of arbitrary expressions. Note that the binder
 329       is not in scope in the value bound to it, so it's not possible to make
 330       recursive definitions with the normal form of the let expression (see
 331       the recursive form below).
 332
 333       Even though this let expression is an extension on the basic lambda
 334       calculus, it is easily translated to a lambda abstraction. The let
 335       expression above would then become:
 336
 337 \startlambda
 338 (λbndr.body) value
 339 \stoplambda
 340
 341       This notion might be useful for verifying certain properties on
 342       transformations, since a lot of verification work has been done on
 343       lambda calculus already.
 344
 345       The value of a let expression is the value of the body part, with the
 346       binder bound to the value.
 347     \stopdesc
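
      As a rough illustration in plain Haskell (not Core itself), the sketch
      below writes the same computation once with a let and once as a lambda
      abstraction applied to the bound value. The names \type{viaLet} and
      \type{viaLambda} are made up for this example.

\starttyping
-- Hypothetical example: a let expression ...
viaLet :: Int -> Int
viaLet value = let bndr = value * 2 in bndr + bndr

-- ... and the same computation as a lambda abstraction applied to the value.
viaLambda :: Int -> Int
viaLambda value = (\bndr -> bndr + bndr) (value * 2)
\stoptyping

      Both functions should give the same result for every input, for example
      20 for the input 5.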
 348     \startdesc{Recursive let expression}
 349 \startlambda
 350 letrec
 351   bndr1 = value1
 352   \vdots
 353   bndrn = valuen
 354 in
 355   body
 356 \stoplambda
 357
 358       This is the recursive version of the let expression. In \small{GHC}'s
 359       Core implementation, non-recursive and recursive lets are not so
 360       distinct as we present them here, but this provides a clearer overview.
 361
 362       The main difference with the normal let expression is that each of the
 363       binders is in scope in each of the values, in addition to the body. This
 364       allows for self-recursive definitions or mutually recursive definitions.
 365
 366       It should also be possible to express a recursive let using normal
 367       lambda calculus, if we use the \emph{least fixed-point operator},
 368       \lam{Y}.
 369     \stopdesc
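
      The idea of replacing a recursive binding by a fixed-point operator can
      be sketched in plain Haskell. Here \type{fix} from \type{Data.Function}
      plays the role of \lam{Y}. The function names are made up for this
      example, and this is only an illustration of the shape of the
      translation, not something the prototype does.

\starttyping
import Data.Function (fix)

-- Recursive definition using a (recursive) let binding.
factLetrec :: Integer -> Integer
factLetrec n =
  let go k = if k <= 1 then 1 else k * go (k - 1)
  in go n

-- The same function without a recursive binding: the recursion is
-- supplied by the fixed-point operator, which passes the function
-- to itself as the argument 'go'.
factFix :: Integer -> Integer
factFix = fix (\go k -> if k <= 1 then 1 else k * go (k - 1))
\stoptyping

      Both versions should compute the same results, for example 120 for the
      input 5.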
 370     \startdesc{Case expression}
 371 \startlambda
 372   case scrut of bndr
 373     DEFAULT -> defaultbody
 374     C0 bndr0,0 ... bndr0,m -> body0
 375     \vdots
 376     Cn bndrn,0 ... bndrn,m -> bodyn
 377 \stoplambda
 378
 379 TODO: Define WHNF
 380
 381     A case expression is the only way in Core to choose between values. A case
 382     expression evaluates its scrutinee, which should have an algebraic
 383     datatype, into weak head normal form (\small{WHNF}) and (optionally) binds
 384     it to \lam{bndr}. It then chooses a body depending on the constructor of
 385     its scrutinee. If none of the constructors match, the \lam{DEFAULT}
 386     alternative is chosen.
 387
 388     Since we can only match the top level constructor, there can be no overlap
 389     in the alternatives and thus order of alternatives is not relevant (though
 390     the \lam{DEFAULT} alternative must appear first for implementation
 391     efficiency).
 392
 393     Any arguments to the constructor in the scrutinee are bound to each of the
 394     binders after the constructor and are in scope only in the corresponding
 395     body.
 396
 397     To support strictness, the scrutinee is always evaluated into WHNF, even
 398     when there is only a \lam{DEFAULT} alternative. This allows a strict
 399     function argument to be written like:
 400
 401 \startlambda
 402 function (case argument of arg
 403   DEFAULT -> arg)
 404 \stoplambda
 405
 406     This seems to be the only use for the extra binder to which the scrutinee
 407     is bound. When not using strictness annotations (which is rather pointless
 408     in hardware descriptions), \small{GHC} seems to never generate any code
 409     making use of this binder. The current prototype does not handle it
 410     either, which probably means that code using it would break.
 411
 412     Note that these case expressions are less powerful than the full Haskell
 413     case expressions. In particular, they do not support complex patterns like
 414     in Haskell. Only the constructor of an expression can be matched; complex
 415     patterns are implemented using multiple nested case expressions.
 416
 417     Case expressions are also used for unpacking of algebraic datatypes, even
 418     when there is only a single constructor. For example, to add the elements
 419     of a tuple, the following Core is generated:
 420
 421 \startlambda
 422 sum = λtuple.case tuple of
 423   (,) a b -> a + b
 424 \stoplambda
 425
 426     Here, there is only a single alternative (but no \lam{DEFAULT}
 427     alternative, since the single alternative is already exhaustive). When
 428     its body is evaluated, the arguments to the tuple constructor \lam{(,)}
 429     (\ie, the elements of the tuple) are bound to \lam{a} and \lam{b}.
 430   \stopdesc
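
    How a complex Haskell pattern can be split into nested matches on a single
    constructor each is sketched below in plain Haskell. This is a hand-written
    illustration, not actual \small{GHC} output, and the function names are
    made up for this example.

\starttyping
-- Source-level Haskell with a nested (complex) pattern.
firstOfJust :: Maybe (Int, Int) -> Int
firstOfJust (Just (a, _)) = a
firstOfJust Nothing       = 0

-- The same function where every case only matches the top level
-- constructor of its scrutinee, as Core case expressions do.
firstOfJustNested :: Maybe (Int, Int) -> Int
firstOfJustNested m = case m of
  Nothing -> 0
  Just t  -> case t of
    (a, _) -> a
\stoptyping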
 431   \startdesc{Cast expression}
 432 \startlambda
 433 body :: targettype
 434 \stoplambda
 435     A cast expression allows you to change the type of an expression to an
 436     equivalent type. Note that this is not meant to do any actual work, like
 437     conversion of data from one format to another, or force a complete type
 438     change. Instead, it is meant to change between different representations
 439     of the same type, \ie switch between types that are provably equal (but
 440     look different).
 441
 442     In our hardware descriptions, we typically see casts to change between a
 443     Haskell newtype and its contained type, since those are effectively
 444     different representations of the same type.
 445
 446     More complex are types that are proven to be equal by the typechecker,
 447     but look different at first glance. To ensure that, once the typechecker
 448     has proven equality, this information sticks around, explicit casts are
 449     added. In our notation we only write the target type, but in reality a
 450     cast expression carries around a \emph{coercion}, which can be seen as a
 451     proof of equality. TODO: Example
 452
 453     The value of a cast is the value of its body, unchanged. The type of this
 454     value is equal to the target type, not the type of its body.
 455
 456     Note that this syntax is also used sometimes to indicate that a particular
 457     expression has a particular type, even when no cast expression is
 458     involved. This is then purely informational, since the only elements that
 459     are explicitly typed in the Core language are the binder references and
 460     cast expressions; the types of all other elements are derived from these
 461     when needed.
 462   \stopdesc
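
    The newtype case can be made visible at the Haskell source level with
    \type{coerce} from \type{Data.Coerce}, assuming a \small{GHC} version that
    ships this module. The sketch below is only an illustration of a conversion
    that does no work at runtime. It is not how the prototype handles casts,
    and \type{pack} and \type{unpack} are made-up names.

\starttyping
import Data.Coerce (coerce)

-- A newtype and its contained type share one representation.
newtype State s = State s

-- Wrapping and unwrapping only change the type, not the value.
pack :: Int -> State Int
pack = coerce

unpack :: State Int -> Int
unpack = coerce
\stoptyping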
 463   \startdesc{Note}
 464
 465     The Core language in \small{GHC} allows adding \emph{notes}, which serve
 466     as hints to the inliner or add custom (string) annotations to a core
 467     expression. These shouldn't be generated normally, so these are not
 468     handled in any way in the prototype.
 469   \stopdesc
 470   \startdesc{Type}
 471 \startlambda
 472 @type
 473 \stoplambda
 474     It is possible to use a Core type as a Core expression. This is done to
 475     allow for type abstractions and applications to be handled as normal
 476     lambda abstractions and applications above. This means that a type
 477     expression in Core can only ever occur in the argument position of an
 478     application, and only if the function that is applied
 479     expects a type as its first argument. This happens for all polymorphic
 480     functions, for example, the \lam{fst} function:
 481
 482 \startlambda
 483 fst :: \forall a. \forall b. (a, b) -> a
 484 fst = λtup.case tup of (,) a b -> a
 485
 486 fstint :: (Int, Int) -> Int
 487 fstint = λtup.fst @Int @Int tup
 488 \stoplambda
 489
 490     The type of \lam{fst} has two universally quantified type variables. When
 491     \lam{fst} is applied in \lam{fstint}, it is first applied to two types
 492     (which are substituted for \lam{a} and \lam{b} in the type of \lam{fst}, so
 493     the actual types of its argument and result can be found:
 494     \lam{fst @Int @Int :: (Int, Int) -> Int}).
 495   \stopdesc
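
  The expression forms described in this section can be summarized as a
  Haskell datatype with nine constructors. The sketch below is a simplified
  model written for this overview: binders and types are plain strings,
  coercions are reduced to a target type, and \small{GHC}'s own definition
  differs in detail. The helper \type{collectArgs} is defined locally and shows
  that type arguments are ordinary application arguments.

\starttyping
type Binder = String
type Type   = String  -- assumption: types are kept as plain strings

data Expr
  = Var Binder              -- variable reference
  | Lit Integer             -- literal (only integers in this sketch)
  | App Expr Expr           -- application
  | Lam Binder Expr         -- lambda abstraction
  | Let Bind Expr           -- non-recursive and recursive let
  | Case Expr Binder [Alt]  -- case with scrutinee binder
  | Cast Expr Type          -- cast (coercion reduced to the target type)
  | Note String Expr        -- note
  | Type Type               -- type used as an expression
  deriving (Show, Eq)

data Bind = NonRec Binder Expr | Rec [(Binder, Expr)]
  deriving (Show, Eq)

data AltCon = DataAlt String | DEFAULT
  deriving (Show, Eq)

type Alt = (AltCon, [Binder], Expr)

-- The body of fstint: fst @Int @Int tup
fstintBody :: Expr
fstintBody = App (App (App (Var "fst") (Type "Int")) (Type "Int")) (Var "tup")

-- Split nested applications into the applied function and its arguments.
collectArgs :: Expr -> (Expr, [Expr])
collectArgs = go []
  where
    go args (App f a) = go (a : args) f
    go args f         = (f, args)
\stoptyping

  Applying \type{collectArgs} to \type{fstintBody} gives the function
  \type{Var "fst"} with three arguments: the two types followed by the tuple.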
 496
 497   TODO: Core type system
 498
 499   \section[sec:prototype:statetype]{State annotations in Haskell}
 500       Ideal: Type synonyms, since there is no additional code overhead for
 501       packing and unpacking. Downside: there is no explicit conversion in Core
 502       either, so type synonyms tend to get lost in expressions (they can be
 503       preserved in binders, but this makes implementation harder, since the
 504       statefulness of a value must be manually tracked).
 505
 506       Less ideal: Newtype. Requires explicit packing and unpacking of function
 507       arguments. If you don't unpack substates, there is no overhead for
 508       (un)packing substates. This will result in many nested State constructors
 509       in a nested state type. \eg:
 510
 511   \starttyping
 512   State (State Bit, State (State Word, Bit), Word)
 513   \stoptyping
 514
 515       Alternative: Provide different newtypes for input and output state. This
 516       makes the code even more explicit, and typechecking can find even more
 517       errors. However, this requires defining two type synonyms for each
 518       stateful function instead of just one. \eg:
 519   \starttyping
 520   type AccumStateIn = StateIn Bit
 521   type AccumStateOut = StateOut Bit
 522   \stoptyping
 523       This also increases the possibility of having different input and output
 524       states. Checking for identical input and output state types is also
 525       harder, since each element in the state must be unpacked and compared
 526       separately.
 527
 528       Alternative: Provide a type for the entire result type of a stateful
 529       function, not just the state part. \eg:
 530
 531   \starttyping
 532   newtype Result state result = Result (state, result)
 533   \stoptyping
 534       \type{Result} type", without having to sort out result from state. However,
 535       This makes it easy to say "Any stateful function must return a
 536       \type{Result} type, without having to sort out result from state. However,
 537       this either requires a second type for input state (similar to
 538       \type{StateIn} / \type{StateOut} above), or requires the compiler to
 539       select the right argument for input state by looking at types (which works
 540       for complex states, but when that state has the same type as an argument,
 541       things get ambiguous) or by selecting a fixed (\eg, the last) argument,
 542       which might be limiting.
 543
 544       \subsubsection{Example}
 545       As an example of the approach used, consider a simple averaging circuit that
 546       lets the accumulation of the inputs be done by a subcomponent.
 547
 548       \starttyping
 549         newtype State s = State s
 550
 551         type AccumState = State Word
 552         accum :: Word -> AccumState -> (AccumState, Word)
 553         accum i (State s) = (State (s + i), s + i)
 554
 555         type AvgState = State (AccumState, Word)
 556         avg :: Word -> AvgState -> (AvgState, Word)
 557         avg i (State s) = (State s', o)
 558           where
 559             (accums, count) = s
 560             -- Pass our input through the accumulator, which outputs a sum
 561             (accums', sum) = accum i accums
 562             -- Increment the count (which will be our new state)
 563             count' = count + 1
 564             -- Compute the average
 565             o = sum / count'
 566             s' = (accums', count')
 567       \stoptyping
 568
 569       And the normalized, core-like versions:
 570
 571       \starttyping
 572         accum i spacked = res
 573           where
 574             s = case spacked of (State s) -> s
 575             s' = s + i
 576             spacked' = State s'
 577             o = s + i
 578             res = (spacked', o)
 579
 580         avg i spacked = res
 581           where
 582             s = case spacked of (State s) -> s
 583             accums = case s of (accums, _) -> accums
 584             count = case s of (_, count) -> count
 585             accumres = accum i accums
 586             accums' = case accumres of (accums', _) -> accums'
 587             sum = case accumres of (_, sum) -> sum
 588             count' = count + 1
 589             o = sum / count'
 590             s' = (accums', count')
 591             spacked' = State s'
 592             res = (spacked', o)
 593       \stoptyping
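
      To get a feeling for how such a stateful function behaves over a number
      of clock cycles, it can be simulated in plain Haskell by threading the
      state through a list of inputs. The sketch below repeats the accumulator
      with a \type{Word} state so that it is self-contained. The helper
      \type{simulate} is a hypothetical name introduced here and is not part
      of the prototype.

      \starttyping
        import Data.List (mapAccumL)
        import Data.Word (Word)

        newtype State s = State s

        type AccumState = State Word

        accum :: Word -> AccumState -> (AccumState, Word)
        accum i (State s) = (State (s + i), s + i)

        -- Hypothetical helper: feed one input per clock cycle, pass the
        -- state from each cycle to the next and collect the outputs.
        simulate :: (i -> s -> (s, o)) -> s -> [i] -> [o]
        simulate f s0 = snd . mapAccumL (\s i -> f i s) s0
      \stoptyping

      Evaluating \type{simulate accum (State 0) [1, 2, 3, 4]} should give the
      running sums \type{[1,3,6,10]}.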
 594
 595
 596
 597       As noted above, any component of a function's state that is a substate,
 598       \ie passed on as the state of another function, should have no influence
 599       on the hardware generated for the calling function. Any state-specific
 600       \small{VHDL} for this component can be generated entirely within the called
 601       function. So, we can completely leave out substates from any function.
 602
 603       From this observation, we might think to remove the substates from a
 604       function's states altogether, and leave only the state components which
 605       are actual states of the current function. While doing this would not
 606       remove any information needed to generate \small{VHDL} from the function, it would
 607       cause the function definition to become invalid (since we won't have any
 608       substate to pass to the functions anymore). We could solve the syntactic
 609       problems by passing \type{undefined} for state variables, but that would
 610       still break the code on the semantic level (\ie, the function would no
 611       longer be semantically equivalent to the original input).
 612
 613       To keep the function definition correct until the very end of the process,
 614       we will not deal with (sub)states until we get to the \small{VHDL} generation.
 615       Here, we are translating from Core to \small{VHDL}, and we can simply not generate
 616       \small{VHDL} for substates, effectively removing the substate components
 617       altogether.
 618
 619       There are a few important points when ignoring substates.
 620
 621       First, we have to have some definition of "substate". Since any state
 622       argument or return value that represents state must be of the \type{State}
 623       type, we can simply look at its type. However, we must be careful to
 624       ignore only {\em substates}, and not a function's own state.
 625
 626       In the example above, this means we should remove \type{accums'} from
 627       \type{s'}, but not throw away \type{s'} entirely. We should, however,
 628       remove \type{s'} from the output port of the function, since the state
 629       will be handled by a \small{VHDL} procedure within the function.
 630
 631       When looking at substates, these can appear in two places: As part of an
 632       argument and as part of a return value. As noted above, these substates
 633       can only be used in very specific ways.
 634
 635       \desc{State variables can appear as an argument.} When generating \small{VHDL}, we
 636       completely ignore the argument and generate no input port for it.
 637
 638       \desc{State variables can be extracted from other state variables.} When
 639       extracting a state variable from another state variable, this always means
 640       we're extracting a substate, which we can ignore. So, we simply generate no
 641       \small{VHDL} for any extraction operation that has a state variable as a result.
 642
      \desc{State variables can be passed to functions.} When passing a
      state variable to a function, this always means we're passing a substate
      to a subcomponent. The entire argument can simply be ignored in the
      resulting port map.
 647
      \desc{State variables can be returned from functions.} When returning a
      state variable from a function (probably as a part of an algebraic
      datatype), this always means we're returning a substate from a
      subcomponent. The entire state variable should be ignored in the resulting
      port map. The type of the binder that the function call is bound
      to should not include the state type either.
 654
 655       \startdesc{State variables can be inserted into other variables.} When inserting
 656       a state variable into another variable (usually by constructing that new
 657       variable using its constructor), we can identify two cases:
 658
 659       \startitemize
 660         \item The state is inserted into another state variable. In this case,
 661         the inserted state is a substate, and can be safely left out of the
 662         constructed variable.
        \item The state is inserted into a non-state variable. This happens when
        building up the return value of a function, where you put state and
        result variables together in an algebraic type (usually a tuple). In
        this case, we should leave the state variable out as well, since we
        don't want it to be included as an output port.
 668       \stopitemize
 669
 670       So, in both cases, we can simply leave out the state variable from the
 671       resulting value. In the latter case, however, we should generate a state
 672       proc instead, which assigns the state variable to the input state variable
 673       at each clock tick.
 674       \stopdesc
 675
 676       \desc{State variables can appear as (part of) a function result.} When
 677       generating \small{VHDL}, we can completely ignore any part of a function result
 678       that has a state type. If the entire result is a state type, this will
 679       mean the entity will not have an output port. Otherwise, the state
 680       elements will be removed from the type of the output port.
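
      The type-based rules above can be illustrated with a small Haskell
      sketch. It is not the prototype's implementation: the prototype works on
      Core types, while the \type{Ty} datatype, the \type{stripState} function and
      the type names \type{AccumState} and \type{Word} below are hypothetical and
      chosen only for this example. The sketch also assumes that a tuple with a
      single remaining element collapses to that element. It only covers values
      that have a state type; the distinction between a function's own state
      and its substates (made when unpacking) is not modelled.

\starttyping
import Data.Maybe (mapMaybe)

-- Hypothetical, heavily simplified stand-in for Core types.
data Ty
  = TyState Ty      -- State a: a packed state
  | TyTuple [Ty]    -- (a, b, ...)
  | TyPrim String   -- any non-state type, identified by name
  deriving (Eq, Show)

-- Remove every component that has a state type. Nothing means that no
-- component is left, so no port has to be generated at all.
stripState :: Ty -> Maybe Ty
stripState (TyState inner) = Nothing
stripState (TyPrim name)   = Just (TyPrim name)
stripState (TyTuple elems) =
  case mapMaybe stripState elems of
    []       -> Nothing
    [single] -> Just single          -- assumption: collapse 1-tuples
    rest     -> Just (TyTuple rest)

-- Assumed result type of avg: (State (State AccumState, Word), Word)
avgResult :: Ty
avgResult =
  TyTuple
    [ TyState (TyTuple [TyState (TyPrim "AccumState"), TyPrim "Word"])
    , TyPrim "Word"
    ]
\stoptyping

      Under these assumptions, \type{stripState avgResult} evaluates to
      \type{Just (TyPrim "Word")}: only the non-state part of the result
      remains as the type of the output port.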
 681
 682
 683       Now, we know how to handle each use of a state variable separately. If we
 684       look at the whole, we can conclude the following:
 685
 686       \startitemize
      \item A state unpack operation should not generate any \small{VHDL}. The binder
      to which the unpacked state is bound should still be declared; this signal
      will become the register and will hold the current state.
      \item A state pack operation should not generate any \small{VHDL}. The binder to
      which the packed state is bound should not be declared. The binder that is
      packed is the signal that will hold the new state.
      \item Any values of a State type should not be translated to \small{VHDL}. In
      particular, State elements should be removed from tuples (and other
      datatypes) and arguments with a state type should not generate ports.
      \item To make the state actually work, a simple \small{VHDL} proc should be
      generated. This proc updates the state at every clock cycle, by assigning
      the new state to the current state. This will be recognized by synthesis
      tools as a register specification.
 700       \stopitemize
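
      As an illustration of the last rule, the following Haskell sketch
      renders such a proc as \small{VHDL} text. This is only a sketch under
      assumptions: the function \type{stateProc}, the clock signal name
      \type{clock} and the process label are hypothetical, and building the
      \small{VHDL} by string concatenation is chosen here purely for brevity; it
      is not claimed to be how the prototype generates its output.

\starttyping
-- Render a clocked process that copies the "next state" signal into the
-- "current state" signal on every rising clock edge. Both arguments are
-- the names of already declared signals.
stateProc :: String -> String -> String
stateProc current next = unlines
  [ "state : process (clock)"
  , "begin"
  , "  if clock'event and clock = '1' then"
  , "    " ++ current ++ " <= " ++ next ++ ";"
  , "  end if;"
  , "end process state;"
  ]
\stoptyping

      For example, \type{putStr (stateProc "s" "sNext")} prints a process whose
      only assignment is \type{s <= sNext;}. Since a prime is not allowed in a
      \small{VHDL} identifier, the name \type{sNext} stands in for \type{s'} here.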
 701
 702
 703       When applying these rules to the example program (in normal form), we will
 704       get the following result. All the parts that don't generate any value are
 705       crossed out, leaving some very boring assignments here and there.
 706
 707
 708   \starthaskell
 709     avg i --spacked-- = res
 710       where
 711         s = --case spacked of (State s) -> s--
 712         --accums = case s of (accums, \_) -> accums--
 713         count = case s of (--\_,-- count) -> count
 714         accumres = accum i --accums--
 715         --accums' = case accumres of (accums', \_) -> accums'--
 716         sum = case accumres of (--\_,-- sum) -> sum
 717         count' = count + 1
 718         o = sum / count'
 719         s' = (--accums',-- count')
 720         --spacked' = State s'--
 721         res = (--spacked',-- o)
 722   \stophaskell
 723
      If we actually leave out the crossed-out parts, we get a slightly
      weird program: There is a variable \type{s} which has no value, and there
      is a variable \type{s'} that is never used. Together, these two will form
      the state proc of the function. \type{s} contains the "current" state,
      \type{s'} is assigned the "next" state. So, at the end of each clock
      cycle, \type{s'} should be assigned to \type{s}.
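
      The behaviour of this state proc can be mimicked in plain Haskell, which
      may help to see why the "weird" program is still meaningful. The sketch
      below is an assumption-laden model, not code from the prototype:
      \type{Step}, \type{simulate} and \type{avgStep} are hypothetical names,
      \type{accum} is assumed to keep a running total of its inputs, integer
      division on \type{Int} replaces the \type{/} operator, and the initial
      state is supplied from outside the function. The state tuple holds every
      register in the design, both the total of \type{accum} and the count of
      \type{avg}.

\starttyping
-- One clock cycle: current state and input give next state and output.
type Step s i o = s -> i -> (s, o)

-- Run a step function over a list of inputs. Passing "next" to the
-- recursive call plays the role of assigning s' to s at each clock tick.
simulate :: Step s i o -> s -> [i] -> [o]
simulate step current inputs =
  case inputs of
    [] -> []
    (input : rest) ->
      let (next, output) = step current input
      in output : simulate step next rest

-- Assumed behaviour of avg: state is (running total, number of inputs).
avgStep :: Step (Int, Int) Int Int
avgStep (total, count) i = ((total', count'), o)
  where
    total' = total + i            -- assumed behaviour of accum
    count' = count + 1
    o      = total' `div` count'  -- integer division, an assumption
\stoptyping

      With the initial state \type{(0, 0)}, the expression
      \type{simulate avgStep (0, 0) [4, 8, 3]} yields \type{[4, 6, 5]}, the
      running averages of the inputs seen so far.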
 730
      Note that the definition of \type{s'} is not removed, even though one
      might think of it as having a state type. Since the state type has a
      single-argument constructor \type{State}, any value that is to become the
      resulting state must always be explicitly packed with the State constructor.
      This allows us to remove the packed version, but still generate \small{VHDL} for the
      unpacked version (of course with any substates removed).
 737
      As you can see, the definition of \type{s'} is still present, since it
      does not have a state type: it is not wrapped in the \type{State}
      constructor. The \type{accums'} substate has been removed,
      leaving us with just the state of \type{avg} itself.
 741     \subsection{Initial state}
 742       How to specify the initial state? Cannot be done inside a hardware
 743       function, since the initial state is its own state argument for the first
 744       call (unless you add an explicit, synchronous reset port).
 745
 746       External init state is natural for simulation.
 747
 748       External init state works for hardware generation as well.
 749
 750       Implementation issues: state splitting, linking input to output state,
 751       checking usage constraints on state variables.
 752
 753         Implementation issues
 754           \subsection[sec:prototype:separate]{Separate compilation}
 755           - Simplified core?
 756
 757   \section{Haskell language coverage and constraints}
 758     Recursion
 759     Builtin types
 760     Custom types (Sum types, product types)
 761     Function types / higher order expressions