1 \chapter[chap:prototype]{Prototype}
2 An important step in this research is the creation of a prototype compiler.
3 Having this prototype allows us to apply the ideas from the previous chapter
4 to actual hardware descriptions and evaluate their usefulness. Having a
prototype also helps to find new techniques and test possible solutions.

Obviously, the prototype was not created after all the research
ideas were formed; rather, its implementation has been interleaved with the
research itself. Also, the prototype described here is the final version: it
has gone through a number of design iterations which we will not completely
describe here.
14 \section{Choice of language}
15 When implementing this prototype, the first question to ask is: What
16 (functional) language will we use to describe our hardware? (Note that
17 this does not concern the \emph{implementation language} of the compiler,
18 just the language \emph{translated by} the compiler).
20 On the highest level, we have two choices:
23 \item Create a new functional language from scratch. This has the
24 advantage of having a language that contains exactly those elements that
25 are convenient for describing hardware and can contain special
constructs that might make hardware descriptions more concise.
27 \item Use an existing language and create a new backend for it. This has
the advantage that existing tools can be reused, which will speed up
development.
Considering that we needed a working prototype quickly, and that
implementing parsers, semantic checkers and especially
typecheckers is not exactly the core of this research (though it is a lot
of work!), using an existing language is the obvious choice. This
36 also has the advantage that a large set of language features is available
37 to experiment with and it is easy to find which features apply well and
38 which don't. A possible second prototype could use a custom language with
39 just the useful features (and possibly extra features that are specific to
40 the domain of hardware description as well).
The next choice is which of the many existing languages to use. As
mentioned before, this language is Haskell. This choice has not been the
result of a thorough comparison of languages, for the simple reason that
the requirements on the language were still unclear at the start of
this research. The fact that Haskell is a language with a broad spectrum
47 of features, that it is commonly used in research projects and that the
48 primary compiler, GHC, provides a high level API to its internals, made
49 Haskell an obvious choice.
51 TODO: Was Haskell really a good choice? Perhaps say this somewhere else?
53 \section{Prototype design}
54 As stated above, we will use the Glasgow Haskell Compiler (\small{GHC}) to
55 implement our prototype compiler. To understand the design of the
compiler, we will first dive into the \small{GHC} compiler a bit. Its
compilation process consists of the following steps (slightly simplified):
59 \startuseMPgraphic{ghc-pipeline}
save inp, front, desugar, simpl, back, out;

newEmptyBox.inp(0,0);
newBox.front(btex Parser etex);
64 newBox.desugar(btex Desugarer etex);
65 newBox.simpl(btex Simplifier etex);
newBox.back(btex Backend etex);
newEmptyBox.out(0,0);
69 % Space the boxes evenly
70 inp.c - front.c = front.c - desugar.c = desugar.c - simpl.c
71 = simpl.c - back.c = back.c - out.c = (0, 1.5cm);
74 % Draw lines between the boxes. We make these lines "deferred" and give
75 % them a name, so we can use ObjLabel to draw a label beside them.
76 ncline.inp(inp)(front) "name(haskell)";
77 ncline.front(front)(desugar) "name(ast)";
78 ncline.desugar(desugar)(simpl) "name(core)";
79 ncline.simpl(simpl)(back) "name(simplcore)";
80 ncline.back(back)(out) "name(native)";
81 ObjLabel.inp(btex Haskell source etex) "labpathname(haskell)", "labdir(rt)";
82 ObjLabel.front(btex Haskell AST etex) "labpathname(ast)", "labdir(rt)";
83 ObjLabel.desugar(btex Core etex) "labpathname(core)", "labdir(rt)";
84 ObjLabel.simpl(btex Simplified core etex) "labpathname(simplcore)", "labdir(rt)";
85 ObjLabel.back(btex Native code etex) "labpathname(native)", "labdir(rt)";
87 % Draw the objects (and deferred labels)
drawObj (inp, front, desugar, simpl, back, out);
\stopuseMPgraphic
90 \placefigure[right]{GHC compiler pipeline}{\useMPgraphic{ghc-pipeline}}
\startdesc{Frontend}
This step takes the Haskell source files and parses them into an
abstract syntax tree (\small{AST}). This \small{AST} can express the
complete Haskell language and is thus very complex (in contrast
with the Core \small{AST}, described later on). All identifiers in this
\small{AST} are resolved by the renamer and all types are checked by the
typechecker.
\stopdesc
100 \startdesc{Desugaring}
This step takes the full \small{AST} and translates it to the
\emph{Core} language. Core is a very small functional language with lazy
semantics that can still express everything Haskell can express. Its
simplicity makes Core very suitable for further simplification and
translation. Core is also the language we will be working on.
\stopdesc
107 \startdesc{Simplification}
108 Through a number of simplification steps (such as inlining, common
109 subexpression elimination, etc.) the Core program is simplified to make
110 it faster or easier to process further.
\stopdesc
\startdesc{Backend}
This step takes the simplified Core program and generates an actual
runnable program for it. This is a big and complicated step, which we will
not discuss any further, since it is not required for our prototype.
\stopdesc
In this process, there are a number of places where we can start our work.
119 Assuming that we don't want to deal with (or modify) parsing, typechecking
and other frontend business, and that native code is not a useful format
for describing hardware, we are left with the choice between the full
Haskell \small{AST} and the smaller (simplified) core representation.
124 The advantage of taking the full \small{AST} is that the exact structure
125 of the source program is preserved. We can see exactly what the hardware
description looks like and which syntax constructs were used. However,
the full \small{AST} is a very complicated data structure. If we are to
handle everything it offers, the compiler will quickly become very large.
Using the core representation gives us a much more compact data structure
(a core expression only uses 9 constructors). Note that this does not mean
that the core representation itself is smaller; on the contrary. Since the
core language has fewer constructs, many things take a larger
expression to express.
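To illustrate (this is an informal sketch, not output from the prototype):
even a simple Haskell conditional already becomes a slightly larger Core
expression, since Core has no \quote{if} construct and conditionals are
expressed with a case expression:

\starttyping
-- Haskell source (hypothetical function):
max' a b = if a > b then a else b

-- Roughly the corresponding Core (the (>) will additionally receive a
-- type argument and a class dictionary argument, omitted here):
max' = λa.λb.case (>) a b of
  False -> b
  True  -> a
\stoptyping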
However, the fact that the core language is so much smaller means it is a
lot easier to analyze and translate into something else. For the same
reason, \small{GHC} runs its simplifications and optimizations on the core
representation as well.
Note that we will use the normal core representation, not the simplified
core. Reasons for this are detailed below.
144 The final prototype roughly consists of three steps:
\startuseMPgraphic{prototype-pipeline}
148 save inp, front, norm, vhdl, out;
149 newEmptyBox.inp(0,0);
150 newBox.front(btex \small{GHC} frontend + desugarer etex);
151 newBox.norm(btex Normalization etex);
152 newBox.vhdl(btex \small{VHDL} generation etex);
153 newEmptyBox.out(0,0);
155 % Space the boxes evenly
156 inp.c - front.c = front.c - norm.c = norm.c - vhdl.c
157 = vhdl.c - out.c = (0, 1.5cm);
160 % Draw lines between the boxes. We make these lines "deferred" and give
161 % them a name, so we can use ObjLabel to draw a label beside them.
162 ncline.inp(inp)(front) "name(haskell)";
163 ncline.front(front)(norm) "name(core)";
164 ncline.norm(norm)(vhdl) "name(normal)";
165 ncline.vhdl(vhdl)(out) "name(vhdl)";
166 ObjLabel.inp(btex Haskell source etex) "labpathname(haskell)", "labdir(rt)";
167 ObjLabel.front(btex Core etex) "labpathname(core)", "labdir(rt)";
168 ObjLabel.norm(btex Normalized core etex) "labpathname(normal)", "labdir(rt)";
169 ObjLabel.vhdl(btex \small{VHDL} description etex) "labpathname(vhdl)", "labdir(rt)";
171 % Draw the objects (and deferred labels)
drawObj (inp, front, norm, vhdl, out);
\stopuseMPgraphic
\placefigure[right]{Prototype compiler pipeline}{\useMPgraphic{prototype-pipeline}}
\startdesc{Frontend}
This is exactly the frontend and desugarer from the \small{GHC}
pipeline, which translates Haskell sources into a Core representation.
\stopdesc
180 \startdesc{Normalization}
181 This is a step that transforms the core representation into a normal
182 form. This normal form is still expressed in the core language, but has
to adhere to an extra set of constraints. This normal form is less
expressive than the full core language (\eg, it allows only limited higher
order expressions and has a specific structure), but is also very
close to directly describing hardware.
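As a rough sketch of the flavour of this normal form (the names below are
made up and the details are left to \in{chapter}[chap:normalization]):
after normalization, every intermediate value is bound to its own binder,
so that each binder can correspond directly to a signal in the resulting
hardware.

\starttyping
-- Before normalization (add and mul assumed to be in scope):
f = λa.λb.add (mul a a) b

-- After normalization (roughly): every intermediate value gets a name
f = λa.λb.let
            s = mul a a
            r = add s b
          in
            r
\stoptyping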
188 \startdesc{\small{VHDL} generation}
The last step takes the normalized core representation and generates
\small{VHDL} for it. Since the normal form has a specific, hardware-like
structure, this final step is very straightforward.
\stopdesc
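The first of these steps requires no new code for parsing or typechecking:
the prototype can simply call into the \small{GHC} \small{API} to obtain
the Core bindings of a module. The following is only a minimal sketch of
such a call (module and function names are those of the older \small{GHC}
\small{API} and the ghc-paths package; this is not the actual prototype
code):

\starttyping
import GHC
import GHC.Paths (libdir)     -- from the ghc-paths package
import HscTypes (mg_binds)
import CoreSyn (CoreBind)

-- Run the GHC frontend and desugarer on one module and return its
-- Core bindings, without running the simplifier or backend.
loadCore :: FilePath -> String -> IO [CoreBind]
loadCore file modname = runGhc (Just libdir) $ do
  dflags <- getSessionDynFlags
  setSessionDynFlags dflags
  target <- guessTarget file Nothing
  setTargets [target]
  load LoadAllTargets
  modsum    <- getModSummary (mkModuleName modname)
  parsed    <- parseModule modsum        -- parser
  checked   <- typecheckModule parsed    -- renamer and typechecker
  desugared <- desugarModule checked     -- desugarer: Haskell AST to Core
  return (mg_binds (coreModule desugared))
\stoptyping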
194 The most interesting step in this process is the normalization step. That
195 is where more complicated functional constructs, which have no direct
196 hardware interpretation, are removed and translated into hardware
constructs. This step is described in detail in
\in{chapter}[chap:normalization].
200 \section{The Core language}
201 Most of the prototype deals with handling the program in the Core
language. In this section we will show what this language looks like and
which constructs it contains.
205 The Core language is a functional language that describes
206 \emph{expressions}. Every identifier used in Core is called a
207 \emph{binder}, since it is bound to a value somewhere. On the highest
level, a Core program is a collection of functions, each of which binds a
binder (the function name) to an expression (the function value, which has
its arguments bound by lambda abstractions).
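For example (the names are made up, and \lam{hwand} and \lam{hwnot} are
assumed to be functions in scope), two such top level bindings could look
like:

\starttyping
and2 = λa.λb.hwand a b
inv  = λx.hwnot x
\stoptyping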
212 The Core language itself does not prescribe any program structure, only
213 expression structure. In the \small{GHC} compiler, the Haskell module
214 structure is used for the resulting Core code as well. Since this is not
215 so relevant for understanding the Core language or the Normalization
216 process, we'll only look at the Core expression language here.
Each Core expression is one of the following constructs.
220 \startdesc{Variable reference}
224 This is a simple reference to a binder. It's written down as the
225 name of the binder that is being referred to, which should of course be
226 bound in a containing scope (including top level scope, so a reference
227 to a top level function is also a variable reference). Additionally,
228 constructors from algebraic datatypes also become variable references.
230 The value of this expression is the value bound to the given binder.
232 Each binder also carries around its type, but this is usually not shown
233 in the Core expressions. Occasionally, the type of an entire expression
234 or function is shown for clarity, but this is only informational. In
235 practice, the type of an expression is easily determined from the
236 structure of the expression and the types of the binders and occasional
cast expressions. This minimizes the amount of bookkeeping needed to keep
the typing consistent.
\stopdesc
\startdesc{Literal}
This is a simple literal. Only primitive types are supported, like
chars, strings, ints and doubles. The types of these literals are the
\quote{primitive} versions, like \lam{Char\#} and \lam{Word\#}, not the
normal Haskell versions (but there are builtin conversion functions).
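As a small informal example (not literal \small{GHC} output): the Haskell
expression \quote{1 :: Int} does not itself become a Core literal, since
\lam{Int} is not a primitive type. Instead it roughly becomes the \lam{Int}
constructor \lam{I\#} applied to the primitive literal \lam{1\#}:

\starttyping
-- Haskell:          1 :: Int
-- Core (roughly):   I# 1#    -- 1# is a literal of the primitive type Int#
\stoptyping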
249 \startdesc{Application}
253 This is simple function application. Each application consists of two
254 parts: The function part and the argument part. Applications are used
255 for normal function \quote{calls}, but also for applying type
256 abstractions and data constructors.
258 The value of an application is the value of the function part, with the
259 first argument binder bound to the argument part.
261 \startdesc{Lambda abstraction}
This is the basic lambda abstraction, as it occurs in lambda calculus.
266 It consists of a binder part and a body part. A lambda abstraction
267 creates a function, that can be applied to an argument.
269 Note that the body of a lambda abstraction extends all the way to the
270 end of the expression, or the closing bracket surrounding the lambda. In
271 other words, the lambda abstraction \quote{operator} has the lowest
The value of an application of a lambda abstraction is the value of its
body part, with the binder bound to the value the lambda abstraction is
applied to.
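A minimal (made up) example combining lambda abstraction and application:
the following function applies its first argument, which should itself be
a function, to its second argument.

\starttyping
apply = λf.λx.f x
\stoptyping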
277 \startdesc{Non-recursive let expression}
279 let bndr = value in body
281 A let expression allows you to bind a binder to some value, while
282 evaluating to some other value (where that binder is in scope). This
283 allows for sharing of subexpressions (you can use a binder twice) and
284 explicit \quote{naming} of arbitrary expressions. Note that the binder
285 is not in scope in the value bound to it, so it's not possible to make
286 recursive definitions with the normal form of the let expression (see
287 the recursive form below).
Even though this let expression is an extension of the basic lambda
calculus, it is easily translated to a lambda abstraction. The let
expression above would then become:

(λbndr.body) value
This equivalence might be useful for verifying certain properties of our
transformations, since a lot of verification work has been done on the
lambda calculus already.
301 The value of a let expression is the value of the body part, with the
302 binder bound to the value.
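For example (hypothetical names, with a function \lam{add} assumed to be
in scope), a let expression can share a subexpression that is used twice:

\starttyping
quadruple = λx.let double = add x x
               in  add double double
\stoptyping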
304 \startdesc{Recursive let expression}
314 This is the recursive version of the let expression. In \small{GHC}'s
315 Core implementation, non-recursive and recursive lets are not so
316 distinct as we present them here, but this provides a clearer overview.
318 The main difference with the normal let expression is that each of the
319 binders is in scope in each of the values, in addition to the body. This
320 allows for self-recursive definitions or mutually recursive definitions.
It should also be possible to express a recursive let using normal
lambda calculus, if we use the \emph{least fixed-point operator}, such as
the \lam{Y} combinator.
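As an illustration of how this is typically useful for hardware (the names
are made up, the \quote{letrec} notation is informal, and \lam{register}
and \lam{xor} are assumed to be functions in scope), mutually referring
bindings can describe a feedback loop:

\starttyping
λinp.letrec
       state = register out
       out   = xor inp state
     in
       out
\stoptyping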
326 \startdesc{Case expression}
case scrutinee of bndr
  DEFAULT -> defaultbody
  C0 bndr0,0 ... bndr0,m -> body0
  ...
  Cn bndrn,0 ... bndrn,m -> bodyn
337 A case expression is the only way in Core to choose between values. A case
338 expression evaluates its scrutinee, which should have an algebraic
339 datatype, into weak head normal form (\small{WHNF}) and (optionally) binds
340 it to \lam{bndr}. It then chooses a body depending on the constructor of
341 its scrutinee. If none of the constructors match, the \lam{DEFAULT}
342 alternative is chosen.
Since we can only match the top level constructor, there can be no overlap
between the alternatives and thus the order of alternatives is not relevant
(though the \lam{DEFAULT} alternative must appear first, for implementation
reasons).
Any arguments to the constructor in the scrutinee are bound to each of the
binders after the constructor and are in scope only in the corresponding
body.
To support strictness, the scrutinee is always evaluated into \small{WHNF},
even
354 when there is only a \lam{DEFAULT} alternative. This allows a strict
355 function argument to be written like:
function (case argument of arg
  DEFAULT -> arg)
362 This seems to be the only use for the extra binder to which the scrutinee
363 is bound. When not using strictness annotations (which is rather pointless
364 in hardware descriptions), \small{GHC} seems to never generate any code
365 making use of this binder. The current prototype does not handle it
366 either, which probably means that code using it would break.
368 Note that these case statements are less powerful than the full Haskell
369 case statements. In particular, they do not support complex patterns like
in Haskell. Only the constructor of an expression can be matched; complex
patterns are implemented using multiple nested case expressions, as
illustrated below.
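For example (a hypothetical sketch, not literal \small{GHC} output), the
Haskell definition:

\starttyping
isJustTrue :: Maybe Bool -> Bool
isJustTrue (Just True) = True
isJustTrue _           = False
\stoptyping

could be translated into two nested case expressions, roughly:

\starttyping
isJustTrue = λm.case m of m'
  DEFAULT -> False
  Just b  -> case b of b'
    DEFAULT -> False
    True    -> True
\stoptyping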
373 Case statements are also used for unpacking of algebraic datatypes, even
when there is only a single constructor. For example, to add the elements
of a tuple, the following Core is generated:

sum = λtuple.case tuple of
  (,) a b -> a + b
382 Here, there is only a single alternative (but no \lam{DEFAULT}
alternative, since the single alternative is already exhaustive). When
its body is evaluated, the arguments to the tuple constructor \lam{(,)}
(\ie, the elements of the tuple) are bound to \lam{a} and \lam{b}.
387 \startdesc{Cast expression}
391 A cast expression allows you to change the type of an expression to an
392 equivalent type. Note that this is not meant to do any actual work, like
393 conversion of data from one format to another, or force a complete type
394 change. Instead, it is meant to change between different representations
of the same type, \eg to switch between types that are provably equal (but
written differently).
398 In our hardware descriptions, we typically see casts to change between a
399 Haskell newtype and its contained type, since those are effectively
400 different representations of the same type.
402 More complex are types that are proven to be equal by the typechecker,
403 but look different at first glance. To ensure that, once the typechecker
404 has proven equality, this information sticks around, explicit casts are
added. In our notation we only write the target type, but in reality a
cast expression carries around a \emph{coercion}, which can be seen as a
proof of equality.
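As a small sketch of the newtype case mentioned above (the names
\lam{Wrapped} and \lam{unwrap} are made up, and the Core is written
informally since we do not write out coercions):

\starttyping
-- Haskell source:
newtype Wrapped = Wrapped Int

unwrap :: Wrapped -> Int
unwrap (Wrapped x) = x

-- Core (roughly): the pattern match disappears completely; only a cast
-- remains, changing the type of w from Wrapped to Int without doing
-- any work:
-- unwrap = λw.cast w to Int
\stoptyping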
409 The value of a cast is the value of its body, unchanged. The type of this
410 value is equal to the target type, not the type of its body.
412 Note that this syntax is also used sometimes to indicate that a particular
413 expression has a particular type, even when no cast expression is
involved. This is then purely informational, since the only elements that
are explicitly typed in the Core language are the binder references and
cast expressions; the types of all other elements can be derived from the
structure of the expression.
\stopdesc
\startdesc{Note}
The Core language in \small{GHC} allows adding \emph{notes}, which serve
as hints to the inliner or add custom (string) annotations to a core
expression. These should not normally be generated, so they are not
handled in any way by the prototype.
\stopdesc
\startdesc{Type}
It is possible to use a Core type as a Core expression. This is done to
allow type abstractions and applications to be handled as normal
lambda abstractions and applications, as described above. This means that a
type expression in Core can only ever occur in the argument position of an
application, and only if the function it is applied to expects a type as
the first argument. This happens for all polymorphic
functions, for example, the \lam{fst} function:
439 fst :: \forall a. \forall b. (a, b) -> a
440 fst = λtup.case tup of (,) a b -> a
442 fstint :: (Int, Int) -> Int
fstint = λtup.fst @Int @Int tup
The type of \lam{fst} has two universally quantified type variables. When
\lam{fst} is applied in \lam{fstint}, it is first applied to two types
(which are substituted for \lam{a} and \lam{b} in the type of \lam{fst}, so
that the actual type of the arguments and result of \lam{fst} can be found:
\lam{fst @Int @Int :: (Int, Int) -> Int}).
453 TODO: Core type system
455 Implementation issues
458 Haskell language coverage / constraints
461 Custom types (Sum types, product types)
462 Function types / higher order expressions