1 \chapter[chap:future]{Future work}
2 \section{Improved notation for hierarchical state}
3 The hierarchical state model requires quite some boilerplate code for unpacking
4 and distributing the input state and collecting and repacking the output
7 \in{Example}[ex:NestedState] shows a simple composition of two stateful
8 functions, \hs{funca} and \hs{funcb}. The state annotation using the
9 \hs{State} newtype has been left out, for clarity and because the proposed
10 approach below cannot handle that (yet).
12 \startbuffer[NestedState]
13 type FooState = ( AState, BState )
14 foo :: Word -> FooState -> (FooState, Word)
18 (sa', outa) = funca in sa
19 (sb', outb) = funcb outa sb
22 \placeexample[here][ex:NestedState]{Simple function composing two stateful
23 functions.}{\typebufferhs{NestedState}}
25 Since state handling always follows strict rules, there is really no other way
26 in which this state handling can be done (you can of course move some things
27 around from the where clause to the pattern or vice versa, but it would still
28 be effectively the same thing). This means it makes extra sense to hide this
29 boilerplate away. This would incur no flexibility cost at all, since there are
30 no other ways that would work.
32 One particular notation in Haskell that seems promising, is the \hs{do}
33 notation. This is meant to simplify Monad notation by hiding away some
34 details. It allows one to write a list of expressions, which are composited
35 using the monadic \emph{bind} operator, written in Haskell as \hs{>>}. For
44 will be desugared into:
47 (somefunc a) >> (otherfunc b)
50 \todo{Properly introduce >>=}
51 There is also the \hs{>>=} operator, which allows for passing variables from
52 one expression to the next. If we could use this notation to compose a
53 stateful computation from a number of other stateful functions, this could
54 move all the boilerplate code into the \hs{>>} operator. Perhaps the compiler
55 should be taught to always inline the \hs{>>} operator, but after that there
56 should be no further changes required to the compiler.
58 This is highlights an important aspect of using a functional language for our
59 descriptions: We can use the language itself to provide abstractions of common
60 patterns, making our code smaller.
62 \subsection{Breaking out of the Monad}
63 However, simply using the monad notation is not as easy as it sounds. The main
64 problem is that the Monad type class poses a number of limitations on the
65 bind operator \hs{>>}. Most importantly, it has the following type signature:
68 (>>) :: (Monad m) => m a -> m b -> m b
71 This means that any expression in our composition must use the same Monad
72 instance as its type, only the "return" value can be different between
75 Ideally, we would like the \hs{>>} operator to have a type like the following
77 type Stateful s r = s -> (s, r)
78 (>>) :: Stateful s1 r1 -> Stateful s2 r2 -> Stateful (s1, s2) r2
81 What we see here, is that when we compose two stateful functions (whose inputs
82 have already been applied, leaving just the state argument to be applied), the
83 result is again a stateful function whose state is composed of the two
84 \emph{substates}. The return value is simply the return value of the second
85 function, discarding the first (to preserve that one, the \hs{>>=} operator
88 There is a trick we can apply to change the signature of the \hs{>>} operator.
89 \small{GHC} does not require the bind operators to be part of the \hs{Monad}
90 type class, as long as it can use them to translate the do notation. This
91 means we can define our own \hs{>>} and \hs{>>=} operators, outside of the
92 \hs{Monad} typeclass. This does conflict with the existing methods of the
93 \hs{Monad} typeclass, so we should prevent \small{GHC} from loading those (and
94 all of the Prelude) by passing \type{-XNoImplicitPrelude} to \type{ghc}. This
95 is slightly inconvenient, but since we hardly using anything from the prelude,
96 this is not a big problem. We might even be able to replace some of the
97 Prelude with hardware-translateable versions by doing this.
99 We can now define the following binding operators. For completeness, we also
100 supply the return function, which is not called by \small{GHC} implicitely,
101 but can be called explicitely by a hardware description.
104 (>>) :: Stateful s1 r1 -> Stateful s2 r2 -> Stateful (s1, s2) r2
105 f1 >> f2 = f1 >>= \_ -> f2
107 (>>=) :: Stateful s1 r1 -> (r1 -> Stateful s2 r2) -> Stateful (s1, s2) r2
108 f1 >>= f2 = \(s1, s2) -> let (s1', r1) = f1 s1
112 return :: r -> Stateful () r
113 return r = \s -> (s, r)
116 As you can see, this closely resembles the boilerplate of unpacking state,
117 passing it to two functions and repacking the new state. With these
118 definitions, we could have writting \in{example}[ex:NestedState] a lot
119 shorter, see \in{example}[ex:DoState]. In this example the type signature of
120 foo is the same (though it is now written using the \hs{Stateful} type
121 synonym, it is still completely equivalent to the original: \hs{foo :: Word ->
122 FooState -> (FooState, Word)}.
124 Note that the \hs{FooState} type has changed (so indirectly the type of
125 \hs{foo} as well). Since the state composition by the \hs{>>} works on two
126 stateful functions at a time, the final state consists of nested two-tuples.
127 The final \hs{()} in the state originates from the fact that the \hs{return}
128 function has no real state, but is part of the composition. We could have left
129 out the return statement (and the \hs{outb <-} part) to make \hs{foo}'s return
130 value equal to \hs{funcb}'s, but this approach makes it clearer what is
133 \startbuffer[DoState]
134 type FooState = ( AState, (BState, ()) )
135 foo :: Word -> Stateful FooState Word
141 \placeexample[here][ex:DoState]{Simple function composing two stateful
142 functions, using do notation.}
143 {\typebufferhs{DoState}}
145 An important implication of this approach is that the order of writing
146 function applications affects the state type. Fortunately, this problem can be
147 localized by consistently using type synonyms for state types, which should
148 prevent changes in other function's source when a function changes.
150 A less obvous implications of this approach is that the scope of variables
151 produced by each of these expressions (using the \hs{<-} syntax) is limited to
152 the expressions that come after it. This prevents values from flowing between
153 two functions (components) in two directions. For most Monad instances, this
154 is a requirement, but here it could have been different.
156 \todo{Add examples or reference for state synonyms}
158 \subsection{Alternative syntax}
159 Because of these typing issues, misusing Haskell's do notation is probably not
160 the best solution here. However, it does show that using fairly simple
161 abstractions, we could hide a lot of the boilerplate code. Extending
162 \small{GHC} with some new syntax sugar similar to the do notation might be a
165 \section[sec:future:pipelining]{Improved notation or abstraction for pipelining}
166 Since pipelining is a very common optimization for hardware systems, it should
167 be easy to specify a pipelined system. Since it introduces quite some registers
168 into an otherwise regular combinatoric system, we might look for some way to
169 abstract away some of the boilerplate for pipelining.
171 Something similar to the state boilerplate removal above might be appropriate:
172 Abstract away some of the boilerplate code using combinators, then hide away
173 the combinators in special syntax. The combinators will be slightly different,
174 since there is a (typing) distinction between a pipeline stage and a pipeline
175 consisting of multiple stages. Also, it seems necessary to treat either the
176 first or the last pipeline stage differently, to prevent an off-by-one error
177 in the amount of registers (which is similar to the extra \hs{()} state type
178 in \in{example}[ex:DoState], which is harmless there, but would be a problem
179 if it introduced an extra, useless, pipeline stage).
181 This problem is slightly more complex than the problem we've seen before. One
182 significant difference is that each variable that crosses a stage boundary
183 needs a register. However, when a variable crosses multiple stage boundaries,
184 it must be stored for a longer period and should receive multiple registers.
185 Since we can't find out from the combinator code where the result of the
186 combined values is used (at least not without using Template Haskell to
187 inspect the \small{AST}), there seems to be no easy way to find how much
188 registers are needed.
190 There seem to be two obvious ways of handling this problem:
193 \item Limit the scoping of each variable produced by a stage to the next
194 stage only. This means that any variables that are to be used in subsequent
195 stages should be passed on explicitely, which should allocate the required
198 This produces cumbersome code, where there is still a lot of explicitness
199 (though this could be hidden in syntax sugar).
200 \todo{The next sentence is unclear}
201 \item Scope each variable over every subsequent pipeline stage and allocate
202 the maximum amount of registers that \emph{could} be needed. This means we
203 will allocate registers that are never used, but those could be optimized
204 away later. This does mean we need some way to introduce a variable number
205 of variables (depending on the total number of stages), assign the output of
206 a different register to each (\eg., a different part of the state) and scope
207 a different one of them over each the subsequent stages.
209 This also means that when adding a stage to an existing pipeline will change
210 the state type of each of the subsequent pipeline stages, and the state type
211 ofthe added stage depends on the number of subsequent stages.
213 Properly describing this will probably also require quite explicit syntax,
214 meaning this is not feasible without some special syntax.
217 Some other interesting issues include pipeline stages which are already
218 stateful, mixing pipelined with normal computation, etc.
221 The main problems of recursion have been described in
222 \in{section}[sec:recursion]. In the current implementation, recursion is
223 therefore not possible, instead we rely on a number of implicitly list-recursive
226 Since recursion is a very important and central concept in functional
227 programming, it would very much improve the flexibility and elegance of our
228 hardware descriptions if we could support (full) recursion.
230 For this, there are two main problems to solve:
233 \item For list recursion, how to make a description type check? It is quite
234 possible that improvements in the \small{GHC} typechecker will make this
235 possible, though it will still stay a challenge. Further advances in
236 dependent typing support for Haskell will probably help here as well.
238 \todo{Reference Christiaan and other type-level work
239 (http://personal.cis.strath.ac.uk/conor/pub/she/)}
240 \item For all recursion, there is the obvious challenge of deciding when
241 recursion is finished. For list recursion, this might be easier (Since the
242 base case of the recursion influences the type signatures). For general
243 recursion, this requires a complete set of simplification and evaluation
244 transformations to prevent infinite expansion. The main challenge here is how
245 to make this set complete, or at least define the constraints on possible
246 recursion that guarantee it will work.
248 \todo{Reference Christian for loop unrolling?}
251 \section{Multiple clock domains and asynchronicity}
252 Cλash currently only supports synchronous systems with a single clock domain.
253 In a lot of real-world systems, both of these limitations pose problems.
255 There might be asynchronous events to which a system wants to respond. The
256 most obvious asynchronous event is of course a reset signal. Though a reset
257 signal can be synchronous, that is less flexible (and a hassle to describe in
258 Cλash, currently). Since every function in Cλash describes the behaviour on
259 each cycle boundary, we really can't fit in asynchronous behaviour easily.
261 Due to the same reason, multiple clock domains cannot be easily supported. There is
262 currently no way for the compiler to know in which clock domain a function
263 should operate and since the clock signal is never explicit, there is also no
264 way to express circuits that synchronize various clock domains.
266 A possible way to express more complex timing behaviour would be to make
267 functions more generic event handlers, where the system generates a stream of
268 events (Like \quote{clock up}, \quote{clock down}, \quote{input A changed},
269 \quote{reset}, etc.). When working with multiple clock domains, each domain
270 could get its own clock events.
272 As an example, we would have something like the following:
275 data Event = ClockUp | Reset | ...
277 type MainState = State Word
280 initstate :: MainState
283 main :: Word -> Event -> MainState -> (MainState, Word)
284 main inp event (State acc) = (State acc', acc')
287 -- On a reset signal, reset the accumulator and output
289 -- On a normal clock cycle, accumulate the result of func
290 ClockUp -> acc + (func inp event)
291 -- On any other event, leave state and output unchanged
294 -- func is some combinatoric expression
295 func :: Word -> Event -> Word
296 func inp _ = inp * 2 + 3
301 In this example, we see that every function takes an input of type
302 \hs{Event}. The function \hs{main} that takes the output of
303 \hs{func} and accumulates it on every clock cycle. On a reset signal, the
304 accumulator is reset. The function \hs{func} is just a combinatoric
305 function, with no synchronous elements. We can see this because there is no
306 state and the event input is completely ignored. If the compiler is smart
307 enough, we could even leave the event input out for functions that don't use
308 it, either because they are completely combinatoric (like in this example), or
309 because they rely on the the caller to select the clock signal.
311 This structure is similar to the event handling structure used to perform I/O
312 in languages like Amanda. \todo{ref} There is a top level case expression that
313 decides what to do depending on the current input event.
315 A slightly more complex example that shows a system with two clock domains.
316 There is no real integration between the clock domains (there is one input and
317 one output for each clock domain), but it does show how multiple clocks can be
321 data Event = ClockUpA | ClockUpB | ...
323 type MainState = State (SubAState, SubBState)
326 initstate :: MainState
327 initstate = State (initsubastate, initsubbstate)
329 main :: Word -> Word -> Event -> MainState -> (MainState, Word, Word)
330 main inpa inpb event (State (sa, sb)) = (State (sa', sb'), outa, outb)
332 -- Only update the substates if the corresponding clock has an up
340 -- Always call suba and subb, so we can always have a value for our output
342 (sa'', outa) = suba inpa sa
343 (sb'', outb) = subb inpb sb
345 type SubAState = State ...
346 suba :: Word -> SubAState -> (SubAState, Word)
349 type SubBState = State ...
350 subb :: Word -> SubAState -> (SubAState, Word)
354 Note that in the above example the \hs{suba} and \hs{subb} functions are
355 \emph{always} called, to get at their combinatoric outputs. The new state
356 is calculated as well, but only saved when the right clock has an up
359 As you can see, there is some code duplication in the case statement that
360 selects the right clock. One of the advantages of an explicit approach like
361 this, is that some of this duplication can be extracted away into helper
362 functions. For example, we could imagine a \hs{select_clock} function, which
363 takes a stateful function that is not aware of events, and changes it into a
364 function that only updates its state on a specific (clock) event.
367 select_clock :: Event
368 -> (input -> State s -> (State s, output))
369 -> (input -> State s -> Event -> (State s, output))
370 select_clock clock func inp state event = (state', out)
372 state' = if clock == event then state'' else state
373 (state'', out) = func inp state
375 main :: Word -> Word -> Event -> MainState -> (MainState, Word, Word)
376 main inpa inpb event (State (sa, sb)) = (State (sa', sb'), outa, outb)
378 (sa'', outa) = (select_clock ClockUpA suba) inpa sa event
379 (sb'', outb) = (select_clock ClockUpB subb) inpb sb event
382 As you can see, this can greatly reduce the length of the main function, while
383 increasing the readability. As you might also have noted, the select\_clock
384 function takes any stateful function from the current Cλash prototype and
385 turns it into an event-aware function!
387 Going along this route for more complex timing behaviour seems promising,
388 espicially since it seems possible to express very advanced timing behaviours,
389 while still allowing simple functions without any extra overhead when complex
390 behaviour is not needed.
392 The main cost of this approach will probably be extra complexity in the
393 compiler: The paths (state) data can take become very non-trivial, and it
394 is probably hard to properly analyze these paths and produce the
395 intended \VHDL description.
397 \section{Multiple cycle descriptions}
398 In the current Cλash prototype, every description is a single-cycle
399 description. In other words, every function describes behaviour for each
400 separate cycle and any recursion (or folds, maps, etc.) is expanded in space.
402 Sometimes, you might want to have a complex description that could possibly
403 take multiple cycles. Some examples include:
406 \item Some kind of map or fold over a list that could be expanded in time
407 instead of space. This would result in a function that describes n cycles
408 instead of just one, where n is the lenght of the list.
409 \item A large combinatoric expressions that would introduce a very long
410 combinatoric path and thus limit clock frequency. Such an expression could
411 be broken up into multiple stages, which effectively results in a pipelined
412 system (see also \in{section}[sec:future:pipelining]) with a known delay.
413 There should probably be some way for the developer to specify the cycle
414 division of the expression, since automatically deciding on such a division
415 is too complex and contains too many tradeoffs, at least initially.
416 \item Unbounded recursion. It is not possible to expand unbounded (or even
417 recursion with a depth that is not known at compile time) in space, since
418 there is no way to tell how much hardware to create (it might even be
421 When expanding infinite recursion over time, each step of the recursion can
422 simply become a single clockcycle. When expanding bounded but unknown
423 recursion, we probably need to add an extra data valid output bit or
427 Apart from translating each of these multiple cycle descriptions into a per-cycle
428 description, we also need to somehow match the type signature of the multiple
429 cycle description to the type signature of the single cycle description that
430 the rest of the system expects (assuming that the rest of the system is
431 described in the \quote{normal} per-cycle manner). For example, an infinitely
432 recursive expression typically has the return type \lam{[a]}, while the rest
433 of the system would expect just \lam{a} (since the recursive expression
434 generates just a single element each cycle).
436 Naively, this matching could be done using a (builtin) function with a
437 signature like \lam{[a] -> a}, which also serves as an indicator to the
438 compiler that some expanding over time is required. However, this poses a
439 problem for simulation: How will our Haskell implementation of this magical
440 builtin function know which element of the input list to return. This
441 obviously depends on the current cycle number, but there is no way for this
442 function to know the current cycle without breaking all kinds of safety and
443 purity properties. Making this function have some state input and output could
444 help, though this state is not present in the actual hardware (or perhaps
445 there is some state, if there are value passed from one recursion step to the
446 next, but how can the type of that state be determined by the typechecker?).
448 It seems that this is the most pressing problem for multicycle descriptions:
449 How to interface with the rest of the system, without sacrificing safety and
450 simulation capabilities?
452 \section{Higher order values in state}
453 Another interesting idea is putting a higher order value inside a function's
454 state value. Since we can use higher order values anywhere, why not in the
457 As a (contrived) example, consider the following accumulator:
460 -- The accumulator function that takes a word and returns a new accumulator
461 -- and the result so far. This is the function we want to put inside the
463 type Acc = Word -> (Acc, Word)
464 acc = \a -> (\b -> acc ( a + b ), a )
466 main :: Word -> State Acc -> (State Acc, Word)
467 main a s = (State s', out)
468 where (s', out) = s a
471 This code uses a function as its state, which implicitely contains the value
472 accumulated so far. This is a fairly trivial example, that is more easy to
473 write with a simple \hs{Word} state value, but for more complex descriptions
474 this style might pay off. Note that in a way we are using the
475 \emph{continuation passing style} of writing functions, where we pass a
476 sort of \emph{continuation} from one cycle to the next.
478 Our normalization process completely removes higher order values inside a
479 function by moving applications and higher order values around so that every
480 higher order value will eventually be full applied. However, if we would put a
481 higher order value inside our state, the path from the higher order value
482 definition to its application runs through the state, which is external to the
483 function. A higher order value defined in one cycle is not applied until a
484 later cycle. Our normalization process only works within the function, so it
485 cannot remove this use of a higher order value.
487 However, we cannot leave a higher order value in our state value, since it is
488 impossible to generate a register containing a higher order value, we simply
489 can't translate a function type to a hardware type. To solve this, we must
490 replace the higher order value inside our state with some other value
491 representing these higher order values.
493 On obvious approach here is to use an algebraic datatype where each
494 constructor represents one possible higher order value that can end up in the
495 state and each constructor has an argument for each free variable of the
496 higher order value replaced. This allows us to completely reconstruct the
497 higher order value at the spot where it is applied, and thus the higher order
500 This approach is commonly known as the \quote{Reynolds approach to
501 defuntionalization}, first described by J.C. Reynolds and seems to apply well
502 to this situation. One note here is that Reynolds' approach puts all the
503 higher order values in a single datatype. For a typed language, we will at
504 least have to have a single datatype for each function type, since we can't
505 mix them. It would be even better to split these datatypes a bit further, so
506 that such a datatype will never hold a constructor that is never used for a
507 particular state variable. This separation is probably a non-trivial problem,
510 \todo{Reference "Definitional interpreters for higher-order programming
512 http://portal.acm.org/citation.cfm?id=805852\&dl=GUIDE\&coll=GUIDE\&CFID=58835435\&CFTOKEN=81856623}
514 \section{New language}
515 During the development of the prototype, it has become clear that Haskell is
516 not the perfect language for this work. There are two main problems:
518 \item Haskell is not expressive enough. Haskell is still quite new in the area
519 of dependent typing, something we use extensively. Also, Haskell has some
520 special syntax to write monadic composition and arrow composition, which is
521 well suited to normal Haskell programs. However, for our hardware
522 descriptions, we could use some similar, but other, syntactic sugar (as we
524 \item Haskell is too expressive. There are some things that Haskell can
525 express, but we can never properly translate. There are certain types (both
526 primitive types and certain type constructions like recursive types) we can
527 never translate, as well as certain forms of recursion.
530 It might be good to reevaluate the choice of language for Cλash, perhaps there
531 are other languages which are better suited to describe hardware, now that
532 we've learned a lot about it.
534 Perhaps Haskell (or some other language) can be extended by using a
535 preprocessor. There has been some (point of proof) work on a the Strathclyde
536 Haskell Enhancement (\small{SHE}) preprocessor for type-level programming,
537 perhaps we can extend that, or create something similar for hardware-specific
540 It is not unlikely that the most flexible way
541 forward would be to define a completely new language with exactly the needed
542 features. This is of course an enormous effort, which should not be taken
545 \section{Don't care values}
546 A powerful value in \VHDL is the \emph{don't care} value, given as
547 \type{'-'}. This value tells the compiler that you don't really care about
548 which value is assigned to a signal, allowing the compiler to make some
549 optimizations. Since hardware choice in hardware is often implemented using
550 a collection of logic gates instead of multiplexers only, synthesizers can
551 often reduce the amount of hardware needed by smartly choosing values for
552 these don't care cases.
554 There is not really anything comparable with don't care values in normal
555 programming languages. The closest thing is an undefined or uninitialized
556 value, though those are usually associated with error conditions.
558 It would be useful if Cλash also has some way to specify don't care values.
559 When looking at the Haskell typing system, there are really two ways to do
563 \item Add a special don't care value to every datatype. This includes the
564 obvious \hs{Bit} type, but will also need to include every user defined
565 type. An exception can be made for vectors and product types, since those
566 can simply use the don't care values of the types they contain.
568 This would also require some kind of \quote{Dont careable} type class
569 that allows each type to specify what its don't care value is. The
570 compiler must then recognize this constant and replace it with don't care
571 values in the final \VHDL code.
573 This is of course a very intrusive solution. Every type must become member
574 of this typeclass, and there is now some member in every type that is a
575 special don't care value. Guaranteeing the obvious don't care semantics
576 also becomes harder, since every pattern match or case statement must now
577 also take care of the don't care value (this might actually be an
578 advantage, since it forces designers to specify how to handle don't care
579 for different operations).
581 \item Use the special \hs{undefined}, or \emph{bottom} value present in
582 Haskell. This is a type that is member of all types automatically, without
583 any explicit declarations.
585 Any expression that requires evaluation of an undefined value
586 automatically becomes undefined itself (or rather, there is some exception
587 mechanism). Since Haskell is lazy, this means that whenever it tries to
588 evaluate undefined, it is apparently required for determining the output
589 of the system. This property is useful, since typically a don't care
590 output is used when some output is not valid and should not be read. If
591 it is in fact not read, it should never be evaluated and simulation should
594 In practice, this works less ideal. In particular, pattern matching is not
595 always smart enough to deal with undefined. Consider the following
596 definition of an \hs{and} operator:
599 or :: Bit -> Bit -> Bit
605 When using the \hs{and} operation on an undefined (don't care) and a Low
606 value should always return a Low value. Its value does not depend on the
607 value chosen for the don't care value. However, though when applying the
608 above and function to \hs{Low} and \hs{undefined} results in exactly that
609 behviour, the result is \hs{undefined} when the arguments are swapped.
610 This is because the first pattern forces the first argument to be
611 evaluated. If it is \hs{undefined}, evaluation is halted and an exception
612 is show, which is not what is intended.
615 These options should be explored further to see if they provide feasible
616 methods for describing don't care conditions. Possibly there are completely
617 other methods which work better.
619 \section{Correctness proofs of the normalization system}
620 As stated in \in{section}[sec:normalization:properties], there are a
621 number of properties we would like to see verified about the
622 normalization system. In particular, the \emph{termination} and
623 \emph{completeness} of the system would be a good candidate for future
624 research. Specifying formal semantics for the Core language in
625 order to verify the \emph{soundness} of the system would be an even more
627 % vim: set sw=2 sts=2 expandtab: