1 \chapter[chap:future]{Future work}
2 \section{Improved notation for hierarchical state}
3 The hierarchic state model requires quite some boilerplate code for unpacking
4 and distributing the input state and collecting and repacking the output
7 \in{Example}[ex:NestedState] shows a simple composition of two stateful
8 functions, \hs{funca} and \hs{funcb}. The state annotation using the
9 \hs{State} newtype has been left out, for clarity and because the proposed
10 approach below cannot handle that (yet).
12 \startbuffer[NestedState]
13 type FooState = ( AState, BState )
14 foo :: Word -> FooState -> (FooState, Word)
18 (sa', outa) = funca in sa
19 (sb', outb) = funcb outa sb
22 \placeexample[here][ex:NestedState]{Simple function composing two stateful
23 functions.}{\typebufferhs{NestedState}}
25 Since state handling always follows strict rules, there is really no other way
26 in which this state handling can be done (you can of course move some things
27 around from the where clause to the pattern or vice versa, but it would still
28 be effectively the same thing). This means it makes extra sense to hide this
29 boilerplate away. This would incur no flexibility cost at all, since there are
30 no other ways that would work.
32 One particular notation in Haskell that seemed promising, whas he \hs{do}
33 notation. This is meant to simplify Monad notation by hiding away some
34 details. It allows one to write a list of expressions, which are composited
35 using the monadic \emph{bind} operator, written in Haskell as \hs{>>}. For
44 will be desugared into:
47 (somefunc a b) >> (otherfunc b c)
50 There is also the \hs{>>=} operator, which allows for passing variables from
51 one expression to the next. If we could use this notation to compose a
52 stateful computation from a number of other stateful functions, this could
53 move all the boilerplate code into the \hs{>>} operator. Perhaps the compiler
54 should be taught to always inline the \hs{>>} operator, but after that there
55 should be no further changes required to the compiler.
57 This is highlights an important aspect of using a functional language for our
58 descriptions: We can use the language itself to provide abstractions of common
59 patterns, making our code smaller.
61 \subsection{Outside the Monad}
62 However, simply using the monad notation is not as easy as it sounds. The main
63 problem is that the Monad type class poses a number of limitations on the
64 bind operator \hs{>>}. Most importantly, it has the following type signature:
67 (>>) :: (Monad m) => m a -> m b -> m b
70 This means that any expression in our composition must use the same Monad
71 instance as its type, only the "return" value can be different between
74 Ideally, we would like the \hs{>>} operator to have a type like the following
76 type Stateful s r = s -> (s, r)
77 (>>) :: Stateful s1 r1 -> Stateful s2 r2 -> Stateful (s1, s2) r2
80 What we see here, is that when we compose two stateful functions (whose inputs
81 have already been applied, leaving just the state argument to be applied), the
82 result is again a stateful function whose state is composed of the two
83 \emph{substates}. The return value is simply the return value of the second
84 function, discarding the first (to preserve that one, the \hs{>>=} operator
87 There is a trick we can apply to change the signature of the \hs{>>} operator.
88 \small{GHC} does not require the bind operators to be part of the \hs{Monad}
89 type class, as long as it can use them to translate the do notation. This
90 means we can define our own \hs{>>} and \hs{>>=} operators, outside of the
91 \hs{Monad} typeclass. This does conflict with the existing methods of the
92 \hs{Monad} typeclass, so we should prevent \small{GHC} from loading those (and
93 all of the Prelude) by passing \type{-XNoImplicitPrelude} to \type{ghc}. This
94 is slightly inconvenient, but since we hardly using anything from the prelude,
95 this is not a big problem. We might even be able to replace some of the
96 Prelude with hardware-translateable versions by doing this.
98 We can now define the following binding operators. For completeness, we also
99 supply the return function, which is not called by \small{GHC} implicitely,
100 but can be called explicitely by a hardware description.
103 (>>) :: Stateful s1 r1 -> Stateful s2 r2 -> Stateful (s1, s2) r2
104 f1 >> f2 = f1 >>= \_ -> f2
106 (>>=) :: Stateful s1 r1 -> (r1 -> Stateful s2 r2) -> Stateful (s1, s2) r2
107 f1 >>= f2 = \(s1, s2) -> let (s1', r1) = f1 s1
111 return :: r -> Stateful () r
112 return r = \s -> (s, r)
115 As you can see, this closely resembles the boilerplate of unpacking state,
116 passing it to two functions and repacking the new state. With these
117 definitions, we could have writting \in{example}[ex:NestedState] a lot
118 shorter, see \in{example}[ex:DoState]. In this example the type signature of
119 foo is the same (though it is now written using the \hs{Stateful} type
120 synonym, it is still completel equivalent to the original: \hs{foo :: Word ->
121 FooState -> (FooState, Word)}.
123 Note that the \hs{FooState} type has changed (so indirectly the type of
124 \hs{foo} as well). Since the state composition by the \hs{>>} works on two
125 stateful functions at a time, the final state consists of nested two-tuples.
126 The final \hs{()} in the state originates from the fact that the \hs{return}
127 function has no real state, but is part of the composition. We could have left
128 out the return statement (and the \hs{outb <-} part) to make \hs{foo}'s return
129 value equal to \hs{funcb}'s, but this approach makes it clearer what is
132 \startbuffer[DoState]
133 type FooState = ( AState, (BState, ()) )
134 foo :: Word -> Stateful FooState Word
137 outb <- funcb outa sb
140 \placeexample[here][ex:DoState]{Simple function composing two stateful
141 functions, using do notation.}
142 {\typebufferhs{DoState}}
144 An important implication of this approach is that the order of writing
145 function applications affects the state type. Fortunately, this problem can be
146 localized by consistently using type synonyms for state types, which should
147 prevent changes in other function's source when a function changes.
149 A less obvous implications of this approach is that the scope of variables
150 produced by each of these expressions (using the \hs{<-} syntax) is limited to
151 the expressions that come after it. This prevents values from flowing between
152 two functions (components) in two directions. For most Monad instances, this
153 is a requirement, but here it could have been different.
155 \subsection{Alternative syntax}
156 Because of the above issues, misusing Haskell's do notation is probably not
157 the best solution here. However, it does show that using fairly simple
158 abstractions, we could hide a lot of the boilerplate code. Extending
159 \small{GHC} with some new syntax sugar similar to the do notation might be a
162 \section[sec:future:pipelining]{Improved notation or abstraction for pipelining}
163 Since pipelining is a very common optimization for hardware systems, it should
164 be easy to specify a pipelined system. Since it involves quite some registers
165 into an otherwise regular combinatoric system, we might look for some way to
166 abstract away some of the boilerplate for pipelining.
168 Something similar to the state boilerplate removal above might be appropriate:
169 Abstract away some of the boilerplate code using combinators, then hide away
170 the combinators in special syntax. The combinators will be slightly different,
171 since there is a (typing) distinction between a pipeline stage and a pipeline
172 consisting of multiple stages. Also, it seems necessary to treat either the
173 first or the last pipeline stage differently, to prevent an off-by-one error
174 in the amount of registers (which is similar to the extra \hs{()} state type
175 in \in{example}[ex:DoState], which is harmless there, but would be a problem
176 if it introduced an extra, useless, pipeline stage).
178 This problem is slightly more complex than the problem we've seen before. One
179 significant difference is that each variable that crosses a stage boundary
180 needs a registers. However, when a variable crosses multiple stage boundaries,
181 it must be stored for a longer period and should receive multiple registers.
182 Since we can't find out from the combinator code where the result of the
183 combined values is used (at least not without using Template Haskell to
184 inspect the \small{AST}), there seems to be no easy way to find how much
185 registers are needed.
187 There seem to be two obvious ways of handling this problem:
190 \item Limit the scoping of each variable produced by a stage to the next
191 stage only. This means that any variables that are to be used in subsequent
192 stages should be passed on explicitely, which should allocate the required
195 This produces cumbersome code, where there is still a lot of explicitness
196 (though this could be hidden in syntax sugar).
197 \item Scope each variable over every subsequent pipeline stage and allocate
198 the maximum amount of registers that \emph{could} be needed. This means we
199 will allocate registers that are never used, but those could be optimized
200 away later. This does mean we need some way to introduce a variable number
201 of variables (depending on the total number of stages), assign the output of
202 a different register to each (\eg., a different part of the state) and scope
203 a different one of them over each the subsequent stages.
205 This also means that when adding a stage to an existing pipeline will change
206 the state type of each of the subsequent pipeline stages, and the state type
207 ofthe added stage depends on the number of subsequent stages.
209 Properly describing this will probably also require quite explicit syntax,
210 meaning this is not feasible without some special syntax.
213 Some other interesting issues include pipeline stages which are already
214 stateful, mixing pipelined with normal computation, etc.
217 The main problems of recursion have been described in
218 \in{section}[sec:recursion]. In the current implementation, recursion is
219 therefore not possible, instead we rely on a number of implicit list-recursive
222 Since recursion is a very important and central concept in functional
223 programming, it would very much improve the flexibility and elegance of our
224 hardware descriptions if we could support full recursion.
226 For this, there are two main problems to solve:
229 \item For list recursion, how to make a description type check? It is quite
230 possible that improvements in the \small{GHC} typechecker will make this
231 possible, though it will still stay a challenge. Further advances in
232 dependent typing support for Haskell will probably help here as well.
234 TODO: Reference Christiaan and other type-level work
235 (http://personal.cis.strath.ac.uk/conor/pub/she/)
236 \item For all recursion, there is the obvious challenge of deciding when
237 recursion is finished. For list recursion, this might be easier (Since the
238 base case of the recursion influences the type signatures). For general
239 recursion, this requires a complete set of simplification and evaluation
240 transformations to prevent infinite expansion. The main challenge here is how
241 to make this set complete, or at least define the constraints on possible
242 recursion which guarantee it will work.
247 \section{Multiple clock domains and asynchronicity}
248 Cλash currently only supports synchronous systems with a single clock domain.
249 In a lot of real-world systems, both of these limitations pose problems.
251 There might be asynchronous events to which a system wants to respond. The
252 most obvious asynchronous event is of course a reset signal. Though a reset
253 signal can be synchronous, that is less flexible (and a hassle to describe in
254 Cλash, currently). Since every function in Cλash describes the behaviour on
255 each cycle boundary, we really can't fit in asynchronous behaviour easily.
257 Due to the same reason, multiple clock domains cannot be supported. There is
258 currently no way for the compiler to know in which clock domain a function
259 should operate and since the clock signal is never explicit, there is also no
260 way to express circuits that synchronize various clock domains.
262 A possible way to express more complex timing behaviour would be to make
263 functions more generic event handlers, where the system generates a stream of
264 events (Like \quote{clock up}, \quote{clock down}, \quote{input A changed},
265 \quote{reset}, etc.). When working with multiple clock domains, each domain
266 could get its own events.
268 As an example, we would have something like the following:
271 data Event = ClockUp | Reset | ...
273 type MainState = State Word
276 initstate :: MainState
279 main :: Word -> Event -> MainState -> (MainState, Word)
280 main inp event (State acc) = (State acc', acc')
283 -- On a reset signal, reset the accumulator and output
285 -- On a normal clock cycle, accumulate the result of func
286 ClockUp -> acc + (func inp event)
287 -- On any other event, leave state and output unchanged
290 -- func is some combinatoric expression
291 func :: Word -> Event -> Word
292 func inp _ = inp * 2 + 3
297 In this example, we see that every function takes an input of type
298 \hs{Event}. The function \hs{main} that takes the output of
299 \hs{func} and accumulates it on every clock cycle. On a reset signal, the
300 accumulator is reset. The function \hs{func} is just a combinatoric
301 function, with no synchronous elements. We can see this because there is no
302 state and the event input is completely ignored. If the compiler is smart
303 enough, we could even leave the event input out for functions that don't use
304 it, either because they are completely combinatoric (like in this example), or
305 because they rely on the the caller to select the clock signal.
307 This structure is similar to the event handling structure used to perform I/O
308 in languages like Amanda. TODO: Ref. There is a top level case expression that
309 decides what to do depending on the current input event.
311 A slightly more complex example that shows a system with two clock domains.
312 There is no real integration between the clock domains (there is one input and
313 one output for each clock domain), but it does show how multiple clocks can be
317 data Event = ClockUpA | ClockUpB | ...
319 type MainState = State (SubAState, SubBState)
322 initstate :: MainState
323 initstate = State (initsubastate, initsubbstate)
325 main :: Word -> Word -> Event -> MainState -> (MainState, Word, Word)
326 main inpa inpb event (State (sa, sb)) = (State (sa', sb'), outa, outb)
328 -- Only update the substates if the corresponding clock has an up
336 -- Always call suba and subb, so we can always have a value for our output
338 (sa'', outa) = suba inpa sa
339 (sb'', outb) = subb inpb sb
341 type SubAState = State ...
342 suba :: Word -> SubAState -> (SubAState, Word)
345 type SubBState = State ...
346 subb :: Word -> SubAState -> (SubAState, Word)
350 Note that in the above example the \hs{suba} and \hs{subb} functions are
351 \emph{always} called, to get at their combinatoric outputs. The new state
352 is calculated as well, but only saved when the right clock has an up
355 As you can see, there is some code duplication in the case statement that
356 selects the right clock. One of the advantages of an explicit approach like
357 this, is that some of this duplication can be extracted away into helper
358 functions. For example, we could imagine a \hs{select_clock} function, which
359 takes a stateful function that is not aware of events, and changes it into a
360 function that only updates its state on a specific (clock) event.
363 select_clock :: Event
364 -> (input -> State s -> (State s, output))
365 -> (input -> State s -> Event -> (State s, output))
366 select_clock clock func inp state event = (state', out)
368 state' = if clock == event then state'' else state
369 (state'', out) = func inp state
371 main :: Word -> Word -> Event -> MainState -> (MainState, Word, Word)
372 main inpa inpb event (State (sa, sb)) = (State (sa', sb'), outa, outb)
374 (sa'', outa) = (select_clock ClockUpA suba) inpa sa event
375 (sb'', outb) = (select_clock ClockUpB subb) inpb sb event
378 As you can see, this can greatly reduce the length of the main function, while
379 increasing the readability. As you might also have noted, the select\_clock
380 function takes any stateful function from the current Cλash prototype and
381 turns it into an event-aware function!
383 Going along this route for more complex timing behaviour seems promising,
384 espicially since it seems possible to express very advanced timing behaviours,
385 while still allowing simple functions without any extra overhead when complex
386 behaviour is not needed.
388 The main cost of this approach will probably be extra complexity in the
389 compiler: The paths (state) data can take become very non-trivial, and it
390 is probably hard to properly analyze these paths and produce the intended VHDL
393 \section{Multiple cycle descriptions}
394 In the current Cλash prototype, every description is a single-cycle
395 description. In other words, every function describes behaviour for each
396 separate cycle and any recursion (or folds, maps, etc.) is expanded in space.
398 Sometimes, you might want to have a complex description that could possibly
399 take multiple cycles. Some examples include:
402 \item Some kind of map or fold over a list that could be expanded in time
403 instead of space. This would result in a function that describes n cycles
404 instead of just one, where n is the lenght of the list.
405 \item A large combinatoric expressions that would introduce a very long
406 combinatoric path and thus limit clock frequency. Such an expression could
407 be broken up into multiple stages, which effectively results in a pipelined
408 system (see also \in{section}[sec:future:pipelining]) with a known delay.
409 There should probably be some way for the developer to specify the cycle
410 division of the expression, since automatically deciding on such a division
411 is too complex and contains too many tradeoffs, at least initially.
412 \item Unbounded recursion. It is not possible to expand unbounded (or even
413 recursion with a depth that is not known at compile time) in space, since
414 there is no way to tell how much hardware to create (it might even be
417 When expanding infinite recursion over time, each step of the recursion can
418 simply become a single clockcycle. When expanding bounded but unknown
419 recursion, we probably need to add an extra data valid output bit or
423 Apart from translating each of these multiple cycle descriptions into a per-cycle
424 description, we also need to somehow match the type signature of the multiple
425 cycle description to the type signature of the single cycle description that
426 the rest of the system expects (assuming that the rest of the system is
427 described in the \quote{normal} per-cycle manner). For example, an infinitely
428 recursive expression typically has the return type \lam{[a]}, while the rest
429 of the system would expect just \lam{a} (since the recursive expression
430 generates just a single element each cycle).
432 Naively, this matching could be done using a (builtin) function with a
433 signature like \lam{[a] -> a}, which also serves as an indicator to the
434 compiler that some expanding over time is required. However, this poses a
435 problem for simulation: How will our Haskell implementation of this magical
436 builtin function know which element of the input list to return. This
437 obviously depends on the current cycle number, but there is no way for this
438 function to know the current cycle without breaking all kinds of safety and
439 purity properties. Making this function have some state input and output could
440 help, though this state is not present in the actual hardware (or perhaps
441 there is some state, if there are value passed from one recursion step to the
442 next, but how can the type of that state be determined by the typechecker?).
444 It seems that this is the most pressing problem for multicycle descriptions:
445 How to interface with the rest of the system, without sacrificing safety and
446 simulation capabilities?
448 \section{Higher order values in state}
449 Another interesting idea is putting a higher order value inside a function's
450 state value. Since we can use higher order values anywhere, why not in the
453 As a (contrived) example, consider the following accumulator:
456 -- The accumulator function that takes a word and returns a new accumulator
457 -- and the result so far. This is the function we want to put inside the
459 type Acc = Word -> (Acc, Word)
460 acc = \a -> (\b -> acc ( a + b ), a )
462 main :: Word -> State Acc -> (State Acc, Word)
463 main a s = (State s', out)
464 where (s', out) = s a
467 This code uses a function as its state, which implicitely contains the value
468 accumulated so far. This is a fairly trivial example, that is more easy to
469 write with a simple \hs{Word} state value, but for more complex descriptions
470 this style might pay off. Note that in a way we are using the
471 \emph{continuation passing style} of writing functions, where we pass a
472 sort of \emph{continuation} from one cycle to the next.
474 Our normalization process completely removes higher order values inside a
475 function by moving applications and higher order values around so that every
476 higher order value will eventually be full applied. However, if we would put a
477 higher order value inside our state, the path from the higher order value
478 definition to its application runs through the state, which is external to the
479 function. A higher order value defined in one cycle is not applied until a
480 later cycle. Our normalization process only works within the function, so it
481 cannot remove this use of a higher order value.
483 However, we cannot leave a higher order value in our state value, since it is
484 impossible to generate a register containing a higher order value, we simply
485 can't translate a function type to a hardware type. To solve this, we must
486 replace the higher order value inside our state with some other value
487 representing these higher order values.
489 On obvious approach here is to use an algebraic datatype where each
490 constructor represents one possible higher order value that can end up in the
491 state and each constructor has an argument for each free variable of the
492 higher order value replaced. This allows us to completely reconstruct the
493 higher order value at the spot where it is applied, and thus the higher order
496 This approach is commonly known as the \quote{Reynolds approach to
497 defuntionalization}, first described by J.C. Reynolds and seems to apply well
498 to this situation. One note here is that Reynolds' approach puts all the
499 higher order values in a single datatype. For a typed language, we will at
500 least have to have a single datatype for each function type, since we can't
501 mix them. It would be even better to split these datatypes a bit further, so
502 that such a datatype will never hold a constructor that is never used for a
503 particular state variable. This separation is probably a non-trivial problem,
506 TODO: Reference "Definitional interpreters for higher-order programming
508 http://portal.acm.org/citation.cfm?id=805852\&dl=GUIDE\&coll=GUIDE\&CFID=58835435\&CFTOKEN=81856623
510 \section{New language}
511 During the development of the prototype, it has become clear that Haskell is
512 not the perfect language for this work. There are two main problems:
514 \item Haskell is not expressive enough. Haskell is still quite new in the area
515 of dependent typing, something we use extensively. Also, Haskell has some
516 special syntax to write monadic composition and arrow composition, which is
517 well suited to normal Haskell programs. However, for our hardware
518 descriptions, we could use some similar, but other, syntactic sugar (as we
520 \item Haskell is too expressive. There are some things that Haskell can
521 express, but we can never properly translate. There are certain types (both
522 primitive types and certain type constructions like recursive types) we can
523 never translate, as well as certain forms of recursion.
526 It might be good to reevaluate the choice of language for Cλash, perhaps there
527 are other languages which are better suited to describe hardware, now that
528 we've learned a lot about it.
530 Perhaps Haskell (or some other language) can be extended by using a
531 preprocessor. There has been some (point of proof) work on a the Strathclyde
532 Haskell Enhancement (\small{SHE}) preprocessor for type-level programming,
533 perhaps we can extend that, or create something similar for hardware-specific
536 It is not unlikely that the most flexible way
537 forward would be to define a completely new language with exactly the needed
538 features. This is of course an enormous effort, which should not be taken
541 \section{Don't care values}
542 A powerful value in \VHDL is the \emph{don't care} value, given as
543 \type{'-'}. This value tells the compiler that you don't really care about
544 which value is assigned to a signal, allowing the compiler to make some
545 optimizations. Since hardware choice in hardware is often implemented using
546 a collection of logic gates instead of multiplexers only, synthesizers can
547 often reduce the amount of hardware needed by smartly choosing values for
548 these don't care cases.
550 There is not really anything comparable with don't care values in normal
551 programming languages. The closest thing is an undefined or uninitialized
552 value, though those are usually associated with error conditions.
554 It would be useful if Cλash also has some way to specify don't care values.
555 When looking at the Haskell typing system, there are really two ways to do
559 \item Add a special don't care value to every datatype. This includes the
560 obvious \hs{Bit} type, but will also need to include every user defined
561 type. An exception can be made for vectors and product types, since those
562 can simply use the don't care values of the types they contain.
564 This would also require some kind of \quote{Dont careable} type class
565 that allows each type to specify what its don't care value is. The
566 compiler must then recognize this constant and replace it with don't care
567 values in the final \VHDL code.
569 This is of course a very intrusive solution. Every type must become member
570 of this typeclass, and there is now some member in every type that is a
571 special don't care value. Guaranteeing the obvious don't care semantics
572 also becomes harder, since every pattern match or case statement must now
573 also take care of the don't care value (this might actually be an
574 advantage, since it forces designers to specify how to handle don't care
575 for different operations).
577 \item Use the special \hs{undefined}, or \emph{bottom} value present in
578 Haskell. This is a type that is member of all types automatically, without
579 any explicit declarations.
581 Any expression that requires evaluation of an undefined value
582 automatically becomes undefined itself (or rather, there is some exception
583 mechanism). Since Haskell is lazy, this means that whenever it tries to
584 evaluate undefined, it is apparently required for determining the output
585 of the system. This property is useful, since typically a don't care
586 output is used when some output is not valid and should not be read. If
587 it is in fact not read, it should never be evaluated and simulation should
590 In practice, this works less ideal. In particular, pattern matching is not
591 always smart enough to deal with undefined. Consider the following
592 definition of an \hs{and} operator:
595 or :: Bit -> Bit -> Bit
601 When using the \hs{and} operation on an undefined (don't care) and a Low
602 value should always return a Low value. Its value does not depend on the
603 value chosen for the don't care value. However, though when applying the
604 above and function to \hs{Low} and \hs{undefined} results in exactly that
605 behviour, the result is \hs{undefined} when the arguments are swapped.
606 This is because the first pattern forces the first argument to be
607 evaluated. If it is \hs{undefined}, evaluation is halted and an exception
608 is show, which is not what is intended.
611 These options should be explored further to see if they provide feasible
612 methods for describing don't care conditions. Possibly there are completely
613 other methods which work better.