\chapter[chap:future]{Future work}

\section{Improved notation for hierarchical state}
The hierarchical state model requires a fair amount of boilerplate code for
unpacking and distributing the input state and for collecting and repacking
the output state. There is really only one way in which this state handling
can be done, so it would make sense to hide this boilerplate. This would
incur no flexibility cost at all, since there is no other way that would
work.
There are a number of options here: misuse the do notation in Haskell,
create some abstraction in Haskell, or add new syntax.
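As an illustration of this boilerplate, consider a hypothetical composition
of two stateful subfunctions (all names here, such as \hs{funca} and
\hs{funcb}, are invented for this sketch):

\starthaskell
-- Sketch of the boilerplate for hierarchical state. funca and funcb are
-- assumed to be stateful subfunctions; every line in the where clause
-- does nothing but unpack, distribute and repack state.
type MainState = State (FuncAState, FuncBState)

main :: Word -> MainState -> (MainState, Word)
main inp (State (sa, sb)) = (State (sa', sb'), out)
  where
    (sa', outa) = funca inp sa
    (sb', out)  = funcb outa sb
\stophaskell

Since the \hs{where} clause is fully determined by the types involved, this
is exactly the kind of code that could be hidden from the developer.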
\section[sec:future:pipelining]{Improved notation or abstraction for pipelining}
Since pipelining is a very common optimization for hardware systems, it should
be easy to specify a pipelined system. Since pipelining introduces quite a few
registers into an otherwise regular combinatoric system, we might look for
some way to abstract away this boilerplate.
A naive description of a pipelined system inserts these registers explicitly,
by hand. Using monadic do notation to hide them does not fit the typing of
our hardware descriptions, but using custom combinators could work.
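As an illustration, a naive description of a two-stage pipeline could look
like the following sketch (\hs{stage1} and \hs{stage2} are assumed to be
ordinary combinatoric functions):

\starthaskell
-- Hypothetical two-stage pipeline: each element of the state is a
-- pipeline register, written out by hand.
type PipeState = State (Word, Word)

pipeline :: Word -> PipeState -> (PipeState, Word)
pipeline inp (State (r1, r2)) = (State (r1', r2'), out)
  where
    r1' = stage1 inp  -- register after the first stage
    r2' = stage2 r1   -- register after the second stage
    out = r2          -- the output appears two cycles after its input
\stophaskell

The registers and their wiring follow a fixed pattern, which is what an
improved notation or abstraction should capture.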
\section{Recursion}
The main problems of recursion have been described in
\at{section}[sec:recursion]. In the current implementation, recursion is
therefore not possible; instead, we rely on a number of implicitly
list-recursive built-in functions.

Since recursion is a very important and central concept in functional
programming, it would very much improve the flexibility and elegance of our
hardware descriptions if we could support full recursion.
For this, there are two main problems to solve:

\startitemize
\item For list recursion, how do we make a description type check? It is quite
possible that improvements in the \small{GHC} typechecker will make this
possible, though it will remain a challenge. Further advances in
dependent typing support for Haskell will probably help here as well.

TODO: Reference Christiaan and other type-level work
(http://personal.cis.strath.ac.uk/conor/pub/she/)
\item For all recursion, there is the obvious challenge of deciding when
recursion is finished. For list recursion, this might be easier, since the
base case of the recursion influences the type signatures. For general
recursion, this requires a complete set of simplification and evaluation
transformations to prevent infinite expansion. The main challenge here is how
to make this set complete, or at least how to define the constraints on
possible recursion that guarantee it will work.
\stopitemize
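As an example of the type checking problem, consider a straightforward
recursive sum (a sketch only; the \hs{Vector} type and its operations are
assumptions here):

\starthaskell
-- Although this recursion is bounded by the vector length, the current
-- typechecker cannot see that the length decreases with each recursive
-- call, so it cannot type check this description, let alone guarantee
-- that expanding it in space terminates.
vsum :: Vector n Word -> Word
vsum xs = if vnull xs
  then 0
  else vhead xs + vsum (vtail xs)
\stophaskell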
\section{Multiple clock domains and asynchronicity}
Cλash currently only supports synchronous systems with a single clock domain.
In a lot of real-world systems, both of these limitations pose problems.

There might be asynchronous events to which a system wants to respond. The
most obvious asynchronous event is of course a reset signal. Though a reset
signal can be synchronous, that is less flexible (and currently a hassle to
describe in Cλash). Since every function in Cλash describes the behaviour at
each cycle boundary, we cannot easily fit in asynchronous behaviour.
For the same reason, multiple clock domains cannot be supported. There is
currently no way for the compiler to know in which clock domain a function
should operate, and since the clock signal is never explicit, there is also no
way to express circuits that synchronize various clock domains.
A possible way to express more complex timing behaviour would be to make
functions generic event handlers, where the system generates a stream of
events (like \quote{clock up}, \quote{clock down}, \quote{input A changed},
\quote{reset}, etc.). When working with multiple clock domains, each domain
could get its own events.

As an example, we would have something like the following:
\starthaskell
data Event = ClockUp | Reset | ...

type MainState = State Word

-- Initial state of main
initstate :: MainState
initstate = State 0

main :: Word -> Event -> MainState -> (MainState, Word)
main inp event (State acc) = (State acc', acc')
  where
    acc' = case event of
      -- On a reset signal, reset the accumulator and output
      Reset   -> 0
      -- On a normal clock cycle, accumulate the result of func
      ClockUp -> acc + (func inp event)
      -- On any other event, leave state and output unchanged
      _       -> acc

-- func is some combinatoric expression
func :: Word -> Event -> Word
func inp _ = inp * 2 + 3
\stophaskell
In this example, we see that every function takes an input of type
\hs{Event}. The function \hs{main} takes the output of \hs{func} and
accumulates it on every clock cycle. On a reset signal, the accumulator is
reset. The function \hs{func} is just a combinatoric function, with no
synchronous elements. We can see this because there is no state and the event
input is completely ignored. If the compiler is smart enough, we could even
leave the event input out for functions that do not use it, either because
they are completely combinatoric (like in this example), or because they rely
on the caller to select the clock signal.
This structure is similar to the event handling structure used to perform I/O
in languages like Amanda. TODO: Ref. There is a top level case expression that
decides what to do depending on the current input event.
A slightly more complex example shows a system with two clock domains. There
is no real integration between the clock domains (there is one input and one
output for each clock domain), but it does show how multiple clocks can be
handled.
\starthaskell
data Event = ClockUpA | ClockUpB | ...

type MainState = State (SubAState, SubBState)

-- Initial state of main
initstate :: MainState
initstate = State (initsubastate, initsubbstate)

main :: Word -> Word -> Event -> MainState -> (MainState, Word, Word)
main inpa inpb event (State (sa, sb)) = (State (sa', sb'), outa, outb)
  where
    -- Only update the substates if the corresponding clock has an up
    -- transition
    (sa', sb') = case event of
      ClockUpA -> (sa'', sb)
      ClockUpB -> (sa, sb'')
      _        -> (sa, sb)
    -- Always call suba and subb, so we can always have a value for our output
    (sa'', outa) = suba inpa sa
    (sb'', outb) = subb inpb sb

type SubAState = State ...
suba :: Word -> SubAState -> (SubAState, Word)
suba = ...

type SubBState = State ...
subb :: Word -> SubBState -> (SubBState, Word)
subb = ...
\stophaskell
Note that in the above example the \hs{suba} and \hs{subb} functions are
\emph{always} called, to get at their combinatoric outputs. The new state
is calculated as well, but only saved when the right clock has an up
transition.
As you can see, there is some code duplication in the case expression that
selects the right clock. One of the advantages of an explicit approach like
this is that some of this duplication can be factored out into helper
functions. For example, we could imagine a \hs{select_clock} function, which
takes a stateful function that is not aware of events and changes it into a
function that only updates its state on a specific (clock) event.
\starthaskell
select_clock :: Event
                -> (input -> State s -> (State s, output))
                -> (input -> State s -> Event -> (State s, output))
select_clock clock func inp state event = (state', out)
  where
    state' = if clock == event then state'' else state
    (state'', out) = func inp state

main :: Word -> Word -> Event -> MainState -> (MainState, Word, Word)
main inpa inpb event (State (sa, sb)) = (State (sa', sb'), outa, outb)
  where
    (sa', outa) = (select_clock ClockUpA suba) inpa sa event
    (sb', outb) = (select_clock ClockUpB subb) inpb sb event
\stophaskell
As you can see, this can greatly reduce the length of the main function, while
increasing its readability. As you might also have noted, the
\hs{select_clock} function takes any stateful function from the current Cλash
prototype and turns it into an event-aware function!
Going along this route for more complex timing behaviour seems promising,
especially since it seems possible to express very advanced timing behaviours,
while still allowing simple functions without any extra overhead when complex
behaviour is not needed.
The main cost of this approach will probably be extra complexity in the
compiler: the paths that (state) data can take become very non-trivial, and it
is probably hard to properly analyze these paths and produce the intended
VHDL description.
\section{Multiple cycle descriptions}
In the current Cλash prototype, every description is a single-cycle
description. In other words, every function describes behaviour for each
separate cycle, and any recursion (or folds, maps, etc.) is expanded in space.
Sometimes, you might want to have a complex description that could possibly
take multiple cycles. Some examples include:

\startitemize
\item Some kind of map or fold over a list that could be expanded in time
instead of space. This would result in a function that describes n cycles
instead of just one, where n is the length of the list.
\item A large combinatoric expression that would introduce a very long
combinatoric path and thus limit the clock frequency. Such an expression could
be broken up into multiple stages, which effectively results in a pipelined
system (see also \at{section}[sec:future:pipelining]) with a known delay.
There should probably be some way for the developer to specify the cycle
division of the expression, since automatically deciding on such a division
is too complex and contains too many tradeoffs, at least initially.
\item Unbounded recursion. It is not possible to expand unbounded recursion
(or even recursion with a depth that is not known at compile time) in space,
since there is no way to tell how much hardware to create (it might even be
infinite).

When expanding infinite recursion over time, each step of the recursion can
simply become a single clock cycle. When expanding bounded but unknown
recursion, we probably need to add an extra data valid output bit or
something similar.
\stopitemize
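As a sketch of the first option, a fold expanded in time could keep an index
and an accumulator in its state and process one element per cycle. All names
and the \hs{Vector}, \hs{Index} and \hs{Bit} types are assumptions for this
sketch:

\starthaskell
-- Hypothetical fold-over-time: instead of n adders in space, a single
-- adder is reused for n cycles. The state holds the current index and
-- the running sum; done indicates when the output is valid.
type FoldState = State (Index, Word)

sumfold :: Vector n Word -> FoldState -> (FoldState, Word, Bit)
sumfold xs (State (i, acc)) = (State (i + 1, acc'), acc', done)
  where
    acc' = acc + (xs ! i)  -- process one element in this cycle
    done = if i == maxBound then High else Low
\stophaskell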
Apart from translating each of these multiple cycle descriptions into a
per-cycle description, we also need to somehow match the type signature of the
multiple cycle description to the type signature of the single cycle
description that the rest of the system expects (assuming that the rest of the
system is described in the \quote{normal} per-cycle manner). For example, an
infinitely recursive expression typically has the return type \lam{[a]}, while
the rest of the system would expect just \lam{a} (since the recursive
expression generates just a single element each cycle).
Naively, this matching could be done using a (built-in) function with a
signature like \lam{[a] -> a}, which also serves as an indicator to the
compiler that some expanding over time is required. However, this poses a
problem for simulation: how will our Haskell implementation of this magical
built-in function know which element of the input list to return? This
obviously depends on the current cycle number, but there is no way for this
function to know the current cycle without breaking all kinds of safety and
purity properties. Making this function have some state input and output could
help, though this state is not present in the actual hardware (or perhaps
there is some state, if there are values passed from one recursion step to the
next, but how can the type of that state be determined by the typechecker?).
It seems that this is the most pressing problem for multicycle descriptions:
how to interface with the rest of the system, without sacrificing safety and
simulation capabilities?
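To make the simulation problem concrete, the hypothetical built-in could look
like this (\hs{everyCycle} is an invented name, not part of the prototype):

\starthaskell
-- Hypothetical marker built-in: in hardware it is just a wire from the
-- recursive expression to the rest of the system. In simulation, a pure
-- implementation can only ever return a fixed element (here, the head),
-- which illustrates the problem: the correct element depends on the
-- current cycle number, which a pure function cannot observe.
everyCycle :: [a] -> a
everyCycle (x:_) = x
\stophaskell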
\section{Higher order values in state}
Another interesting idea is putting a higher order value inside a function's
state value. Since we can use higher order values anywhere, why not in the
state as well?

As a (contrived) example, consider the following accumulator:
\starthaskell
-- The accumulator function that takes a word and returns a new accumulator
-- and the result so far. This is the function we want to put inside the
-- state.
type Acc = Word -> (Acc, Word)
acc = \a -> (\b -> acc (a + b), a)

main :: Word -> State Acc -> (State Acc, Word)
main a (State f) = (State f', out)
  where (f', out) = f a
\stophaskell
This code uses a function as its state, which implicitly contains the value
accumulated so far. This is a fairly trivial example, which is easier to
write with a simple \hs{Word} state value, but for more complex descriptions
this style might pay off. Note that in a way we are using the
\emph{continuation passing style} of writing functions, where we pass a
sort of \emph{continuation} from one cycle to the next.
Our normalization process completely removes higher order values inside a
function by moving applications and higher order values around, so that every
higher order value will eventually be fully applied. However, if we put a
higher order value inside our state, the path from the higher order value's
definition to its application runs through the state, which is external to the
function. A higher order value defined in one cycle is not applied until a
later cycle. Our normalization process only works within a single function, so
it cannot remove this use of a higher order value.
However, we cannot leave a higher order value in our state value, since it is
impossible to generate a register containing a higher order value: we simply
cannot translate a function type to a hardware type. To solve this, we must
replace the higher order value inside our state with some other value
representing these higher order values.
One obvious approach here is to use an algebraic datatype where each
constructor represents one possible higher order value that can end up in the
state, and each constructor has an argument for each free variable of the
higher order value it replaces. This allows us to completely reconstruct the
higher order value at the spot where it is applied, and thus to remove the
higher order value from the state.
This approach is commonly known as the \quote{Reynolds approach to
defunctionalization}, first described by J.C. Reynolds, and it seems to apply
well to this situation. One note here is that Reynolds' approach puts all the
higher order values in a single datatype. For a typed language, we will at
least have to have a single datatype for each function type, since we cannot
mix them. It would be even better to split these datatypes a bit further, so
that such a datatype will never hold a constructor that is never used for a
particular state variable. This separation is probably a non-trivial problem,
however.
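Applied to the accumulator example above, this approach could look as follows
(a sketch; the names \hs{AccData} and \hs{apply} are invented):

\starthaskell
-- Reynolds-style defunctionalization of the Acc example: the higher
-- order state value is replaced by a datatype with one constructor per
-- possible closure, holding that closure's free variables. Here, AccC
-- holds the total accumulated so far.
data AccData = AccC Word

-- apply reconstructs the behaviour of the original higher order value
apply :: AccData -> Word -> (AccData, Word)
apply (AccC total) b = (AccC (total + b), total + b)

main :: Word -> State AccData -> (State AccData, Word)
main a (State f) = (State f', out)
  where (f', out) = apply f a
\stophaskell

Since \hs{AccData} is an ordinary algebraic datatype, it can be translated to
a hardware type and stored in a register.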
TODO: Reference "Definitional interpreters for higher-order programming
languages"
http://portal.acm.org/citation.cfm?id=805852\&dl=GUIDE\&coll=GUIDE\&CFID=58835435\&CFTOKEN=81856623
\section{New language}
During the development of the prototype, it has become clear that Haskell is
not the perfect language for this work. There are two main problems:
\startitemize
\item Haskell is not expressive enough. Haskell is still quite new in the area
of dependent typing, something we use extensively. Also, Haskell has some
special syntax to write monadic composition and arrow composition, which is
well suited to normal Haskell programs. However, for our hardware
descriptions, we could use some similar, but different, syntactic sugar (as we
have seen).
\item Haskell is too expressive. There are some things that Haskell can
express but that we can never properly translate: certain types (both
primitive types and certain type constructions like recursive types), as well
as certain forms of recursion.
\stopitemize
It might be good to re-evaluate the choice of language for Cλash; perhaps
there are other languages that are better suited to describe hardware, now
that we have learned a lot about it.
Perhaps Haskell (or some other language) can be extended by using a
preprocessor. There has been some (proof of concept) work on the Strathclyde
Haskell Enhancement (\small{SHE}) preprocessor for type-level programming;
perhaps we can extend that, or create something similar for hardware-specific
notation.
It is not unlikely that the most flexible way forward would be to define a
completely new language with exactly the needed features. This is of course an
enormous effort, which should not be taken lightly.