\chapter[chap:future]{Future work}

\section{Improved notation for hierarchical state}
The hierarchical state model requires a fair amount of boilerplate code for
unpacking and distributing the input state and for collecting and repacking
the output state. There is really only one way in which this state handling
can be done, so it would make sense to hide this boilerplate. This would
incur no flexibility cost at all, since there is no other way that would
work.
There are a number of options here: misuse the do notation in Haskell,
create some abstraction in Haskell, or add new syntax.
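As an illustration of this boilerplate, consider a hypothetical composition
of two stateful subfunctions (all names here, such as \hs{funca} and
\hs{funcb}, are invented for this sketch):

\starthaskell
-- Sketch of the boilerplate for hierarchical state. funca and funcb are
-- assumed to be stateful subfunctions; every line in the where clause
-- does nothing but unpack, distribute and repack state.
type MainState = State (FuncAState, FuncBState)

main :: Word -> MainState -> (MainState, Word)
main inp (State (sa, sb)) = (State (sa', sb'), out)
  where
    (sa', outa) = funca inp sa
    (sb', out)  = funcb outa sb
\stophaskell

Since the \hs{where} clause is fully determined by the types involved, this
is exactly the kind of code that could be hidden from the developer.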
\section[sec:future:pipelining]{Improved notation or abstraction for pipelining}
Since pipelining is a very common optimization for hardware systems, it should
be easy to specify a pipelined system. Since pipelining introduces quite a few
registers into an otherwise regular combinatoric system, we might look for
some way to abstract away this boilerplate.
A naive description of a pipelined system inserts these registers explicitly,
by hand. Using monadic do notation to hide them does not fit the typing of
our hardware descriptions, but using custom combinators could work.
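As an illustration, a naive description of a two-stage pipeline could look
like the following sketch (\hs{stage1} and \hs{stage2} are assumed to be
ordinary combinatoric functions):

\starthaskell
-- Hypothetical two-stage pipeline: each element of the state is a
-- pipeline register, written out by hand.
type PipeState = State (Word, Word)

pipeline :: Word -> PipeState -> (PipeState, Word)
pipeline inp (State (r1, r2)) = (State (r1', r2'), out)
  where
    r1' = stage1 inp  -- register after the first stage
    r2' = stage2 r1   -- register after the second stage
    out = r2          -- the output appears two cycles after its input
\stophaskell

The registers and their wiring follow a fixed pattern, which is what an
improved notation or abstraction should capture.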
\section{Recursion}
The main problems of recursion have been described in
\at{section}[sec:recursion]. In the current implementation, recursion is
therefore not possible; instead, we rely on a number of implicitly
list-recursive built-in functions.

Since recursion is a very important and central concept in functional
programming, it would very much improve the flexibility and elegance of our
hardware descriptions if we could support full recursion.
For this, there are two main problems to solve:

\startitemize
\item For list recursion, how do we make a description type check? It is quite
possible that improvements in the \small{GHC} typechecker will make this
possible, though it will remain a challenge. Further advances in
dependent typing support for Haskell will probably help here as well.

TODO: Reference Christiaan and other type-level work
(http://personal.cis.strath.ac.uk/conor/pub/she/)
\item For all recursion, there is the obvious challenge of deciding when
recursion is finished. For list recursion, this might be easier, since the
base case of the recursion influences the type signatures. For general
recursion, this requires a complete set of simplification and evaluation
transformations to prevent infinite expansion. The main challenge here is how
to make this set complete, or at least how to define the constraints on
possible recursion that guarantee it will work.
\stopitemize
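As an example of the type checking problem, consider a straightforward
recursive sum (a sketch only; the \hs{Vector} type and its operations are
assumptions here):

\starthaskell
-- Although this recursion is bounded by the vector length, the current
-- typechecker cannot see that the length decreases with each recursive
-- call, so it cannot type check this description, let alone guarantee
-- that expanding it in space terminates.
vsum :: Vector n Word -> Word
vsum xs = if vnull xs
  then 0
  else vhead xs + vsum (vtail xs)
\stophaskell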
\section{Multiple clock domains and asynchronicity}
Cλash currently only supports synchronous systems with a single clock domain.
In a lot of real-world systems, both of these limitations pose problems.

There might be asynchronous events to which a system wants to respond. The
most obvious asynchronous event is of course a reset signal. Though a reset
signal can be synchronous, that is less flexible (and currently a hassle to
describe in Cλash). Since every function in Cλash describes the behaviour at
each cycle boundary, we cannot easily fit in asynchronous behaviour.
For the same reason, multiple clock domains cannot be supported. There is
currently no way for the compiler to know in which clock domain a function
should operate, and since the clock signal is never explicit, there is also no
way to express circuits that synchronize various clock domains.
A possible way to express more complex timing behaviour would be to make
functions generic event handlers, where the system generates a stream of
events (like \quote{clock up}, \quote{clock down}, \quote{input A changed},
\quote{reset}, etc.). When working with multiple clock domains, each domain
could get its own events.

As an example, we would have something like the following:
\starthaskell
data Event = ClockUp | Reset | ...

type MainState = State Word

-- Initial state of main
initstate :: MainState
initstate = State 0

main :: Word -> Event -> MainState -> (MainState, Word)
main inp event (State acc) = (State acc', acc')
  where
    acc' = case event of
      -- On a reset signal, reset the accumulator and output
      Reset   -> 0
      -- On a normal clock cycle, accumulate the result of func
      ClockUp -> acc + (func inp event)
      -- On any other event, leave state and output unchanged
      _       -> acc

-- func is some combinatoric expression
func :: Word -> Event -> Word
func inp _ = inp * 2 + 3
\stophaskell
In this example, we see that every function takes an input of type
\hs{Event}. The function \hs{main} takes the output of \hs{func} and
accumulates it on every clock cycle. On a reset signal, the accumulator is
reset. The function \hs{func} is just a combinatoric function, with no
synchronous elements. We can see this because there is no state and the event
input is completely ignored. If the compiler is smart enough, we could even
leave the event input out for functions that do not use it, either because
they are completely combinatoric (like in this example), or because they rely
on the caller to select the clock signal.
This structure is similar to the event handling structure used to perform I/O
in languages like Amanda. TODO: Ref. There is a top level case expression that
decides what to do depending on the current input event.
A slightly more complex example shows a system with two clock domains. There
is no real integration between the clock domains (there is one input and one
output for each clock domain), but it does show how multiple clocks can be
handled.
\starthaskell
data Event = ClockUpA | ClockUpB | ...

type MainState = State (SubAState, SubBState)

-- Initial state of main
initstate :: MainState
initstate = State (initsubastate, initsubbstate)

main :: Word -> Word -> Event -> MainState -> (MainState, Word, Word)
main inpa inpb event (State (sa, sb)) = (State (sa', sb'), outa, outb)
  where
    -- Only update the substates if the corresponding clock has an up
    -- transition
    (sa', sb') = case event of
      ClockUpA -> (sa'', sb)
      ClockUpB -> (sa, sb'')
      _        -> (sa, sb)
    -- Always call suba and subb, so we can always have a value for our output
    (sa'', outa) = suba inpa sa
    (sb'', outb) = subb inpb sb

type SubAState = State ...
suba :: Word -> SubAState -> (SubAState, Word)
suba = ...

type SubBState = State ...
subb :: Word -> SubBState -> (SubBState, Word)
subb = ...
\stophaskell
Note that in the above example the \hs{suba} and \hs{subb} functions are
\emph{always} called, to get at their combinatoric outputs. The new state
is calculated as well, but only saved when the right clock has an up
transition.
As you can see, there is some code duplication in the case expression that
selects the right clock. One of the advantages of an explicit approach like
this is that some of this duplication can be factored out into helper
functions. For example, we could imagine a \hs{select_clock} function, which
takes a stateful function that is not aware of events and changes it into a
function that only updates its state on a specific (clock) event.
\starthaskell
select_clock :: Event
                -> (input -> State s -> (State s, output))
                -> (input -> State s -> Event -> (State s, output))
select_clock clock func inp state event = (state', out)
  where
    state' = if clock == event then state'' else state
    (state'', out) = func inp state

main :: Word -> Word -> Event -> MainState -> (MainState, Word, Word)
main inpa inpb event (State (sa, sb)) = (State (sa', sb'), outa, outb)
  where
    (sa', outa) = (select_clock ClockUpA suba) inpa sa event
    (sb', outb) = (select_clock ClockUpB subb) inpb sb event
\stophaskell
As you can see, this can greatly reduce the length of the main function, while
increasing its readability. As you might also have noted, the
\hs{select_clock} function takes any stateful function from the current Cλash
prototype and turns it into an event-aware function!
Going along this route for more complex timing behaviour seems promising,
especially since it seems possible to express very advanced timing behaviours,
while still allowing simple functions without any extra overhead when complex
behaviour is not needed.
The main cost of this approach will probably be extra complexity in the
compiler: the paths that (state) data can take become very non-trivial, and it
is probably hard to properly analyze these paths and produce the intended
VHDL description.
\section{Multiple cycle descriptions}
In the current Cλash prototype, every description is a single-cycle
description. In other words, every function describes behaviour for each
separate cycle, and any recursion (or folds, maps, etc.) is expanded in space.
Sometimes, you might want to have a complex description that could possibly
take multiple cycles. Some examples include:

\startitemize
\item Some kind of map or fold over a list that could be expanded in time
instead of space. This would result in a function that describes n cycles
instead of just one, where n is the length of the list.
\item A large combinatoric expression that would introduce a very long
combinatoric path and thus limit the clock frequency. Such an expression could
be broken up into multiple stages, which effectively results in a pipelined
system (see also \at{section}[sec:future:pipelining]) with a known delay.
There should probably be some way for the developer to specify the cycle
division of the expression, since automatically deciding on such a division
is too complex and contains too many tradeoffs, at least initially.
\item Unbounded recursion. It is not possible to expand unbounded recursion
(or even recursion with a depth that is not known at compile time) in space,
since there is no way to tell how much hardware to create (it might even be
infinite).

When expanding infinite recursion over time, each step of the recursion can
simply become a single clock cycle. When expanding bounded but unknown
recursion, we probably need to add an extra data valid output bit or
something similar.
\stopitemize
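As a sketch of the first option, a fold expanded in time could keep an index
and an accumulator in its state and process one element per cycle. All names
and the \hs{Vector}, \hs{Index} and \hs{Bit} types are assumptions for this
sketch:

\starthaskell
-- Hypothetical fold-over-time: instead of n adders in space, a single
-- adder is reused for n cycles. The state holds the current index and
-- the running sum; done indicates when the output is valid.
type FoldState = State (Index, Word)

sumfold :: Vector n Word -> FoldState -> (FoldState, Word, Bit)
sumfold xs (State (i, acc)) = (State (i + 1, acc'), acc', done)
  where
    acc' = acc + (xs ! i)  -- process one element in this cycle
    done = if i == maxBound then High else Low
\stophaskell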
Apart from translating each of these multiple cycle descriptions into a
per-cycle description, we also need to somehow match the type signature of the
multiple cycle description to the type signature of the single cycle
description that the rest of the system expects (assuming that the rest of the
system is described in the \quote{normal} per-cycle manner). For example, an
infinitely recursive expression typically has the return type \lam{[a]}, while
the rest of the system would expect just \lam{a} (since the recursive
expression generates just a single element each cycle).
Naively, this matching could be done using a (built-in) function with a
signature like \lam{[a] -> a}, which also serves as an indicator to the
compiler that some expanding over time is required. However, this poses a
problem for simulation: how will our Haskell implementation of this magical
built-in function know which element of the input list to return? This
obviously depends on the current cycle number, but there is no way for this
function to know the current cycle without breaking all kinds of safety and
purity properties. Making this function have some state input and output could
help, though this state is not present in the actual hardware (or perhaps
there is some state, if there are values passed from one recursion step to the
next, but how can the type of that state be determined by the typechecker?).
It seems that this is the most pressing problem for multicycle descriptions:
how to interface with the rest of the system, without sacrificing safety and
simulation capabilities?
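To make the simulation problem concrete, the hypothetical built-in could look
like this (\hs{everyCycle} is an invented name, not part of the prototype):

\starthaskell
-- Hypothetical marker built-in: in hardware it is just a wire from the
-- recursive expression to the rest of the system. In simulation, a pure
-- implementation can only ever return a fixed element (here, the head),
-- which illustrates the problem: the correct element depends on the
-- current cycle number, which a pure function cannot observe.
everyCycle :: [a] -> a
everyCycle (x:_) = x
\stophaskell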
\section{Higher order values in state}
Another interesting idea is putting a higher order value inside a function's
state value. Since we can use higher order values anywhere, why not in the
state as well?

As a (contrived) example, consider the following accumulator:
\starthaskell
-- The accumulator function that takes a word and returns a new accumulator
-- and the result so far. This is the function we want to put inside the
-- state.
type Acc = Word -> (Acc, Word)
acc = \a -> (\b -> acc (a + b), a)

main :: Word -> State Acc -> (State Acc, Word)
main a (State f) = (State f', out)
  where (f', out) = f a
\stophaskell
This code uses a function as its state, which implicitly contains the value
accumulated so far. This is a fairly trivial example, which is easier to
write with a simple \hs{Word} state value, but for more complex descriptions
this style might pay off. Note that in a way we are using the
\emph{continuation passing style} of writing functions, where we pass a
sort of \emph{continuation} from one cycle to the next.
Our normalization process completely removes higher order values inside a
function by moving applications and higher order values around, so that every
higher order value will eventually be fully applied. However, if we put a
higher order value inside our state, the path from the higher order value's
definition to its application runs through the state, which is external to the
function. A higher order value defined in one cycle is not applied until a
later cycle. Our normalization process only works within a single function, so
it cannot remove this use of a higher order value.
However, we cannot leave a higher order value in our state value, since it is
impossible to generate a register containing a higher order value: we simply
cannot translate a function type to a hardware type. To solve this, we must
replace the higher order value inside our state with some other value
representing these higher order values.
One obvious approach here is to use an algebraic datatype where each
constructor represents one possible higher order value that can end up in the
state, and each constructor has an argument for each free variable of the
higher order value it replaces. This allows us to completely reconstruct the
higher order value at the spot where it is applied, and thus to remove the
higher order value from the state.
This approach is commonly known as the \quote{Reynolds approach to
defunctionalization}, first described by J.C. Reynolds, and it seems to apply
well to this situation. One note here is that Reynolds' approach puts all the
higher order values in a single datatype. For a typed language, we will at
least have to have a single datatype for each function type, since we cannot
mix them. It would be even better to split these datatypes a bit further, so
that such a datatype will never hold a constructor that is never used for a
particular state variable. This separation is probably a non-trivial problem,
however.
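Applied to the accumulator example above, this approach could look as follows
(a sketch; the names \hs{AccData} and \hs{apply} are invented):

\starthaskell
-- Reynolds-style defunctionalization of the Acc example: the higher
-- order state value is replaced by a datatype with one constructor per
-- possible closure, holding that closure's free variables. Here, AccC
-- holds the total accumulated so far.
data AccData = AccC Word

-- apply reconstructs the behaviour of the original higher order value
apply :: AccData -> Word -> (AccData, Word)
apply (AccC total) b = (AccC (total + b), total + b)

main :: Word -> State AccData -> (State AccData, Word)
main a (State f) = (State f', out)
  where (f', out) = apply f a
\stophaskell

Since \hs{AccData} is an ordinary algebraic datatype, it can be translated to
a hardware type and stored in a register.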
TODO: Reference "Definitional interpreters for higher-order programming
languages"
http://portal.acm.org/citation.cfm?id=805852\&dl=GUIDE\&coll=GUIDE\&CFID=58835435\&CFTOKEN=81856623
\section{New language}
During the development of the prototype, it has become clear that Haskell is
not the perfect language for this work. There are two main problems:
\startitemize
\item Haskell is not expressive enough. Haskell is still quite new in the area
of dependent typing, something we use extensively. Also, Haskell has some
special syntax to write monadic composition and arrow composition, which is
well suited to normal Haskell programs. However, for our hardware
descriptions, we could use some similar, but different, syntactic sugar (as we
have seen).
\item Haskell is too expressive. There are some things that Haskell can
express but that we can never properly translate: certain types (both
primitive types and certain type constructions like recursive types), as well
as certain forms of recursion.
\stopitemize
It might be good to re-evaluate the choice of language for Cλash; perhaps
there are other languages that are better suited to describe hardware, now
that we have learned a lot about it.
Perhaps Haskell (or some other language) can be extended by using a
preprocessor. There has been some (proof of concept) work on the Strathclyde
Haskell Enhancement (\small{SHE}) preprocessor for type-level programming;
perhaps we can extend that, or create something similar for hardware-specific
notation.
It is not unlikely that the most flexible way forward would be to define a
completely new language with exactly the needed features. This is of course an
enormous effort, which should not be taken lightly.