Make the normal form use only recursive lets again.

[matthijs/master-project/report.git] / Chapters / Future.tex
diff --git a/Chapters/Future.tex b/Chapters/Future.tex

index 420771f0f862a964e456ea5cea7a98f96a7fad59..6be0ba8be0d465adc19063ffa2030c9779d016cd 100644 (file)
--- a/Chapters/Future.tex
+++ b/Chapters/Future.tex
@@ -9,7 +9,7 @@ flexibility cost at all, since there are no other ways that would work.
  Options: Misuse the do notation in Haskell, create some abstraction in
  Haskell or add new syntax.
  
-\section{Improved notation or abstraction for pipelining}
+\section[sec:future:pipelining]{Improved notation or abstraction for pipelining}
  Since pipelining is a very common optimization for hardware systems, it should
  be easy to specify a pipelined system. Since it involves quite some registers
  into an otherwise regular combinatoric system, we might look for some way to
@@ -20,3 +20,328 @@ Example of naive pipelined code
  Using monadic do does not fit the typing
  
  Using custom combinators would work
+
+\section{Recursion}
+The main problems of recursion have been described in
+\in{section}[sec:recursion]. In the current implementation, recursion is
+therefore not possible, instead we rely on a number of implicit list-recursive
+builtin functions.
+
+Since recursion is a very important and central concept in functional
+programming, it would very much improve the flexibility and elegance of our
+hardware descriptions if we could support full recursion.
+
+For this, there are two main problems to solve:
+
+\startitemize
+\item For list recursion, how to make a description type check? It is quite
+possible that improvements in the \small{GHC} typechecker will make this
+possible, though it will still stay a challenge. Further advances in
+dependent typing support for Haskell will probably help here as well.
+
+TODO: Reference Christiaan and other type-level work
+(http://personal.cis.strath.ac.uk/conor/pub/she/)
+\item For all recursion, there is the obvious challenge of deciding when
+recursion is finished. For list recursion, this might be easier (Since the
+base case of the recursion influences the type signatures). For general
+recursion, this requires a complete set of simplification and evaluation
+transformations to prevent infinite expansion. The main challenge here is how
+to make this set complete, or at least define the constraints on possible
+recursion which guarantee it will work.
+
+TODO: Loop unrolling
+\stopitemize
+
+\section{Multiple clock domains and asynchronicity}
+Cλash currently only supports synchronous systems with a single clock domain.
+In a lot of real-world systems, both of these limitations pose problems.
+
+There might be asynchronous events to which a system wants to respond. The
+most obvious asynchronous event is of course a reset signal. Though a reset
+signal can be synchronous, that is less flexible (and a hassle to describe in
+Cλash, currently). Since every function in Cλash describes the behaviour on
+each cycle boundary, we really can't fit in asynchronous behaviour easily.
+
+Due to the same reason, multiple clock domains cannot be supported. There is
+currently no way for the compiler to know in which clock domain a function
+should operate and since the clock signal is never explicit, there is also no
+way to express circuits that synchronize various clock domains.
+
+A possible way to express more complex timing behaviour would be to make
+functions more generic event handlers, where the system generates a stream of
+events (Like \quote{clock up}, \quote{clock down}, \quote{input A changed},
+\quote{reset}, etc.). When working with multiple clock domains, each domain
+could get its own events.
+
+As an example, we would have something like the following:
+
+\starthaskell
+data Event = ClockUp | Reset | ...
+
+type MainState = State Word
+
+-- Initial state
+initstate :: MainState
+initstate = State 0
+
+main :: Word -> Event -> MainState -> (MainState, Word)
+main inp event (State acc) = (State acc', acc')
+  where
+    acc' = case event of
+      -- On a reset signal, reset the accumulator and output
+      Reset -> initstate
+      -- On a normal clock cycle, accumulate the result of func
+      ClockUp -> acc + (func inp event)
+      -- On any other event, leave state and output unchanged
+      _ -> acc
+
+-- func is some combinatoric expression
+func :: Word -> Event -> Word
+func inp _ = inp * 2 + 3
+\stophaskell
+
+TODO: Picture
+
+In this example, we see that every function takes an input of type
+\hs{Event}. The function \hs{main} that takes the output of
+\hs{func} and accumulates it on every clock cycle. On a reset signal, the
+accumulator is reset. The function \hs{func} is just a combinatoric
+function, with no synchronous elements.  We can see this because there is no
+state and the event input is completely ignored. If the compiler is smart
+enough, we could even leave the event input out for functions that don't use
+it, either because they are completely combinatoric (like in this example), or
+because they rely on the the caller to select the clock signal.
+
+This structure is similar to the event handling structure used to perform I/O
+in languages like Amanda. TODO: Ref. There is a top level case expression that
+decides what to do depending on the current input event.
+
+A slightly more complex example that shows a system with two clock domains.
+There is no real integration between the clock domains (there is one input and
+one output for each clock domain), but it does show how multiple clocks can be
+distinguished.
+
+\starthaskell
+data Event = ClockUpA | ClockUpB | ...
+
+type MainState = State (SubAState, SubBState)
+
+-- Initial state
+initstate :: MainState
+initstate = State (initsubastate, initsubbstate)
+
+main :: Word -> Word -> Event -> MainState -> (MainState, Word, Word)
+main inpa inpb event (State (sa, sb)) = (State (sa', sb'), outa, outb)
+  where
+    -- Only update the substates if the corresponding clock has an up
+    -- transition.
+    sa' = case event of
+      ClockUpA -> sa''
+      _ -> sa
+    sb' = case event of
+      ClockUpB -> sb''
+      _ -> sb
+    -- Always call suba and subb, so we can always have a value for our output
+    -- ports.
+    (sa'', outa) = suba inpa sa
+    (sb'', outb) = subb inpb sb
+
+type SubAState = State ...
+suba :: Word -> SubAState -> (SubAState, Word)
+suba = ...
+
+type SubBState = State ...
+subb :: Word -> SubAState -> (SubAState, Word)
+subb = ...
+\stophaskell
+
+Note that in the above example the \hs{suba} and \hs{subb} functions are
+\emph{always} called, to get at their combinatoric outputs. The new state
+is calculated as well, but only saved when the right clock has an up
+transition.
+
+As you can see, there is some code duplication in the case statement that
+selects the right clock. One of the advantages of an explicit approach like
+this, is that some of this duplication can be extracted away into helper
+functions. For example, we could imagine a \hs{select_clock} function, which
+takes a stateful function that is not aware of events, and changes it into a
+function that only updates its state on a specific (clock) event.
+
+\starthaskell
+select_clock :: Event
+                -> (input -> State s -> (State s, output))
+                -> (input -> State s -> Event -> (State s, output))
+select_clock clock func inp state event = (state', out)
+  where
+    state' = if clock == event then state'' else state
+    (state'', out) = func inp state
+
+main :: Word -> Word -> Event -> MainState -> (MainState, Word, Word)
+main inpa inpb event (State (sa, sb)) = (State (sa', sb'), outa, outb)
+  where
+    (sa'', outa) = (select_clock ClockUpA suba) inpa sa event
+    (sb'', outb) = (select_clock ClockUpB subb) inpb sb event
+\stophaskell
+
+As you can see, this can greatly reduce the length of the main function, while
+increasing the readability. As you might also have noted, the select\_clock
+function takes any stateful function from the current Cλash prototype and
+turns it into an event-aware function!
+
+Going along this route for more complex timing behaviour seems promising,
+espicially since it seems possible to express very advanced timing behaviours,
+while still allowing simple functions without any extra overhead when complex
+behaviour is not needed.
+
+The main cost of this approach will probably be extra complexity in the
+compiler: The paths (state) data can take become very non-trivial, and it
+is probably hard to properly analyze these paths and produce the intended VHDL
+description.
+
+\section{Multiple cycle descriptions}
+In the current Cλash prototype, every description is a single-cycle
+description. In other words, every function describes behaviour for each
+separate cycle and any recursion (or folds, maps, etc.) is expanded in space.
+
+Sometimes, you might want to have a complex description that could possibly
+take multiple cycles. Some examples include:
+
+\startitemize
+  \item Some kind of map or fold over a list that could be expanded in time
+  instead of space. This would result in a function that describes n cycles
+  instead of just one, where n is the lenght of the list.
+  \item A large combinatoric expressions that would introduce a very long
+  combinatoric path and thus limit clock frequency. Such an expression could
+  be broken up into multiple stages, which effectively results in a pipelined
+  system (see also \in{section}[sec:future:pipelining]) with a known delay.
+  There should probably be some way for the developer to specify the cycle
+  division of the expression, since automatically deciding on such a division
+  is too complex and contains too many tradeoffs, at least initially.
+  \item Unbounded recursion. It is not possible to expand unbounded (or even
+  recursion with a depth that is not known at compile time) in space, since
+  there is no way to tell how much hardware to create (it might even be
+  infinite).
+
+  When expanding infinite recursion over time, each step of the recursion can
+  simply become a single clockcycle. When expanding bounded but unknown
+  recursion, we probably need to add an extra data valid output bit or
+  something similar.
+\stopitemize
+
+Apart from translating each of these multiple cycle descriptions into a per-cycle
+description, we also need to somehow match the type signature of the multiple
+cycle description to the type signature of the single cycle description that
+the rest of the system expects (assuming that the rest of the system is
+described in the \quote{normal} per-cycle manner). For example, an infinitely
+recursive expression typically has the return type \lam{[a]}, while the rest
+of the system would expect just \lam{a} (since the recursive expression
+generates just a single element each cycle).
+
+Naively, this matching could be done using a (builtin) function with a
+signature like \lam{[a] -> a}, which also serves as an indicator to the
+compiler that some expanding over time is required. However, this poses a
+problem for simulation: How will our Haskell implementation of this magical
+builtin function know which element of the input list to return. This
+obviously depends on the current cycle number, but there is no way for this
+function to know the current cycle without breaking all kinds of safety and
+purity properties. Making this function have some state input and output could
+help, though this state is not present in the actual hardware (or perhaps
+there is some state, if there are value passed from one recursion step to the
+next, but how can the type of that state be determined by the typechecker?).
+
+It seems that this is the most pressing problem for multicycle descriptions:
+How to interface with the rest of the system, without sacrificing safety and
+simulation capabilities?
+
+\section{Higher order values in state}
+Another interesting idea is putting a higher order value inside a function's
+state value. Since we can use higher order values anywhere, why not in the
+state?
+
+As a (contrived) example, consider the following accumulator:
+
+\starthaskell
+-- The accumulator function that takes a word and returns a new accumulator
+-- and the result so far. This is the function we want to put inside the
+-- state.
+type Acc = Word -> (Acc, Word)
+acc = \a -> (\b -> acc ( a + b ), a )
+
+main :: Word -> State Acc -> (State Acc, Word)
+main a s = (State s', out)
+  where (s', out) = s a
+\stophaskell
+
+This code uses a function as its state, which implicitely contains the value
+accumulated so far. This is a fairly trivial example, that is more easy to
+write with a simple \hs{Word} state value, but for more complex descriptions
+this style might pay off. Note that in a way we are using the
+\emph{continuation passing style} of writing functions, where we pass a
+sort of \emph{continuation} from one cycle to the next.
+
+Our normalization process completely removes higher order values inside a
+function by moving applications and higher order values around so that every
+higher order value will eventually be full applied. However, if we would put a
+higher order value inside our state, the path from the higher order value
+definition to its application runs through the state, which is external to the
+function. A higher order value defined in one cycle is not applied until a
+later cycle. Our normalization process only works within the function, so it
+cannot remove this use of a higher order value.
+
+However, we cannot leave a higher order value in our state value, since it is
+impossible to generate a register containing a higher order value, we simply
+can't translate a function type to a hardware type. To solve this, we must
+replace the higher order value inside our state with some other value
+representing these higher order values.
+
+On obvious approach here is to use an algebraic datatype where each
+constructor represents one possible higher order value that can end up in the
+state and each constructor has an argument for each free variable of the
+higher order value replaced. This allows us to completely reconstruct the
+higher order value at the spot where it is applied, and thus the higher order
+value disappears.
+
+This approach is commonly known as the \quote{Reynolds approach to
+defuntionalization}, first described by J.C. Reynolds and seems to apply well
+to this situation. One note here is that Reynolds' approach puts all the
+higher order values in a single datatype. For a typed language, we will at
+least have to have a single datatype for each function type, since we can't
+mix them. It would be even better to split these datatypes a bit further, so
+that such a datatype will never hold a constructor that is never used for a
+particular state variable. This separation is probably a non-trivial problem,
+though.
+
+TODO: Reference "Definitional interpreters for higher-order programming
+languages":
+http://portal.acm.org/citation.cfm?id=805852\&dl=GUIDE\&coll=GUIDE\&CFID=58835435\&CFTOKEN=81856623
+
+\section{New language}
+During the development of the prototype, it has become clear that Haskell is
+not the perfect language for this work. There are two main problems:
+\startitemize
+\item Haskell is not expressive enough. Haskell is still quite new in the area
+of dependent typing, something we use extensively. Also, Haskell has some
+special syntax to write monadic composition and arrow composition, which is
+well suited to normal Haskell programs. However, for our hardware
+descriptions, we could use some similar, but other, syntactic sugar (as we
+have seen above).
+\item Haskell is too expressive. There are some things that Haskell can
+express, but we can never properly translate. There are certain types (both
+primitive types and certain type constructions like recursive types) we can
+never translate, as well as certain forms of recursion.
+\stopitemize
+
+It might be good to reevaluate the choice of language for Cλash, perhaps there
+are other languages which are better suited to describe hardware, now that
+we've learned a lot about it.
+
+Perhaps Haskell (or some other language) can be extended by using a
+preprocessor. There has been some (point of proof) work on a the Strathclyde
+Haskell Enhancement (\small{SHE}) preprocessor for type-level programming,
+perhaps we can extend that, or create something similar for hardware-specific
+notation.
+
+It is not unlikely that the most flexible way
+forward would be to define a completely new language with exactly the needed
+features. This is of course an enormous effort, which should not be taken
+lightly.