Options: Misuse the do notation in Haskell, create some abstraction in
Haskell or add new syntax.
-\section{Improved notation or abstraction for pipelining}
+\section[sec:future:pipelining]{Improved notation or abstraction for pipelining}
Since pipelining is a very common optimization for hardware systems, it should
be easy to specify a pipelined system. Since it involves quite some registers
into an otherwise regular combinatoric system, we might look for some way to
Using monadic do does not fit the typing
Using custom combinators would work
+
+\section{Recursion}
+The main problems of recursion have been described in
+\at{section}[sec:recursion]. In the current implementation, recursion is
+therefore not possible, instead we rely on a number of implicit list-recursive
+builtin functions.
+
+Since recursion is a very important and central concept in functional
+programming, it would very much improve the flexibility and elegance of our
+hardware descriptions if we could support full recursion.
+
+For this, there are two main problems to solve:
+
+\startitemize
+\item For list recursion, how to make a description type check? It is quite
+possible that improvements in the \small{GHC} typechecker will make this
+possible, though it will still stay a challenge. Further advances in
+dependent typing support for Haskell will probably help here as well.
+
+TODO: Reference Christiaan and other type-level work
+(http://personal.cis.strath.ac.uk/conor/pub/she/)
+\item For all recursion, there is the obvious challenge of deciding when
+recursion is finished. For list recursion, this might be easier (Since the
+base case of the recursion influences the type signatures). For general
+recursion, this requires a complete set of simplification and evaluation
+transformations to prevent infinite expansion. The main challenge here is how
+to make this set complete, or at least define the constraints on possible
+recursion which guarantee it will work.
+
+TODO: Loop unrolling
+\stopitemize
+
+\section{Multiple clock domains and asynchronicity}
+Cλash currently only supports synchronous systems with a single clock domain.
+In a lot of real-world systems, both of these limitations pose problems.
+
+There might be asynchronous events to which a system wants to respond. The
+most obvious asynchronous event is of course a reset signal. Though a reset
+signal can be synchronous, that is less flexible (and a hassle to describe in
+Cλash, currently). Since every function in Cλash describes the behaviour on
+each cycle boundary, we really can't fit in asynchronous behaviour easily.
+
+Due to the same reason, multiple clock domains cannot be supported. There is
+currently no way for the compiler to know in which clock domain a function
+should operate and since the clock signal is never explicit, there is also no
+way to express circuits that synchronize various clock domains.
+
+A possible way to express more complex timing behaviour would be to make
+functions more generic event handlers, where the system generates a stream of
+events (Like \quote{clock up}, \quote{clock down}, \quote{input A changed},
+\quote{reset}, etc.). When working with multiple clock domains, each domain
+could get its own events.
+
+As an example, we would have something like the following:
+
+\starthaskell
+data Event = ClockUp | Reset | ...
+
+type MainState = State Word
+
+-- Initial state
+initstate :: MainState
+initstate = State 0
+
+main :: Word -> Event -> MainState -> (MainState, Word)
+main inp event (State acc) = (State acc', acc')
+ where
+ acc' = case event of
+ -- On a reset signal, reset the accumulator and output
+ Reset -> initstate
+ -- On a normal clock cycle, accumulate the result of func
+ ClockUp -> acc + (func inp event)
+ -- On any other event, leave state and output unchanged
+ _ -> acc
+
+-- func is some combinatoric expression
+func :: Word -> Event -> Word
+func inp _ = inp * 2 + 3
+\stophaskell
+
+TODO: Picture
+
+In this example, we see that every function takes an input of type
+\hs{Event}. The function \hs{main} that takes the output of
+\hs{func} and accumulates it on every clock cycle. On a reset signal, the
+accumulator is reset. The function \hs{func} is just a combinatoric
+function, with no synchronous elements. We can see this because there is no
+state and the event input is completely ignored. If the compiler is smart
+enough, we could even leave the event input out for functions that don't use
+it, either because they are completely combinatoric (like in this example), or
+because they rely on the the caller to select the clock signal.
+
+This structure is similar to the event handling structure used to perform I/O
+in languages like Amanda. TODO: Ref. There is a top level case expression that
+decides what to do depending on the current input event.
+
+A slightly more complex example that shows a system with two clock domains.
+There is no real integration between the clock domains (there is one input and
+one output for each clock domain), but it does show how multiple clocks can be
+distinguished.
+
+\starthaskell
+data Event = ClockUpA | ClockUpB | ...
+
+type MainState = State (SubAState, SubBState)
+
+-- Initial state
+initstate :: MainState
+initstate = State (initsubastate, initsubbstate)
+
+main :: Word -> Word -> Event -> MainState -> (MainState, Word, Word)
+main inpa inpb event (State (sa, sb)) = (State (sa', sb'), outa, outb)
+ where
+ -- Only update the substates if the corresponding clock has an up
+ -- transition.
+ sa' = case event of
+ ClockUpA -> sa''
+ _ -> sa
+ sb' = case event of
+ ClockUpB -> sb''
+ _ -> sb
+ -- Always call suba and subb, so we can always have a value for our output
+ -- ports.
+ (sa'', outa) = suba inpa sa
+ (sb'', outb) = subb inpb sb
+
+type SubAState = State ...
+suba :: Word -> SubAState -> (SubAState, Word)
+suba = ...
+
+type SubBState = State ...
+subb :: Word -> SubAState -> (SubAState, Word)
+subb = ...
+\stophaskell
+
+Note that in the above example the \hs{suba} and \hs{subb} functions are
+\emph{always} called, to get at their combinatoric outputs. The new state
+is calculated as well, but only saved when the right clock has an up
+transition.
+
+As you can see, there is some code duplication in the case statement that
+selects the right clock. One of the advantages of an explicit approach like
+this, is that some of this duplication can be extracted away into helper
+functions. For example, we could imagine a \hs{select_clock} function, which
+takes a stateful function that is not aware of events, and changes it into a
+function that only updates its state on a specific (clock) event.
+
+\starthaskell
+select_clock :: Event
+ -> (input -> State s -> (State s, output))
+ -> (input -> State s -> Event -> (State s, output))
+select_clock clock func inp state event = (state', out)
+ where
+ state' = if clock == event then state'' else state
+ (state'', out) = func inp state
+
+main :: Word -> Word -> Event -> MainState -> (MainState, Word, Word)
+main inpa inpb event (State (sa, sb)) = (State (sa', sb'), outa, outb)
+ where
+ (sa'', outa) = (select_clock ClockUpA suba) inpa sa event
+ (sb'', outb) = (select_clock ClockUpB subb) inpb sb event
+\stophaskell
+
+As you can see, this can greatly reduce the length of the main function, while
+increasing the readability. As you might also have noted, the select\_clock
+function takes any stateful function from the current Cλash prototype and
+turns it into an event-aware function!
+
+Going along this route for more complex timing behaviour seems promising,
+espicially since it seems possible to express very advanced timing behaviours,
+while still allowing simple functions without any extra overhead when complex
+behaviour is not needed.
+
+The main cost of this approach will probably be extra complexity in the
+compiler: The paths (state) data can take become very non-trivial, and it
+is probably hard to properly analyze these paths and produce the intended VHDL
+description.
+
+\section{Multiple cycle descriptions}
+In the current Cλash prototype, every description is a single-cycle
+description. In other words, every function describes behaviour for each
+separate cycle and any recursion (or folds, maps, etc.) is expanded in space.
+
+Sometimes, you might want to have a complex description that could possibly
+take multiple cycles. Some examples include:
+
+\startitemize
+ \item Some kind of map or fold over a list that could be expanded in time
+ instead of space. This would result in a function that describes n cycles
+ instead of just one, where n is the lenght of the list.
+ \item A large combinatoric expressions that would introduce a very long
+ combinatoric path and thus limit clock frequency. Such an expression could
+ be broken up into multiple stages, which effectively results in a pipelined
+ system (see also \at{section}[sec:future:pipelining]) with a known delay.
+ There should probably be some way for the developer to specify the cycle
+ division of the expression, since automatically deciding on such a division
+ is too complex and contains too many tradeoffs, at least initially.
+ \item Unbounded recursion. It is not possible to expand unbounded (or even
+ recursion with a depth that is not known at compile time) in space, since
+ there is no way to tell how much hardware to create (it might even be
+ infinite).
+
+ When expanding infinite recursion over time, each step of the recursion can
+ simply become a single clockcycle. When expanding bounded but unknown
+ recursion, we probably need to add an extra data valid output bit or
+ something similar.
+\stopitemize
+
+Apart from translating each of these multiple cycle descriptions into a per-cycle
+description, we also need to somehow match the type signature of the multiple
+cycle description to the type signature of the single cycle description that
+the rest of the system expects (assuming that the rest of the system is
+described in the \quote{normal} per-cycle manner). For example, an infinitely
+recursive expression typically has the return type \lam{[a]}, while the rest
+of the system would expect just \lam{a} (since the recursive expression
+generates just a single element each cycle).
+
+Naively, this matching could be done using a (builtin) function with a
+signature like \lam{[a] -> a}, which also serves as an indicator to the
+compiler that some expanding over time is required. However, this poses a
+problem for simulation: How will our Haskell implementation of this magical
+builtin function know which element of the input list to return. This
+obviously depends on the current cycle number, but there is no way for this
+function to know the current cycle without breaking all kinds of safety and
+purity properties. Making this function have some state input and output could
+help, though this state is not present in the actual hardware (or perhaps
+there is some state, if there are value passed from one recursion step to the
+next, but how can the type of that state be determined by the typechecker?).
+
+It seems that this is the most pressing problem for multicycle descriptions:
+How to interface with the rest of the system, without sacrificing safety and
+simulation capabilities?
+
+\section{Higher order values in state}
+Another interesting idea is putting a higher order value inside a function's
+state value. Since we can use higher order values anywhere, why not in the
+state?
+
+As a (contrived) example, consider the following accumulator:
+
+\starthaskell
+-- The accumulator function that takes a word and returns a new accumulator
+-- and the result so far. This is the function we want to put inside the
+-- state.
+type Acc = Word -> (Acc, Word)
+acc = \a -> (\b -> acc ( a + b ), a )
+
+main :: Word -> State Acc -> (State Acc, Word)
+main a s = (State s', out)
+ where (s', out) = s a
+\stophaskell
+
+This code uses a function as its state, which implicitely contains the value
+accumulated so far. This is a fairly trivial example, that is more easy to
+write with a simple \hs{Word} state value, but for more complex descriptions
+this style might pay off. Note that in a way we are using the
+\emph{continuation passing style} of writing functions, where we pass a
+sort of \emph{continuation} from one cycle to the next.
+
+Our normalization process completely removes higher order values inside a
+function by moving applications and higher order values around so that every
+higher order value will eventually be full applied. However, if we would put a
+higher order value inside our state, the path from the higher order value
+definition to its application runs through the state, which is external to the
+function. A higher order value defined in one cycle is not applied until a
+later cycle. Our normalization process only works within the function, so it
+cannot remove this use of a higher order value.
+
+However, we cannot leave a higher order value in our state value, since it is
+impossible to generate a register containing a higher order value, we simply
+can't translate a function type to a hardware type. To solve this, we must
+replace the higher order value inside our state with some other value
+representing these higher order values.
+
+On obvious approach here is to use an algebraic datatype where each
+constructor represents one possible higher order value that can end up in the
+state and each constructor has an argument for each free variable of the
+higher order value replaced. This allows us to completely reconstruct the
+higher order value at the spot where it is applied, and thus the higher order
+value disappears.
+
+This approach is commonly known as the \quote{Reynolds approach to
+defuntionalization}, first described by J.C. Reynolds and seems to apply well
+to this situation. One note here is that Reynolds' approach puts all the
+higher order values in a single datatype. For a typed language, we will at
+least have to have a single datatype for each function type, since we can't
+mix them. It would be even better to split these datatypes a bit further, so
+that such a datatype will never hold a constructor that is never used for a
+particular state variable. This separation is probably a non-trivial problem,
+though.
+
+TODO: Reference "Definitional interpreters for higher-order programming
+languages":
+http://portal.acm.org/citation.cfm?id=805852\&dl=GUIDE\&coll=GUIDE\&CFID=58835435\&CFTOKEN=81856623
+
+\section{New language}
+During the development of the prototype, it has become clear that Haskell is
+not the perfect language for this work. There are two main problems:
+\startitemize
+\item Haskell is not expressive enough. Haskell is still quite new in the area
+of dependent typing, something we use extensively. Also, Haskell has some
+special syntax to write monadic composition and arrow composition, which is
+well suited to normal Haskell programs. However, for our hardware
+descriptions, we could use some similar, but other, syntactic sugar (as we
+have seen above).
+\item Haskell is too expressive. There are some things that Haskell can
+express, but we can never properly translate. There are certain types (both
+primitive types and certain type constructions like recursive types) we can
+never translate, as well as certain forms of recursion.
+\stopitemize
+
+It might be good to reevaluate the choice of language for Cλash, perhaps there
+are other languages which are better suited to describe hardware, now that
+we've learned a lot about it.
+
+Perhaps Haskell (or some other language) can be extended by using a
+preprocessor. There has been some (point of proof) work on a the Strathclyde
+Haskell Enhancement (\small{SHE}) preprocessor for type-level programming,
+perhaps we can extend that, or create something similar for hardware-specific
+notation.
+
+It is not unlikely that the most flexible way
+forward would be to define a completely new language with exactly the needed
+features. This is of course an enormous effort, which should not be taken
+lightly.