+Something similar to the state boilerplate removal above might be appropriate:
+Abstract away some of the boilerplate code using combinators, then hide away
+the combinators in special syntax. The combinators will be slightly different,
+since there is a (typing) distinction between a pipeline stage and a pipeline
+consisting of multiple stages. Also, it seems necessary to treat either the
+first or the last pipeline stage differently, to prevent an off-by-one error
+in the number of registers (which is similar to the extra \hs{()} state type
+in \in{example}[ex:DoState], which is harmless there, but would be a problem
+if it introduced an extra, useless, pipeline stage).
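Such combinators could look something like the following sketch, which models pipelines at the simulation level only (streams as lists, a register as a one-cycle delay). The names \hs{register}, \hs{stage} and \hs{>->} are illustrative assumptions, not part of the current prototype:

```haskell
-- Simulation-level model: a pipeline maps a stream of inputs (one per
-- cycle) to a stream of outputs. A register is a one-cycle delay,
-- initialized with a given value.
type Pipeline a b = [a] -> [b]

register :: b -> [b] -> [b]
register ini xs = ini : xs

-- Lift a combinational function into a single pipeline stage,
-- followed by a register.
stage :: b -> (a -> b) -> Pipeline a b
stage ini f = register ini . map f

-- Compose two pipelines into a longer one.
(>->) :: Pipeline a b -> Pipeline b c -> Pipeline a c
p >-> q = q . p
```

In this model, \hs{stage 0 (+1) >-> stage 0 (*2)} applied to \hs{[1,2,3,4]} yields \hs{[0,0,4,6,...]}: two cycles of latency for two stages. Since \hs{stage} inserts a register after every stage, including the last, treating the first or last stage specially (lifting it without a register) is exactly what would avoid the off-by-one mentioned above.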
+
+This problem is slightly more complex than the problem we have seen before. One
+significant difference is that each variable that crosses a stage boundary
+needs a register. However, when a variable crosses multiple stage boundaries,
+it must be stored for a longer period and should receive multiple registers.
+Since we cannot tell from the combinator code where the results of the
+combined values are used (at least not without using Template Haskell to
+inspect the \small{AST}), there seems to be no easy way to find out how many
+registers are needed.
+
+There seem to be two obvious ways of handling this problem:
+
+\startitemize
+ \item Limit the scoping of each variable produced by a stage to the next
+ stage only. This means that any variables that are to be used in subsequent
+ stages should be passed on explicitly, which should allocate the required
+ number of registers.
+
+ This produces cumbersome code, where there is still a lot of explicitness
+ (though this could be hidden with syntactic sugar).
+ \item Scope each variable over every subsequent pipeline stage and allocate
+ the maximum number of registers that \emph{could} be needed. This means we
+ will allocate registers that are never used, but those could be optimized
+ away later. It also means we need some way to introduce a variable number
+ of variables (depending on the total number of stages), assign the output of
+ a different register to each (\eg\ a different part of the state) and scope
+ a different one of them over each of the subsequent stages.
+
+ This also means that adding a stage to an existing pipeline will change
+ the state type of each of the subsequent pipeline stages, and the state type
+ of the added stage depends on the number of subsequent stages.
+
+ Properly describing this will probably also require quite explicit syntax,
+ meaning this is not feasible without some special syntax.
+\stopitemize
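To make the first option concrete, consider the following plain Haskell sketch (the stage functions are hypothetical, using \hs{Int} for brevity): every value needed by a later stage is threaded through each intermediate result tuple, so each tuple component crossing a boundary corresponds to exactly one register.

```haskell
-- Option 1 sketched: the input x is needed again in the last stage, so
-- the intermediate stages pass it along explicitly. Each component of
-- an intermediate tuple would become one register at that boundary.
stage1 :: Int -> (Int, Int)
stage1 x = (x, x + 1)        -- pass x on for later use

stage2 :: (Int, Int) -> (Int, Int)
stage2 (x, y) = (x, y * 2)   -- still passing x on

stage3 :: (Int, Int) -> Int
stage3 (x, y) = x + y        -- x is finally used here
```

The explicit threading of \hs{x} through \hs{stage1} and \hs{stage2} is exactly the cumbersome part that syntactic sugar would need to hide.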
+
+Some other interesting issues include pipeline stages which are already
+stateful, mixing pipelined with normal computation, etc.
+
+\section{Recursion}
+The main problems of recursion have been described in
+\in{section}[sec:recursion]. In the current implementation, recursion is
+therefore not possible; instead, we rely on a number of implicitly list-recursive
+built-in functions.
+
+Since recursion is a very important and central concept in functional
+programming, it would very much improve the flexibility and elegance of our
+hardware descriptions if we could support (full) recursion.
+
+For this, there are two main problems to solve:
+
+\startitemize
+\item For list recursion, how to make a description type check? It is quite
+possible that improvements in the \small{GHC} type-checker will make this
+possible, though it will remain a challenge. Further advances in
+dependent typing support for Haskell will probably help here as well.
+
+\todo{Reference Christiaan and other type-level work
+(http://personal.cis.strath.ac.uk/conor/pub/she/)}
+\item For all recursion, there is the obvious challenge of deciding when
+recursion is finished. For list recursion, this might be easier (since the
+base case of the recursion influences the type signatures). For general
+recursion, this requires a complete set of simplification and evaluation
+transformations to prevent infinite expansion. The main challenge here is how
+to make this set complete, or at least define the constraints on possible
+recursion that guarantee it will work.
+
+\todo{Reference Christian for loop unrolling?}
+\stopitemize
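As an illustration of recursion that such a transformation set would have to handle, consider the following plain Haskell function (a hypothetical example, not prototype code): its recursion depth is fixed by the first argument, so complete evaluation at compile time could in principle unroll it into a finite circuit.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)

-- Count the one-bits in a byte. The recursion depth is bounded by the
-- (statically known) bit width, so full unrolling at compile time
-- would yield a fixed chain of eight adder stages.
countOnes :: Int -> Word8 -> Int
countOnes 0 _ = 0
countOnes n w = fromIntegral (w .&. 1) + countOnes (n - 1) (w `shiftR` 1)
```

With an unknown or unbounded depth, no such unrolling is possible, which is precisely the second problem above.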
+
+\section{Multiple clock domains and asynchronicity}
+Cλash currently only supports synchronous systems with a single clock domain.
+In a lot of real-world systems, both of these limitations pose problems.
+
+There might be asynchronous events to which a system wants to respond. The
+most obvious asynchronous event is of course a reset signal. Though a reset
+signal can be synchronous, but that is less flexible (and currently a hassle
+to describe in Cλash). Since every function in Cλash describes the behavior on
+each cycle boundary, we really cannot fit in asynchronous behavior easily.
+
+For the same reason, multiple clock domains cannot be easily supported. There is
+currently no way for the compiler to know in which clock domain a function
+should operate and since the clock signal is never explicit, there is also no
+way to express circuits that synchronize various clock domains.
+
+A possible way to express more complex timing behavior would be to make
+functions more generic event handlers, where the system generates a stream of
+events (like \quote{clock up}, \quote{clock down}, \quote{input A changed},
+\quote{reset}, etc.). When working with multiple clock domains, each domain
+could get its own clock events.
+
+\startbuffer[AsyncDesc]
+data Event = ClockUp | Reset | ...
+
+type MainState = State Word
+
+-- Initial state
+initstate :: MainState
+initstate = State 0
+
+main :: Word -> Event -> MainState -> (MainState, Word)
+main inp event (State acc) = (State acc', acc')
+ where
+ acc' = case event of
+ -- On a reset signal, reset the accumulator and output
+ Reset -> 0
+ -- On a normal clock cycle, accumulate the result of func
+ ClockUp -> acc + (func inp event)
+ -- On any other event, leave state and output unchanged
+ _ -> acc
+
+-- func is some combinational expression
+func :: Word -> Event -> Word
+func inp _ = inp * 2 + 3
+\stopbuffer
+
+\placeexample[][ex:AsyncDesc]{Hardware description using asynchronous events.}
+ {\typebufferhs{AsyncDesc}}
+
+\todo{Picture}
+
+\in{Example}[ex:AsyncDesc] shows a simple example of this event-based
+approach. In this example we see that every function takes an input of
+type \hs{Event}. The function \hs{main} takes the output of
+\hs{func} and accumulates it on every clock cycle. On a reset signal,
+the accumulator is reset. The function \hs{func} is just a combinational
+function, with no synchronous elements. We can see this because there
+is no state and the event input is completely ignored. If the compiler
+is smart enough, we could even leave the event input out for functions
+that do not use it, either because they are completely combinational (like
+in this example), or because they rely on the caller to select the
+clock signal.
+
+This structure is similar to the event handling structure used to perform I/O
+in languages like Amanda. \todo{ref} There is a top level case expression that
+decides what to do depending on the current input event.
+
+A slightly more complex example is shown in
+\in{example}[ex:MulticlockDesc]. It shows a system with two clock
+domains. There is no real integration between the clock domains in this
+example (there is one input and one output for each clock domain), but
+it does show how multiple clocks can be distinguished.
+
+\startbuffer[MulticlockDesc]
+data Event = ClockUpA | ClockUpB | ...
+
+type MainState = State (SubAState, SubBState)
+
+-- Initial state
+initstate :: MainState
+initstate = State (initsubastate, initsubbstate)
+
+main :: Word -> Word -> Event -> MainState -> (MainState, Word, Word)
+main inpa inpb event (State (sa, sb)) = (State (sa', sb'), outa, outb)
+ where
+ -- Only update the substates if the corresponding clock has an up
+ -- transition.
+ sa' = case event of
+ ClockUpA -> sa''
+ _ -> sa
+ sb' = case event of
+ ClockUpB -> sb''
+ _ -> sb
+ -- Always call suba and subb, so we can always have a value for our output
+ -- ports.
+ (sa'', outa) = suba inpa sa
+ (sb'', outb) = subb inpb sb
+
+type SubAState = State ...
+suba :: Word -> SubAState -> (SubAState, Word)
+suba = ...
+
+type SubBState = State ...
+subb :: Word -> SubBState -> (SubBState, Word)
+subb = ...
+\stopbuffer
+
+\placeexample[][ex:MulticlockDesc]{Hardware description with multiple clock domains through events.}
+ {\typebufferhs{MulticlockDesc}}
+
+Note that in \in{example}[ex:MulticlockDesc] the \hs{suba} and \hs{subb}
+functions are \emph{always} called, to get at their combinational
+outputs. The new state is calculated as well, but only saved when the
+right clock has an up transition.
+
+As you can see, there is some code duplication in the case expression that
+selects the right clock. One of the advantages of an explicit approach like
+this is that some of this duplication can be extracted away into helper
+functions. For example, we could imagine a \hs{select_clock} function, which
+takes a stateful function that is not aware of events, and changes it into a
+function that only updates its state on a specific (clock) event. Such a
+function is shown in \in{example}[ex:SelectClock].
+
+\startbuffer[SelectClock]
+select_clock :: Event
+ -> (input -> State s -> (State s, output))
+ -> (input -> State s -> Event -> (State s, output))
+select_clock clock func inp state event = (state', out)
+ where
+ state' = if clock == event then state'' else state
+ (state'', out) = func inp state
+
+main :: Word -> Word -> Event -> MainState -> (MainState, Word, Word)
+main inpa inpb event (State (sa, sb)) = (State (sa', sb'), outa, outb)
+ where
+ (sa', outa) = (select_clock ClockUpA suba) inpa sa event
+ (sb', outb) = (select_clock ClockUpB subb) inpb sb event
+\stopbuffer
+\placeexample[][ex:SelectClock]{A function to filter clock events.}
+ {\typebufferhs{SelectClock}}
+
+As you can see, this can greatly reduce the length of the main function, while
+increasing the readability. As you might also have noted, the select\_clock
+function takes any stateful function from the current Cλash prototype and
+turns it into an event-aware function!
+
+Going along this route for more complex timing behavior seems promising,
+especially since it seems possible to express very advanced timing behaviors,
+while still allowing simple functions without any extra overhead when complex
+behavior is not needed.
+
+The main cost of this approach will probably be extra complexity in the
+compiler: the paths that (state) data can take become very non-trivial, and it
+is probably hard to properly analyze these paths and produce the
+intended \VHDL\ description.
+
+\section{Multiple cycle descriptions}
+In the current Cλash prototype, every description is a single-cycle
+description. In other words, every function describes behavior for each
+separate cycle and any recursion (or folds, maps, etc.) is expanded in space.
+
+Sometimes, you might want to have a complex description that could possibly
+take multiple cycles. Some examples include:
+
+\startitemize
+ \item Some kind of map or fold over a list that could be expanded in time
+ instead of space. This would result in a function that describes n cycles
+ instead of just one, where n is the length of the list.
+ \item A large combinational expression that would introduce a very long
+ combinational path and thus limit clock frequency. Such an expression could
+ be broken up into multiple stages, which effectively results in a pipelined
+ system (see also \in{section}[sec:future:pipelining]) with a known delay.
+ There should probably be some way for the developer to specify the cycle
+ division of the expression, since automatically deciding on such a division
+ is too complex and contains too many trade-offs, at least initially.
+ \item Unbounded recursion. It is not possible to expand unbounded recursion
+ (or even recursion with a depth that is not known at compile time) in space, since
+ there is no way to tell how much hardware to create (it might even be
+ infinite).
+
+ When expanding infinite recursion over time, each step of the recursion can
+ simply become a single clock cycle. When expanding bounded but unknown
+ recursion, we probably need to add an extra data valid output bit or
+ something similar.
+\stopitemize
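The first item, a fold expanded in time, could be sketched as follows (a hypothetical example in plain Haskell, using explicit state passing in the style of the current prototype): each cycle consumes one element from the input port and keeps the accumulator in the state.

```haskell
import Data.List (mapAccumL)

-- One clock cycle of a fold-over-time: the state is the accumulator so
-- far, the input is the element arriving this cycle, and the output is
-- the running result.
foldCycle :: (b -> a -> b) -> b -> a -> (b, b)
foldCycle f accum inp = (accum', accum')
  where accum' = f accum inp
```

Simulating it with \hs{mapAccumL (foldCycle (+)) 0 [1,2,3,4]} produces the running sums \hs{[1,3,6,10]} over four cycles; an additional \quote{done} output would be needed to tell the environment when the final result is valid.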
+
+Apart from translating each of these multiple cycle descriptions into a per-cycle
+description, we also need to somehow match the type signature of the multiple
+cycle description to the type signature of the single cycle description that
+the rest of the system expects (assuming that the rest of the system is
+described in the \quote{normal} per-cycle manner). For example, an infinitely
+recursive expression typically has the return type \lam{[a]}, while the rest
+of the system would expect just \lam{a} (since the recursive expression
+generates just a single element each cycle).
+
+Naively, this matching could be done using a (built-in) function with a
+signature like \lam{[a] -> a}, which also serves as an indicator to the
+compiler that some expanding over time is required. However, this poses a
+problem for simulation: how will our Haskell implementation of this magical
+built-in function know which element of the input list to return? This
+obviously depends on the current cycle number, but there is no way for this
+function to know the current cycle without breaking all kinds of safety and
+purity properties. Making this function have some state input and output could
+help, though this state is not present in the actual hardware (or perhaps
+there is some state, if there are values passed from one recursion step to the
+next, but how can the type of that state be determined by the type-checker?).
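A simulation-only workaround along those lines could carry the not-yet-consumed part of the list as explicit state (a hypothetical sketch; this keeps the function pure, but the list-valued state has no direct hardware counterpart):

```haskell
-- Return the current cycle's element and keep the rest of the stream
-- as state for the next cycle. Purely a simulation artifact: in
-- hardware, the "remaining list" does not exist as a stored value.
streamHead :: [a] -> ([a], a)
streamHead (x:xs) = (xs, x)
streamHead []     = error "stream exhausted"
```

This makes simulation deterministic without a hidden cycle counter, but it only shifts the problem: the type of this state cannot be translated to hardware, as noted above.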
+
+It seems that this is the most pressing problem for multi-cycle descriptions:
+how to interface with the rest of the system, without sacrificing safety and
+simulation capabilities?
+
+\section{Higher order values in state}
+Another interesting idea is putting a higher-order value inside a function's
+state value. Since we can use higher-order values anywhere, why not in the
+state?
+
+\startbuffer[HigherAccum]
+-- The accumulator function that takes a word and returns a new accumulator
+-- and the result so far. This is the function we want to put inside the
+-- state.
+type Acc = Word -> (Acc, Word)
+acc = \a -> (\b -> acc ( a + b ), a )
+
+main :: Word -> State Acc -> (State Acc, Word)
+main a (State s) = (State s', out)
+ where (s', out) = s a
+\stopbuffer
+\placeexample[][ex:HigherAccum]{An accumulator using a higher-order state.}
+ {\typebufferhs{HigherAccum}}
+
+As a (contrived) example, consider the accumulator in
+\in{example}[ex:HigherAccum]. This code uses a function as its state,
+which implicitly contains the value accumulated so far. This is a
+fairly trivial example that is easier to write with a simple
+\hs{Word} state value, but for more complex descriptions this style
+might pay off. Note that in a way we are using the \emph{continuation
+passing style} of writing functions, where we pass a sort of
+\emph{continuation} from one cycle to the next.
+
+Our normalization process completely removes higher-order values inside a
+function by moving applications and higher-order values around so that every
+higher-order value will eventually be fully applied. However, if we were to put a
+higher-order value inside our state, the path from the higher-order value
+definition to its application runs through the state, which is external to the
+function. A higher-order value defined in one cycle is not applied until a
+later cycle. Our normalization process only works within the function, so it
+cannot remove this use of a higher-order value.
+
+However, we cannot leave a higher-order value in our state value, since it is
+impossible to generate a register containing a higher-order value: we simply
+cannot translate a function type to a hardware type. To solve this, we must
+replace the higher-order value inside our state with some other value
+representing these higher-order values.
+
+One obvious approach here is to use an algebraic datatype where each
+constructor represents one possible higher-order value that can end up in the
+state and each constructor has an argument for each free variable of the
+higher-order value replaced. This allows us to completely reconstruct the
+higher-order value at the spot where it is applied, and thus the higher-order
+value disappears.
+
+This approach is commonly known as the \quote{Reynolds approach to
+defunctionalization}, first described by J.C. Reynolds \cite[reynolds98]\ and
+seems to apply well to this situation. One note here is that Reynolds'
+approach puts all the higher-order values in a single datatype. For a typed
+language, we will need at least a separate datatype for each function
+type, since we cannot mix them. It would be even better to split these
+datatypes a bit further, so that such a datatype will never hold a constructor
+that is never used for a particular state variable. This separation is
+probably a non-trivial problem, though.
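Applied to the accumulator from \in{example}[ex:HigherAccum], the Reynolds approach could look like the following sketch: the function in the state is replaced by a datatype with one constructor per possible closure, holding that closure's free variables, plus an apply function that rebuilds the behavior at the application site (the names \hs{AccD} and \hs{applyAcc} are illustrative):

```haskell
import Data.Word (Word)

-- Defunctionalized state: the single closure \b -> acc (a + b) is
-- represented by a constructor holding its one free variable (the sum
-- accumulated so far). This is a first-order value that fits in a
-- register.
data AccD = AccD Word deriving (Eq, Show)

-- Reconstruct the closure's behavior at the application site.
applyAcc :: AccD -> Word -> (AccD, Word)
applyAcc (AccD n) b = (AccD (n + b), n + b)
```

Starting from \hs{AccD 0}, applying inputs 3 and then 4 yields outputs 3 and 7, matching the behavior of the higher-order version, while the state is now translatable to hardware.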
+
+\section{Don't care values}
+ A powerful value in \VHDL\ is the \emph{don't care} value, given as
+ \type{'-'}. This value tells the compiler that you do not really care about
+ which value is assigned to a signal, allowing the compiler to make some
+ optimizations. Since choice in hardware is often implemented using
+ a collection of logic gates instead of multiplexers only, synthesizers can
+ often reduce the amount of hardware needed by smartly choosing values for
+ these don't care cases.
+
+ There is not really anything comparable with don't care values in normal
+ programming languages. The closest thing is an undefined or uninitialized
+ value, though those are usually associated with error conditions.
+
+ It would be useful if Cλash also had some way to specify don't care values.
+ When looking at the Haskell type system, there are really two ways to do
+ this:
+
+ \startitemize
+ \item Add a special don't care value to every datatype. This includes the
+ obvious \hs{Bit} type, but will also need to include every user defined
+ type. An exception can be made for vectors and product types, since those
+ can simply use the don't care values of the types they contain.
+
+ This would also require some kind of \quote{Don't careable} type class
+ that allows each type to specify what its don't care value is. The
+ compiler must then recognize this constant and replace it with don't care
+ values in the final \VHDL\ code.
+
+ This is of course a very intrusive solution. Every type must become a
+ member of this type class, and there is now some member in every type that is a
+ special don't care value. Guaranteeing the obvious don't care semantics
+ also becomes harder, since every pattern match or case expression must now
+ also take care of the don't care value (this might actually be an
+ advantage, since it forces designers to specify how to handle don't care
+ for different operations).
+
+ \item Use the special \hs{undefined}, or \emph{bottom} value present in
+ Haskell. This is a value that is a member of every type automatically, without
+ any explicit declarations.
+
+ Any expression that requires evaluation of an undefined value
+ automatically becomes undefined itself (or rather, there is some exception
+ mechanism). Since Haskell is lazy, this means that whenever it tries to
+ evaluate undefined, it is apparently required for determining the output
+ of the system. This property is useful, since typically a don't care
+ output is used when some output is not valid and should not be read. If
+ it is in fact not read, it should never be evaluated and simulation should
+ be fine.
+
+ In practice, this works out less ideally. In particular, pattern matching is not
+ always smart enough to deal with undefined. Consider the following
+ definition of an \hs{and} operator:
+
+ \starthaskell
+ and :: Bit -> Bit -> Bit
+ and Low _ = Low
+ and _ Low = Low
+ and High High = High
+ \stophaskell
+
+ Using the \hs{and} operation on an undefined (don't care) value and a
+ \hs{Low} value should always return \hs{Low}; the result does not depend on
+ the value chosen for the don't care value. However, while applying the
+ above \hs{and} function to \hs{Low} and \hs{undefined} results in exactly
+ that behavior, the result is \hs{undefined} when the arguments are swapped.
+ This is because the first pattern forces the first argument to be
+ evaluated. If it is \hs{undefined}, evaluation is halted and an exception
+ is shown, which is not what is intended.
+ \stopitemize
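The asymmetry can be observed directly in a Haskell interpreter (the definitions below repeat the ones above; \hs{Bit} is a minimal stand-in for the prototype's type):

```haskell
import Prelude hiding (and)

data Bit = Low | High deriving (Eq, Show)

-- The first clause pattern matches on (and thus forces) the first
-- argument before the second clause is ever considered.
and :: Bit -> Bit -> Bit
and Low  _    = Low
and _    Low  = Low
and High High = High
```

Evaluating \hs{and Low undefined} yields \hs{Low}, since the second argument is never forced, while \hs{and undefined Low} raises an exception, even though both should arguably be \hs{Low} under don't care semantics.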