From: Matthijs Kooijman
Date: Wed, 11 Nov 2009 16:43:58 +0000 (+0100)
Subject: Expand the improved notation for state and pipelining sections.
X-Git-Tag: final-thesis~158
X-Git-Url: https://git.stderr.nl/gitweb?p=matthijs%2Fmaster-project%2Freport.git;a=commitdiff_plain;h=4a1874f15102d7ad201d9f1f7fa8800bebff4ffc

Expand the improved notation for state and pipelining sections.
---

diff --git a/Chapters/Future.tex b/Chapters/Future.tex
index 6be0ba8..f59e034 100644
--- a/Chapters/Future.tex
+++ b/Chapters/Future.tex
@@ -2,12 +2,162 @@
 \section{Improved notation for hierarchical state}
 The hierarchic state model requires quite some boilerplate code for unpacking
 and distributing the input state and collecting and repacking the output
-state. There is really only one way in which this state handling can be done,
-so it would make sense to hide this boilerplate. This would incur no
-flexibility cost at all, since there are no other ways that would work.
+state.
+
+\in{Example}[ex:NestedState] shows a simple composition of two stateful
+functions, \hs{funca} and \hs{funcb}. The state annotation using the
+\hs{State} newtype has been left out for clarity, and because the proposed
+approach below cannot handle it (yet).
+
+\startbuffer[NestedState]
+  type FooState = ( AState, BState )
+  foo :: Word -> FooState -> (FooState, Word)
+  foo inp s = (s', outb)
+    where
+      (sa, sb) = s
+      (sa', outa) = funca inp sa
+      (sb', outb) = funcb outa sb
+      s' = (sa', sb')
+\stopbuffer
+\placeexample[here][ex:NestedState]{Simple function composing two stateful
+  functions.}{\typebufferhs{NestedState}}
+
+Since state handling always follows the same strict rules, there is really no
+other way in which it can be done (some bindings can of course be moved from
+the where clause into the pattern or vice versa, but the result is effectively
+the same). This makes it all the more sensible to hide this boilerplate away,
+which would incur no flexibility cost at all, since there is no alternative
+that would work.
+
+One particular notation in Haskell that seems promising is the \hs{do}
+notation. It is meant to simplify working with Monads by hiding away some
+details. It allows one to write a list of expressions, which are composed
+using the monadic \emph{bind} operator, written in Haskell as \hs{>>}. For
+example, the snippet:
-Options: Misuse the do notation in Haskell, create some abstraction in
-Haskell or add new syntax.
+
+\starthaskell
+do
+  somefunc a
+  otherfunc b
+\stophaskell
+
+will be desugared into:
+
+\starthaskell
+(somefunc a) >> (otherfunc b)
+\stophaskell
+
+There is also the \hs{>>=} operator, which allows passing variables from one
+expression to the next.
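+As a rough illustration (re-using the placeholder functions \hs{somefunc} and
+\hs{otherfunc} from above), a snippet such as:
+
+\starthaskell
+do
+  x <- somefunc a    -- x is a fresh name for somefunc's result
+  otherfunc x
+\stophaskell
+
+desugars into something like:
+
+\starthaskell
+(somefunc a) >>= (\x -> otherfunc x)
+\stophaskell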
+
+If we could use this notation to compose a stateful computation from a number
+of other stateful functions, this could move all the boilerplate code into the
+\hs{>>} operator. Perhaps the compiler should be taught to always inline the
+\hs{>>} operator, but after that there should be no further changes required
+to the compiler.
+
+This highlights an important aspect of using a functional language for our
+descriptions: we can use the language itself to provide abstractions for
+common patterns, making our code smaller.
+
+\subsection{Outside the Monad}
+However, simply using the monad notation is not as easy as it sounds. The main
+problem is that the \hs{Monad} type class poses a number of limitations on the
+bind operator \hs{>>}. Most importantly, it has the following type signature:
+
+\starthaskell
+(>>) :: (Monad m) => m a -> m b -> m b
+\stophaskell
+
+This means that every expression in our composition must use the same
+\hs{Monad} instance as its type; only the "return" value can differ between
+expressions.
+
+Ideally, we would like the \hs{>>} operator to have a type like the following:
+
+\starthaskell
+type Stateful s r = s -> (s, r)
+(>>) :: Stateful s1 r1 -> Stateful s2 r2 -> Stateful (s1, s2) r2
+\stophaskell
+
+What we see here is that when we compose two stateful functions (whose inputs
+have already been applied, leaving just the state argument to be supplied),
+the result is again a stateful function whose state is composed of the two
+\emph{substates}. The return value is simply the return value of the second
+function; the first one is discarded (to preserve it, the \hs{>>=} operator
+can be used).
+
+There is a trick we can apply to change the signature of the \hs{>>} operator.
+\small{GHC} does not require the bind operators to be part of the \hs{Monad}
+type class, as long as it can use them to translate the do notation. This
+means we can define our own \hs{>>} and \hs{>>=} operators, outside of the
+\hs{Monad} type class. This does conflict with the existing methods of the
+\hs{Monad} type class, so we should prevent \small{GHC} from loading those
+(and all of the Prelude) by passing \type{-XNoImplicitPrelude} to \type{ghc}.
+This is slightly inconvenient, but since we hardly use anything from the
+Prelude, this is not a big problem. We might even be able to replace some of
+the Prelude with hardware-translatable versions by doing this.
+
+We can now define the following binding operators. For completeness, we also
+supply the \hs{return} function, which is never called implicitly by
+\small{GHC}, but can be called explicitly by a hardware description.
+
+\starthaskell
+(>>) :: Stateful s1 r1 -> Stateful s2 r2 -> Stateful (s1, s2) r2
+f1 >> f2 = f1 >>= \_ -> f2
+
+(>>=) :: Stateful s1 r1 -> (r1 -> Stateful s2 r2) -> Stateful (s1, s2) r2
+f1 >>= f2 = \(s1, s2) -> let (s1', r1) = f1 s1
+                             (s2', r2) = f2 r1 s2
+                         in ((s1', s2'), r2)
+
+return :: r -> Stateful () r
+return r = \s -> (s, r)
+\stophaskell
+
+As you can see, this closely resembles the boilerplate of unpacking the state,
+passing it to two functions and repacking the new state. With these
+definitions, we could have written \in{example}[ex:NestedState] a lot shorter,
+see \in{example}[ex:DoState]. In this example the type signature of \hs{foo}
+is the same (though it is now written using the \hs{Stateful} type synonym, it
+is still completely equivalent to the original:
+\hs{foo :: Word -> FooState -> (FooState, Word)}).
+
+Note that the \hs{FooState} type has changed (so indirectly the type of
+\hs{foo} as well). Since the state composition by \hs{>>} works on two
+stateful functions at a time, the final state consists of nested two-tuples.
+The final \hs{()} in the state originates from the fact that the \hs{return}
+function has no real state, but is still part of the composition. We could
+have left out the return statement (and the \hs{outb <-} part) to make
+\hs{foo}'s return value equal to \hs{funcb}'s, but this approach makes it
+clearer what is happening.
+
+\startbuffer[DoState]
+  type FooState = ( AState, (BState, ()) )
+  foo :: Word -> Stateful FooState Word
+  foo inp = do
+    outa <- funca inp
+    outb <- funcb outa
+    return outb
+\stopbuffer
+\placeexample[here][ex:DoState]{Simple function composing two stateful
+  functions, using do notation.}
+  {\typebufferhs{DoState}}
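+
+To see where this nested state type comes from, note that the do block in
+\in{example}[ex:DoState] is desugared into roughly the following (a sketch;
+the exact output of \small{GHC} may differ slightly):
+
+\starthaskell
+-- Each (>>=) pairs the state of its left operand with the state of
+-- everything to its right; the state threading itself stays hidden.
+foo inp = funca inp >>= \outa ->
+          funcb outa >>= \outb ->
+          return outb
+\stophaskell
+
+Since our \hs{>>=} combines \hs{Stateful s1 r1} and \hs{Stateful s2 r2} into
+\hs{Stateful (s1, s2) r2}, the outer application contributes \hs{AState} and
+the inner one \hs{(BState, ())}, giving the \hs{( AState, (BState, ()) )}
+state type shown above.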
+
+An important implication of this approach is that the order in which the
+function applications are written affects the state type. Fortunately, this
+problem can be contained by consistently using type synonyms for state types,
+which should prevent changes in other functions' source code when a function
+changes.
+
+A less obvious implication of this approach is that the scope of the variables
+produced by each of these expressions (using the \hs{<-} syntax) is limited to
+the expressions that come after it. This prevents values from flowing between
+two functions (components) in two directions. For most Monad instances, this
+is a requirement, but here it could have been different.
+
+\subsection{Alternative syntax}
+Because of the above issues, misusing Haskell's do notation is probably not
+the best solution here. However, it does show that, using fairly simple
+abstractions, we could hide a lot of the boilerplate code. Extending
+\small{GHC} with some new syntax sugar similar to the do notation might be a
+feasible alternative.
 
 \section[sec:future:pipelining]{Improved notation or abstraction for pipelining}
 Since pipelining is a very common optimization for hardware systems, it should
@@ -15,11 +165,53 @@
 be easy to specify a pipelined system. Since it involves quite some registers
 into an otherwise regular combinatoric system, we might look for some way to
 abstract away some of the boilerplate for pipelining.
-Example of naive pipelined code
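+To illustrate the kind of boilerplate this involves, the sketch below shows a
+naively pipelined composition of three hypothetical stateless functions
+\hs{stage1}, \hs{stage2} and \hs{stage3} (placeholder names, like
+\hs{somefunc} above). The pipeline registers are modelled through the state,
+in the same style as \in{example}[ex:NestedState]:
+
+\starthaskell
+-- Sketch only: stage1, stage2 and stage3 are hypothetical stateless
+-- functions. r1 and r2 are the pipeline registers, modelled as state:
+-- each cycle the old register contents feed the next stage and the newly
+-- computed values are stored in the updated state.
+type PipeState = ( Word, Word )
+
+pipe :: Word -> PipeState -> (PipeState, Word)
+pipe inp s = (s', out)
+  where
+    (r1, r2) = s
+    r1' = stage1 inp
+    r2' = stage2 r1
+    out = stage3 r2
+    s'  = (r1', r2')
+\stophaskell
+
+Every value that crosses a stage boundary shows up once in the state pattern
+and once in the repacked state, and a value that is needed more than one stage
+later would have to be passed along through every intermediate stage by hand.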
+
+Something similar to the state boilerplate removal above might be appropriate
+here: abstract away some of the boilerplate code using combinators, then hide
+the combinators behind special syntax. The combinators will be slightly
+different, since there is a (typing) distinction between a pipeline stage and
+a pipeline consisting of multiple stages. Also, it seems necessary to treat
+either the first or the last pipeline stage differently, to prevent an
+off-by-one error in the number of registers (similar to the extra \hs{()}
+state type in \in{example}[ex:DoState], which is harmless there, but would be
+a problem if it introduced an extra, useless pipeline stage).
+
+This problem is slightly more complex than the one we have seen before. One
+significant difference is that each variable that crosses a stage boundary
+needs a register. However, when a variable crosses multiple stage boundaries,
+it must be stored for a longer period and should receive multiple registers.
+Since we cannot find out from the combinator code where the result of the
+combined values is used (at least not without using Template Haskell to
+inspect the \small{AST}), there seems to be no easy way to find out how many
+registers are needed.
+
+There seem to be two obvious ways of handling this problem:
-Using monadic do does not fit the typing
+\startitemize
+  \item Limit the scope of each variable produced by a stage to the next
+  stage only. This means that any variable that is to be used in subsequent
+  stages must be passed on explicitly, which should allocate the required
+  number of registers.
+
+  This produces cumbersome code, in which there is still a lot of
+  explicitness (though this could be hidden behind syntax sugar).
+  \item Scope each variable over every subsequent pipeline stage and allocate
+  the maximum number of registers that \emph{could} be needed. This means we
+  will allocate registers that are never used, but those could be optimized
+  away later. It also means we need some way to introduce a variable number
+  of variables (depending on the total number of stages), assign the output
+  of a different register to each (\eg., a different part of the state) and
+  scope a different one of them over each of the subsequent stages.
+
+  This also means that adding a stage to an existing pipeline will change the
+  state type of each of the subsequent pipeline stages, and that the state
+  type of the added stage depends on the number of subsequent stages.
+
+  Properly describing this will probably require quite explicit and verbose
+  code, meaning this approach is not feasible without some special syntax.
+\stopitemize
-Using custom combinators would work
+
+Some other interesting issues include pipeline stages which are already
+stateful, mixing pipelined with normal computation, etc.
 
 \section{Recursion}
 The main problems of recursion have been described in