+ There is one final limitation that is hard to express in the above
+ itemization. Whenever sub-states are extracted from the input state
+ to be passed to functions, the corresponding output sub-states
+ should be inserted into the output state in the same way. In other
+ words, each pair of corresponding sub-states in the input and
+ output states should be passed to / returned from the same called
+ function.
+
+ The prototype currently does not check much of the above
+ conditions. This means that if the conditions are violated,
+ sometimes a compile error is generated, but in other cases output
+ can be generated that is not valid \VHDL\ or at the very least does
+ not correspond to the input.
+
+ \subsection{Translating to \VHDL}
+ As noted above, the basic approach when generating \VHDL\ for stateful
+ functions is to generate a single register for every stateful function.
+ We look around the normal form to find the let binding that removes the
+ \lam{State} type renaming (using a cast). We also find the let binding that
+ adds a \lam{State} type. These are connected to the output and the input
+ of the generated let binding respectively. This means that there can
+ only be one let binding that adds and one that removes the \lam{State}
+ type. It is easy to violate this constraint. This problem is detailed in
+ \in{section}[sec:normalization:stateproblems].
+
+ This approach seems simple enough, but will this also work for more
+ complex stateful functions involving sub-states? Observe that any
+ component of a function's state that is a sub-state, \ie\ passed on as
+ the state of another function, should have no influence on the
+ hardware generated for the calling function. Any state-specific
+ \small{VHDL} for this component can be generated entirely within the
+ called function. So, we can completely ignore sub-states when
+ generating \VHDL\ for a function.
+
+ From this observation it might seem logical to remove the
+ sub-states from a function's states altogether and leave only the
+ state components which are actual states of the current function.
+ While doing this would not remove any information needed to
+ generate \small{VHDL} from the function, it would cause the
+ function definition to become invalid (since we will not have any
+ sub-state to pass to the functions anymore). We could solve the
+ syntactic problems by passing \type{undefined} for state
+ variables, but that would still break the code on the semantic
+ level (\ie, the function would no longer be semantically
+ equivalent to the original input).
+
+ To keep the function definition correct until the very end of the
+ process, we will not deal with (sub)states until we get to the
+ \small{VHDL} generation. Then, we are translating from Core to
+ \small{VHDL}, and we can simply generate no \VHDL for sub-states,
+ effectively removing them altogether.
+
+ But, how will we know what exactly is a sub-state? Since any state
+ argument or return value that represents state must be of the
+ \type{State} type, we can look at the type of a value. However, we
+ must be careful to ignore only \emph{sub-states}, and not a
+ function's own state.
+
+ For \in{example}[ex:AvgStateNormal] above, we should generate a register
+ with its output connected to \lam{s} and its input connected
+ to \lam{s'}. However, \lam{s'} is build up from both \lam{accs'} and
+ \lam{count'}, while only \lam{count'} should end up in the register.
+ \lam{accs'} is a sub-state for the \lam{acc} function, for which a
+ register will be created when generating \VHDL\ for the \lam{acc}
+ function.
+
+ Fortunately, the \lam{accs'} variable (and any other sub-state) has a
+ property that we can easily check: it has a \lam{State} type. This
+ means that whenever \VHDL\ is generated for a tuple (or other
+ algebraic type), we can simply leave out all elements that have a
+ \lam{State} type. This will leave just the parts of the state that
+ do not have a \lam{State} type themselves, like \lam{count'},
+ which is exactly a function's own state. This approach also means
+ that the state part of the result (\eg\ \lam{s'} in \lam{res}) is
+ automatically excluded when generating the output port, which is
+ also required.
+
+ We can formalize this translation a bit, using the following
+ rules.
+
+ \startitemize
+ \item A state unpack operation should not generate any \small{VHDL}.
+ The binder to which the unpacked state is bound should still be
+ declared, this signal will become the register and will hold the
+ current state.
+ \item A state pack operation should not generate any \small{VHDL}.
+ The binder to which the packed state is bound should not be
+ declared. The binder that is packed is the signal that will hold the
+ new state.
+ \item Any values of a State type should not be translated to
+ \small{VHDL}. In particular, State elements should be removed from
+ tuples (and other data-types) and arguments with a state type should
+ not generate ports.
+ \item To make the state actually work, a simple \small{VHDL}
+ (sequential) process should be generated. This process updates
+ the state at every clock cycle, by assigning the new state to the
+ current state. This will be recognized by synthesis tools as a
+ register specification.
+ \stopitemize
+
+ When applying these rules to the function \lam{avg} from
+ \in{example}[ex:AvgStateNormal], we be left with the description
+ in \in{example}[ex:AvgStateRemoved]. All the parts that do not
+ generate any \VHDL\ directly are crossed out, leaving just the
+ actual flow of values in the final hardware. To illustrate the
+ change of the types of \lam{s} and \lam{s'}, their types are also
+ shown.
+
+ \startbuffer[AvgStateRemoved]
+ avg = iλ.λ--spacked.--
+ let
+ s :: (--AccState,-- Word)
+ s = --spacked ▶ (AccState, Word)--
+ --accs = case s of (a, b) -> a--
+ count = case s of (--c,-- d) -> d
+ accres = acc i --accs--
+ --accs' = case accres of (e, f) -> e--
+ sum = case accres of (--g,-- h) -> h
+ count' = count + 1
+ o = sum / count'
+ s' :: (--AccState,-- Word)
+ s' = (--accs',-- count')
+ --spacked' = s' ▶ State (AccState, Word)--
+ res = (--spacked',-- o)
+ in
+ res
+ \stopbuffer
+ \placeexample[here][ex:AvgStateRemoved]{Normalized version of \in{example}[ex:AvgState] with ignored parts crossed out}
+ {\typebufferlam{AvgStateRemoved}}
+
+ When we actually leave out the crossed out parts, we get a slightly
+ weird program: there is a variable \lam{s} which has no value, and there
+ is a variable \lam{s'} that is never used. But together, these two will form
+ the state process of the function. \lam{s} contains the "current" state,
+ \lam{s'} is assigned the "next" state. So, at the end of each clock
+ cycle, \lam{s'} should be assigned to \lam{s}.
+
+ As an illustration of the result of this function,
+ \in{example}[ex:AccStateVHDL] and \in{example}[ex:AvgStateVHDL] show the the \VHDL\ that is
+ generated by Cλash from the examples is this section.
+
+ \startbuffer[AvgStateVHDL]
+ entity avgComponent_0 is
+ port (\izAlE2\ : in \unsigned_31\;
+ \foozAo1zAo12\ : out \(,)unsigned_31\;
+ clock : in std_logic;
+ resetn : in std_logic);
+ end entity avgComponent_0;
+
+
+ architecture structural of avgComponent_0 is
+ signal \szAlG2\ : \(,)unsigned_31\;
+ signal \countzAlW2\ : \unsigned_31\;
+ signal \dszAm62\ : \(,)unsigned_31\;
+ signal \sumzAmk3\ : \unsigned_31\;
+ signal \reszAnCzAnM2\ : \unsigned_31\;
+ signal \foozAnZzAnZ2\ : \unsigned_31\;
+ signal \reszAnfzAnj3\ : \unsigned_31\;
+ signal \s'zAmC2\ : \(,)unsigned_31\;
+ begin
+ \countzAlW2\ <= \szAlG2\.A;
+
+ \comp_ins_dszAm62\ : entity accComponent_1
+ port map (\izAob3\ => \izAlE2\,
+ \foozAoBzAoB2\ => \dszAm62\,
+ clock => clock,
+ resetn => resetn);
+
+ \sumzAmk3\ <= \dszAm62\.A;
+
+ \reszAnCzAnM2\ <= to_unsigned(1, 32);
+
+ \foozAnZzAnZ2\ <= \countzAlW2\ + \reszAnCzAnM2\;
+
+ \reszAnfzAnj3\ <= \sumzAmk3\ * \foozAnZzAnZ2\;
+
+ \s'zAmC2\.A <= \foozAnZzAnZ2\;
+
+ \foozAo1zAo12\.A <= \reszAnfzAnj3\;
+
+ state : process (clock, resetn)
+ begin
+ if resetn = '0' then
+ elseif rising_edge(clock) then
+ \szAlG2\ <= \s'zAmC2\;
+ end if;
+ end process state;
+ end architecture structural;
+ \stopbuffer
+
+ \startbuffer[AvgStateTypes]
+ package types is
+ subtype \unsigned_31\ is unsigned (0 to 31);
+
+ type \(,)unsigned_31\ is record
+ A : \unsigned_31\;
+ end record;
+ end package types;
+ \stopbuffer
+
+ \startbuffer[AccStateVHDL]
+ entity accComponent_1 is
+ port (\izAob3\ : in \unsigned_31\;
+ \foozAoBzAoB2\ : out \(,)unsigned_31\;
+ clock : in std_logic;
+ resetn : in std_logic);
+ end entity accComponent_1;
+
+ architecture structural of accComponent_1 is
+ signal \szAod3\ : \unsigned_31\;
+ signal \reszAonzAor3\ : \unsigned_31\;
+ begin
+ \reszAonzAor3\ <= \szAod3\ + \izAob3\;
+
+ \foozAoBzAoB2\.A <= \reszAonzAor3\;
+
+ state : process (clock, resetn)
+ begin
+ if resetn = '0' then
+ elseif rising_edge(clock) then
+ \szAod3\ <= \reszAonzAor3\;
+ end if;
+ end process state;
+ end architecture structural;
+ \stopbuffer
+
+ \placeexample[][ex:AvgStateTypes]{\VHDL\ types generated for \hs{acc} and \hs{avg} from \in{example}[ex:AvgState]}
+ {\typebuffervhdl{AvgStateTypes}}
+ \placeexample[][ex:AccStateVHDL]{\VHDL\ generated for \hs{acc} from \in{example}[ex:AvgState]}
+ {\typebuffervhdl{AccStateVHDL}}
+ \placeexample[][ex:AvgStateVHDL]{\VHDL\ generated for \hs{avg} from \in{example}[ex:AvgState]}
+ {\typebuffervhdl{AvgStateVHDL}}
+ \section{Prototype implementation}
+ The prototype has been implemented using Haskell as its
+ implementation language, just like \GHC. This allows the prototype
+ do directly use parts of \GHC\ through the \small{API} it exposes
+ (which essentially taps directly into the internals of \GHC, making
+ this \small{API} not really a stable interface).
+
+ Cλash can be run from a separate library, but has also been
+ integrated into \type{ghci} \cite[baaij09]. The latter does requires
+ a custom \GHC\ build, however.
+
+ The latest version and all history of the Cλash code can be browsed
+ on-line or retrieved using the \type{git} program.
+
+ http://git.stderr.nl/gitweb?p=matthijs/projects/cλash.git
+
+% \subsection{Initial state}
+% How to specify the initial state? Cannot be done inside a hardware
+% function, since the initial state is its own state argument for the first
+% call (unless you add an explicit, synchronous reset port).
+%
+% External init state is natural for simulation.
+%
+% External init state works for hardware generation as well.
+%
+% Implementation issues: state splitting, linking input to output state,
+% checking usage constraints on state variables.
+%
+%