--- /dev/null
+Language Design
+===============
+A central question is, what do we want our language to look like? Deciding to
+use haskell would be a first step, but that still leaves a lot of
+possibilities. However, why would we want to use Haskell?
+
+ * Lots of existing support: Language specification, compilers, libraries,
+ userbase. All of this reduces the workload to implement something.
+ * All of Haskell is usable in simulation / testing: When not compiling to
+ VHDL, but when doing functional tests, we can use all of Haskell's syntax,
+ libraries, functions, etc. even when we don't support all of it in our
+ compiler.
+ * Haskell is probably expressive enough to be able to express a lot of
+ different things and different approaches. By defining new (algebraic)
+ types and operators, we can extend this even further. Finally, we could
+ even use Template Haskell to do almost anything (The ForSyDe project uses
+ Template Haskell extensively).
+
+Typing
+------
+What kind of types will we be using in our programs? The base type should be
+some kind of bit type, but there are two main options for these:
+
+ * Use a single bit as a type. Calculations on these types represent the
+ calculations in a single time slice (ie, during one clock cycle). Any
+ state in the calculation is explicit, since the input state must be a
+ function argument and the output state part of the function's value.
+
+ Through the state values, these functions are closely tied to the clock
+ domain in use (or whatever other model is used for the time domain). Ie,
+ having different clock domains is non-trivial and will require
+ modification to the function.
+
+ Composition can be done by simply connection functions together. This does
+ require that the "state" of a function contains the states of each of the
+ function it calls. These are extracted from its state argument and passed
+ on to the callees, while the states returned are again inserted into the
+ resulting state of the outer function.
+
+ This approach is the closest to normal functional programming and shows
+ state very explicitly.
+
+ * Use a stream of bits as a type. Each value models a continuous (infinite)
+ stream of bits. Functions have streams as arguments and value and state
+ could probably be implicit in the calculation or could be made explicit
+ using delay elements (this is what Lava does).
+
+ The primitive operations all operate on streams, which seems the
+ fundamental difference between this and the previous approach.
+
+ This approach shows structure more explicitly when delay elements are
+ used, since the registers end up exactly there.
+
+It is probably possible to mix these approaches. In particular, the first
+approache eventually needs to do some I/O, which can be modeled as recursion
+over an infinite list of inputs resulting in an infinite list of outputs. This
+can be done at the top level always, but it is perhaps possible to move this
+recursion a bit downward and do composition based on stream functions instead
+of "single clock" functions. One problem that comes to mind is that one needs
+to guarantee that a function does not look into the future (ie, uses multiple
+input elements to generate one output element) and the state becomes a lot
+less explicit. This needs more thought.
+
+Implementation
+==============
+Assume we will want to translate a (subset of) haskell into something else.
+We will need to do parsing, typechecking, interpretation, etc. A few options
+for this are available:
+
+ * Doing everything manually, possibly using a parser generator.
+ Lot's of work. Requires us to write down the haskell grammar, create a
+ parser, do all kinds of funky type checking -> Too much work.
+ * Use the Language.Haskell library and write something in haskell. This
+ library contains a Lexer, Parser and AST, but still does not do any type
+ checking, desugaring and simplification. Still lots of work.
+ * Hack into an existing compiler. By writing a backend for an existing
+ compiler, we can save most of the haskell-specific work. This requires a
+ compiler that is modular enough to have its backend replaced by another
+ implementation. Also, the border between backend and frontend must be
+ suited for the work we would like to do. This approach would use the
+ original compiler's driver for the most part
+ * Use components from an existing compiler. By reusing selected components
+ and creating an own driver that uses these components, we can work with
+ the haskell source without doing all the work ourselves. This requires a
+ compiler that is very modular, so we can decide exactly which parts we
+ would like to use and which we don't need.
+
+After some looking around it seems GHC provides a consistent and (as of
+version 6.10) documented API into its internals This allows us to write
+
+ core <- compileToCoreSimplified "adder.hs"
+
+to compile the file `adder.hs` into a version in GHC's simplified CoreSyn
+representation. This is essentially an AST with type annotations, all
+syntactic sugar removed, redundant expressions simplified, etc. This seems to
+be a very useful form to work with. However, GHC provides other functions to
+access other parts of the compiler, so if we would like to access the source
+at another stage, or do some custom procesing in between, this is also
+possible (albeit a bit more complex than this).
+
+Problems
+--------
+Some problems encountered or expected when trying to do some initial
+translator hackup:
+
+ * Signal naming. In the generated VHDL the signals and ports need names.
+ These can of course be generated, but if we want to do some actual
+ debugging, we want useful signal names. For input ports generated from
+ function arguments, the binder name could perhaps be used, but that is not
+ always possible when patterns are more complex.
+
+ For output ports, a name could be deduced when the output value is always
+ computed in a where or let clause. However the simplification phase
+ removes these clauses where possible, so this information is lost.
+
+ Perhaps some way of explicitely labeling ports could be used.
+ * Type classes and universal qualification types are a powerful way of
+ making functions generic, which greatly increases their reusability.
+ However, this extra genericity makes the resulting Core representation
+ more complex. Since the actual types used are modeled as extra function
+ arguments, this shouldn't be too hard, but this is something that probably
+ needs implementation right away (because a lot of builtin haskell
+ constructs, such as tuples, use it).
+
+<!-- vim: set sw=2 sts=2 ts=8: -->
--- /dev/null
+How to model hardware?
+======================
+So far, we've had some ideas about how to model hardware using a functional
+language. The general idea is that every function that models hardware has
+some input signals, some "input" state and a few output signals and "output"
+state. But how do we represent this exactly in our haskell code?
+
+The output part is probably easiest, since a function always returns a single
+value. Since it needs to return both output signals and a state, it seems
+obvious to make it always return a 2-tuple containing the state and output
+signals. So:
+
+ circuit :: ? -> (State, Signals) (1)
+
+It's interesting to note that the State and Signal types here are not really
+important when simulating, so let's look at the inputs first.
+
+Again, we have both input signals and a state to pass to the function. It
+seems consistent to again do this a single tuple, so a circuit becomes:
+
+ circuit :: (State, Signals) -> (State, Signals) (2)
+
+I've not given this variant a lot of thought yet, but i have the feeling this
+is not all to useful. The variant I've been using so far separates the state
+and signals into separate arguments:
+
+ circuit :: Signals -> State -> (State, Signals) (4)
+
+Note that this uses a seemingly inconsistent order for the arguments, but
+that's mostly because the current implementation uses this order (and wasn't
+well thought out). The exact order is mostly relevant for partial application,
+but I'm not quite sure if that's possible or useful at all.
+
+This variant makes it easy to define a dependent type for all circuits, which
+you pass the input, state and output types as an argument:
+
+ type Circuit a s b = a -> s -> (s, b) (5)
+
+A last variant would be to separate each (or perhaps only some) input signals
+into different arguments, such that each input has its own argument. For
+example, for a two input port circuit:
+
+ circuit :: Signal -> Signal -> State -> (State, Signals) (6)
+
+This has the obvious advantage that it could support partial application of
+the input ports (ie, connect a few input pins to a circuit and pass the
+resulting function elsewhere). I'm not sure how that would or should work out,
+but it seems useful.
+
+However, this partial application seems only possible with input ports, which
+might limit its usefuleness. Also, this type signature makes it harder to
+define a single Circuit type that catches all circuits as with (4) and (5).
+This might make simulation more tricky, though it's probably not critical.
+
+Signals (or Ports?)
+-------------------
+Now, regardless of which of the above approaches we choose, there should be
+some way to model these signals and ports. So far, a basic Bit signal type
+has been used:
+
+ data Bit = High | Low | DontCare (7)
+
+For now, this is the only primitive type supported by the (Haskell-to-VHDL)
+translator, but as pointed out above the simulator doesn't really care about
+these types (as long as the input provided is of the right type). Supporting
+native Haskell types (or perhaps our own variants, *e.g.*, an Int with a fixed
+bit width) is probably needed later on as well.
+
+To allow for multiple input and output ports in a single value, tuples can be
+used. This allows for grouping multiple signals while still allowing each
+signal to have a different type.
+
+An alternative that has not been explored yet is the use of an algebraic type
+to group signals. This seems to have the direct advantage of allowing names
+for each element to be specified, but does add some overhead in type
+declarations. It's likely that a combination of tuples and algebraic types are
+best.
+
+It appears that lists might become useful to model VHDL (bit)vectors, since
+both lists and vectors have elements of the same type. This does require that
+the length of a list can be deduced at compile time.
+
+DontCare
+--------
+In the above definition of the Bit datatype a DontCare element was present.
+The idea behind this element was to assign it anywhere the value of the bit
+shouldn't matter (in the alu example, this is but on the input signals that
+should be ignored). Since it is not used in any other place (*e.g.*, patterns
+usually only match Low and High, including the display functions), Haskell
+should error out as soon as a DontCare value is actually used. Due to the lazy
+nature of Haskell's evaluation, this should never happen as long as the value
+is not really used. But as soon as the value is (indirectly) needed, *i.e.*,
+contributes to the output of a circuit, Haskell errors out.
+
+Some functions ignore some inputs. For example, hwor is defined as:
+
+ High `hwor` _ = High (8)
+ _ `hwor` High = High
+ Low `hwor` Low = Low
+
+This means that High `hwxor` DontCare will still be High (with no error), but
+that DontCare `hwand` DontCare will immediately throw an error. This seems to
+be desired behaviour, but this needs more investigation.
+
+State
+-----
+For state, we can mostly do the same reasoning as for signals (from the
+circuit description point of view, input state and output state is hardly any
+differen from input signals and output signals). For combining multiple
+elements of state (either because a circuit has multiple Bits of state or
+because it combines the states of each of its subcircuits), tuples can be
+used. Algebraic types might have a use here as well.
+
+Stateless circuits
+------------------
+Above we've concentrated on stateful circuits, but probably a lot of circuits
+will be stateless. This can simply be modeled by having an empty state
+(*i.e.*, () ), but that's not very elegant. However, not all of the above
+approaches are usable for supporting stateless circuits. For example, if we
+adapt (2) to be stateless:
+
+ stateless_circuit :: Signals -> Signals (9)
+
+Now, considering that we allowed Signals to be a tuple of multiple signals,
+the following two circuit types are indistinguishable, since the State and
+Signal types are the same.
+
+ stateless_circuit :: (Signal, Signal) -> (Signal, Signal) (10)
+ stateful_circuit :: (State, Signal) -> (State, Signal)
+
+Something similar goes for (6), where leaving out the state for stateless
+circuits would make a stateless circuit with two inputs and two output
+indistinguishable from a stateful circuit with a single input and a single
+output.
+
+(4) above seems best suited for adapting for stateless circuits, since it
+always has two arguments (inputs and state). A stateless variant would always
+have a single argument.
+
+Examples
+--------
+To get a feeling for things, here are some example using approach (6).
+
+Some stateless adders:
+
+ -- Combinatoric stateless no-carry adder
+ -- A -> B -> S
+ no_carry_adder :: (Bit, Bit) -> Bit
+ no_carry_adder (a, b) = a `hwxor` b
+
+ -- Combinatoric stateless half adder
+ -- A -> B -> (S, C)
+ half_adder :: (Bit, Bit) -> (Bit, Bit)
+ half_adder (a, b) =
+ ( a `hwxor` b, a `hwand` b )
+
+ -- Combinatoric stateless full adder
+ -- (A, B, C) -> (S, C)
+ full_adder :: (Bit, Bit, Bit) -> (Bit, Bit)
+ full_adder (a, b, cin) = (s, c)
+ where
+ s = a `hwxor` b `hwxor` cin
+ c = a `hwand` b `hwor` (cin `hwand` (a `hwxor` b))
+
+A simple four bit cyclic shift register, which has an input that can be used
+to toggle (xor) the first position.
+
+ type ShifterState = [Bit]
+ shifter :: Bit -> ShifterState -> (ShifterState, Bit)
+ shifter i s =
+ (s', o)
+ where
+ s' = (o `hwxor` i) : (init s)
+ o = last s
+
+An implementation of a simple ALU (supporting only two operations: or & and on
+two one-bit inputs) combined with a register bank containing two one-bit
+registers. It also contains a list of inputs (the program), an initial state
+and some wrappers around the simulation functions (main and mainIO).
+
+ main = Sim.simulate exec program initial_state
+ mainIO = Sim.simulateIO exec initial_state
+
+ program = [
+ -- (addr, we, op)
+ (High, Low, High), -- z = r1 and t (0) ; t = r1 (1)
+ (Low, Low, Low), -- z = r0 or t (1); t = r0 (0)
+ (Low, High, DontCare), -- r0 = z (1)
+ (High, Low, High), -- z = r1 and t (0); t = r1 (1)
+ (High, High, DontCare) -- r1 = z (0)
+ ]
+
+ initial_state = ((Low, High), (), Low, Low)
+
+ -- Register bank
+
+ type RegAddr = Bit
+ type RegisterBankState = (Bit, Bit)
+ register_bank ::
+ (RegAddr, Bit, Bit) -> -- (addr, we, d)
+ RegisterBankState -> -- s
+ (RegisterBankState, Bit) -- (s', o)
+
+ register_bank (Low, Low, _) s = -- Read r0
+ (s, fst s)
+
+ register_bank (High, Low, _) s = -- Read r1
+ (s, snd s)
+
+ register_bank (addr, High, d) s = -- Write
+ (s', DontCare)
+ where
+ (r0, r1) = s
+ r0' = if addr == Low then d else r0
+ r1' = if addr == High then d else r1
+ s' = (r0', r1')
+
+ -- ALU
+
+ type AluState = ()
+ type AluOp = Bit
+
+ alu :: (AluOp, Bit, Bit) -> AluState -> (AluState, Bit)
+ alu (High, a, b) s = ((), a `hwand` b)
+ alu (Low, a, b) s = ((), a `hwor` b)
+
+ type ExecState = (RegisterBankState, AluState, Bit, Bit)
+ exec :: (RegAddr, Bit, AluOp) -> ExecState -> (ExecState, ())
+
+ -- Read & Exec
+ exec (addr, Low, op) s =
+ (s', ())
+ where
+ (reg_s, alu_s, t, z) = s
+ (reg_s', t') = register_bank (addr, Low, DontCare) reg_s
+ (alu_s', z') = alu (op, t', t) alu_s
+ s' = (reg_s', alu_s', t', z')
+
+ -- Write
+ exec (addr, High, op) s =
+ (s', ())
+ where
+ (reg_s, alu_s, t, z) = s
+ (reg_s', _) = register_bank (addr, High, z) reg_s
+ s' = (reg_s', alu_s, t, z)
--- /dev/null
+What is a function?
+===================
+In general, a function is something providing a mapping from a number of input
+arguments to a number of results. Each of these arguments and results can be
+normal values, but also again functions.
+
+Function application is evaluating the mapping for a given set of input
+operands.
+
+Looking at this from a hardware modeling perspective, we get different
+results. Then, a function is essentially a template for a circuit, with each
+of the arguments and results representing input and output ports. Optionally,
+some of the arguments and results (always matched pairs) can become part of
+the state, meaning that they generate a register that connects to the argument
+and result concerned.
+
+Function application, then, is embedding the function's circuit template into
+the callers circuit template, connecting any arguments and results to the
+circuit's ports. This embedding will completely decouple the applied function
+from the original function: The circuit may look alike, but different
+applications have no connection or shared components in the final hardware.
+
+This is an important property, since it allows the caller complete freedom in
+inlining, simplifying and changing any functions it applies. Or more
+specifically, a caller can clone a function and modify the clone freely
+(since it is the only caller of it). Since every application is unrelated to
+all others, cloning a function has no extra cost whatsoever (unlike function
+duplication in software, where the code size greatly increases).
+
+The above view works only for functions accepting and returning primitive
+values, *i.e.*, that can be represented as (possibly multiple) VHDL signals
+directly. In particular, functions cannot be used as arguments or returned
+from functions.
+
+High order functions and partial evaluation
+===========================================
+Before wondering how we implement high order functions and partial evaluation,
+let's see where we would like to use them.
+
+Modifying function signature
+----------------------------
+A simple application of high order functions is modifying a function's
+signature. An example of this is the (un)curry function. In hardware, a
+stateless function could be used to make a stateless function conform to the
+standard signature and make it synthesisable / simulatable:
+
+ stateless :: (a -> b) -> (a -> () -> ((), b))
+ stateless f i s = (s, f i)
+
+This would be used to make some function foo statefull as follows:
+
+ stateful_foo = stateless foo
+
+Another example would be to swap the order of inputs:
+
+ swap_in :: ((a, b) -> o) -> ((b, a) -> o)
+ swap_in f (b, a) = f (a, b)
+
+Both of these examples take a function as argument, but return plain values
+(though swap_in could return a high order function when the function passed in
+has multiple arguments).
+
+Reusing common code
+-------------------
+When modeling an ALU, we could imagine generalizing the ALU operations and
+making the control code independent of the actual operation.
+
+The following alu models a circuit with three input: A one bit opcode and two
+inputs. The opcode selects between two operations, bitwis and and bitwise or.
+
+ and_or_alu :: (Bit, Bit, Bit) -> BIt
+ and_or_alu (o, a, b) =
+ if o == Low
+ then hwand a b
+ else hwor a b
+
+Now, we might want to generalize this to be independent of the actual
+operations peformed:
+
+ type operation = Bit -> Bit -> Bit
+ alu :: operation -> operation -> (Bit, Bit, Bit) -> Bit
+
+ alu op1 op2 (o, a, b) =
+ if o == Low
+ then op1 a b
+ else op2 a b
+
+Now, we can write the above specific alu as:
+
+ and_or_alu = alu and or
+
+Parameterized circuits
+----------------------
+This is probably a variant on the above, but using literal parameters instead
+of functions as parameters.
+
+Finding a good example of this seems hard, since a lot of parameterized
+circuits are implicitly parameterized using the length of lists in their
+input, output or state.
+
+A possible example could be a program counter, that is a incrementing
+register. The amount by which to increment the register depends on the
+instruction word size, which you might want to make generic.
+
+ pc :: Int -> Word32 -> (Word32, Word32)
+ pc n s =
+ (s', o)
+ where
+ s' = s + n
+ o = s
+
+Since this example is only useful when using some kind of integers instead of
+just Bits, this assumes Word32 can actually be used.
+
+Partial application
+-------------------
+Say we have some encryption circuit, that encrypts a stream by xor'ing it with
+some key stream.
+
+ crypt :: (s -> (s, Bit)) -> Bit -> s -> (s, Bit)
+ crypt f i s =
+ (s', o)
+ where
+ (s', key) = f s
+ o = hwxor i key
+
+Now, we have some function taking in some source of entropy (modeled as a
+signal of type Foo) and uses that to generate a (pseudo) random keystream. The
+implementation of this function is not so important. Also, the type KeyState
+represents the state of mkKey, but what it is exactly doesn't really matter.
+
+ mkKey :: Foo -> KeyState -> (KeyState, Bit)
+
+Now, I can create a full crypt system as follows. This function has a Foo and
+Bit input signals and a Bit output signal.
+
+ crypt_full :: Foo -> Bit -> KeyState -> (KeyState, Bit)
+ crypt_full foo i s
+ crypt keygen i s
+ where
+ keygen = mkKey foo
+
+Now, this example isn't particularly good, since it would be just as easy to
+simply make crypt take a Bit as input instead of a function generating bits
+(This is probably also how you would synthesize something like this). Perhaps
+more useful examples could be found, where passing a function makes notation
+a lot easier than passing the function's (remaining) inputs and outputs
+around.
+
+Preconnecting ports
+-------------------
+Sometimes it is useful to connect some ports of multiple similar pieces
+of hardware to the same signal. For example, the following builds a
+multibit multiplexer by mapping a bunch of single bit ones together.
+Since all multiplexers use the same select input, we pass it to mplex_1
+and then pass the result to map. This makes the code clearer than
+replicating the select input (length abs) times and using zipWith
+instead of map.
+
+ mplex_1 :: Bit -> (Bit, Bit) -> Bit
+ mplex_1 enable select (a, b) =
+ if select == Low then a else b
+
+ mplex :: Bit -> [(Bit, Bit)] -> [Bit]
+ mplex select abs =
+ map (mplex_1 select) abs
+
+State ambiguity
+===============
+Previously, we've concluded that choosing to translate an argument or result
+to either a state variable or port would be possible by requiring a specific
+type signature on the top level function and hierarchically looking which
+variables are thus also state variables.
+
+In practice, this appears harder than we thought. We can observe that a state
+variable should resolve to a register, but where exactly is undefined. In
+particular, we could simply translate all arguments and results to ports, and
+on the highest level instantiate registers for all state variables.
+Functionally, this is completely correct, but this is probably not what we
+want in most cases. Each function that has state will probably want to have
+that state represented inside it's architecture, rather than to have two ports
+which should be connected to a register externally.
+
+This means we need to "push" the registers down the hierarchy to get them
+where they "belong". Somewhere down the line, however, a state variable will
+eventually be used in some way, often by passing it to another function
+(*i.e.*, connecting the output of the register to some other circuit). The
+place where this happens is also the place where you want the register to end
+up.
+
+In a lot of cases, this place is also as far as you can push the register down
+(when you pass the current register value to some function a, but bind the new
+register value to the result of some function b, you can't push further down
+since both a and b use the register). The tricky part is now when you pass a
+state value to some function a and that same function a produces the new value
+of the state value. Is the state value now really a state of the function a,
+or did the programmer just mean to connect a register to function a
+externally?
+
+Note that this assumes some clever implementation for finding out if the state
+can be pushed down, which takes things into account like "is the current state
+only used once", "is the new state directly produced by the application that
+also uses the state", "which state value in the arguments belongs to which
+value in the function result". Actually implementing this will probably need
+some extra thought, initially having all the state at the top level is
+probably fine.
+
+It's probable that this case will not occur that often, and some extra
+heuristics might help to guess the right thing more often. Adding some "force
+state here" primitive will probably possible as well for final disambiguation.
+Still, it would nicer to really solve this problem somehow.