let
-- Unpack the state by coercion (\eg, cast from
-- State (Word, Word) to (Word, Word))
- s = sp :: (Word, Word)
+ s = sp ▶ (Word, Word)
-- Extract both registers from the state
r1 = case s of (a, b) -> a
r2 = case s of (a, b) -> b
s' = (,) r1' r2'
-- pack the state by coercion (\eg, cast from
-- (Word, Word) to State (Word, Word))
- sp' = s' :: State (Word, Word)
+ sp' = s' ▶ State (Word, Word)
-- Pack our return value
res = (,) sp' out
in
- \subsection{Intended normal form definition}
+ \subsection[sec:normalization:intendednormalform]{Intended normal form definition}
Now that we have some intuition for the normal form, we can describe how
we want it to look in a slightly more formal manner. The following
EBNF-like description completely captures the intended structure (and
\defref{intended normal form definition}
\todo{Fix indentation}
\startlambda
- \italic{normal} = \italic{lambda}
- \italic{lambda} = λvar.\italic{lambda} (representable(var))
+ \italic{normal} := \italic{lambda}
+ \italic{lambda} := λvar.\italic{lambda} (representable(var))
| \italic{toplet}
- \italic{toplet} = letrec [\italic{binding}...] in var (representable(varvar))
- \italic{binding} = var = \italic{rhs} (representable(rhs))
+ \italic{toplet} := letrec [\italic{binding}...] in var (representable(var))
+ \italic{binding} := var = \italic{rhs} (representable(rhs))
-- State packing and unpacking by coercion
- | var0 = var1 :: State ty (lvar(var1))
- | var0 = var1 :: ty (var0 :: State ty) (lvar(var1))
- \italic{rhs} = userapp
+ | var0 = var1 ▶ State ty (lvar(var1))
+ | var0 = var1 ▶ ty (var1 :: State ty ∧ lvar(var1))
+ \italic{rhs} := userapp
| builtinapp
-- Extractor case
| case var of C a0 ... an -> ai (lvar(var))
-- Selector case
| case var of (lvar(var))
- DEFAULT -> var0 (lvar(var0))
- C w0 ... wn -> resvar (\forall{}i, wi \neq resvar, lvar(resvar))
- \italic{userapp} = \italic{userfunc}
+ [ DEFAULT -> var ] (lvar(var))
+ C0 w0,0 ... w0,n -> var0
+ \vdots
+ Cm wm,0 ... wm,n -> varm (\forall{}i \forall{}j, wi,j \neq vari, lvar(vari))
+ \italic{userapp} := \italic{userfunc}
| \italic{userapp} {userarg}
- \italic{userfunc} = var (gvar(var))
- \italic{userarg} = var (lvar(var))
- \italic{builtinapp} = \italic{builtinfunc}
+ \italic{userfunc} := var (gvar(var))
+ \italic{userarg} := var (lvar(var))
+ \italic{builtinapp} := \italic{builtinfunc}
| \italic{builtinapp} \italic{builtinarg}
- \italic{builtinfunc} = var (bvar(var))
- \italic{builtinarg} = \italic{coreexpr}
+ \italic{builtinfunc} := var (bvar(var))
+ \italic{builtinarg} := var (representable(var) ∧ lvar(var))
+ | \italic{partapp} (partapp :: a -> b)
+ | \italic{coreexpr} (¬representable(coreexpr) ∧ ¬(coreexpr :: a -> b))
+ \italic{partapp} := \italic{userapp} | \italic{builtinapp}
\stoplambda
- \todo{Limit builtinarg further}
-
\todo{There can still be other casts around (which the code can handle,
e.g., ignore), which still need to be documented here}
- \todo{Note about the selector case. It just supports Bit and Bool
- currently, perhaps it should be generalized in the normal form? This is
- no longer true, btw}
-
When looking at such a program from a hardware perspective, the top level
- lambda's define the input ports. The variable referenc in the body of
+ lambdas define the input ports. The variable reference in the body of
the recursive let expression is the output port. Most function
applications bound by the let expression define a component
instantiation, where the input and output ports are mapped to local
In the following sections, we will be using a number of functions and
notations, which we will define here.
- \todo{Define substitution (notation)}
-
\subsubsection{Concepts}
A \emph{global variable} is any variable (binder) that is bound at the
top level of a program, or an external module. A \emph{local variable} is any
optimizations, but they are required to get our program into intended
normal form.
+ \placeintermezzo{}{
+ \startframedtext[width=8cm,background=box,frame=no]
+ \startalignment[center]
+ {\tfa Substitution notation}
+ \stopalignment
+ \blank[medium]
+
+ In some of the transformations in this chapter, we need to perform
+ substitution on an expression. Substitution means replacing every
+ occurrence of some expression (usually a variable reference) with
+ another expression.
+
+ Many different notations for substitution have been used in the
+ literature. The notation that will be used in this report
+ is the following:
+
+ \startlambda
+ E[A=>B]
+ \stoplambda
+
+ This means expression \lam{E} with all occurrences of \lam{A} replaced
+ with \lam{B}.
+ \stopframedtext
+ }
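As a concrete illustration, the substitution described in the intermezzo can be sketched in Haskell on a toy expression type. The `Expr` type, the `subst` function and the omission of capture-avoidance are all simplifications for this sketch, not the compiler's actual implementation.

```haskell
-- A minimal sketch of the substitution E[A=>B] on a toy expression
-- type. For brevity this sketch does not handle variable capture.
data Expr = Var String
          | App Expr Expr
          | Lam String Expr
          deriving (Eq, Show)

-- subst a b e: replace every free occurrence of variable a in e by b.
subst :: String -> Expr -> Expr -> Expr
subst a b (Var x)
  | x == a    = b
  | otherwise = Var x
subst a b (App f x) = App (subst a b f) (subst a b x)
subst a b (Lam x e)
  | x == a    = Lam x e              -- x shadows a; stop descending
  | otherwise = Lam x (subst a b e)
```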
+
+ \defref{beta-reduction}
\subsubsection[sec:normalization:beta]{β-reduction}
β-reduction is a well known transformation from lambda calculus, where it is
the main reduction step. It reduces applications of lambda abstractions,
sure that most lambda abstractions will eventually be reducible by
β-reduction.
+ Note that β-reduction works on type lambda abstractions and type
+ applications as well. This means the substitution below also works on
+ type variables, in the case that the binder is a type variable and the
+ expression it is applied to is a type.
+
\starttrans
(λx.E) M
-----------------
\transexample{beta}{β-reduction}{from}{to}
+ \startbuffer[from]
+ (λt.λa::t. a) @Int
+ \stopbuffer
+
+ \startbuffer[to]
+ (λa::Int. a)
+ \stopbuffer
+
+ \transexample{beta-type}{β-reduction for type abstractions}{from}{to}
+
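The reduction rule can also be sketched as a single rewrite step on a toy expression type. This is an illustrative sketch (naive substitution, no capture-avoidance), not the compiler's implementation.

```haskell
data Expr = Var String | App Expr Expr | Lam String Expr
  deriving (Eq, Show)

-- Naive substitution, as in the intermezzo; no capture-avoidance.
subst :: String -> Expr -> Expr -> Expr
subst a b (Var x)   = if x == a then b else Var x
subst a b (App f x) = App (subst a b f) (subst a b x)
subst a b (Lam x e) = if x == a then Lam x e else Lam x (subst a b e)

-- One β-reduction step: (λx.E) M  ⟶  E[x=>M]
beta :: Expr -> Expr
beta (App (Lam x e) m) = subst x m e
beta e                 = e
```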
\subsubsection{Empty let removal}
This transformation is simple: It removes recursive lets that have no bindings
(which usually occurs when unused let binding removal removes the last
\todo{Example}
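On a toy expression type, empty let removal is a one-line rewrite. This sketch represents a recursive let by its binding list, so an empty let is simply a `Let` with no bindings.

```haskell
data Expr = Var String
          | Let [(String, Expr)] Expr   -- letrec [bindings] in body
          deriving (Eq, Show)

-- Remove a recursive let that has no bindings left.
removeEmptyLet :: Expr -> Expr
removeEmptyLet (Let [] body) = body
removeEmptyLet e             = e
```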
- \subsubsection{Simple let binding removal}
+ \subsubsection[sec:normalization:simplelet]{Simple let binding removal}
This transformation inlines simple let bindings that bind some
binder to some other binder instead of to a more complex expression (\ie
a = b).
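In plain Haskell terms, the transformation turns the left definition below into the right one; both compute the same value. The `before`/`after` names are hypothetical and `Int` stands in for \lam{Word}.

```haskell
before :: Int -> Int
before c = let b = c * 2
               a = b        -- simple binding: a = b
           in a + a

after :: Int -> Int
after c = let b = c * 2
          in b + b          -- every occurrence of a replaced by b
```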
\startbuffer[from]
(+) :: Word -> Word -> Word
- (+) = GHC.Num.(+) @Word $dNum
+ (+) = GHC.Num.(+) @Word \$dNum
~
(+) a b
\stopbuffer
\startbuffer[to]
- GHC.Num.(+) @ Alu.Word $dNum a b
+ GHC.Num.(+) @ Alu.Word \$dNum a b
\stopbuffer
\transexample{toplevelinline}{Top level binding inlining}{from}{to}
of the other value definitions in let bindings and making the final
return value a simple variable reference.
- \subsubsection{η-abstraction}
+ \subsubsection[sec:normalization:eta]{η-abstraction}
This transformation makes sure that all arguments of a function-typed
expression are named, by introducing lambda expressions. When combined with
β-reduction and non-representable binding inlining, all function-typed
\transexample{eta}{η-abstraction}{from}{to}
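In Haskell terms, η-abstraction rewrites a function-typed expression \lam{E} into \lam{λx.E x}; both definitions below behave identically. The names are hypothetical and `Int` stands in for \lam{Word}.

```haskell
before :: [Int] -> [Int]
before = map (+1)            -- function-typed body, argument implicit

after :: [Int] -> [Int]
after = \xs -> map (+1) xs   -- argument made explicit by η-abstraction
```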
- \subsubsection{Application propagation}
+ \subsubsection[sec:normalization:appprop]{Application propagation}
This transformation is meant to propagate application expressions downwards
into expressions as far as possible. This allows partial applications inside
expressions to become fully applied and exposes new transformation
\transexample{apppropcase}{Application propagation for a case expression}{from}{to}
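A small Haskell sketch of the effect for the case expression variant: the argument \lam{x} is pushed into both alternatives. The names are hypothetical.

```haskell
-- Before propagation: the case expression is applied as a whole.
before :: Bool -> Int -> Int
before b x = (case b of
                True  -> negate
                False -> (+ 1)) x

-- After propagation: the application is moved into each alternative.
after :: Bool -> Int -> Int
after b x = case b of
              True  -> negate x
              False -> x + 1
```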
- \subsubsection{Let recursification}
+ \subsubsection[sec:normalization:letrecurse]{Let recursification}
This transformation makes all non-recursive lets recursive. In the
end, we want a single recursive let in our normalized program, so all
non-recursive lets can be converted. This also makes other
\todo{More examples}
- \subsection{Argument simplification}
- The transforms in this section deal with simplifying application
- arguments into normal form. The goal here is to:
-
- \todo{This section should only talk about representable arguments. Non
- representable arguments are treated by specialization.}
-
- \startitemize
- \item Make all arguments of user-defined functions (\eg, of which
- we have a function body) simple variable references of a runtime
- representable type. This is needed, since these applications will be turned
- into component instantiations.
- \item Make all arguments of builtin functions one of:
- \startitemize
- \item A type argument.
- \item A dictionary argument.
- \item A type level expression.
- \item A variable reference of a runtime representable type.
- \item A variable reference or partial application of a function type.
- \stopitemize
- \stopitemize
-
- When looking at the arguments of a user-defined function, we can
- divide them into two categories:
- \startitemize
- \item Arguments of a runtime representable type (\eg bits or vectors).
-
- These arguments can be preserved in the program, since they can
- be translated to input ports later on. However, since we can
- only connect signals to input ports, these arguments must be
- reduced to simple variables (for which signals will be
- produced). This is taken care of by the argument extraction
- transform.
- \item Non-runtime representable typed arguments. \todo{Move this
- bullet to specialization}
-
- These arguments cannot be preserved in the program, since we
- cannot represent them as input or output ports in the resulting
- \small{VHDL}. To remove them, we create a specialized version of the
- called function with these arguments filled in. This is done by
- the argument propagation transform.
-
- Typically, these arguments are type and dictionary arguments that are
- used to make functions polymorphic. By propagating these arguments, we
- are essentially doing the same which GHC does when it specializes
- functions: Creating multiple variants of the same function, one for
- each type for which it is used. Other common non-representable
- arguments are functions, e.g. when calling a higher order function
- with another function or a lambda abstraction as an argument.
-
- The reason for doing this is similar to the reasoning provided for
- the inlining of non-representable let bindings above. In fact, this
- argument propagation could be viewed as a form of cross-function
- inlining.
- \stopitemize
-
- \todo{Move this itemization into a new section about builtin functions}
- When looking at the arguments of a builtin function, we can divide them
- into categories:
-
- \startitemize
- \item Arguments of a runtime representable type.
-
- As we have seen with user-defined functions, these arguments can
- always be reduced to a simple variable reference, by the
- argument extraction transform. Performing this transform for
- builtin functions as well, means that the translation of builtin
- functions can be limited to signal references, instead of
- needing to support all possible expressions.
-
- \item Arguments of a function type.
-
- These arguments are functions passed to higher order builtins,
- like \lam{map} and \lam{foldl}. Since implementing these
- functions for arbitrary function-typed expressions (\eg, lambda
- expressions) is rather comlex, we reduce these arguments to
- (partial applications of) global functions.
-
- We can still support arbitrary expressions from the user code,
- by creating a new global function containing that expression.
- This way, we can simply replace the argument with a reference to
- that new function. However, since the expression can contain any
- number of free variables we also have to include partial
- applications in our normal form.
-
- This category of arguments is handled by the function extraction
- transform.
- \item Other unrepresentable arguments.
-
- These arguments can take a few different forms:
- \startdesc{Type arguments}
- In the core language, type arguments can only take a single
- form: A type wrapped in the Type constructor. Also, there is
- nothing that can be done with type expressions, except for
- applying functions to them, so we can simply leave type
- arguments as they are.
- \stopdesc
- \startdesc{Dictionary arguments}
- In the core language, dictionary arguments are used to find
- operations operating on one of the type arguments (mostly for
- finding class methods). Since we will not actually evaluatie
- the function body for builtin functions and can generate
- code for builtin functions by just looking at the type
- arguments, these arguments can be ignored and left as they
- are.
- \stopdesc
- \startdesc{Type level arguments}
- Sometimes, we want to pass a value to a builtin function, but
- we need to know the value at compile time. Additionally, the
- value has an impact on the type of the function. This is
- encoded using type-level values, where the actual value of the
- argument is not important, but the type encodes some integer,
- for example. Since the value is not important, the actual form
- of the expression does not matter either and we can leave
- these arguments as they are.
- \stopdesc
- \startdesc{Other arguments}
- Technically, there is still a wide array of arguments that can
- be passed, but does not fall into any of the above categories.
- However, none of the supported builtin functions requires such
- an argument. This leaves use with passing unsupported types to
- a function, such as calling \lam{head} on a list of functions.
-
- In these cases, it would be impossible to generate hardware
- for such a function call anyway, so we can ignore these
- arguments.
-
- The only way to generate hardware for builtin functions with
- arguments like these, is to expand the function call into an
- equivalent core expression (\eg, expand map into a series of
- function applications). But for now, we choose to simply not
- support expressions like these.
- \stopdesc
-
- From the above, we can conclude that we can simply ignore these
- other unrepresentable arguments and focus on the first two
- categories instead.
+ \subsection[sec:normalization:argsimpl]{Representable arguments simplification}
+ This section contains just a single transformation that deals with
+ representable arguments in applications. Non-representable arguments are
+ handled by the transformations in
+ \in{section}[sec:normalization:nonrep].
+
+ This transformation ensures that all representable arguments will become
+ references to local variables. This ensures they will become references
+ to local signals in the resulting \small{VHDL}, which is required due to
+ limitations in the component instantiation code in \VHDL (one can only
+ assign a signal or constant to an input port). By ensuring that all
+ arguments are always simple variable references, we always have a signal
+ available to map to the input ports.
+
+ To reduce a complex expression to a simple variable reference, we create
+ a new let expression around the application, which binds the complex
+ expression to a new variable. The original function is then applied to
+ this variable.
+
+ \refdef{global variable}
+ Note that references to \emph{global variables} (like a top level
+ function without arguments, but also argumentless data constructors
+ like \lam{True}) are also simplified. Only local variables generate
+ signals in the resulting architecture. Even though argumentless
+ data constructors become constants in the generated \VHDL code and could be
+ mapped to an input port directly, they are still simplified to make the
+ normal form more regular.
+
+ \refdef{representable}
+ \starttrans
+ M N
+ -------------------- \lam{N} is representable
+ letrec x = N in M x \lam{N} is not a local variable reference
+ \stoptrans
+ \refdef{local variable}
+
+ \startbuffer[from]
+ add (add a 1) 1
+ \stopbuffer
+
+ \startbuffer[to]
+ letrec x = add a 1 in add x 1
+ \stopbuffer
+
+ \transexample{argsimpl}{Argument simplification}{from}{to}
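The example above can be replayed in executable Haskell; both forms compute the same result. `Int` stands in for \lam{Word}.

```haskell
add :: Int -> Int -> Int
add = (+)

-- Complex argument in function position of add.
before :: Int -> Int
before a = add (add a 1) 1

-- Complex argument bound to a new variable by a let expression.
after :: Int -> Int
after a = let x = add a 1 in add x 1
```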
+
+ \subsection[sec:normalization:builtins]{Builtin functions}
+ This section deals with (arguments to) builtin functions. In the
+ intended normal form definition\refdef{intended normal form definition}
+ we can see that there are three sorts of arguments a builtin function
+ can receive.
+
+ \startitemize[KR]
+ \item A representable local variable reference. This is the most
+ common argument to any function. The argument simplification
+ transformation described in \in{section}[sec:normalization:argsimpl]
+ makes sure that \emph{any} representable argument to \emph{any}
+ function (including builtin functions) is turned into a local variable
+ reference.
+ \item (A partial application of) a top level function (either builtin or
+ user-defined). The function extraction transformation described in
+ this section takes care of turning every function-typed argument into
+ (a partial application of) a top level function.
+ \item Any expression that is not representable and does not have a
+ function type. Since these can be any expression, there is no
+ transformation needed. Note that this category is exactly all
+ expressions that are not transformed by the transformations for the
+ previous two categories. This means that \emph{any} core expression
+ that is used as an argument to a builtin function will be either
+ transformed into one of the above categories, or end up in this
+ category. In any case, the result is in normal form.
\stopitemize
- \subsubsection{Argument simplification}
- This transform deals with arguments to functions that
- are of a runtime representable type. It ensures that they will all become
- references to global variables, or local signals in the resulting
- \small{VHDL}, which is required due to limitations in the component
- instantiation code in \VHDL (one can only assign a signal or constant
- to an input port). By ensuring that all arguments are always simple
- variable references, we always have a signal available to assign to
- input ports.
-
- \todo{Say something about dataconstructors (without arguments, like True
- or False), which are variable references of a runtime representable
- type, but do not result in a signal.}
-
- To reduce a complex expression to a simple variable reference, we create
- a new let expression around the application, which binds the complex
- expression to a new variable. The original function is then applied to
- this variable.
-
- Note that a reference to a \emph{global variable} (like a top level
- function without arguments, but also an argumentless dataconstructors
- like \lam{True}) is also simplified. Only local variables generate
- signals in the resulting architecture.
-
- \refdef{representable}
- \starttrans
- M N
- -------------------- \lam{N} is representable
- letrec x = N in M x \lam{N} is not a local variable reference
- \stoptrans
- \refdef{local variable}
-
- \startbuffer[from]
- add (add a 1) 1
- \stopbuffer
-
- \startbuffer[to]
- letrec x = add a 1 in add x 1
- \stopbuffer
+ As noted, the argument simplification will handle any representable
+ arguments to a builtin function. The following transformation is needed
+ to handle non-representable arguments with a function type; all other
+ non-representable arguments need no special handling.
- \transexample{argextract}{Argument extraction}{from}{to}
-
- \subsubsection{Function extraction}
- \todo{Move to section about builtin functions}
+ \subsubsection[sec:normalization:funextract]{Function extraction}
This transform deals with function-typed arguments to builtin
- functions. Since builtin functions cannot be specialized to remove
- the arguments, we choose to extract these arguments into a new global
- function instead. This greatly simplifies the translation rules needed
- for builtin functions. \todo{Should we talk about these? Reference
- Christiaan?}
+ functions.
+ Since builtin functions cannot be specialized (see
+ \in{section}[sec:normalization:specialize]) to remove the arguments,
+ these arguments are extracted into a new global function instead. In
+ other words, we create a new top level function that has exactly the
+ extracted argument as its body. This greatly simplifies the
+ translation rules needed for builtin functions, since they only need
+ to handle (partial applications of) top level functions.
Any free variables occurring in the extracted arguments will become
parameters to the new global function. The original argument is replaced
x = λf0 ... λfn.N
\stoptrans
- \todo{Split this example}
\startbuffer[from]
- map (λa . add a b) xs
-
- map (add b) ys
+ addList = λb.λxs.map (λa . add a b) xs
\stopbuffer
\startbuffer[to]
- map (x0 b) xs
-
- map x1 ys
+ addList = λb.λxs.map (f b) xs
~
- x0 = λb.λa.add a b
- x1 = λb.add b
+ f = λb.λa.add a b
\stopbuffer
\transexample{funextract}{Function extraction}{from}{to}
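The example can be checked in executable Haskell; \lam{addList'} calls the extracted top level function \lam{f} through a partial application, with the free variable \lam{b} turned into an extra parameter. `Int` stands in for \lam{Word}.

```haskell
-- Before extraction: a lambda expression as argument to map.
addList :: Int -> [Int] -> [Int]
addList b xs = map (\a -> a + b) xs

-- After extraction: the lambda body lives in the top level function f.
f :: Int -> Int -> Int
f b a = a + b

addList' :: Int -> [Int] -> [Int]
addList' b xs = map (f b) xs
```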
- Note that \lam{x0} and {x1} will still need normalization after this.
-
- \subsubsection{Argument propagation}
- \todo{Rename this section to specialization and move it into a
- separate section}
-
- This transform deals with arguments to user-defined functions that are
- not representable at runtime. This means these arguments cannot be
- preserved in the final form and most be {\em propagated}.
-
- Propagation means to create a specialized version of the called
- function, with the propagated argument already filled in. As a simple
- example, in the following program:
-
- \startlambda
- f = λa.λb.a + b
- inc = λa.f a 1
- \stoplambda
-
- We could {\em propagate} the constant argument 1, with the following
- result:
-
- \startlambda
- f' = λa.a + 1
- inc = λa.f' a
- \stoplambda
-
- Special care must be taken when the to-be-propagated expression has any
- free variables. If this is the case, the original argument should not be
- removed completely, but replaced by all the free variables of the
- expression. In this way, the original expression can still be evaluated
- inside the new function. Also, this brings us closer to our goal: All
- these free variables will be simple variable references.
-
- To prevent us from propagating the same argument over and over, a simple
- local variable reference is not propagated (since is has exactly one
- free variable, itself, we would only replace that argument with itself).
-
- This shows that any free local variables that are not runtime representable
- cannot be brought into normal form by this transform. We rely on an
- inlining transformation to replace such a variable with an expression we
- can propagate again.
-
- \starttrans
- x = E
- ~
- x Y0 ... Yi ... Yn \lam{Yi} is not of a runtime representable type
- --------------------------------------------- \lam{Yi} is not a local variable reference
- x' y0 ... yi-1 f0 ... fm Yi+1 ... Yn \lam{f0 ... fm} are all free local vars of \lam{Yi}
- ~
- x' = λy0 ... λyi-1. λf0 ... λfm. λyi+1 ... λyn .
- E y0 ... yi-1 Yi yi+1 ... yn
- \stoptrans
-
- \todo{Describe what the formal specification means}
- \todo{Note that we don't change the sepcialised function body, only
- wrap it}
-
- \todo{Example}
-
+ Note that the function \lam{f} will still need normalization after
+ this.
\subsection{Case normalisation}
\subsubsection{Scrutinee simplification}
False -> b
\stopbuffer
- \transexample{letflat}{Let flattening}{from}{to}
+ \transexample{letflat}{Case normalisation}{from}{to}
\subsubsection{Case simplification}
\transexample{caserem}{Case removal}{from}{to}
- \todo{Move these two sections somewhere? Perhaps not?}
- \subsection{Removing polymorphism}
- Reference type-specialization (== argument propagation)
+ \subsection[sec:normalization:nonrep]{Removing unrepresentable values}
+ The transformations in this section are aimed at making all the
+ values used in our expression representable. There are two main
+ transformations that are applied to \emph{all} unrepresentable let
+ bindings and function arguments. These are meant to address three
+ different kinds of unrepresentable values: Polymorphic values, higher
+ order values and literals. The transformations are described generically:
+ they apply to all non-representable values. However, non-representable
+ values that don't fall into one of these three categories will be moved
+ around by these transformations but are unlikely to completely
+ disappear. They usually mean the program was not valid in the first
+ place, because unsupported types were used (for example, a program using
+ strings).
+
+ Each of these three categories will be detailed below, followed by the
+ actual transformations.
+
+ \subsubsection{Removing polymorphism}
+ As noted in \in{section}[sec:prototype:polymporphism],
+ polymorphism is made explicit in Core through type and
+ dictionary arguments. To remove the polymorphism from a
+ function, we can simply specialize the polymorphic function for
+ the particular type applied to it. The same goes for dictionary
+ arguments. To remove polymorphism from let bound values, we
+ simply inline the let bindings that have a polymorphic type,
+ which should (eventually) make sure that the polymorphic
+ expression is applied to a type and/or dictionary, which can
+ then be removed by β-reduction (\in{section}[sec:normalization:beta]).
+
+ Since both type and dictionary arguments are not representable,
+ \refdef{representable}
+ the non-representable argument specialization and
+ non-representable let binding inlining transformations below
+ take care of exactly this.
+
+ There is one case where polymorphism cannot be completely
+ removed: Builtin functions are still allowed to be polymorphic
+ (since we have no function body that we could properly
+ specialize). However, the code that generates \VHDL for builtin
+ functions knows how to handle this, so this is not a problem.
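A sketch of what specialization produces: the polymorphic function gets a monomorphic variant for the concrete type it is applied to. The names here are hypothetical and `Int` stands in for \lam{Word}.

```haskell
-- Polymorphic original: in Core, this takes explicit type arguments.
first :: (a, b) -> a
first (x, _) = x

-- Specialized version, with the type arguments filled in.
firstInt :: (Int, Bool) -> Int
firstInt (x, _) = x
```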
+
+ \subsubsection{Defunctionalization}
+ These transformations remove higher order expressions from our
+ program, making all values first-order.
+
+ Higher order values are always introduced by lambda abstractions; none
+ of the other Core expression elements can introduce a function type.
+ However, other expressions can \emph{have} a function type, when they
+ have a lambda expression in their body.
+
+ For example, the following expression is a higher order expression
+ that is not a lambda expression itself:
+
+ \refdef{id function}
+ \startlambda
+ case x of
+ High -> id
+ Low -> λx.x
+ \stoplambda
+
+ The reference to the \lam{id} function shows that we can introduce a
+ higher order expression in our program without using a lambda
+ expression directly. However, inside the definition of the \lam{id}
+ function, we can be sure that a lambda expression is present.
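The same example in runnable Haskell: `pick` (a hypothetical name) has a function-typed result even though its body is a case expression, not a lambda.

```haskell
data Bit = High | Low

-- A higher order expression that is not itself a lambda expression.
pick :: Bit -> (Int -> Int)
pick b = case b of
  High -> id          -- reference to a function
  Low  -> \x -> x     -- lambda expression
```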
+
+ Looking closely at the definition of our normal form in
+ \in{section}[sec:normalization:intendednormalform], we can see that
+ there are three possibilities for higher order values to appear in our
+ intended normal form:
+
+ \startitemize[KR]
+ \item[item:toplambda] Lambda abstractions can appear at the highest level of a
+ top level function. These lambda abstractions introduce the
+ arguments (input ports / current state) of the function.
+ \item[item:builtinarg] (Partial applications of) top level functions can appear as an
+ argument to a builtin function.
+ \item[item:completeapp] (Partial applications of) top level functions can appear in
+ function position of an application. Since a partial application
+ cannot appear anywhere else (except as builtin function arguments),
+ all partial applications are applied, meaning that all applications
+ will become complete applications. However, since application of
+ arguments happens one by one, in the expression:
+ \startlambda
+ f 1 2
+ \stoplambda
+ the subexpression \lam{f 1} has a function type. But this is
+ allowed, since it is inside a complete application.
+ \stopitemize
- Reference polymporphic binding inlining (== non-representable binding
- inlining).
+ We will take a typical function with some higher order values as an
+ example. The following function takes two arguments: a \lam{Bit} and a
+ list of numbers. Depending on the first argument, each number in the
+ list is doubled, or the list is returned unmodified. For the sake of
+ the example, no polymorphism is shown. In reality, at least \lam{map}
+ would be polymorphic.
- \subsection{Defunctionalization}
- These transformations remove most higher order expressions from our
- program, making it completely first-order (the only exception here is for
- arguments to builtin functions, since we can't specialize builtin
- function. \todo{Talk more about this somewhere}
+ \startlambda
+ λy.let double = λx. x + x in
+ case y of
+ Low -> map double
+ High -> λz. z
+ \stoplambda
+
+ This example shows a number of higher order values that we cannot
+ translate to \VHDL directly. The \lam{double} binder bound in the let
+ expression has a function type, as well as both of the alternatives of
+ the case expression. The first alternative is a partial application of
+ the \lam{map} builtin function, whereas the second alternative is a
+ lambda abstraction.
+
+ To reduce all higher order values to one of the above items, a number
+ of transformations we've already seen are used. The η-abstraction
+ transformation from \in{section}[sec:normalization:eta] ensures all
+ function arguments are introduced by lambda abstraction on the highest
+ level of a function. These lambda arguments are allowed because of
+ \in{item}[item:toplambda] above. After η-abstraction, our example
+ becomes a bit bigger:
+
+ \startlambda
+ λy.λq.(let double = λx. x + x in
+ case y of
+ Low -> map double
+ High -> λz. z
+ ) q
+ \stoplambda
+
+ η-abstraction also introduces extra applications (the application of
+ the let expression to \lam{q} in the above example). These
+ applications can then be propagated down by the application propagation
+ transformation (\in{section}[sec:normalization:appprop]). In our
+ example, the \lam{q} variable will be propagated into the
+ let expression and then into the case expression:
+
+ \startlambda
+ λy.λq.let double = λx. x + x in
+ case y of
+ Low -> map double q
+ High -> (λz. z) q
+ \stoplambda
+
+ This propagation makes higher order values become applied (in
+ particular, both of the alternatives of the case expression now have a
+ representable type). Completely applied top level functions (like the
+ first alternative) are now no longer invalid (they fall under
+ \in{item}[item:completeapp] above). (Completely) applied lambda
+ abstractions can be removed by β-reduction. For our example,
+ applying β-reduction results in the following:
+
+ \startlambda
+ λy.λq.let double = λx. x + x in
+ case y of
+ Low -> map double q
+ High -> q
+ \stoplambda
+
+ As you can see in our example, all of this moves applications towards
+ the higher order values, but misses higher order functions bound by
+ let expressions. The applications cannot be moved towards these values
+ (since they can be used in multiple places), so the values will have
+ to be moved towards the applications. This is achieved by inlining all
+ higher order values bound by let expressions, by the
+ non-representable binding inlining transformation below. When applying
+ it to our example, we get the following:
+
+ \startlambda
+ λy.λq.case y of
+ Low -> map (λx. x + x) q
+ High -> q
+ \stoplambda
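The original and transformed versions of the running example can be compared in executable Haskell; both behave identically. The names are hypothetical and `Int` stands in for the numeric type.

```haskell
data Bit = High | Low

-- The original expression, with a let-bound higher order value.
original :: Bit -> [Int] -> [Int]
original y = let double = \x -> x + x
             in \q -> case y of
                  Low  -> map double q
                  High -> q

-- After η-abstraction, application propagation, β-reduction and
-- non-representable binding inlining.
normalized :: Bit -> [Int] -> [Int]
normalized y q = case y of
  Low  -> map (\x -> x + x) q
  High -> q
```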
- Reference higher-order-specialization (== argument propagation)
+ We've nearly eliminated all unsupported higher order values from this
+ expression. The only one remaining is the first argument to the
+ \lam{map} function. Having higher order arguments to a builtin
+ function like \lam{map} is allowed in the intended normal form, but
+ only if the argument is (a partial application of) a top level
+ function. This is easily done by introducing a new top level function
+ and putting the lambda abstraction inside. This is done by the function
+ extraction transformation from
+ \in{section}[sec:normalization:funextract].
+
+ \startlambda
+ λy.λq.case y of
+ Low -> map func q
+ High -> q
+ \stoplambda
+
+ This also introduces a new function, that we have called \lam{func}:
+
+ \startlambda
+ func = λx. x + x
+ \stoplambda
+
+ Note that this does not actually remove the lambda, but now it is a
+ lambda at the highest level of a function, which is allowed in the
+ intended normal form.
+
+ There is one case that has not been discussed yet. What if the
+ \lam{map} function in the example above was not a builtin function
+ but a user-defined function? Then extracting the lambda expression
+ into a new function would not be enough, since user-defined functions
+ into a new function would not be enough, since user-defined functions
+ can never have higher order arguments. The following
+ expression illustrates this:
+
+ \startlambda
+ twice :: (Word -> Word) -> Word -> Word
+ twice = λf.λa.f (f a)
+
+ main = λa.twice (λx. x + x) a
+ \stoplambda
+
+ This example shows a function \lam{twice} that takes a function as a
+ first argument and applies that function twice to the second argument.
+ Again, we've made the function monomorphic for clarity, even though
+ this function would be a lot more useful if it was polymorphic. The
+ function \lam{main} uses \lam{twice} to apply a lambda expression twice.
+
+ When faced with a user defined function, a body is available for that
+ function. This means we could create a specialized version of the
+ function that only works for this particular higher order argument
+ (\ie, we can just remove the argument and call the specialized
+ function without the argument). This transformation is detailed below.
+ Applying this transformation to the example gives:
+
+ \startlambda
+ twice' :: Word -> Word
+ twice' = λb.(λf.λa.f (f a)) (λx. x + x) b
+
+ main = λa.twice' a
+ \stoplambda
+
+ The \lam{main} function is now in normal form, since the only higher
+ order value there is the top level lambda expression. The new
+ \lam{twice'} function is a bit complex: the entire body of the
+ original \lam{twice} function is wrapped in a lambda abstraction
+ and applied to the argument we've specialized for (\lam{λx. x + x})
+ and the other arguments. This complex expression can fortunately be
+ effectively reduced by repeatedly applying β-reduction:
+
+ \startlambda
+ twice' :: Word -> Word
+ twice' = λb.(b + b) + (b + b)
+ \stoplambda
+
+ This example also shows that the resulting normal form might not be as
+ efficient as we would hope (it is calculating \lam{(b + b)}
+ twice). This is discussed in more detail in
+ \in{section}[sec:normalization:duplicatework].
+
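That the specialization preserves the function's meaning can be checked in plain Haskell (with \hs{Int} standing in for \lam{Word}; this is a sketch, not output of the prototype):

```haskell
-- The original higher order function.
twice :: (Int -> Int) -> Int -> Int
twice = \f -> \a -> f (f a)

-- The specialized and β-reduced version: the higher order
-- argument is gone, but (b + b) is now calculated twice.
twice' :: Int -> Int
twice' = \b -> (b + b) + (b + b)

main :: IO ()
main = print (twice (\x -> x + x) 7 == twice' 7)
```

Both compute the same result for every input; the duplicated addition is exactly the work duplication discussed above.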
+ \subsubsection{Literals}
+ There are a limited number of literals available in Haskell and Core.
+ \refdef{enumerated types} When using (enumerated) algebraic
+ datatypes, a literal is just a reference to the corresponding data
+ constructor, which has a representable type (the algebraic datatype)
+ and can be translated directly. This also holds for literals of the
+ \hs{Bool} Haskell type, which is just an enumerated type.
+
+ There is, however, a second type of literal that does not have a
+ representable type: Integer literals. Cλash supports using integer
+ literals for all three integer types supported (\hs{SizedWord},
+ \hs{SizedInt} and \hs{RangedWord}). This is implemented using
+ Haskell's \hs{Num} typeclass, which offers a \hs{fromInteger} method
+ that converts any \hs{Integer} to the Cλash datatypes.
+
+ When \GHC sees integer literals, it will automatically insert calls to
+ the \hs{fromInteger} method in the resulting Core expression. For
+ example, the following expression in Haskell creates a 32 bit unsigned
+ word with the value 1. The explicit type signature is needed, since
+ there is no context for \GHC to determine the type from otherwise.
+
+ \starthaskell
+ 1 :: SizedWord D32
+ \stophaskell
+
+ This Haskell code results in the following Core expression:
+
+ \startlambda
+ fromInteger @(SizedWord D32) \$dNum (smallInteger 1)
+ \stoplambda
+
+ The literal 1 will have the type \lam{GHC.Prim.Int\#}, which is
+ converted into an \lam{Integer} by \lam{smallInteger}. The
+ \lam{fromInteger} function then converts this into a
+ \lam{SizedWord D32}.
+
+ Neither the \lam{GHC.Prim.Int\#} type nor the \lam{Integer} type is
+ representable, so they cannot be translated directly. Fortunately, there
+ is no need to translate them, since \lam{fromInteger} is a builtin
+ function that knows how to handle these values. However, this does
+ require that the \lam{fromInteger} function is directly applied to
+ these non-representable literal values, otherwise errors will occur.
+ For example, the following expression is not in the intended normal
+ form, since one of the let bindings has an unrepresentable type
+ (\lam{Integer}):
+
+ \startlambda
+ let l = smallInteger 1 in fromInteger @(SizedWord D32) \$dNum l
+ \stoplambda
+
+ By inlining these let-bindings, we can ensure that unrepresentable
+ literals bound by a let binding end up in an application of the
+ appropriate builtin function, where they are allowed. Since it is
+ possible that the application of that function is in a different
+ function than the definition of the literal value, we will always need
+ to specialize away any unrepresentable literals that are used as
+ function arguments. The following two transformations do exactly this.
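The way the \hs{Num} class drives this elaboration can be mimicked with an ordinary instance. The type \hs{Word8'} below is a hypothetical stand-in for Cλash's sized types, which are not available outside the prototype:

```haskell
-- A hypothetical 8 bit word type. GHC elaborates the literal in
-- (1 :: Word8') into an application of the fromInteger below.
newtype Word8' = Word8' Integer deriving (Eq, Show)

instance Num Word8' where
  fromInteger n       = Word8' (n `mod` 256)
  Word8' a + Word8' b = fromInteger (a + b)
  Word8' a * Word8' b = fromInteger (a * b)
  Word8' a - Word8' b = fromInteger (a - b)
  abs                 = id
  signum (Word8' a)   = Word8' (signum a)

main :: IO ()
main = print (200 + 100 :: Word8')
```

Every numeric literal at type \hs{Word8'} passes through \hs{fromInteger}, which is where the (unrepresentable) \hs{Integer} value is turned into a representable one, here wrapping around at 256.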
\subsubsection{Non-representable binding inlining}
- \todo{Move this section into a new section (together with
- specialization?)}
This transform inlines let bindings that are bound to a
non-representable value. Since we can never generate a signal
assignment for these bindings (we cannot declare a signal assignment
with a non-representable type, for obvious reasons), we have no choice
but to inline the binding to remove it.
- If the binding is non-representable because it is a lambda abstraction, it is
- likely that it will inlined into an application and β-reduction will remove
- the lambda abstraction and turn it into a representable expression at the
- inline site. The same holds for partial applications, which can be turned into
- full applications by inlining.
-
- Other cases of non-representable bindings we see in practice are primitive
- Haskell types. In most cases, these will not result in a valid normalized
- output, but then the input would have been invalid to start with. There is one
- exception to this: When a builtin function is applied to a non-representable
- expression, things might work out in some cases. For example, when you write a
- literal \hs{SizedInt} in Haskell, like \hs{1 :: SizedInt D8}, this results in
- the following core: \lam{fromInteger (smallInteger 10)}, where for example
- \lam{10 :: GHC.Prim.Int\#} and \lam{smallInteger 10 :: Integer} have
- non-representable types. \todo{Expand on this. This/these paragraph(s)
- should probably become a separate discussion somewhere else}
-
- \todo{Can this duplicate work?}
+ As we have seen in the previous sections, inlining these bindings
+ solves (part of) the problems of polymorphism, higher order values
+ and unrepresentable literals in an expression.
\starttrans
letrec
\transexample{nonrepinline}{Nonrepresentable binding inlining}{from}{to}
+ \subsubsection[sec:normalization:specialize]{Function specialization}
+ This transform removes arguments to user-defined functions that are
+ not representable at runtime. This is done by creating a
+ \emph{specialized} version of the function that only works for one
+ particular value of that argument (in other words, the argument can be
+ removed).
+
+ Specialization means creating a version of the called
+ function with one argument already filled in. As a simple example,
+ consider the following program (this is not actual Core, since it
+ directly uses a literal of the unrepresentable type \lam{GHC.Prim.Int\#}):
+
+ \startlambda
+ f = λa.λb.a + b
+ inc = λa.f a 1
+ \stoplambda
+
+ We could specialize the function \lam{f} against the literal argument
+ 1, with the following result:
+
+ \startlambda
+ f' = λa.a + 1
+ inc = λa.f' a
+ \stoplambda
+
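Written as runnable Haskell (using \hs{Int} instead of the unrepresentable literal type), the call sites before and after specialization compute the same results:

```haskell
-- Original: f receives the constant argument 1 at its call site.
f :: Int -> Int -> Int
f = \a -> \b -> a + b

inc :: Int -> Int
inc = \a -> f a 1

-- Specialized: the constant argument has been filled in, so the
-- call site no longer passes it.
f' :: Int -> Int
f' = \a -> a + 1

inc' :: Int -> Int
inc' = \a -> f' a

main :: IO ()
main = print (map inc [0 .. 3] == map inc' [0 .. 3])
```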
+ In some ways, this transformation is similar to β-reduction, but it
+ operates across function boundaries. It is also similar to
+ non-representable let binding inlining above, since it sort of
+ \quote{inlines} an expression into a called function.
+
+ Special care must be taken when the argument has any free variables.
+ If this is the case, the original argument should not be removed
+ completely, but replaced by all the free variables of the expression.
+ In this way, the original expression can still be evaluated inside the
+ new function.
+
+ To prevent us from propagating the same argument over and over, a
+ simple local variable reference is not propagated (since it has
+ exactly one free variable, itself, we would only replace that argument
+ with itself).
+
+ This shows that any free local variables that are not runtime
+ representable cannot be brought into normal form by this transform. We
+ rely on an inlining or β-reduction transformation to replace such a
+ variable with an expression we can propagate again.
+
+ \starttrans
+ x = E
+ ~
+ x Y0 ... Yi ... Yn \lam{Yi} is not representable
+ --------------------------------------------- \lam{Yi} is not a local variable reference
+ x' Y0 ... Yi-1 f0 ... fm Yi+1 ... Yn              \lam{f0 ... fm} are all free local vars of \lam{Yi}
+ ~ \lam{T0 ... Tn} are the types of \lam{Y0 ... Yn}
+ x' = λ(y0 :: T0) ... λ(yi-1 :: Ti-1). λf0 ... λfm. λ(yi+1 :: Ti+1) ... λ(yn :: Tn).
+ E y0 ... yi-1 Yi yi+1 ... yn
+ \stoptrans
+
+ This is a bit of a complex transformation. It transforms an
+ application of the function \lam{x}, where one of the arguments
+ (\lam{Y_i}) is not representable. A new
+ function \lam{x'} is created that wraps the body of the old function.
+ The body of the new function becomes a number of nested lambda
+ abstractions, one for each of the original arguments that are left
+ unchanged.
+
+ The ith argument is replaced with the free variables of
+ \lam{Y_i}. Note that we reuse the same binders as those used in
+ \lam{Y_i}, since we can then just use \lam{Y_i} inside the new
+ function body and have all of the variables it uses be in scope.
+
+ The argument that we are specializing for, \lam{Y_i}, is put inside
+ the new function body. The old function body is applied to it. Since
+ we use this new function only in place of an application with that
+ particular argument \lam{Y_i}, behaviour should not change.
+
+ Note that the types of the arguments of our new function are taken
+ from the types of the \emph{actual} arguments (\lam{T0 ... Tn}). This
+ means that any polymorphism in the arguments is removed, even when the
+ corresponding explicit type lambda is not removed
+ yet.\refdef{type lambda}
+
+ \todo{Examples. Perhaps reference the previous sections}
+
+
+ \section{Unsolved problems}
+ The above system of transformations has been implemented in the prototype
+ and seems to work well for compiling both simple and more complex
+ hardware descriptions. \todo{Ref christiaan?} However, this normalization
+ system has not yet seen enough review and refinement to work for
+ every Core expression that is supplied to it. A number of problems
+ have already been identified and are discussed in this section.
+
+ \subsection[sec:normalization:duplicatework]{Work duplication}
+ A possible problem of β-reduction is that it could duplicate work.
+ When the expression that is applied is not a simple variable
+ reference, but requires calculation, and the binder of the lambda
+ abstraction is used more than once, more hardware might be generated than strictly
+ needed.
+
+ As an example, consider the expression:
+
+ \startlambda
+ (λx. x + x) (a * b)
+ \stoplambda
+
+ When applying β-reduction to this expression, we get:
+
+ \startlambda
+ (a * b) + (a * b)
+ \stoplambda
+
+ which of course calculates \lam{(a * b)} twice.
+
+ A possible solution to this would be to use the following alternative
+ transformation, which is of course no longer normal β-reduction. The
+ following transformation has not been tested in the prototype, but is
+ given here for future reference:
+
+ \starttrans
+ (λx.E) M
+ -----------------
+ letrec x = M in E
+ \stoptrans
+
+ This doesn't seem like much of an improvement, but it does get rid of
+ the lambda expression (and the associated higher order value), while
+ at the same time introducing a new let binding. Since the result of
+ every application or case expression must be bound by a let expression
+ in the intended normal form anyway, this is probably not a problem. If
+ the argument happens to be a variable reference, then simple let
+ binding removal (\in{section}[sec:normalization:simplelet]) will
+ remove it, making the result identical to that of the original
+ β-reduction transformation.
+
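The rule above can be sketched on a minimal expression datatype. This is a hypothetical illustration, not the prototype's actual Core representation (which uses \lam{letrec} and is considerably richer):

```haskell
-- A minimal expression language, just large enough to express
-- the alternative β-reduction rule.
data Expr = Var String
          | Lam String Expr
          | App Expr Expr
          | Let String Expr Expr
          deriving (Eq, Show)

-- (λx.E) M  =>  let x = M in E
-- No substitution takes place, so M is never duplicated.
betaLet :: Expr -> Expr
betaLet (App (Lam x e) m) = Let x m e
betaLet e                 = e

main :: IO ()
main = print (betaLet (App (Lam "x" (Var "x")) (Var "m")))
```

Since the argument is bound by a let instead of substituted into the body, it occurs exactly once in the result no matter how often the binder is used.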
+ When also applying argument simplification to the above example, we
+ get the following expression:
+
+ \startlambda
+ let y = (a * b)
+ z = (a * b)
+ in y + z
+ \stoplambda
+
+ Looking at this, we could imagine an alternative approach: Create a
+ transformation that removes let bindings that bind identical values.
+ In the above expression, the \lam{y} and \lam{z} variables could be
+ merged together, resulting in the more efficient expression:
+
+ \startlambda
+ let y = (a * b) in y + y
+ \stoplambda
+
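Such a transformation amounts to common subexpression elimination on a binding group. A hypothetical sketch, representing bindings as name/expression pairs and using syntactic equality as a (simplistic) notion of identical values:

```haskell
import Data.List (foldl')

-- Keep the first binding for each distinct right hand side and
-- return a substitution mapping dropped binders to kept ones,
-- to be applied to the body of the let expression.
mergeBindings :: Eq e => [(String, e)] -> ([(String, e)], [(String, String)])
mergeBindings = foldl' step ([], [])
  where
    step (kept, subst) (name, rhs) =
      case [n | (n, r) <- kept, r == rhs] of
        (n : _) -> (kept, subst ++ [(name, n)])
        []      -> (kept ++ [(name, rhs)], subst)

main :: IO ()
main = print (mergeBindings [("y", "a * b"), ("z", "a * b")])
```

In the example above this keeps only the \lam{y} binding and substitutes \lam{y} for \lam{z} in the body, yielding \lam{y + y}.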
+ \subsection{Non-determinism}
+ As an example, again consider the following expression:
+
+ \startlambda
+ (λx. x + x) (a * b)
+ \stoplambda
+
+ We can apply both β-reduction (\in{section}[sec:normalization:beta])
+ as well as argument simplification
+ (\in{section}[sec:normalization:argsimpl]) to this expression.
+
+ When applying argument simplification first and then β-reduction, we
+ get the following expression:
+
+ \startlambda
+ let y = (a * b) in y + y
+ \stoplambda
+
+ When applying β-reduction first and then argument simplification, we
+ get the following expression:
+
+ \startlambda
+ let y = (a * b)
+ z = (a * b)
+ in y + z
+ \stoplambda
+
+ As you can see, this is a different expression. This means that the
+ order in which transformations are applied does in fact change the
+ resulting normal form, which is something that we would like to
+ avoid. In this particular
+ case one of the alternatives is even clearly more efficient, so we
+ would of course like the more efficient form to be the normal form.
+
+ For this particular problem, the solutions for work duplication
+ from the previous section seem to fix the determinism of our
+ transformation system as well. However, it is likely that there are
+ other occurrences of this problem.
+
+ \subsection{Casts}
+ We do not fully understand the use of cast expressions in Core, so
+ there are probably expressions involving cast expressions that cannot
+ be brought into intended normal form by this transformation system.
+
+ The uses of casts in the Core language should be investigated
+ further, and the transformations will probably need to be updated to
+ handle them in all cases.
\section[sec:normalization:properties]{Provable properties}
When looking at the system of transformations outlined above, there are a