From 1d5f13016686759c85752c6a2fd316dbd3f2609e Mon Sep 17 00:00:00 2001 From: Matthijs Kooijman Date: Thu, 15 Oct 2009 18:33:25 +0200 Subject: [PATCH] Add a section on expressions in the Core language. There should be another (sub)section on typing in the Core language. --- Chapters/Prototype.tex | 255 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 254 insertions(+), 1 deletion(-) diff --git a/Chapters/Prototype.tex b/Chapters/Prototype.tex index 293e004..6ef6a97 100644 --- a/Chapters/Prototype.tex +++ b/Chapters/Prototype.tex @@ -197,8 +197,261 @@ constructs. This step is described in a lot of detail at \in{chapter}[chap:normalization]. + \section{The Core language} + Most of the prototype deals with handling the program in the Core + language. In this section we will show what this language looks like and + how it works. + + The Core language is a functional language that describes + \emph{expressions}. Every identifier used in Core is called a + \emph{binder}, since it is bound to a value somewhere. On the highest + level, a Core program is a collection of functions, each of which bind a + binder (the function name) to an expression (the function value, which has + a function type). + + The Core language itself does not prescribe any program structure, only + expression structure. In the \small{GHC} compiler, the Haskell module + structure is used for the resulting Core code as well. Since this is not + so relevant for understanding the Core language or the Normalization + process, we'll only look at the Core expression language here. + + Each Core expression consists of one of these possible expressions. + + \startdesc{Variable reference} +\startlambda +a +\stoplambda + This is a simple reference to a binder. It's written down as the + name of the binder that is being referred to, which should of course be + bound in a containing scope (including top level scope, so a reference + to a top level function is also a variable reference). Additionally, + constructors from algebraic datatypes also become variable references. + + The value of this expression is the value bound to the given binder. + + Each binder also carries around its type, but this is usually not shown + in the Core expressions. Occasionally, the type of an entire expression + or function is shown for clarity, but this is only informational. In + practice, the type of an expression is easily determined from the + structure of the expression and the types of the binders and occasional + cast expressions. This minimize the amount of bookkeeping needed to keep + the typing consistent. + \stopdesc + \startdesc{Literal} +\startlambda +10 +\stoplambda + This is a simple literal. Only primitive types are supported, like + chars, strings, ints and doubles. The types of these literals are the + \quote{primitive} versions, like \lam{Char\#} and \lam{Word\#}, not the + normal Haskell versions (but there are builtin conversion functions). + \stopdesc + \startdesc{Application} +\startlambda +func arg +\stoplambda + This is simple function application. Each application consists of two + parts: The function part and the argument part. Applications are used + for normal function \quote{calls}, but also for applying type + abstractions and data constructors. + + The value of an application is the value of the function part, with the + first argument binder bound to the argument part. + \stopdesc + \startdesc{Lambda abstraction} +\startlambda +λbndr.body +\stoplambda + This is the basic lambda abstraction, as it occurs in labmda calculus. + It consists of a binder part and a body part. A lambda abstraction + creates a function, that can be applied to an argument. + + Note that the body of a lambda abstraction extends all the way to the + end of the expression, or the closing bracket surrounding the lambda. In + other words, the lambda abstraction \quote{operator} has the lowest + priority of all. + + The value of an application is the value of the body part, with the + binder bound to the value the entire lambda abstraction is applied to. + \stopdesc + \startdesc{Non-recursive let expression} +\startlambda +let bndr = value in body +\stoplambda + A let expression allows you to bind a binder to some value, while + evaluating to some other value (where that binder is in scope). This + allows for sharing of subexpressions (you can use a binder twice) and + explicit \quote{naming} of arbitrary expressions. Note that the binder + is not in scope in the value bound to it, so it's not possible to make + recursive definitions with the normal form of the let expression (see + the recursive form below). + + Even though this let expression is an extension on the basic lambda + calculus, it is easily translated to a lambda abstraction. The let + expression above would then become: + +\startlambda +(λbndr.body) value +\stoplambda + + This notion might be useful for verifying certain properties on + transformations, since a lot of verification work has been done on + lambda calculus already. + + The value of a let expression is the value of the body part, with the + binder bound to the value. + \stopdesc + \startdesc{Recursive let expression} +\startlambda +let + bndr1 = value1 + \vdots + bndrn = valuen +in + body +\stoplambda + + This is the recursive version of the let expression. In \small{GHC}'s + Core implementation, non-recursive and recursive lets are not so + distinct as we present them here, but this provides a clearer overview. + + The main difference with the normal let expression is that each of the + binders is in scope in each of the values, in addition to the body. This + allows for self-recursive definitions or mutually recursive definitions. + + It should also be possible to express a recursive let using normal + lambda calculus, if we use the \emph{least fixed-point operator}, + \lam{Y}. + \stopdesc + \startdesc{Case expression} +\startlambda + case scrut of bndr + DEFAULT -> defaultbody + C0 bndr0,0 ... bndr0,m -> body0 + \vdots + Cn bndrn,0 ... bndrn,m -> bodyn +\stoplambda + +TODO: Define WHNF + + A case expression is the only way in Core to choose between values. A case + expression evaluates its scrutinee, which should have an algebraic + datatype, into weak head normal form (\small{WHNF}) and (optionally) binds + it to \lam{bndr}. It then chooses a body depending on the constructor of + its scrutinee. If none of the constructors match, the \lam{DEFAULT} + alternative is chosen. + + Since we can only match the top level constructor, there can be no overlap + in the alternatives and thus order of alternatives is not relevant (though + the \lam{DEFAULT} alternative must appear first for implementation + efficiency). + + Any arguments to the constructor in the scrutinee are bound to each of the + binders after the constructor and are in scope only in the corresponding + body. + + To support strictness, the scrutinee is always evaluated into WHNF, even + when there is only a \lam{DEFAULT} alternative. This allows a strict + function argument to be written like: + +\startlambda +function (case argument of arg + DEFAULT -> arg) +\stoplambda + + This seems to be the only use for the extra binder to which the scrutinee + is bound. When not using strictness annotations (which is rather pointless + in hardware descriptions), \small{GHC} seems to never generate any code + making use of this binder. The current prototype does not handle it + either, which probably means that code using it would break. + + Note that these case statements are less powerful than the full Haskell + case statements. In particular, they do not support complex patterns like + in Haskell. Only the constructor of an expression can be matched, complex + patterns are implemented using multiple nested case expressions. + + Case statements are also used for unpacking of algebraic datatypes, even + when there is only a single constructor. For examples, to add the elements + of a tuple, the following Core is generated: + +\startlambda +sum = λtuple.case tuple of + (,) a b -> a + b +\stoplambda + + Here, there is only a single alternative (but no \lam{DEFAULT} + alternative, since the single alternative is already exhaustive). When + it's body is evaluated, the arguments to the tuple constructor \lam{(,)} + (\eg, the elements of the tuple) are bound to \lam{a} and \lam{b}. + \stopdesc + \startdesc{Cast expression} +\startlambda +body :: targettype +\stoplambda + A cast expression allows you to change the type of an expression to an + equivalent type. Note that this is not meant to do any actual work, like + conversion of data from one format to another, or force a complete type + change. Instead, it is meant to change between different representations + of the same type, \eg switch between types that are provably equal (but + look different). + + In our hardware descriptions, we typically see casts to change between a + Haskell newtype and its contained type, since those are effectively + different representations of the same type. + + More complex are types that are proven to be equal by the typechecker, + but look different at first glance. To ensure that, once the typechecker + has proven equality, this information sticks around, explicit casts are + added. In our notation we only write the target type, but in reality a + cast expressions carries around a \emph{coercion}, which can be seen as a + proof of equality. TODO: Example + + The value of a cast is the value of its body, unchanged. The type of this + value is equal to the target type, not the type of its body. + + Note that this syntax is also used sometimes to indicate that a particular + expression has a particular type, even when no cast expression is + involved. This is then purely informational, since the only elements that + are explicitely typed in the Core language are the binder references and + cast expressions, the types of all other elements are determined at + runtime. + \stopdesc + \startdesc{Note} + + The Core language in \small{GHC} allows adding \emph{notes}, which serve + as hints to the inliner or add custom (string) annotations to a core + expression. These shouldn't be generated normally, so these are not + handled in any way in the prototype. + \stopdesc + \startdesc{Type} +\startlambda +@type +\stoplambda + It is possibly to use a Core type as a Core expression. This is done to + allow for type abstractions and applications to be handled as normal + lambda abstractions and applications above. This means that a type + expression in Core can only ever occur in the argument position of an + application, and only if the type of the function that is applied to + expects a type as the first argument. This happens for all polymorphic + functions, for example, the \lam{fst} function: + +\startlambda +fst :: \forall a. \forall b. (a, b) -> a +fst = λtup.case tup of (,) a b -> a + +fstint :: (Int, Int) -> Int +fstint = λa.λb.fst @Int @Int a b +\stoplambda + + The type of \lam{fst} has two universally quantified type variables. When + \lam{fst} is applied in \lam{fstint}, it is first applied to two types. + (which are substitued for \lam{a} and \lam{b} in the type of \lam{fst}, so + the type of \lam{fst} actual type of arguments and result can be found: + \lam{fst @Int @Int :: (Int, Int) -> Int}). + \stopdesc + + TODO: Core type system - Core - description of the language (appendix?) Implementation issues Simplified core? -- 2.30.2