# $C\lambda$ asH From Haskell To Hardware

Christiaan Baaij & Matthijs Kooijman

September 3, 2009

Computer Architecture for Embedded Systems (CAES) group Faculty of Electrical Engineering, Mathematics and Computer Science University of Twente Enschede, The Netherlands

http://caes.ewi.utwente.nl

C.VasH From Hakkell To Hardware Christiaan Baaij & Matthijs Kooijman September 3, 2009

- Small tour: what can we describe in  $C\lambda$ asH
- Quick real demo

- CλasH: CAES Language for Hardware Descriptions
- Rapid prototyping language
- Subset of Haskell can be translated to Hardware (VHDL)
- Structural Description of a Mealy Machine

September 1



- **C** $\lambda$ asH: CAES Language for Hardware Descriptions
- Rapid prototyping language
- Subset of Haskell can be translated to Hardware (VHDL)
- Structural Description of a Mealy Machine



- C $\lambda$ asH: CAES Language for Hardware Descriptions
- Rapid prototyping language
- Subset of Haskell can be translated to Hardware (VHDL)
- Structural Description of a Mealy Machine



- C $\lambda$ asH: CAES Language for Hardware Descriptions
- Rapid prototyping language
- Subset of Haskell can be translated to Hardware (VHDL)
- Structural Description of a Mealy Machine



- **C** $\lambda$ asH: CAES Language for Hardware Descriptions
- Rapid prototyping language
- Subset of Haskell can be translated to Hardware (VHDL)
- Structural Description of a Mealy Machine

- We are a Computer Architectures group, this has been a 6 month project, no prior experience with Haskell.
- $C\lambda$ asH is written in Haskell, of course
- CλasH is currently meant for rapid prototyping, not verification of hardware desigs
- Functional languages are close to Hardware
- We can only translate a subset of Haskell
- All functions are descriptions of Mealy Machines

## What again is a Mealy Machine?



Mealy machine bases its output on current input and previous state



# Haskell Description

```
mealyMachine ::
    InputSignals →
    State →
    (State, OutputSignals)
    mealyMachine inputs state = (new_state, output)
    where
        new_state = logic state input
        outputs = logic state input
```

- Current state is part of the input
- New state is part of the output



# Haskell Description

```
mealyMachine ::

InputSignals →

State →

(State, OutputSignals)

mealyMachine inputs state = (new_state, output)

where

new_state = logic state input
outputs = logic state input
```

- Current state is part of the input
- New state is part of the output



# Haskell Description

```
mealyMachine ::
    InputSignals →
    State →
    (State, OutputSignals)
    mealyMachine inputs state = (new_state, output)
    where
        new_state = logic state input
        outputs = logic state input
```

- Current state is part of the input
- New state is part of the output

- State is part of the function signature
- Both the current state, as the updated State



# Simulating a Mealy Machine

```
run func state [] = []
run func state (i:input) = o:out
where
(state', o) = func i state
out = run func state' input
```

State behaves like an accumulator



# Simulating a Mealy Machine

```
run func state [] = []
run func state (i:input) = o:out
where
(state', o) = func i state
out = run func state' input
```

■ State behaves like an accumulator



# Simulating a Mealy Machine

```
run func state [] = []
run func state (i:input) = o:out
where
(state', o) = func i state
out = run func state' input
```

■ State behaves like an accumulator

- This is just a quick example of how we can simulate the mealy machine
- It sort of behaves like MapAccumN



- Small Polymorphic, Higher-Order CPU
- Each function is turned into a hardware component
- Use of state will be simple



- Small Polymorphic, Higher-Order CPU
- Each function is turned into a hardware component
- Use of state will be simple

- Small Polymorphic, Higher-Order CPU
- Each function is turned into a hardware component
- Use of state will be simple

- Small Polymorphic, Higher-Order CPU
- Each function is turned into a hardware component
- Use of state will be simple

- Small "toy"-example of what can be done in  $C\lambda$ asH
- Show what can be translated to Hardware
- Put your hardware glasses on: each function will be a component
- Use of state will be kept simple

### **Imports**

Import all the built-in types, such as vectors and integers:

import CLasH.HardwareTypes

- Supplied

Import annotations, helps  $\mathsf{C}\lambda\mathsf{as}\mathsf{H}$  to find top-level component:

import CLasH. Translator. Annotations



Import all the built-in types, such as vectors and integers:

**import** CLasH.HardwareTypes

SHE WALL

Import annotations, helps  $\mathsf{C}\lambda$ as $\mathsf{H}$  to find top-level component

import CLasH. Translator. Annotations

### **Imports**

Import all the built-in types, such as vectors and integers:

**import** CLasH.HardwareTypes

Import annotations, helps  $C\lambda$ asH to find top-level component:

**import** CLasH. Translator. Annotations

- The first input is always needed, as it contains the builtin types
- The second one is only needed if you want to make use of Annotations



First we define some ALU types:

**type** 
$$Op \ s \ a = a \rightarrow Vector \ s \ a \rightarrow a$$
 **type**  $Opcode = Bit$ 

And some Register types:

$${f type}\,\, {\it RegBank}\,\, {\it s}\,\, a = {\it Vector}\,\,({\it s}+{\it D1})\,\, a$$
  ${\it type}\,\, {\it RegBank}\,\, {\it s}\,\, a$ 



#### First we define some ALU types:

**type** 
$$Op \ s \ a = a \rightarrow Vector \ s \ a \rightarrow a$$
  
**type**  $Opcode = Bit$ 

And some Register types:

**type** 
$$RegBank s a = Vector (s + D1) a$$
  
**type**  $RegState s a = State (RegBank s a)$ 



#### First we define some ALU types:

**type** 
$$Op \ s \ a = a \rightarrow Vector \ s \ a \rightarrow a$$
  
**type**  $Opcode = Bit$ 

#### And some Register types:

**type** 
$$RegBank \ s \ a = Vector (s + D1) \ a$$
  
**type**  $RegState \ s \ a = State (RegBank \ s \ a)$ 

#### First we define some ALU types:

**type** 
$$Op \ s \ a = a \rightarrow Vector \ s \ a \rightarrow a$$

**type** Opcode = Bit

#### And some Register types:

**type** 
$$RegBank \ s \ a = Vector (s + D1) \ a$$

**type** 
$$RegState s a = State (RegBank s a)$$

- The first type is already polymorphic, both in size, and element type
- It's a small example, so Opcode is just a Bit
- State has to be of the State type to be recognized as such
- SizedInt D12: One concrete type for now, to make the signatures smaller



We make a primitive operation:

$$primOp :: (a \rightarrow a \rightarrow a) \rightarrow Op s a$$
  
 $primOp f a b = a 'f' a$ 

We make a vector operation:

$$vectOp :: (a \rightarrow a \rightarrow a) \rightarrow Op s a$$
  
 $vectOp f a b = foldl f a b$ 

■ We support Higher-Order Functionality



# Operations

#### We make a primitive operation:

$$primOp :: (a \rightarrow a \rightarrow a) \rightarrow Op \ s \ a$$
  
 $primOp \ f \ a \ b = a \ f' \ a$ 

We make a vector operation:

We support Higher-Order Functionality

# **Operations**

We make a primitive operation:

$$primOp :: (a \rightarrow a \rightarrow a) \rightarrow Op \ s \ a$$
  
 $primOp \ f \ a \ b = a \ f' \ a$ 

We make a vector operation:

$$vectOp :: (a \rightarrow a \rightarrow a) \rightarrow Op \ s \ a$$
  
 $vectOp \ f \ a \ b = foldl \ f \ a \ b$ 

■ We support Higher-Order Functionality

# **Operations**

We make a primitive operation:

$$primOp :: (a \rightarrow a \rightarrow a) \rightarrow Op \ s \ a$$
  
 $primOp \ f \ a \ b = a \ f' \ a$ 

We make a vector operation:

$$vectOp :: (a \rightarrow a \rightarrow a) \rightarrow Op \ s \ a$$
  
 $vectOp \ f \ a \ b = foldl \ f \ a \ b$ 

■ We support Higher-Order Functionality

- These are just frameworks for 'real' operations
- Notice how they are High-Order functions

### Simple ALU

#### We define a polymorphic ALU:

```
alu ::

Op \ s \ a \rightarrow
Op \ s \ a \rightarrow
Opcode \rightarrow a \rightarrow Vector \ s \ a \rightarrow a
alu op1 op2 Low a \ b = op1 \ a \ b
alu op1 op2 High a \ b = op2 \ a \ b
```

■ We support Pattern Matching

### Simple ALU

#### We define a polymorphic ALU:

```
alu ::

Op \ s \ a \rightarrow
Op \ s \ a \rightarrow
Opcode \rightarrow a \rightarrow Vector \ s \ a \rightarrow a
alu op1 op2 Low a \ b = op1 \ a \ b
alu op1 op2 High a \ b = op2 \ a \ b
```

■ We support Pattern Matching

- Alu is both higher-order, and polymorphic
- We support pattern matching



### Register Bank

#### Make a simple register bank:

```
registerBank :: (Some\ context...) \Rightarrow (RegState\ s\ a) \rightarrow a \rightarrow RangedWord\ s \rightarrow RangedWord\ s \rightarrow Bit \rightarrow ((RegState\ s\ a), a) registerBank (State\ mem)\ data_in\ rdaddr\ wraddr\ wrenable = ((State\ mem'), data_out) \quad \quad where \quad data_out = \ mem !\ rdaddr\ \quad mem' \ |\ wrenable \equiv Low = \ mem \quad |\ otherwise \quad = replace\ mem\ wraddr\ data_in
```

#### We support Guards



### Register Bank

#### Make a simple register bank:

```
registerBank :: (Some\ context...) \Rightarrow (RegState\ s\ a) \rightarrow a \rightarrow RangedWord\ s \rightarrow RangedWord\ s \rightarrow Bit \rightarrow ((RegState\ s\ a), a) registerBank (State\ mem)\ data_in\ rdaddr\ wraddr\ wrenable = ((State\ mem'), data_out) \quad \quad where \quad data_out = \ mem !\ rdaddr\ \quad mem' \quad wrenable \equiv Low = \ mem \quad otherwise \quad = replace\ mem\ wraddr\ data_in
```

#### ■ We support Guards

- RangedWord runs from 0 to the upper bound
- mem is statefull
- We support guards



### Simple CPU

#### Combining ALU and register bank:

```
\{-\#ANN\ actual\_cpu\ TopEntity\#-\}
actual\_cpu::
(Opcode, Word, Vector\ D4\ Word, RangedWord\ D9, RangedWord\ D9, Bit) 	o RegState\ D9\ Word 	o (RegState\ D9\ Word, Word)
actual\_cpu\ (opc, a, b, rdaddr, wraddr, wren)\ ram = (ram', alu\_out)
where
alu\_out = alu\ (primOp\ (+))\ (vectOp\ (+))\ opc\ ram\_out\ b
(ram', ram\_out) = registerBank\ ram\ a\ rdaddr\ wraddr\ wren
```

#### Annotation is used to indicate top-level component



### Simple CPU

#### Combining ALU and register bank:

```
\{-\#ANN\ actual\_cpu\ TopEntity\#-\}
actual\_cpu::
(Opcode, Word, Vector\ D4\ Word, RangedWord\ D9, RangedWord\ D9, Bit) 	o RegState\ D9\ Word 	o (RegState\ D9\ Word, Word)
actual\_cpu\ (opc, a, b, rdaddr, wraddr, wren)\ ram = (ram', alu\_out)
where
alu\_out = alu\ (primOp\ (+))\ (vectOp\ (+))\ opc\ ram\_out\ b
(ram', ram\_out) = registerBank\ ram\ a\ rdaddr\ wraddr\ wren
```

■ Annotation is used to indicate top-level component

### Simple CPU

```
Combining ALU and register bank:
```

Annotation is used to indicate top-level component

Annotation is used to indicate top-level component

- We use the new Annotion functionality to indicate this is the top level
- the primOp and vectOp frameworks are now supplied with real functionality, the plus (+) operations
- No polymorphism or higher-order stuff is allowed at this level.
- Functions must be specialized, and have primitives for input and output



#### Demo

- We will simulate the small CPU from earlier
- Translate that CPU code to VHDL
- Simulate the generated VHDL
- See the hardware schematic of the synthesized VHDL



- We designed a reduction circuit in  $C\lambda$ asH
- Simulation results in Haskell match VHDL simulation results
- Synthesis completes without errors or warnings
- For the same Virtex-4 FPGA:
  - Hand coded VHDL design runs at 200 MHz
  - C\(\lambda\)asH design runs at around 85\* MHz

<sup>\*</sup>Guestimate: design synthesized at 105 MHz, but with an Integer datapath instead of a floating point datapath.



- We designed a reduction circuit in  $C\lambda$ asH
- Simulation results in Haskell match VHDL simulation results
- Synthesis completes without errors or warnings
- For the same Virtex-4 FPGA:
  - Hand coded VHDL design runs at 200 MHz
  - C $\lambda$ asH design runs at around 85\* MHz

<sup>\*</sup>Guestimate: design synthesized at 105 MHz, but with an Integer datapath instead of a



- We designed a reduction circuit in  $C\lambda$ asH
- Simulation results in Haskell match VHDL simulation results
- Synthesis completes without errors or warnings
- For the same Virtex-4 FPGA:
  - Hand coded VHDL design runs at 200 MHz
  - C\(\text{\argumentum}\) design runs at around 85\* MHz

\*Guestimate: design synthesized at 105 MHz, but with an Integer datapath instead of a



- We designed a reduction circuit in  $C\lambda$ asH
- Simulation results in Haskell match VHDL simulation results
- Synthesis completes without errors or warnings
- For the same Virtex-4 FPGA:
  - Hand coded VHDL design runs at 200 MHz
  - C\(\lambda\)asH design runs at around 85\* MHz

 $^*$ Guestimate: design synthesized at 105 MHz, but with an Integer datapath instead of a



- We designed a reduction circuit in  $C\lambda$ asH
- Simulation results in Haskell match VHDL simulation results
- Synthesis completes without errors or warnings
- For the same Virtex-4 FPGA:
  - Hand coded VHDL design runs at 200 MHz
  - C\(\text{\argumentum}\) design runs at around 85\* MHz

stGuestimate: design synthesized at 105 MHz, but with an Integer datapath instead of a



- We designed a reduction circuit in  $C\lambda$ asH
- Simulation results in Haskell match VHDL simulation results
- Synthesis completes without errors or warnings
- For the same Virtex-4 FPGA:
  - Hand coded VHDL design runs at 200 MHz
  - $\blacksquare$  C $\lambda$ asH design runs at around 85\* MHz

stGuestimate: design synthesized at 105 MHz, but with an Integer datapath instead of st



- We designed a reduction circuit in  $C\lambda$ asH
- Simulation results in Haskell match VHDL simulation results
- Synthesis completes without errors or warnings
- For the same Virtex-4 FPGA:
  - Hand coded VHDL design runs at 200 MHz
  - C\u03c4asH design runs at around 85\* MHz

<sup>\*</sup>Guestimate: design synthesized at 105 MHz, but with an Integer datapath instead of a floating point datapath.

 $C\lambda asH$ 

#### Real Hardware Designs

-More than just toys

| imulation  | results  | in Haskell | match  | VHDL    | simulation | results |
|------------|----------|------------|--------|---------|------------|---------|
| anthoris o | annalas. | or without | orrorr | or work | iner       |         |

For the same Virtex-4 FPGA:

Hand coded VHDL design runs at 200 MHz

Charl design runs at around 85\* MHz

isectionate: design synthesized at 205 MHz, but with an Integer datapath instead of a oring point datapath.

- Toys like the poly cpu one are good to give a quick demo
- But we used  $C\lambda$ asH to design 'real' hardware
- Reduction circuit sums the numbers in a row of a (sparse) matrix
- Nice speed considering we don't optimize for it



- No Effort: GHC API Parses, Typechecks and Desugars the Haskell code
- Hard: Transform resulting Core, GHC's Intermediate Language, to a normal form
- Easy: Translate Normalized Core to synthesizable VHDI



- No Effort: GHC API Parses, Typechecks and Desugars the Haskell code
- Hard: Transform resulting Core, GHC's Intermediate Language, to a normal form
- Easy: Translate Normalized Core to synthesizable VHDI



- No Effort: GHC API Parses, Typechecks and Desugars the Haskell code
- Hard: Transform resulting Core, GHC's Intermediate Language to a normal form
- Easy: Translate Normalized Core to synthesizable VHDI



- No Effort: GHC API Parses, Typechecks and Desugars the Haskell code
- Hard· Transform resulting Core, GHC's Intermediate Language. to a normal form



- No Effort: GHC API Parses, Typechecks and Desugars the Haskell code
- Hard: Transform resulting Core, GHC's Intermediate Language, to a normal form
- Easy: Translate Normalized Core to synthesizable VHDL

Easy: Translate Normalized Core to synthesizable VHDL

to a normal form

- Here is a quick insight as to how WE translate Haskell to Hardware
- You can also use TH, like ForSyDe. Or traverse datastructures, like
- We're in luck with the GHC API update of 6.10 and onwards
- Normal form is a single lamda and a let expression, every let binder is a simple assignment

#### Some final words

- Still a lot to do: translate larger subset of Haskell
- Real world prototypes can be made in  $C\lambda$ asH
- $\blacksquare$  C $\lambda$ asH is another great example of how to bring functional expressivity to hardware designs

# Thank you for listening

CλasH Clone URL: git://github.com/christiaanb/clash.git

### Complete signature for registerBank

```
registerBank :: 
(NaturalT s
, PositiveT (s + D1)
, ((s + D1) > s) \sim True)) \Rightarrow
(RegState s a) \rightarrow a \rightarrow RangedWord s \rightarrow
RangedWord s \rightarrow Bit \rightarrow ((RegState s a), a)
```





-Sugark





- Supark